API Reference
dataset
etcetera.dataset(name: str, auto_pull=False, config=None)
Returns an etcetera.Dataset object, given a dataset name.
Parameters:
- name (str) – the name of the dataset
- auto_pull (bool) – if set, automatically pulls the dataset from the cloud
- config (etcetera.Config) – configuration to use
Returns: an etcetera.Dataset representing this dataset
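The returned object can be walked partition by partition. The following is a minimal sketch, not part of the library: the helper name files_per_partition is hypothetical, and it relies only on the partitions() and __getitem__ members documented for etcetera.Dataset.

```python
from pathlib import Path
from typing import Dict


def files_per_partition(ds) -> Dict[str, int]:
    """Count directory entries in every partition of a Dataset-like object."""
    counts = {}
    for name in ds.partitions():      # partition names, as returned by the dataset
        partition: Path = ds[name]    # pathlib.Path of the partition directory
        counts[name] = sum(1 for _ in partition.iterdir())
    return counts
```

With the library installed, this would presumably be called as files_per_partition(etcetera.dataset(name, auto_pull=True)), where name is the name of an existing dataset.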
Dataset
class etcetera.Dataset(location: str, name: Optional[str] = None)
Represents a locally installed dataset.
file(*av)
Convenience method to build a file path relative to the dataset root.
Example:
dataset.file('README.md')
data
Path to the data directory within the dataset.
partitions()
Returns a sorted list of partition names.
__len__()
Dataset length is the number of partitions.
__getitem__(partition)
Returns a pathlib.Path object for the partition directory.
Parameters: partition (str) – name of the partition
Example:
dataset = …
partition = dataset['train']
for filename in partition.iterdir():
    print(filename)
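The members above can be mimicked by a small stand-in class, which may help when reasoning about how they relate to each other. This is a sketch under stated assumptions, not the library's implementation: the class name SketchDataset is hypothetical, and it assumes that the data directory is named "data" and that each of its subdirectories is one partition.

```python
import tempfile
from pathlib import Path


class SketchDataset:
    """Hypothetical stand-in mirroring the documented Dataset interface."""

    def __init__(self, location: str):
        self.root = Path(location)

    def file(self, *av) -> Path:
        # Build a path relative to the dataset root.
        return self.root.joinpath(*av)

    @property
    def data(self) -> Path:
        # Assumption: the data directory is called "data".
        return self.root / "data"

    def partitions(self) -> list:
        # Assumption: every subdirectory of data/ is one partition.
        return sorted(p.name for p in self.data.iterdir() if p.is_dir())

    def __len__(self) -> int:
        # Dataset length is the number of partitions.
        return len(self.partitions())

    def __getitem__(self, partition: str) -> Path:
        # Path of the partition directory.
        return self.data / partition


def demo() -> tuple:
    """Build a throwaway two-partition layout and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("train", "test"):
            (Path(tmp) / "data" / name).mkdir(parents=True)
        ds = SketchDataset(tmp)
        return ds.partitions(), len(ds), ds["train"].name, ds.file("README.md").name
```

Calling demo() builds a temporary layout with two partitions and returns the sorted partition names, the dataset length, and the final components of two derived paths.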
Config
ls
etcetera.ls(remote=False, config=None)
Lists datasets.
By default, local datasets are listed.
Parameters:
- remote (bool) – if True, list remote datasets
- config (etcetera.Config) – configuration to use
register
etcetera.register(dirname: str, name: Optional[str] = None, force=False, config=None)
Registers a local directory as a dataset.
Parameters:
- dirname (str) – path to the local directory with data
- name (str) – dataset name (if not specified, the directory name is used)
- force (bool) – allows overriding an existing dataset
- config (etcetera.Config) – configuration to use
pull
etcetera.pull(name: str, force=False, config=None)
Pulls a dataset from cloud storage.
Parameters:
- name (str) – dataset name
- force (bool) – if True, overrides the existing local dataset
- config (etcetera.Config) – configuration to use
push
etcetera.push(name: str, force=False, config=None)
Pushes a dataset to the cloud.
Parameters:
- name (str) – dataset name
- force (bool) – if True, overrides the remote dataset
- config (etcetera.Config) – configuration to use
purge
etcetera.purge(name: str, config=None)
Deletes the local dataset.
Parameters:
- name (str) – dataset name
- config (etcetera.Config) – configuration to use
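The functions in this reference can be chained into a publish-and-restore round trip. The following is a minimal sketch that uses only the signatures documented here. The helper name publish_and_restore is hypothetical, and api is a placeholder for the imported etcetera module, passed in as an argument so the flow can be tried against a stub without touching cloud storage.

```python
def publish_and_restore(api, dirname: str, name: str):
    """Register a local directory, push it, drop the local copy, pull it back."""
    api.register(dirname, name=name)  # pass force=True to override an existing dataset
    api.push(name)                    # upload to the cloud
    api.purge(name)                   # delete the local dataset
    api.pull(name)                    # fetch it again from cloud storage
    return api.dataset(name)          # Dataset object for the restored copy
```

With the library installed and configured, the call would presumably look like publish_and_restore(etcetera, dirname, name) after import etcetera. How each step behaves when the dataset already exists and force is not set is not described in this reference.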