The AIStore Python API is a growing set of client-side classes and methods for accessing and utilizing AIS clusters.

For PyTorch integration and usage examples, please refer to the AIS Python SDK available via the Python Package Index (PyPI), or see https://github.com/NVIDIA/aistore/tree/master/sdk/python.

Class: Client

class Client()

AIStore client for managing buckets, objects, and ETL jobs.

Arguments:

  • endpoint str - AIStore endpoint

bucket

def bucket(bck_name: str, provider: str = ProviderAIS, ns: str = "")

Factory constructor for bucket object. Does not make any HTTP request, only instantiates a bucket object owned by the client.

Arguments:

  • bck_name str - Name of bucket.
  • provider str, optional - Provider of bucket (one of “ais”, “aws”, “gcp”, ...). Defaults to “ais”.
  • ns str, optional - Namespace of bucket. Defaults to “”.

Returns:

The bucket object created.

cluster

def cluster()

Factory constructor for cluster object. Does not make any HTTP request, only instantiates a cluster object owned by the client.

Arguments:

None

Returns:

The cluster object created.

xaction

def xaction()

Factory constructor for xaction object, which contains xaction-related functions. Does not make any HTTP request, only instantiates an xaction object bound to the client.

Arguments:

None

Returns:

The xaction object created.

etl

def etl()

Factory constructor for ETL object. Contains APIs related to AIStore ETL operations. Does not make any HTTP request, only instantiates an etl object bound to the client.

Arguments:

None

Returns:

The etl object created.

list_objects_iter

def list_objects_iter(bck_name: str,
                      provider: str = ProviderAIS,
                      prefix: str = "",
                      props: str = "",
                      page_size: int = 0) -> BucketLister

Returns an iterator for all objects in a bucket

Arguments:

  • bck_name str - Name of a bucket
  • provider str, optional - Name of bucket provider, one of “ais”, “aws”, “gcp”, “az”, “hdfs” or “ht”. Defaults to “ais”.
  • prefix str, optional - Return only objects that start with the prefix
  • props str, optional - Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.

Returns:

  • BucketLister - object iterator

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
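
As a sketch of how the iterator might be consumed (the name attribute on the yielded entries is assumed from BucketEntry):

```python
def iter_object_names(client, bck_name, prefix=""):
    """Lazily yield object names from a bucket using the paging iterator."""
    # props="name,size" is the documented default; listed here for clarity
    for entry in client.list_objects_iter(bck_name, prefix=prefix, props="name,size"):
        yield entry.name  # 'name' attribute assumed on BucketEntry
```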

get_object

def get_object(bck_name: str,
               obj_name: str,
               provider: str = ProviderAIS,
               archpath: str = "",
               chunk_size: int = 1) -> ObjStream

Reads an object

Arguments:

  • bck_name str - Name of a bucket
  • obj_name str - Name of an object in the bucket
  • provider str, optional - Name of bucket provider, one of “ais”, “aws”, “gcp”, “az”, “hdfs” or “ht”.
  • archpath str, optional - If the object is an archive, use archpath to extract a single file from the archive
  • chunk_size int, optional - chunk_size to use while reading from stream

Returns:

The stream of bytes to read an object or a file inside an archive.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
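
For example, a download helper might stream the object to disk chunk by chunk (this sketch assumes the returned ObjStream is iterable in chunks of chunk_size; adjust to your SDK version if it differs):

```python
def download_object(client, bck_name, obj_name, dest_path, chunk_size=65536):
    """Stream an object's bytes to a local file without buffering it whole."""
    stream = client.get_object(bck_name, obj_name, chunk_size=chunk_size)
    with open(dest_path, "wb") as f:
        for chunk in stream:  # chunked iteration assumed on ObjStream
            f.write(chunk)
```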

Class: Cluster

class Cluster()

A class representing a cluster bound to an AIS client.

Arguments:

None

client

@property
def client()

The client object bound to this cluster.

get_info

def get_info() -> Smap

Returns state of AIS cluster, including the detailed information about its nodes.

Arguments:

None

Returns:

  • aistore.msg.Smap - Smap containing cluster information

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore

list_buckets

def list_buckets(provider: str = ProviderAIS)

Returns list of buckets in AIStore cluster.

Arguments:

  • provider str, optional - Name of bucket provider, one of “ais”, “aws”, “gcp”, “az”, “hdfs” or “ht”. Defaults to “ais”. Empty provider returns buckets of all providers.

Returns:

  • List[Bck] - A list of buckets

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore

is_aistore_running

def is_aistore_running() -> bool

Returns True if the cluster is ready, or False if the cluster is still setting up.

Arguments:

None

Returns:

  • bool - True if the cluster is ready, or False if the cluster is still setting up
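
One way to use this is a readiness poll before issuing requests (a sketch; the retry count and delay are arbitrary):

```python
import time

def wait_until_ready(client, retries=10, delay=1.0):
    """Poll the cluster until it reports ready; return False if it never does."""
    for _ in range(retries):
        if client.cluster().is_aistore_running():
            return True
        time.sleep(delay)
    return False
```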

Class: Bucket

class Bucket()

A class representing a bucket that contains user data.

Arguments:

  • bck_name str - name of bucket
  • provider str, optional - provider of bucket (one of “ais”, “aws”, “gcp”, ...), defaults to “ais”
  • ns str, optional - namespace of bucket, defaults to “”

client

@property
def client()

The client bound to this bucket.

bck

@property
def bck()

The custom type [Bck] corresponding to this bucket.

qparam

@property
def qparam()

The QParamProvider of this bucket.

provider

@property
def provider()

The provider for this bucket.

name

@property
def name()

The name of this bucket.

namespace

@property
def namespace()

The namespace for this bucket.

create

def create()

Creates a bucket in the AIStore cluster. Can only create a bucket for the AIS provider on a local cluster; remote Cloud buckets do not support creation.

Arguments:

None

Returns:

None

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • aistore.client.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore
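
A common pattern is create-if-missing. The sketch below assumes head() raises requests.exceptions.HTTPError (404) when the bucket does not exist:

```python
import requests

def ensure_bucket(bucket):
    """Create the bucket if it does not already exist; return True if created."""
    try:
        bucket.head()  # a 404 HTTPError is assumed when the bucket is missing
        return False   # bucket already exists
    except requests.exceptions.HTTPError:
        bucket.create()
        return True
```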

delete

def delete()

Destroys the bucket in the AIStore cluster. In all cases, removes both the bucket’s content and the bucket’s metadata from the cluster. Note: AIS will not call the remote backend provider to delete the corresponding Cloud bucket (iff the bucket in question is, in fact, a Cloud bucket).

Arguments:

None

Returns:

None

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • aistore.client.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

rename

def rename(to_bck: str) -> str

Renames a bucket in the AIStore cluster. Only works on AIS buckets. Returns the xaction id that can be used later to check the status of the asynchronous operation.

Arguments:

  • to_bck str - New name for the renamed bucket

Returns:

xaction id (as str) that can be used to check the status of the operation

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • aistore.client.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

evict

def evict(keep_md: bool = False)

Evicts a bucket in the AIStore cluster. NOTE: only Cloud buckets can be evicted.

Arguments:

  • keep_md bool, optional - If True, evicts objects but keeps the bucket’s metadata (i.e., the bucket’s name and its properties)

Returns:

None

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • aistore.client.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

head

def head() -> Header

Requests bucket properties.

Arguments:

None

Returns:

Response header with the bucket properties

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

copy

def copy(to_bck_name: str,
         prefix: str = "",
         dry_run: bool = False,
         force: bool = False,
         to_provider: str = ProviderAIS) -> str

Copies the bucket’s contents to the destination bucket. Returns the xaction id that can be used later to check the status of the asynchronous operation.

Arguments:

  • to_bck_name str - Name of the destination bucket
  • prefix str, optional - If set, only the objects starting with the given prefix will be copied
  • dry_run bool, optional - Determines if the copy should actually happen or not
  • force bool, optional - Override existing destination bucket
  • to_provider str, optional - Name of destination bucket provider

Returns:

Xaction id (as str) that can be used to check the status of the operation

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

list_objects

def list_objects(prefix: str = "",
                 props: str = "",
                 page_size: int = 0,
                 uuid: str = "",
                 continuation_token: str = "") -> BucketList

Returns a structure that contains a page of objects, xaction UUID, and continuation token (to read the next page, if available).

Arguments:

  • prefix str, optional - Return only objects that start with the prefix
  • props str, optional - Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
  • page_size int, optional - Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend; e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if “page_size” is greater than the backend maximum, the backend maximum number of objects is returned. Defaults to “0” - return the maximum number of objects.
  • uuid str, optional - Job UUID, required to get the next page of objects
  • continuation_token str, optional - Marks the object to start reading the next page

Returns:

  • BucketList - the page of objects in the bucket and the continuation token to get the next page. An empty continuation token marks the final page of the object list.

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore
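
The pagination contract above can be driven in a loop. In this sketch, the field names on the returned BucketList (entries, uuid, continuation_token) are assumed from the message types:

```python
def list_every_object(bucket, prefix=""):
    """Collect all entries by walking pages until the continuation token is empty."""
    entries, uuid, token = [], "", ""
    while True:
        page = bucket.list_objects(prefix=prefix, uuid=uuid, continuation_token=token)
        entries.extend(page.entries)        # field name assumed on BucketList
        uuid, token = page.uuid, page.continuation_token
        if token == "":                     # empty token marks the final page
            return entries
```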

list_objects_iter

def list_objects_iter(prefix: str = "",
                      props: str = "",
                      page_size: int = 0) -> BucketLister

Returns an iterator for all objects in bucket

Arguments:

  • prefix str, optional - Return only objects that start with the prefix
  • props str, optional - Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
  • page_size int, optional - Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend; e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if “page_size” is greater than the backend maximum, the backend maximum number of objects is returned. Defaults to “0” - return the maximum number of objects.

Returns:

  • BucketLister - object iterator

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

list_all_objects

def list_all_objects(prefix: str = "",
                     props: str = "",
                     page_size: int = 0) -> List[BucketEntry]

Returns a list of all objects in bucket

Arguments:

  • prefix str, optional - return only objects that start with the prefix
  • props str, optional - comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
  • page_size int, optional - Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend; e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if “page_size” is greater than the backend maximum, the backend maximum number of objects is returned. Defaults to “0” - return the maximum number of objects.

Returns:

  • List[BucketEntry] - list of objects in bucket

Raises:

  • aistore.client.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving a response from AIStore

transform

def transform(etl_id: str,
              to_bck: str,
              prefix: str = "",
              ext: Dict[str, str] = None,
              force: bool = False,
              dry_run: bool = False)

Transforms all objects in a bucket and puts them into the destination bucket.

Arguments:

  • etl_id str - id of etl to be used for transformations
  • to_bck str - destination bucket for transformations
  • prefix str, optional - prefix to be added to resulting transformed objects
  • ext Dict[str, str], optional - dict specifying extension replacements for the transformed objects (e.g. {“jpg”: “txt”})
  • dry_run bool, optional - determines if the copy should actually happen or not
  • force bool, optional - override existing destination bucket

Returns:

Xaction id (as str) that can be used to check the status of the operation

object

def object(obj_name: str)

Factory constructor for object bound to bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.

Arguments:

  • obj_name str - Name of object

Returns:

The object created.

Class: Object

class Object()

A class representing an object of a bucket bound to a client.

Arguments:

  • obj_name str - name of object

bck

@property
def bck()

The custom type [Bck] bound to this object.

obj_name

@property
def obj_name()

The name of this object.

head

def head() -> Header

Requests object properties.

Arguments:

None

Returns:

Response header with the object properties.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • requests.exceptions.HTTPError(404) - The object does not exist

get

def get(archpath: str = "",
        chunk_size: int = DEFAULT_CHUNK_SIZE,
        etl_id: str = None) -> ObjStream

Reads an object

Arguments:

  • archpath str, optional - If the object is an archive, use archpath to extract a single file from the archive
  • chunk_size int, optional - chunk_size to use while reading from the stream
  • etl_id str, optional - Transforms the object with the ETL identified by etl_id

Returns:

The stream of bytes to read an object or a file inside an archive.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore

put

def put(path: str = None, content: bytes = None) -> Header

Puts a local file or bytes as an object to a bucket in AIS storage.

Arguments:

  • path str, optional - Path to a local file to put as an object.
  • content bytes, optional - Bytes to put as an object.

Returns:

Object properties

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • ValueError - Path and content are mutually exclusive
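
A put/get round trip might look like this (read_all() on the returned stream is assumed; check your SDK version):

```python
def put_bytes(obj, data):
    """Upload in-memory bytes as the object's content; returns object properties."""
    return obj.put(content=data)

def get_bytes(obj):
    """Read the whole object back into memory."""
    return obj.get().read_all()  # read_all() assumed on ObjStream
```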

delete

def delete()

Deletes an object from a bucket.

Arguments:

None

Returns:

None

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • requests.exceptions.HTTPError(404) - The object does not exist

Class: Etl

class Etl()

A class containing ETL-related functions.

Arguments:

None

client

@property
def client()

The client bound to this ETL object.

init_spec

def init_spec(template: str,
              etl_id: str,
              communication_type: str = "hpush",
              timeout: str = "5m")

Initializes ETL based on a POD spec template. Returns the ETL_ID. Existing templates can be found at aistore.client.etl_templates. For more information, visit: https://github.com/NVIDIA/ais-etl/tree/master/transformers

Arguments:

  • template str - POD spec template of the ETL
  • etl_id str - Id of the new ETL
  • communication_type str - Communication type of the ETL (options: hpull, hrev, hpush)
  • timeout str - Timeout of the ETL (e.g. “5m” for 5 minutes)

Returns:

  • etl_id str - ETL ID

init_code

def init_code(transform: Callable,
              etl_id: str,
              before: Callable = None,
              after: Callable = None,
              dependencies: List[str] = None,
              runtime: str = "python3.8v2",
              communication_type: str = "hpush",
              timeout: str = "5m",
              chunk_size: int = None)

Initializes ETL based on the provided source code. Returns ETL_ID.

Arguments:

  • transform Callable - Transform function of the ETL
  • etl_id str - Id of new ETL
  • before Callable - Code function to be executed before transform function, will initialize and return objects used in transform function
  • after Callable - Code function to be executed after transform function, will return results
  • dependencies List[str] - [optional] List of the necessary dependencies with version (e.g. aistore>1.0.0)
  • runtime str - [optional, default=”python3.8v2”] Runtime environment of the ETL [choose from: python3.8v2, python3.10v2] (see etl/runtime/all.go)
  • communication_type str - [optional, default=”hpush”] Communication type of the ETL (options: hpull, hrev, hpush, io)
  • timeout str - [optional, default=”5m”] Timeout of the ETL (e.g. “5m” for 5 minutes)
  • chunk_size int - Chunk size in bytes if the transform function streams data (by default, the whole object is read)

Returns:

  • etl_id str - ETL ID
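
For instance, registering a trivial bytes-to-bytes transform (a sketch; the transform signature — bytes in, bytes out — is assumed for the default hpush communication type, and the ETL id is hypothetical):

```python
def register_upper_etl(client, etl_id="upper-etl"):
    """Initialize an ETL that upper-cases each object's bytes."""
    def upper_case(data):
        # the transform receives the object's bytes and returns transformed bytes
        return data.upper()
    return client.etl().init_code(transform=upper_case, etl_id=etl_id)
```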

list

def list() -> List[ETLDetails]

Lists all running ETLs.

Note: Does not list ETLs that have been stopped or deleted.

Arguments:

Nothing

Returns:

  • List[ETL] - A list of running ETLs

view

def view(etl_id: str) -> ETLDetails

Views the ETL’s init spec/code.

Arguments:

  • etl_id str - id of ETL

Returns:

  • ETLDetails - details of the ETL

start

def start(etl_id: str)

Resumes a stopped ETL with the given ETL_ID.

Note: Deleted ETLs cannot be started.

Arguments:

  • etl_id str - id of ETL

Returns:

Nothing

stop

def stop(etl_id: str)

Stops the ETL with the given ETL_ID. Stops (but does not delete) all the pods created by Kubernetes for this ETL and terminates any transforms.

Arguments:

  • etl_id str - id of ETL

Returns:

Nothing

delete

def delete(etl_id: str)

Deletes the ETL with the given ETL_ID. Deletes the pods created by Kubernetes for this ETL and its specifications in Kubernetes.

Note: Running ETLs cannot be deleted.

Arguments:

  • etl_id str - id of ETL

Returns:

Nothing
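
Since running ETLs cannot be deleted directly, teardown is a stop-then-delete sequence; a minimal sketch:

```python
def teardown_etl(client, etl_id):
    """Stop a running ETL, then delete its Kubernetes pods and spec."""
    etl = client.etl()
    etl.stop(etl_id)     # running ETLs cannot be deleted directly
    etl.delete(etl_id)
```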