The AIStore Python SDK is a growing set of client-side objects and methods for accessing and utilizing AIS clusters.

For PyTorch integration and usage examples, please refer to the AIS Python SDK package available via the Python Package Index (PyPI), or see https://github.com/NVIDIA/aistore/tree/main/python/aistore.

Class: Client

class Client()

AIStore client for managing buckets, objects, and ETL jobs.

Arguments:

  • endpoint str - AIStore endpoint
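
A minimal sketch of creating a client (the endpoint below is a placeholder; substitute your cluster’s proxy URL):

from aistore.sdk import Client

client = Client("http://localhost:8080")  # placeholder endpoint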

bucket

def bucket(bck_name: str,
           provider: str = PROVIDER_AIS,
           namespace: Namespace = None)

Factory constructor for bucket object. Does not make any HTTP request, only instantiates a bucket object.

Arguments:

  • bck_name str - Name of bucket
  • provider str - Provider of bucket, one of “ais”, “aws”, “gcp”, ... (optional, defaults to ais)
  • namespace Namespace - Namespace of bucket (optional, defaults to None)

Returns:

The bucket object created.
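
For example (bucket names below are placeholders; no HTTP request is made until a bucket method is called):

ais_bucket = client.bucket("my-ais-bucket")
cloud_bucket = client.bucket("my-aws-data", provider="aws")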

cluster

def cluster()

Factory constructor for cluster object. Does not make any HTTP request, only instantiates a cluster object.

Returns:

The cluster object created.

job

def job(job_id: str = "", job_kind: str = "")

Factory constructor for job object, which contains job-related functions. Does not make any HTTP request, only instantiates a job object.

Arguments:

  • job_id str, optional - Optional ID for interacting with a specific job
  • job_kind str, optional - Optional specific type of job; empty for all kinds

Returns:

The job object created.

etl

def etl(etl_name: str)

Factory constructor for ETL object. Contains APIs related to AIStore ETL operations. Does not make any HTTP request, only instantiates an ETL object.

Arguments:

  • etl_name str - Name of the ETL

Returns:

The ETL object created.

dsort

def dsort(dsort_id: str = "")

Factory constructor for dSort object. Contains APIs related to AIStore dSort operations. Does not make any HTTP request, only instantiates a dSort object.

Arguments:

  • dsort_id str, optional - ID of the dSort job

Returns:

The dSort object created.

Class: Cluster

class Cluster()

A class representing a cluster bound to an AIS client.

client

@property
def client()

Client this cluster uses to make requests

get_info

def get_info() -> Smap

Returns the state of the AIS cluster, including detailed information about its nodes.

Returns:

  • aistore.sdk.types.Smap - Smap containing cluster information

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
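
A short usage sketch:

smap = client.cluster().get_info()
# Smap (aistore.sdk.types.Smap) describes the cluster map, including its nodes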

list_buckets

def list_buckets(provider: str = PROVIDER_AIS)

Returns list of buckets in AIStore cluster.

Arguments:

  • provider str, optional - Name of bucket provider, one of “ais”, “aws”, “gcp”, “az”, or “ht”. Defaults to “ais”. Empty provider returns buckets of all providers.

Returns:

  • List[BucketModel] - A list of buckets

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
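
For example, listing buckets across all providers by passing an empty provider string (per the description above; the name attribute on BucketModel is an assumption):

for bck in client.cluster().list_buckets(provider=""):
    print(bck.name)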

list_jobs_status

def list_jobs_status(job_kind="", target_id="") -> List[JobStatus]

List the status of jobs on the cluster

Arguments:

  • job_kind str, optional - Only show jobs of a particular type
  • target_id str, optional - Limit to jobs on a specific target node

Returns:

List of JobStatus objects

list_running_jobs

def list_running_jobs(job_kind="", target_id="") -> List[str]

List the currently running jobs on the cluster

Arguments:

  • job_kind str, optional - Only show jobs of a particular type
  • target_id str, optional - Limit to jobs on a specific target node

Returns:

List of jobs in the format job_kind[job_id]

list_running_etls

def list_running_etls() -> List[ETLInfo]

Lists all running ETLs.

Note: Does not list ETLs that have been stopped or deleted.

Returns:

  • List[ETLInfo] - A list of details on running ETLs

is_aistore_running

def is_aistore_running() -> bool

Checks if cluster is ready or still setting up.

Returns:

  • bool - True if the cluster is ready; False if it is still setting up

Class: Bucket

class Bucket()

A class representing a bucket that contains user data.

Arguments:

  • client RequestClient - Client for interfacing with AIS cluster
  • name str - name of bucket
  • provider str, optional - Provider of bucket (one of “ais”, “aws”, “gcp”, ...), defaults to “ais”
  • namespace Namespace, optional - Namespace of bucket, defaults to None

client

@property
def client() -> RequestClient

The client bound to this bucket.

qparam

@property
def qparam() -> Dict

Default query parameters to use with API calls from this bucket.

provider

@property
def provider() -> str

The provider for this bucket.

name

@property
def name() -> str

The name of this bucket.

namespace

@property
def namespace() -> Namespace

The namespace for this bucket.

create

def create(exist_ok=False)

Creates a bucket in AIStore cluster. Can only create a bucket for the AIS provider on the local cluster. Remote cloud buckets do not support creation.

Arguments:

  • exist_ok bool, optional - Ignore error if the cluster already contains this bucket

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore
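
For example, idempotent creation of an AIS bucket:

bucket = client.bucket("my-ais-bucket")
bucket.create(exist_ok=True)  # no error if the bucket already exists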

delete

def delete(missing_ok=False)

Destroys bucket in AIStore cluster. In all cases removes both the bucket’s content and the bucket’s metadata from the cluster. Note: AIS will not call the remote backend provider to delete the corresponding Cloud bucket (iff the bucket in question is, in fact, a Cloud bucket).

Arguments:

  • missing_ok bool, optional - Ignore error if bucket does not exist

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

rename

def rename(to_bck_name: str) -> str

Renames bucket in AIStore cluster. Only works on AIS buckets. Returns job ID that can be used later to check the status of the asynchronous operation.

Arguments:

  • to_bck_name str - New bucket name for bucket to be renamed as

Returns:

Job ID (as str) that can be used to check the status of the operation

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore
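
Since the rename is asynchronous, the returned job ID can be passed to the client’s job factory to wait for completion:

job_id = bucket.rename("my-renamed-bucket")
client.job(job_id).wait()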

evict

def evict(keep_md: bool = False)

Evicts bucket in AIStore cluster. NOTE: only Cloud buckets can be evicted.

Arguments:

  • keep_md bool, optional - If true, evicts objects but keeps the bucket’s metadata (i.e., the bucket’s name and its properties)

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • aistore.sdk.errors.InvalidBckProvider - Invalid bucket provider for requested operation
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

head

def head() -> Header

Requests bucket properties.

Returns:

Response header with the bucket properties

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

copy

def copy(to_bck: Bucket,
         prefix_filter: str = "",
         prepend: str = "",
         dry_run: bool = False,
         force: bool = False) -> str

Copies the contents of this bucket to another bucket. Returns a job ID that can be used later to check the status of the asynchronous operation.

Arguments:

  • to_bck Bucket - Destination bucket
  • prefix_filter str, optional - Only copy objects with names starting with this prefix
  • prepend str, optional - Value to prepend to the name of copied objects
  • dry_run bool, optional - Determines if the copy should actually happen or not
  • force bool, optional - Override existing destination bucket

Returns:

Job ID (as str) that can be used to check the status of the operation

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore
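
A sketch of copying one bucket to another, with a dry run first (bucket names are placeholders):

dest = client.bucket("my-backup")
bucket.copy(to_bck=dest, dry_run=True)   # preview only; nothing is copied
job_id = bucket.copy(to_bck=dest)
client.job(job_id).wait()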

list_objects

def list_objects(prefix: str = "",
                 props: str = "",
                 page_size: int = 0,
                 uuid: str = "",
                 continuation_token: str = "",
                 flags: List[ListObjectFlag] = None,
                 target: str = "") -> BucketList

Returns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if available).

Arguments:

  • prefix str, optional - Return only objects that start with the prefix
  • props str, optional - Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
  • page_size int, optional - Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend; e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if “page_size” is greater than the backend maximum, the backend maximum is returned. Defaults to “0” - return the maximum number of objects
  • uuid str, optional - Job ID, required to get the next page of objects
  • continuation_token str, optional - Marks the object to start reading the next page
  • flags List[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request
  • target str, optional - Only list objects on this specific target node

Returns:

  • BucketList - The page of objects in the bucket and the continuation token to get the next page. An empty continuation token marks the final page of the object list

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore
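
A pagination sketch. The BucketList attribute names used below (entries, uuid, continuation_token) are assumptions based on the description above; consult the BucketList model for the exact fields:

bucket_list = bucket.list_objects(page_size=1000)
while True:
    for entry in bucket_list.entries:
        print(entry.name)
    if not bucket_list.continuation_token:
        break  # an empty token marks the final page
    bucket_list = bucket.list_objects(
        uuid=bucket_list.uuid,
        continuation_token=bucket_list.continuation_token,
    )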

list_objects_iter

def list_objects_iter(prefix: str = "",
                      props: str = "",
                      page_size: int = 0,
                      flags: List[ListObjectFlag] = None,
                      target: str = "") -> ObjectIterator

Returns an iterator for all objects in bucket

Arguments:

  • prefix str, optional - Return only objects that start with the prefix
  • props str, optional - Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
  • page_size int, optional - Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend; e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if “page_size” is greater than the backend maximum, the backend maximum is returned. Defaults to “0” - return the maximum number of objects
  • flags List[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request
  • target str, optional - Only list objects on this specific target node

Returns:

  • ObjectIterator - object iterator

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore
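
In most cases the iterator is the simpler interface, since it handles paging internally (the name attribute on the yielded entries is an assumption):

for entry in bucket.list_objects_iter(prefix="train/", props="name,size"):
    print(entry.name)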

list_all_objects

def list_all_objects(prefix: str = "",
                     props: str = "",
                     page_size: int = 0,
                     flags: List[ListObjectFlag] = None,
                     target: str = "") -> List[BucketEntry]

Returns a list of all objects in bucket

Arguments:

  • prefix str, optional - Return only objects that start with the prefix
  • props str, optional - Comma-separated list of object properties to return. Default value is “name,size”. Properties: “name”, “size”, “atime”, “version”, “checksum”, “cached”, “target_url”, “status”, “copies”, “ec”, “custom”, “node”.
  • page_size int, optional - Return at most “page_size” objects. The maximum number of objects in a response depends on the bucket backend; e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if “page_size” is greater than the backend maximum, the backend maximum is returned. Defaults to “0” - return the maximum number of objects
  • flags List[ListObjectFlag], optional - Optional list of ListObjectFlag enums to include as flags in the request
  • target str, optional - Only list objects on this specific target node

Returns:

  • List[BucketEntry] - list of objects in bucket

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

transform

def transform(etl_name: str,
              to_bck: Bucket,
              timeout: str = DEFAULT_ETL_TIMEOUT,
              prefix_filter: str = "",
              prepend: str = "",
              ext: Dict[str, str] = None,
              force: bool = False,
              dry_run: bool = False) -> str

Visits all selected objects in the source bucket and, for each object, puts the transformed result to the destination bucket.

Arguments:

  • etl_name str - Name of the ETL to be used for transformations
  • to_bck Bucket - Destination bucket for transformations
  • timeout str, optional - Timeout of the ETL job (e.g. 5m for 5 minutes)
  • prefix_filter str, optional - Only transform objects with names starting with this prefix
  • prepend str, optional - Value to prepend to the name of resulting transformed objects
  • ext Dict[str, str], optional - Dict mapping each source extension to its replacement (e.g. {“jpg”: “txt”})
  • dry_run bool, optional - Determines if the transform should actually happen or not
  • force bool, optional - Override existing destination bucket

Returns:

Job ID (as str) that can be used to check the status of the operation
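
A sketch of an offline (bucket-to-bucket) transform, assuming an ETL named “my-etl” has already been initialized on the cluster:

job_id = bucket.transform(
    etl_name="my-etl",
    to_bck=client.bucket("transformed-data"),
    ext={"jpg": "txt"},
)
client.job(job_id).wait()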

put_files

def put_files(path: str,
              prefix_filter: str = "",
              pattern: str = "*",
              basename: bool = False,
              prepend: str = None,
              recursive: bool = False,
              dry_run: bool = False,
              verbose: bool = True) -> List[str]

Puts files found in a given filepath as objects to a bucket in AIS storage.

Arguments:

  • path str - Local filepath, can be relative or absolute
  • prefix_filter str, optional - Only put files with names starting with this prefix
  • pattern str, optional - Regex pattern to filter files
  • basename bool, optional - Whether to use the file names only as object names and omit the path information
  • prepend str, optional - Optional string to use as a prefix in the object name for all objects uploaded. No delimiter (“/”, “-”, etc.) is automatically applied between the prepend value and the object name
  • recursive bool, optional - Whether to recurse through the provided path directories
  • dry_run bool, optional - Option to only show expected behavior without an actual put operation
  • verbose bool, optional - Whether to print upload info to standard output

Returns:

List of object names put to a bucket in AIS

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • ValueError - The path provided is not a valid directory
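
For example, uploading a local directory tree (the path is a placeholder):

names = bucket.put_files(
    "/data/imagenet",
    recursive=True,
    prepend="imagenet/",  # note: the trailing delimiter must be supplied explicitly
)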

object

def object(obj_name: str) -> Object

Factory constructor for an object in this bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.

Arguments:

  • obj_name str - Name of object

Returns:

The object created.

objects

def objects(obj_names: list = None,
            obj_range: ObjectRange = None,
            obj_template: str = None) -> ObjectGroup

Factory constructor for multiple objects belonging to this bucket.

Arguments:

  • obj_names list - Names of objects to include in the group
  • obj_range ObjectRange - Range of objects to include in the group
  • obj_template str - String template defining objects to include in the group

Returns:

The ObjectGroup created
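
For example, building a group from a range of object names (ObjectRange is described below; the import path is an assumption):

from aistore.sdk.multiobj import ObjectRange  # assumed import path

group = bucket.objects(
    obj_range=ObjectRange(prefix="img-", min_index=0, max_index=99, pad_width=3)
)
print(group.list_names())  # img-000 through img-099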

make_request

def make_request(method: str,
                 action: str,
                 value: dict = None,
                 params: dict = None) -> requests.Response

Use the bucket’s client to make a request to the bucket endpoint on the AIS server

Arguments:

  • method str - HTTP method to use, e.g. POST/GET/DELETE
  • action str - Action string used to create an ActionMsg to pass to the server
  • value dict - Additional value parameter to pass in the ActionMsg
  • params dict, optional - Optional parameters to pass in the request

Returns:

Response from the server

verify_cloud_bucket

def verify_cloud_bucket()

Verify the bucket provider is a cloud provider

get_path

def get_path() -> str

Get the path representation of this bucket

as_model

def as_model() -> BucketModel

Return a data-model of the bucket

Returns:

BucketModel representation

Class: Object

class Object()

A class representing an object of a bucket bound to a client.

Arguments:

  • bucket Bucket - Bucket to which this object belongs
  • name str - name of object

bucket

@property
def bucket()

Bucket containing this object

name

@property
def name()

Name of this object

head

def head() -> Header

Requests object properties.

Returns:

Response header with the object properties.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • requests.exceptions.HTTPError(404) - The object does not exist

get

def get(archpath: str = "",
        chunk_size: int = DEFAULT_CHUNK_SIZE,
        etl_name: str = None,
        writer: BufferedWriter = None) -> ObjectReader

Reads an object

Arguments:

  • archpath str, optional - If the object is an archive, use archpath to extract a single file from the archive
  • chunk_size int, optional - Chunk size to use while reading from the stream
  • etl_name str, optional - Transforms an object based on ETL with etl_name
  • writer BufferedWriter, optional - User-provided writer for writing content output. User is responsible for closing the writer

Returns:

The stream of bytes to read an object or a file inside an archive.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
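
For example, reading an object fully into memory, or streaming it to a user-provided writer:

obj = bucket.object("my-file.txt")
data = obj.get().read_all()        # whole object as bytes

with open("local-copy.txt", "wb") as f:
    obj.get(writer=f)              # content is written to the provided writer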

put_content

def put_content(content: bytes) -> Header

Puts bytes as an object to a bucket in AIS storage.

Arguments:

  • content bytes - Bytes to put as an object.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
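
For example:

bucket.object("greeting.txt").put_content(b"hello, aistore")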

put_file

def put_file(path: str = None)

Puts a local file as an object to a bucket in AIS storage.

Arguments:

  • path str - Path to local file

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • ValueError - The path provided is not a valid file

promote

def promote(path: str,
            target_id: str = "",
            recursive: bool = False,
            overwrite_dest: bool = False,
            delete_source: bool = False,
            src_not_file_share: bool = False) -> Header

Promotes a file or folder that an AIS target can access to a bucket in AIS storage. These files can be either on the physical disk of an AIS target itself or on a network file system that the cluster can access. For more information, see: https://aiatscale.org/blog/2022/03/17/promote

Arguments:

  • path str - Path to file or folder the AIS cluster can reach
  • target_id str, optional - Promote files from a specific target node
  • recursive bool, optional - Recursively promote objects from files in directories inside the path
  • overwrite_dest bool, optional - Overwrite objects already on AIS
  • delete_source bool, optional - Delete the source files when done promoting
  • src_not_file_share bool, optional - Optimize if the source is guaranteed to not be on a file share

Returns:

Object properties

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • AISError - Path does not exist on the AIS cluster storage

delete

def delete()

Delete an object from a bucket.

Returns:

None

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • requests.exceptions.HTTPError(404) - The object does not exist

Class: ObjectGroup

class ObjectGroup()

A class representing multiple objects within the same bucket. Only one of obj_names, obj_range, or obj_template should be provided.

Arguments:

  • bck Bucket - Bucket the objects belong to
  • obj_names list[str], optional - List of object names to include in this collection
  • obj_range ObjectRange, optional - Range defining which object names in the bucket should be included
  • obj_template str, optional - String argument to pass as a template value directly to the API

delete

def delete()

Deletes a list or range of objects in a bucket

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

Returns:

Job ID (as str) that can be used to check the status of the operation

evict

def evict()

Evicts a list or range of objects in a bucket so that they are no longer cached in AIS. NOTE: only Cloud buckets can be evicted.

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

Returns:

Job ID (as str) that can be used to check the status of the operation

prefetch

def prefetch()

Prefetches a list or range of objects in a bucket so that they are cached in AIS. NOTE: only Cloud buckets can be prefetched.

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

Returns:

Job ID (as str) that can be used to check the status of the operation

copy

def copy(to_bck: "Bucket",
         prepend: str = "",
         continue_on_error: bool = False,
         dry_run: bool = False,
         force: bool = False)

Copies a list or range of objects in a bucket

Arguments:

  • to_bck Bucket - Destination bucket
  • prepend str, optional - Value to prepend to the name of copied objects
  • continue_on_error bool, optional - Whether to continue if there is an error copying a single object
  • dry_run bool, optional - Skip performing the copy and just log the intended actions
  • force bool, optional - Force this job to run over others in case it conflicts (see “limited coexistence” and xact/xreg/xreg.go)

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

Returns:

Job ID (as str) that can be used to check the status of the operation

transform

def transform(to_bck: "Bucket",
              etl_name: str,
              timeout: str = DEFAULT_ETL_TIMEOUT,
              prepend: str = "",
              continue_on_error: bool = False,
              dry_run: bool = False,
              force: bool = False)

Performs ETL operation on a list or range of objects in a bucket, placing the results in the destination bucket

Arguments:

  • to_bck Bucket - Destination bucket
  • etl_name str - Name of existing ETL to apply
  • timeout str - Timeout of the ETL job (e.g. 5m for 5 minutes)
  • prepend str, optional - Value to prepend to the name of resulting transformed objects
  • continue_on_error bool, optional - Whether to continue if there is an error transforming a single object
  • dry_run bool, optional - Skip performing the transform and just log the intended actions
  • force bool, optional - Force this job to run over others in case it conflicts (see “limited coexistence” and xact/xreg/xreg.go)

Raises:

  • aistore.sdk.errors.AISError - All other types of errors with AIStore
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.exceptions.HTTPError - Service unavailable
  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ReadTimeout - Timed out receiving response from AIStore

Returns:

Job ID (as str) that can be used to check the status of the operation

archive

def archive(archive_name: str,
            mime: str = "",
            to_bck: "Bucket" = None,
            include_source_name: bool = False,
            allow_append: bool = False,
            continue_on_err: bool = False)

Create or append to an archive

Arguments:

  • archive_name str - Name of archive to create or append to
  • mime str, optional - MIME type of the content
  • to_bck Bucket, optional - Destination bucket, defaults to current bucket
  • include_source_name bool, optional - Include the source bucket name in the archived objects’ names
  • allow_append bool, optional - Allow appending to an existing archive
  • continue_on_err bool, optional - Whether to continue if there is an error archiving a single object

Returns:

Job ID (as str) that can be used to check the status of the operation

list_names

def list_names() -> List[str]

List all the object names included in this group of objects

Returns:

List of object names

Class: ObjectNames

class ObjectNames(ObjectCollection)

A collection of object names, provided as a list of strings

Arguments:

  • names List[str] - A list of object names

Class: ObjectRange

class ObjectRange(ObjectCollection)

Class representing a range of object names

Arguments:

  • prefix str - Prefix contained in all names of objects
  • min_index int - Starting index in the name of objects
  • max_index int - Last index in the name of all objects
  • pad_width int, optional - Left-pad indices with zeros up to the width provided, e.g. pad_width = 3 will transform 1 to 001
  • step int, optional - Size of iterator steps between each item
  • suffix str, optional - Suffix at the end of all object names

Class: ObjectTemplate

class ObjectTemplate(ObjectCollection)

A collection of object names specified by a template in the bash brace expansion format

Arguments:

  • template str - A string template that defines the names of objects to include in the collection

Class: Job

class Job()

A class containing job-related functions.

Arguments:

  • client RequestClient - Client for interfacing with AIS cluster
  • job_id str, optional - ID of a specific job, empty for all jobs
  • job_kind str, optional - Specific kind of job, empty for all kinds

job_id

@property
def job_id()

Return job id

job_kind

@property
def job_kind()

Return job kind

status

def status() -> JobStatus

Return status of a job

Returns:

The job status including id, finish time, and error info.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore

wait

def wait(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT, verbose: bool = True)

Wait for a job to finish

Arguments:

  • timeout int, optional - The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
  • verbose bool, optional - Whether to log wait status to standard output

Returns:

None

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • errors.Timeout - Timeout while waiting for the job to finish

wait_for_idle

def wait_for_idle(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                  verbose: bool = True)

Wait for a job to reach an idle state

Arguments:

  • timeout int, optional - The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
  • verbose bool, optional - Whether to log wait status to standard output

Returns:

None

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
  • errors.Timeout - Timeout while waiting for the job to finish

start

def start(daemon_id: str = "",
          force: bool = False,
          buckets: List[Bucket] = None) -> str

Start a job and return its ID.

Arguments:

  • daemon_id str, optional - For running a job that must run on a specific target node (e.g. resilvering).
  • force bool, optional - Override existing restrictions for a bucket (e.g., run LRU eviction even if the bucket has LRU disabled).
  • buckets List[Bucket], optional - List of one or more buckets; applicable only for jobs that have bucket scope (for details on job types, see Table in xact/api.go).

Returns:

The running job ID.

Raises:

  • requests.RequestException - “There was an ambiguous exception that occurred while handling...”
  • requests.ConnectionError - Connection error
  • requests.ConnectionTimeout - Timed out connecting to AIStore
  • requests.ReadTimeout - Timed out waiting for a response from AIStore
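
For example, starting a job by kind and waiting for it to finish (the job kind shown is illustrative; see the table in xact/api.go for valid kinds):

job_id = client.job(job_kind="lru").start()
client.job(job_id).wait()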

Class: ObjectReader

class ObjectReader()

Represents the data returned by the API when getting an object, including access to the content stream and object attributes

attributes

@property
def attributes() -> ObjectAttributes

Object metadata attributes

Returns:

Object attributes parsed from the headers returned by AIS

read_all

def read_all() -> bytes

Read all byte data from the object content stream. This uses a bytes cast, which makes it slightly slower and requires all object content to fit in memory at once.

Returns:

Object content as bytes

raw

def raw() -> bytes

Returns:

Raw byte stream of object content

__iter__

def __iter__() -> Iterator[bytes]

Creates a generator to read the stream content in chunks

Returns:

An iterator with access to the next chunk of bytes
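
For example, streaming a large object to disk chunk by chunk instead of calling read_all():

with open("large-file.bin", "wb") as f:
    for chunk in bucket.object("large-file.bin").get(chunk_size=64 * 1024):
        f.write(chunk)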

Class: ObjectIterator

class ObjectIterator()

Represents an iterable that will fetch all objects from a bucket, querying as needed with the specified function

Arguments:

  • list_objects Callable - Function returning a BucketList from an AIS cluster

Class: Etl

class Etl()

A class containing ETL-related functions.

name

@property
def name() -> str

Name of the ETL

init_spec

def init_spec(template: str,
              communication_type: str = DEFAULT_ETL_COMM,
              timeout: str = DEFAULT_ETL_TIMEOUT) -> str

Initializes ETL based on Kubernetes pod spec template. Returns etl_name.

Arguments:

  • template str - Kubernetes pod spec template. Existing templates can be found at sdk.etl_templates. For more information, visit: https://github.com/NVIDIA/ais-etl/tree/master/transformers
  • communication_type str - Communication type of the ETL (options: hpull, hrev, hpush)
  • timeout str - Timeout of the ETL job (e.g. 5m for 5 minutes)

Returns:

Job ID string associated with this ETL

init_code

def init_code(transform: Callable,
              dependencies: List[str] = None,
              preimported_modules: List[str] = None,
              runtime: str = _get_default_runtime(),
              communication_type: str = DEFAULT_ETL_COMM,
              timeout: str = DEFAULT_ETL_TIMEOUT,
              chunk_size: int = None,
              transform_url: bool = False) -> str

Initializes ETL based on the provided source code. Returns etl_name.

Arguments:

  • transform Callable - Transform function of the ETL
  • dependencies list[str] - Python dependencies to install
  • preimported_modules list[str] - Modules to import before running the transform function. This can be necessary in cases where the modules used both attempt to import each other circularly
  • runtime str, optional - Runtime environment of the ETL; defaults to the V2 implementation of the current Python version if supported, else python3.8v2. Choose from: python3.8v2, python3.10v2, python3.11v2 (see ext/etl/runtime/all.go)
  • communication_type str - [optional, default=”hpush”] Communication type of the ETL (options: hpull, hrev, hpush, io)
  • timeout str - [optional, default=”5m”] Timeout of the ETL job (e.g. 5m for 5 minutes)
  • chunk_size int, optional - Chunk size in bytes if the transform function processes streaming data (by default, the whole object is read into memory)
  • transform_url optional, bool - If True, the runtime will provide the transform function with the URL to the object on the target rather than the raw bytes read from the object

Returns:

Job ID string associated with this ETL
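
A sketch of initializing an ETL from a transform function and applying it to a bucket (the ETL and bucket names are placeholders):

def to_upper(data: bytes) -> bytes:
    return data.upper()

etl = client.etl("upper-etl")
etl.init_code(transform=to_upper)

job_id = bucket.transform(etl_name="upper-etl", to_bck=client.bucket("out"))
client.job(job_id).wait()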

view

def view() -> ETLDetails

View ETL details

Returns:

  • ETLDetails - details of the ETL

start

def start()

Resumes a stopped ETL with the given ETL name.

Note: Deleted ETLs cannot be started.

stop

def stop()

Stops ETL. Stops (but does not delete) all the pods created by Kubernetes for this ETL and terminates any transforms.

delete

def delete()

Delete ETL. Deletes pods created by Kubernetes for this ETL and specifications for this ETL in Kubernetes.

Note: Running ETLs cannot be deleted.