CLI Reference for ETLs

This section documents ETL management operations with ais etl.

As with global rebalance, dSort, and download, all ETL management commands can also be executed via ais job and ais show—the commands that, by definition, support all AIS xactions, including AIS-ETL.

In the ais etl namespace, the commands include:

$ ais etl <TAB-TAB>

init     show      view-logs   start     stop      rm      object    bucket

For background on AIS-ETL, getting started, working examples, and tutorials, please refer to the AIS-ETL documentation.

Table of Contents

Getting Started

Initializing an ETL

ETL Management

ETL Lifecycle Operations

Data Transformation


Commands

Top-level ETL commands include init, stop, show, and more:

$ ais etl -h
NAME:
   ais etl - Manage and execute custom ETL (Extract, Transform, Load) jobs

USAGE:
   ais etl command [arguments...]  [command options]

COMMANDS:
   init  Initialize ETL using a runtime spec or full Kubernetes Pod spec YAML file (local or remote).
   Examples:
     - 'ais etl init -f my-etl.yaml'                      deploy ETL from a local YAML file;
     - 'ais etl init -f https://example.com/etl.yaml'     deploy ETL from a remote YAML file;
     - 'ais etl init -f multi-etl.yaml'                   deploy multiple ETLs from a single file (separated by '---');
     - 'ais etl init -f spec.yaml --name my-custom-etl'   override ETL name from command line;
     - 'ais etl init -f spec.yaml --comm-type hpull'      override communication type;
     - 'ais etl init -f spec.yaml --object-timeout 30s'   set custom object transformation timeout;
     - 'ais etl init --spec <file|URL>'                   deploy ETL jobs from a local spec file, remote URL, or multi-ETL YAML.

Additional Info:
   - You may define multiple ETLs in a single spec file using YAML document separators ('---').
   - CLI flags like '--name' or '--comm-type' can override values in the spec, but not when multiple ETLs are defined.

   show  Show ETL(s).
   Examples:
      - 'ais etl show'                             list all ETL jobs with their status and details;
      - 'ais etl show my-etl'                      show detailed specification for a specific ETL job;
      - 'ais etl show my-etl another-etl'          show detailed specifications for multiple ETL jobs;
      - 'ais etl show errors my-etl'               show transformation errors for inline object transformations;
      - 'ais etl show errors my-etl job-123'       show errors for a specific offline (bucket-to-bucket) transform job.
   view-logs  View ETL logs.
   Examples:
      - 'ais etl view-logs my-etl'                      show logs from all target nodes for the specified ETL;
      - 'ais etl view-logs my-etl target-001'           show logs from a specific target node;
      - 'ais etl view-logs data-converter target-002'   view logs from target-002 for data-converter ETL.
   start  Start ETL.
   Examples:
     - 'ais etl start my-etl'                            start the specified ETL (transitions from stopped to running state);
     - 'ais etl start my-etl another-etl'                start multiple ETL jobs by name;
     - 'ais etl start -f spec.yaml'                      start ETL jobs defined in a local YAML file;
     - 'ais etl start -f https://example.com/etl.yaml'   start ETL jobs defined in a remote YAML file;
     - 'ais etl start -f multi-etl.yaml'                 start all ETL jobs defined in a multi-ETL file;
     - 'ais etl start --spec <file|URL>'                 start ETL jobs from a local spec file, remote URL, or multi-ETL YAML.

   stop  Stop ETL. Also aborts related offline jobs and can be used to terminate ETLs stuck in 'initializing' state.
   Examples:
     - 'ais etl stop my-etl'                              stop the specified ETL (transitions from running to stopped state);
     - 'ais etl stop my-etl another-etl'                  stop multiple ETL jobs by name;
     - 'ais etl stop --all'                               stop all running ETL jobs;
     - 'ais etl stop -f spec.yaml'                        stop ETL jobs defined in a local YAML file;
     - 'ais etl stop -f https://example.com/etl.yaml'     stop ETL jobs defined in a remote YAML file;
     - 'ais etl stop stuck-etl'                           terminate ETL that is stuck in 'initializing' state;
     - 'ais etl stop --spec <file|URL>'                   stop ETL jobs from a local spec file, remote URL, or multi-ETL YAML.

   rm  Remove ETL.
   Examples:
      - 'ais etl rm my-etl'                              remove (delete) the specified ETL;
      - 'ais etl rm my-etl another-etl'                  remove multiple ETL jobs by name;
      - 'ais etl rm --all'                               remove all ETL jobs;
      - 'ais etl rm -f spec.yaml'                        remove ETL jobs defined in a local YAML file;
      - 'ais etl rm -f https://example.com/etl.yaml'     remove ETL jobs defined in a remote YAML file;
      - 'ais etl rm running-etl'                         remove ETL that is currently running (it will be stopped first);
      - 'ais etl rm --spec <file|URL>'                   remove ETL jobs from a local spec file, remote URL, or multi-ETL YAML.
        NOTE: If an ETL is in 'running' state, it will be stopped automatically before removal.
   object  Transform an object.
   Examples:
      - 'ais etl object my-etl ais://src/image.jpg /tmp/output.jpg'                   transform object and save to file;
      - 'ais etl object my-etl ais://src/data.json -'                                 transform and output to stdout;
      - 'ais etl object my-etl ais://src/doc.pdf /dev/null'                           transform and discard output;
      - 'ais etl object my-etl cp ais://src/image.jpg ais://dst/'                     transform and copy to another bucket;
      - 'ais etl object my-etl ais://src/data.xml output.json --args "format=json"'   transform with custom arguments.
   bucket  Transform entire bucket or selected objects (to select, use '--list', '--template', or '--prefix').
   Examples:
     - 'ais etl bucket my-etl ais://src ais://dst'                                       transform all objects from source to destination bucket;
     - 'ais etl bucket my-etl ais://src ais://dst --prefix images/'                      transform objects with prefix 'images/';
     - 'ais etl bucket my-etl ais://src ais://dst --template "shard-{0001..0999}.tar"'   transform objects matching the template;
     - 'ais etl bucket my-etl s3://remote-src ais://dst --all'                           transform all objects including non-cached ones;
     - 'ais etl bucket my-etl ais://src ais://dst --dry-run'                             preview transformation without executing;
     - 'ais etl bucket my-etl ais://src ais://dst --num-workers 8'                       use 8 concurrent workers for transformation;
     - 'ais etl bucket my-etl ais://src ais://dst --prepend processed/'                  add prefix to transformed object names.

OPTIONS:
   --help, -h  Show help

Additionally, use --help with any specific command to display its usage.

Initializing an ETL

AIStore provides two ways to initialize an ETL using the CLI:


1. Using a Runtime Spec (Recommended)

This method uses a YAML file that defines how your ETL should be initialized and run.

Key Fields in the Spec

Field               Description                                                                          Default
name                Unique name for the ETL (see naming rules)                                           Required
runtime.image       Docker image for the ETL container                                                   Required
runtime.command     (Optional) Override the container's default ENTRYPOINT with a custom command + args  None
communication       (Optional) Communication method between AIS and the ETL container                    hpush://
argument            (Optional) Argument passing method: "" (default) or "fqn" (mounts host filesystem)   ""
init_timeout        (Optional) Max time to wait for the ETL to become ready                              5m
obj_timeout         (Optional) Max time to process a single object                                       45s
support_direct_put  (Optional) Enable direct put optimization for offline transforms                     false

Sample ETL Spec

name: hello-world-etl
runtime:
  image: aistorage/transformer_hello_world:latest
  # Optional: Override the container entrypoint
  # command: ["uvicorn", "fastapi_server:fastapi_app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

communication: hpush://
argument: fqn
init_timeout: 5m
obj_timeout: 45s
support_direct_put: true

CLI Usage

# From a local file
$ ais etl init -f spec.yaml

# From a remote URL
$ ais etl init -f <URL>

# Override values from the spec
$ ais etl init -f <URL> \
  --name=ETL_NAME \
  --comm-type=COMMUNICATION_TYPE \
  --init-timeout=TIMEOUT \
  --obj-timeout=TIMEOUT

Note: CLI parameters take precedence over the spec file.


2. Using a Full Kubernetes Pod Spec (Advanced)

Use this option if you need full control over the ETL container’s deployment—such as advanced init containers, health checks, or if you’re not using the AIS ETL framework.

Example Pod Spec

# pod_spec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etl-echo
  annotations:
    communication_type: "hpush://"
    wait_timeout: "5m"
spec:
  containers:
    - name: server
      image: aistorage/transformer_md5:latest
      ports: [{ name: default, containerPort: 8000 }]
      command: ["uvicorn", "fastapi_server:fastapi_app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4", "--log-level", "info", "--ws-max-size", "17179869184", "--ws-ping-interval", "0", "--ws-ping-timeout", "86400"]
      readinessProbe:
        httpGet: { path: /health, port: default }

CLI Usage

# Initialize ETL from a Pod spec
$ ais etl init -f pod_spec.yaml --name transformer-md5

Additional Notes

  • You can define multiple ETLs in a single YAML file by separating them with the standard YAML document separator ---.

    Example:

    name: hello-world-etl
    runtime:
      image: aistorage/transformer_hello_world:latest
    ---
    name: md5-etl
    runtime:
      image: aistorage/transformer_md5:latest
    
  • You may override fields in the spec using CLI flags such as --name, --comm-type, etc.

    However, if your YAML file contains multiple ETL definitions, override flags cannot be used and will result in an error.

    In such cases, you should either:

    • Remove the override flags and apply the full multi-ETL spec as-is, or
    • Split the YAML file into individual files and initialize each ETL separately:
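
The splitting step can be sketched in plain shell. This is a minimal sketch assuming GNU coreutils (csplit, sed); the multi-etl.yaml created here is a stand-in so the example is self-contained, and the final ais etl init command is only echoed because it requires a running cluster:

```shell
# Create a stand-in multi-ETL spec (substitute your own file).
cat > multi-etl.yaml <<'EOF'
name: hello-world-etl
runtime:
  image: aistorage/transformer_hello_world:latest
---
name: md5-etl
runtime:
  image: aistorage/transformer_md5:latest
EOF

# Cut at every '---' document separator, producing etl-00.yaml, etl-01.yaml, ...
csplit -z -f etl- -b '%02d.yaml' multi-etl.yaml '/^---$/' '{*}'

for f in etl-*.yaml; do
  sed -i '/^---$/d' "$f"    # drop the leftover separator line, if any
  echo "ais etl init -f $f" # run against a live cluster, adding per-ETL override flags as needed
done
```

Once each ETL lives in its own file, per-ETL override flags such as --name or --comm-type become usable again.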

Listing ETLs

To view all currently initialized ETLs in the AIStore cluster, use either of the following commands:

ais etl show

or the equivalent:

ais job show etl

This will display all available ETLs along with their current status (initializing, running, stopped, etc.).


View ETL Specification

To view detailed information about one or more ETL jobs and their configuration, use:

ais etl show <ETL_NAME> [<ETL_NAME> ...]

This command displays detailed attributes of each ETL, including:

  • ETL Name
  • Communication Type
  • Runtime Configuration
    • Container image
    • Command
    • Environment variables
  • ETL Source (Full Pod specification, if applicable)

Note: You can also use the alias ais show etl <ETL_NAME> [<ETL_NAME> ...] for the same functionality.


View ETL Errors

Use this command to view errors encountered during ETL processing—either during inline transformations or offline (bucket-to-bucket) jobs.

Inline ETL Errors

To list errors from inline object transformations:

ais etl show errors <ETL_NAME>

Example Output:

OBJECT                 ECODE   ERROR
ais://non-exist-obj    404     object not found

Offline ETL (Bucket-to-Bucket) Errors

To list errors from a specific offline ETL job, include the job ID:

ais etl show errors <ETL_NAME> <OFFLINE-JOB-ID>

Example Output:

OBJECT                   ECODE   ERROR
ais://test-src/7         500     ETL error: <your-custom-error>
ais://test-src/8         500     ETL error: <your-custom-error>
ais://test-src/6         500     ETL error: <your-custom-error>

Here, <your-custom-error> refers to the error raised from within your custom transform function (e.g., in Python).


View ETL Logs

Use the following command to view logs for a specific ETL container:

ais etl view-logs <ETL_NAME> [TARGET_ID]
  • <ETL_NAME>: Name of the ETL.
  • [TARGET_ID] (optional): Retrieve logs from a specific target node. If omitted, logs from all targets will be aggregated.

Stop ETL

Stops a running ETL and tears down its underlying Kubernetes resources.

ais etl stop <ETL_NAME> [<ETL_NAME> ...]
  • Frees up system resources without deleting the ETL definition.
  • ETL can be restarted later without reinitialization.

You can also stop ETLs from a specification file:

ais etl stop -f <spec-file.yaml>   # Local file with one or more ETL specs
ais etl stop -f <URL>              # Remote spec file over HTTP(S)
  • Supports multi-ETL YAML files separated by ---.

More info: ETL Pod Lifecycle

Start ETL

Restarts a previously stopped ETL by recreating its associated containers on each target.

ais etl start <ETL_NAME> [<ETL_NAME> ...]
  • Useful when resuming work after a manual or error-triggered stop.
  • Retains all original configuration and transformation logic.

You can also start ETLs from a specification file:

ais etl start -f <spec-file.yaml>   # Local file with one or more ETL specs
ais etl start -f <URL>              # Remote spec file over HTTP(S)
  • Supports multi-ETL YAML files separated by ---.

More info: ETL Pod Lifecycle


Remove (Delete) ETL

Remove (delete) ETL jobs.

ais etl rm <ETL_NAME> [<ETL_NAME> ...]
  • Permanently removes the ETL definition and tears down its Kubernetes resources.
  • If the ETL is currently running, it is stopped automatically before removal.

You can also remove ETLs from a specification file:

ais etl rm -f <spec-file.yaml>   # Local file with one or more ETL specs
ais etl rm -f <URL>              # Remote spec file over HTTP(S)
  • Supports multi-ETL YAML files separated by ---.

More info: ETL Pod Lifecycle


Inline Transformation

Use inline transformation to process an object on-the-fly with a registered ETL. The transformed output is streamed directly to the client.

ais etl object <ETL_NAME> <BUCKET/OBJECT_NAME> <OUTPUT>

Examples

Transform an object and print to STDOUT

ais etl object transformer-md5 ais://shards/shard-0.tar -

Output:

393c6706efb128fbc442d3f7d084a426

Transform an object and save the output to a file

ais etl object transformer-md5 ais://shards/shard-0.tar output.txt
cat output.txt

Output:

393c6706efb128fbc442d3f7d084a426
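
The digest returned by the md5 transformer is simply the hex MD5 of the object's bytes, so you can sanity-check results against a locally computed digest. A minimal illustration with md5sum, shown on a literal string since the shard itself lives in the cluster:

```shell
# MD5 hex digest of a byte stream, as the md5 transformer computes it per object.
# The string 'hello' is used purely for illustration.
printf '%s' hello | md5sum | cut -d' ' -f1
# -> 5d41402abc4b2a76b9719d911017c592
```

To check an actual object, fetch it (e.g., with ais get) and pipe its bytes through md5sum the same way.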

Transform an object using ETL arguments

Use runtime arguments for customizable transformations. The argument is passed as a query parameter (etl_args) and must be handled by the ETL web server.

ais etl object transformer-hash-with-args ais://shards/shard-0.tar - --args=123

Output:

4af87d32ee1fb306

Learn more: Inline ETL Transformation


Single-Object Transformation

For operations on selected objects, use ais object and its subcommands.

In particular, notice two highlighted subcommands:

$ ais object <TAB-TAB>
get       put          *cp*        *etl*       set-custom   prefetch    show     cat
ls        promote      archive     concat      rm           evict       mv

To transform or copy a single object, you can use ais object etl and ais object cp interchangeably (or their respective aliases), as shown below.

Examples

Copy and transform to a destination object (same or different bucket)

ais etl object transformer-md5 cp ais://src/image.jpg ais://dst/image-md5.txt

This command applies the ETL to the source object and stores the transformed result at the destination location.

  • <ETL_NAME> is the name of the registered ETL
  • cp indicates copy-and-transform
  • <SOURCE_OBJECT> is the full AIS URL of the object to transform
  • <DESTINATION> is either a specific object or a destination bucket (preserving source name)

For details and performance, see technical blog: Single-Object Transformation.


Offline Transformation

Use offline transformation to process entire buckets or a selected set of objects. The result is saved in a new destination bucket.

ais etl bucket <ETL_NAME> <SRC_BUCKET> <DST_BUCKET>

Here’s the command’s help as of v3.30:

$ ais etl bucket --help
NAME:
   ais etl bucket - Transform entire bucket or selected objects (to select, use '--list', '--template', or '--prefix').
   Examples:
     - 'ais etl bucket my-etl ais://src ais://dst'                                       transform all objects from source to destination bucket;
     - 'ais etl bucket my-etl ais://src ais://dst --prefix images/'                      transform objects with prefix 'images/';
     - 'ais etl bucket my-etl ais://src ais://dst --template "shard-{0001..0999}.tar"'   transform objects matching the template;
     - 'ais etl bucket my-etl s3://remote-src ais://dst --all'                           transform all objects including non-cached ones;
     - 'ais etl bucket my-etl ais://src ais://dst --dry-run'                             preview transformation without executing;
     - 'ais etl bucket my-etl ais://src ais://dst --num-workers 8'                       use 8 concurrent workers for transformation;
     - 'ais etl bucket my-etl ais://src ais://dst --prepend processed/'                  add prefix to transformed object names.

USAGE:
   ais etl bucket ETL_NAME SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET [command options]

OPTIONS:
   all          Transform all objects from a remote bucket including those that are not present (not cached) in cluster
   cont-on-err  Keep running archiving xaction (job) in presence of errors in any given multi-object transaction
   dry-run      Show total size of new objects without really creating them
   ext          Mapping from old to new extensions of transformed objects' names
   force,f      Force execution of the command (caution: advanced usage only)
   list         Comma-separated list of object or file names, e.g.:
                --list 'o1,o2,o3'
                --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
                or, when listing files and/or directories:
                --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   num-workers  Number of concurrent workers; if omitted or zero defaults to a number of target mountpaths (disks);
                use (-1) to indicate single-threaded serial execution (i.e., no workers);
                any positive value will be adjusted _not_ to exceed the number of target CPUs
   prefix       Select virtual directories or objects with names starting with the specified prefix, e.g.:
                '--prefix a/b/c'   - matches names 'a/b/c/d', 'a/b/cdef', and similar;
                '--prefix a/b/c/'  - only matches objects from the virtual directory a/b/c/
   prepend      Prefix to prepend to every object name during operation (copy or transform), e.g.:
                --prepend=abc   - prefix all object names with "abc"
                --prepend=abc/  - use "abc" as a virtual directory (note trailing filepath separator)
                                - during 'copy', this flag applies to copied objects
                                - during 'transform', this flag applies to transformed objects
   template     Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
                (with optional steps and gaps), e.g.:
                --template "" # (an empty or '*' template matches everything)
                --template 'dir/subdir/'
                --template 'shard-{1000..9999}.tar'
                --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
                and similarly, when specifying files and directories:
                --template '/home/dir/subdir/'
                --template "/abc/prefix-{0010..9999..2}-suffix"
   timeout      Maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
                valid time units: ns, us (or µs), ms, s (default), m, h
   wait         Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   help, h      Show help

Available Flags

Flag                 Description
--list               Comma-separated list of object names (obj1,obj2)
--template           Template pattern for object names (obj-{000..100}.tar)
--ext                Extension transformation map ({jpg:txt})
--prefix             Prefix to apply to output object names
--wait               Block until transformation is complete
--requests-timeout   Per-object timeout for transformation
--dry-run            Simulate transformation without modifying cluster state
--num-workers        Number of concurrent workers to use during transformation

Examples

Transform an entire bucket

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket
ais wait xaction <XACTION_ID>

Transform a subset of objects using a template

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --template "shard-{10..12}.tar"
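
The brace-range syntax in AIS templates mirrors bash brace expansion (including zero-padded ranges like {0001..0999} in bash 4+), so you can preview locally which names a template covers. Note this is only an approximation: AIS templates also support steps and gaps that plain brace expansion does not.

```shell
# Enumerate the object names matched by the template "shard-{10..12}.tar"
printf '%s\n' shard-{10..12}.tar
# -> shard-10.tar
#    shard-11.tar
#    shard-12.tar
```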

Apply extension mapping and add a prefix

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --ext="{in1:out1,in2:out2}" --prefix="etl-" --wait

Perform a dry-run to preview changes

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --dry-run --wait

Output:

[DRY RUN] No modifications on the cluster
2 objects (20MiB) would have been put into bucket ais://dst_bucket

Learn more: Offline ETL Transformation