CLI Reference for ETLs

This section documents ETL management operations with ais etl. But first, note:

As with global rebalance, dSort, and download, all ETL management commands can also be executed via ais job and ais show. These commands, by definition, support all AIS xactions, including AIS-ETL.

For background on AIS-ETL, getting started, working examples, and tutorials, please refer to:

Table of Contents

Getting Started

Initializing an ETL

ETL Management

ETL Lifecycle Operations

Data Transformation


Commands

Top-level ETL commands include init, stop, show, and more:

$ ais etl --help
NAME:
   ais etl - Execute custom transformations on objects

USAGE:
   ais etl command [arguments...]  [command options]

COMMANDS:
   init  Initialize ETL using a runtime spec or full Kubernetes Pod spec YAML file (local or remote).
         - 'ais etl init -f <spec-file.yaml>'   deploy ETL from a local YAML file.
         - 'ais etl init -f <URL>'              deploy ETL from a remote YAML file.

   show       Show ETL(s).
              - 'ais etl show'                          list all ETL jobs.
              - 'ais etl show <ETL_NAME> [<ETL_NAME> ...]'   show detailed specification for specified ETL jobs.
              - 'ais etl show errors <ETL_NAME>'    show transformation errors for specified ETL.
   view-logs  View ETL logs.
              - 'ais etl view-logs <ETL_NAME>'                 show logs from all target nodes for specified ETL.
              - 'ais etl view-logs <ETL_NAME> <TARGET_ID>'   show logs from specific target node.
   start      Start ETL.
              - 'ais etl start <ETL_NAME>'   start the specified ETL (transitions from stopped to running state).
   stop       Stop ETL.
              - 'ais etl stop <ETL_NAME>'                 stop the specified ETL (transitions from running to stopped state).
              - 'ais etl stop --all'                        stop all running ETL jobs.
              - 'ais etl stop <ETL_NAME> <ETL_NAME2>'   stop multiple ETL jobs by name.
   rm         Remove ETL.
              - 'ais etl rm <ETL_NAME>'   remove (delete) the specified ETL.
                NOTE: If the ETL is in 'running' state, it will be automatically stopped before removal.
   object     Transform an object.
              - 'ais etl object <ETL_NAME> <BUCKET/OBJECT_NAME> <OUTPUT>'   transform object and save to file.
              - 'ais etl object <ETL_NAME> <BUCKET/OBJECT_NAME> -'            transform and output to stdout.
   bucket     Transform entire bucket or selected objects (to select, use '--list', '--template', or '--prefix').
              - 'ais etl bucket <ETL_NAME> <SRC_BUCKET> <DST_BUCKET>'                       transform all objects from source to destination bucket.
              - 'ais etl bucket <ETL_NAME> <SRC_BUCKET> <DST_BUCKET> --prefix <PREFIX>'   transform objects with specified prefix.

OPTIONS:
   --help, -h  Show help

Additionally, append --help to any command to display its usage (for example, ais etl init --help).

Initializing an ETL

AIStore provides two ways to initialize an ETL using the CLI:


1. Using a Runtime Spec

This method uses a YAML file that defines how your ETL should be initialized and run.

Key Fields in the Spec

Field                Default     Description
name                 Required    Unique name for the ETL (see naming rules)
runtime.image        Required    Docker image for the ETL container
runtime.command      None        (Optional) Override the container’s default ENTRYPOINT with a custom command and arguments
communication        hpush://    (Optional) Communication method between AIS and the ETL container
argument             ""          (Optional) Argument passing method: "" (default) or "fqn" (mounts host filesystem)
init_timeout         5m          (Optional) Max time to wait for the ETL to become ready
obj_timeout          45s         (Optional) Max time to process a single object
support_direct_put   false       (Optional) Enable direct put optimization for offline transforms

Sample ETL Spec

name: hello-world-etl
runtime:
  image: aistorage/transformer_hello_world:latest
  # Optional: Override the container entrypoint
  # command: ["uvicorn", "fastapi_server:fastapi_app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

communication: hpush://
argument: fqn
init_timeout: 5m
obj_timeout: 45s
support_direct_put: true

CLI Usage

# From a local file
$ ais etl init -f spec.yaml

# From a remote URL
$ ais etl init -f <URL>

# Override values from the spec
$ ais etl init -f <URL> \
  --name=ETL_NAME \
  --comm-type=COMMUNICATION_TYPE \
  --arg-type=ARGUMENT_TYPE \
  --init-timeout=TIMEOUT \
  --obj-timeout=TIMEOUT

Note: CLI parameters take precedence over the spec file.
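The following minimal Python sketch makes the precedence rule concrete. It resolves the effective settings in order: table defaults first, then spec-file values, then any CLI flags that were actually passed. The function name resolve_spec and the flag-to-field mapping are hypothetical illustrations, not AIStore's internal implementation.

```python
from typing import Any, Dict, Optional

# Defaults taken from the "Key Fields in the Spec" table.
DEFAULTS: Dict[str, Any] = {
    "communication": "hpush://",
    "argument": "",
    "init_timeout": "5m",
    "obj_timeout": "45s",
    "support_direct_put": False,
}

# Hypothetical mapping: `ais etl init` flags (dashes -> underscores) to spec fields.
FLAG_TO_FIELD = {
    "name": "name",
    "comm_type": "communication",
    "arg_type": "argument",
    "init_timeout": "init_timeout",
    "obj_timeout": "obj_timeout",
}


def resolve_spec(spec: Dict[str, Any], cli_flags: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Merge settings with precedence: defaults < spec file < CLI flags."""
    effective = dict(DEFAULTS)
    effective.update(spec)
    for flag, value in cli_flags.items():
        if value is not None:  # only flags the user actually passed
            effective[FLAG_TO_FIELD[flag]] = value

    # Required fields per the table: name and runtime.image.
    if not effective.get("name"):
        raise ValueError("missing required field: name")
    if not effective.get("runtime", {}).get("image"):
        raise ValueError("missing required field: runtime.image")
    return effective


spec = {
    "name": "hello-world-etl",
    "runtime": {"image": "aistorage/transformer_hello_world:latest"},
    "obj_timeout": "30s",
}
print(resolve_spec(spec, {"name": "my-etl", "obj_timeout": None}))
```

Here the CLI name wins, the spec's obj_timeout survives because the flag was not passed, and fields missing from both fall back to the table defaults.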


2. Using a Full Kubernetes Pod Spec (Advanced)

Use this option if you need full control over the ETL container’s deployment (for example, advanced init containers or health checks), or if you’re not using the AIS ETL framework.

Example Pod Spec

# pod_spec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etl-echo
  annotations:
    communication_type: "hpush://"
    wait_timeout: "5m"
spec:
  containers:
    - name: server
      image: aistorage/transformer_md5:latest
      ports: [{ name: default, containerPort: 8000 }]
      command: ["uvicorn", "fastapi_server:fastapi_app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4", "--log-level", "info", "--ws-max-size", "17179869184", "--ws-ping-interval", "0", "--ws-ping-timeout", "86400"]
      readinessProbe:
        httpGet: { path: /health, port: default }

CLI Usage

# Initialize ETL from a Pod spec
$ ais etl init -f pod_spec.yaml --name transformer-md5
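A container deployed through a full Pod spec must serve the HTTP endpoints AIS calls on its own. The standard-library Python sketch below shows the general shape for the MD5 example above. It answers the readiness probe on /health and returns the MD5 hex digest of the request body on PUT. It assumes that hpush:// delivers the object bytes as the body of a PUT request and reads the transformed bytes from the response. That request contract, and the names MD5Handler, run, and start_in_background, are illustrative assumptions rather than the documented AIS protocol.

```python
import hashlib
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MD5Handler(BaseHTTPRequestHandler):
    """Hypothetical hpush-style transformer: GET /health for the readiness
    probe; PUT <object bytes> -> MD5 hex digest of those bytes."""

    def do_GET(self):
        if self.path == "/health":
            self._reply(200, b"Running")
        else:
            self._reply(404, b"not found")

    def do_PUT(self):
        # Assumption: the object content arrives as the request body.
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        self._reply(200, hashlib.md5(data).hexdigest().encode())

    def _reply(self, code: int, body: bytes) -> None:
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def run(port: int = 8000) -> None:
    """Blocking container entrypoint; 8000 matches containerPort in the Pod spec."""
    ThreadingHTTPServer(("0.0.0.0", port), MD5Handler).serve_forever()


def start_in_background(port: int = 8000) -> ThreadingHTTPServer:
    """Start the server on a daemon thread (handy for local testing)."""
    server = ThreadingHTTPServer(("0.0.0.0", port), MD5Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To use such a server, the Pod spec’s image and command would have to launch run() instead of the uvicorn command shown above. The readinessProbe on /health stays as written.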

Additional Notes

  • You can define multiple ETLs in a single YAML file by separating them with the standard YAML document separator ---.

    Example:

    name: hello-world-etl
    runtime:
      image: aistorage/transformer_hello_world:latest
    ---
    name: md5-etl
    runtime:
      image: aistorage/transformer_md5:latest
    
  • You may override fields in the spec using CLI flags such as --name, --comm-type, --arg-type, etc.

    However, if your YAML file contains multiple ETL definitions, override flags cannot be used and will result in an error.

    In such cases, you should either:

    • Remove the override flags and apply the full multi-ETL spec as-is, or
    • Split the YAML file into one file per ETL and initialize each separately with ais etl init -f <spec-file.yaml>, adding override flags as needed.
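The split-and-initialize workflow can be sketched in Python with only the standard library. This is an illustrative helper, not part of the AIS CLI. The function names and the etl-<N>.yaml file names are hypothetical, and the splitter is deliberately naive: it only recognizes a bare --- line at column 0 and ignores other YAML document markers.

```python
from pathlib import Path
from typing import Dict, List, Optional


def split_etl_specs(text: str) -> List[str]:
    """Split a multi-ETL YAML string on '---' document-separator lines."""
    docs, current = [], []
    for line in text.splitlines():
        if line.rstrip() == "---":
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    # Drop empty documents (e.g., a leading '---') and normalize newlines.
    return [d.strip("\n") + "\n" for d in docs if d.strip()]


def check_overrides(docs: List[str], overrides: Dict[str, Optional[str]]) -> None:
    """Mirror the documented rule: override flags are rejected for multi-ETL files."""
    used = sorted(k for k, v in overrides.items() if v is not None)
    if len(docs) > 1 and used:
        raise ValueError(f"cannot use {used} with {len(docs)} ETL definitions")


def write_split_files(docs: List[str], out_dir: str) -> List[Path]:
    """Write one spec per file (hypothetical names) for separate `ais etl init -f` calls."""
    paths = []
    for i, doc in enumerate(docs):
        path = Path(out_dir) / f"etl-{i}.yaml"
        path.write_text(doc)
        paths.append(path)
    return paths
```

Each resulting file holds a single ETL definition, so it can be passed to ais etl init -f together with flags such as --name or --comm-type.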

Listing ETLs

To view all currently initialized ETLs in the AIStore cluster, use either of the following commands:

ais etl show

or the equivalent:

ais job show etl

This will display all available ETLs along with their current status (initializing, running, stopped, etc.).


View ETL Specification

To view detailed information about one or more ETL jobs and their configuration, use:

ais etl show <ETL_NAME> [<ETL_NAME> ...]

This command displays detailed attributes of each ETL, including:

  • ETL Name
  • Communication Type
  • Argument Type (“” or “fqn”, i.e., fully qualified path)
  • Runtime Configuration
    • Container image
    • Command
    • Environment variables
  • ETL Source (Full Pod specification, if applicable)

Note: You can also use the alias ais show etl <ETL_NAME> [<ETL_NAME> ...] for the same functionality.


View ETL Errors

Use this command to view errors encountered during ETL processing, whether in inline transformations or in offline (bucket-to-bucket) jobs.

Inline ETL Errors

To list errors from inline object transformations:

ais etl show errors <ETL_NAME>

Example Output:

OBJECT                 ECODE   ERROR
ais://non-exist-obj    404     object not found

Offline ETL (Bucket-to-Bucket) Errors

To list errors from a specific offline ETL job, include the job ID:

ais etl show errors <ETL_NAME> <OFFLINE-JOB-ID>

Example Output:

OBJECT                   ECODE   ERROR
ais://test-src/7         500     ETL error: <your-custom-error>
ais://test-src/8         500     ETL error: <your-custom-error>
ais://test-src/6         500     ETL error: <your-custom-error>

Here, <your-custom-error> refers to the error raised from within your custom transform function (e.g., in Python).
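As a hedged illustration of where such messages originate, the Python sketch below wraps a hypothetical transform function so that any exception becomes an HTTP 500 response whose body is the error text. The example transform is made up. That AIS records this response under the ECODE and ERROR columns is an assumption inferred from the sample output above, not a documented contract.

```python
from typing import Tuple


def transform(data: bytes) -> bytes:
    """Hypothetical user transform: upper-cases content, rejects empty objects."""
    if not data:
        raise ValueError("empty object")
    return data.upper()


def handle(data: bytes) -> Tuple[int, bytes]:
    """Hypothetical request wrapper: map the transform outcome to (HTTP status, body)."""
    try:
        return 200, transform(data)
    except Exception as err:  # any failure raised inside the user transform
        # Assumption: a 500 response carrying the message is what AIS records
        # and later lists under ECODE / ERROR.
        return 500, str(err).encode()
```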


View ETL Logs

Use the following command to view logs for a specific ETL container:

ais etl view-logs <ETL_NAME> [TARGET_ID]
  • <ETL_NAME>: Name of the ETL.
  • [TARGET_ID] (optional): Retrieve logs from a specific target node. If omitted, logs from all targets will be aggregated.

Stop ETL

Stops a running ETL and tears down its underlying Kubernetes resources.

ais etl stop <ETL_NAME>
  • Frees up system resources without deleting the ETL definition.
  • ETL can be restarted later without reinitialization.

More info: ETL Pod Lifecycle

Start ETL

Restarts a previously stopped ETL by recreating its associated containers on each target.

ais etl start <ETL_NAME>
  • Useful when resuming work after a manual or error-triggered stop.
  • Retains all original configuration and transformation logic.

More info: ETL Pod Lifecycle
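The stop and start semantics, together with the automatic stop that ais etl rm performs on a running ETL, form a small state machine. The Python sketch below models only those documented transitions. The class and method names are hypothetical. Rejecting other transitions (for example, starting an ETL that is already running) is an assumption made for illustration, and the "initializing" state is not modeled.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


class ETLRegistry:
    """Toy model of the documented ETL lifecycle transitions (not AIS code)."""

    def __init__(self) -> None:
        self.etls: Dict[str, State] = {}

    def init(self, name: str) -> None:
        # Assumption: initialization succeeds and the ETL ends up running.
        self.etls[name] = State.RUNNING

    def stop(self, name: str) -> None:
        self._expect(name, State.RUNNING)
        self.etls[name] = State.STOPPED  # definition kept, resources freed

    def start(self, name: str) -> None:
        self._expect(name, State.STOPPED)
        self.etls[name] = State.RUNNING  # same configuration, no re-init

    def rm(self, name: str) -> None:
        if self.etls[name] is State.RUNNING:
            self.stop(name)  # 'ais etl rm' stops a running ETL first
        del self.etls[name]

    def _expect(self, name: str, state: State) -> None:
        # Assumption: other transitions are rejected; real AIS behavior may differ.
        if self.etls.get(name) is not state:
            raise ValueError(f"{name}: expected {state.value}, got {self.etls.get(name)}")
```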


Inline Transformation

Use inline transformation to process an object on-the-fly with a registered ETL. The transformed output is streamed directly to the client.

ais etl object <ETL_NAME> <BUCKET/OBJECT_NAME> <OUTPUT>

Examples

Transform an object and print to STDOUT

ais etl object transformer-md5 ais://shards/shard-0.tar -

Output:

393c6706efb128fbc442d3f7d084a426

Transform an object and save the output to a file

ais etl object transformer-md5 ais://shards/shard-0.tar output.txt
cat output.txt

Output:

393c6706efb128fbc442d3f7d084a426
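This check assumes that transformer-md5 returns the MD5 hex digest of the object's full content, as its name and 32-character output suggest. Under that assumption, you can cross-check the saved result against a local copy of the object (for example, one fetched with ais get). The helper names below are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a local file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def etl_output_matches(local_copy: str, etl_output: str) -> bool:
    """Compare the digest saved by 'ais etl object ... output.txt' with a local MD5."""
    with open(etl_output, "r", encoding="ascii") as f:
        return f.read().strip() == md5_of_file(local_copy)
```

For the example above, etl_output_matches("shard-0.tar", "output.txt") would compare output.txt against a locally computed digest.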

Transform an object using ETL arguments

Use runtime arguments for customizable transformations. The argument is passed as a query parameter (etl_args) and must be handled by the ETL web server.

ais etl object transformer-hash-with-args ais://shards/shard-0.tar - --args=123

Output:

4af87d32ee1fb306
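On the server side, the ETL has to read etl_args from the request's query string. The Python sketch below shows that parsing step plus a stand-in seeded hash. BLAKE2b with an 8-byte digest is an arbitrary choice that yields 16 hex characters like the output above. It is not necessarily the algorithm transformer-hash-with-args uses, so its digests will differ. The function names and the default seed are hypothetical.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def get_etl_args(request_path: str) -> Optional[str]:
    """Return the 'etl_args' query parameter from a path such as '/?etl_args=123'."""
    values = parse_qs(urlsplit(request_path).query).get("etl_args")
    return values[0] if values else None


def transform(data: bytes, etl_args: Optional[str]) -> bytes:
    """Hypothetical seeded hash: 8-byte BLAKE2b over (seed, separator, data)."""
    h = hashlib.blake2b(digest_size=8)
    h.update((etl_args or "0").encode())  # default seed "0" is an assumption
    h.update(b"\x00")                      # separate the seed from the content
    h.update(data)
    return h.hexdigest().encode()
```

In a handler, the two combine as transform(body, get_etl_args(request_path)).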

Learn more: Inline ETL Transformation


Offline Transformation

Use offline transformation to process entire buckets or a selected set of objects. The result is saved in a new destination bucket.

ais etl bucket <ETL_NAME> <SRC_BUCKET> <DST_BUCKET>

Available Flags

Flag                 Description
--list               Comma-separated list of object names (obj1,obj2).
--template           Template pattern for object names (obj-{000..100}.tar).
--ext                Extension transformation map ({jpg:txt}).
--prefix             Prefix to apply to output object names.
--wait               Block until transformation is complete.
--requests-timeout   Per-object timeout for transformation.
--dry-run            Simulate transformation without modifying cluster state.
--num-workers        Number of concurrent workers to use during transformation.

Examples

Transform an entire bucket

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket
ais wait xaction <XACTION_ID>

Transform a subset of objects using a template

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --template "shard-{10..12}.tar"
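A brace template such as shard-{10..12}.tar selects shard-10.tar, shard-11.tar, and shard-12.tar. The Python sketch below expands numeric {start..end} ranges the same way. Zero-padding to the width of the start token (so obj-{000..100}.tar yields obj-000.tar through obj-100.tar) is an assumption about AIS template semantics. Steps and non-numeric ranges are not handled.

```python
import re
from typing import List

# Matches a numeric range such as {10..12} or {000..100}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_template(template: str) -> List[str]:
    """Expand every {start..end} range (inclusive), left to right."""
    m = _RANGE.search(template)
    if m is None:
        return [template]
    start_tok, end_tok = m.group(1), m.group(2)
    width = len(start_tok)  # assumption: pad numbers to the start token's width
    head, tail = template[: m.start()], template[m.end():]
    tails = expand_template(tail)  # expand any remaining ranges
    return [
        f"{head}{n:0{width}d}{rest}"
        for n in range(int(start_tok), int(end_tok) + 1)
        for rest in tails
    ]


print(expand_template("shard-{10..12}.tar"))  # ['shard-10.tar', 'shard-11.tar', 'shard-12.tar']
```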

Apply extension mapping and add a prefix

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --ext="{in1:out1,in2:out2}" --prefix="etl-" --wait
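The --ext value maps source extensions to destination extensions. The Python sketch below parses a map like {in1:out1,in2:out2} and applies it to object names. Two behaviors are assumptions for illustration: only the last extension is replaced, and names with unmapped extensions pass through unchanged. The --prefix step is not modeled.

```python
from typing import Dict


def parse_ext_map(spec: str) -> Dict[str, str]:
    """Parse '{in1:out1,in2:out2}' into {'in1': 'out1', 'in2': 'out2'}."""
    mapping = {}
    for pair in spec.strip().strip("{}").split(","):
        if not pair.strip():
            continue  # tolerate empty entries such as a trailing comma
        src, _, dst = pair.partition(":")
        mapping[src.strip().lstrip(".")] = dst.strip().lstrip(".")
    return mapping


def apply_ext_map(obj_name: str, ext_map: Dict[str, str]) -> str:
    """Swap the object's last extension if it appears in the map."""
    stem, dot, ext = obj_name.rpartition(".")
    if dot and ext in ext_map:
        return f"{stem}.{ext_map[ext]}"
    return obj_name


print(apply_ext_map("images/cat.jpg", parse_ext_map("{jpg:txt}")))  # images/cat.txt
```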

Perform a dry-run to preview changes

ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --dry-run --wait

Output:

[DRY RUN] No modifications on the cluster
2 objects (20MiB) would have been put into bucket ais://dst_bucket

Learn more: Offline ETL Transformation