AIStore & Amazon S3 Compatibility
AIStore (AIS) is a lightweight, distributed object storage system designed to scale linearly with each added storage node. It provides a uniform API across various storage backends while maintaining high performance for AI/ML and data analytics workloads.
AIS integrates with Amazon S3 on three fronts:
- Backend storage – via the backend provider abstraction, AIStore can be utilized to access (and cache or reliably store in-cluster) a remote cloud bucket such as `s3://my-bucket` (`aws://` is accepted as an alias). This provides seamless access to existing S3 data.
- Front-end compatibility – every gateway speaks the S3 REST API. The default endpoint is `http(s)://gw-host:port/s3`, but you can enable the `S3-API-via-Root` feature flag to serve requests at the cluster root (`http(s)://gw-host:port/`). The same API works uniformly across all bucket types: native `ais://`, cloud-backed `s3://`, `gs://`, and more.
- Presigned request offload – AIS can receive a presigned S3 URL, execute it, and store the resulting object in the cluster. This lets you leverage S3’s authentication while using AIS for storage.
Which interface should I use?
AIS exposes a pure S3 surface for seamless compatibility and a native API for advanced, cluster‑aware workloads. The table below helps decide which path fits your scenario:
| Use the S3 compatibility API when you… | Use the native AIS API / CLI when you… |
|---|---|
| Need drop-in support for unmodified S3 tools & SDKs (`aws`, `boto3`, `s3cmd`, …) | Want cluster-wide batch jobs (`ais etl`, `ais prefetch`, `ais copy`, `ais archive`, …) |
| Rely on an existing S3-centric workflow or third-party app | Need fine-grained control-plane ops (`ais cluster`, `ais bucket props set`, node lifecycle) |
| Accept MD5-based ETag semantics (even though MD5 is slower and not crypto-secure) | Value AIS-native features: virtual directories, adaptive rate-limiting, WebSocket ETL, streaming cold-GET, etc. |
| Accept that some S3 features (CORS, website hosting, CloudFront) are not yet implemented | Care about advanced list-objects options (e.g., listing shards), remote clusters, and non-S3 buckets |
| Are okay with a slight performance overhead from the S3-to-AIS adaptation layer (MD5 hashing, XML marshaling/translation) | Want full Prometheus visibility with AIS-rich metrics & labels |
Table of Contents
- Quick Start
- Configuring Clients
- Using s3cmd with AIS
- Supported Operations
- S3 Bucket Inventory
- Compatibility Matrix
- Boto3 Examples
- FAQs & Troubleshooting
- Further reading
Environment assumption – Local Playground
The CLI examples below use `localhost:8080`, which is the default endpoint when running AIS in the Local Playground. For other deployment modes (Kubernetes Playground, Docker Compose, bare-metal cluster, or production Kubernetes deployments), replace the `host:port` with any AIS gateway endpoint. See Deployment Options and the main project Features list for a broader overview.
Quick Start
Quick start with aws CLI
AWS_EP=http://localhost:8080/s3
aws --endpoint-url "$AWS_EP" s3 mb s3://demo
aws --endpoint-url "$AWS_EP" s3 cp README.md s3://demo/
aws --endpoint-url "$AWS_EP" s3 ls s3://demo
# Expected output:
# 2023-05-14 14:25 10493 s3://demo/README.md
Quick start with s3cmd
One‑liner (HTTP):
s3cmd put README.md s3://demo \
--no-ssl \
--host=localhost:8080/s3 \
--host-bucket="localhost:8080/s3/%(bucket)"
# Expected output:
# upload: 'README.md' -> 's3://demo/README.md' [1 of 1]
# 10493 of 10493 100% in 0s 4.20 MB/s done
Tip — use a cluster-specific `.s3cfg` so you can drop the `--host*` flags. See the Example .s3cfg section below.
Configuring Clients
Finding the AIS endpoint
Choose any gateway’s `host:port` and append `/s3`, e.g. `10.10.0.1:51080/s3`. All gateways accept reads and writes, so you can connect to any of them; in fact, AIS gateways are completely equivalent, API-wise.
Checksum considerations
Amazon S3’s `ETag` is MD5 (or a multipart hash); AIS defaults to xxhash for better performance. To avoid client mismatch warnings, set MD5 per bucket:
ais bucket props set ais://demo checksum.type=md5 # per bucket
# OR cluster‑wide default
ais config cluster checksum.type=md5
Setting the checksum type to MD5 ensures compatibility with S3 clients that validate checksums, though it comes with a minor performance cost compared to xxhash.
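To sanity-check the setting from the client side, you can compare a locally computed MD5 with the ETag AIS returns. Below is a minimal Boto3 sketch, assuming the demo bucket from Quick Start, checksum.type=md5, and a non-multipart object:
import hashlib
import boto3
from aistore.botocore_patch import botocore  # AIS redirect patch (see Boto3 Examples)

client = boto3.client(
    "s3",
    endpoint_url="http://localhost:8080/s3",
    aws_access_key_id="FAKEKEY",
    aws_secret_access_key="FAKESECRET",
    region_name="us-east-1",
)

body = b"hello, checksum"
client.put_object(Bucket="demo", Key="md5-check", Body=body)

# For non-multipart objects with checksum.type=md5, the ETag is the object's MD5
etag = client.head_object(Bucket="demo", Key="md5-check")["ETag"].strip('"')
print(etag == hashlib.md5(body).hexdigest())  # expected: True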
HTTPS vs HTTP
Enable TLS in `ais.json` (`net.http.use_https=true`) or pass the `--no-ssl`/`--insecure` flags when using tools. By default, AIS uses HTTP, while many S3 clients expect HTTPS.
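If the cluster serves HTTPS with a self-signed certificate, clients also need to skip (or customize) TLS verification. For example, Boto3 accepts a verify argument; the endpoint below is a placeholder:
import boto3
from aistore.botocore_patch import botocore

client = boto3.client(
    "s3",
    endpoint_url="https://localhost:8080/s3",  # HTTPS endpoint of an AIS gateway
    verify=False,                              # or verify="/path/to/ca-bundle.pem"
    aws_access_key_id="FAKEKEY",
    aws_secret_access_key="FAKESECRET",
    region_name="us-east-1",
)
print([b["Name"] for b in client.list_buckets()["Buckets"]])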
Using `s3cmd` with AIS
Interactive `s3cmd --configure` transcript
$ s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Access Key: FAKEKEY
Secret Key: FAKESECRET
Default Region [US]:
Use HTTPS protocol [Yes]: n
HTTP Proxy server name:
New settings:
Access Key: FAKEKEY
Secret Key: FAKESECRET
Region: US
Use HTTPS protocol: False
HTTP Proxy server name:
Test access with supplied credentials? [Y/n] y
Please wait, attempting to list buckets...
During the test, enter your AIS endpoint when prompted:
Hostname: localhost:8080/s3
Bucket host: localhost:8080/s3/%(bucket)
Success. Your access key and secret key worked fine :-)
...
Save settings? [y/N] y
Configuration saved to ~/.s3cfg
Example .s3cfg
Edit your `~/.s3cfg` file to include these lines (replace with your actual gateway endpoint):
host_base = 10.10.0.1:51080/s3
host_bucket = 10.10.0.1:51080/s3/%(bucket)
access_key = FAKEKEY
secret_key = FAKESECRET
signature_v2 = False
use_https = False
This configuration allows you to run `s3cmd` commands without having to specify the host parameters each time.
Multipart uploads with s3cmd
For large files, use multipart uploads to improve reliability and performance:
s3cmd put ./large.bin s3://demo --multipart-chunk-size-mb=8
The optimal chunk size depends on your network conditions and file size, but 8-16MB chunks work well for most cases.
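The same tuning applies to Boto3’s managed transfers via TransferConfig. The following is a sketch under the same assumptions as the Boto3 Examples section; the file name and sizes are placeholders:
import boto3
from boto3.s3.transfer import TransferConfig
from aistore.botocore_patch import botocore

client = boto3.client(
    "s3",
    endpoint_url="http://localhost:8080/s3",
    aws_access_key_id="FAKEKEY",
    aws_secret_access_key="FAKESECRET",
    region_name="us-east-1",
)

# Switch to multipart above 8 MiB and upload in 8 MiB parts
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)
client.upload_file("./large.bin", "demo", "large.bin", Config=config)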
Authentication (JWT) tips
If AIS Authentication is enabled, you’ll need to attach a JWT token manually because `s3cmd`’s signer overwrites the `--add-header` option:
# s3cmd/S3/S3.py (single‑line patch)
- pass
+ self.headers["Authorization"] = "Bearer <token>"
Replace `<token>` with your actual JWT token. This modification ensures the token is included in every request.
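For Boto3, a comparable workaround is to register a botocore event handler that rewrites the Authorization header after signing. This is an untested sketch; it assumes AIS AuthN only inspects the Bearer token on the S3 endpoint:
import boto3
from aistore.botocore_patch import botocore

client = boto3.client(
    "s3",
    endpoint_url="http://localhost:8080/s3",
    aws_access_key_id="FAKEKEY",
    aws_secret_access_key="FAKESECRET",
    region_name="us-east-1",
)

def attach_jwt(request, **kwargs):
    # Replace the SigV4 Authorization header with the AIS AuthN token
    request.headers["Authorization"] = "Bearer <token>"

# "before-send" fires after signing, right before the HTTP request goes out
client.meta.events.register("before-send.s3", attach_jwt)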
Supported Operations
PUT / GET / HEAD
Regular verbs work with `aws`, `s3cmd`, or the native `ais` CLI:
# Using aws CLI
aws --endpoint-url "$AWS_EP" s3api put-object --bucket demo --key obj --body file.txt
aws --endpoint-url "$AWS_EP" s3api get-object --bucket demo --key obj output.txt
aws --endpoint-url "$AWS_EP" s3api head-object --bucket demo --key obj
# Output:
# {
# "AcceptRanges": "bytes",
# "LastModified": "Wed, 14 May 2025 14:30:22 GMT",
# "ContentLength": 1024,
# "ETag": "\"a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6\"",
# "ContentType": "text/plain"
# }
# Native AIS CLI equivalents
ais put file.txt ais://demo/obj
ais get ais://demo/obj output.txt
ais object show ais://demo/obj
Range reads
The S3 API supports byte-range requests for partial object downloads:
aws s3api get-object \
--range bytes=0-99 \
--bucket demo --key README.md \
part.txt \
--endpoint-url "$AWS_EP"
# Output:
# {
# "AcceptRanges": "bytes",
# "LastModified": "Wed, 14 May 2025 14:30:22 GMT",
# "ContentRange": "bytes 0-99/10493",
# "ContentLength": 100,
# "ETag": "\"a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6\"",
# "ContentType": "text/plain"
# }
This would download only the first 100 bytes of the file.
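The equivalent Boto3 call passes a Range parameter to get_object; a small sketch using the same demo bucket and key:
import boto3
from aistore.botocore_patch import botocore

client = boto3.client(
    "s3",
    endpoint_url="http://localhost:8080/s3",
    aws_access_key_id="FAKEKEY",
    aws_secret_access_key="FAKESECRET",
    region_name="us-east-1",
)

# Fetch only the first 100 bytes of the object
response = client.get_object(Bucket="demo", Key="README.md", Range="bytes=0-99")
print(response["ContentRange"])       # e.g. "bytes 0-99/10493"
print(len(response["Body"].read()))   # 100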
Multipart uploads with aws CLI
# 1 — initiate
aws s3api create-multipart-upload --bucket demo --key big \
--endpoint-url "$AWS_EP"
# Output:
# {
# "Bucket": "demo",
# "Key": "big",
# "UploadId": "xu3DvVzJK"
# }
# 2 — upload parts individually
aws s3api upload-part --bucket demo --key big \
--part-number 1 --body part1.bin \
--upload-id "YOUR-UPLOAD-ID" \
--endpoint-url "$AWS_EP"
# Repeat for each part, incrementing the part-number
# Output:
# {
# "ETag": "\"a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6\""
# }
# 3 — complete (requires a JSON file listing all parts)
aws s3api complete-multipart-upload --bucket demo --key big \
--upload-id "YOUR-UPLOAD-ID" --multipart-upload file://parts.json \
--endpoint-url "$AWS_EP"
Example `parts.json`:
{
"Parts": [
{
"PartNumber": 1,
"ETag": "etag-from-upload-part-response"
},
{
"PartNumber": 2,
"ETag": "etag-from-upload-part-response"
}
]
}
Presigned S3 requests
Presigned URLs allow temporary access to objects without sharing credentials:
- Enable the feature:
  ais config cluster features S3-Presigned-Request
- Generate the URL (typically done on a system with AWS credentials):
  aws s3 presign s3://demo/README.md --endpoint-url https://s3.amazonaws.com
  # Returns a URL with authentication parameters in the query string
- Replace the host with `AIS_ENDPOINT/s3` and add a header when using path-style URLs:
  curl -H 'ais-s3-signed-request-style: path' '<PRESIGNED_URL>' -o README.md
This allows AIS to handle the authenticated S3 request on behalf of the client.
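The same flow can be scripted in Python: presign with Boto3 against AWS, rewrite the host to point at AIS, and send the request with the path-style header. This is an illustrative sketch; the exact host to replace depends on how the presigned URL was generated (path-style vs. virtual-hosted):
import boto3
import requests

# Step 2 in Python: presign against real AWS S3 (requires valid AWS credentials)
aws_s3 = boto3.client("s3", region_name="us-east-1")
presigned = aws_s3.generate_presigned_url(
    "get_object", Params={"Bucket": "demo", "Key": "README.md"}, ExpiresIn=3600
)

# Step 3: swap the S3 host for the AIS endpoint (adjust the original host as needed)
ais_url = presigned.replace("https://demo.s3.amazonaws.com", "http://localhost:8080/s3/demo")

resp = requests.get(ais_url, headers={"ais-s3-signed-request-style": "path"})
resp.raise_for_status()
with open("README.md", "wb") as f:
    f.write(resp.content)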
S3 Bucket Inventory Support
Why inventories matter
For buckets with millions of objects, paging through S3’s `ListObjectsV2` API can be slow and expensive. S3 Bucket Inventory provides a manifest file (CSV/ORC/Parquet) of all objects that AIS can process directly, which is crucial for working efficiently with very large buckets.
For example, listing a 10-million-object bucket could take:
- Without inventory: minutes of paginated API calls, potentially costing money (see the sketch after this list)
- With inventory: seconds to parse a pre-generated manifest file
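To illustrate the cost of plain listing: each list_objects_v2 page returns at most 1,000 keys, so a 10-million-object bucket means roughly 10,000 sequential API calls. A minimal Boto3 sketch of that paginated walk (listing directly against AWS; the bucket name is a placeholder):
import boto3

client = boto3.client("s3")  # talking to AWS S3 directly here, not AIS

total = 0
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="webdataset"):  # ~1 API request per 1,000 keys
    total += page.get("KeyCount", 0)
print(f"{total} objects listed")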
Enabling inventory via AWS CLI
aws s3api put-bucket-inventory-configuration \
--bucket webdataset \
--id ais-scan \
--inventory-configuration file://inventory.json
inventory.json
{
"Id": "ais-scan",
"IsEnabled": true,
"IncludedObjectVersions": "All",
"Destination": {
"S3BucketDestination": {
"Bucket": "arn:aws:s3:::webdataset-inv",
"Format": "CSV"
}
},
"Prefix": "",
"Schedule": { "Frequency": "Daily" }
}
Managing inventories with helper scripts
The AIS repo ships ready-to-use wrappers in `scripts/s3/`:

| Script | Purpose |
|---|---|
| `put-bucket-inventory.sh` | create/update an inventory |
| `list-bucket-inventory.sh` | list existing configs |
| `delete-bucket-inventory.sh` | disable inventory |
| `get-bucket-inventory.sh` | show detailed info for a specific inventory |
| `put-bucket-policy.sh` | grant access for S3 to store inventory files |
These scripts simplify inventory management with minimal required parameters.
Inventory XML example
<InventoryConfiguration>
<Id>ais-scan</Id>
<IsEnabled>true</IsEnabled>
<IncludedObjectVersions>All</IncludedObjectVersions>
<Schedule>
<Frequency>Daily</Frequency>
</Schedule>
<Destination>
<S3BucketDestination>
<Bucket>arn:aws:s3:::webdataset-inv</Bucket>
<Format>CSV</Format>
</S3BucketDestination>
</Destination>
</InventoryConfiguration>
Once inventory lands (typically after 24 hours for the first run), you can instantly list bucket contents:
ais ls s3://webdataset --all --inventory
This is dramatically faster than listing objects via the standard API calls, particularly for large buckets.
Compatibility Matrix
| S3 feature | AIS | s3cmd | aws CLI |
|---|---|---|---|
| Create/Destroy bucket | ✅ | ✅ `mb`/`rb` | ✅ `mb`/`rb` |
| PUT / GET / HEAD object | ✅ | ✅ `put`/`get`/`info` | ✅ `cp`/`head` |
| Range reads | ✅ | — | ✅ `get-object --range` |
| Multipart upload | ✅ | ✅ | ✅ |
| Copy object | S3 API only | partial | ✅ |
| Inventory listing | ✅ | — | — |
| Authentication | JWT | modified | ✅ |
| Presigned URLs | ✅ | — | ✅ |
Not yet supported: Regions, CORS, Website hosting, CloudFront; full ACL parity (AIS uses its own ACL model).
Boto3 Examples
Python applications can use Boto3 (the AWS SDK for Python) to connect to AIStore. Since AIStore implements S3 API compatibility, most standard Boto3 S3 operations work with minimal changes.
Prerequisites
For Boto3 to work with AIStore, you need to patch Boto3’s redirect handling:
# Import the patch before using boto3
from aistore.botocore_patch import botocore
import boto3
This patch modifies Boto3’s HTTP client behavior to handle AIStore’s redirect-based load balancing. For details, see the Boto3 compatibility documentation.
Client Initialization
client = boto3.client(
"s3",
region_name="us-east-1", # Any valid region will work
endpoint_url="http://localhost:8080/s3", # Your AIStore endpoint
aws_access_key_id="YOUR_ACCESS_KEY", # Can be dummy values when
aws_secret_access_key="YOUR_SECRET_KEY", # AIS auth is disabled
)
Basic Bucket Operations
# Create a bucket
client.create_bucket(Bucket="my-bucket")
# List all buckets
response = client.list_buckets()
bucket_names = [bucket["Name"] for bucket in response["Buckets"]]
print(f"Existing buckets: {bucket_names}")
# Delete a bucket
client.delete_bucket(Bucket="my-bucket")
Object Operations
# Upload an object
client.put_object(
Bucket="my-bucket",
Key="sample.txt",
Body="Hello, AIStore!"
)
# Download an object
response = client.get_object(Bucket="my-bucket", Key="sample.txt")
content = response["Body"].read().decode("utf-8")
print(f"Object content: {content}")
# Delete an object
client.delete_object(Bucket="my-bucket", Key="sample.txt")
Multipart Upload Example
For large files, you can use multipart uploads:
import boto3
from aistore.botocore_patch import botocore
# Initialize the client
client = boto3.client(
"s3",
region_name="us-east-1",
endpoint_url="http://localhost:8080/s3",
aws_access_key_id="FAKEKEY",
aws_secret_access_key="FAKESECRET",
)
# Create a bucket if it doesn't exist
bucket_name = "multipart-demo"
try:
    client.head_bucket(Bucket=bucket_name)
except client.exceptions.ClientError:
    # Bucket doesn't exist (or isn't accessible) -- create it
    client.create_bucket(Bucket=bucket_name)
# Prepare data for multipart upload
object_key = "large-file.txt"
data = "x" * 10_000_000 # 10MB of data
chunk_size = 5_000_000 # 5MB chunks
chunks = [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]
# Initiate multipart upload
response = client.create_multipart_upload(Bucket=bucket_name, Key=object_key)
upload_id = response["UploadId"]
# Upload individual parts
parts = []
for i, chunk in enumerate(chunks):
    part_num = i + 1  # Part numbers start at 1
    response = client.upload_part(
        Body=chunk,
        Bucket=bucket_name,
        Key=object_key,
        PartNumber=part_num,
        UploadId=upload_id,
    )
    parts.append({"PartNumber": part_num, "ETag": response["ETag"]})
# Complete the multipart upload
client.complete_multipart_upload(
Bucket=bucket_name,
Key=object_key,
UploadId=upload_id,
MultipartUpload={"Parts": parts}
)
# Verify upload was successful
response = client.head_object(Bucket=bucket_name, Key=object_key)
print(f"Successfully uploaded {response['ContentLength']} bytes")
FAQs & Troubleshooting
| Symptom | Cause & Fix |
|---|---|
| MD5 sum mismatch | Set the bucket checksum to MD5 with `ais bucket props set BUCKET checksum.type=md5` |
| SSL certificate problem | Self-signed cert ➜ use `--no-ssl` or `--insecure` |
| Presigned URL fails | Add header `ais-s3-signed-request-style: path` for path-style URLs |
| Authorization header missing (s3cmd + AuthN) | Patch `S3.py` to include the JWT as shown in Authentication (JWT) tips |
| Upload fails with timeout | Try smaller multipart chunks (e.g., `--multipart-chunk-size-mb=5`) |
| Unable to list large S3 bucket | Use bucket inventory to efficiently list contents |
| Boto3/TensorFlow integration issues | See the Boto3 compatibility patch for redirects |