AIStore (AIS for short) is a lightweight storage stack, built from scratch and tailored for AI apps. AIS consistently shows balanced I/O distribution and linear scalability across arbitrary numbers of clustered servers, producing performance charts that look as follows:
The cluster in the chart above comprises 120 HDDs.
The ability to scale linearly with each added disk was, and remains, one of the main incentives behind AIStore. Much of the development is also driven by the idea of offloading dataset transformations to AIS clusters.
- Deploys anywhere. AIS clusters are immediately deployable on any commodity hardware, on any Linux machine(s).
- Highly available control and data planes, end-to-end data protection, self-healing, n-way mirroring, erasure coding, and an arbitrary number of extremely lightweight access points.
- REST API. Comprehensive native HTTP-based API, as well as a compliant Amazon S3 API to run unmodified S3 clients and apps.
- Unified namespace across multiple remote backends including Amazon S3, Google Cloud, and Microsoft Azure.
- Network of clusters. Any AIS cluster can attach any other AIS cluster, thus gaining immediate visibility and fast access to the respective hosted datasets.
- Turn-key cache. Can be used as a standalone highly-available protected storage and/or LRU-based fast cache. Eviction watermarks, as well as numerous other management policies, are per-bucket configurable.
- ETL offload. The capability to run I/O intensive custom data transformations close to data, offline (dataset to dataset) and inline (on-the-fly).
- File datasets. AIS can be immediately populated from any file-based data source (local or remote, ad-hoc/on-demand or via asynchronous batch).
- Small files. Sharding. To serialize small files, AIS supports TAR, TAR.GZ, ZIP, and MessagePack formats, and provides the entire spectrum of operations to make the corresponding sharding transparent to the apps.
- Kubernetes. Provides for easy Kubernetes deployment via a separate GitHub repo and AIS/K8s Operator.
- Command line management. Integrated powerful CLI for easy management and monitoring.
- Access control. For security and fine-grained access control, AIS includes an OAuth 2.0 compliant Authentication Server (AuthN). A single AuthN instance executes CLI requests over HTTPS and can serve multiple clusters.
- Distributed shuffle extension for massively parallel resharding of very large datasets.
- Batch jobs. APIs and CLI to start, stop, and monitor documented batch operations, such as download, copy or transform datasets, and many more.
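The small-file sharding mentioned in the list above can be illustrated with nothing but the Python standard library. The following is a minimal sketch of the general idea (many small files serialized into a few TAR archives), not AIS code: the helper names `make_shards` and `read_member`, and the fixed files-per-shard policy, are assumptions made for illustration.

```python
import io
import tarfile


def make_shards(files, max_files_per_shard):
    """Pack (name, bytes) pairs into a list of in-memory TAR archives.

    Hypothetical helper: a fixed number of files per shard is an
    illustrative policy, not something prescribed by AIS.
    """
    shards = []
    for start in range(0, len(files), max_files_per_shard):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for name, payload in files[start:start + max_files_per_shard]:
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)  # size must be set before addfile()
                tar.addfile(info, io.BytesIO(payload))
        shards.append(buf.getvalue())
    return shards


def read_member(shard, name):
    """Return the bytes of one file stored inside a TAR shard."""
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r") as tar:
        return tar.extractfile(name).read()
```

For example, five small files packed two per shard yield three archives, and any single file can still be read back by name from the shard that contains it.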
AIS runs natively on Kubernetes and features open format - thus, the freedom to copy or move your data from AIS at any time using the familiar Linux rsync(1) and similar.
For developers and data scientists, there’s also:
- native Go (language) API that we utilize in a variety of tools including CLI and Load Generator;
- native Python API, and Python SDK that also contains PyTorch integration and usage examples.
For the original AIStore white paper and design philosophy, an introduction to large-scale deep learning, and the most recently added features, please see AIStore Overview (where you can also find six alternative ways to work with existing datasets). Videos and animated presentations can be found at videos.
Finally, getting started with AIS takes only a few minutes.
AIS deployment options, as well as intended (development vs. production vs. first-time) usages, are all summarized here.
Since the prerequisites essentially boil down to having Linux with a disk, the deployment options range from an all-in-one container to a petascale bare-metal cluster of any size, and from a single VM to multiple racks of high-end servers. Practical use cases, of course, require further consideration and may include:
|Deployment option|Targeted audience and objective|
|---|---|
|Local playground|AIS developers and development, Linux or Mac OS|
|Minimal production-ready deployment|This option utilizes a preinstalled Docker image and targets first-time users or researchers (who could immediately start training their models on smaller datasets)|
|Easy automated GCP/GKE deployment|Developers, first-time users, AI researchers|
|Large-scale production deployment|Requires Kubernetes and is provided via a separate repository: ais-k8s|
Further, there’s the capability referred to as global namespace: given HTTP(S) connectivity, AIS clusters can be easily interconnected to “see” each other’s datasets. Hence the idea to start “small” and then gradually and incrementally build high-performance shared capacity.
For detailed discussion on supported deployments, please refer to Getting Started.
For performance tuning and preparing AIS nodes for bare-metal deployment, see performance.
When it comes to PyTorch, WebDataset is the preferred AIStore client.
WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives.
Further references include the technical blog post titled AIStore & ETL: Using WebDataset to train on a sharded dataset, where you can also find easy step-by-step instructions.
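To make the tar-based layout more concrete, here is a minimal standard-library sketch of how WebDataset-style shards are commonly organized: consecutive tar members that share the same name up to the first dot of the basename are treated as one training sample. This is an illustration of that convention as we understand it, not the WebDataset implementation; `split_key` and `iter_samples` are hypothetical helper names.

```python
import io
import tarfile


def split_key(member_name):
    """Split 'dir/0001.cls.txt' into ('dir/0001', 'cls.txt').

    The split happens at the first dot of the basename.
    """
    directory, _, base = member_name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(shard_bytes):
    """Yield one dict per sample: {'__key__': key, ext: bytes, ...}.

    Assumes files belonging to the same sample are stored
    consecutively in the archive.
    """
    current = None
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            key, ext = split_key(member.name)
            if current is not None and current["__key__"] != key:
                yield current  # key changed: previous sample is complete
                current = None
            if current is None:
                current = {"__key__": key}
            current[ext] = tar.extractfile(member).read()
    if current is not None:
        yield current  # emit the last sample in the shard
```

Because samples are emitted as soon as the key changes, a shard can be consumed sequentially without loading every sample at once, which is the access pattern that suits large sharded datasets.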
Guides and References
- Getting Started
- Technical Blog
- API and SDK
- Amazon S3
- Create, destroy, list, copy, rename, transform, configure, evict buckets
- GET, PUT, APPEND, PROMOTE, and other operations on objects
- Cluster and node management
- Mountpath (disk) management
- Attach, detach, and monitor remote clusters
- Start, stop, and monitor downloads
- Distributed shuffle
- User account and access management
- Job (aka
- Security and Access Control
- Power tools and extensions
- Benchmarking and tuning Performance
- Buckets and Backend Providers
- Storage Services
- Cluster Management
- For developers
- Batch operations
Alex Aizman (NVIDIA)