AIStore Observability: CLI
The CLI is the fastest way to interrogate an AIS cluster from a terminal. This page is a jump‑table to the handful of commands every SRE or developer uses when triaging performance or capacity issues. For full syntax, run any command with --help or see the separate CLI reference.
Table of Contents
- Installation
- Cluster Status
- Node Alerts
- Live Performance Monitoring
- Log Management
- Common Command Examples
- Best Practices
- Troubleshooting Common Issues
- CLI Resources
- Related Documentation
Installation
There are several ways to install AIS CLI:
- Using the installation script (recommended):
./scripts/install_from_binaries.sh --help
This script installs aisloader and CLI from the latest or previous GitHub release and enables CLI auto-completions.
- Follow the quick-start instructions.
- For detailed introduction (including installation) and usage, see the CLI Overview.
After installation, configure your AIS endpoint via the ais config cli command or environment variables:
## HTTP
export AIS_ENDPOINT=http://your-ais-cluster-endpoint:port
## or HTTPS
export AIS_ENDPOINT=https://your-ais-cluster-endpoint:port
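A typo in the endpoint usually surfaces only later as a confusing connection error, so it can help to sanity-check the value first. The sketch below is illustrative only: `check_endpoint` is a hypothetical helper (not part of the AIS CLI), and it assumes the endpoint always follows the `scheme://host:port` form shown above with a numeric port.

```shell
#!/bin/sh
# Hypothetical helper, not part of the AIS CLI.
# Accepts only http(s)://<host>:<numeric port...>, mirroring the form shown above.
check_endpoint() {
    case "$1" in
        http://?*:[0-9]*|https://?*:[0-9]*)
            return 0 ;;
        *)
            echo "unexpected AIS_ENDPOINT format: '$1'" >&2
            return 1 ;;
    esac
}

# Example use (commented out so the snippet has no side effects):
# check_endpoint "$AIS_ENDPOINT" && ais show cluster
```

Note that the unedited placeholder `http://your-ais-cluster-endpoint:port` is rejected, because `port` is not numeric.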
Cluster Status
Question | Command | Typical flags |
---|---|---|
Nodes and their respective health? Any alerts? Out of space? Out of memory? | ais show cluster | --refresh 1m |
How much space is left? | ais storage summary | --cached, --units, --prefix, --refresh |
Are any mountpaths down? | ais storage mountpath | --fshc (to run filesystem health checker), --rescan-disks |
# Get summary of cluster membership, capacity, and health
ais show cluster
# As always, this (and all other) command's options are available via `--help`
ais show cluster --help
Example: Node-level Alerts
$ ais show cluster
PROXY MEM AVAIL LOAD AVERAGE UPTIME STATUS ALERT
p[KKFpNjqo][P] 127.77GiB [5.2 7.2 3.1] 108h30m40s online **tls-cert-will-soon-expire**
...
TARGET MEM AVAIL CAP USED(%) CAP AVAIL LOAD AVERAGE UPTIME STATUS ALERT
t[pDztYhhb] 98.02GiB 16% 960.824GiB [9.1 13.4 8.3] 108h30m1s online **tls-cert-will-soon-expire**
...
...
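For scripted checks, the ALERT column can be pulled out of this output with a small filter. The following is a rough sketch, not an official parser: `nodes_with_alerts` is a hypothetical function, and it assumes that node rows start with `p[` or `t[`, that UPTIME looks like `108h30m40s`, and that anything after the STATUS column is the alert text. The real table layout may differ between releases, so treat the column detection as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads `ais show cluster` output on stdin and prints
# "<node> <alert text>" for every node row with a non-empty ALERT column.
# Assumption: UPTIME matches e.g. 108h30m40s, STATUS follows it, ALERT is the rest.
nodes_with_alerts() {
    awk '
        /^[pt]\[/ {
            for (i = 2; i < NF; i++) {
                if ($i ~ /^([0-9]+[hms])+$/) {        # found the UPTIME column
                    alert = ""
                    for (j = i + 2; j <= NF; j++)     # skip STATUS, collect the rest
                        alert = alert (alert == "" ? "" : " ") $j
                    if (alert != "") print $1, alert
                    break
                }
            }
        }'
}

# Example use (commented out so the snippet has no side effects):
# ais show cluster | nodes_with_alerts
```

Rows without an alert produce no output, so an empty result means no node is reporting one.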
Node Alerts
AIStore node states are categorized into three severity levels:
- Red Alerts - Critical issues requiring immediate attention:
  - OOS - Out of space condition
  - OOM - Out of memory condition
  - OOCPU - Out of CPU resources
  - DiskFault - Disk failures detected
  - NoMountpaths - No available mountpaths
  - NumGoroutines - Excessive number of goroutines
  - CertificateExpired - TLS certificate has expired
  - CertificateInvalid - TLS certificate is invalid
- Warning Alerts - Potential issues that may require attention:
  - Rebalancing - Rebalance operation in progress
  - RebalanceInterrupted - Rebalance was interrupted
  - Resilvering - Resilvering operation in progress
  - ResilverInterrupted - Resilver was interrupted
  - NodeRestarted - Node was restarted (powercycle, crash)
  - MaintenanceMode - Node is in maintenance mode
  - LowCapacity - Low storage capacity (OOS possible soon)
  - LowMemory - Low memory condition (OOM possible soon)
  - LowCPU - Low CPU availability
  - CertWillSoonExpire - TLS certificate will expire soon
  - KeepAliveErrors - Recent keep-alive errors detected
- Information States - Normal operational states:
  - ClusterStarted - Cluster has started (primary) or node has joined cluster
  - NodeStarted - Node has started (may not have joined cluster yet)
  - VoteInProgress - Voting process is in progress
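When wiring these states into a paging or ticketing workflow, the three severity levels can be encoded as a simple lookup. The sketch below is one possible mapping, taken directly from the list above; `alert_severity` is a hypothetical helper, and the state names are assumed to arrive exactly as spelled in the list (the CLI may print them differently, for example `tls-cert-will-soon-expire`).

```shell
#!/bin/sh
# Hypothetical helper: map a node state name (as spelled in the list above)
# to its severity level. Unrecognized names map to "unknown".
alert_severity() {
    case "$1" in
        OOS|OOM|OOCPU|DiskFault|NoMountpaths|NumGoroutines|CertificateExpired|CertificateInvalid)
            echo red ;;
        Rebalancing|RebalanceInterrupted|Resilvering|ResilverInterrupted|NodeRestarted|MaintenanceMode|LowCapacity|LowMemory|LowCPU|CertWillSoonExpire|KeepAliveErrors)
            echo warning ;;
        ClusterStarted|NodeStarted|VoteInProgress)
            echo info ;;
        *)
            echo unknown ;;
    esac
}
```

For instance, `alert_severity LowCapacity` prints `warning`, which a wrapper script could use to decide between paging someone and opening a ticket.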
Node state flags are also exposed via Prometheus metrics; for details, see the Prometheus and Metrics Reference entries under Related Documentation.
Live Performance Monitoring
ais performance (alias ais show performance) exposes five sub‑commands. The two most used are throughput and latency.
# 30‑second rolling throughput for all targets
$ ais performance throughput --refresh 30
# 10‑second latency slice, filter to GET operations
$ ais performance latency --refresh 10 --regex "get"
Key Flags
Flag | Meaning |
---|---|
--refresh <dur> | Continuous mode; prints every dur |
--count <n> | Stop after n refreshes |
--regex <re> | Show only columns matching the regexp |
--no-headers | Suppress table headers |
See cli-performance.md for sub‑command specifics.
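Combining --refresh with --count gives a bounded sampling run that exits on its own, which is convenient for capturing a short window to a file. The following is a minimal sketch under that assumption: `sample_latency` and the `AIS_BIN` variable are hypothetical names introduced here for illustration, and the interval and count values are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: capture six 10-second GET latency samples into a file.
# AIS_BIN is an illustrative override (not an AIS setting); it defaults to `ais`.
sample_latency() {
    out_file="$1"
    "${AIS_BIN:-ais}" performance latency --refresh 10 --count 6 --regex "get" > "$out_file"
}

# Example use (commented out so the snippet has no side effects):
# sample_latency /tmp/get-latency.txt
```

Because --count stops the run after the given number of refreshes, the function returns after roughly a minute instead of printing indefinitely.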
Log Management
Task | Command |
---|---|
Tail a given node’s log | ais log show --refresh DURATION --help |
Download all logs for a support bundle | ais cluster download-logs |
Rotate logs on one node | ais advanced rotate-logs <NODE_ID> |
For more details on log configuration and analysis, see Observability: Logs.
Common Command Examples
Here are some frequently used command combinations for everyday operations:
# Daily capacity & health snapshot
ais show cluster && ais storage summary
# Watch GET latency for a single target
ais performance latency t[EkMt8081] --refresh 30 --regex "get(\(t\)|cold)"
# Verify no misplaced objects in GCS buckets (non‑recursive)
ais scrub gs --nr --refresh 20s --count 3
Flags such as --refresh <duration>, --count <n>, --regex <re>, --no-headers, and --units are accepted by most monitoring commands; see --help for the definitive list.
Best Practices
- Regular Health Checks: Run ais show cluster and ais storage summary daily to ensure cluster health and capacity
- Performance Baselines: Establish baseline performance with ais performance show after initial deployment
- Monitoring Script: Create a shell script with key monitoring commands for daily checks
- Alert Integration: Pipe CLI output to monitoring systems for automated alerting
- Log Collection: To collect logs, integrate with a Kubernetes monitoring stack or (at least) use ais cluster download-logs
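The daily health check and monitoring script practices can be combined into one small function that saves a dated snapshot. This is only a sketch of one way to do it: `daily_snapshot` and the `ais-snapshot-YYYYMMDD.txt` file naming are hypothetical choices, and the function uses only the two commands already recommended above.

```shell
#!/bin/sh
# Hypothetical helper: write a dated health and capacity snapshot to a directory
# and print the path of the file that was written.
daily_snapshot() {
    dir="${1:-.}"
    out="$dir/ais-snapshot-$(date +%Y%m%d).txt"
    {
        echo "== ais show cluster ==" &&
        ais show cluster &&
        echo "== ais storage summary ==" &&
        ais storage summary
    } > "$out" 2>&1 || return 1
    echo "$out"
}

# Example use, e.g. from cron (commented out so the snippet has no side effects):
# daily_snapshot /var/log/ais-snapshots
```

Keeping one file per day makes it easy to diff today's snapshot against yesterday's when capacity or node health changes unexpectedly.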
Troubleshooting Common Issues
Issue | CLI Command | What to Look For |
---|---|---|
Node experiencing problems or went offline | ais show cluster | Check the ALERT column (example above) |
Disk failures | ais storage mountpath | Look for disabled or detached mountpaths |
Performance degradation | ais performance --refresh 30s | Compare against baseline numbers |
Failed operations | ais log show --severity error | Common error patterns |
Network issues | ais status network | High latency or timeout errors |
CLI Resources
- ais help
- Reference guide
- Monitoring
- Cluster and node management
- Mountpath (disk) management
- Attach, detach, and monitor remote clusters
- Start, stop, and monitor downloads
- Distributed shuffle
- User account and access management
- Jobs
- AIS CLI Reference
Related Documentation
Document | Description |
---|---|
Overview | Introduction to AIS observability |
Logs | Configuring, accessing, and utilizing AIS logs |
Prometheus | Configuring Prometheus with AIS |
Metrics Reference | Complete metrics catalog |
Grafana | Visualizing AIS metrics with Grafana |
Kubernetes | Working with Kubernetes monitoring stacks |