User Tools¶
Rclone¶
rclone is a command-line tool for interacting with cloud and object storage systems. On Anvil, it can be used to connect to the S3-compatible object storage service and manage buckets and files.
1. Load the rclone module¶
First, load the rclone module on Anvil:
Verify installation:
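A minimal sketch of these two steps (the exact module name is an assumption; `module spider rclone` will show what is actually available):

```bash
# Load the rclone module (module name assumed)
module load rclone

# Verify the installation
rclone version
```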
2. Start the rclone configuration¶
Run the configuration and create a new remote:
Inside the interactive menu:
- Type n to create a new remote
- Enter a name (example):
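The start of the dialog might look like this; the remote name `anvils3` is an arbitrary example used in the sketches on this page:

```
$ rclone config
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> anvils3
```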
3. Select storage type¶
When prompted for the storage type, choose:
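Menu numbers vary between rclone versions, so type the backend name rather than a number. For Anvil's Ceph-based service, the S3-compatible backend with the Ceph provider is presumably the right choice:

```
Storage> s3
provider> Ceph
```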
4. Enter credentials¶
Provide the Access Key ID and Secret Access Key when prompted. These are provided by the RCAC team when your object storage account is created.
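The credential prompts might look like the following (placeholder values shown; never commit real keys to scripts or repositories):

```
env_auth> false
access_key_id> YOUR_ACCESS_KEY_ID
secret_access_key> YOUR_SECRET_ACCESS_KEY
```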
5. Set connection parameters¶
Use:
- Endpoint URL: https://s3.anvil.rcac.purdue.edu
- Region: leave blank (unless provided)
- Location constraint: leave blank
For the remaining configuration prompts, accept the default values unless otherwise specified, then confirm and save the configuration.
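Sketched against the rclone S3 prompts (prompt order may vary slightly between versions):

```
region> (press Enter to leave blank)
endpoint> https://s3.anvil.rcac.purdue.edu
location_constraint> (press Enter to leave blank)
```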
Test the connection¶
List files in your bucket (the bucket name will be provided by the RCAC team):
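Assuming a remote named `anvils3` and a placeholder bucket name `mybucket`:

```bash
rclone ls anvils3:mybucket
```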
Common rclone commands¶
Show config
List files in a bucket
List directories in a bucket
Upload a file
Upload a directory
Download data
Sync directories
Warning
sync will delete files at the destination that do not exist in the source.
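The commands above might look like the following, again assuming a remote named `anvils3` and a placeholder bucket `mybucket`:

```bash
# Show config
rclone config show

# List files in a bucket
rclone ls anvils3:mybucket

# List directories in a bucket
rclone lsd anvils3:mybucket

# Upload a file
rclone copy ./data.tar anvils3:mybucket/

# Upload a directory
rclone copy ./results anvils3:mybucket/results

# Download data
rclone copy anvils3:mybucket/results ./results

# Sync directories (deletes destination files missing from the source!)
rclone sync ./results anvils3:mybucket/results
```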
Example configuration file¶
Location:
Example:
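rclone's default configuration path is ~/.config/rclone/rclone.conf. An entry for the example remote, with placeholder credentials, might look like:

```ini
[anvils3]
type = s3
provider = Ceph
access_key_id = YOUR_ACCESS_KEY_ID
secret_access_key = YOUR_SECRET_ACCESS_KEY
endpoint = https://s3.anvil.rcac.purdue.edu
```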
s3cmd¶
s3cmd is a command-line tool for interacting with S3-compatible object storage. On Anvil, it can be used to manage buckets and transfer data to and from object storage.
Note
Unlike rclone, which supports multiple named remotes, s3cmd relies on a single configuration file (~/.s3cfg) and is generally limited to one endpoint at a time. To work with multiple endpoints, use separate configuration files with the --config option.
1. Verify s3cmd¶
Verify installation on the cluster:
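A quick check (if s3cmd is not on the default path, a module may need to be loaded first):

```bash
s3cmd --version
which s3cmd
```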
2. Start the configuration¶
Run:
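The interactive setup is started with:

```bash
s3cmd --configure
```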
3. Enter credentials¶
Provide the following when prompted: Access Key and Secret Key. These are provided by the RCAC team when your object storage account is created.
4. Set connection parameters¶
When prompted for endpoint and DNS settings, use:
- S3 Endpoint: s3.anvil.rcac.purdue.edu
- DNS-style bucket+hostname: %(bucket)s.s3.anvil.rcac.purdue.edu
- Use HTTPS: Yes
- Default region: leave blank
For the remaining configuration prompts, accept the default values unless otherwise specified, then confirm and save the configuration.
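The relevant part of the `s3cmd --configure` dialog might look like this (placeholder keys; s3cmd's defaults are shown in brackets):

```
Access Key: YOUR_ACCESS_KEY
Secret Key: YOUR_SECRET_KEY
Default Region [US]: (press Enter to leave blank)
S3 Endpoint [s3.amazonaws.com]: s3.anvil.rcac.purdue.edu
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket)s.s3.anvil.rcac.purdue.edu
Use HTTPS protocol [Yes]: Yes
```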
After configuration completes, you can review the saved settings in the configuration file (~/.s3cfg).
Ensure the following settings:
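The relevant lines in ~/.s3cfg should resemble:

```ini
host_base = s3.anvil.rcac.purdue.edu
host_bucket = %(bucket)s.s3.anvil.rcac.purdue.edu
use_https = True
```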
Warning
Leave host_bucket empty if you want to enforce path-style addressing.
Test the connection¶
List files in your bucket (the bucket name will be provided by the RCAC team):
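With a placeholder bucket name `mybucket`:

```bash
s3cmd ls s3://mybucket
```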
Common s3cmd commands¶
List buckets
List files in a bucket
Upload a file
Upload a directory
Download a file
Download a directory
Remove a file
Verbose debug
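The commands above might look like the following, with `mybucket` as a placeholder bucket name:

```bash
# List buckets
s3cmd ls

# List files in a bucket
s3cmd ls s3://mybucket

# Upload a file
s3cmd put ./data.tar s3://mybucket/

# Upload a directory
s3cmd put --recursive ./results s3://mybucket/results/

# Download a file
s3cmd get s3://mybucket/data.tar

# Download a directory
s3cmd get --recursive s3://mybucket/results/ ./results/

# Remove a file
s3cmd del s3://mybucket/data.tar

# Verbose debug output
s3cmd --debug ls s3://mybucket
```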
Example configuration file¶
Location:
Example:
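The file lives at ~/.s3cfg (see the note above). A trimmed example with placeholder keys:

```ini
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = s3.anvil.rcac.purdue.edu
host_bucket = %(bucket)s.s3.anvil.rcac.purdue.edu
use_https = True
```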
Python boto3¶
boto3 is the official AWS SDK for Python. It can also be used with S3-compatible object storage such as Anvil’s Ceph-based storage by specifying a custom endpoint.
1. Load Conda module and create your environment¶
On Anvil, load the conda module or activate your environment:
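A sketch of this step; the module and environment names are assumptions, so check `module spider conda` for the actual module:

```bash
module load conda          # module name assumed
conda activate my-env      # "my-env" is a placeholder environment
pip install boto3          # only if boto3 is not already installed
```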
2. Configure credentials¶
You can provide credentials in two ways:
Environment variables¶
Then in Python:
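A sketch of both halves, with placeholder values. boto3 reads the standard AWS environment variables automatically, so only the endpoint needs to be explicit:

```bash
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
```

```python
import boto3

# Credentials are picked up from the environment variables above
s3 = boto3.client("s3", endpoint_url="https://s3.anvil.rcac.purdue.edu")
```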
Directly in code¶
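Alternatively, pass the keys directly (placeholders shown; avoid hard-coding real keys in shared or version-controlled scripts):

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.anvil.rcac.purdue.edu",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)
```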
3. Test the connection¶
List available buckets:
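Assuming a client `s3` created as above:

```python
response = s3.list_buckets()
for bucket in response["Buckets"]:
    print(bucket["Name"])
```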
4. Access your bucket¶
List files in your bucket (the bucket name will be provided by the RCAC team):
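With a placeholder bucket name `mybucket`:

```python
response = s3.list_objects_v2(Bucket="mybucket")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```

Note that `list_objects_v2` returns at most 1,000 objects per call; for larger buckets, use a paginator (`s3.get_paginator("list_objects_v2")`).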
Common boto3 operations¶
Upload a file
Download a file
Upload a directory
Delete a file
Advanced: Use a custom session
Advanced: Stream a file (no download)
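The operations above might be sketched as follows, assuming a client `s3` created as in the earlier examples and a placeholder bucket `mybucket`:

```python
import os
import boto3

# Upload a file
s3.upload_file("data.tar", "mybucket", "data.tar")

# Download a file
s3.download_file("mybucket", "data.tar", "data.tar")

# Upload a directory (boto3 has no recursive upload; walk and upload per file)
for root, _dirs, files in os.walk("results"):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, ".").replace(os.sep, "/")
        s3.upload_file(path, "mybucket", key)

# Delete a file
s3.delete_object(Bucket="mybucket", Key="data.tar")

# Advanced: use a custom session (e.g. to isolate credentials)
session = boto3.session.Session()
s3 = session.client("s3", endpoint_url="https://s3.anvil.rcac.purdue.edu")

# Advanced: stream an object without writing it to disk
obj = s3.get_object(Bucket="mybucket", Key="data.tar")
for chunk in obj["Body"].iter_chunks(chunk_size=1024 * 1024):
    pass  # process each chunk in memory
```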
Notes¶
- Keep your credentials secure
- Access depends on bucket permissions
- Shared buckets may be accessible to you even if you do not own them