Year-End Productivity Toolkit
1. Zero-Friction Access
A. SSH Keys
Goal: Enable password-less login to run automated scripts and transfer files seamlessly.
- Generate a Key Pair: Run this on your local computer (Terminal or PowerShell). Press `Enter` to accept the defaults (file location and no passphrase).
- Copy Public Key to Cluster: Send your public key to the cluster. Replace `boilerid` with your actual username. Duo Auth Tip: when prompted for your password, type your password followed by `,push` (e.g., `password,push`) to approve the Duo request.
- Test Connection: You should now be able to log in without typing a password: `ssh boilerid@bell.rcac.purdue.edu`
B. SSH Configuration
Goal: Simplify login commands (e.g., type ssh bell instead of ssh user@bell.rcac.purdue.edu).
- Create/Edit the Config File: Open `~/.ssh/config` on your local computer using a text editor (VS Code, Nano, Notepad, etc.). On Windows, the file is `C:\Users\<you>\.ssh\config`.
- Paste the Configuration: Copy the block below. Be sure to replace `boilerid` with your own username.
- Usage: You can now connect instantly using the short names (e.g., `ssh bell`).
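A minimal `~/.ssh/config` sketch for the steps above. It assumes your key pair was saved at `~/.ssh/id_ed25519` (change `IdentityFile` if `ssh-keygen` created a different file, such as `id_rsa`) and that you replace `boilerid` with your username. The `negishi` entry is only an example of adding one block per cluster.

```text
# ~/.ssh/config  (replace boilerid with your username)
Host bell
    HostName bell.rcac.purdue.edu
    User boilerid
    IdentityFile ~/.ssh/id_ed25519

# Example second cluster: add one block like this per cluster you use
Host negishi
    HostName negishi.rcac.purdue.edu
    User boilerid
    IdentityFile ~/.ssh/id_ed25519
```

With this saved, `ssh bell` logs you in, and the same short name works in other SSH-based tools, for example `scp notes.txt bell:~/` or the VS Code Remote - SSH host list.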
2. Shell Aliases & Dotfiles
Goal: Save keystrokes on repetitive commands and prevent home directory quota issues.
A. The Setup
Instead of cluttering your main .bashrc file, create a separate file for shortcuts.
- Create the file: `~/.bash_aliases` (e.g., `nano ~/.bash_aliases`).
- Paste the content below (customize the `APPTAINER_CACHEDIR` path!).
- Activate it: Open your `.bashrc` (`nano ~/.bashrc`) and ensure this block exists (it usually does by default):
- Reload: Run `source ~/.bashrc` to apply changes immediately.
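For reference, the block from the "Activate it" step typically looks like the following; this is the form shipped in many default `~/.bashrc` files, and yours may differ slightly. If it is missing, paste it at the end of `~/.bashrc`.

```shell
# Load personal aliases from a separate file, if it exists
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
```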
B. Recommended Aliases
Copy this into your ~/.bash_aliases file.
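A starting `~/.bash_aliases` sketch. The alias names are suggestions rather than a fixed standard, and the `APPTAINER_CACHEDIR` value is a placeholder: point it at a directory in your own Depot space (it follows the same `/depot/your_lab/user/...` pattern used in the Conda section).

```shell
# ~/.bash_aliases  (example content; rename or drop anything you don't use)

# Keep Apptainer's image cache out of $HOME (placeholder path: edit this!)
export APPTAINER_CACHEDIR="/depot/your_lab/user/apptainer_cache"

# Listing shortcuts
alias ll='ls -lhF'          # long listing, human-readable sizes
alias la='ls -lhFA'         # same, including hidden files

# Slurm: your jobs only ($USER is expanded when the alias runs)
alias myq='squeue -u $USER'

# Search your shell history, e.g.: hgrep sbatch
alias hgrep='history | grep'

# Size of each item in the current folder, largest last
alias dus='du -sh -- * | sort -h'
```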
C. Dotfiles Management
Version control your configuration to keep clusters and local machines in sync.
- Create a private GitHub repository named `dotfiles`.
- Move your config files there and symlink them back.
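A minimal sketch of the "move and symlink" step, assuming your `dotfiles` repository is cloned to `~/dotfiles` (override with `DOTFILES`). The helper name `link_dotfile` is hypothetical, and it only handles files that sit directly in your home directory, such as `.bash_aliases` or `.condarc`.

```shell
# Where your dotfiles repo is cloned (assumed location; override if different)
DOTFILES="${DOTFILES:-$HOME/dotfiles}"

# Move ~/<name> into the repo and leave a symlink at the original location.
# Top-level files only, e.g. .bash_aliases or .condarc.
link_dotfile() {
  local name="$1"
  mkdir -p "$DOTFILES"
  if [ -f "$HOME/$name" ] && [ ! -L "$HOME/$name" ]; then
    if [ -e "$DOTFILES/$name" ]; then
      # Repo already has a copy (e.g. on a second machine): keep it, back up the local file
      mv "$HOME/$name" "$HOME/$name.bak"
    else
      mv "$HOME/$name" "$DOTFILES/$name"
    fi
  fi
  # Refuse to create a dangling link
  if [ ! -e "$DOTFILES/$name" ]; then
    echo "link_dotfile: $name not found in $HOME or $DOTFILES" >&2
    return 1
  fi
  # Programs keep finding the file at ~/<name>
  ln -sfn "$DOTFILES/$name" "$HOME/$name"
}

# Usage:
#   link_dotfile .bash_aliases
#   link_dotfile .condarc
```

After linking, commit the files inside `~/dotfiles`; on another cluster, clone the repo and run the same helper to point that machine at the shared copies.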
3. VS Code Remote SSH
Goal: Edit cluster files with a full graphical interface, syntax highlighting, and integrated terminals.
Setup Steps
- Install: Download VS Code locally.
- Extension: Open VS Code, go to Extensions (square icon on the left), and install Remote - SSH.
- Connect: Click the green "><" icon (bottom-left corner) -> Connect to Host... -> select `bell` (or your target cluster).
Essential Extensions (Install on Remote)
Once connected, install these in the "SSH: bell" section of your extensions pane:
- Python & Pylance: For code completion and debugging.
- ShellCheck: To catch errors in your bash scripts automatically.
- GitLens: To visualize who edited code and when.
- R: For R language support (requires extra config).
Editing Tips
- Integrated Terminal: Press Ctrl + ` (the backtick key) to open a terminal. It opens automatically in your current remote folder.
- Consistency: If your config points to a specific login node (e.g., `bell-fe02`), stick to it. Connecting to different nodes starts a separate VS Code server on each one, which wastes resources.
- Drag & Drop: Drag files from your local computer into the VS Code file explorer to upload them instantly.
Reference
View Quick Reference (PDF): includes configurations for Jupyter kernels and R paths.
4. Conda Environment Management
Goal: Isolate software dependencies per project and avoid "works on my machine" issues.
Setup: Avoid Quota Issues
Conda environments are large, both in gigabytes and in file count. Configure them to store data in your Depot space, not your Home directory (which has strict limits).
- Create the directory: `mkdir -p /depot/your_lab/user/conda_envs`
- Edit your config: `nano ~/.condarc`
- Add this text:
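A minimal `~/.condarc` sketch. `envs_dirs` and `pkgs_dirs` are standard Conda settings; the paths are placeholders that must match the directory you created in the first step, and putting the package cache in a `pkgs` subfolder is just one reasonable choice.

```yaml
# ~/.condarc
envs_dirs:
  - /depot/your_lab/user/conda_envs        # where new environments are created
pkgs_dirs:
  - /depot/your_lab/user/conda_envs/pkgs   # cache of downloaded packages
```

After saving, `conda config --show envs_dirs` should list the Depot path first.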
Quick Commands (Use mamba for speed)
The RCAC conda module includes mamba, which is faster than standard conda.
| Action | Command |
|---|---|
| Start | module load conda |
| Create | mamba create -n my_env python=3.10 |
| Activate | conda activate my_env |
| Install | mamba install bioconda::samtools |
| List Envs | conda env list |
| Remove | conda env remove -n my_env |
Reproducibility: The "Time Capsule"
Before you finish a project, save a snapshot of your environment so that you (or a collaborator) can rebuild it later.
- Export (Save): `conda env export -n my_env > my_env.yml`
- Import (Load): `conda env create -f my_env.yml`
View Quick Reference (PDF): includes visual workflows for creating vs. using environments and version pinning examples.
5. Containers & Overlays
Goal: Use software that runs the same way everywhere, without installation headaches or file quota (inode) issues.
Why Apptainer (Singularity)?
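- Speed: A container is a single file (`.sif`). It loads faster than a Conda environment with 30,000 tiny files, because the shared filesystem serves one large file instead of thousands of small lookups.
- Portability: Build it on your laptop, run it on Bell, run it on Anvil: the software inside behaves the same way everywhere.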
Quick Commands
The "Inode Saver" (Overlays)
Use Overlays for High File Counts
Some tools (like MAKER or Trinity) create millions of small temporary files, which can hit your file count limit (quota) even if you have storage space left.
Solution: Create an overlay file. The tool writes its temporary files inside this single file, so the shared filesystem (and your quota) sees one file instead of millions.
View Quick Reference (PDF): includes recipes for building your own containers and advanced bind-mount usage.
6. Custom Bin Directory
Goal: Standardize your tools and run complex commands with a single word.
Setup
- Create the directory: `mkdir -p ~/bin`
- Add it to your `$PATH` (if not already there):
- Reload: `source ~/.bashrc`
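One way to do the first two steps that is safe to re-run: it creates `~/bin` and appends the `PATH` line to `~/.bashrc` only if that exact line is not already there. This assumes Bash with `~/.bashrc` as your startup file.

```shell
# Step 1: create the directory (no error if it already exists)
mkdir -p "$HOME/bin"

# Step 2: add ~/bin to PATH in ~/.bashrc, only once.
# Single quotes keep $HOME and $PATH literal so they expand at login, not now.
line='export PATH="$HOME/bin:$PATH"'
grep -qxF "$line" "$HOME/.bashrc" 2>/dev/null || echo "$line" >> "$HOME/.bashrc"
```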
Recommended Wrapper Scripts
Save these files in ~/bin, then run chmod +x ~/bin/* to make them executable.
The "SAM to Sorted BAM" Converter
Converts SAM to BAM, sorts it, and indexes it in one go.
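A minimal sketch of such a converter, written to `~/bin/sam2bam` (the script name is hypothetical). It assumes `samtools` is already on your `PATH` (for example via a module) and relies on `samtools sort` accepting SAM input and choosing BAM output from the `.bam` extension, so conversion and sorting happen in one pass.

```shell
mkdir -p "$HOME/bin"
cat > "$HOME/bin/sam2bam" <<'EOF'
#!/usr/bin/env bash
# Usage: sam2bam input.sam [threads]
# Produces input.sorted.bam and its index input.sorted.bam.bai
set -euo pipefail

in="${1:?usage: sam2bam input.sam [threads]}"
threads="${2:-${SLURM_CPUS_PER_TASK:-4}}"   # argument, else Slurm allocation, else 4
out="${in%.sam}.sorted.bam"

# samtools sort reads SAM directly and writes BAM because of the .bam suffix
samtools sort -@ "$threads" -o "$out" "$in"
samtools index "$out"
echo "Wrote $out and $out.bai"
EOF
chmod +x "$HOME/bin/sam2bam"
```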
The BWA-MEM Wrapper
Runs BWA from a container without needing to load modules or remember the container path.
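A sketch of such a wrapper as `~/bin/bwa_mem` (hypothetical name). The image path is a placeholder: point `BWA_SIF` at whichever BWA container you actually use. It assumes `apptainer` is available on the node, that the reference was already indexed with `bwa index`, and that your data paths are visible inside the container (add `--bind` if they are not).

```shell
mkdir -p "$HOME/bin"
cat > "$HOME/bin/bwa_mem" <<'EOF'
#!/usr/bin/env bash
# Usage: bwa_mem ref.fa reads_R1.fq.gz [reads_R2.fq.gz] > sample.sam
# Runs "bwa mem" inside a container so no module needs to be loaded.
set -euo pipefail

# Placeholder image path: set BWA_SIF in your environment or edit this default
BWA_SIF="${BWA_SIF:-/path/to/bwa.sif}"
threads="${SLURM_CPUS_PER_TASK:-4}"

if [ "$#" -lt 2 ]; then
  echo "usage: bwa_mem ref.fa reads_R1.fq.gz [reads_R2.fq.gz] > out.sam" >&2
  exit 1
fi

# bwa mem writes SAM to stdout; redirect it wherever you want it
apptainer exec "$BWA_SIF" bwa mem -t "$threads" "$@"
EOF
chmod +x "$HOME/bin/bwa_mem"
```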
Fast BAM Sort
Quickly sort a BAM file using available threads.
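A sketch of a sort wrapper as `~/bin/bamsort` (hypothetical name), assuming `samtools` is on your `PATH`. Inside a job it uses the Slurm CPU allocation; outside Slurm it falls back to `nproc`, so on a shared login node you may want to pass a small thread count explicitly.

```shell
mkdir -p "$HOME/bin"
cat > "$HOME/bin/bamsort" <<'EOF'
#!/usr/bin/env bash
# Usage: bamsort input.bam [threads]
# Writes input.sorted.bam plus its .bai index
set -euo pipefail

in="${1:?usage: bamsort input.bam [threads]}"
# Threads: explicit argument, else the Slurm allocation, else all CPUs this process may use
threads="${2:-${SLURM_CPUS_PER_TASK:-$(nproc)}}"
out="${in%.bam}.sorted.bam"

samtools sort -@ "$threads" -o "$out" "$in"
samtools index "$out"
EOF
chmod +x "$HOME/bin/bamsort"
```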
7. Organizing Projects
Goal: Structure your data and code so that collaborators (and "Future You") can understand and reproduce your work.
The "Standard" Directory Tree
Adopt this structure for every new experiment to keep things consistent.
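A minimal sketch that creates the skeleton for a new experiment. Only `01_data/raw` and `02_scripts` come from the practices below; `01_data/processed`, `03_results`, and `04_reports` are assumed names you can rename to match your own tree.

```shell
# Hypothetical project name (underscores, no spaces)
proj="Airway_Study"

mkdir -p \
  "$proj/01_data/raw" \
  "$proj/01_data/processed" \
  "$proj/02_scripts" \
  "$proj/03_results" \
  "$proj/04_reports"

# A top-level README tells collaborators (and Future You) what the project is
[ -f "$proj/README.md" ] || echo "# $proj" > "$proj/README.md"
```

Once the raw files are copied in, `chmod -R a-w Airway_Study/01_data/raw` turns the "read-only" rule into something the filesystem enforces rather than a convention.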
Best Practices
- Raw Data is Sacred: Never write to `01_data/raw`. Treat it as read-only.
- Number Your Scripts: Name them in execution order (e.g., `01_qc`, `02_align`, `03_call_variants`).
- No Spaces: Use underscores or hyphens in file names (e.g., `Airway_Study`, not `Airway Study`).
- Relative Paths: Write scripts that reference `../01_data` rather than `/scratch/bell/user/project/...` so the folder is portable.
Version Control (Git)
Track your 02_scripts folder, but ignore large data files.
View Quick Reference (PDF): includes a step-by-step workflow for starting a new project.
8. File Transfers
Goal: Move data efficiently without corrupting files or freezing your terminal.
Choose the Right Tool
| Scenario | Recommended Tool | Why? |
|---|---|---|
| Small files (< 1 GB) | scp | Quick, simple, no setup required. |
| Directories | rsync | Picks up where it left off when re-run; copies only changes. |
| Big Data (>100GB) | Globus | "Fire and forget." Fast, reliable, background transfer. |
| Cloud (Box/Drive) | rclone | Command-line sync for cloud storage. |
1. rsync (The Workhorse)
Use this for moving project folders between your laptop and the cluster, or between scratch and depot.
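- `-a`: Archive (recurse into directories and preserve permissions, timestamps, and symlinks)
- `-v`: Verbose
- `-P`: Partial + Progress (keeps partially transferred files so a rerun can resume, and shows progress)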
2. scp (Quick Copy)
Best for grabbing a single config file or script.
3. Globus (Big Data)
For terabytes of sequencing data, do not use the command line. Use Globus for reliable, high-speed transfers that run in the background. To transfer to or from your laptop, first install and set up Globus Connect Personal on it.
- Log in to transfer.rcac.purdue.edu.
- Panel 1: Select "Purdue Rossmann/Bell/Anvil".
- Panel 2: Select your personal endpoint (laptop) or collaborator's endpoint.
- Drag and drop. You can close your browser; Globus will email you when it's done.
View Quick Reference (PDF): includes detailed setup for Globus Connect Personal.
9. Storage Strategy
Goal: Balance performance and data safety. Don't let a full disk crash your pipeline.
The Hierarchy
| Location | Path | Speed | Persistence | Use Case |
|---|---|---|---|---|
| Local Scratch | $TMPDIR | Fastest | Temporary (job only) | High I/O (thousands of reads/writes). |
| Scratch | $RCAC_SCRATCH | Fast | Purged (volatile) | Active analysis & intermediate files. |
| Depot | /depot/lab | Slower | Permanent | Long-term storage & shared data. |
| Home | $HOME | Slower | Permanent (small quota) | Config files, scripts, small logs. |
Quota Survival Commands
- Check Status: Run `myquota` to see your usage across all filesystems.
  Watch out: You have limits on both size (GB/TB) and file count (inodes).
- Find Space Hogs: If you hit your limit, use these commands to find which directories are taking up space:
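Two small helpers for this, as a sketch (the function names are hypothetical; add them to `~/.bash_aliases` if useful). `topsize` ranks subfolders by disk usage and `topfiles` ranks them by file count, which is what the inode limit measures. Both assume GNU `du` and `sort`, as found on Linux clusters.

```shell
# Largest subfolders by size, biggest last. Usage: topsize [dir]
topsize() {
  du -h --max-depth=1 "${1:-.}" 2>/dev/null | sort -h | tail -n 15
}

# Subfolders holding the most files (inodes), biggest last. Usage: topfiles [dir]
# Includes hidden folders such as .conda and .cache, common culprits in $HOME.
topfiles() {
  local dir="${1:-.}" d
  for d in "$dir"/*/ "$dir"/.[!.]*/; do
    [ -d "$d" ] || continue                      # skip patterns that matched nothing
    printf '%s\t%s\n' "$(find "$d" -type f 2>/dev/null | wc -l)" "$d"
  done | sort -n | tail -n 15
}

# Usage:
#   topsize ~
#   topfiles "$RCAC_SCRATCH"
```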
Clean Up Routine
Volatile Storage
Scratch is not for long-term storage. Files that have not been accessed for a set period are purged automatically.
- Compress: `tar -czf results.tar.gz results/`
- Archive: Move tarballs to Data Depot.
- Delete: `rm -rf results/` from scratch, but only after the tarball is safely on Depot.
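The three steps can be wrapped in one helper so the delete only happens after the archive has been written, read back, and moved. This is a sketch: the function name and the Depot destination are placeholders, and `tar -tzf` only confirms the archive is readable, not that it matches the source byte for byte.

```shell
# Hypothetical helper. Usage: archive_dir results /depot/your_lab/user/archive
archive_dir() {
  local src="${1%/}" dest="$2"
  local tarball="${src##*/}.tar.gz"

  [ -d "$src" ] || { echo "archive_dir: not a directory: $src" >&2; return 1; }
  [ -n "$dest" ] || { echo "archive_dir: no destination given" >&2; return 1; }

  tar -czf "$tarball" "$src"      || return 1   # 1. Compress
  tar -tzf "$tarball" > /dev/null || return 1   #    check the archive reads back
  mkdir -p "$dest"                || return 1
  mv "$tarball" "$dest/"          || return 1   # 2. Archive to Depot
  rm -rf -- "$src"                              # 3. Delete only after steps 1-2 succeeded
}
```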
10. Journaling & Notes
Goal: Build a "Second Brain." Your shell history (history | grep cmd) is temporary; your notes are permanent.
The Strategy: Digital Lab Notebook
Use tools like Obsidian (markdown-based) or OneNote.
- The "One-Liner" Library: Save those complex `awk`, `sed`, or Slurm commands you spent hours perfecting.
- Error Logs: When you fix a weird error, paste the exact error message and your solution. This makes it searchable next time you see it.
- Metadata: Document the "Why," not just the "How" (e.g., "Used parameter `-n 10` because memory failed at `-n 5`").
Example Note Structure:
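One possible layout for a single note, written in Markdown so it works in Obsidian. The headings are suggestions, and everything in angle brackets is a placeholder.

```markdown
# YYYY-MM-DD  <project>: <task>

## Goal
What I was trying to do, in one sentence.

## Commands that worked
    <the exact command, copied from the terminal>

## Errors and fixes
- Error: <exact error message, pasted verbatim>
- Fix: <what solved it, plus the job ID if relevant>

## Why (decisions)
- <parameter choice and the reason, e.g. raised -n after a memory failure>
```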
11. Reporting with RMarkdown
Goal: Stop copy-pasting images into PowerPoint. Create reports where the code is the documentation.
The Workflow
- Code + Narrative: Use RMarkdown (`.Rmd`) to write your analysis code and your explanation in the same file.
- Knit to HTML: Generate a single HTML file containing your interactive plots, tables, and code.
- Share: Push the HTML to your repo and enable GitHub Pages. Send your PI a link, not a zip file.
Quick Start Header
Use this YAML header to add a table of contents and code folding (hides code by default so non-coders can read the report).
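A minimal header sketch. `toc`, `toc_float`, and `code_folding` are options of R Markdown's `html_document` output format; the title and author are placeholders.

```yaml
---
title: "Airway Study: Differential Expression"
author: "Your Name"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true            # table of contents
    toc_float: true      # keep the TOC visible in a sidebar while scrolling
    code_folding: hide   # code hidden by default; readers can expand it
---
```

Knit with the Knit button in RStudio, or run `rmarkdown::render("report.Rmd")` from R.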
View Quick Reference (PDF): official reference for syntax, chunk options, and formatting.
12. Version Control (Git)
Goal: Save your work history so you can undo mistakes and collaborate without emailing zip files.
1. First-Time Setup
Tell Git who you are (run once per computer):
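The two settings Git needs before your first commit, as a sketch with placeholder values. Using the email tied to your GitHub account links your commits to your profile.

```shell
# Run once per computer (replace the placeholders with your details)
git config --global user.name "Your Name"
git config --global user.email "you@purdue.edu"

# Check what Git will record on your commits
git config --global --get user.name
git config --global --get user.email
```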
2. Starting a Project
- New Project: `cd my_project` then `git init`
- Existing Repo: `git clone https://github.com/username/repo.git`
3. The Daily Workflow (Save & Sync)
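- Check: See what changed with `git status`.
- Stage: Select files to save with `git add <file>` (use `git add .` for everything).
- Commit: Save the snapshot with a message: `git commit -m "Describe what changed"`.
- Push: Send changes to GitHub with `git push`.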
4. The "Golden Rule" of Bioinformatics Git
Danger
Never commit large data files (FASTQ, BAM, big CSVs).
- Solution: Create a file named `.gitignore` in your folder.
- Add these lines to it:
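A starting `.gitignore` sketch for a bioinformatics project. The patterns are common large-file extensions, so adjust them to your data. Note that `*.csv` ignores every CSV; re-include small tables you want tracked with a `!` pattern. The `01_data/` line assumes the project layout from section 7.

```text
# Raw and aligned reads
*.fastq
*.fastq.gz
*.fq.gz
*.sam
*.bam
*.bai
*.cram

# Variant calls
*.vcf.gz

# Large tables (ignores ALL CSVs; re-include small ones, e.g. !02_scripts/*.csv)
*.csv

# Whole data folder from the standard project tree
01_data/
```

To see which rule is hiding a file, run `git check-ignore -v path/to/file.bam`.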
View Quick Reference (PDF): includes branching workflows and a list of Git "Don'ts".
13. Resource Efficiency
Goal: Stop guessing how much RAM you need. Requesting 100GB and using 2GB makes your job wait longer in the queue and locks away memory that other jobs could use.
The Audit Tool: seff
After a job finishes, run `seff <jobid>` to see what resources it actually used.
What to look for:
- Memory Utilized: If you requested 64GB but used 4GB, lower your request to 6GB or 8GB next time.
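- CPU Efficiency: If this is low (< 50%), your code isn't using the threads you asked for. Either request fewer cores or check that your tool's thread option (e.g., `-t` for BWA, `-@` for samtools) matches your request.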
The "Sinteractive" Test
Don't write a 50-line script and hope it works. Test interactively first in an `sinteractive` session on a compute node.
14. Job Arrays
Goal: Process 100 samples in parallel using ONE script. Never write a for loop to submit jobs.
The Logic
Slurm launches multiple copies of your script. In each copy, the variable $SLURM_ARRAY_TASK_ID changes (1, 2, 3...). You use this number to pick which file to process.
The Template
Create a file named samples.txt listing your input filenames (one per line).
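A self-contained sketch of the pattern. It writes an example `samples.txt` (the file names are hypothetical), then an array script in which each task uses `$SLURM_ARRAY_TASK_ID` to read its own line with `sed -n`. The `#SBATCH` values are placeholders: add the account or partition options your cluster requires. In the log name, `%x`, `%A`, and `%a` expand to the job name, the array's parent job ID, and the task ID; Slurm does not create the `logs` folder for you.

```shell
# 1. List inputs, one per line (hypothetical names; in practice: ls *_R1.fastq.gz > samples.txt)
printf '%s\n' sampleA_R1.fastq.gz sampleB_R1.fastq.gz sampleC_R1.fastq.gz > samples.txt

# 2. Write the array script; Slurm will not create the log folder itself
mkdir -p logs
cat > array_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=align
#SBATCH --cpus-per-task=4
#SBATCH --time=02:00:00
#SBATCH --output=logs/%x_%A_%a.out

# Defaults to 1 outside Slurm so you can dry-run the script with bash
task_id="${SLURM_ARRAY_TASK_ID:-1}"
# Task N processes line N of samples.txt
sample=$(sed -n "${task_id}p" samples.txt)
echo "Task ${task_id}: processing ${sample}"
# Replace the echo with your real command, using "$sample" as the input
EOF

# 3. On the cluster, submit one task per line of samples.txt:
#    sbatch --array=1-$(wc -l < samples.txt) array_job.sh
```

Running `SLURM_ARRAY_TASK_ID=2 bash array_job.sh` locally is a quick way to check that each task picks the right sample before submitting.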
Common Array Commands
- `sbatch --array=1-100 script.sh`: Submit 100 tasks.
- `sbatch --array=1-100%20 script.sh`: Submit 100, but only run 20 at a time (be nice to the queue).
- `scancel <jobid>_<taskid>`: Cancel just one specific task in the array.
View Quick Reference (PDF): includes detailed syntax for GPU jobs and parameter sweeps.
15. Getting Help
Goal: Unblock yourself quickly by choosing the right channel.
- Peer Support: Best for "How do I run X?" or "Why does this plot look weird?" Join RCAC Genomics Discord.
- Tutorials: Step-by-step guides for common bioinformatics tasks. Visit RCAC Bioinformatics Docs.
- Announcements: Subscribe to the Bioinformatics Mailing List to catch upcoming workshops and events.
- System Issues: Best for "The node crashed" or "I can't log in." Email: rcac-help@purdue.edu
16. Writing Effective Support Tickets
Goal: Get your problem fixed in one email, not ten.
The "Good Ticket" Template
When emailing rcac-help@purdue.edu, include these four things to skip the "back-and-forth" phase:
- The Cluster: (e.g., Bell, Anvil, Negishi)
- The Job ID: (e.g., `12345678`). Support staff can use it to look up exactly why the job failed.
- The Error: Paste the exact error message or attach the `.err` log file. Don't just say "it failed."
- The Path: Where is the script/data located? (e.g., `/scratch/bell/user/project/run.sh`)
Example Email:
Subject: Job 12345678 failed on Bell - "Segmentation Fault"
Hi,
My job 12345678 on Bell failed immediately after starting.
Path: /scratch/bell/aseethar/project_x/submit.sh
Error: The log file shows `Segmentation fault (core dumped)` when loading the `samtools` module.
I have attached the Slurm output file. Can you help?