ARC Tutorial for Python Users
0. Introduction to ARC
The Advanced Research Computing (ARC) service provides access to High Performance Computing resources, support, and advice to researchers within the University of Oxford. There is extensive documentation on how to use ARC, and training sessions are regularly organised.
The aim of this tutorial is to provide a short document that helps Python users quickly start using ARC. The official ARC documentation remains the reference, with all the details and regular updates. This tutorial is written from the perspective of a Mac user; Linux users should be able to follow the same instructions, but Windows users should refer to the official documentation.
At the centre of the ARC service are two high performance compute clusters - arc and htc.
- arc is designed for multi-node parallel computation
- htc is designed for high-throughput operation (lower core count jobs).
htc is also a more heterogeneous system offering different types of resources, such as GPU computing and high-memory systems; nodes on arc are uniform. Users get access to both clusters automatically as part of the process of obtaining an account with ARC, and can use either or both.
For more detailed information on the hardware specifications of these clusters, see the ARC User Guide.
The ARC workflow is as follows:
When you log into ARC, you land on a Login Node. From there you:
- Copy files to/from ARC (see Section 3). These files are stored in a directory on the shared disk; only you can access your data, but the total amount of storage available is shared.
- Prepare and submit jobs (see Section 4) and access the results after the job(s) are completed.
The Management Nodes manage the job queue and decide when and where to start a job. Note that only the Management Nodes have access to the Compute Nodes; in particular, you should never run code on your Login Node.
Using ARC requires a bit of setup the first time you use it. To keep this tutorial as short as possible, the setup is presented in the Appendix. Once everything is set up, everything you need to know is explained in the following four sections.
1. Logging in to ARC
Before you start, make sure you have access to ARC (see Appendix).
To access the ARC cluster from a Mac we will use ssh. This only works if you are on the University of Oxford network; if you are not, you should use the University VPN.
To connect to the ARC cluster (large parallel jobs):
ssh abcd1234@arc-login.arc.ox.ac.uk
To connect to the HTC cluster (single-core to single-node jobs):
ssh abcd1234@htc-login.arc.ox.ac.uk
where abcd1234 is your actual ARC username, which should be your Oxford SSO. If you have not set up an SSH key (see Appendix), you will be prompted to enter your password.
Upon logging in, you will be placed in your ARC $HOME directory.
2. ARC overview
2.0 Basic Linux commands
ARC operates on Linux. If you are unfamiliar with Linux, don’t worry—you only need basic knowledge. Here are some essential Linux commands that will help you navigate and manage files on the ARC cluster:
- cd: Changes the current directory. Use cd /path/to/directory to move into a directory and cd .. to go up one level. Using cd without arguments returns to your $HOME directory.
- ls: Lists files and directories in the current location. Options include ls -l for detailed information, ls -a to show hidden files, and ls -lh for human-readable file sizes.
- rm: Deletes files. Use rm filename to remove a file. Be careful, as deletion is permanent.
- rm -r: Recursively deletes directories and their contents. Use rm -r directory_name to delete a directory.
- cat: Displays the content of a file. Use cat filename to view a file’s content.
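To see how these commands fit together, here is a short example session (the directory and file names are just placeholders):
cd $HOME            # go to your home directory
ls -lh              # list its contents with human-readable sizes
cd my_project       # enter a (hypothetical) project folder
cat notes.txt       # display a (hypothetical) text file
cd ..               # go back up one level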
If you want to know more about Linux, plenty of resources are available online. The ARC teams recommend this tutorial.
2.1 $HOME directory
If you ever need to go back to your $HOME directory, just type:
cd $HOME
You can create folders in your $HOME directory to organise your programs and scripts. Do not run code or compile programs directly in your $HOME directory: the correct way to execute code on ARC is by submitting a job (see below). Your $HOME directory has a storage limit of 20GB, making it unsuitable for storing large datasets. If you ever want to check how much storage you have left, you can type:
myquota
2.2 $DATA directory
Another key directory is $DATA, which is intended for job outputs and large datasets. To go there, type:
cd $DATA
The $DATA directory provides 5TB of shared storage for your project. Use it for storing job outputs and large datasets for analysis. To check how much storage you have left, use the same command as before: myquota.
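For example, you can keep $DATA organised by creating one folder per experiment (the folder name below is just an example):
mkdir -p $DATA/my_experiment   # create a folder for this experiment's outputs
ls $DATA                       # check the contents of your $DATA area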
3. Transferring Files
We will frequently need to transfer files to and from ARC. There are several ways to do this; here are the two I use most of the time:
3.1 Using Github
A convenient way to transfer code files to ARC is via GitHub. If you already have a GitHub repository with your code, you can clone it into your $HOME directory and keep it synchronized.
To use GitHub from the command line, you must set up SSH authentication (see Appendix). Once SSH is set up, clone repositories using:
git clone git@github.com:username/repository.git
The SSH address can be found on your repository’s GitHub page under the ‘Code’ button. This downloads the repository to your ARC workspace. You can then navigate into the repository using cd repository.
Here are some essential Git commands:
- git pull: Updates your local repository with the latest changes from GitHub.
- git commit -m "Your message": Saves staged changes with a message.
- git push: Uploads committed changes to GitHub.
You will probably only pull from ARC and make all code modifications on your personal computer. Plenty of resources are available online to learn Git and GitHub.
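For example, a typical update on ARC looks like this (the repository name is a placeholder):
cd ~/repository   # go into the cloned repository
git pull          # fetch and merge the latest changes from GitHub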
3.2 Using scp
While very convenient, GitHub imposes a file size limit (typically 100MB per file), making it unsuitable for transferring large datasets to ARC. To copy big files to ARC, you can use scp:
scp local_file.extension abcd1234@arc-login.arc.ox.ac.uk:/path/to/destination/
To copy files from ARC:
scp abcd1234@arc-login.arc.ox.ac.uk:/path/to/file.extension local_destination/
If you’re not sure of the exact destination path on ARC, log in to ARC, navigate to the desired directory, and run pwd. This command will print the full path to the directory, which you can use with scp.
To copy an entire directory (including its contents), use the -r (recursive) option. For example:
scp -r abcd1234@arc-login.arc.ox.ac.uk:/path/to/directory local_destination/
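Putting this together, a typical transfer looks like this (the folder names and the destination path are placeholders):
# On ARC: navigate to the target directory and print its full path
cd $DATA/my_project
pwd
# On your personal computer: use the printed path as the destination
scp -r dataset/ abcd1234@arc-login.arc.ox.ac.uk:/path/printed/by/pwd/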
4. Running Jobs on ARC
4.1 SLURM
ARC is a shared computational resource used by all departments of the University of Oxford. Since there are many users, a resource manager is needed to distribute the available computing power fairly. ARC uses SLURM (Simple Linux Utility for Resource Management).
Instead of running code directly in the command line as you would on your personal computer, you must write a SLURM submission script (written in Bash). This script specifies details such as the job name, estimated runtime, required CPU cores and memory, and the code to execute. SLURM then assigns your job a number and a place in its queue; your code runs when it reaches the top of the queue.
Several factors influence job priority in the queue, including submission frequency, job duration, and the requested computational resources. Some research groups can purchase credits to gain higher priority in the queue. You can find more information in the SLURM Reference Guide.
4.2 An example of a submission script
Here is an example of a submission script that you can copy, paste, and modify for your specific needs (you can also download it here). Let’s go through it:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --mem=96000
#SBATCH --time=48:00:00
#SBATCH --partition=long
#SBATCH --job-name=my_python_job
#SBATCH --mail-type=ALL
#SBATCH --mail-user=name.surname@college.ox.ac.uk
#SBATCH --account=name_of_the_project
# Store original directory and print useful environment variables
ORIG=$(pwd)
echo "TMPDIR: $TMPDIR"
echo "SCRATCH: $SCRATCH"
# Set up directories
SCRATCH_DIR="$TMPDIR/run_$SLURM_JOB_ID"
mkdir -p "$SCRATCH_DIR"
mkdir -p "$SCRATCH_DIR/output"
DEST="$DATA/run_$SLURM_JOB_ID"
mkdir -p "$DEST"
cd "$SCRATCH_DIR" || exit 1
echo "Current directory: $(pwd)"
# Copy input files and program to scratch directory
SRC_DIR="$ORIG/path/to/code/src"
cp -r "$SRC_DIR" "$SCRATCH_DIR"
# in the background, touch files every 6 hours so they’re not deleted by tmpwatch
while true ; do sleep 6h ; find . -type f -exec touch {} + ; done &
# Load Anaconda module
module load Anaconda3
# Create or activate Conda environment
export CONPREFIX=$DATA/envname
source activate "$CONPREFIX" || { echo "Failed to activate Conda environment!"; exit 1; }
# Install required packages
conda install -y pip  # -y avoids an interactive confirmation prompt in a batch job
pip install -r "$SCRATCH_DIR/src/requirements.txt" || { echo "Failed to install dependencies!"; exit 1; }
# Print job details
echo "Running on host: $(hostname)"
echo "Scratch directory: $SCRATCH_DIR"
echo "Output directory: $DEST"
# Run Python script
SCRIPT="$ORIG/path/to/code/script.py"
cp -r "$SCRIPT" "$SCRATCH_DIR"
python script.py --output-dir "$SCRATCH_DIR/output" --data-dir "$DATA/dataset/data.npz" || { echo "Python script execution failed!"; exit 1; }
# Copy output files back to the destination directory
cp -r "$SCRATCH_DIR/output/"* "$DEST" || { echo "Failed to copy output files!"; exit 1; }
# Clean up scratch directory
rm -rf "$SCRATCH_DIR"
echo "Job completed successfully."
4.3 Breakdown of the submission script: SLURM directives
The script starts with SLURM directives, which specify the computational resources required for the job.
- Ask for cores and memory:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --mem=96000
When requesting cores and memory on ARC, it is important to know that each ARC compute node has 48 cores and 380GB of RAM, i.e. about 8GB of RAM per core. So, for one node, do not request more than 48 cores, and do not request more memory than 8GB times the number of cores.
On HTC you can also request high-memory nodes and GPUs; more details are available in the ARC User Guide.
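As a rough sketch only (the exact options for htc are documented in the ARC User Guide, so treat the directive below as an assumption to verify), a GPU is typically requested with the standard SLURM --gres directive:
#SBATCH --gres=gpu:1   # request one GPU (check the ARC User Guide for the exact syntax on htc)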
- Partitions
#SBATCH --time=48:00:00
#SBATCH --partition=long
The clusters have the following time-based scheduling partitions available:
- short: default run time 1hr, maximum run time 12hrs.
- medium: default run time 12hrs, maximum run time 48hrs.
- long: default run time 24hrs, no run time limit.
- devel: maximum run time 10 minutes, for batch job testing only.
- interactive: maximum run time 24hrs, for pre/post-processing.
Jobs in the short and medium partitions have higher scheduling priority than those in the long partition, but they are restricted by their respective maximum run times. Interactive jobs function differently; see the ARC User Guide for more information.
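For example, if your job comfortably fits within 48 hours, requesting the medium partition with a realistic time limit can help it schedule sooner (the values below are only illustrative):
#SBATCH --partition=medium
#SBATCH --time=06:00:00   # adjust to a realistic estimate for your job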
- Basic information
#SBATCH --job-name=my_python_job
#SBATCH --mail-type=ALL
#SBATCH --mail-user=name.surname@college.ox.ac.uk
#SBATCH --account=name_of_the_project
These directives provide basic information about the job. The ‘mail’ directives will trigger an email alert when a job begins, finishes, or fails.
4.4 Breakdown of the submission script: other shell commands
The other shell commands describe what to do in the job. At the beginning of a job, some temporary directories are created:
- $TMPDIR: local directory accessible only by a compute node.
- $SCRATCH: shared file system available to all nodes in a job.
- Setting up directories
We start by setting up directories in the local directory $TMPDIR. The $SLURM_JOB_ID is a unique ID assigned by SLURM to the job.
# Set up directories
SCRATCH_DIR="$TMPDIR/run_$SLURM_JOB_ID"
mkdir -p "$SCRATCH_DIR"
mkdir -p "$SCRATCH_DIR/output"
Then we set up a directory in $DATA where we will store our results at the end of the job.
DEST="$DATA/run_$SLURM_JOB_ID"
mkdir -p "$DEST"
And we move to our working directory:
cd "$SCRATCH_DIR" || exit 1
echo "Current directory: $(pwd)"
Finally, we copy our source code from $HOME (or any directory where we start the job) to $SCRATCH_DIR.
SRC_DIR="$ORIG/path/to/code/src"
cp -r "$SRC_DIR" "$SCRATCH_DIR"
- Dealing with tmpwatch
ARC has an automatic system called tmpwatch, which removes files that have not been accessed for a certain period. If you run very long jobs, tmpwatch may delete the first files created before your job completes, preventing you from accessing them.
To prevent this, we add the following line:
# in the background, touch files every 6 hours so they’re not deleted by tmpwatch
while true ; do sleep 6h ; find . -type f -exec touch {} + ; done &
- Python environment
In order to use Python, we first have to set up a Python environment (see Appendix or the ARC User Guide). Once this is done, we can proceed as follows to activate it and install the required packages:
# Load Anaconda module
module load Anaconda3
# Create or activate Conda environment
export CONPREFIX=$DATA/envname
source activate "$CONPREFIX" || { echo "Failed to activate Conda environment!"; exit 1; }
# Install required packages
conda install -y pip  # -y avoids an interactive confirmation prompt in a batch job
pip install -r "$SCRATCH_DIR/src/requirements.txt" || { echo "Failed to install dependencies!"; exit 1; }
The requirements.txt file should contain all the packages required to run your code. You can generate it by running pip freeze in your local environment on your personal computer.
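For example, on your personal computer:
pip freeze > requirements.txt   # write the packages of the current environment to requirements.txt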
- Run your Python script
You can then copy your Python script to the working directory and run it:
# Run Python script
SCRIPT="$ORIG/path/to/code/script.py"
cp -r "$SCRIPT" "$SCRATCH_DIR"
python script.py --output-dir "$SCRATCH_DIR/output" --data-dir "$DATA/dataset/data.npz" || { echo "Python script execution failed!"; exit 1; }
Note that your script runs in a temporary directory that is created at the beginning of the job and erased at its end. This means that if you want to save some output, it must be written to the $DATA directory. Similarly, if you want to process some data, your script needs to look for it in the $DATA directory. That is why we pass these two directories as inputs to the script. On the Python side, you can proceed as follows to retrieve and use these two paths:
import argparse

# Parse command-line arguments
parser = argparse.ArgumentParser(description="Run ARC Code")
parser.add_argument("--output-dir", type=str, required=True, help="Output directory for results")
parser.add_argument("--data-dir", type=str, required=True, help="Input data directory")
args = parser.parse_args()
# Get the output directory from command-line arguments
output_dir = args.output_dir
data_dir = args.data_dir
where we use the standard library argparse to parse command-line arguments.
4.5 Submit and manage your script
Once your submission script is written, you can submit it with:
sbatch job_script.sh
You should get a response:
Submitted batch job <JOB_ID>
To check the status of all your jobs, type:
squeue -u abcd1234
To cancel a job:
scancel JOB_ID
To see the current output:
cat slurm-JOB_ID.out
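Putting these together, a typical round trip looks like this (job_script.sh and JOB_ID are placeholders):
sbatch job_script.sh        # submit the job
squeue -u abcd1234          # check its position in the queue
tail -f slurm-JOB_ID.out    # follow the output file as it is written (Ctrl-C to stop)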
Appendix: How to set up ARC
Get back to Section 0
A. Accessing the ARC Cluster
A.0 Getting an ARC account
To access and use ARC you need to be attached to a project; any academic at the University of Oxford can create a project. Once a project is created, you can apply for an ARC account here. The project manager will have to validate your application, and once this is done you will receive an email with your username and a separate email with a temporary password.
A.1 Logging in to ARC for the first time
To access the ARC cluster from a Mac we will use ssh. This only works if you are on the University of Oxford network; if you are not, you should use the University VPN.
To connect to the ARC cluster:
ssh abcd1234@arc-login.arc.ox.ac.uk
where abcd1234 is your actual ARC username, which should be your Oxford SSO. When you log in, you are in your $HOME directory. You are the only one with access to this directory, and you can store up to 20GB in it. The other important directory is $DATA, where 5TB is available to be shared among all project members (for more detail see Section 2).
A.2 Changing your password
The first thing to do is to change your password. To do so, type:
passwd
You will be prompted to enter your current password, followed by the new password. After entering it twice, your password will be updated.
Get back to Section 1
A.3 SSH Key Setup (Optional)
To avoid entering your password every time you connect to a remote system via SSH, you can set up SSH key-based authentication. Here’s how you can do it:
- Generate an SSH key (if you don’t have one):
First, create an SSH key pair by running the following command on your local computer. You can replace “ASecretSentences” with any comment you like:
ssh-keygen -t ed25519 -C "ASecretSentences"
This will generate a private and public key pair in the default ~/.ssh/ directory. You can accept the default file location or specify a different one when prompted.
- Copy the public key to ARC: Once the key pair is generated, copy the public key to your ARC home directory using the following command:
ssh-copy-id abcd1234@arc-login.arc.ox.ac.uk
This command will prompt you to enter your password. Afterward, your public key will be added to the ~/.ssh/authorized_keys file on the remote server, and you’ll be able to log in without entering a password in the future.
- Test your setup: After the public key is copied, you can test the setup by connecting to the remote server:
ssh abcd1234@arc-login.arc.ox.ac.uk
If everything is set up correctly, you should be logged in without needing to enter your password.
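Optionally, you can also add an entry to the ~/.ssh/config file on your personal computer so that a short alias is enough to connect (the alias name arc below is just a suggestion):
Host arc
    HostName arc-login.arc.ox.ac.uk
    User abcd1234
With this in place, ssh arc is equivalent to the full command above.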
Get back to Section 1.
B. Setting up a GitHub repository
To clone a GitHub repository to ARC, follow these steps:
- Ensure SSH Key Setup: Make sure you’ve set up SSH key authentication on ARC, as described in the previous section.
- Copy Your Public Key to GitHub: Add your public SSH key (from ~/.ssh/id_ed25519.pub) to your GitHub account by going to Settings > SSH and GPG keys and pasting the key there (you can print the key as shown after this list).
- Clone the Repository: Once your SSH key is linked to GitHub, you can clone the repository using the SSH URL. Run the following command on ARC:
git clone git@github.com:username/repository.git
The SSH address can be found on your repository’s GitHub page under the ‘Code’ button. This downloads the repository to your ARC workspace. You can then navigate into the repository using cd repository.
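To print the public key so you can copy and paste it into GitHub, run the following on the machine holding the key pair (the filename assumes the default from the ssh-keygen step above):
cat ~/.ssh/id_ed25519.pub   # display the public key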
Get back to Section 3.1.
C. Setting Up a Python Virtual Environment
Setting up a Python virtual environment on ARC is a bit trickier; everything is explained in detail in the guide Using Python on ARC. Here is a summary of the important steps.
C.1 Interactive Session
You will first need an interactive session. To request one, after logging into ARC, run:
srun -p interactive --pty /bin/bash
Now we can start setting up our virtual environment.
C.2 Virtual Environment
We will use Anaconda. The available Anaconda versions can be found by typing:
module spider anaconda
To load the version of Anaconda you want (in this example, the latest version), use one of the following commands:
module load Anaconda3
or one of the specific Anaconda versions shown by module spider.
Once the module is loaded, you can use the conda commands to create a virtual environment in your $DATA area. For example, to create an environment named myenv in $DATA, we can use the following commands:
export CONPREFIX=$DATA/myenv
conda create --prefix $CONPREFIX
You can now use (activate) the environment by running:
source activate $CONPREFIX
You can then install packages as usual with pip by typing, for example:
pip install numpy
To install several packages at the same time, you can use a requirements file:
pip install -r requirements.txt
which could have been created by running pip freeze in your local environment on your personal computer.
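As a minimal sketch of that step, assuming the file is created on your personal computer and then copied to ARC with scp (the destination path is a placeholder):
pip freeze > requirements.txt   # export the packages of your local environment
scp requirements.txt abcd1234@arc-login.arc.ox.ac.uk:/path/to/destination/   # copy it to ARC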
C.3 Remove Conda cache
By default, Anaconda will cache all packages installed using pip install into a directory in your $HOME area named ~/.conda/pkgs before installing them into your virtual environment. Over time this has the potential to put you over quota in $HOME. If you find yourself over quota in $HOME, check how much space is being used in ~/.conda/pkgs:
cd ~/.conda
du -sh pkgs
If the cache is indeed what puts you over quota, you can clean it with:
module load Anaconda3
conda clean --packages --tarballs
Get back to Section 4.4.