Getting access to the Misha HPC
As part of computational research, we will need to dispatch jobs to high-performance computing (HPC) clusters. Access to these clusters gives us far more compute power for computationally intensive tasks like training models; running the same process on your local machine can take much longer.
To get started, visit this page for some info about the HPC prepared by the Yale Center for Research Computing (YCRC) specifically for the Wu Tsai Institute. Please also review this GitHub repository with tutorials made by Ms. Ping Luo, a senior staff member in our center who manages cluster use. It has more comprehensive information than the preliminary tutorial below.
Instructions
- Fill out the form to access the cluster, listing Dan as the PI through whom you are requesting access. You should get access in ~48 hours.
- Receive an email from hpc@yale.edu with your username and instructions on how to log in.
- Choose a login method.
- Misha must be accessed through Secure Shell (SSH).
- Other clusters have a web interface for logging on (Bouchet, Grace, McCleary, and Milgram). Find the links for the platform, called Open OnDemand (OOD), here.
- (First time login only) Generate your public and private SSH keys. These keys are used to authenticate you during the remote login (i.e. they tell the cluster that you're you). Keep the private key a secret! A minimal example of the key-setup and login commands appears after these instructions.
- Instructions on how to do this can be found here.
- Note: the example in the documentation uses the RSA encryption scheme, but running the `ssh-keygen` command on Macs without any additional arguments will use the Ed25519 encryption scheme. The key still works and will be stored in something like `id_ed25519.pub`.
- Sidebar from Viggy: If you're interested in math, I'd highly recommend reading up on RSA. The RSA public/private key method works because it's easy to multiply two large prime numbers together, but extremely hard to factor that product back into those primes!
- (First time login only) Upload your public key to the SSH Key Uploader. This allows the cluster to associate it with your netID, which you use to sign in.
- Login to the cluster.
- Use the Terminal:
- Type `ssh YOUR_NET_ID@misha.ycrc.yale.edu`. It will prompt you for your passcode.
- After you provide it, it will ask you to choose a second authentication option.
- Type `1` to authenticate via a Duo push notification.
- Use your IDE:
- Open VSCode, and click the blue button in the bottom left corner of the screen. It looks like two chevron arrows pointing to each other.
- Click SSH, then Add New SSH Host.
- Type `ssh YOUR_NET_ID@misha.ycrc.yale.edu`.
- If it asks you for a config file, select the one with the `.ssh/` path (often the first option). This tells VSCode where to look for your private key (which is still protected by your passcode).
- Enter your passcode when prompted.
- Type `1` to send a Duo push notification, and accept it.
- Submit jobs!
- Type `exit` to close the connection.
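As a reference, here is a minimal sketch of the first-time key setup (run on your own machine) and the login flow; the key filename assumes the default Ed25519 scheme mentioned above, and YOUR_NET_ID is a placeholder:

# (First time only, on your local machine) generate a key pair;
# accept the default location and choose a passphrase when prompted
ssh-keygen

# Print the *public* key so you can paste it into the SSH Key Uploader
cat ~/.ssh/id_ed25519.pub

# Log in to Misha: enter your passcode, then type 1 to approve the Duo push
ssh YOUR_NET_ID@misha.ycrc.yale.edu

# When you are finished, close the connection
exit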
About the HPC
- Consists of multiple groups of computers called nodes.
- The login node is shared by all users; it handles logins and is usually excluded from running actual code jobs.
- Compute nodes make up the majority of the computers in the HPC; this is where tasks are performed.
- You can run both interactive and batch jobs on compute nodes. Interactive jobs are sessions in which you run programs on the node interactively, useful for debugging and/or coding. Batch jobs are non-interactive jobs that are run by the node and returned to you. These can be parallelized, and will run regardless of whether you are logged in.
- You can have at most 4 interactive sessions at once.
- Transfer nodes are used for transferring files; they are accessed via `ssh transfer`.
- The HPC has multiple "partitions", which are used for different purposes, along with special-use nodes (see the sketch after this list for requesting a specific partition):
  - `devel`: default for interactive jobs
  - `day`: default for batch jobs
  - `week`: default for long jobs (>24 hours)
  - `gpu`: nodes with GPU access
  - `bigmem`: nodes for jobs with large RAM/memory requirements
  - `mpi`: for highly parallelized code
  - `pi_NAME`: PI- and lab-specific nodes available for purchase from YCRC
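A quick sketch of how a partition is requested with the `-p`/`--partition` flag (the partition names follow the list above; `my_job.sh` and the time limits are just placeholders):

# Interactive session on the devel partition for 2 hours
salloc -p devel -t 2:00:00

# Batch job submitted to the day partition
# (equivalently, put "#SBATCH -p day" inside the script itself)
sbatch -p day my_job.sh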
Cheat Sheet
- Interactive jobs:
  - `salloc` to submit an interactive job. Flags:
    - `-p` or `--partition=` (default `devel`/interactive)
    - `-t` or `--time=` (time limit, DD-HH:MM:SS or DD-HH)
    - `--mem-per-cpu=` (default 5 GB per CPU)
  - `module load` to load common software
    - also `module avail` to list available software and `module list` to show currently loaded modules
    - also `module purge` to remove all currently loaded software (demonstrated in the short sketch after the example code below)
Example code:
# Request an interactive session with a 1-hour time limit
salloc -t 1:00:00

# First-time setup: create and populate a conda environment
module load miniconda
conda create -n env_name python=3.9 jupyter pandas
conda activate env_name
conda install pkg1 pkg2

# In later sessions, just reload the module and reactivate the environment
module load miniconda
conda activate env_name
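And a short sketch of the module commands mentioned in the cheat sheet (what you actually see depends on the software installed on the cluster):

# List the software modules available on the cluster
module avail

# List the modules currently loaded in your session
module list

# Unload everything, e.g. before switching to a different toolchain
module purge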
- Batch jobs:
  - `sbatch` to submit a batch job. Flags (given on the command line or as `#SBATCH` lines in the script):
    - `-J` or `--job-name=` (job name)
    - `-o` or `--output=` (output file name)
    - `--mail-user=` (email address to receive alerts about job completion, default: Yale address)
    - `--mail-type=ALL` (receive email notifications at the beginning and end of the job)
  - `squeue --me` (get status of all your submitted jobs)
  - `seff JOBID` (get job stats when done, e.g. CPU usage, run time)
  - `scancel JOBID` (cancel a job)
  - `htop -u NETID` (view all current processes under your name); see the short monitoring sketch after the example script below
- Misc commands:
  - `getquota` (see remaining storage)
  - `dsq` for large numbers of similar jobs
Example code:
#!/bin/bash
# Job settings: name, partition, time limit, email notifications
#SBATCH -J example_job
#SBATCH -p day
#SBATCH -t 12:00:00
#SBATCH --mail-type=ALL

# Start from a clean module environment, then load software and run
module purge
module load miniconda
conda activate my_env
python my_python.py
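As referenced above, a short sketch of submitting and monitoring a batch job; `example_job.sh` and the job ID 12345678 are placeholders:

# Submit the script (assuming it was saved as example_job.sh)
sbatch example_job.sh

# Check the status of all your submitted jobs
squeue --me

# After the job finishes, check its CPU/memory efficiency
seff 12345678

# Cancel a job you no longer need
scancel 12345678

# Check how much storage you have left
getquota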
Example Use Case: Cloning Remote Repository
At first, attempting to clone a repository in the standard way (e.g. `git clone https://github.com/LevensteinLab/Lab-Handbook.git`) may not work. This is because GitHub has no way to verify who is making the request from the cluster; you must first authenticate yourself. We can repeat the same process we used to authenticate to the cluster, but for GitHub, which also supports SSH keys. A consolidated sketch of the commands appears after the steps below.
- While logged into the cluster, again run `ssh-keygen`. Press enter to accept the default directory where the keys will be stored.
- Choose a passphrase. You will need to remember this, as it provides access to your private key.
- Navigate to that directory and print the public key. This may look something like `cd /gpfs/radev/home/vv266/.ssh/` followed by `cat id_ed25519.pub`.
- Copy that text and open your GitHub profile.
- Navigate to Settings > SSH and GPG keys > New SSH key. Then paste your public key into the box.
- Return to the HPC and type `ssh -T git@github.com`.
- After you type in your passphrase, you should see a message like `Hi vviggyy! You've successfully authenticated, but GitHub does not provide shell access.`
- You should now be able to clone repositories into the HPC environment.
- When you want to do so, visit the repository website, click the green `<> Code` button, and select "SSH" (NOT the HTTPS tab!).
- Use this URL when cloning, e.g. `git clone git@github.com:LevensteinLab/Lab-Handbook.git`.
- Set up a conda environment like above to run it!
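Putting the steps together, a typical first-time GitHub setup on the cluster looks roughly like this (assuming the default Ed25519 key name; your home directory path will differ):

# Generate a key pair on the cluster (accept the default path, choose a passphrase)
ssh-keygen

# Print the public key, then paste it into GitHub: Settings > SSH and GPG keys > New SSH key
cat ~/.ssh/id_ed25519.pub

# Verify that GitHub accepts the key
ssh -T git@github.com

# Clone using the SSH URL (not HTTPS)
git clone git@github.com:LevensteinLab/Lab-Handbook.git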
Reference:
- Introduction to HPC Clusters (YouTube Video) (1:16 hr workshop video going over SLURM and how to log in)
- Getting Started page
- Documentation Page. Submit help requests or attend drop-in office hours (via Zoom) on Wednesdays at 11am-12pm
- Check out info on all clusters offered by the YCRC. They're named after famous academics.
- Any potential system outages can be checked here.
- Common SLURM commands for interacting with jobs and the scheduler
- YCRC HPC Policies (make sure to read this before requesting an account)