Important: Where to Store Data!
This document aims to clear up confusion about where to store your files so that the $HOME directories do not fill up.
Quick Version:
Every user has two areas on the HPC to work in: $HOME and $WORK. When you log into the HPC, you start in your personal $HOME directory. $HOME is the right place for programs, source code, scripts, configuration files, and other small files. It has 10 terabytes, which is more than enough for these types of files, but chances are high that you are using the HPC to analyze or generate very large amounts of data. That is what $WORK is for, and it has 217 terabytes available.
What exactly are $HOME and $WORK, and why do they have dollar signs? The dollar sign marks them as shell variables. The actual value of each variable is unique to each user, so when you use $HOME or $WORK, the system knows exactly where to go for your account.
If you need the actual directory path behind $WORK, for example to navigate to it in your graphical file transfer program, you can print it at the command line on the HPC.
Code: [ekrell@hpcm ~]$ echo $WORK
/work/GenomicSamples/ekrell
The command "echo $WORK" shows that my $WORK directory is located at "/work/GenomicSamples/ekrell".
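To see both locations at once, something like the following should work. It is only a small sketch: the names home_dir and work_dir are made up for illustration, and the "(not set)" fallback is there so the snippet still prints something sensible on a machine where $WORK is not defined.

```shell
# ${VAR:-text} expands to the value of VAR, or to "text" if VAR is unset or empty.
home_dir="${HOME:-(not set)}"
work_dir="${WORK:-(not set)}"

# Show the directory each variable points to for the current user.
echo "HOME expands to: $home_dir"
echo "WORK expands to: $work_dir"
```

On the HPC, the second line should show the same path that "echo $WORK" prints.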
Very often, ensuring that the output of your jobs is placed into $WORK is easy. You just need to add "cd $WORK" after the "#!/bin/bash" line in your SLURM script, so that any files the job writes using relative paths end up under $WORK.
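A minimal sketch of such a SLURM script is shown here. The job name, the example_run subdirectory, and the result.txt file are hypothetical placeholders, and the /tmp fallback is an assumption added only so the sketch can also be tried off the cluster, where $WORK does not exist.

```shell
#!/bin/bash
#SBATCH --job-name=example   # hypothetical job name

# Assumption: fall back to a scratch path only when $WORK is not defined,
# so this sketch can also be run on a machine that is not the HPC.
WORK="${WORK:-/tmp/work_demo}"

# Hypothetical per-run subdirectory so one job's files stay together.
run_dir="$WORK/example_run"
mkdir -p "$run_dir"

# Change into the run directory before computing; stop if that fails.
cd "$run_dir" || exit 1

# Placeholder for the real computation. Relative paths now land under $WORK.
echo "job started in $PWD" > result.txt
```

The "|| exit 1" is a design choice: if the directory cannot be entered, the job stops instead of quietly writing its output wherever it happened to start.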
More details:
There are two main storage areas.
$HOME
Each user has a personal home folder. This is where you automatically end up whenever you log into the system.
This is the best place for storing your programs, scripts, source code, personal notes, and configuration files. This directory is also periodically backed up to keep your data safe.
However, it is limited to 10TB. This is more than enough for storing code, but it may not be able to hold the output of statistical tests, simulations, and other producers of large data sets.
If you want to reference a location in this directory within your programs, use the variable "$HOME". This environment variable is interpreted by the system as the home of whoever is running the code.
To get back to $HOME, use the command
Code: cd
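When a script needs a location inside your home folder, building the path from $HOME keeps the script usable by other people too. The sketch below assumes a hypothetical layout (a scripts folder and a settings file for a project called myproject); neither path comes from the HPC itself.

```shell
# Hypothetical layout: a scripts folder and a settings file kept under $HOME.
script_dir="$HOME/scripts"
settings_file="$HOME/.config/myproject/settings.conf"

# Each user who runs this sees paths inside their own home directory.
echo "Scripts are expected in:  $script_dir"
echo "Settings are expected in: $settings_file"
```

A hard-coded path such as one containing your own user name would only work for you, which is the reason to prefer the variable.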
$WORK
Big data has to go somewhere, and this is that place. With a limit of 217TB, there should be plenty of space for large input or output files.
This directory is not backed up, so transfer any data you want to keep to your own machine when you are finished.
In your scripts, change to this directory before computing.
To get to $WORK, use the command
Code: cd $WORK
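Because $WORK is not backed up, it can help to bundle a finished run into a single archive before transferring it. The following is only a sketch under assumptions: the example_run directory, the summary.txt file, and the archive name are hypothetical, and the /tmp fallback exists only so the snippet can be tried off the cluster.

```shell
# Assumption: fall back to a scratch path only when $WORK is not defined.
WORK="${WORK:-/tmp/work_demo}"

results_dir="$WORK/example_run"       # hypothetical results directory
archive="$WORK/example_run.tar.gz"    # hypothetical archive name

# Stand-in for real results so the sketch has something to archive.
mkdir -p "$results_dir"
echo "sample output" > "$results_dir/summary.txt"

# Bundle the directory into one compressed file.
# -C "$WORK" makes the paths inside the archive relative to $WORK.
tar -czf "$archive" -C "$WORK" example_run

# List the archive contents to confirm what will be transferred.
tar -tzf "$archive"
```

The resulting file can then be copied to your own machine with your graphical file transfer program.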
Note for R users:
These variables, $HOME and $WORK, are part of the shell environment. They are not directly part of R, but R's Sys.getenv function returns the string that a shell variable refers to.
For example, suppose I want to set the working directory to $WORK before running a simulation. I would use the following:
Code: setwd(Sys.getenv("WORK"))