A major part of SalvusFlow is its remote job execution framework. Every time a simulation is run with SalvusFlow, that framework is used. In such cases the site at which you want to run a simulation has to be specified.
A site is a set of configuration parameters which describe how to run SalvusCompute on a local or remote machine. Every site must have a unique name.
Most of the Salvus tutorials use a site called "local" with a local site type. We recommend setting this up as well if you want to follow along with the tutorials.
It is usually obvious which site type is suitable for a given machine. Salvus currently supports the following site types. Please contact Mondaic if your cluster's job management system is not listed.
local
: Runs simulations on the same machine as SalvusFlow. This is the
only site type that does not use SSH.
ssh
: For simulations on remote machines/workstations without a job
queuing system. Uses SSH for remote communication.
slurm
: For clusters with the slurm job submission system. Uses SSH for
remote communication.
pbs
: For clusters with the PbsPro job submission system. Uses SSH for
remote communication.
lsf
: For clusters using the IBM Spectrum LSF job management system. Uses
SSH for remote communication.
SSH configuration is sometimes a major hurdle for people who don't regularly use it, thus we have a separate page to help with that.
There are three recommended ways to add a new site:
1. The salvus-cli add-site command, which also handles SalvusCompute if necessary. Start it by executing salvus-cli add-site
2. Use the Salvus Configuration Builder.
3. Have a look at our library of example site configurations, manually copy one of these to the config TOML file, and adjust it to your system.
All sites are defined in a global TOML configuration file. The exact location of this file is system dependent but can be queried with salvus-cli print-config-paths. A convenient way to edit this file is to call
salvus-cli edit-config
which will open the configuration file with your preferred editor (specified with the $EDITOR environment variable).
After the configuration has been edited you have to initialize the site by calling
salvus-cli init-site SITE_NAME
This command runs a series of tests on the configuration to make sure it is correct. If something goes wrong, it provides extensive debugging output to help pinpoint the issue. Once the site has been initialized successfully, it is ready to be used. Any time a site is updated or changed, it has to be initialized again!
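If sites are changed often, the re-initialization step can be scripted. The following is a minimal sketch that simply shells out to the `salvus-cli init-site SITE_NAME` command described above. The helper names `init_site_command` and `init_site` are hypothetical and not part of Salvus, and the sketch assumes `salvus-cli` is available on the `PATH`.

```python
import shutil
import subprocess
from typing import List


def init_site_command(site_name: str) -> List[str]:
    """Build the argument list for `salvus-cli init-site SITE_NAME`."""
    if not site_name:
        raise ValueError("site_name must not be empty")
    return ["salvus-cli", "init-site", site_name]


def init_site(site_name: str) -> None:
    """Run the initialization and raise if the command reports a failure."""
    if shutil.which("salvus-cli") is None:
        raise RuntimeError("salvus-cli was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code; the
    # debugging output of the command is printed to the terminal as usual.
    subprocess.run(init_site_command(site_name), check=True)
```

Calling `init_site("local")` after each edit of the configuration file would then repeat the initialization for the site named "local".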
A few more involved topics like SSH setup, MPI, and dark sites are detailed in the advanced section of the installation manual.
Most parameters, together with the provided comments/documentation, should be self-explanatory. The run_directory and tmp_directory parameters warrant further explanation. Both of them specify directories at the local or remote site that will be managed by SalvusFlow.
run_directory
: Every job run on this site will get its own directory.
SalvusFlow will use that directory to store all inputs and most output
files.
tmp_directory
: Every job that produces a lot of output (e.g. volumetric
data output or checkpoints for adjoint simulations) will get a folder in
this directory, as these special output files are often orders of magnitude
larger than standard output files. Many HPC systems have multiple
file systems: one which stores a limited amount of user data (e.g. where
your "home" directory is located), and one which can efficiently store
large quantities of data, but which comes with no guarantee of file
persistence (e.g. where your "scratch" directory is located). In these
cases we recommend pointing the tmp_directory parameter to a folder on
the latter file system, keeping in mind that such data may be cleared from
time to time by the system's maintenance routines.
Note that both folders must be readable and writable from the compute nodes.
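One way to check this requirement by hand is to create, read back, and remove a small file in each directory. The sketch below does that with the Python standard library. `is_read_writable` is a hypothetical helper, not a Salvus function, and the result is only meaningful when the code is executed on a compute node of the site in question.

```python
import os
import tempfile


def is_read_writable(directory: str) -> bool:
    """Return True if a file can be created, read back, and removed in `directory`."""
    if not os.path.isdir(directory):
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"salvus")
            handle.flush()
            handle.seek(0)
            return handle.read() == b"salvus"
    except OSError:
        # Covers missing permissions, read-only file systems, and full disks.
        return False
```

Running it once for the run_directory and once for the tmp_directory of a site gives a quick indication before initializing the site.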
Scenario A: Single filesystem, keep everything in same folder.
run_directory = "/path/to/salvus_data/run"
tmp_directory = "/path/to/salvus_data/tmp"
Scenario B: Single filesystem. Use actual /tmp directory for the large files. Please keep in mind that the /tmp directory is cleared upon restart on many systems.
run_directory = "/path/to/salvus_data/run"
tmp_directory = "/tmp/salvus_tmp"
Scenario C: One smaller filesystem for most files, another large scratch space for the potentially very large other files.
run_directory = "/path/to/salvus_data/run"
tmp_directory = "/scratch/path/to/salvus_data/tmp"
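To see which of these scenarios a given pair of directories corresponds to, one can compare the device IDs that the operating system reports for them. This is a rough sketch under the assumption that both directories already exist. `on_same_filesystem` is a hypothetical helper, not part of Salvus, and some setups (for example certain network or bind mounts) may not be distinguished reliably this way.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """Compare the device IDs reported by os.stat for two existing paths."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

A result of `True` for the run_directory and tmp_directory corresponds to the single-filesystem scenarios A and B, while `False` suggests a separate file system as in scenario C.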