SalvusFlow Configuration

What is a Site?

A major part of SalvusFlow is its remote job execution framework. Every time a simulation is run with SalvusFlow, that framework is used. In such cases the site at which you want to run a simulation has to be specified.

A site is a set of configuration parameters which describe how to run SalvusCompute on a local or remote machine. Every site must have a unique name.

Most of the Salvus tutorials use a site called "local" with a local site type. We recommend setting this up as well if you want to follow along with the tutorials.

Supported Site Types

It is usually obvious which site type suits a given machine. Salvus currently supports the following site types. Please contact Mondaic if your cluster's job management system is not listed.

  • local: Runs simulations on the same machine as SalvusFlow. This is the only site type that does not use SSH.
  • ssh: For simulations on remote machines or workstations without a job queuing system. Uses SSH for remote communication.
  • slurm: For clusters with the Slurm job submission system. Uses SSH for remote communication.
  • pbs: For clusters with the PBS Pro job submission system. Uses SSH for remote communication.
  • lsf: For clusters using the IBM Spectrum LSF job management system. Uses SSH for remote communication.

SSH configuration can be a major hurdle for people who do not use it regularly, so we have a separate page to help with that.

Adding a New Site

There are three recommended ways to add a new site:

  1. Use the Salvus Configuration Builder.

  2. Launch an interactive wizard that asks a number of questions to initialize a site. It will also offer to download and install SalvusCompute if necessary. Start it by executing

salvus-cli add-site

  3. Have a look at our library of example site configurations, manually copy the relevant ones to the config TOML file, and adjust them to your system.

How to Configure SalvusFlow?

All sites are defined in a global TOML configuration file. The exact location of this file is system dependent but can be queried with salvus-cli print-config-paths. A convenient way to edit this file is to call

salvus-cli edit-config

which will open the configuration file with your preferred editor (specified with the $EDITOR environment variable).

After the configuration has been edited you have to initialize the site by calling

salvus-cli init-site SITE_NAME

This command runs a series of tests on the configuration to make sure it is correct. If something goes wrong, it provides extensive debugging output to pinpoint the issue. Once the site has been initialized successfully, it is ready to be used. Any time a site is updated or changed, it has to be initialized again!
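Because every edited site has to be initialized again, it can be handy to script that step. The following is a minimal sketch, not part of Salvus itself: the helper names init_site_command and reinitialize_sites are hypothetical, and it assumes that salvus-cli is on the PATH and that init-site can run without further interactive input for your sites.

```python
import shutil
import subprocess


def init_site_command(site_name: str) -> list:
    """Build the `salvus-cli init-site SITE_NAME` command as an argument list."""
    if not site_name:
        raise ValueError("site_name must not be empty")
    return ["salvus-cli", "init-site", site_name]


def reinitialize_sites(site_names) -> None:
    """Run init-site for each given site, stopping at the first failure."""
    # Assumption: the salvus-cli executable is discoverable on PATH.
    if shutil.which("salvus-cli") is None:
        raise RuntimeError("salvus-cli was not found on PATH")
    for name in site_names:
        # check=True raises CalledProcessError if initialization fails.
        subprocess.run(init_site_command(name), check=True)
```

For example, reinitialize_sites(["local"]) would re-run the initialization for the "local" site used in the tutorials.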

Advanced Configuration

A few more involved topics like SSH setup, MPI, and dark sites are detailed in the advanced section of the installation manual.

Run & Tmp Directories

Most parameters, together with the provided comments and documentation, should be self-explanatory. The run_directory and tmp_directory parameters warrant further explanation. Both specify directories at the local or remote site that will be managed by SalvusFlow.

  • run_directory: Every job run on this site will get its own directory. SalvusFlow uses that directory to store all input files and most output files.

  • tmp_directory: Every job that produces a lot of output (e.g. volumetric data output or checkpoints for adjoint simulations) will get a folder in this directory, as these special output files are often orders of magnitude larger than standard output files. Many HPC systems have multiple file systems: one that stores a limited amount of user data (e.g. where your "home" directory is located), and one that can efficiently store large quantities of data but comes with no guarantee of file persistence (e.g. where your "scratch" directory is located). In these cases we recommend pointing the tmp_directory parameter to a folder on the latter file system, keeping in mind that such data may be cleared from time to time by the system's maintenance routines.

Note that both folders must be readable and writable from the compute nodes.
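One way to verify this requirement is to write and read back a small probe file in each directory from a compute node. The sketch below uses only the Python standard library; the function names are hypothetical and not part of Salvus, and the check only covers the user and node on which it is run.

```python
import tempfile
from pathlib import Path


def is_read_writable(directory: str) -> bool:
    """Return True if a file can be created and read back in `directory`."""
    path = Path(directory)
    if not path.is_dir():
        return False
    try:
        # The probe file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix="salvus_probe_") as probe:
            probe.write(b"ok")
            probe.flush()
            probe.seek(0)
            return probe.read() == b"ok"
    except OSError:
        return False


def check_site_directories(run_directory: str, tmp_directory: str) -> dict:
    """Check both directories and report the result per parameter name."""
    return {
        "run_directory": is_read_writable(run_directory),
        "tmp_directory": is_read_writable(tmp_directory),
    }
```

Running check_site_directories with the two configured paths inside a job on the target site would show whether either directory is missing or not accessible there.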

Scenario A: Single filesystem, keep everything in same folder.

run_directory = "/path/to/salvus_data/run"
tmp_directory = "/path/to/salvus_data/tmp"

Scenario B: Single filesystem. Use actual /tmp directory for the large files. Please keep in mind that the /tmp directory is cleared upon restart on many systems.

run_directory = "/path/to/salvus_data/run"
tmp_directory = "/tmp/salvus_tmp"

Scenario C: One smaller filesystem for most files, another large scratch space for the potentially very large other files.

run_directory = "/path/to/salvus_data/run"
tmp_directory = "/scratch/path/to/salvus_data/tmp"
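To see which of the scenarios above a given pair of paths corresponds to, two quick checks can help: whether tmp_directory lies below /tmp (Scenario B, cleared on restart on many systems), and whether both directories sit on the same file system. This is a rough sketch with hypothetical helper names, not a Salvus feature. Comparing st_dev device IDs is a heuristic that assumes both directories already exist and may not distinguish every kind of network or bind mount.

```python
import os
from pathlib import PurePosixPath


def clears_on_restart(tmp_directory: str) -> bool:
    """True if `tmp_directory` is /tmp itself or lies below it (Scenario B)."""
    path = PurePosixPath(tmp_directory)
    system_tmp = PurePosixPath("/tmp")
    return path == system_tmp or system_tmp in path.parents


def on_same_filesystem(run_directory: str, tmp_directory: str) -> bool:
    """True if both existing directories report the same device ID."""
    return os.stat(run_directory).st_dev == os.stat(tmp_directory).st_dev
```

With the example paths from the scenarios, clears_on_restart("/tmp/salvus_tmp") returns True, while the tmp_directory values of Scenarios A and C return False.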