Version:

This documentation is not for the latest stable Salvus version.

This tutorial is presented as Python code running inside a Jupyter Notebook, the recommended way to use Salvus. To run it yourself you can copy/type each individual cell or directly download the full notebook, including all required files.
Copy
%matplotlib inline
# This notebook will use this variable to determine which
# remote site to run on.
import os

SALVUS_FLOW_SITE_NAME = os.environ.get("SITE_NAME", "local")
PROJECT_DIR = "project"

Lamb's problem

An accurate solution to the wave equation is a requirement for a wide variety of seismological research. In this tutorial, we will validate the accuracy of Salvus by comparing numerically calculated seismograms to semi-analytical solutions of Lamb's Problem in 2-D. In addition to giving us confidence in the synthetic data we will use in future tutorials, it also gives us a chance to gently learn some of the key features of the SalvusProject API.

Lamb's problem is concerned with the behavior of the elastic wave equation in the presence of a half-space bounded by a free-surface condition. In our solution we expect both direct arrivals and those reflected from the free-surface, along with a contribution from the 2-D Rayleigh wave. To validate the solutions generated with Salvus, we will compare our results with semi-analytical ones computed using EX2DDIR. We'll consider a half-space bounded at y=2000y=2000, and excite waves using a Ricker source with a center frequency of 15 Hz. This setup keeps compute times very low, while also allowing for a fair amount of wavelengths to propagate within our domain. To get started, let's first import all the Python tools we'll need.

import pathlib
import numpy as np
import salvus.namespace as sn

Initializing a project

Before we initialize our project, we'll first need to initialize the spatial domain to which our project corresponds. In this case we'll be using a simple 2-D box domain. Don't worry, we'll make things a bit more exciting in future tutorials. The box domain can easily be constructed from a set of two dimensional extents as shown in the cell below.

d = sn.domain.dim2.BoxDomain(x0=0.0, x1=2000.0, y0=0.0, y1=1000.0)

For a given project the domain we specify is immutable; its extents and characteristics are used to infer other information regarding our meshes, simulations, and data. With the simple domain definition given above, we're now ready to initialize our project. To do this we can use the Project.from_domain() constructor as show below. This function takes a path (which must not yet exist), and a domain object such as the one we just constructed.

# Uncomment the following line to delete a
# potentially existing project for a fresh start
# !rm -rf proj
if pathlib.Path(PROJECT_DIR).exists():
    print("Opening existing project.")
    p = sn.Project(path=PROJECT_DIR)
else:
    print("Creating new project.")
    p = sn.Project.from_domain(path=PROJECT_DIR, domain=d)
Creating new project.

If the cell executed without any problems, you should now see a folder in your current directory with the name proj. This is where all the relevant information relating to your project will be stored. Just so we can get a hang of the basic structure of a project, let's open up the folder in our file browser. Most operating systems will understand the command below, and will open the project folder in another window. Just uncomment the line for your operating system.

# On Mac OSX
# !open proj

# On Linux:
# !xdg-open proj

With our domain initialized and our project created, we can now go right ahead and start preparing some scientific data. The first thing we'll do with the project is add observed data to it. In this case our observed data corresponds to a semi-analytic solution to Lamb's problem, as described in the introduction. These data are stored in an HDF5 file named reference_data.h5 in the current directory.

Some data formats, such as ASDF or SEGY describe their data with associated headers. We'll see how to add these types of data in a later tutorial, but in this case we are just reading in raw waveform traces with little to no meta information. Because of this we'll need to assist Salvus a little and tell the project to what events this raw data refers to. This information is passed in the form of an EventCollection object which, at is most basic, is a data structure which relates lists of source definitions to lists of receiver definitions. These definitions can be in the form of pressure injections, force vectors, or GCMT moment tensors for sources, as well as pressure, velocity, or strain (etc.) sensors for receivers. In the coordinate system of the reference dataset which we'll add, we've placed a single vector source at the location (x=1000,y=500)(x=1000, y=500). This source can be defined with the help of the simple_config helper as in the cell below.

srcs = sn.simple_config.source.cartesian.VectorPoint2D(
    x=1000.0, y=500.0, fx=0.0, fy=-1.0
)

The data from this source was received at an array of 5 receivers at locations (x={1010,1110,1210,1310,1410},y=800)(x=\{1010, 1110, 1210, 1310, 1410\}, y=800). For these and other simple arrays of receivers, the simple_config helper allows us to define the set in one go.

recs = sn.simple_config.receiver.cartesian.collections.ArrayPoint2D(
    y=800.0, x=np.linspace(1010.0, 1410.0, 5), fields=["displacement"]
)

With our sources and receivers now defined, we can add the combination of them both to our project as an EventCollection object.

p += sn.EventCollection.from_sources(sources=[srcs], receivers=recs)

Note here the syntax we used. An EventCollection, along with several other relevant objects, can be added to a project by simply using the += operator. Once the object is succesfully added to the project it is then "serialized", or saved, within the project directory structure. The power and usefulness of this concept will become apparent in a later tutorial -- for now all you need to know is that the event collection is now officially a part of our project!

Now that we've defined a full "event", we can go ahead and add our "observed" data. We do this by explicitly associating the event with the appropriate data file. Since the event does not have a natural name, as it would in the case of an event gathered from the GCMT catalogue for example, the project has named it for us internally. Events are given numerical names of the form "event_xxxx", which correspond to the order in which they were added. Below we add the reference data to our project with the tag "reference", and associate it with the event we just created, or "event_0000".

p.waveforms.add_external(
    data_name="reference",
    event="event_0000",
    data_filename="./reference_data.h5",
)

Now that the data is added, we can do a quick visualization of its contents. For 2-D box domains we can choose to plot individual events as either a shotgather, or a collection or wiggles, or both! Try experimenting with the list passed to .plot() below to see how the different options look.

p.waveforms.get(data_name="EXTERNAL_DATA:reference", events=["event_0000"])[
    0
].plot(component="X", receiver_field="displacement")

All right, that's enough setup for now. Let's get going with some simulations of our own.

Defining a model

The analytical solution was computed in an unbounded homogeneous isotropic elastic medium with material parameters specified in SI units as (ρ=2200,vp=3000,vs=1847.5)(\rho = 2200, v_p = 3000, v_s = 1847.5). If you recall from the presentation this morning, a complete model definition in Salvus is made up of a combination of a background model, and a (possibly empty) collection of volumetric models. As the analytic solution was computed in a homogeneous medium, we don't need to concern ourselves with (2- or 3-D) volumetric models for now. So, the next step is to define our background model using the salvus model interface. Since no volumetric models are required, we only need the background model to complete our final full model configuration.

bm = sn.model.background.homogeneous.IsotropicElastic(
    rho=2200.0, vp=3000.0, vs=1847.5
)
mc = sn.ModelConfiguration(background_model=bm, volume_models=None)

Defining source parameters

Note that up until now we have not specified any information regarding the frequency content of the data we are planning on simulating, and in fact all the parameters we've specified have been frequency independent. This is deliberate, as it is often the case that information on material parameters are provided independent of frequency. The next step is to add a time-frequency axis to our project, which enters in the form of an EventConfiguration. Here, at a bare minimum, we need to specify what type of source wavelet we would like to model, as well provide some basic information about the temporal extent of our upcoming simulations. The reference data were computed with using a Ricker wavelet with a center frequency of 15Hz15 Hz and, looking at the traces plotted above, we can see that the data runs for a bit more than 0.5 seconds. These parameters are now used to define our EventConfiguration object.

ec = sn.EventConfiguration(
    wavelet=sn.simple_config.stf.Ricker(center_frequency=15.0),
    waveform_simulation_configuration=sn.WaveformSimulationConfiguration(
        end_time_in_seconds=0.6
    ),
)

To get a better sense of what our wavelet looks like in both the time and frequency domain, we can easily plot its characteristics in the cell below.

ec.wavelet.plot()

We quickly see that, while the center frequency of the wavelet was specified to be 15Hz15Hz, there is actually a fair bit of energy that exists at frequencies higher than this. Its important to design our simulations so that they properly resolve all the frequencies we are interested in.

Defining simulation parameters

The final step in defining a simulation is pulling together all the above into a single reproducible SimulationConfiguration. A SimulationConfiguration is a unique identifier that brings together the model, the source wavelet parameterization, and a proxy of the resolution of the simulation together. If you recall from the theoretical presentations earlier today, we are often satisfied with a simulation mesh comprised of 1 4th order spectral-element per simulated wavelength. The question then remains: given a broadband source wavelet, which frequency do we want to mesh for? The wavelet plot above gives us a clue: the vast majority of the energy in the current wavelet is contained at frequencies below 30Hz30Hz. For our first attempt at matching the analytic solution then, we'll require that our mesh be generated using one element per wavelength at a frequency of 30Hz30Hz. As you are probably becoming familiar with by now, we can add the relevant SimulationConfiguration to our project as below.

p += sn.SimulationConfiguration(
    name="simulation_1",
    max_frequency_in_hertz=30.0,
    elements_per_wavelength=1.0,
    model_configuration=mc,
    event_configuration=ec,
)
event = sn.EventCollection.from_sources(sources=[srcs], receivers=None)
w = p.simulations.get_input_files("simulation_1", events=event)
[2020-09-28 17:57:04,753] INFO - salvus.project.meshing: Creating mesh. Hang on.

Visualizing the configuration

So far, regarding our simulation, we have:

  • Defined a homogeneous material model
  • Defined a Ricker wavelet source
  • Set a resolution criterion

In fact, this is all we need to do! Before we actually run the simulation though, it can be helpful to get a visual overview of what is about to happen. Salvus project provides a small convenience function to visualize a SimulationConfiguration directly in the notebook, as below. This function takes a list of events as well, for the purpose of overplotting sources and receivers on the resultant domain. Let's have a look.

w[0][0]
/miniconda/envs/salvus/lib/python3.7/_collections_abc.py:702: MatplotlibDeprecationWarning: The global colormaps dictionary is no longer considered public API.
  return len(self._mapping)
/miniconda/envs/salvus/lib/python3.7/_collections_abc.py:720: MatplotlibDeprecationWarning: The global colormaps dictionary is no longer considered public API.
  yield from self._mapping
When run interactively in a Jupyter Notebook, this output cell contains a widget. Click the button below to load a preview of it. Please note that most interactive functionality does not work without a running Python kernel.
<salvus.flow.simple_config.simulation.Waveform object at 0x7f843b5d8a50>
p.viz.nb.simulation_setup(
    simulation_configuration="simulation_1", events=["event_0000"]
)
When run interactively in a Jupyter Notebook, this output cell contains a widget. Click the button below to load a preview of it. Please note that most interactive functionality does not work without a running Python kernel.
<salvus.flow.simple_config.simulation.Waveform object at 0x7f843b5daa10>

Feel free to experiement with the dropdown menus and buttons. This visualization can really help debug obvious issues. At this point those of you familiar with older versions of Salvus might be wondering: where did the mesh from? In SalvusProject the complexity of mesh generation is moved into the background, and is handled internally via a reference to the SimulationConfiguration object. While the benefits of this approach are small for small domains and homogeneous models, they will become much later when we consider 3-D models and domains with topography.

Running the solver

With everything ready to go, it's now time to run our first simulation! The launch_simulations command below takes a few arguments worth describing:

  • site_name: This is an identifier which tells Flow whether you're running on your local machine, some remote cluster, or perhaps the old chess computer in your grandfather's basement. As long as Salvus has been set up correctly on the specified site all data transfers to / from the local or remote machine will happen automatically. Additionally, if a job management system is present on the remote site Flow will monitor the job queue.
  • ranks_per_job: This is the number of MPI ranks the job will run on, and can range from 1 to whatever your license will allow.
  • events: A list of events for which to run simulations for.
  • simulation_configuration: The configuration for which to run simulations for.
p.simulations.launch(
    ranks_per_job=2,
    site_name=SALVUS_FLOW_SITE_NAME,
    events=p.events.list(),
    simulation_configuration="simulation_1",
    max_block_in_seconds=60.0,
)
[2020-09-28 17:57:06,110] INFO - salvus.project.components.simulation_component: Submitting job ...
Uploading 1 files...

🚀  Submitted [email protected]local
After 0.1 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
1

And that's it! The simulations are off and running. SalvusFlow will take care of abstracting the machine archcitecture, and SalvusProject will take care of saving all the output data into the correct location, copying it from any remote machines as necessary. We can get the current status of the simulations by calling query_simulations() as below.

p.simulations.query()
[2020-09-28 17:57:07,378] INFO - salvus.project.components.simulation_component: Retrieving output for 1 simulation(s) from site.
True

Since the simulations are so small, they should not take more than a few seconds to run regardless of the machine. once they are done, we can simply call the project.nb_compare() function to compare the computed data to a reference dataset of our choosing.

p.viz.nb.waveforms(
    ["EXTERNAL_DATA:reference", "simulation_1"], receiver_field="displacement"
)
When run interactively in a Jupyter Notebook, this output cell contains a widget. Click the button below to load a preview of it. Please note that most interactive functionality does not work without a running Python kernel.

Hmm. While the basic shape and behaviour of the waveforms looks good, there are certainly some discrepancies between what we've computed and the reference solution. Now our goal is to see if we can get the signals to match perfectly within the time which they overlap.

Challenge 1: Why doesn't it match?

The most obvious issue with the signals above is that they differ greatly at later times. There's quite an obvious issue here: the reference solution was computed assuming an infinite domain, but the domain we defined is finite in extent. In fact this is the main issue: we need to add absorbing boundaries.

To preserve the stability of the wavefield solution in the presence of complex or anisotropic media, Salvus employs a two-stage approach to absorbing boundaries. First, we apply absorbing boundary conditions at the edge of the mesh as outlined here. These conditions provide good absorbing characteristics for wave impacting the boundary at close to normal incidence, and are sufficient for most cases. If a more substantial absorbing profile is desired, one can also pad the simulated domain with a damping layer. This approach follows that given in this paper. Adding damping layers are advantageous in that they can almost completely cancel any boundary reflections, but do require one to enlarge the computational domain and therefore increase the cost of the resultant simulations. We have found that damping layers provide a good quality / performance tradeoff when 3.5 or more wavelengths are present in the absorbing layer.

In previous versions of Salvus absorbing boundary attachment was unfortunately a manual and tedious process. Fortunately, we now provide an interface to automatically extend the domain in a more user-friendly manner. To activate this feature, we first need to set a few parameters to tell the simulation that we do indeed want a layer of extended absorbing boundaries.

abp = sn.AbsorbingBoundaryParameters(
    reference_velocity=3000.0,
    number_of_wavelengths=3.5,
    reference_frequency=15.0,
)

And now we just simply add a new simulation configuration with the boundaries to our project.

p += sn.SimulationConfiguration(
    name="simulation_2",
    max_frequency_in_hertz=30.0,
    elements_per_wavelength=1.0,
    model_configuration=mc,
    event_configuration=ec,
    absorbing_boundaries=abp,
)

Visualizing this new configuration, we can see that the mesh has been padded with absorbing boundaries on all sides but the free surface.

p.visualizations.nb.simulation_setup(
    simulation_configuration="simulation_2",
    events=p.events.list()[0],
)
[2020-09-28 17:57:07,777] INFO - salvus.project.meshing: Creating mesh. Hang on.
/miniconda/envs/salvus/lib/python3.7/_collections_abc.py:702: MatplotlibDeprecationWarning: The global colormaps dictionary is no longer considered public API.
  return len(self._mapping)
/miniconda/envs/salvus/lib/python3.7/_collections_abc.py:720: MatplotlibDeprecationWarning: The global colormaps dictionary is no longer considered public API.
  yield from self._mapping
When run interactively in a Jupyter Notebook, this output cell contains a widget. Click the button below to load a preview of it. Please note that most interactive functionality does not work without a running Python kernel.
<salvus.flow.simple_config.simulation.Waveform object at 0x7f84579cc290>

Now, let's take another look at the simulated data and see how we've done.

p.simulations.launch(
    ranks_per_job=2,
    site_name=SALVUS_FLOW_SITE_NAME,
    events=p.events.list(),
    simulation_configuration="simulation_2",
    max_block_in_seconds=60.0,
)
[2020-09-28 17:57:08,160] INFO - salvus.project.components.simulation_component: Submitting job ...
Uploading 1 files...

🚀  Submitted [email protected]local
After 0.1 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.0 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.3 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.6 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.0 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.3 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.6 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.9 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 3.2 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 3.5 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 3.8 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 4.1 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 4.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 4.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 5.0 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 5.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 5.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 6.0 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
1
p.simulations.query()
p.viz.nb.waveforms(
    ["EXTERNAL_DATA:reference", "simulation_2"], receiver_field="displacement"
)
[2020-09-28 17:57:14,676] INFO - salvus.project.components.simulation_component: Retrieving output for 1 simulation(s) from site.
When run interactively in a Jupyter Notebook, this output cell contains a widget. Click the button below to load a preview of it. Please note that most interactive functionality does not work without a running Python kernel.

Looking a lot better :).

Challenge 2: Why doesn't it still match?

The purists in the audience might be concerned that, while the signals now look a lot closer, there are still some minor differences. To remedy this, let's take another look at the spectrum of our source.

ec.wavelet.plot()

Indeed while there is almost no energy above 30Hz30Hz, there is still at least some. This is where the final differences are coming from. In the cell below, create a new simulation configuration which matches the data even better. Feel free to play with either the max_frequency_in_hertz parameter, or the elements_per_wavelength parameter, noting the increase or decrease in simulation time as you go.

p += sn.SimulationConfiguration(
    name="simulation_3",
    max_frequency_in_hertz=40.0,
    elements_per_wavelength=1.0,
    model_configuration=mc,
    event_configuration=ec,
    absorbing_boundaries=abp,
)
p.simulations.launch(
    ranks_per_job=2,
    site_name=SALVUS_FLOW_SITE_NAME,
    events=p.events.list(),
    simulation_configuration="simulation_3",
    max_block_in_seconds=60.0,
)
[2020-09-28 17:57:15,364] INFO - salvus.project.meshing: Creating mesh. Hang on.
[2020-09-28 17:57:15,412] INFO - salvus.project.components.simulation_component: Submitting job ...
Uploading 1 files...

🚀  Submitted [email protected]local
After 0.1 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.9 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.2 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.5 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
1
p.simulations.query()
p.viz.nb.waveforms(
    ["EXTERNAL_DATA:reference", "simulation_2", "simulation_3"],
    receiver_field="displacement",
)
[2020-09-28 17:57:17,354] INFO - salvus.project.components.simulation_component: Retrieving output for 1 simulation(s) from site.
When run interactively in a Jupyter Notebook, this output cell contains a widget. Click the button below to load a preview of it. Please note that most interactive functionality does not work without a running Python kernel.

In the end, the accuracy of the simulation is always a tradeof between the time to completion and the error in the resultant solution. In a real full-waveform inversion, where there is significant noise present in the data, you may be ok with a 5% error in your waveforms if it results in an appreciable speedup in each simulation. Such an error level will likely be below the noise level of your data. If you are looking to do a true convergence test, it may be worth it to increase the elements_per_wavelength parameter and pay the extra simulation cost. As with many things, it is a fine balance between science, experience, and art.

w = ec.waveform_simulation_configuration.copy()
w.output.volume_data.filename = "volume.h5"
w.output.volume_data.format = "hdf5"
w.output.volume_data.fields = ["displacement"]
w.output.volume_data.sampling_interval_in_time_steps = 10


ec2 = sn.EventConfiguration(
    wavelet=sn.simple_config.stf.Ricker(center_frequency=15.0),
    waveform_simulation_configuration=w,
)


p += sn.SimulationConfiguration(
    name="simulation_4",
    max_frequency_in_hertz=40.0,
    elements_per_wavelength=1.0,
    model_configuration=mc,
    event_configuration=ec2,
    absorbing_boundaries=abp,
)

p.simulations.launch(
    ranks_per_job=2,
    site_name=SALVUS_FLOW_SITE_NAME,
    events=p.events.list(),
    simulation_configuration="simulation_4",
    max_block_in_seconds=60.0,
)
[2020-09-28 17:57:17,813] INFO - salvus.project.components.simulation_component: Submitting job ...
Uploading 1 files...

🚀  Submitted [email protected]local
After 0.1 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 0.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.0 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.3 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 1.6 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.0 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.3 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.6 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 2.9 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 3.1 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 3.4 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
After 3.7 seconds: running. Sleeping for 0.2500 seconds before checking job status again.
1
p.simulations.query()
folder = str(
    p.simulations.get_simulation_output_directory(
        simulation_configuration="simulation_4", event="event_0000"
    )
)
[2020-09-28 17:57:22,059] INFO - salvus.project.components.simulation_component: Retrieving output for 1 simulation(s) from site.
PAGE CONTENTS