Version:

Creating and plotting custom gathers from Salvus' receiver outputs

Salvus records the wavefields on receiver locations as specified by the user. These wavefields can be readily obtained for every receiver, but Salvus also has built-in functionality to retrieve data as well as plot sorted, filtered and selected subsets of these receivers.
This notebook illustrated how to use custom callback functions to create subsets of the data, and plot them.
Copy
import matplotlib.pyplot as plt
import numpy as np
import os
import xarray as xr
import dataclasses

import salvus.namespace as sn

# General configuration variables
SALVUS_FLOW_SITE_NAME = os.environ.get("SITE_NAME", "local")
PROJECT_DIR = "project"

Creating two synthetic datasets

To show how different receiver gathers can be plotted, we will need some data. In the following few cells we will create two synthetic datasets on a simple box domain, with very dense receiver spacing. The goal of this notebook is to show how to meaningfully collect data from such an array of receivers.

Creating a domain

We start with creating a box domain that is rather oblong.
domain = sn.domain.dim3.BoxDomain(
    x0=0, x1=0.1, y0=0.0, y1=0.1, z0=-0.0, z1=0.025
)

p = sn.Project.from_domain(PROJECT_DIR, domain, load_if_exists=True)

p.viz.nb.domain()
In this domain, let's add a source, and an array of receivers. The receivers aren't added exactly to the surface, but projected onto it from the top down, using SideSetPoint3D and with initial point [x, y, 2.0] and direction z, intersecting with side_set z1.
src_3d = sn.simple_config.source.cartesian.VectorPoint3D(
    x=0.03, y=0.065, z=0.025, fx=0.0, fy=0.0, fz=-1.0
)

# Create a receiver grid with regular spacing
x_locations_recs = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
y_locations_recs = np.linspace(0.01, 0.09, 50)

recs = [
    sn.simple_config.receiver.cartesian.SideSetPoint3D(
        point=[x, y, 2.0],
        direction="z",
        side_set_name="z1",
        station_code=f"TOP-X{int(ix)}-Y{int(iy)}",
        fields=["displacement"],
    )
    for ix, x in enumerate(x_locations_recs)
    for iy, y in enumerate(y_locations_recs)
]

# Package up sources and receivers into an event collection
p.add_to_project(
    sn.EventCollection.from_sources(
        sources=[src_3d],
        receivers=recs,
    )
)
Now, finally, let's add a model, waveform details and a simulation configuration, i.e., everything that is needed to create an appropriate mesh and run a simulation. We use a homogeneous background model, a fixed end time, and a delta function as a source time function.
# Create a homogeneous, isotropic, elastic model
bm = sn.model.background.homogeneous.IsotropicElastic(
    vp=1000.0, rho=1000.0, vs=1000 / (2.0**0.5)
)
mc_background = sn.ModelConfiguration(background_model=bm)

# Set a fixed end time, as predicted by the user
wsc = sn.WaveformSimulationConfiguration(end_time_in_seconds=0.00030)

# Use a maximum frequency of 50 kHz. This strongly influences the meshing
# resolution, and thereby the cost of running a simulation
max_frequency = 30e3

# Create an event configuration by setting the source time function
event_configuration = sn.EventConfiguration(
    waveform_simulation_configuration=wsc,
    wavelet=sn.simple_config.stf.Ricker(center_frequency=0.5 * max_frequency),
)

# Add everything into a simulation configuration and add it to the project
p.add_to_project(
    sn.SimulationConfiguration(
        name="simulation_homogeneous",
        elements_per_wavelength=2.0,
        tensor_order=1,
        max_frequency_in_hertz=max_frequency,
        model_configuration=mc_background,
        event_configuration=event_configuration,
    )
)
This is all we need to create synthetic data. However, it would be interesting later on to compare at least two different datasets. To this extent, let's create another simulation with a different model. This time, let's add some heterogeneity on top of our background model.
The following function creates an xarray that can be passed to sn.model.volume.cartesian.GenericModel.
def get_ellipsoidal_inclusion(
    domain,
    bm,
    vp_perturbation: float = 0.3,
    vs_perturbation: float = -0.2,
    rho_perturbation: float = 0.8,
):
    """
    Returns an xarray limited to the `domain` bounds. It centers a
    ellipsoidal inclusion of vp, vs and rho based on the values of the
    passed `bm`.

    """
    # Define resolution of xarray
    nx, ny, nz = 100, 100, 100

    # Get bounds
    (x0, x1), (y0, y1), (z0, z1) = domain.bounds.sorted_bounds

    # Get domain dimensions
    xl, yl, zl = x1 - x0, y1 - y0, z1 - z0

    # Get midpoint
    xm, ym, zm = (x1 + x0) / 2.0, (y1 + y0) / 2.0, (z1 + z0) / 2.0

    # Create coordinate arrays
    x = np.linspace(x0, y1, nx)
    y = np.linspace(y0, y1, ny)
    z = np.linspace(z0, z1, nz)
    xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")

    # Create parameter background fields
    vp = bm.vp * np.ones_like(xx)
    vs = bm.vs * np.ones_like(xx)
    rho = bm.rho * np.ones_like(xx)

    # Compute ellipsoidal mask
    mask = (
        np.sqrt(
            ((xx - xm) / xl) ** 2
            + ((yy - ym) / yl) ** 2
            + ((zz - zm) / zl) ** 2
        )
        < 0.25
    )

    # Use the mask to perturb the model
    vp[mask] *= 1.0 + vp_perturbation
    vs[mask] *= 1.0 + vs_perturbation
    rho[mask] *= 1.0 + rho_perturbation

    # Convert the numpy arrays to an xarray dataset
    ds = xr.Dataset(
        data_vars={
            "vp": (["x", "y", "z"], vp),
            "vs": (["x", "y", "z"], vs),
            "rho": (["x", "y", "z"], rho),
        },
        coords={"x": x, "y": y, "z": z},
    )

    return ds


volumetric_model = get_ellipsoidal_inclusion(domain, bm)
Other than the fact that Salvus uses xarrays as a possible inputs for a volumetric model, they are also easily visualized. One can create a slice along a coordinate axis with .sel to verify our volume model.
plt.figure()
# Create a horizontal slice (z fixed to midpoint)
volumetric_model["vp"].sel(z=0.0125, method="nearest").plot()
plt.gca().set_aspect("equal")

plt.figure()
# Create a vertical slice (x fixed to midpoint). The transpose is taken simply
# to use the vertical axis where we expect it to.
volumetric_model["vs"].sel(x=0.05, method="nearest").transpose("z", "y").plot()
plt.gca().set_aspect("equal")

plt.figure()
# Create a vertical slice (x fixed to midpoint). The transpose is taken simply
# to use the vertical axis where we expect it to.
volumetric_model["rho"].sel(x=0.05, method="nearest").transpose(
    "z", "y"
).plot()
plt.gca().set_aspect("equal")
That model looks exactly like something interesting. Let's add it to the project, and create a new simulation the uses this model.
# Add the model with inclusion to the project
p.add_to_project(
    sn.model.volume.cartesian.GenericModel(
        name="model_with_inclusion", data=get_ellipsoidal_inclusion(domain, bm)
    )
)

# Create a model configuration to be used by the simulation configuration
mc_3d = sn.ModelConfiguration(
    background_model=None, volume_models="model_with_inclusion"
)

# Recreate the same simulation configuration, but with the new model
p.add_to_project(
    sn.SimulationConfiguration(
        name="simulation_with_inclusion",
        elements_per_wavelength=2.0,
        tensor_order=1,
        max_frequency_in_hertz=max_frequency,
        model_configuration=mc_3d,
        event_configuration=event_configuration,
    )
)
To verify our models, one might like to visualize the simulations. First the event without the inclusion.
p.viz.nb.simulation_setup("simulation_homogeneous", "event_0000")
[2024-11-20 13:20:53,347] INFO: Creating mesh. Hang on.
<salvus.flow.simple_config.simulation.waveform.Waveform object at 0x7d055e7ec290>
Now the event with the inclusion. Externally, these look very similar!
p.viz.nb.simulation_setup("simulation_with_inclusion", "event_0000")
[2024-11-20 13:20:53,921] INFO: Creating mesh. Hang on.
<salvus.flow.simple_config.simulation.waveform.Waveform object at 0x7d0597cb3b10>
However, if we investigate the actual meshes created, we see that Salvus is able to adaptively mesh heterogeneous models.
mesh_A = p.simulations.get_mesh("simulation_homogeneous")

mesh_B = p.simulations.get_mesh("simulation_with_inclusion")

print(
    f"Number of elements:\n\thomogeneous model: {mesh_A.nelem}\n"
    f"\tmodel with inclusion: {mesh_B.nelem}"
)
Number of elements:
	homogeneous model: 243
	model with inclusion: 363
All that is left is to run the simulations to create the synthetic datasets.
p.simulations.launch(
    simulation_configuration="simulation_homogeneous",
    events=p.events.list(),
    site_name=SALVUS_FLOW_SITE_NAME,
    ranks_per_job=2,
)

p.simulations.launch(
    simulation_configuration="simulation_with_inclusion",
    events=p.events.list(),
    site_name=SALVUS_FLOW_SITE_NAME,
    ranks_per_job=2,
)
[2024-11-20 13:20:54,289] INFO: Submitting job ...
Uploading 1 files...

🚀  Submitted job_2411201320386752_f82bab9d2d@local
[2024-11-20 13:20:54,545] INFO: Submitting job ...
Uploading 1 files...

🚀  Submitted job_2411201320552389_071603b87e@local
1
Now check if all is going well with the simulations.
for sc in p.simulations.list():
    print(f"Checking in on simulation `{sc}`")

    p.simulations.query(simulation_configuration=sc, events="event_0000")
Checking in on simulation `simulation_homogeneous`
Checking in on simulation `simulation_with_inclusion`
The previous check can also be done in a blocking fashion, however, you might not see parallel simulations. This cell will block until all simulations are finished, as required by the rest of the notebook.
p.simulations.query(block=True)
True
We can in a very basic way get all the data from this simulation using Salvus Project's waveforms.get. This will return a list of object of type salvus.flow.collections.event_data.EventData. Taking the first of these objects, it can be directly plotted for the component and field of interest.
synthetic_data = p.waveforms.get(
    data_name="simulation_homogeneous", events=p.events.list()[0]
)[0]
synthetic_data_inclusion = p.waveforms.get(
    data_name="simulation_with_inclusion", events="event_0000"
)[0]

synthetic_data.plot(receiver_field="displacement", component="X")
synthetic_data_inclusion.plot(receiver_field="displacement", component="X")
This rather a rather roundabout way to visualize data in Salvus Projects. What is more readily useable is the convenience function shotgather, part of project.viz / project.visualizations. This function allows you to plot multiple datasets together in one go.
p.viz.shotgather(
    data=["simulation_homogeneous", "simulation_with_inclusion"],
    event="event_0000",
    receiver_field="displacement",
    component="X",
)
<Axes: title={'center': 'Event: event_0000 | field: displacement | component: X'}, xlabel='Receiver Index', ylabel='Time [s]'>
Internally, this function uses project.visualizations.custom_gather. In addition to shot gathers, one is able to plot custom gathers with this method, by providing at least one of three callbacks:
  • sort_by -- useful for ordering receivers
  • filter_by -- useful for filtering receivers
  • select_by -- useful for complex operations to select specific receivers
These functions work in the following ways:
sort_by: Callback function called with the receiver objects to
    determine the sorting. Works like the `key` callback for the
    standard Python `sorted()` method.
filter_by: Callback function called with the receiver objects in
    internal Salvus format to determine the filtering. Works like
    the callback for the standard Python `filter()` method. For
    complex filter operations requiring all receivers, defer to
    the `select_by` method.
select_by: Callback function called with a list of tuples in the
    receiver objects in both internal Salvus and ObsPy Stream
    format, as well as the EventData object.
Let's demonstrate their usage in incremental complexity
The sort_by callback functions should taking in internal Salvus receiver objects. The expected return is an object for which sorting can be performed directly. This means, return an object per receiver for which Python knows how to sort them. One type of object that can be internally sorted by Python is float, which is an ideal sort key.
In the below example, the callback function calculates the (squared) distance of a passed receiver to the fixed center of the domain. Returning this distance results in the gather plotting function to sort the receivers in ascending order for this distance. Note that the type of the objects passed to the callback function varies with the receivers.
Note that the custom gather function adds more customizability to the plot, by adding:
interleave: Whether or not to alternatingly plot data. If
    false, traces from all data will be plotted on top of each
    other.
colors: The colors for the individual traces. Must be exactly as
    many colors as data items. If not given, it will attempt to
    choose suitable colors.
alpha: The opacity of all data, as used in Matplotlib. useful
    when setting interleave to false.
plot_using: The configuration of the axis to create or reuse. Will
    default to the standard size as defined in the PlotProperties
    dataclass.
data_selection_configuration: The data selection configuration to
    use.
def sort_by_distance_from_midpoint(receiver_objects):
    # A note, depending on user input, the receivers will have different types,
    # and with that, different attributes. For instance, in case of normal
    # point receivers, there would be no attribute `point` but only
    # `location`.
    X, Y, Z = receiver_objects.point
    return (X - 0.5) ** 2 + (Y - 0.5) ** 2


ax = p.viz.custom_gather(
    data=["simulation_homogeneous", "simulation_with_inclusion"],
    event="event_0000",
    receiver_field="displacement",
    component="X",
    sort_by=sort_by_distance_from_midpoint,
    interleave=True,
    alpha=1.0,
)
A classical example where the filter_by callback is useful is creating line gathers for an event. Selecting receivers that all lie on the line "X == 0.03", can be done in the following way. Note that when sort_by is absent, the order the receivers are added to the simulation is adhered.
def filter_line(receiver_objects):
    X, Y, Z = receiver_objects.point

    if np.isclose(X, 0.03):
        return True
    return False


ax = p.viz.custom_gather(
    data=["simulation_homogeneous", "simulation_with_inclusion"],
    event="event_0000",
    receiver_field="displacement",
    component="X",
    filter_by=filter_line,
    interleave=False,
    alpha=1.0,
)
We can easily compound these two callbacks to get the same plot in reverse ordering. Note that sorting and filtering operations typically will not influence eachother.
def sort_by_Y_reversed(receiver_objects):
    X, Y, Z = receiver_objects.point
    return -Y


ax = p.viz.custom_gather(
    data=["simulation_homogeneous", "simulation_with_inclusion"],
    event="event_0000",
    receiver_field="displacement",
    component="X",
    filter_by=filter_line,
    sort_by=sort_by_Y_reversed,
    interleave=True,
    alpha=1.0,
)
By combining filters, sorting keys and further modifications of the returned axis object, the custom gather interface allows for a high degree of customization.
def filter_line_subset(receiver_objects):
    X, Y, Z = receiver_objects.point

    if np.isclose(X, 0.03) and 0.055 < Y < 0.075:
        return True
    return False


@dataclasses.dataclass
class PlotProperties:
    width = 5
    height = 5
    dpi = 300


ax = p.viz.custom_gather(
    data=["simulation_homogeneous", "simulation_with_inclusion"],
    event="event_0000",
    receiver_field="displacement",
    component="X",
    filter_by=filter_line_subset,
    sort_by=sort_by_Y_reversed,
    interleave=False,
    alpha=0.7,
    plot_using=PlotProperties(),
)

# As the axes is passed back, we can alter it to suit our preferences
# Zoom in on the time axis
ax.set_ylim([2.5e-4, 1.2e-4])

# Add the filter details to the title
new_title = (
    ax.get_title()
    + "\nFilter applied to receivers: X = 0.05, 0.055 < Y < 0.075"
)
ax.set_title(new_title)
Text(0.5, 1.0, 'Event: event_0000 | field: displacement | component: X\nFilter applied to receivers: X = 0.05, 0.055 < Y < 0.075')
The select by callback is a very powerful, but also very complex operator. A general template is the following, which unpacks the tuples, and selects receivers based on a criterium.
def select_by_complex(receiver_object_list_of_tuples, event_data):

    # unpack
    salvus_internal_receivers = [
        tup[0] for tup in receiver_object_list_of_tuples
    ]
    obspy_stream_per_receivers = [
        tup[1] for tup in receiver_object_list_of_tuples
    ]

    # Initialize an empty list to store the selected tuples
    selected_receiver_object_list = []

    # Loop through the tuples and apply your selection criteria
    for i in range(len(receiver_object_list_of_tuples)):
        salvus_receiver = salvus_internal_receivers[i]
        obspy_stream = obspy_stream_per_receivers[i]

        # Apply your selection criteria here
        # For example, you might check some condition based on event_data
        if some_condition(salvus_receiver, event_data):
            selected_receiver_object_list.append(
                (salvus_receiver, obspy_stream)
            )

    return selected_receiver_object_list
The example below selects only receivers with an maximum amplitude of the X-component trace larger than 3e-9 on the receiver field (displacement in this case). Next, the receivers are sorted by first large ampltiude arrival.
Note: Sorting and filtering happen before the selection stage. Any custom additional sorting or filtering within the selection filter will further modify the previous operaitons. In other words, in this example the sorting from sort_by is overwritten.
def select_by_complex(receiver_object_list_of_tuples, event_data):
    # unpack
    salvus_internal_receivers = [
        tup[0] for tup in receiver_object_list_of_tuples
    ]
    obspy_stream_per_receivers = [
        tup[1] for tup in receiver_object_list_of_tuples
    ]

    # Initialize an empty list to store the selected tuples
    selected_receiver_object_list = []
    first_big_amplitudes = []  # custom sorting key

    # Loop through the tuples and apply the selection criteria
    for i in range(len(receiver_object_list_of_tuples)):
        salvus_receiver = salvus_internal_receivers[i]
        obspy_stream = obspy_stream_per_receivers[i]

        # Get maximum amplitude from trace 0
        max_amplitude = obspy_stream.traces[0].max()

        # Sort by first big amplitude
        first_big_amplitude = np.argmax(
            obspy_stream.traces[0].data > max_amplitude * 0.1
        )

        # Apply the selection criterium, and append resulting receivers
        if max_amplitude > 3e-9:
            selected_receiver_object_list.append(
                (salvus_receiver, obspy_stream)
            )

            # Save the sorting key
            first_big_amplitudes.append(first_big_amplitude)

    # Sort tuples with the sorting key
    sorted_selection = [
        tup
        for _, tup in sorted(
            zip(first_big_amplitudes, selected_receiver_object_list),
            key=lambda x: x[0],
        )
    ]

    return sorted_selection


ax = p.viz.custom_gather(
    data=["simulation_with_inclusion"],
    event="event_0000",
    receiver_field="displacement",
    component="X",
    sort_by=sort_by_distance_from_midpoint,
    select_by=select_by_complex,
    interleave=False,
    alpha=0.7,
)
All demonstrated callbacks can also be used in the EventData.get_data_cube method, to directly get the data for your own postprocessing.
t, d = synthetic_data_inclusion.get_data_cube(
    receiver_field="displacement",
    component="X",
    filter_by=filter_line,
    select_by=select_by_complex,
)

print(t.shape, d.shape)
(735,) (37, 735)
PAGE CONTENTS