Continental Scale FWI Tutorial - Part 3: Data selection and misfit definition

This part of the continental-scale full waveform inversion tutorial explains how to set up simulations that computationally replicate the observed data, as well as how to select the windows in which misfits are computed.
import os, pathlib
from frequency_band import FrequencyBand
from salvus import namespace as sn

SIMULATION_TIME_IN_SECONDS = 1200.0
RANKS_PER_JOB = 4
SALVUS_FLOW_SITE_NAME = os.environ.get("SITE_NAME", "local")
PROJECT_DIR = "project_dir_central_europe"

fband_file = pathlib.Path("./frequency_band_70_120.pkl")
fband = FrequencyBand.load(fband_file)
fband
Frequency band: 8.33 mHz - 14.3 mHz, 70.0 s - 120.0 s
FrequencyBand(min_frequency_in_hertz=0.008333333333333333, max_frequency_in_hertz=0.014285714285714285)
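As a reminder, FrequencyBand is a small helper class from the tutorial's own frequency_band.py (created in an earlier part), not part of Salvus itself. Judging from how it is used in this notebook, it behaves roughly like the following sketch (the attribute names match this notebook's usage; the exact naming convention of period_band_name is an assumption):
import dataclasses
import pathlib
import pickle


@dataclasses.dataclass
class FrequencyBandSketch:
    """Rough stand-in for the tutorial's FrequencyBand helper."""

    min_frequency_in_hertz: float
    max_frequency_in_hertz: float

    @property
    def min_period_in_seconds(self) -> float:
        return 1.0 / self.max_frequency_in_hertz

    @property
    def max_period_in_seconds(self) -> float:
        return 1.0 / self.min_frequency_in_hertz

    @property
    def period_band_name(self) -> str:
        # E.g. "70_120_seconds" -- the exact naming convention is an assumption.
        return (
            f"{round(self.min_period_in_seconds)}_"
            f"{round(self.max_period_in_seconds)}_seconds"
        )

    @classmethod
    def load(cls, filename: pathlib.Path) -> "FrequencyBandSketch":
        with open(filename, "rb") as f:
            return pickle.load(f)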
p = sn.Project(path=PROJECT_DIR)

Event Selection

For time reasons and due to limited computational resources we strongly recommend choosing only a small subset of events for this tutorial. Feel free to choose your own events.
p.viz.nb.domain()
SELECTED_EVENTS = [
    "event_CRETE_GREECE_Mag_6.15_2015-04-16-18-07",
    "event_ICELAND_REGION_Mag_5.75_2020-06-20-19-26",
    "event_STRAIT_OF_GIBRALTAR_Mag_6.38_2016-01-25-04-22",
]

# Sanity check.
for event in SELECTED_EVENTS:
    assert p.events.exists(event), event
Now let us configure the simulations for the first few iterations. A few notes on the choices:
  • We choose a tensor order of 2 to better approximate the curvature of the Earth.
  • Minimum mesh period of 70.0 seconds to fit the processed data, defined in part 0.
  • Maximum depth of 1750 km. We are only fitting surface waves here, so there is no need to go deeper. At higher frequencies, when body waves become more prevalent, it makes sense to increase the depth as well.
wsc = sn.WaveformSimulationConfiguration(
    attenuation=True, end_time_in_seconds=SIMULATION_TIME_IN_SECONDS
)
p.add_to_project(
    sn.SimulationConfiguration(
        tensor_order=2,
        name=f"initial_model_{fband.period_band_name}",
        elements_per_wavelength=1.25,
        min_period_in_seconds=fband.min_period_in_seconds,
        max_depth_in_meters=1750e3,
        model_configuration=sn.ModelConfiguration(
            background_model="prem_ani_no_crust",
        ),
        topography_configuration=sn.TopographyConfiguration(
            ["surface_topography"]
        ),
        bathymetry_configuration=sn.BathymetryConfiguration(
            ["ocean_bathymetry"]
        ),
        event_configuration=sn.EventConfiguration(
            wavelet=sn.simple_config.stf.FilteredHeaviside(
                min_frequency_in_hertz=fband.min_frequency_in_hertz,
                max_frequency_in_hertz=fband.max_frequency_in_hertz,
                highpass_corners=3,
                lowpass_corners=3,
                end_time_in_seconds=SIMULATION_TIME_IN_SECONDS,
            ),
            waveform_simulation_configuration=wsc,
        ),
        absorbing_boundaries=sn.AbsorbingBoundaryParameters(
            reference_velocity=3700.0,
            number_of_wavelengths=0.0,
            reference_frequency=fband.max_frequency_in_hertz,
        ),
    ),
    overwrite=True,
)
p.viz.nb.simulation_setup(
    f"initial_model_{fband.period_band_name}", events=SELECTED_EVENTS
)
[2024-11-20 13:43:47,716] INFO: Creating mesh. Hang on.
<salvus.flow.simple_config.simulation.waveform.Waveform object at 0x7d2ba1cca090>
As before we can now launch these simulations.
p.simulations.launch(
    f"initial_model_{fband.period_band_name}",
    SELECTED_EVENTS,
    site_name=SALVUS_FLOW_SITE_NAME,
    ranks_per_job=RANKS_PER_JOB,
    derived_job_config=sn.WavefieldCompression(
        forward_wavefield_sampling_interval=10
    ),
)

p.simulations.query(block=True, ping_interval_in_seconds=5)
[2024-11-20 13:43:52,882] INFO: Submitting job array with 3 jobs ...

True
Let's now compare the simulated with the observed data.
p.viz.nb.waveforms(
    [
        f"PROCESSED_DATA:{fband.period_band_name}",
        f"initial_model_{fband.period_band_name}",
    ],
    receiver_field="displacement",
)
The fit is already largely decent, but it also becomes clear that not everything can be used for the misfit computation.
Even though the method is called full waveform inversion, one rarely uses the full trace. The reasons for this are numerous: noise on the observed data, long and often scattered coda waves, unexplained site effects, and others. Thus windows have to be selected, and it cannot be overstated how important this step is.
We will later on match the phases of synthetic and observed data, and it is thus crucial to select windows in which observed and synthetic data are less than half a period out of sync. Otherwise we run into cycle skipping, which will still minimize the objective function but yield a meaningless result.
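To make the half-period criterion concrete for this 70-120 s band: the minimum period is 70 s, so a phase shift of more than about 35 s between observed and synthetic data risks cycle skipping. The following standalone sketch (plain NumPy with hypothetical sinusoidal traces, not part of Salvus' window picker) illustrates such a check via cross-correlation:
import numpy as np


def max_admissible_shift_in_seconds(min_period_in_seconds: float) -> float:
    # Cycle skipping sets in once the phase shift exceeds half a period.
    return 0.5 * min_period_in_seconds


def time_shift_in_seconds(observed, synthetic, dt: float) -> float:
    # Time shift at the maximum of the full cross-correlation.
    cc = np.correlate(observed, synthetic, mode="full")
    lag_in_samples = np.argmax(cc) - (len(synthetic) - 1)
    return lag_in_samples * dt


# Hypothetical windowed traces sampled at dt seconds; the observed trace
# arrives 20 s later than the synthetic one.
dt = 1.0
t = np.arange(0.0, 600.0, dt)
synthetic = np.sin(2.0 * np.pi * t / 80.0)
observed = np.sin(2.0 * np.pi * (t - 20.0) / 80.0)

shift = time_shift_in_seconds(observed, synthetic, dt)
# 20 s < 35 s, so this window would be safe with respect to cycle skipping.
assert abs(shift) < max_admissible_shift_in_seconds(70.0)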
Manual window picking becomes very tedious for larger amounts of data, so Salvus ships with some helper functions to aid with it.
Windows and weights are part of what is referred to as "data selection" within SalvusProject, so you will see that term pop up.

Step 1: Tuning the Selection Algorithm

Window selection is always a somewhat subjective issue and also depends on the misfit that will be chosen. We will later use a phase misfit, so it is important to only pick windows in which there is no cycle skipping. Salvus' built-in window picking algorithm attempts to be very conservative in that regard.
The first run of the window picking algorithm will only cover a limited selection of receivers and will not save the result into the project, so that it can be easily tuned. The built-in window picking algorithm has plot and verbose arguments; if you use your own window picking function, you will have to implement your own debugging/tuning process. The function additionally returns the picked windows, so further analysis is possible.
p.actions.seismology.pick_windows(
    data_selection_configuration=f"initial_selection_{fband.period_band_name}",
    observed_data_name=f"PROCESSED_DATA:{fband.period_band_name}",
    synthetic_data_name=f"initial_model_{fband.period_band_name}",
    events=SELECTED_EVENTS[1],
    receiver_field="displacement",
    window_taper_width_in_seconds=fband.min_period_in_seconds / 2.0,
    window_picking_function="built-in",
    window_picking_function_kwargs={
        "minimum_period_in_seconds": fband.min_period_in_seconds,
        "maximum_period_in_seconds": fband.max_period_in_seconds,
        "plot": True,
        "verbose": True,
    },
    save_results_in_project=False,
    receiver_name_pattern="IU.KEV.*",
    overwrite=True,
)
[Window selection for IU.KEV.00.BHZ] Correlation Coefficient: 0.8358
[Window selection for IU.KEV.00.BHZ] Absolute Noise Level: 2.817044e-08
[Window selection for IU.KEV.00.BHZ] Relative Noise Level: 6.280877e-03
[Window selection for IU.KEV.00.BHZ] Done, Selected 2 window(s)
[Window selection for IU.KEV.00.BHN] Correlation Coefficient: -0.3835
[Window selection for IU.KEV.00.BHN] Absolute Noise Level: 6.523066e-08
[Window selection for IU.KEV.00.BHN] Relative Noise Level: 5.648955e-02
[Window selection for IU.KEV.00.BHN] Correlation -0.3835 is below threshold of 0.1000
[Window selection for IU.KEV.00.BHE] Correlation Coefficient: 0.9298
[Window selection for IU.KEV.00.BHE] Absolute Noise Level: 1.469819e-07
[Window selection for IU.KEV.00.BHE] Relative Noise Level: 3.746031e-02
[Window selection for IU.KEV.00.BHE] Done, Selected 2 window(s)
[<salvus.flow.collections.event_window_and_weight_set.EventWindowAndWeightSet at 0x7d2aa9d05990>]
Once we have found suitable parameters, we can run the algorithm across the whole data set. For larger data sets one would first run it against individual events and then look at the window statistics, but this tutorial is small enough to directly run it against the full data set.
p.actions.seismology.pick_windows(
    data_selection_configuration=f"initial_selection_{fband.period_band_name}",
    observed_data_name=f"PROCESSED_DATA:{fband.period_band_name}",
    synthetic_data_name=f"initial_model_{fband.period_band_name}",
    events=SELECTED_EVENTS,
    receiver_field="displacement",
    window_taper_width_in_seconds=fband.min_period_in_seconds / 2.0,
    window_picking_function="built-in",
    window_picking_function_kwargs={
        "minimum_period_in_seconds": fband.min_period_in_seconds,
        "maximum_period_in_seconds": fband.max_period_in_seconds,
    },
    overwrite=True,
)
[2024-11-20 13:47:04,896] INFO: Encountered 6 issues during the window picking process. They have been logged to 'project_dir_central_europe/INTERNAL/LOGS/2024-11-20T13-47-04_actions_seismology_window_picking.log'.
[<salvus.flow.collections.event_window_and_weight_set.EventWindowAndWeightSet at 0x7d2aaa6f96d0>,
 <salvus.flow.collections.event_window_and_weight_set.EventWindowAndWeightSet at 0x7d2aaafdeb90>,
 <salvus.flow.collections.event_window_and_weight_set.EventWindowAndWeightSet at 0x7d2aaae71ed0>]
The windows can be visualized by passing the data selection configuration name to the waveform visualization method. This is useful for looking at individual windows but becomes very tedious for larger data sets.
p.viz.nb.waveforms(
    [
        f"PROCESSED_DATA:{fband.period_band_name}",
        f"initial_model_{fband.period_band_name}",
    ],
    receiver_field="displacement",
    data_selection_configuration=f"initial_selection_{fband.period_band_name}",
)
SalvusProject contains a helpful statistical window plot that gives an overview of all windows for a given event. In our opinion this is the most useful way to look at large sets of windows.
p.viz.seismology.windows(
    data_selection_configuration=f"initial_selection_{fband.period_band_name}"
)
This could certainly be tuned a bit more, but it is good enough for now.
Stations in real-world applications tend to cluster strongly, so the sensitivity of the misfit is highly heterogeneous and concentrated in regions with dense station coverage. This is not desired, and one way to counteract it is to downweight stations in regions of dense coverage, thereby giving higher weights to more isolated stations in an effort to even out the total sensitivity.
Salvus is highly flexible in this regard and allows setting weights for each individual receiver, for each component within each receiver, and even for individual windows, so the weighting can be adapted to your application.
Here we will demonstrate how to apply a weighting scheme for seismic receivers based on the geometric station distribution.
p.actions.seismology.add_receiver_weights_to_windows(
    data_selection_configuration=f"initial_selection_{fband.period_band_name}",
    events=SELECTED_EVENTS,
    weighting_chain=[
        {
            "weighting_scheme": "ruan_et_al_2019",
            "function_kwargs": {"ref_distance_condition_fraction": 1.0 / 3.0},
        },
        {
            "weighting_scheme": "mute_near_source_receivers",
            "function_kwargs": {
                "minimum_receiver_distance_in_m": 200e3,
                "maximum_receiver_mute_distance_in_m": 500e3,
                "taper_type": "hanning",
            },
        },
    ],
    normalize=True,
)
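The core idea behind the ruan_et_al_2019 scheme is, roughly, to estimate a local station density with a Gaussian kernel and to weight each receiver by the inverse of that density. The following is a minimal, standalone sketch of that idea (plain NumPy, a toy distance matrix, and a fixed reference distance; the actual Salvus implementation additionally derives the reference distance from ref_distance_condition_fraction and handles the geometry for you):
import numpy as np


def geographical_weights(distances_in_m, ref_distance_in_m):
    # Gaussian-kernel estimate of how many close neighbours each receiver has.
    density = np.exp(-((distances_in_m / ref_distance_in_m) ** 2)).sum(axis=1)
    # Receivers in dense clusters get small weights, isolated ones large weights.
    weights = 1.0 / density
    # Normalize to a mean weight of one.
    return weights / weights.mean()


# Toy example: three clustered receivers and one isolated receiver.
distances = np.array(
    [
        [0.0, 50e3, 60e3, 800e3],
        [50e3, 0.0, 40e3, 780e3],
        [60e3, 40e3, 0.0, 820e3],
        [800e3, 780e3, 820e3, 0.0],
    ]
)
print(geographical_weights(distances, ref_distance_in_m=200e3))
# The isolated fourth receiver receives the largest weight.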
The resulting weights can easily be visualized on a map. Note how higher weights are given to isolated stations. Click on an individual station to look at its receiver weight.
p.viz.seismology.receiver_weights_map(
    data_selection_configuration=f"initial_selection_{fband.period_band_name}"
)
One could fine-tune this a bit more, but it is good enough for now, so let us move on to the misfit definition.
The actual formula with which observed and synthetic data are compared is a pivotal choice in every full waveform inversion. There are many reasonable options and the best choice inevitably depends on the specific data set. Keep in mind that this is what is being minimized in the end, and your physical intuition should enter here: choose a misfit functional that is actually able to extract the information you are after.
Salvus is very flexible in this regard: it ships with a few widely used misfit functions (see the documentation for the full list), but it is also possible to define your own, as detailed in other tutorials.
Here we'll use a time-frequency phase difference misfit. It performs a short-time Fourier transform of the data and estimates the phase mismatch at each frequency. It has quite a number of tuning parameters, which you can find in the function's documentation.
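To build some intuition for what this misfit measures, here is a standalone sketch that computes phase differences between two hypothetical traces in short-time Fourier windows (plain NumPy/SciPy with a simplified, unweighted squared phase difference; an illustration of the concept, not the Salvus implementation):
import numpy as np
from scipy.signal import stft

# Hypothetical observed/synthetic traces sampled at dt seconds; the observed
# trace is delayed by 10 s.
dt = 1.0
t = np.arange(0.0, 1200.0, dt)
synthetic = np.sin(2.0 * np.pi * t / 90.0)
observed = np.sin(2.0 * np.pi * (t - 10.0) / 90.0)

# Short-time Fourier transforms with a segment length of 1.5 times the
# maximum period, mirroring the segment_length_in_seconds choice below.
nperseg = int(1.5 * 120.0 / dt)
f, tau, S_obs = stft(observed, fs=1.0 / dt, nperseg=nperseg)
_, _, S_syn = stft(synthetic, fs=1.0 / dt, nperseg=nperseg)

# Phase difference per time-frequency cell, restricted to the 70-120 s band.
band = (f >= 1.0 / 120.0) & (f <= 1.0 / 70.0)
phase_diff = np.angle(S_obs[band] * np.conj(S_syn[band]))

# A simplified, unweighted phase misfit: the mean squared phase difference.
print(f"Mean squared phase difference: {np.mean(phase_diff**2):.4f} rad^2")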
Similar to the simulation settings, the misfit specification also enters the project as a configuration object.
p.add_to_project(
    sn.MisfitConfiguration(
        name=f"tf_phase_misfit_{fband.period_band_name}",
        observed_data=f"PROCESSED_DATA:{fband.period_band_name}",
        misfit_function="time_frequency_phase_misfit",
        extra_kwargs_misfit_function={
            # The length of each individual FFT segment.
            "segment_length_in_seconds": 1.5 * fband.max_period_in_seconds,
            "frequency_limits": (
                fband.min_frequency_in_hertz,
                fband.max_frequency_in_hertz,
            ),
        },
        receiver_field="displacement",
        data_selection_configuration=f"initial_selection_{fband.period_band_name}",
    )
)
These misfits can be analyzed in a variety of ways. Here we'll use a built-in geographical map visualization.
p.viz.seismology.misfit_map(
    reference_data=f"initial_model_{fband.period_band_name}",
    misfit_configuration=f"tf_phase_misfit_{fband.period_band_name}",
)