OpenSHA

Open-Source Seismic Hazard Analysis (OpenSHA)

Legacy Fault System Solution

NOTE: This is the format used in UCERF3 and is currently being redesigned to be more user friendly. When this format was developed, disk space was at a premium and Java did not yet support >2 GB zip files, so the binary formats were necessary in order to be able to store the entire UCERF3 model in a single compound solution zip file. We understand that these files are complicated, and hope that the forthcoming newer file format is easier for the user community to use. See this page for a draft of the new format. If you need to parse UCERF3 fault system solutions, consider waiting until the new and easier format is available or contact us for more information.

UCERF3 (and potentially other forecast) data is stored in a zip file format; this page describes how to parse these zip files. If you are using Java, a parser is already available in OpenSHA via the scratch.UCERF3.utils.FaultSystemIO class.

Zip File Contents

The following files constitute a Fault System Solution. See File Formats below for descriptions and implementation details for each format.

| File Name | File Format | Optional? | Description |
| --- | --- | --- | --- |
| fault_sections.xml | XML | no | This XML file describes each sub section in the Fault System. The sub section indexes defined here are referenced by the rup_sections.bin file when defining ruptures. |
| grid_sources.xml | XML | yes | This XML file, if present, gives gridded seismicity MFDs at each node in the region that this solution covers. |
| grid_sources_reg.xml | XML | yes | This XML file, if present, gives the region associated with gridded seismicity MFDs. It is used in conjunction with grid_sources.bin as a more space efficient alternative to grid_sources.xml. |
| grid_sources.bin | Double array list binary | yes | This binary file, if present, gives gridded seismicity MFDs at each node in the region described in grid_sources_reg.xml. |
| info.txt | ASCII | yes | This text file, if present, contains metadata describing the solution. |
| mags.bin | Double array binary | no | This file gives magnitudes for each rupture. It contains one double value for each rupture index, in order. |
| rakes.bin | Double array binary | no | This file gives average rakes for each rupture. It contains one double value for each rupture index, in order. |
| rates.bin | Double array binary | no | This file gives annualized rates for each rupture. It contains one double value for each rupture index, in order. |
| rup_areas.bin | Double array binary | no | This file gives areas for each rupture in SI units (square meters). It contains one double value for each rupture index, in order. |
| rup_lengths.bin | Double array binary | yes | This file, if present, gives lengths for each rupture in SI units (meters). It contains one double value for each rupture index, in order. |
| rup_mfds.bin | MFD Binary | yes | This file, if present, gives magnitude frequency distributions for each rupture. It contains one function (consisting of an x value and y value data array) for each rupture index, in order. |
| rup_sections.bin | Integer array list binary | no | This lists the sub sections involved in each rupture. It consists of numRuptures arrays, each of which lists the sub section indexes (as defined in fault_sections.xml) for that rupture. |
| sect_areas.bin | Double array binary | yes | This file, if present, gives areas for each fault sub section in SI units (square meters). It contains one double value for each sub section index, in order. |
| sect_slips.bin | Double array binary | yes | This file, if present, gives slip rates after any aseismic and subseismogenic reductions for each fault sub section in SI units (meters per year). It contains one double value for each sub section index, in order. |
| sect_slips_std_dev.bin | Double array binary | yes | This file, if present, gives standard deviations of slip rates after any aseismic and subseismogenic reductions for each fault sub section in SI units (meters per year). It contains one double value for each sub section index, in order. |
| sub_seismo_on_fault_mfds.bin | MFD Binary | yes | This file, if present, gives subseismogenic magnitude frequency distributions for each sub section. It contains one function (consisting of an x value and y value data array) for each sub section index, in order. |
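
As a first step when loading a solution, it can help to confirm that the required entries from the table above are present. The following is a minimal sketch in Python, assuming the files sit at the root of the zip archive under exactly the names listed; the `missing_required` helper is hypothetical and not part of OpenSHA.

```python
import io
import zipfile

# Entries marked "no" in the Optional? column of the table above.
REQUIRED_FILES = [
    "fault_sections.xml",
    "mags.bin",
    "rakes.bin",
    "rates.bin",
    "rup_areas.bin",
    "rup_sections.bin",
]


def missing_required(zip_source):
    """Return the required file names that are absent from the archive.

    Assumption: entries are stored at the zip root with the exact names above.
    """
    with zipfile.ZipFile(zip_source) as zf:
        present = set(zf.namelist())
    return [name for name in REQUIRED_FILES if name not in present]


# Build a tiny in-memory archive for demonstration; it is deliberately incomplete.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mags.bin", b"")
    zf.writestr("rates.bin", b"")
buffer.seek(0)
print(missing_required(buffer))
```

A path to a real solution zip file can be passed in place of the in-memory buffer.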

The following files are not required and are not formally documented, but may be present in zip files generated by the UCERF3 inversion. They give metadata about the associated logic tree branch and other inversion metadata.

| File Name | File Format | Optional? | Description |
| --- | --- | --- | --- |
| close_sections.bin | Integer array list binary | yes | This file lists, for each sub section (in order), all of the other sub sections that the given sub section connects with in the Fault System. |
| cluster_rups.bin | Integer array list binary | yes | Some fault systems are separated into clusters of interconnected faults. This file lists, for each cluster, all of the rupture indexes which are part of the given cluster. |
| cluster_sects.bin | Integer array list binary | yes | Some fault systems are separated into clusters of interconnected faults. This file lists, for each cluster, all of the sub section indexes which are part of the given cluster. |
| inv_rup_set_metadata.xml | XML | yes | This file gives metadata for the logic tree branch and rupture filtration criterion (laugh test filter) used to generate this solution. |
| inv_sol_metadata.xml | XML | yes | This file gives metadata for the UCERF3 inversion including equation set weights and final simulated annealing energies. |
| rup_avg_slips.bin | Double array binary | yes | This file, if present, gives the average slip for each rupture in SI units (meters). It contains one double value for each rupture index, in order. |

File Formats Used

You must write a parser for each of the following file formats in order to load in a fault system solution.

Double array binary file

These files contain an array of double values in a binary format: a series of big-endian, 8-byte (64-bit) double precision floating point numbers, with nothing else in the file. The size of the file in bytes is therefore equal to the number of values x 8.
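
A minimal Python sketch of a reader for this format, assuming each value is an 8-byte (64-bit) big-endian IEEE 754 double and that the file has no header; the function name `read_double_array` is hypothetical.

```python
import struct


def read_double_array(data):
    """Decode the full contents of a double array binary file.

    Assumption: the file is nothing but big-endian 8-byte doubles.
    """
    if len(data) % 8 != 0:
        raise ValueError("file size is not a multiple of 8 bytes")
    count = len(data) // 8
    return list(struct.unpack(f">{count}d", data))


# Round-trip three made-up magnitudes to show the layout.
sample = struct.pack(">3d", 6.5, 7.1, 7.8)
print(len(sample))                # 24 bytes: 3 values x 8 bytes
print(read_double_array(sample))
```

For a real file such as mags.bin, pass the bytes read from the zip entry to `read_double_array`.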

Integer array list binary file

These files contain a list of integer arrays in a binary format. All file entries are 4-byte (32-bit) big-endian integer values. The first value in the file is the number of integer arrays stored in the file. Then each array is written to the file by first writing the number of elements in the array, then each value in the array. For example, consider the following 3 arrays:

[ 0 6 2 4 ]
[ 3 6 2 ]
[ 3 7 9 1 4 7 ]

This would be written as (all stored as big-endian 4-byte integers):

3 4 0 6 2 4 3 3 6 2 6 3 7 9 1 4 7

Reading this example from left to right: the leading 3 is the number of arrays, and each array's data is preceded by its size (4, then 3, then 6).
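
The layout above can be decoded with a short loop. This is a minimal Python sketch, assuming signed 4-byte big-endian integers throughout (as a Java writer would produce); `read_int_array_list` is a hypothetical name.

```python
import struct


def read_int_array_list(data):
    """Decode an integer array list binary file into a list of lists."""
    offset = 0
    (num_arrays,) = struct.unpack_from(">i", data, offset)
    offset += 4
    arrays = []
    for _ in range(num_arrays):
        # Each array starts with its element count.
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        arrays.append(list(struct.unpack_from(f">{size}i", data, offset)))
        offset += 4 * size
    return arrays


# The 17 integers from the worked example above.
encoded = struct.pack(">17i", 3, 4, 0, 6, 2, 4, 3, 3, 6, 2, 6, 3, 7, 9, 1, 4, 7)
print(read_int_array_list(encoded))
# [[0, 6, 2, 4], [3, 6, 2], [3, 7, 9, 1, 4, 7]]
```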

Fault section data XML file

All fault subsections are stored in a single XML file, with one element per subsection. An example is shown below.

<?xml version="1.0" encoding="UTF-8"?>

<OpenSHA>
  <FaultSectionPrefDataList>
    <!-- Each fault subsection is listed in its own element to ensure correct ordering.
Each subsection element name will be i0, i1, i2, ... i[N-1] for N subsections. Note that
some fields may be NaN for certain solutions. -->
    <i0 sectionId="0" sectionName="Los Alamos extension, Subsection 0" aveLongTermSlipRate="NaN"
slipRateStdDev="0.0" aveDip="30.0" aveRake="NaN" aveUpperDepth="0.0" aveLowerDepth="12.0"
aseismicSlipFactor="0.1" couplingCoeff="1.0" dipDirection="205.15857"
parentSectionName="Los Alamos extension" parentSectionId="780" connector="false">
      <!-- This defines a polygon representing the geologic fault zone. Note that UCERF3 uses
a combination of this polygon and a buffered fault trace as described in UCERF3 Appendix O. -->
      <ZonePolygon name="Unnamed Region">
        <LocationList>
          <Location Latitude="34.562639" Longitude="-119.867803" Depth="0.0"/>
          <Location Latitude="34.54856" Longitude="-119.878996" Depth="0.0"/>
          <Location Latitude="34.586297" Longitude="-119.926461" Depth="0.0"/>
          <Location Latitude="34.630493" Longitude="-120.091308" Depth="0.0"/>
          <Location Latitude="34.647866" Longitude="-120.086651" Depth="0.0"/>
          <Location Latitude="34.60288" Longitude="-119.919064" Depth="0.0"/>
        </LocationList>
      </ZonePolygon>
      <!-- This is the actual fault trace. Note that order is important and should follow the
Aki-Richards definition -->
      <FaultTrace name="Los Alamos extension 1">
        <Location Latitude="34.63918" Longitude="-120.08897999999999" Depth="0.0"/>
        <Location Latitude="34.608190666873426" Longitude="-119.97328407378366" Depth="0.0"/>
      </FaultTrace>
    </i0>
    <i1 sectionId="1" ...>
        ...
    </i1>
    ...
    <i[N-1] sectionId="[N-1]" ...>
        ...
    </i[N-1]>
  </FaultSectionPrefDataList>
</OpenSHA>
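
A minimal Python sketch of reading this structure with the standard library, assuming the element and attribute names shown in the example above. The embedded sample is a trimmed copy of that example (the elided `i1 ... i[N-1]` entries are omitted), and `parse_fault_sections` is a hypothetical helper that extracts only a few of the attributes.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OpenSHA>
  <FaultSectionPrefDataList>
    <i0 sectionId="0" sectionName="Los Alamos extension, Subsection 0"
        aveLongTermSlipRate="NaN" aveDip="30.0" parentSectionId="780">
      <FaultTrace name="Los Alamos extension 1">
        <Location Latitude="34.63918" Longitude="-120.08897999999999" Depth="0.0"/>
        <Location Latitude="34.608190666873426" Longitude="-119.97328407378366" Depth="0.0"/>
      </FaultTrace>
    </i0>
  </FaultSectionPrefDataList>
</OpenSHA>"""


def parse_fault_sections(xml_bytes):
    """Return one dict per subsection, ordered by the index in the element name."""
    root = ET.fromstring(xml_bytes)
    sections = []
    for elem in root.find("FaultSectionPrefDataList"):
        trace = [
            (float(loc.get("Latitude")), float(loc.get("Longitude")), float(loc.get("Depth")))
            for loc in elem.find("FaultTrace").findall("Location")
        ]
        sections.append({
            "index": int(elem.tag[1:]),  # element names are i0, i1, ...
            "id": int(elem.get("sectionId")),
            "name": elem.get("sectionName"),
            "slip_rate": float(elem.get("aveLongTermSlipRate")),  # may be NaN
            "trace": trace,
        })
    sections.sort(key=lambda s: s["index"])
    return sections


sections = parse_fault_sections(SAMPLE)
print(sections[0]["name"], len(sections[0]["trace"]))
```

Note that `float("NaN")` parses cleanly in Python, so fields that are NaN in some solutions need no special handling at read time.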

Grid Sources XML file

Some solutions will contain gridded seismicity magnitude frequency distributions. An example XML file is shown below, after the following two notes.

NOTE 1: UCERF3 uses the RELM region evenly discretized at 0.1 degrees for gridded seismicity. Due to the complexities involved in reproducing our gridding exactly, a file is posted here with grid node indexes and locations for this region: http://opensha.usc.edu/ftp/kmilner/ucerf3/relm_gridded_region.csv

NOTE 2: This file is now deprecated as it is very large and does not compress well. Its replacement, grid_sources_reg.xml, contains only the evenlyGriddedGeographicRegion element shown below; the MFDs themselves are stored in the binary grid_sources.bin file.

<?xml version="1.0" encoding="UTF-8"?>

<OpenSHA>
  <!-- this explains the evenly gridded region on which gridded seismicity is distributed. See
class javadocs here for more information: http://opensha.usc.edu/JavaDocs/org/opensha/commons/geo/GriddedRegion.html -->
  <evenlyGriddedGeographicRegion spacing="0.1" numPoints="7636">
    <anchor>
      <Location Latitude="31.5" Longitude="-125.4" Depth="0.0"/>
    </anchor>
    <Region name="RELM_TESTING Region">
      <LocationList>
        <Location Latitude="43.0" Longitude="-125.2" Depth="0.0"/>
        <Location Latitude="43.0" Longitude="-119.0" Depth="0.0"/>
        <Location Latitude="39.4" Longitude="-119.0" Depth="0.0"/>
        <Location Latitude="35.7" Longitude="-114.00000000000001" Depth="0.0"/>
        <Location Latitude="34.3" Longitude="-113.1" Depth="0.0"/>
        <Location Latitude="32.9" Longitude="-113.5" Depth="0.0"/>
        <Location Latitude="32.2" Longitude="-113.6" Depth="0.0"/>
        <Location Latitude="31.7" Longitude="-114.5" Depth="0.0"/>
        <Location Latitude="31.5" Longitude="-117.1" Depth="0.0"/>
        <Location Latitude="31.900000000000002" Longitude="-117.90000000000002" Depth="0.0"/>
        <Location Latitude="32.8" Longitude="-118.40000000000002" Depth="0.0"/>
        <Location Latitude="33.7" Longitude="-121.0" Depth="0.0"/>
        <Location Latitude="34.2" Longitude="-121.6" Depth="0.0"/>
        <Location Latitude="37.7" Longitude="-123.80000000000001" Depth="0.0"/>
        <Location Latitude="40.2" Longitude="-125.4" Depth="0.0"/>
        <Location Latitude="40.5" Longitude="-125.4" Depth="0.0"/>
      </LocationList>
    </Region>
  </evenlyGriddedGeographicRegion>
  <!-- this gives magnitude frequency distributions for each grid node. Each node can have an Unassociated MFD (seismicity at
that node that is not associated with a mapped fault), and a sub seismogenic MFD if a fault crosses the node. Num is the total
number of grid nodes.-->
  <MFDNodeList num="7636">
    <MFDNode index="0">
      <!-- this describes the discretization of the MFD function -->
      <UnassociatedFD name="Incremental Mag Freq Dist" tolerance="1.0000000000000001E-7" num="90" minX="0.05" maxX="8.950000000000001" delta="0.1">
        <Points>
          <!-- points on the MFD function -->
          <Point x="0.05" y="9.218866335939037"/>
          <Point x="0.15000000000000002" y="7.322805822785568"/>
          <Point x="0.25" y="5.816711422441951"/>
          <Point x="0.35000000000000003" y="4.620378116088878"/>
          <Point x="0.45" y="3.6700967927115817"/>
          ...
        </Points>
      </UnassociatedFD>
    </MFDNode>
    <MFDNode index="1">
      <!-- only some nodes have sub seismogenic MFDs -->
      <SubSeisMFD name="Incremental Mag Freq Dist" tolerance="1.0000000000000001E-7" num="90" minX="0.05" maxX="8.950000000000001" delta="0.1">
        <Points>
          <Point x="0.05" y="7.173882325337919"/>
          <Point x="0.15000000000000002" y="5.69841728360539"/>
          ...
        </Points>
      </SubSeisMFD>
      <UnassociatedFD name="Incremental Mag Freq Dist" tolerance="1.0000000000000001E-7" num="90" minX="0.05" maxX="8.950000000000001" delta="0.1">
        <Points>
          <Point x="0.05" y="9.51974524882418"/>
          <Point x="0.15000000000000002" y="7.561802438523384"/>
          <Point x="0.25" y="6.006553182326047"/>
          ...
        </Points>
      </UnassociatedFD>
    </MFDNode>
    ...
  </MFDNodeList>
</OpenSHA>
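
A minimal Python sketch of extracting the per-node MFDs from this file, assuming the element names shown in the example above (`MFDNodeList`, `MFDNode`, `UnassociatedFD`, `SubSeisMFD`, `Point`). The sample is a shortened, made-up two-node file reusing a few points from the example, and `parse_grid_sources` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<OpenSHA>
  <MFDNodeList num="2">
    <MFDNode index="0">
      <UnassociatedFD name="Incremental Mag Freq Dist" num="2" minX="0.05" delta="0.1">
        <Points>
          <Point x="0.05" y="9.218866335939037"/>
          <Point x="0.15000000000000002" y="7.322805822785568"/>
        </Points>
      </UnassociatedFD>
    </MFDNode>
    <MFDNode index="1">
      <SubSeisMFD name="Incremental Mag Freq Dist" num="1" minX="0.05" delta="0.1">
        <Points>
          <Point x="0.05" y="7.173882325337919"/>
        </Points>
      </SubSeisMFD>
      <UnassociatedFD name="Incremental Mag Freq Dist" num="1" minX="0.05" delta="0.1">
        <Points>
          <Point x="0.05" y="9.51974524882418"/>
        </Points>
      </UnassociatedFD>
    </MFDNode>
  </MFDNodeList>
</OpenSHA>"""


def read_points(mfd_elem):
    """Return [(x, y), ...] for an MFD element, or None if the node lacks it."""
    if mfd_elem is None:
        return None
    return [(float(p.get("x")), float(p.get("y"))) for p in mfd_elem.iter("Point")]


def parse_grid_sources(xml_bytes):
    """Map each grid node index to its unassociated and sub seismogenic MFDs."""
    root = ET.fromstring(xml_bytes)
    nodes = {}
    for node in root.find("MFDNodeList").findall("MFDNode"):
        nodes[int(node.get("index"))] = {
            "unassociated": read_points(node.find("UnassociatedFD")),
            "sub_seis": read_points(node.find("SubSeisMFD")),
        }
    return nodes


nodes = parse_grid_sources(SAMPLE)
print(nodes[0]["sub_seis"], len(nodes[1]["unassociated"]))
```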

Grid Sources Binary file

This is a binary representation of grid source MFDs, stored as a list of double arrays. All floating point values are 8-byte big-endian doubles.

First, the total number of arrays is written as a 4-byte integer. This is two times the number of grid nodes, plus one for the x values (which are only written once). The factor of two is because each node has both an unassociated MFD (not associated with any faults) and an associated (associated with a fault) MFD.

For example, the 7636 grid nodes used for UCERF3 would write (7636 * 2) + 1 = 15273 arrays.

Then each array is written, first with a 4-byte integer for the size of the array, followed by each 8-byte big-endian value in the array. An empty array (size zero) means that there is no MFD of that type at that node (for example, many nodes are not crossed by any fault and so have no associated MFD).

Let's consider a simple example with 2 grid nodes where one associated MFD is null (note that actual grid source MFDs are discretized more finely):

Node 1:

Unassociated:

x y
5.0 0.5
5.5 0.1
6.0 1e-2
6.5 3e-5
7.0 1e-8
7.5 1e-11

Associated sub seismogenic:

null

Node 2:

Unassociated:

x y
5.0 0.4
5.5 0.2
6.0 2e-2
6.5 3e-5
7.0 2e-8
7.5 1e-10

Associated sub seismogenic:

x y
5.0 0.2
5.5 0.1
6.0 3e-2
6.5 7e-5
7.0 4e-8
7.5 6e-11

These would be written to the file as:

5 6 5.0 5.5 6.0 6.5 7.0 7.5 6 0.5 0.1 1e-2 3e-5 1e-8 1e-11 0 6 0.4 0.2 2e-2 3e-5 2e-8 1e-10 6 0.2 0.1 3e-2 7e-5 4e-8 6e-11

In this example, the leading 5 is the number of arrays ((2 * the number of grid nodes) + 1), stored as a 4-byte integer. Each array is then preceded by its size (6, 6, 0, 6, 6), also an integer, followed by its data. The arrays appear in this order: the shared x values, the node 1 unassociated y values, the empty node 1 associated array (size 0, no data), the node 2 unassociated y values, and finally the node 2 associated y values.
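
A minimal Python sketch of a reader for this layout, assuming 4-byte big-endian integers for the counts, 8-byte big-endian doubles for the data, and the per-node ordering seen in the example above (unassociated first, then associated). The function names are hypothetical.

```python
import struct


def read_double_array_list(data):
    """Decode a count-prefixed list of size-prefixed double arrays."""
    offset = 0
    (num_arrays,) = struct.unpack_from(">i", data, offset)
    offset += 4
    arrays = []
    for _ in range(num_arrays):
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        arrays.append(list(struct.unpack_from(f">{size}d", data, offset)))
        offset += 8 * size
    return arrays


def read_grid_sources(data):
    """Return (x_values, nodes); nodes[i] is (unassociated_y, associated_y).

    Empty arrays are mapped to None, meaning no MFD of that type at the node.
    """
    arrays = read_double_array_list(data)
    if len(arrays) % 2 != 1:
        raise ValueError("expected (2 * number of nodes) + 1 arrays")
    x_values = arrays[0]
    nodes = []
    for i in range(1, len(arrays), 2):
        nodes.append((arrays[i] or None, arrays[i + 1] or None))
    return x_values, nodes


def pack_array(values):
    """Encode one array the way the example above lays it out."""
    return struct.pack(">i", len(values)) + struct.pack(f">{len(values)}d", *values)


# Re-create the two-node worked example and read it back.
example = [
    [5.0, 5.5, 6.0, 6.5, 7.0, 7.5],        # shared x values
    [0.5, 0.1, 1e-2, 3e-5, 1e-8, 1e-11],   # node 1 unassociated
    [],                                    # node 1 associated (none)
    [0.4, 0.2, 2e-2, 3e-5, 2e-8, 1e-10],   # node 2 unassociated
    [0.2, 0.1, 3e-2, 7e-5, 4e-8, 6e-11],   # node 2 associated
]
encoded = struct.pack(">i", len(example)) + b"".join(pack_array(a) for a in example)
x_values, nodes = read_grid_sources(encoded)
print(len(nodes), nodes[0][1], nodes[1][1][0])
```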

MFD Binary File

Some mean (across multiple logic tree branches) solutions may contain Magnitude Frequency Distributions for each rupture. In this case, the rates.bin file will contain total rates and mags.bin will contain weighted average magnitudes. These MFDs can be used to more accurately represent the mean of multiple solutions instead of using the mean magnitude.

Additionally, solutions can optionally include subseismogenic magnitude frequency distributions for each fault subsection. These are not needed for most applications, but can be used instead of the “associated” MFDs provided in the gridded seismicity data files.

They are written as a series of double arrays, with x values and y values separated into individual arrays. For example, consider these two functions:

function 1:

x y
5.5 0.1
5.75 0.3
5.9 0.2

function 2:

x y
5.5 0.05
5.75 0.33
5.9 0.24
6.21 0.1

These would be written to the file as:

4 3 5.5 5.75 5.9 3 0.1 0.3 0.2 4 5.5 5.75 5.9 6.21 4 0.05 0.33 0.24 0.1

In this example, the leading 4 is the number of arrays (2 * the number of functions, integer). Each array is preceded by its size (3, 3, 4, 4; integers), and each function contributes its x value array followed by its y value array (double values).
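
A minimal Python sketch of reading this format into (x, y) pairs, assuming the integers are 4-byte big-endian and the doubles 8-byte big-endian, consistent with the other binary formats on this page. `read_mfds` is a hypothetical name.

```python
import struct


def read_mfds(data):
    """Decode an MFD binary file into a list of (x_values, y_values) pairs."""
    offset = 0
    (num_arrays,) = struct.unpack_from(">i", data, offset)
    offset += 4
    if num_arrays % 2 != 0:
        raise ValueError("expected an even number of arrays (x and y per function)")
    arrays = []
    for _ in range(num_arrays):
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        arrays.append(list(struct.unpack_from(f">{size}d", data, offset)))
        offset += 8 * size
    # Consecutive arrays form one function: x values, then y values.
    return [(arrays[i], arrays[i + 1]) for i in range(0, num_arrays, 2)]


def pack_array(values):
    """Encode one size-prefixed double array."""
    return struct.pack(">i", len(values)) + struct.pack(f">{len(values)}d", *values)


# Re-create the two-function worked example and read it back.
encoded = struct.pack(">i", 4) + b"".join(
    pack_array(a)
    for a in (
        [5.5, 5.75, 5.9], [0.1, 0.3, 0.2],
        [5.5, 5.75, 5.9, 6.21], [0.05, 0.33, 0.24, 0.1],
    )
)
for x_values, y_values in read_mfds(encoded):
    print(list(zip(x_values, y_values)))
```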

Compound Fault System Solution Files

Compound Fault System Solution files are single zip files which contain all data for solutions for multiple UCERF3 Logic Tree Branches. This format takes advantage of the fact that many branches contain duplicate information, so each duplicated file is written only once. For example, rakes depend only on the Fault Model (FM) and Deformation Model (DM); they do not depend on, for example, the Spatial Seismicity Kernel. So one ‘rakes.bin’ file is stored for each combination of FM/DM, for example ‘FM3_1_ZENGBB_rakes.bin’. The ‘rates.bin’ files, however, are unique to each logic tree branch, and one is present for each branch. See the table below for a mapping of each file type to the logic tree branch choices it depends on.

| File Name | Logic Tree Branch Levels |
| --- | --- |
| close_sections.bin | FM |
| cluster_rups.bin | FM |
| cluster_sects.bin | FM |
| fault_sections.xml | FM, DM |
| info.txt | ALL |
| mags.bin | FM, DM, Scale |
| rakes.bin | FM, DM |
| rates.bin | ALL |
| rup_areas.bin | FM, DM |
| rup_lengths.bin | FM |
| rup_avg_slips.bin | FM, DM, Scale |
| rup_sec_slip_type.txt | N/A |
| rup_sections.bin | FM |
| sect_areas.bin | FM, DM |
| sect_slips.bin | ALL BUT Dsr |
| sect_slips_std_dev.bin | ALL BUT Dsr |
| inv_rup_set_metadata.xml | ALL |
| inv_sol_metadata.xml | ALL |
| grid_sources.xml | ALL (old xml format) |
| grid_sources_reg.xml | NONE (new binary format) |
| grid_sources.bin | ALL (new binary format) |
| rup_mfds.bin | ALL |

Copyright ©2022 University of Southern California. All rights reserved. License—Disclaimer
