Data from: Knowledge graphs for seismic data and metadata
Data files
Sep 19, 2023 version files 96.83 MB
- example1.tar.gz
- example2.tar.gz
- example3.tar.gz
- README.md
Abstract
The increasing scale and diversity of seismic data, and the growing role of big data in seismology, have raised interest in methods to make data exploration more accessible. This paper presents the use of knowledge graphs (KGs) for representing seismic data and metadata to improve data exploration and analysis, focusing on usability, flexibility, and extensibility. Using constraints derived from domain knowledge in seismology, we define semantic models of seismic station and event information used to construct the KGs. Our approach utilizes the capability of KGs to integrate data across many sources and diverse schema formats. We use schema-diverse, real-world seismic data to construct KGs with millions of nodes, and illustrate potential applications with three big-data examples. Our findings demonstrate the potential of KGs to enhance the efficiency and efficacy of seismological workflows in research and beyond, indicating a promising interdisciplinary future for this technology.
README: README
List of data provided:
- example1.tar.gz
  - example1-events-gcmt.ndk
  - example1-events-gcmt.ndk.json
  - example1-events-ncedc.csv
  - example1-events-usgs.csv
- example2.tar.gz
  - example2-events-gcmt.ndk
  - example2-events-gcmt.ndk.json
  - example2-stations-stationXML.xml
  - example2-stations-stationXML.json
- example3.tar.gz
  - example3-events-gcmt.ndk
  - example3-events-gcmt.ndk.json
  - example3-events-usgs.csv
  - example3-stations-stationXML.xml
  - example3-stations-stationXML.json
Individual descriptions
In: example1.tar.gz
example1-events-gcmt.ndk
Description:
Event data used in example 1, downloaded from the Global Centroid Moment Tensor Project (https://www.globalcmt.org/). A complete description of the NDK file format can be found at https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained.
example1-events-gcmt.ndk.json
Description:
Data from file example1-events-gcmt.ndk that has been converted into JSON format.
example1-events-ncedc.csv
Updated description (14/09/2023):
This data is no longer included in this repository. Please find this data in the following Zenodo open data repository DOI:10.5281/zenodo.8346843.
Original description (31/08/2023):
Event data used in example 1, downloaded from the NCEDC Northern California Earthquake Catalog Search (https://ncedc.org/ncedc/catalog-search.html). A complete description of the file format and columns can be found at the following link.
example1-events-usgs.csv
Description:
Event data used in example 1, downloaded from the USGS + ANSS Comprehensive Earthquake Catalog (https://earthquake.usgs.gov/data/comcat/). A complete description of the file format and columns can be found at https://earthquake.usgs.gov/data/comcat/#event-terms.
In: example2.tar.gz
example2-events-gcmt.ndk
Description:
Event data used in example 2, downloaded from the Global Centroid Moment Tensor Project (https://www.globalcmt.org/). A complete description of the NDK file format can be found at https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained.
example2-events-gcmt.ndk.json
Description:
Data from file example2-events-gcmt.ndk that has been converted into JSON format.
example2-stations-stationXML.xml
Description:
Station data used in example 2, downloaded from the IRIS Data Management Center (https://service.iris.edu/fdsnws/station/1/). A complete description of the StationXML file format can be found at https://www.fdsn.org/xml/station/.
example2-stations-stationXML.json
Description:
Data from file example2-stations-stationXML.xml that has been converted into JSON format.
In: example3.tar.gz
example3-events-gcmt.ndk
Description:
Event data used in example 3, downloaded from the Global Centroid Moment Tensor Project (https://www.globalcmt.org/). A complete description of the NDK file format can be found at https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained.
example3-events-gcmt.ndk.json
Description:
Data from file example3-events-gcmt.ndk that has been converted into JSON format.
example3-events-usgs.csv
Description:
Event data used in example 3, downloaded from the USGS + ANSS Comprehensive Earthquake Catalog (https://earthquake.usgs.gov/data/comcat/). A complete description of the file format and columns can be found at https://earthquake.usgs.gov/data/comcat/#event-terms.
example3-stations-stationXML.xml
Description:
Station data used in example 3, downloaded from the IRIS Data Management Center (https://service.iris.edu/fdsnws/station/1/). A complete description of the StationXML file format can be found at https://www.fdsn.org/xml/station/.
example3-stations-stationXML.json
Description:
Data from file example3-stations-stationXML.xml that has been converted into JSON format.
Description of variables
Below is a short description of the relevant variables in the data files. Some cells and fields in these files are empty; they have been left blank so as not to interfere with the data processing software (DOI:10.5281/zenodo.8304009) described in the corresponding paper, Knowledge graphs for seismic data and metadata, Davis and Hunt (Forthcoming 2023).
Description of StationXML files
Attributes of the Network container are self-descriptive string identifiers and date types. Its sub-elements are either self-descriptive string identifiers or self-describing fields holding decimal values, such as a count.
Attributes of the Station container are self-descriptive string identifiers and date types. Its sub-elements are likewise either self-descriptive string identifiers or self-describing fields holding decimal values, such as a count, with the following exceptions:
- Latitude: given in degrees.
- Longitude: given in degrees.
- Elevation: elevation of the local ground surface level at the station, in meters.
- WaterLevel: elevation of the water surface in meters for underwater sites, where 0 is mean sea level.
Attributes of the Channel container are self-descriptive string identifiers and date types. Its sub-elements are likewise either self-descriptive string identifiers or self-describing fields holding decimal values, such as a count, with the following exceptions:
- Latitude: given in degrees.
- Longitude: given in degrees.
- Elevation: elevation of the local ground surface level at the station, in meters.
- Depth: depth of the sensor relative to the local ground surface level, in meters.
- Azimuth: azimuth of the component in degrees clockwise from geographic (true) north.
- Dip: dip of the component in degrees, positive down from horizontal. Horizontal components have dip=0, vertical upwards components have dip=-90, and vertical downwards components have dip=+90.
- WaterLevel: elevation of the water surface in meters for underwater sites, where 0 is mean sea level.
- SampleRate: sample rate in samples per second.
- SampleRateRatio: sample rate expressed as a number of samples in a number of seconds.
- ClockDrift: tolerance value, measured in seconds per sample.
- CalibrationUnits: units of calibration (e.g., V for volts or A for amps).
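As an illustration of the Network, Station, and Channel nesting described above, the following is a minimal sketch that reads a few Channel fields with the Python standard library. It assumes the FDSN StationXML namespace `http://www.fdsn.org/xml/station/1` and the attribute name `code`; the function name `summarize_channels` and the sample document are hypothetical and are not taken from the data files.

```python
import xml.etree.ElementTree as ET

# Assumed FDSN StationXML namespace; check the root element of your file.
NS = {"sx": "http://www.fdsn.org/xml/station/1"}

# Hypothetical, heavily trimmed document for illustration only.
SAMPLE_XML = """
<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.1">
  <Network code="XX">
    <Station code="DEMO">
      <Latitude>10.0</Latitude>
      <Longitude>20.0</Longitude>
      <Elevation>100.0</Elevation>
      <Channel code="BHZ" locationCode="00">
        <Latitude>10.0</Latitude>
        <Longitude>20.0</Longitude>
        <Elevation>100.0</Elevation>
        <Depth>0.0</Depth>
        <Azimuth>0.0</Azimuth>
        <Dip>-90.0</Dip>
        <SampleRate>40.0</SampleRate>
      </Channel>
    </Station>
  </Network>
</FDSNStationXML>
"""


def _float_or_none(element, tag):
    """Return a child element's text as a float, or None if absent or blank."""
    text = element.findtext(f"sx:{tag}", default="", namespaces=NS).strip()
    return float(text) if text else None


def summarize_channels(root):
    """Collect one dictionary per Channel, walking Network > Station > Channel."""
    rows = []
    for network in root.findall("sx:Network", NS):
        for station in network.findall("sx:Station", NS):
            for channel in station.findall("sx:Channel", NS):
                rows.append({
                    "network": network.get("code"),
                    "station": station.get("code"),
                    "channel": channel.get("code"),
                    "latitude_deg": _float_or_none(channel, "Latitude"),
                    "longitude_deg": _float_or_none(channel, "Longitude"),
                    "elevation_m": _float_or_none(channel, "Elevation"),
                    "depth_m": _float_or_none(channel, "Depth"),
                    "azimuth_deg": _float_or_none(channel, "Azimuth"),
                    "dip_deg": _float_or_none(channel, "Dip"),
                    "sample_rate_hz": _float_or_none(channel, "SampleRate"),
                })
    return rows


channels = summarize_channels(ET.fromstring(SAMPLE_XML))
print(channels[0])
```

For a file on disk, `ET.parse(path).getroot()` could be passed to the same function. Blank or missing sub-elements come back as `None`, which matches the note above that some fields are left empty.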
Description of NDK files
The format is ASCII and uses five 80-character lines per earthquake, laid out as follows (column ranges are given in brackets):
First line: Hypocenter line
[1-4] Hypocenter reference catalog (e.g., PDE for USGS location, ISC for ISC catalog, SWE for surface-wave location, [Ekstrom, BSSA, 2006])
[6-15] Date of reference event
[17-26] Time of reference event
[28-33] Latitude
[35-41] Longitude
[43-47] Depth
[49-55] Reported magnitudes, usually mb and MS
[57-80] Geographical location (24 characters)
Second line: CMT info (1)
[1-16] CMT event name. This string is a unique CMT-event identifier. Older events have 8-character names, current ones have 14-character names.
[18-61] Data used in the CMT inversion. Three data types may be used: Long-period body waves (B), Intermediate-period surface waves (S), and long-period mantle waves (M). For each data type, three values are given: the number of stations used, the number of components used, and the shortest period used.
[63-68] Type of source inverted for: "CMT: 0" - general moment tensor; "CMT: 1" - moment tensor with constraint of zero trace (standard); "CMT: 2" - double-couple source.
[70-80] Type and duration of moment-rate function assumed in the inversion. "TRIHD" indicates a triangular moment-rate function, "BOXHD" indicates a boxcar moment-rate function. The value given is half the duration of the moment-rate function.
Third line: CMT info (2)
[1-58] Centroid parameters determined in the inversion. Centroid time, given with respect to the reference time, centroid latitude, centroid longitude, and centroid depth. The value of each variable is followed by its estimated standard error.
[60-63] Type of depth. "FREE" indicates that the depth was a result of the inversion; "FIX " that the depth was fixed and not inverted for; "BDY " that the depth was fixed based on modeling of broad-band P waveforms.
[65-80] Timestamp. This 16-character string identifies the type of analysis that led to the given CMT results and, for recent events, the date and time of the analysis.
Fourth line: CMT info (3)
[1-2] The exponent for all following moment values. For example, if the exponent is given as 24, the moment values that follow, expressed in dyne-cm, should be multiplied by 10**24.
[3-80] The six moment-tensor elements: Mrr, Mtt, Mpp, Mrt, Mrp, Mtp, where r is up, t is south, and p is east. The value of each moment-tensor element is followed by its estimated standard error.
Fifth line: CMT info (4)
[1-3] Version code. This three-character string is used to track the version of the program that generates the "ndk" file.
[4-48] Moment tensor expressed in its principal-axis system: eigenvalue, plunge, and azimuth of the three eigenvectors. The eigenvalue should be multiplied by 10**(exponent) as given on line four.
[50-56] Scalar moment, to be multiplied by 10**(exponent) as given on line four.
[58-80] Strike, dip, and rake for first nodal plane of the best-double-couple mechanism, repeated for the second nodal plane.
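The fixed-column layout above maps directly onto string slicing. The sketch below reads only a subset of fields (the hypocenter line, the CMT event name, and the moment tensor with its exponent applied), treating the bracketed ranges as 1-based, inclusive columns. The names `cols`, `parse_ndk_record`, and `read_ndk` are hypothetical, and the sample record is illustrative rather than drawn from the archives in this dataset.

```python
TENSOR_ELEMENTS = ("Mrr", "Mtt", "Mpp", "Mrt", "Mrp", "Mtp")

# Illustrative five-line record, not taken from the data files.
SAMPLE_RECORD = """\
PDE  2005/01/01 01:20:05.4  13.78  -88.78 193.1 5.0 0.0 EL SALVADOR
C200501010120A   B:  4    4  40 S: 27   33  50 M:  0    0   0 CMT: 1 TRIHD:  0.6
CENTROID:     -0.3 0.9  13.76 0.06  -89.08 0.09 162.8 12.5 FREE S-20050322125201
23  0.838 0.201 -0.005 0.231 -0.833 0.270  1.050 0.121 -0.369 0.161  0.044 0.240
V10   1.581 56  12  -0.537 23 140  -1.044 24 241   1.312   9 29  142 133 72   66
"""


def cols(line, start, end):
    """Return columns start..end (1-based, inclusive) of a line, stripped."""
    return line[start - 1:end].strip()


def parse_ndk_record(lines):
    """Parse a subset of one five-line NDK record into a dictionary."""
    if len(lines) != 5:
        raise ValueError("an NDK record has exactly five lines")
    hypocenter, cmt1, _cmt2, cmt3, _cmt4 = lines

    # Line 4: exponent in columns 1-2, then value/error pairs in columns 3-80.
    exponent = int(cols(cmt3, 1, 2))
    numbers = [float(v) for v in cols(cmt3, 3, 80).split()]
    scale = 10.0 ** exponent
    # Even positions hold the values; odd positions hold the standard errors.
    tensor = {name: numbers[2 * i] * scale
              for i, name in enumerate(TENSOR_ELEMENTS)}

    return {
        "catalog": cols(hypocenter, 1, 4),
        "date": cols(hypocenter, 6, 15),
        "time": cols(hypocenter, 17, 26),
        "latitude": float(cols(hypocenter, 28, 33)),
        "longitude": float(cols(hypocenter, 35, 41)),
        "depth": float(cols(hypocenter, 43, 47)),
        "magnitudes": [float(m) for m in cols(hypocenter, 49, 55).split()],
        "location": cols(hypocenter, 57, 80),
        "event_name": cols(cmt1, 1, 16),
        "exponent": exponent,
        "moment_tensor_dyne_cm": tensor,
    }


def read_ndk(text):
    """Split NDK text into five-line records and parse each one."""
    lines = [line for line in text.splitlines() if line.strip()]
    return [parse_ndk_record(lines[i:i + 5]) for i in range(0, len(lines), 5)]


events = read_ndk(SAMPLE_RECORD)
print(events[0]["event_name"], events[0]["latitude"], events[0]["longitude"])
```

The remaining fields (centroid parameters, principal axes, scalar moment, nodal planes) could be added the same way from the column ranges listed above.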
Description of USGS csv files
The format is csv with the following columns and descriptions:
- time: Time when the event occurred.
- latitude: Decimal degrees latitude. Negative values for southern latitudes.
- longitude: Decimal degrees longitude. Negative values for western longitudes.
- depth: Depth of the event in kilometers.
- mag: The magnitude for the event.
- magType: The method or algorithm used to calculate the preferred magnitude for the event.
- nst: The total number of seismic stations used to determine earthquake location.
- gap: The largest azimuthal gap between azimuthally adjacent stations (in degrees).
- dmin: Horizontal distance from the epicenter to the nearest station (in degrees).
- rms: The root-mean-square (RMS) travel time residual, in sec, using all weights.
- net: The ID of a data contributor. Identifies the network considered to be the preferred source of information for this event.
- id: A unique identifier for the event. This is the current preferred id for the event.
- updated: Time when the event was most recently updated.
- place: Textual description of named geographic region near to the event. This may be a city name, or a Flinn-Engdahl Region name.
- type: Type of seismic event.
- horizontalError: Uncertainty of reported location of the event in kilometers.
- depthError: Uncertainty of reported depth of the event in kilometers.
- magError: Uncertainty of reported magnitude of the event. The estimated standard error of the magnitude.
- magNst: The total number of seismic stations used to calculate the magnitude for this earthquake.
- status: Indicates whether the event has been reviewed by a human.
- locationSource: The network that originally authored the reported location of this event.
- magSource: Network that originally authored the reported magnitude for this event.
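Because some cells in these CSV files are blank, numeric columns need a guard before conversion. The following is a minimal sketch using the Python standard library; the function name `read_usgs_events`, the choice of which columns to treat as numeric, and the sample row are assumptions made for illustration, not part of the dataset or its processing software.

```python
import csv
import io

# Columns from the list above that hold numbers (an assumption of this sketch).
NUMERIC_COLUMNS = (
    "latitude", "longitude", "depth", "mag", "nst", "gap", "dmin", "rms",
    "horizontalError", "depthError", "magError", "magNst",
)

# Hypothetical sample with a blank "nst" cell, for illustration only.
SAMPLE_CSV = (
    "time,latitude,longitude,depth,mag,magType,nst,gap\n"
    "2020-01-01T00:00:00.000Z,38.5,-122.3,10.0,4.2,mw,,45\n"
)


def read_usgs_events(handle):
    """Read event rows, converting numeric columns and mapping blanks to None."""
    events = []
    for row in csv.DictReader(handle):
        event = dict(row)
        for column in NUMERIC_COLUMNS:
            raw = (row.get(column) or "").strip()
            event[column] = float(raw) if raw else None
        events.append(event)
    return events


events = read_usgs_events(io.StringIO(SAMPLE_CSV))
print(events[0]["mag"], events[0]["nst"])
```

For a file on disk, a handle from `open(path, newline="")` could be passed instead of the `io.StringIO` object. Columns that are absent from the header are also reported as `None`.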
Methods
The data consist of the following, collected from the sources indicated:
- Station metadata, in StationXML format, acquired from IRIS DMC using the fdsnws-station webservice (https://service.iris.edu/fdsnws/station/1/).
- Earthquake event data, in NDK format, acquired from the Global Centroid-Moment Tensor (GCMT) catalog webservice (https://www.globalcmt.org) [1,2].
- Earthquake event data, in CSV format, acquired from the USGS earthquake catalog webservice (https://doi.org/10.5066/F7MS3QZH) [3].
The format of the data is described in the README. In addition, a complete description of the StationXML, NDK, and USGS file formats can be found at https://www.fdsn.org/xml/station/, https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained, and https://earthquake.usgs.gov/data/comcat/#event-terms, respectively.
Also provided are conversions from NDK and StationXML file formats into JSON format.
References:
[1] Dziewonski, A. M., Chou, T. A., & Woodhouse, J. H. (1981). Determination of earthquake source parameters from waveform data for studies of global and regional seismicity. Journal of Geophysical Research: Solid Earth, 86(B4), 2825-2852.
[2] Ekström, G., Nettles, M., & Dziewoński, A. M. (2012). The global CMT project 2004–2010: Centroid-moment tensors for 13,017 earthquakes. Physics of the Earth and Planetary Interiors, 200, 1-9.
[3] U.S. Geological Survey, Earthquake Hazards Program, 2017, Advanced National Seismic System (ANSS) Comprehensive Catalog of Earthquake Events and Products: Various, https://doi.org/10.5066/F7MS3QZH.
Usage notes
No special programs or software are required to open the data files included here.
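For readers who prefer to inspect the archives programmatically, the sketch below lists the files in a `.tar.gz` archive and reads one member as text using only the Python standard library. The function names `list_archive` and `read_member_text` are hypothetical, and the member paths inside each archive are not documented here, so they should be taken from the listing rather than assumed.

```python
import tarfile


def list_archive(path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return [(member.name, member.size)
                for member in archive.getmembers() if member.isfile()]


def read_member_text(path, member_name, encoding="utf-8"):
    """Read one archive member into a string without unpacking to disk."""
    with tarfile.open(path, mode="r:gz") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        return handle.read().decode(encoding)
```

A call such as `list_archive("example1.tar.gz")` would show the member names to pass to `read_member_text`.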