resistics_readers.miniseed.mseed module

Reformatting for miniseed data

Miniseed data files can contain traces recorded at more than one sampling frequency. Resistics does not support data files with multiple sampling frequencies, so miniseed data files need to be reformatted before they can be used.

Warning

This is an initial implementation and there are likely to be numerous scenarios where it breaks. It is suggested to compare its output thoroughly with the output from mseed2ascii (miniseed to ASCII) before using it. https://github.com/iris-edu/mseed2ascii

Contributors are welcome and encouraged to improve the code, and particularly to add tests to ensure that the reformatting is occurring as expected.

exception resistics_readers.miniseed.mseed.NoDataInInterval(first_time: pandas._libs.tslibs.timestamps.Timestamp, last_time: pandas._libs.tslibs.timestamps.Timestamp, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp)

Bases: Exception
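The exception carries the four timestamps listed in its signature. As a rough illustration of when such an exception might be raised, the sketch below uses a hypothetical stand-in class (`NoDataInIntervalSketch`) and a hypothetical helper (`check_interval`). The overlap rule is an assumption for illustration, not the library's confirmed logic.

```python
import pandas as pd


class NoDataInIntervalSketch(Exception):
    """Hypothetical stand-in mirroring the documented constructor arguments"""

    def __init__(self, first_time, last_time, from_time, to_time):
        self.first_time = first_time
        self.last_time = last_time
        self.from_time = from_time
        self.to_time = to_time
        super().__init__(
            f"Data covers {first_time} to {last_time}, "
            f"which does not overlap {from_time} to {to_time}"
        )


def check_interval(first_time, last_time, from_time, to_time):
    """Raise if the requested interval lies wholly outside the data (assumed rule)"""
    if to_time < first_time or from_time > last_time:
        raise NoDataInIntervalSketch(first_time, last_time, from_time, to_time)


first = pd.Timestamp("2021-01-01 00:00:00")
last = pd.Timestamp("2021-01-02 00:00:00")
try:
    check_interval(first, last, pd.Timestamp("2021-01-03"), pd.Timestamp("2021-01-04"))
    raised = False
    message = ""
except NoDataInIntervalSketch as err:
    raised = True
    message = str(err)
```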

resistics_readers.miniseed.mseed.get_miniseed_stream(data_path: pathlib.Path)

Get the miniseed file stream

resistics_readers.miniseed.mseed.get_streams(data_paths: List[pathlib.Path]) -> Dict[pathlib.Path, obspy.core.stream.Stream]

Get the stream object for each data_path

Parameters

data_paths (List[Path]) – The data paths

Returns

The stream objects

Return type

Dict[Path, Stream]

resistics_readers.miniseed.mseed.get_table(streams: Dict[pathlib.Path, obspy.core.stream.Stream], trace_ids: List[str]) -> pandas.core.frame.DataFrame

Get table with start and ends for each trace of interest in each file

The table additionally contains the trace index for each trace for every file

Parameters
  • streams (Dict[Path, Stream]) – Dictionary of file paths to streams

  • trace_ids (List[str]) – The ids of the traces that are of interest

Returns

The data table

Return type

pd.DataFrame
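To make the shape of such a table concrete, here is a minimal sketch that builds one from stand-in metadata. Plain tuples take the place of obspy traces so that the example runs without any miniseed files. The column names (`data_file`, `trace_id`, `trace_index`, `first_time`, `last_time`) and the trace ids are assumptions for illustration and may not match what the function actually returns.

```python
import pandas as pd

# Stand-in for obspy streams: file name -> list of (trace id, start, end)
streams = {
    "day1.mseed": [
        ("XX.STA..EX", "2021-01-01 00:00:00", "2021-01-01 23:59:59"),
        ("XX.STA..HX", "2021-01-01 00:00:00", "2021-01-01 23:59:59"),
    ],
    "day2.mseed": [
        ("XX.STA..HX", "2021-01-02 00:00:00", "2021-01-02 23:59:59"),
        ("XX.STA..EX", "2021-01-02 00:00:00", "2021-01-02 23:59:59"),
    ],
}
trace_ids = ["XX.STA..EX"]  # the traces of interest

rows = []
for data_file, traces in streams.items():
    for trace_index, (trace_id, start, end) in enumerate(traces):
        if trace_id not in trace_ids:
            continue
        rows.append(
            {
                "data_file": data_file,
                "trace_id": trace_id,
                "trace_index": trace_index,  # position of the trace in its file
                "first_time": pd.Timestamp(start),
                "last_time": pd.Timestamp(end),
            }
        )
table = pd.DataFrame(rows)
```

Recording the trace index per file matters because, as in the stand-in data, the same trace id can sit at a different position in each file.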

resistics_readers.miniseed.mseed.get_first_last_times(table: pandas.core.frame.DataFrame) -> Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]

Get the minimum first time and maximum last time for the data

Each trace may have different date ranges in the miniseed files. This function calculates the first and last times where data is present for each requested trace.

Parameters

table (pd.DataFrame) – The information table with the details about trace duration in each data file

Returns

The first and last time

Return type

Tuple[pd.Timestamp, pd.Timestamp]
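One way to read the description above is that the result is the time range over which every requested trace has data. The sketch below computes that range with a pandas groupby. The column names and the reduction order (latest of the per-trace starts, earliest of the per-trace ends) are assumptions based on that reading, not a confirmed copy of the implementation.

```python
import pandas as pd

# Hypothetical table: two traces, each spread over two files
table = pd.DataFrame(
    {
        "trace_id": ["EX", "EX", "HX", "HX"],
        "first_time": pd.to_datetime(
            [
                "2021-01-01 00:00:00",
                "2021-01-01 12:00:00",
                "2021-01-01 01:00:00",
                "2021-01-01 12:00:00",
            ]
        ),
        "last_time": pd.to_datetime(
            [
                "2021-01-01 11:59:59",
                "2021-01-01 23:59:59",
                "2021-01-01 11:59:59",
                "2021-01-01 22:00:00",
            ]
        ),
    }
)

grouped = table.groupby("trace_id")
# Each trace starts at its own earliest time; the common start is the latest of those
first_time = grouped["first_time"].min().max()
# Each trace ends at its own latest time; the common end is the earliest of those
last_time = grouped["last_time"].max().min()
```

Here EX starts at 00:00 and HX at 01:00, so the shared range starts at 01:00. HX ends at 22:00, before EX, so the shared range ends at 22:00.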

resistics_readers.miniseed.mseed.get_streams_to_read(trace_id: str, table: pandas.core.frame.DataFrame, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp) -> pandas.core.frame.DataFrame

Get the streams to read and the time intervals to read for each stream

Note that this finds time intervals to cover from_time to to_time inclusive

Parameters
  • trace_id (str) – The trace id

  • table (pd.DataFrame) – The table with details about date ranges covered by each miniseed file

  • from_time (pd.Timestamp) – The time to get data from

  • to_time (pd.Timestamp) – The time to get data to

Returns

A row for each data file to read and the time range to read from it

Return type

pd.DataFrame
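The selection can be pictured as an interval overlap test followed by clipping. The following is a minimal sketch under assumed column names (`data_file`, `first_time`, `last_time`, `read_from`, `read_to`). The function name `streams_to_read_sketch` is hypothetical, and the real function may handle edge cases differently.

```python
import pandas as pd

table = pd.DataFrame(
    {
        "data_file": ["a.mseed", "b.mseed", "c.mseed"],
        "trace_id": ["EX", "EX", "EX"],
        "first_time": pd.to_datetime(
            ["2021-01-01 00:00:00", "2021-01-02 00:00:00", "2021-01-03 00:00:00"]
        ),
        "last_time": pd.to_datetime(
            ["2021-01-01 23:59:59", "2021-01-02 23:59:59", "2021-01-03 23:59:59"]
        ),
    }
)


def streams_to_read_sketch(trace_id, table, from_time, to_time):
    """Return one row per file overlapping from_time to to_time inclusive"""
    rows = table[table["trace_id"] == trace_id]
    # A file overlaps if it starts on or before to_time and ends on or after from_time
    overlaps = (rows["first_time"] <= to_time) & (rows["last_time"] >= from_time)
    selected = rows[overlaps].copy()
    # Clip the range read from each file to the requested interval
    selected["read_from"] = selected["first_time"].clip(lower=from_time)
    selected["read_to"] = selected["last_time"].clip(upper=to_time)
    return selected[["data_file", "read_from", "read_to"]].reset_index(drop=True)


result = streams_to_read_sketch(
    "EX",
    table,
    pd.Timestamp("2021-01-01 12:00:00"),
    pd.Timestamp("2021-01-02 06:00:00"),
)
```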

resistics_readers.miniseed.mseed.get_stream_data(dt: pandas._libs.tslibs.timedeltas.Timedelta, stream: obspy.core.stream.Stream, trace_index: int, read_from: pandas._libs.tslibs.timestamps.Timestamp, read_to: pandas._libs.tslibs.timestamps.Timestamp) -> numpy.ndarray

Get data for a single trace from a stream

Parameters
  • dt (pd.Timedelta) – The sampling period, i.e. the time between consecutive samples

  • stream (Stream) – The miniseed file stream

  • trace_index (int) – The index of the trace

  • read_from (pd.Timestamp) – The time to read from

  • read_to (pd.Timestamp) – The time to read to

Returns

The trace data from the stream

Return type

np.ndarray

Raises
  • ValueError – If the number of expected samples does not give an integer. This is currently a safety first approach until more testing is done

  • ValueError – If the number of samples expected != the number of samples returned by the trace in the time interval
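The first check described above can be sketched as follows. The helper name `expected_samples` is hypothetical, and counting both end points (hence the `+ 1`) is an assumption about how the interval is treated.

```python
import pandas as pd


def expected_samples(dt, read_from, read_to):
    """Number of samples from read_from to read_to inclusive at spacing dt"""
    n_intervals = (read_to - read_from) / dt  # Timedelta / Timedelta gives a float
    if not float(n_intervals).is_integer():
        raise ValueError(
            f"{read_from} to {read_to} is not a whole number of samples for dt={dt}"
        )
    return int(n_intervals) + 1


dt = pd.Timedelta(seconds=0.25)  # 4 Hz sampling
n_samples = expected_samples(
    dt, pd.Timestamp("2021-01-01 00:00:00"), pd.Timestamp("2021-01-01 00:00:10")
)

try:
    # read_to falls between two samples, so the check should fail
    expected_samples(
        dt, pd.Timestamp("2021-01-01 00:00:00"), pd.Timestamp("2021-01-01 00:00:10.1")
    )
    off_grid_raised = False
except ValueError:
    off_grid_raised = True
```

The second check would then compare a count like `n_samples` with the length of the array actually returned by the trace for the same interval.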

resistics_readers.miniseed.mseed.get_trace_data(fs: float, streams: Dict[pathlib.Path, obspy.core.stream.Stream], streams_to_read: Dict[str, Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]], from_time: pandas._libs.tslibs.timestamps.Timestamp, n_samples: int) -> numpy.ndarray

Get data for a single trace beginning at from_time and for n_samples

Parameters
  • fs (float) – The sampling frequency

  • streams (Dict[Path, Stream]) – The streams

  • streams_to_read (Dict[str, Tuple[pd.Timestamp, pd.Timestamp]]) – The streams to read for this trace and time interval

  • from_time (pd.Timestamp) – The time to get the data from

  • n_samples (int) – The number of samples to get

Returns

The data

Return type

np.ndarray

Raises
  • ValueError – If converting read_from date to samples does not give an integer. This is a safety first approach but problems could be encountered at very high sampling frequencies. In this case, much more testing needs to be done about expected behaviour

  • ValueError – If converting read_to date to samples does not give an integer. This is a safety first approach but problems could be encountered at very high sampling frequencies. In this case, much more testing needs to be done about expected behaviour
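The date-to-sample conversion mentioned in the errors above can be illustrated with a small sketch. The helper `time_to_sample`, the NaN pre-fill and the stand-in chunks are all assumptions for illustration. In the real function the values would come from the miniseed streams.

```python
import numpy as np
import pandas as pd


def time_to_sample(fs, from_time, time):
    """Convert a timestamp to a sample index relative to from_time"""
    n = (time - from_time).total_seconds() * fs
    if not float(n).is_integer():
        raise ValueError(f"{time} does not fall on a sample for fs={fs}")
    return int(n)


fs = 4.0
from_time = pd.Timestamp("2021-01-01 00:00:00")
n_samples = 12

# Stand-in for data read from two files: (read_from, read_to, values)
chunks = [
    (
        pd.Timestamp("2021-01-01 00:00:00"),
        pd.Timestamp("2021-01-01 00:00:01.75"),
        np.arange(0, 8, dtype=float),
    ),
    (
        pd.Timestamp("2021-01-01 00:00:02"),
        pd.Timestamp("2021-01-01 00:00:02.75"),
        np.arange(8, 12, dtype=float),
    ),
]

data = np.full(n_samples, np.nan)  # unfilled samples stay NaN in this sketch
for read_from, read_to, values in chunks:
    start = time_to_sample(fs, from_time, read_from)
    stop = time_to_sample(fs, from_time, read_to) + 1  # read_to is inclusive
    data[start:stop] = values
```

Floating point rounding in the multiplication by `fs` is one plausible reason the integer check could misfire at very high sampling frequencies.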

resistics_readers.miniseed.mseed.get_time_data(fs: float, id_map: Dict[str, str], streams: Dict[pathlib.Path, obspy.core.stream.Stream], table: pandas.core.frame.DataFrame, first_time: pandas._libs.tslibs.timestamps.Timestamp, last_time: pandas._libs.tslibs.timestamps.Timestamp, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp) -> resistics.time.TimeData

Get time data covering from_time to to_time

Parameters
  • fs (float) – The sampling frequency

  • id_map (Dict[str, str]) – The map from trace id to channel

  • streams (Dict[Path, Stream]) – The streams

  • table (pd.DataFrame) – The table with information about trace ranges for each file

  • first_time (pd.Timestamp) – The common first_time for all traces and streams

  • last_time (pd.Timestamp) – The common last_time for all traces and streams

  • from_time (pd.Timestamp) – The from time for this interval of data

  • to_time (pd.Timestamp) – The to time for this interval of data

Returns

The time data covering from_time to to_time

Return type

TimeData

Raises
  • NoDataInInterval – If there is no trace data in the interval from_time and to_time

  • ValueError – If the number of samples in the interval is not an integer. This is a safety first approach for now that could fail at very high sampling frequencies, in which case much more thorough testing would be better.

resistics_readers.miniseed.mseed.get_processed_data(time_data: resistics.time.TimeData, processors: List[resistics.time.TimeProcess]) -> resistics.time.TimeData

Process time data

Parameters
  • time_data (TimeData) – TimeData to process

  • processors (List[TimeProcess]) – The processors to run

Returns

The processed TimeData

Return type

TimeData
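Running a list of processors amounts to applying each one to the output of the previous one. The sketch below shows that pattern with plain callables and a list of floats standing in for TimeProcess and TimeData objects. All names in it are hypothetical, and the processor interface in resistics may differ.

```python
from typing import Callable, List, Sequence


def apply_processors(
    samples: List[float],
    processors: Sequence[Callable[[List[float]], List[float]]],
) -> List[float]:
    """Apply each processor in order, feeding each result into the next"""
    for process in processors:
        samples = process(samples)
    return samples


def remove_mean(samples: List[float]) -> List[float]:
    mean = sum(samples) / len(samples)
    return [value - mean for value in samples]


def double(samples: List[float]) -> List[float]:
    return [value * 2 for value in samples]


processed = apply_processors([1.0, 2.0, 3.0], [remove_mean, double])
```

Order matters in general, so the processors list should be given in the order in which the steps are meant to run.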

resistics_readers.miniseed.mseed.reformat(dir_path: pathlib.Path, fs: float, id_map: Dict[str, str], chunk_time: pandas._libs.tslibs.timedeltas.Timedelta, write_path: pathlib.Path, from_time: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, to_time: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, processors: Optional[List[resistics.time.TimeProcess]] = None) -> None

Reformat miniseed data into resistics numpy format in intervals

Parameters
  • dir_path (Path) – The directory with the miniseed files

  • fs (float) – The sampling frequency to extract

  • id_map (Dict[str, str]) – Map from trace ids to be extracted to channel names

  • chunk_time (pd.Timedelta) – The intervals to extract the data in, for example 1H, 12H, 1D

  • write_path (Path) – The path to write out the TimeData to

  • from_time (Optional[pd.Timestamp], optional) – Optionally provide a from time, by default None. If None, the from time will be the earliest timestamp shared by all traces that are requested to be reformatted

  • to_time (Optional[pd.Timestamp], optional) – Optionally provide a to time, by default None. If None, the to time will be the latest timestamp shared by all traces that are requested to be reformatted

  • processors (Optional[List[TimeProcess]], optional) – Any processors to run, by default None. For example resampling of data.
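Extracting the data in intervals of chunk_time can be pictured as splitting the range from from_time to to_time into consecutive windows. The sketch below is one way to do that. The helper `chunk_intervals` is hypothetical, and ending each window one sample (`dt`) before the next one starts is an assumption made here to avoid duplicated samples, not confirmed library behaviour.

```python
import pandas as pd


def chunk_intervals(from_time, to_time, chunk_time, dt):
    """Split from_time to to_time into inclusive windows of length chunk_time"""
    starts = pd.date_range(from_time, to_time, freq=chunk_time)
    intervals = []
    for start in starts:
        # End one sample before the next window, but never past to_time
        end = min(start + chunk_time - dt, to_time)
        intervals.append((start, end))
    return intervals


fs = 1.0  # 1 Hz data, so samples are one second apart
dt = pd.Timedelta(seconds=1 / fs)
intervals = chunk_intervals(
    pd.Timestamp("2021-01-01 00:00:00"),
    pd.Timestamp("2021-01-01 02:30:00"),
    pd.Timedelta(hours=1),
    dt,
)
```

With these inputs the sketch produces three windows: two full hours and a final half hour ending at to_time.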