resistics_readers.miniseed.mseed module

Reformatting for miniseed data

Miniseed data can contain multiple sampling frequencies. Resistics does not support data files with multiple sampling frequencies, so miniseed data files need to be reformatted before they can be used.

Warning

This is an initial implementation and there are likely to be numerous scenarios where it breaks. Before relying on the reformatted data, compare it thoroughly with the output of mseed2ascii (miniseed to ASCII): https://github.com/iris-edu/mseed2ascii

Contributors are welcome and encouraged to improve the code and, in particular, to add tests that verify the reformatting is occurring as expected.

exception resistics_readers.miniseed.mseed.NoDataInInterval(first_time: pandas._libs.tslibs.timestamps.Timestamp, last_time: pandas._libs.tslibs.timestamps.Timestamp, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp)

Bases: Exception

resistics_readers.miniseed.mseed.get_miniseed_stream(data_path: pathlib.Path)

Get the miniseed file stream

resistics_readers.miniseed.mseed.get_streams(data_paths: List[pathlib.Path]) → Dict[pathlib.Path, obspy.core.stream.Stream]

Get the stream object for each data_path

Parameters: data_paths (List[Path]) – The data paths
Returns: The stream objects
Return type: Dict[Path, Stream]

resistics_readers.miniseed.mseed.get_table(streams: Dict[pathlib.Path, obspy.core.stream.Stream], trace_ids: List[str]) → pandas.core.frame.DataFrame

Get table with start and ends for each trace of interest in each file

The table additionally contains the trace index for each trace for every file

Parameters

streams (Dict[Path, Stream]) – Dictionary of file paths to streams
trace_ids (List[str]) – The ids of the traces that are of interest

Returns

The data table

Return type

pd.DataFrame

resistics_readers.miniseed.mseed.get_first_last_times(table: pandas.core.frame.DataFrame) → Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]

Get the minimum first time and maximum last time for the data

Each trace may have different date ranges in the miniseed files. This function calculates the first and last times where data is present for each requested trace.

Parameters: table (pd.DataFrame) – The information table with the details about trace duration in each data file
Returns: The first and last time
Return type: Tuple[pd.Timestamp, pd.Timestamp]
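The range calculation can be illustrated with a small sketch. It is not the module's implementation: the column names (trace_id, first_time, last_time) are hypothetical, and it assumes one plausible reading of the description, namely that each trace's overall extent is found first and the range shared by all traces is then taken.

```python
import pandas as pd

# Hypothetical table: one row per trace per file (column names are assumptions)
table = pd.DataFrame(
    {
        "trace_id": ["A", "A", "B", "B"],
        "first_time": pd.to_datetime(
            [
                "2021-01-01 00:00:00",
                "2021-01-02 00:00:00",
                "2021-01-01 06:00:00",
                "2021-01-02 00:00:00",
            ]
        ),
        "last_time": pd.to_datetime(
            [
                "2021-01-01 23:59:59",
                "2021-01-02 23:59:59",
                "2021-01-01 23:59:59",
                "2021-01-02 18:00:00",
            ]
        ),
    }
)


def common_first_last(table: pd.DataFrame):
    """Return the time range in which every trace has data (assumed logic)."""
    grouped = table.groupby("trace_id")
    # Earliest start per trace, then the latest of those starts
    first_time = grouped["first_time"].min().max()
    # Latest end per trace, then the earliest of those ends
    last_time = grouped["last_time"].max().min()
    return first_time, last_time


first_time, last_time = common_first_last(table)
print(first_time, last_time)
```

In this example trace B starts later and ends earlier than trace A, so the shared range is bounded by trace B.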

resistics_readers.miniseed.mseed.get_streams_to_read(trace_id: str, table: pandas.core.frame.DataFrame, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp) → pandas.core.frame.DataFrame

Get the streams to read and the time intervals to read for each stream

Note that this finds time intervals to cover from_time to to_time inclusive

Parameters

trace_id (str) – The trace id
table (pd.DataFrame) – The table with details about date ranges covered by each miniseed file
from_time (pd.Timestamp) – The time to get data from
to_time (pd.Timestamp) – The time to get data to

Returns

A row for each data file to read and the time range to read from it

Return type

pd.DataFrame
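As a rough illustration of selecting files and read intervals, the sketch below keeps every file whose time range overlaps from_time to to_time (inclusive) and clips the read range to that interval. The table layout and the column names (file, first_time, last_time, read_from, read_to) are assumptions made for the example and may differ from what the module uses.

```python
import pandas as pd


def files_to_read(table: pd.DataFrame, from_time: pd.Timestamp, to_time: pd.Timestamp) -> pd.DataFrame:
    """Select overlapping files and clip their read ranges (illustrative only)."""
    rows = []
    for row in table.itertuples(index=False):
        # Skip files that end before the interval or start after it
        if row.last_time < from_time or row.first_time > to_time:
            continue
        rows.append(
            {
                "file": row.file,
                "read_from": max(row.first_time, from_time),
                "read_to": min(row.last_time, to_time),
            }
        )
    return pd.DataFrame(rows, columns=["file", "read_from", "read_to"])


# Hypothetical table for a single trace: one file per day
table = pd.DataFrame(
    {
        "file": ["day1.mseed", "day2.mseed", "day3.mseed"],
        "first_time": pd.to_datetime(
            ["2021-01-01 00:00:00", "2021-01-02 00:00:00", "2021-01-03 00:00:00"]
        ),
        "last_time": pd.to_datetime(
            ["2021-01-01 23:59:59", "2021-01-02 23:59:59", "2021-01-03 23:59:59"]
        ),
    }
)

result = files_to_read(
    table,
    pd.Timestamp("2021-01-01 12:00:00"),
    pd.Timestamp("2021-01-02 06:00:00"),
)
print(result)
```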

resistics_readers.miniseed.mseed.get_stream_data(dt: pandas._libs.tslibs.timedeltas.Timedelta, stream: obspy.core.stream.Stream, trace_index: int, read_from: pandas._libs.tslibs.timestamps.Timestamp, read_to: pandas._libs.tslibs.timestamps.Timestamp) → numpy.ndarray

Get data for a single trace from a stream

Parameters

dt (pd.Timedelta) – The sampling period (the time between consecutive samples)
stream (Stream) – The miniseed file stream
trace_index (int) – The index of the trace
read_from (pd.Timestamp) – The time to read from
read_to (pd.Timestamp) – The time to read to

Returns

The trace data from the stream

Return type

np.ndarray

Raises

ValueError – If the expected number of samples is not an integer. This is currently a safety-first approach until more testing is done
ValueError – If the expected number of samples does not equal the number of samples returned by the trace in the time interval
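The integer check on the expected number of samples can be sketched as follows. The helper name expected_samples is hypothetical, and the sketch assumes that both read_from and read_to fall on samples and are included in the count; the module's actual arithmetic may differ.

```python
import pandas as pd


def expected_samples(dt: pd.Timedelta, read_from: pd.Timestamp, read_to: pd.Timestamp) -> int:
    """Number of samples from read_from to read_to inclusive (assumed convention)."""
    # Dividing one Timedelta by another gives a plain float
    n_samples = (read_to - read_from) / dt + 1
    if not float(n_samples).is_integer():
        raise ValueError(f"Expected number of samples {n_samples} is not an integer")
    return int(n_samples)


dt = pd.Timedelta(milliseconds=250)  # a 4 Hz sampling frequency
start = pd.Timestamp("2021-01-01 00:00:00")
n_samples = expected_samples(dt, start, start + pd.Timedelta(seconds=1))
print(n_samples)
```

With a read_to that does not fall on a sample, for example 1.1 seconds after the start, the helper raises ValueError instead of rounding silently.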

resistics_readers.miniseed.mseed.get_trace_data(fs: float, streams: Dict[pathlib.Path, obspy.core.stream.Stream], streams_to_read: Dict[str, Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]], from_time: pandas._libs.tslibs.timestamps.Timestamp, n_samples: int) → numpy.ndarray

Get data for a single trace beginning at from_time and for n_samples

Parameters

fs (float) – The sampling frequency
streams (Dict[Path, Stream]) – The streams
streams_to_read (Dict[str, Tuple[pd.Timestamp, pd.Timestamp]]) – The streams to read for this trace and time interval
from_time (pd.Timestamp) – The time to get the data from
n_samples (int) – The number of samples to get

Returns

The data

Return type

np.ndarray

Raises

ValueError – If converting the read_from date to samples does not give an integer. This is a safety-first approach, but problems could be encountered at very high sampling frequencies, in which case much more testing of the expected behaviour is needed
ValueError – If converting the read_to date to samples does not give an integer. This is a safety-first approach, but problems could be encountered at very high sampling frequencies, in which case much more testing of the expected behaviour is needed
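The idea of converting dates to sample indices and placing each file's data into a single array can be sketched like this. The helpers time_to_sample and assemble are hypothetical, and the pieces list stands in for data read from miniseed streams; gaps are left as NaN here, which is an assumption rather than documented behaviour.

```python
import numpy as np
import pandas as pd


def time_to_sample(fs: float, from_time: pd.Timestamp, time: pd.Timestamp) -> int:
    """Convert a time to a sample index relative to from_time."""
    n = (time - from_time).total_seconds() * fs
    if not float(n).is_integer():
        raise ValueError(f"{time} does not fall on a sample at fs={fs} Hz")
    return int(n)


def assemble(fs: float, from_time: pd.Timestamp, n_samples: int, pieces) -> np.ndarray:
    """Place (read_from, values) pieces into one array of n_samples."""
    data = np.full(n_samples, np.nan)  # samples not covered by any piece stay NaN
    for read_from, values in pieces:
        start = time_to_sample(fs, from_time, read_from)
        data[start : start + len(values)] = values
    return data


from_time = pd.Timestamp("2021-01-01 00:00:00")
# Two made-up pieces of one second each, sampled at 4 Hz
pieces = [
    (from_time, np.array([1.0, 2.0, 3.0, 4.0])),
    (from_time + pd.Timedelta(seconds=1), np.array([5.0, 6.0, 7.0, 8.0])),
]
data = assemble(4.0, from_time, 8, pieces)
print(data)
```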

resistics_readers.miniseed.mseed.get_time_data(fs: float, id_map: Dict[str, str], streams: Dict[pathlib.Path, obspy.core.stream.Stream], table: pandas.core.frame.DataFrame, first_time: pandas._libs.tslibs.timestamps.Timestamp, last_time: pandas._libs.tslibs.timestamps.Timestamp, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp) → resistics.time.TimeData

Get time data covering from_time to to_time

Parameters

fs (float) – The sampling frequency
id_map (Dict[str, str]) – The map from trace id to channel
streams (Dict[Path, Stream]) – The streams
table (pd.DataFrame) – The table with information about trace ranges for each file
first_time (pd.Timestamp) – The common first_time for all traces and streams
last_time (pd.Timestamp) – The common last_time for all traces and streams
from_time (pd.Timestamp) – The from time for this interval of data
to_time (pd.Timestamp) – The to time for this interval of data

Returns

The time data for the interval

Return type

TimeData

Raises

NoDataInInterval – If there is no trace data between from_time and to_time
ValueError – If the number of samples in the interval is not an integer. This is a safety-first approach for now and could fail at very high sampling frequencies, in which case much more thorough testing is needed.

resistics_readers.miniseed.mseed.get_processed_data(time_data: resistics.time.TimeData, processors: List[resistics.time.TimeProcess]) → resistics.time.TimeData

Process time data

Parameters

time_data (TimeData) – TimeData to process
processors (List[TimeProcess]) – The processors to run

Returns

The processed TimeData

Return type

TimeData

resistics_readers.miniseed.mseed.reformat(dir_path: pathlib.Path, fs: float, id_map: Dict[str, str], chunk_time: pandas._libs.tslibs.timedeltas.Timedelta, write_path: pathlib.Path, from_time: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, to_time: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, processors: Optional[List[resistics.time.TimeProcess]] = None) → None

Reformat miniseed data into resistics numpy format in intervals

Parameters

dir_path (Path) – The directory with the miniseed files
fs (float) – The sampling frequency to extract
id_map (Dict[str, str]) – Map from trace ids to be extracted to channel names
chunk_time (pd.Timedelta) – The intervals to extract the data in, for example 1H, 12H, 1D
write_path (Path) – The path to write out the TimeData to
from_time (Optional[pd.Timestamp], optional) – Optionally provide a from time, by default None. If None, the from time will be the earliest timestamp shared by all traces that are requested to be reformatted
to_time (Optional[pd.Timestamp], optional) – Optionally provide a to time, by default None. If None, the to time will be the latest timestamp shared by all traces that are requested to be reformatted
processors (Optional[List[TimeProcess]], optional) – Any processors to run, by default None. For example resampling of data.
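A call to reformat might look like the sketch below, based only on the signature documented above. Every value is a placeholder: the directory names, the sampling frequency, the trace ids and the channel names are hypothetical and need to be replaced with values matching the actual miniseed files. Running main() requires resistics_readers to be installed and the miniseed files to exist.

```python
from pathlib import Path

import pandas as pd

# Placeholder arguments: adjust every value to match the real data
reformat_kwargs = {
    "dir_path": Path("miniseed_files"),  # hypothetical input directory
    "fs": 128.0,  # hypothetical sampling frequency in Hz
    # Hypothetical trace ids mapped to hypothetical channel names
    "id_map": {"NET.STA.00.CH1": "Ex", "NET.STA.00.CH2": "Ey"},
    "chunk_time": pd.Timedelta(hours=12),  # extract the data in 12 hour intervals
    "write_path": Path("reformatted"),  # hypothetical output directory
    "from_time": pd.Timestamp("2021-01-01 00:00:00"),
    "to_time": pd.Timestamp("2021-01-03 00:00:00"),
}


def main() -> None:
    """Run the reformatting with the placeholder arguments."""
    # Imported here so the arguments can be prepared without resistics_readers
    from resistics_readers.miniseed.mseed import reformat

    reformat(**reformat_kwargs)
```

Leaving out from_time and to_time falls back to the defaults described in the parameter list, and processors is omitted here, so it keeps its default of None.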