resistics_readers.miniseed.mseed module
Reformatting for miniseed data
Miniseed data files can contain traces recorded at multiple sampling frequencies. Resistics does not support data files with multiple sampling frequencies; therefore, miniseed data files need to be reformatted before they can be used.
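As a rough illustration of why reformatting is needed, the sketch below groups traces by sampling frequency so that only traces at a single frequency are extracted together. `TraceInfo` and `group_by_fs` are hypothetical names. The record stands in for the id and sampling rate that obspy exposes on each trace, and the trace ids follow the NET.STA.LOC.CHA SEED convention but are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TraceInfo:
    """Hypothetical stand-in for the id and sampling rate of a miniseed trace."""

    trace_id: str
    fs: float


def group_by_fs(traces: List[TraceInfo]) -> Dict[float, List[str]]:
    """Group trace ids by their sampling frequency."""
    groups: Dict[float, List[str]] = defaultdict(list)
    for trace in traces:
        groups[trace.fs].append(trace.trace_id)
    return dict(groups)


traces = [
    TraceInfo("XX.STA.00.HHZ", 100.0),
    TraceInfo("XX.STA.00.LHZ", 1.0),
    TraceInfo("XX.STA.00.HHN", 100.0),
]
# {100.0: ['XX.STA.00.HHZ', 'XX.STA.00.HHN'], 1.0: ['XX.STA.00.LHZ']}
print(group_by_fs(traces))
```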
Warning
This is an initial implementation and there are likely to be numerous scenarios where it breaks. It is suggested to thoroughly compare the output with that of mseed2ascii before use. https://github.com/iris-edu/mseed2ascii
Contributors are welcome and encouraged to improve the code and, in particular, to add tests that ensure the reformatting occurs as expected.
- exception resistics_readers.miniseed.mseed.NoDataInInterval(first_time: pandas._libs.tslibs.timestamps.Timestamp, last_time: pandas._libs.tslibs.timestamps.Timestamp, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp)
Bases: Exception
Raised when there is no trace data in the interval from_time to to_time.
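A minimal sketch of what an exception with this signature could look like, together with the kind of overlap check that would raise it. The attribute names, the message text and the `check_interval` helper are assumptions, not the module's code.

```python
import pandas as pd


class NoDataInInterval(Exception):
    """Sketch: carries both the data range and the requested range."""

    def __init__(
        self,
        first_time: pd.Timestamp,
        last_time: pd.Timestamp,
        from_time: pd.Timestamp,
        to_time: pd.Timestamp,
    ):
        self.first_time = first_time
        self.last_time = last_time
        self.from_time = from_time
        self.to_time = to_time
        super().__init__(
            f"No data in interval {from_time} to {to_time}; "
            f"data covers {first_time} to {last_time}"
        )


def check_interval(
    first_time: pd.Timestamp,
    last_time: pd.Timestamp,
    from_time: pd.Timestamp,
    to_time: pd.Timestamp,
) -> None:
    """Hypothetical helper: raise if [from_time, to_time] misses the data range."""
    if to_time < first_time or from_time > last_time:
        raise NoDataInInterval(first_time, last_time, from_time, to_time)
```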
- resistics_readers.miniseed.mseed.get_miniseed_stream(data_path: pathlib.Path)
Get the miniseed file stream
- resistics_readers.miniseed.mseed.get_streams(data_paths: List[pathlib.Path]) → Dict[pathlib.Path, obspy.core.stream.Stream]
Get the stream object for each data_path
- Parameters
data_paths (List[Path]) – The data paths
- Returns
The stream objects
- Return type
Dict[Path, Stream]
- resistics_readers.miniseed.mseed.get_table(streams: Dict[pathlib.Path, obspy.core.stream.Stream], trace_ids: List[str]) → pandas.core.frame.DataFrame
Get a table with the start and end times of each trace of interest in each file
The table additionally contains the index of each trace within every file
- Parameters
streams (Dict[Path, Stream]) – Dictionary of file paths to streams
trace_ids (List[str]) – The ids of the traces that are of interest
- Returns
The data table
- Return type
pd.DataFrame
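The table can be sketched with plain pandas. Here `file_traces` is a hypothetical pre-extracted mapping from each file to its traces, in file order, as (trace id, first time, last time) tuples; with obspy these values would come from `trace.id`, `trace.stats.starttime` and `trace.stats.endtime`. The function name and column names are assumptions.

```python
from pathlib import Path
from typing import Dict, List, Tuple

import pandas as pd

# Hypothetical per-trace metadata: (trace id, first time, last time)
TraceMeta = Tuple[str, pd.Timestamp, pd.Timestamp]
COLUMNS = ["path", "trace_id", "trace_index", "first_time", "last_time"]


def get_table_sketch(
    file_traces: Dict[Path, List[TraceMeta]], trace_ids: List[str]
) -> pd.DataFrame:
    """One row per (file, requested trace) with the trace index and time range."""
    rows = []
    for path, traces in file_traces.items():
        # the index is the trace position in the file, counted over all traces
        for trace_index, (trace_id, first_time, last_time) in enumerate(traces):
            if trace_id not in trace_ids:
                continue  # skip traces that were not requested
            rows.append(
                {
                    "path": path,
                    "trace_id": trace_id,
                    "trace_index": trace_index,
                    "first_time": first_time,
                    "last_time": last_time,
                }
            )
    return pd.DataFrame(rows, columns=COLUMNS)
```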
- resistics_readers.miniseed.mseed.get_first_last_times(table: pandas.core.frame.DataFrame) → Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]
Get the minimum first time and maximum last time for the data
Each trace may have different date ranges in the miniseed files. This function calculates the first and last times where data is present for each requested trace.
- Parameters
table (pd.DataFrame) – The information table with the details about trace duration in each data file
- Returns
The first and last time
- Return type
Tuple[pd.Timestamp, pd.Timestamp]
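One reading of the description is that the usable range is the overlap of all traces: for each trace take its earliest start and latest end across files, then take the latest of those starts and the earliest of those ends. A sketch, assuming the table has `trace_id`, `first_time` and `last_time` columns (hypothetical names) and that each trace has no gaps between its first and last times:

```python
from typing import Tuple

import pandas as pd


def get_common_range(table: pd.DataFrame) -> Tuple[pd.Timestamp, pd.Timestamp]:
    """Return the span over which every trace in the table has data."""
    grouped = table.groupby("trace_id")
    # per trace: earliest start and latest end across all files
    trace_firsts = grouped["first_time"].min()
    trace_lasts = grouped["last_time"].max()
    # common range: latest of the starts to earliest of the ends
    return trace_firsts.max(), trace_lasts.min()
```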
- resistics_readers.miniseed.mseed.get_streams_to_read(trace_id: str, table: pandas.core.frame.DataFrame, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp) → pandas.core.frame.DataFrame
Get the streams to read and the time intervals to read for each stream
Note that the time intervals found cover from_time to to_time inclusive
- Parameters
trace_id (str) – The trace id
table (pd.DataFrame) – The table with details about date ranges covered by each miniseed file
from_time (pd.Timestamp) – The time to get data from
to_time (pd.Timestamp) – The time to get data to
- Returns
A row for each data file to read and the time range to read from it
- Return type
pd.DataFrame
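A sketch of the selection, assuming the same hypothetical `trace_id`, `first_time` and `last_time` columns: keep the files whose range for the trace overlaps the requested interval, then clip each file's range so that reading starts no earlier than from_time and ends no later than to_time. The function name and the `read_from` and `read_to` column names are assumptions.

```python
import pandas as pd


def get_streams_to_read_sketch(
    trace_id: str, table: pd.DataFrame, from_time: pd.Timestamp, to_time: pd.Timestamp
) -> pd.DataFrame:
    """Rows of files overlapping [from_time, to_time] for one trace, with read ranges."""
    rows = table[table["trace_id"] == trace_id]
    # keep files whose data overlaps the requested interval (both ends inclusive)
    overlaps = (rows["first_time"] <= to_time) & (rows["last_time"] >= from_time)
    rows = rows[overlaps].copy()
    # clip each file's range to the requested interval
    rows["read_from"] = rows["first_time"].apply(lambda t: max(t, from_time))
    rows["read_to"] = rows["last_time"].apply(lambda t: min(t, to_time))
    return rows
```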
- resistics_readers.miniseed.mseed.get_stream_data(dt: pandas._libs.tslibs.timedeltas.Timedelta, stream: obspy.core.stream.Stream, trace_index: int, read_from: pandas._libs.tslibs.timestamps.Timestamp, read_to: pandas._libs.tslibs.timestamps.Timestamp) → numpy.ndarray
Get data for a single trace from a stream
- Parameters
dt (pd.Timedelta) – The sampling period
stream (Stream) – The miniseed file stream
trace_index (int) – The index of the trace
read_from (pd.Timestamp) – The time to read from
read_to (pd.Timestamp) – The time to read to
- Returns
The trace data from the stream
- Return type
np.ndarray
- Raises
ValueError – If the expected number of samples is not an integer. This is currently a safety-first approach until more testing is done
ValueError – If the expected number of samples does not equal the number of samples returned by the trace in the time interval
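The two checks can be sketched as follows. With both ends inclusive, the expected number of samples is (read_to - read_from) / dt + 1, and a non-integer result is rejected. The helper names are hypothetical, and the `data` array stands in for the samples obtained by slicing the trace (for example with obspy's `Trace.slice`).

```python
import numpy as np
import pandas as pd


def expected_samples(
    dt: pd.Timedelta, read_from: pd.Timestamp, read_to: pd.Timestamp
) -> int:
    """Number of samples from read_from to read_to inclusive at sampling period dt."""
    n_intervals = (read_to - read_from) / dt  # Timedelta / Timedelta gives a float
    if not float(n_intervals).is_integer():
        raise ValueError(f"Expected samples {n_intervals + 1} is not an integer")
    return int(n_intervals) + 1


def check_samples(
    data: np.ndarray, dt: pd.Timedelta, read_from: pd.Timestamp, read_to: pd.Timestamp
) -> np.ndarray:
    """Raise if the sliced trace data does not hold the expected number of samples."""
    n_expected = expected_samples(dt, read_from, read_to)
    if data.size != n_expected:
        raise ValueError(f"Expected {n_expected} samples, trace gave {data.size}")
    return data
```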
- resistics_readers.miniseed.mseed.get_trace_data(fs: float, streams: Dict[pathlib.Path, obspy.core.stream.Stream], streams_to_read: Dict[str, Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]], from_time: pandas._libs.tslibs.timestamps.Timestamp, n_samples: int) → numpy.ndarray
Get n_samples of data for a single trace, beginning at from_time
- Parameters
fs (float) – The sampling frequency
streams (Dict[Path, Stream]) – The streams
streams_to_read (Dict[str, Tuple[pd.Timestamp, pd.Timestamp]]) – The streams to read for this trace and time interval
from_time (pd.Timestamp) – The time to get the data from
n_samples (int) – The number of samples to get
- Returns
The data
- Return type
np.ndarray
- Raises
ValueError – If converting the read_from time to a number of samples does not give an integer. This is a safety-first approach, but problems could be encountered at very high sampling frequencies, where much more testing of the expected behaviour is needed
ValueError – If converting the read_to time to a number of samples does not give an integer. This is a safety-first approach, but problems could be encountered at very high sampling frequencies, where much more testing of the expected behaviour is needed
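To assemble a trace from several files, each piece's read_from and read_to times can be converted to sample indices relative to from_time and copied into a preallocated array. In this sketch `pieces` is a hypothetical simplification of streams_to_read that already carries the data. Leaving uncovered samples as NaN is an assumption about how gaps are handled.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd

# Hypothetical piece: (read_from, read_to, samples read from one file)
Piece = Tuple[pd.Timestamp, pd.Timestamp, np.ndarray]


def to_sample_index(fs: float, from_time: pd.Timestamp, time: pd.Timestamp) -> int:
    """Convert a time to a sample index relative to from_time."""
    index = (time - from_time).total_seconds() * fs
    if not float(index).is_integer():
        raise ValueError(f"Time {time} does not fall on a sample")
    return int(index)


def assemble_trace(
    fs: float, pieces: List[Piece], from_time: pd.Timestamp, n_samples: int
) -> np.ndarray:
    """Copy each piece into its place in an array of n_samples (gaps stay NaN)."""
    data = np.full(n_samples, np.nan)
    for read_from, read_to, values in pieces:
        start = to_sample_index(fs, from_time, read_from)
        end = to_sample_index(fs, from_time, read_to)
        data[start : end + 1] = values  # read_to is inclusive
    return data
```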
- resistics_readers.miniseed.mseed.get_time_data(fs: float, id_map: Dict[str, str], streams: Dict[pathlib.Path, obspy.core.stream.Stream], table: pandas.core.frame.DataFrame, first_time: pandas._libs.tslibs.timestamps.Timestamp, last_time: pandas._libs.tslibs.timestamps.Timestamp, from_time: pandas._libs.tslibs.timestamps.Timestamp, to_time: pandas._libs.tslibs.timestamps.Timestamp) → resistics.time.TimeData
Get time data covering from_time to to_time
- Parameters
fs (float) – The sampling frequency
id_map (Dict[str, str]) – The map from trace id to channel
streams (Dict[Path, Stream]) – The streams
table (pd.DataFrame) – The table with information about trace ranges for each file
first_time (pd.Timestamp) – The common first_time for all traces and streams
last_time (pd.Timestamp) – The common last_time for all traces and streams
from_time (pd.Timestamp) – The from time for this interval of data
to_time (pd.Timestamp) – The to time for this interval of data
- Returns
The time data covering from_time to to_time
- Return type
TimeData
- Raises
NoDataInInterval – If there is no trace data in the interval from_time to to_time
ValueError – If the number of samples in the interval is not an integer. This is a safety-first approach for now that could fail at very high sampling frequencies, where much more thorough testing is needed.
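A sketch of the sample-count step and the renaming of traces to channels through id_map. `read_trace` is a hypothetical callable standing in for get_trace_data. The result is a plain dict of channel to data rather than a resistics TimeData, whose construction is not covered here.

```python
from typing import Callable, Dict

import numpy as np
import pandas as pd


def interval_samples(fs: float, from_time: pd.Timestamp, to_time: pd.Timestamp) -> int:
    """Samples from from_time to to_time inclusive; must be a whole number."""
    n = (to_time - from_time).total_seconds() * fs + 1
    if not float(n).is_integer():
        raise ValueError(f"Number of samples {n} is not an integer")
    return int(n)


def collect_channels(
    fs: float,
    id_map: Dict[str, str],
    from_time: pd.Timestamp,
    to_time: pd.Timestamp,
    read_trace: Callable[[str, pd.Timestamp, int], np.ndarray],
) -> Dict[str, np.ndarray]:
    """Read each requested trace and store it under its channel name."""
    n_samples = interval_samples(fs, from_time, to_time)
    return {
        channel: read_trace(trace_id, from_time, n_samples)
        for trace_id, channel in id_map.items()
    }
```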
- resistics_readers.miniseed.mseed.get_processed_data(time_data: resistics.time.TimeData, processors: List[resistics.time.TimeProcess]) → resistics.time.TimeData
Process time data
- Parameters
time_data (TimeData) – TimeData to process
processors (List[TimeProcess]) – The processors to run
- Returns
The processed TimeData
- Return type
TimeData
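Running the processors amounts to feeding each output into the next processor, in order. The sketch assumes each processor exposes a `run` method that takes and returns data. Resistics processes appear to follow this pattern, but the processors here are hypothetical and operate on plain NumPy arrays rather than TimeData.

```python
from typing import List, Protocol

import numpy as np


class Processor(Protocol):
    """Anything with a run method, standing in for a resistics TimeProcess."""

    def run(self, data: np.ndarray) -> np.ndarray: ...


class Scale:
    """Hypothetical processor multiplying the data by a constant."""

    def __init__(self, factor: float):
        self.factor = factor

    def run(self, data: np.ndarray) -> np.ndarray:
        return data * self.factor


class Decimate:
    """Hypothetical processor keeping every n-th sample (no anti-alias filter)."""

    def __init__(self, n: int):
        self.n = n

    def run(self, data: np.ndarray) -> np.ndarray:
        return data[:: self.n]


def get_processed_data_sketch(
    data: np.ndarray, processors: List[Processor]
) -> np.ndarray:
    """Run the processors in order, passing each output to the next."""
    for processor in processors:
        data = processor.run(data)
    return data
```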
- resistics_readers.miniseed.mseed.reformat(dir_path: pathlib.Path, fs: float, id_map: Dict[str, str], chunk_time: pandas._libs.tslibs.timedeltas.Timedelta, write_path: pathlib.Path, from_time: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, to_time: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, processors: Optional[List[resistics.time.TimeProcess]] = None) → None
Reformat miniseed data into resistics numpy format in intervals
- Parameters
dir_path (Path) – The directory with the miniseed files
fs (float) – The sampling frequency being extracted
id_map (Dict[str, str]) – Map from trace ids to be extracted to channel names
chunk_time (pd.Timedelta) – The length of the intervals in which to extract the data, for example 1H, 12H or 1D
write_path (Path) – The path to write out the TimeData to
from_time (Optional[pd.Timestamp], optional) – Optionally provide a from time, by default None. If None, the from time will be the earliest timestamp shared by all traces that are requested to be reformatted
to_time (Optional[pd.Timestamp], optional) – Optionally provide a to time, by default None. If None, the to time will be the latest timestamp shared by all traces that are requested to be reformatted
processors (Optional[List[TimeProcess]], optional) – Any processors to run, by default None. For example resampling of data.
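reformat extracts the data in intervals of chunk_time. A minimal sketch of how such intervals could be generated, assuming each chunk ends one sampling period before the next begins so that no sample is written twice. Each interval would then presumably be read, optionally processed and written to write_path. `chunk_intervals` is a hypothetical helper.

```python
from typing import List, Tuple

import pandas as pd


def chunk_intervals(
    fs: float,
    from_time: pd.Timestamp,
    to_time: pd.Timestamp,
    chunk_time: pd.Timedelta,
) -> List[Tuple[pd.Timestamp, pd.Timestamp]]:
    """Split [from_time, to_time] into consecutive, non-overlapping chunks."""
    dt = pd.Timedelta(seconds=1 / fs)  # sampling period
    intervals = []
    start = from_time
    while start <= to_time:
        # stop one sample before the next chunk starts, or at to_time
        end = min(start + chunk_time - dt, to_time)
        intervals.append((start, end))
        start = start + chunk_time
    return intervals
```

For example, with fs=2.0, a chunk_time of 12 hours and one day of data ending half a second before midnight, this yields two intervals.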