ReadData
- class SignalCreation.Utils.ReadData.ReadData(path_file=None, none_option=False, fill_value=None, size_min=None, size_max=None)
# Description of the class
Blueprint for any class gathering data by reading one or several files. The aim of the class is to provide a common skeleton for every data-gathering class by organizing the data around objects from the xarray and pint packages. The class also gathers a large collection of methods for reading files, manipulating pint objects and manipulating xarray datasets. The motivation is twofold: first, to standardize the handling of complex objects across the whole program and thus ensure they are used correctly (mistakes are harder to make, and some behaviour, such as giving a unit to any data, can be enforced); second, to make complex objects much easier to use (user-friendly xarray-related methods let someone with no knowledge of the xarray package start working with the code quickly).
# Argument
- path_file: str or list of str
If the data is stored in a single file, path_file should be a string giving the path to that file. If the data is stored in several files, path_file should be the list of strings giving the paths to those files.
# Main attributes
- _unit_registry: UnitRegistry (class inherited from pint.UnitRegistry)
Collection of all available units. This attribute is set to one specific, shared instance of UnitRegistry, so that every unit in any sub-class of ReadData comes from the same registry. This avoids the runtime errors pint raises when quantities from different registries are combined.
- _file: Python file object or list of Python file objects
The file(s) from the input path, ready to be read using native Python methods
- _data: xr.Dataset
The dataset storing the data read from the file, together with their coordinates and attributes (metadata). The values in the dataset are stored as pint Quantity objects so that values are always associated with their units.
# Available methods
The class contains the following types of methods:
- Methods to override to make the class actually useful
- Methods facilitating the read of the input file:
Those methods are usually based on native Python methods and format the content of a file on the fly, which makes reading methods easier to write.
- Mandatory methods to fill the data array
Those methods are the only methods that should be used to modify the dataset _data. They take simple arguments and use xarray methods to fill _data. Restricting all modifications of _data to these methods ensures that strict rules are respected, and also makes xarray even more user-friendly.
- Methods to extract information
Generic methods that make the data stored in _data accessible and easy to manipulate. ReadData sub-classes should have get_information methods (where ‘information’ is replaced by the actual name of the quantity: pressure, temperature, etc.) that call the methods in this section.
- Methods used to deal with the units
Methods used to make the pint package compatible with this new structure
- Utilitarian methods
Collection of various methods useful for the class
# How to use it - Good practices
ReadData is an abstract class that cannot be used as such. Sub-classes inheriting from ReadData must be created to make it useful.
- Those sub-classes must have at least two methods to work:
A method overriding _fill_data_set, which fills the xarray dataset _data from the data stored in the file and is used to initialize the instance
One or several ‘get_information’ methods to access the data stored in _data
The _fill_data_set method should always modify _data using the methods from the ‘Mandatory methods to fill the data array’ category.
Coordinates must always be created using ReadData’s dedicated method (create_coordinates) to avoid errors resulting from misuse of the coordinate unit.
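The sub-classing pattern described above can be sketched as follows. This is a minimal illustration under loud assumptions, not ReadData's actual implementation: `MiniReadData`, `_add_variable`, `PressureReader` and `get_pressure` are hypothetical names, a plain dict stands in for the xarray Dataset, and a (values, unit) pair stands in for a pint Quantity, so that the sketch runs with the standard library only.

```python
from abc import ABC, abstractmethod


class MiniReadData(ABC):
    """Hypothetical, stripped-down stand-in for ReadData.

    Assumption: a plain dict replaces the xarray Dataset so the sketch
    runs with the standard library only.
    """

    def __init__(self, lines):
        # ReadData opens the file(s) given by path_file; here we take lines directly.
        self._file = list(lines)
        self._data = {}
        self._fill_data_set()  # sub-classes fill _data at initialization

    @abstractmethod
    def _fill_data_set(self):
        """Fill self._data from self._file; must be overridden."""

    def _add_variable(self, name, values, unit):
        # Hypothetical "mandatory fill" helper: every variable is stored with a unit.
        self._data[name] = {"values": list(values), "unit": unit}


class PressureReader(MiniReadData):
    """Hypothetical sub-class reading one pressure value per non-empty line."""

    def _fill_data_set(self):
        values = [float(line) for line in self._file if line.strip()]
        self._add_variable("pressure", values, "hPa")

    def get_pressure(self):
        # 'get_information' method: access goes through the stored data only.
        entry = self._data["pressure"]
        return entry["values"], entry["unit"]


reader = PressureReader(["1013.2", "1009.8", ""])
print(reader.get_pressure())  # ([1013.2, 1009.8], 'hPa')
```

Declaring _fill_data_set with `abc.abstractmethod` is one way to make the base class impossible to instantiate on its own, which matches the rule that ReadData cannot be used as such; calling `MiniReadData([])` raises a TypeError.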
- Parameters:
path_file (str | list | None)
none_option (bool)
fill_value (str | list | None)
size_min (int | float | None)
size_max (int | float | None)
- attr_to_pint(sub_data)
Turn a string composed of a value and a unit into a pint object
- Parameters:
sub_data (DataArray | Dataset)
- Return type:
None
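Before a string such as "1013.25 hPa" can become a pint object, the value and the unit have to be separated. The helper below is a hypothetical sketch of that first step, assuming the value comes first and is separated from the unit by whitespace; it is not the actual body of attr_to_pint.

```python
def split_value_unit(text):
    """Split a string such as '1013.25 hPa' into (1013.25, 'hPa').

    Hypothetical helper. Assumption: the numeric value comes first and is
    separated from the unit by whitespace.
    """
    value_str, _, unit = text.strip().partition(" ")
    return float(value_str), unit.strip()


print(split_value_unit("1013.25 hPa"))  # (1013.25, 'hPa')
print(split_value_unit("3 m / s"))  # (3.0, 'm / s')
```

With pint, the resulting pair could then be handed to the shared registry, for example `ureg.Quantity(value, unit)` where `ureg` is a pint UnitRegistry.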
- static attribute_pint_to_str(data)
Turn every pint attribute of data (whether it is an attribute of the dataset or of an array in the dataset) into a str
- Parameters:
data (DataArray | Dataset)
- Return type:
None
- attribute_str_to_pint(data)
Turn every str attribute of data (whether it is an attribute of the dataset or of an array in the dataset) into a pint object
- Parameters:
data (DataArray | Dataset)
- Return type:
None
- convert_from_to(values, original_unit, new_unit)
Convert a unitless value or a list of unitless values from original_unit to new_unit
- static create_coordinates(coord_name, coord_values, unit)
The mandatory method for creating coordinates. Returns a data array containing the values, using the values themselves as coordinates, and an attribute storing their unit.
- Parameters:
coord_name (str) – The name of the physical quantity of the coordinate (pressure, temperature, etc.)
coord_values (ndarray | list) – The values taken by the coordinate
unit (str) – The unit of the coordinate
- Returns:
A data array representing the coordinate
- Return type:
DataArray
- create_netcdf(path)
Create a netCDF file at the path location (the path must include the file name)
- Parameters:
path (str)
- data()
Return a copy of the dataset _data
- Return type:
Dataset
- static first_of_list(in_list)
Return the first element of a list if it exists, else None
- Parameters:
in_list (list | ndarray)
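A possible sketch of first_of_list, assuming it only needs to handle sequences that support `len()` and indexing. Testing `len(in_list) > 0` rather than `if in_list` matters here: for a NumPy array holding more than one element, the truth value is ambiguous and `if in_list` raises a ValueError.

```python
def first_of_list(in_list):
    """Return the first element of in_list if it exists, else None."""
    # len() works for lists and NumPy arrays alike; `if in_list:` would raise
    # a ValueError for arrays holding more than one element.
    return in_list[0] if len(in_list) > 0 else None


print(first_of_list([3, 1, 2]))  # 3
print(first_of_list([]))  # None
```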
- go_to_line_with_key_word(key_words, i=None)
Go to the first line in the file containing specific key words and return that line
- Parameters:
key_words (str) – The keywords flagging the line
i (int, optional) – The number of the file in which the line must be found; use only if the class uses several files
- Returns:
The first line containing the keywords
- Return type:
str
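A minimal sketch of the keyword search for a single text file object. The standalone function signature is hypothetical (the documented method also takes an index i to choose among several files), and returning None when no line matches is an assumption, since the documented behaviour in that case is not specified. An io.StringIO object stands in for a file opened from path_file.

```python
import io


def go_to_line_with_key_word(file, key_words):
    """Advance `file` to the first line containing `key_words` and return it.

    Sketch for a single text file object. Assumption: returns None if no
    line matches.
    """
    for line in file:
        if key_words in line:
            return line
    return None


f = io.StringIO("header\n# DATA START\n1.0 2.0\n")
print(repr(go_to_line_with_key_word(f, "DATA")))  # '# DATA START\n'
print(repr(f.readline()))  # '1.0 2.0\n'
```

Because the loop consumes the file, the file object is left positioned just after the matched line, so the next read continues with the data that follows the flag.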
- go_to_line_without_key_word(key_words, i=None)
Go to the first line in the file missing specific keywords and return that line
- Parameters:
key_words (str) – The keywords flagging the lines to skip
i (int, optional) – The number of the file in which the lines must be skipped; use only if the class uses several files
- Returns:
The first line missing the keywords
- Return type:
str
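The complementary search, skipping flagged lines such as comment headers, can be sketched the same way. The single-file signature is hypothetical (no index i), and returning None when every line contains the keywords is an assumption.

```python
import io


def go_to_line_without_key_word(file, key_words):
    """Skip lines containing `key_words`; return the first line without them.

    Sketch for a single text file object. Assumption: returns None if every
    remaining line contains the keywords.
    """
    for line in file:
        if key_words not in line:
            return line
    return None


f = io.StringIO("# comment 1\n# comment 2\n42.0\n")
print(repr(go_to_line_without_key_word(f, "#")))  # '42.0\n'
```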
- static modify_date(date, day_offset, date_format='%d/%m/%Y')
Given a date in date_format, return the date (in the same format) shifted by day_offset days
- Parameters:
date (str)
day_offset (int)
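One plausible way to implement modify_date with the standard datetime module, shown as a sketch rather than the project's code: parse with the given format, add a timedelta, and format back. Relying on datetime arithmetic means month ends and leap years are handled without special cases.

```python
from datetime import datetime, timedelta


def modify_date(date, day_offset, date_format="%d/%m/%Y"):
    """Return `date` shifted by `day_offset` days, in the same format."""
    shifted = datetime.strptime(date, date_format) + timedelta(days=day_offset)
    return shifted.strftime(date_format)


print(modify_date("28/02/2024", 1))  # 29/02/2024 (2024 is a leap year)
print(modify_date("01/03/2023", -1))  # 28/02/2023
```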
- static remove_useless_space(line)
Remove any spaces at the beginning or the end of the line
- Parameters:
line (str)
- Return type:
str
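A one-line sketch of remove_useless_space, assuming only space characters should be removed. Passing `" "` to `str.strip` is a deliberate choice in this sketch: a bare `strip()` would also remove tabs and the trailing newline, which may or may not be what the real method does.

```python
def remove_useless_space(line):
    """Strip space characters at both ends of `line` (sketch: ' ' only)."""
    return line.strip(" ")


print(repr(remove_useless_space("  1013.25 hPa  ")))  # '1013.25 hPa'
```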
- str_to_pint_unit(unit_name)
Convert a string into a pint unit
- Parameters:
unit_name (str)
- Return type:
Unit
- static surrounding_values(in_value, in_list)
Return the two values from in_list that surround in_value
- Parameters:
in_value (float | int)
in_list (list | ndarray)
- Return type:
tuple[float, float]
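A sketch of surrounding_values using a binary search from the standard bisect module. Assumptions, not taken from the original: in_list holds at least two numbers, in_value lies between its minimum and maximum (inclusive), and the search runs on a sorted copy so the input order does not matter. When in_value equals an element, this sketch returns the interval ending at that element (or the first interval if it equals the minimum).

```python
from bisect import bisect_left


def surrounding_values(in_value, in_list):
    """Return the two consecutive values of in_list that bracket in_value.

    Sketch assumptions: in_list holds at least two numbers and in_value lies
    between its minimum and maximum (inclusive).
    """
    values = sorted(in_list)  # work on a sorted copy; the input is untouched
    if len(values) < 2:
        raise ValueError("in_list must contain at least two values")
    if not values[0] <= in_value <= values[-1]:
        raise ValueError(f"{in_value} is outside [{values[0]}, {values[-1]}]")
    # bisect_left gives the index of the first element >= in_value;
    # clamp to 1 so that idx - 1 is always a valid lower neighbour.
    idx = max(bisect_left(values, in_value), 1)
    return values[idx - 1], values[idx]


print(surrounding_values(15, [30, 0, 20, 10]))  # (10, 20)
print(surrounding_values(0, [0, 10, 20]))  # (0, 10)
```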
- static to_netcdf(data, path)
Save the input data set or data array as a netcdf file
- Parameters:
data (DataArray | Dataset)
path (str)
- Return type:
None
- update_data(in_data)
Set in_data as the new data array of the object
- Parameters:
in_data (DataArray | Dataset)
- Return type:
None