ReadData

class SignalCreation.Utils.ReadData.ReadData(path_file=None, none_option=False, fill_value=None, size_min=None, size_max=None)

# Description of the class

Blueprint for any class that gathers data by reading one or several files. The class provides a common skeleton for every data-gathering class by organizing the data around objects from the xarray and pint packages. It also gathers a large collection of methods for reading files, manipulating pint objects and manipulating xarray datasets. The motivation is twofold: first, to standardize the handling of complex objects throughout the program and thus ensure they are used correctly (mistakes are harder to make, and some behaviours, such as giving a unit to every piece of data, can be enforced, …); second, to make complex objects much easier to use (user-friendly xarray-related methods let a person with no knowledge of the xarray package start manipulating the code quickly).

# Argument

path_file: str or list of str

If the data is stored in a single file, path_file should be a string giving the path to this file. If the data is stored in several files, path_file should be a list of strings giving the paths to those files.

# Main attributes

_unit_registry: UnitRegistry (class inherited from pint.UnitRegistry)

Collection of every available unit. This attribute is set to one specific instance of UnitRegistry, so that every unit in any sub-class of ReadData comes from the same UnitRegistry instance, which avoids runtime errors.

_file: Python file object or list of Python file objects

The file (or files) opened from the input path, ready to be read using native Python methods

_data: xr.Dataset

The dataset storing the data from the file, together with their coordinates, attributes (= metadata), … The values in the dataset are stored as pint Quantity objects so that values are always associated with their units.

# Available methods

The class contains six types of methods:

  • Methods to override to make the class actually useful

  • Methods facilitating the reading of the input file:

    Those methods are usually based on native Python methods and help format the content of a file on the fly, which makes the reading methods easier to write

  • Mandatory methods to fill the data array

    Those methods are the only methods that should be used to modify the dataset _data. They take simple arguments and use xarray methods to fill _data. Always going through those methods to modify _data ensures that strict rules are respected and makes xarray even more user friendly.

  • Methods to extract information

    Generic methods to make the data stored in _data accessible and easy to manipulate. ReadData sub-classes should have get_information methods (where ‘information’ is replaced by the actual name of the information: pressure, temperature, etc.) that call the methods in this category

  • Methods used to deal with the units

    Methods used to make the pint package compatible with this new structure

  • Utilitarian methods

    Collection of various methods useful for the class

# How to use it - Good practices

ReadData is an abstract class and cannot be used on its own. Sub-classes inheriting from ReadData must be created to make it useful.

Those sub-classes need at least two methods to work:
  • A method overriding _fill_data_set that fills the xarray dataset _data from the data stored in the file used to initialize the instance

  • One or several ‘get_information’ methods to access the data stored in _data

The _fill_data_set method should always modify _data using methods from the ‘Mandatory methods to fill the data array’ category.

Coordinates must always be created using ReadData’s dedicated method, to avoid errors resulting from misuse of the coordinate unit.
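The sub-classing pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the real class: ReadDataSketch is a hypothetical stand-in for ReadData, PressureProfile and its two-column file layout are invented, and the dataset is built directly with xarray because the names of the ‘mandatory methods’ are not listed on this page. A real sub-class should go through those methods instead.

```python
import io
from abc import ABC, abstractmethod

import numpy as np
import xarray as xr


class ReadDataSketch(ABC):
    """Hypothetical stand-in for ReadData, reduced to the override pattern."""

    def __init__(self, file_obj):
        self._file = file_obj
        self._data = xr.Dataset()
        self._fill_data_set()

    @abstractmethod
    def _fill_data_set(self):
        """Fill self._data from self._file; every sub-class must override this."""


class PressureProfile(ReadDataSketch):
    """Hypothetical sub-class reading 'altitude pressure' pairs, one per line."""

    def _fill_data_set(self):
        rows = [line.split() for line in self._file if line.strip()]
        altitude = np.array([float(row[0]) for row in rows])
        pressure = np.array([float(row[1]) for row in rows])
        # Assumption: units are kept in a "units" attribute for this sketch.
        self._data = xr.Dataset(
            {"pressure": ("altitude", pressure, {"units": "hPa"})},
            coords={"altitude": ("altitude", altitude, {"units": "km"})},
        )

    def get_pressure(self):
        # 'get_information' method, with 'information' replaced by 'pressure'
        return self._data["pressure"].copy()


profile = PressureProfile(io.StringIO("0 1013.25\n1 898.7\n"))
pressure = profile.get_pressure()
```

Because _fill_data_set is abstract, instantiating the base class directly raises TypeError, which mirrors the statement that ReadData cannot be used on its own.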

Parameters:
  • path_file (str | list | None)

  • none_option (bool)

  • fill_value (str | list | None)

  • size_min (int | float | None)

  • size_max (int | float | None)

attr_to_pint(sub_data)

Turn a string composed of a value and a unit into a pint object

Parameters:

sub_data (DataArray | Dataset)

Return type:

None

static attribute_pint_to_str(data)

Turn every pint attribute of data (whether it is an attribute of the dataset or of an array in the dataset) into a str

Parameters:

data (DataArray | Dataset)

Return type:

None

attribute_str_to_pint(data)

Turn every str attribute of data (whether it is an attribute of the dataset or of an array in the dataset) into a pint object

Parameters:

data (DataArray | Dataset)

Return type:

None

convert_from_to(values, original_unit, new_unit)

Convert a unitless value or a list of unitless values from original_unit to new_unit

static create_coordinates(coord_name, coord_values, unit)

The mandatory method to create coordinates. Returns a data array containing the values, with the values themselves as coordinates and an attribute storing their unit

Parameters:
  • coord_name (str) – The name of the physical quantity of the coordinate (pressure, temperature, etc.)

  • coord_values (ndarray | list) – The values taken by the coordinate

  • unit (str) – The unit of the coordinate

Returns:

A data array representing the coordinate

Return type:

DataArray
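A data array of the shape described above can be built with xarray as follows. This is a sketch rather than the method's actual body. In particular, the name of the attribute that stores the unit is not given on this page, so "units" is an assumption.

```python
import numpy as np
import xarray as xr


def create_coordinates_sketch(coord_name, coord_values, unit):
    """Build a 1-D DataArray whose values are also its own coordinate."""
    values = np.asarray(coord_values)
    return xr.DataArray(
        values,
        dims=[coord_name],
        coords={coord_name: values},
        name=coord_name,
        attrs={"units": unit},  # assumed attribute name
    )


pressure_coord = create_coordinates_sketch("pressure", [1000.0, 850.0, 500.0], "hPa")
```

Creating every coordinate through one function like this keeps the unit attached in the same place each time, which is the kind of misuse the class aims to prevent.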

create_netcdf(path)

Create a netCDF file at the path location (the file name must be included in path)

Parameters:

path (str)

data()

Return a copy of the dataset _data

Return type:

Dataset

static first_of_list(in_list)

Return the first element of a list if it exists, else None

Parameters:

in_list (list | ndarray)
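One way to write such a helper so that it accepts both lists and NumPy arrays is shown below. The function name is illustrative, and the use of len() rather than a truthiness test is a choice of this sketch, not something stated on this page.

```python
def first_of_list_sketch(in_list):
    """Return the first element of a list or array, or None if it is empty."""
    # len() is used instead of `if in_list:` because the truth value of a
    # NumPy array with more than one element is ambiguous and raises an error.
    return in_list[0] if len(in_list) > 0 else None


first = first_of_list_sketch([3, 4, 5])
nothing = first_of_list_sketch([])
```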

go_to_line_with_key_word(key_words, i=None)

Go to the first line in the file containing specific key words and return that line

Parameters:
  • key_words (str) – The keywords flagging the line

  • i (int, optional) – The number of the file in which the line must be found; use only if the class uses several files

Returns:

The first line containing the keywords

Return type:

str

go_to_line_without_key_word(key_words, i=None)

Go to the first line in the file that does not contain specific keywords and return that line

Parameters:
  • key_words (str) – The keywords flagging the lines to skip

  • i (int, optional) – The number of the file in which the lines must be skipped; use only if the class uses several files

Returns:

The first line missing the keywords

Return type:

str
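The two navigation methods above can be pictured as simple scans over an open file. The sketch below takes the file object as an explicit argument instead of using _file and the index i. Returning None when no line matches is an assumption, since this page does not say what happens in that case.

```python
import io


def go_to_line_with_key_word_sketch(file_obj, key_words):
    """Advance until a line contains key_words and return that line."""
    for line in file_obj:
        if key_words in line:
            return line
    return None  # assumed behaviour when nothing matches


def go_to_line_without_key_word_sketch(file_obj, key_words):
    """Advance until a line does not contain key_words and return that line."""
    for line in file_obj:
        if key_words not in line:
            return line
    return None  # assumed behaviour when every line matches


# io.StringIO stands in for a file opened from path_file.
handle = io.StringIO("# header\n# units: hPa\nDATA START\n1013.25\n")
marker = go_to_line_with_key_word_sketch(handle, "DATA")
first_value_line = handle.readline()  # reading resumes right after the marker

commented = io.StringIO("# a\n# b\nvalue\n")
first_uncommented = go_to_line_without_key_word_sketch(commented, "#")
```

Because the file position advances as a side effect, the caller can continue reading from the line that follows the one returned.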

static modify_date(date, day_offset, date_format='%d/%m/%Y')

Given a date in date_format, return the date (in same format) increased by day_offset

Parameters:
  • date (str)

  • day_offset (int)
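The behaviour described for modify_date can be reproduced with the standard datetime module. This is a sketch with an illustrative function name, not the method's actual body.

```python
from datetime import datetime, timedelta


def modify_date_sketch(date, day_offset, date_format="%d/%m/%Y"):
    """Shift a formatted date string by day_offset days, keeping the format."""
    parsed = datetime.strptime(date, date_format)
    shifted = parsed + timedelta(days=day_offset)
    return shifted.strftime(date_format)


after_leap_day = modify_date_sketch("28/02/2024", 2)   # "01/03/2024"
previous_year = modify_date_sketch("01/01/2023", -1)   # "31/12/2022"
iso_style = modify_date_sketch("2023-12-31", 1, date_format="%Y-%m-%d")  # "2024-01-01"
```

Going through datetime rather than manipulating the string handles month lengths, leap years and year boundaries, and a negative day_offset moves the date backwards.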

static remove_useless_space(line)

Remove any space at the beginning or the end of the line

Parameters:

line (str)

Return type:

str

str_to_pint_unit(unit_name)

Convert a string into a pint unit

Parameters:

unit_name (str)

Return type:

Unit

static surrounding_values(in_value, in_list)

Return the two values from in_list that surround in_value

Parameters:
  • in_value (float | int)

  • in_list (list | ndarray)

Return type:

tuple[float, float]
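A possible way to find the surrounding values is a binary search on the sorted list. The sketch below makes several assumptions that this page does not confirm: the list may be unsorted on input, a value equal to a list element is bracketed by that element and a neighbour, and a value outside the range of the list raises ValueError.

```python
from bisect import bisect_right


def surrounding_values_sketch(in_value, in_list):
    """Return the pair of neighbouring list values that bracket in_value."""
    ordered = sorted(in_list)
    if not ordered or in_value < ordered[0] or in_value > ordered[-1]:
        # Assumed behaviour: refuse values that the list cannot bracket.
        raise ValueError("in_value lies outside the range covered by in_list")
    upper_index = bisect_right(ordered, in_value)
    if upper_index == len(ordered):  # in_value equals the largest element
        upper_index -= 1
    lower_index = max(upper_index - 1, 0)
    return ordered[lower_index], ordered[upper_index]


inside = surrounding_values_sketch(2.5, [4, 1, 3, 2])   # (2, 3)
at_maximum = surrounding_values_sketch(4, [1, 2, 3, 4])  # (3, 4)
```

A pair like this is typically what a linear interpolation between two tabulated points needs.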

static to_netcdf(data, path)

Save the input data set or data array as a netcdf file

Parameters:
  • data (DataArray | Dataset)

  • path (str)

Return type:

None

update_data(in_data)

Set in_data as the new data array of the object

Parameters:

in_data (DataArray | Dataset)

Return type:

None

SignalCreation.Utils.ReadData.ReadData.attr_to_pint(self, sub_data)

Turn a string composed of a value and a unit into a pint object

Parameters:

sub_data (DataArray | Dataset)

Return type:

None