BibDataManagement#

The python package bibdatamanagement (available on Pypi) provides the BibDataManagement class that contains the method needed to read, plot or merge bib files.

How to start#

  1. Install the package in your Python interpreter:

pip install bibdatamanagement
  1. Import the class in your Python script and initiate a BibDataManagement instance:

from bibdata_management import BibDataManagement

bib_file = 'your_path/your_file.bib'
bibdata = BibDataManagement(bib_file)
  1. Transform the .bib data in a DataFrame:

df_bib = bibdata.get_data(entry='YOUR_ENTRY', set_name='YOUR_SET')

When initialising a BibDataManagement instance, one can also add a .csv file that contains the default value for parameters description (short name, long name, description).

bibdata = BibDataManagement(bib_file, 'your_default_file.csv')

Reading#

get_data#

BibDataManagement.get_data(set_name=None, entry=None)#

Returns a pandas DataFrame containing all data in the .bib file or a subset based on the given entry and set names.

Parameters:
  • set_name (str, optional) – Name of the set to filter by.

  • entry (str, optional) – Name of the technology to filter by.

Returns:

DataFrame containing the filtered data, or all data if no filters are applied.

Return type:

pandas.DataFrame

See also

filter_by

Examples

>>> bibdata = BibDataManagement(bib_file)
>>> bib_df = bibdata.get_data(entry='WoodtoDiesel', set_name='first')
>>> bib_df
                                     sets technology_key        value     unit
cite_key             technology_name
peduzzi_biomass_2015 WoodtoDiesel      []            trl     7.000000        -
                     WoodtoDiesel      []         cmaint    35.810000  MCHF/GW
                     WoodtoDiesel      []           cinv  1955.000000  MCHF/GW
                     WoodtoDiesel      []             cp     1.000000        -
                     WoodtoDiesel      []        refsize     0.001000       GW
                     WoodtoDiesel      []            gwp     0.000000   kt/GWh
                     WoodtoDiesel      []       lifetime    15.000000        y
                     WoodtoDiesel      []           Elec    -0.032695        -

statistics#

BibDataManagement.statistics(df=None)#

Compute statistics (min, max, median, average, number of values) for each parameter.

Parameters:

df (pd.DataFrame, optional) – A frame coming from BibData. If None, the self dataframe from the .bib file is used.

Return type:

A dataframe indexed by entry and parameters with the statistics.

Examples

>>> bibdata = BibDataManagement(bib_file)
>>> stats_df = bibdata.statistics()
>>> stats_df
                                  min          max  ...  nvalues        values
(WoodtoDiesel, trl)          7.000000     7.000000  ...        1         [7.0]
(WoodtoDiesel, cmaint)      35.810000    35.810000  ...        1       [35.81]
(WoodtoDiesel, cinv)      1955.000000  1955.000000  ...        1      [1955.0]
(WoodtoDiesel, cp)           1.000000     1.000000  ...        1         [1.0]
(WoodtoDiesel, refsize)      0.001000     0.001000  ...        1       [0.001]
(WoodtoDiesel, gwp)          0.000000     0.000000  ...        1         [0.0]
(WoodtoDiesel, lifetime)    15.000000    15.000000  ...        1        [15.0]
(WoodtoDiesel, Elec)        -0.032695    -0.032695  ...        1  [-0.0326945]

filter_by#

static BibDataManagement.filter_by(df, column, criteria)#

Used to filter the given DataFrame by a given colum and on given value(s) in that column.

Parameters:
  • df (pd.DataFrame) – DataFrame that should be filtered.

  • column (str) – Name of the column to use for the filter.

  • criteria (str or list) – Value(s) that the column should have to pass the filter.

Return type:

A DataFrame where the rows not corresponding to the filter have been removed.

Raises:

ValueError – If the column name is not among the dataframe.

Warns:

ValueError – If the value given for the column are not valid ones. In that case the filter is not applied.

See also

filter_by_param, filter_by_entry, filter_by_set

Plotting#

param_histogram#

BibDataManagement.param_histogram(tech, parameter, filename=None, export_format='png', auto_open=True)#

Visualisation of the values found in .bib for a given technology parameter in a histogram, generated from plotly library.

Parameters:
  • tech (str) – The technology name to filter the data on.

  • parameter (str) – The parameter name to filter the data on.

  • filename (str, optional (default=None)) – The filename to save the exported plot as. If None, the plot is not exported.

  • export_format (str, optional (default='png')) – The file format to export the plot in. Only applicable if filename is provided. Supported formats are ‘png’, ‘jpeg’, ‘webp’, ‘svg’, ‘pdf’.

  • auto_open (bool, optional (default=True)) – Whether to automatically open the plot in a new browser tab.

Returns:

An interactive histogram of the values that the technology parameter has among the literature.

Return type:

figure plotly object

Examples

>>> bib_object.param_histogram(tech='ANDIG', parameter='cinv')
../_images/histo.png

parallel_coord#

BibDataManagement.parallel_coord(df=pandas.DataFrame, tech=None, params=None, color_by='paper', filename=None, export_format='png', auto_open=True)#

Visualisation of the values found in .bib for every parameter in a parallel plot coordinates, generated from plotly library.

Parameters:
  • df (pd.DataFrame, optional) – The data to plot. If not passed, tech and params can also be used.

  • tech (str, optional) – The technology to filter the data by, if df is not given.

  • params (list, optional) – List of string with the params to filter the data by, if df is not given.

  • color_by (str, optional) – The variable to color the lines by. Among [‘paper’, ‘tech’, ‘set’, ‘combined’].

  • filename (str, optional) – The filename to save the plot.

  • export_format (str, optional) – The format to save the plot. Among [png, jpg, html].

  • auto_open (bool, optional) – Indicates whether to automatically open the plot.

Returns:

An interactive parallel coordinates plot of the values that the technology has among the literature.

Return type:

figure plotly object

Examples

>>> bib_object.parallel_coord()
../_images/par_coord.png

Others#

merge_bib#

classmethod BibDataManagement.merge_bib(obj2)#

Allows to merge together two BibDataManagement objects.

Parameters:
  • cls (BibDataManagement) – Self BibDataManagement instance.

  • obj2 (BibDataManagement) – The second instance that should be merged with the first one.

Returns:

A new BibDataManagement instance, resulting from the two references file.

Return type:

BibDataManagement

Examples

>>> bib_obj1 = BibDataManagement('bibliography.bib', 'parameters_description.csv')
>>> bib_obj2 = BibDataManagement('another_bibliography.bib', 'parameters_description.csv')
>>> bib_full = bib_obj1.merge_bib(bib_obj2)

export_df_to_bibtex#

static BibDataManagement.export_df_to_bibtex(df, output_path)#

Exports the given dataframe into a .bib file, writing the data in the Notes section.

Parameters:
  • df (pd.DataFrame) – Dataframe to export.

  • output_path (str) – Path of the file to save.

Notes

  • The path should end by .bib

Examples

>>> bib_data = BibDataManagement('my_bib_file.bib')
>>> stats = bib_data.build_additional_set(from_stat='avg')
>>> bib_data.export_df_to_bibtex(stats, 'my_new_file.bib')
Exported

add_default_values#

BibDataManagement.add_default_values(pattern)#

Adds an entry to the dataframe, from a string. The string should follow the usual syntax of Bibdatamanagement.

Parameters:

pattern (str) – A Bibdata entry, between +- to /+-

Returns:

The name of the technology where default values have been added.

Return type:

str

Examples

>>> tech_name = bib_object.add_default_values(
...        "+- enhOR# default: [default, energyscope]: that's my general description"
...        "cmaint = 0.5:1:2 [MCHF/(ktCO2*y)]"
...        "cinv = 0:1 [MCHF/(ktCO2*y)]"
...        "refsize = 0 [GW]"
...        "gwp = 0 [kgCO2/kWe]"
...        "lifetime = 0 [y]"
...        "CO2c = 0 [ktCO2]"
...        "+-")
>>> tech_name
enhOR

Notes

  • The function returns the name of the technology but in fine, the whole entry was added to the instance df.

build_additional_set#

BibDataManagement.build_additional_set(df=None, from_stat='median')#

Adds another set to a DataFrame from a statistical criteria.

Parameters:
  • df (pd.DataFrame, optional) – A frame coming from BibData. If None, the self dataframe from the .bib file is used.

  • from_stat (str, optional) – The statistic that will be used to add the rows. Choose among [‘median’, ‘avg’, ‘weighted_avg’, ‘min’, ‘max’].

Return type:

A DataFrame with the supplementary rows.

Notes

  • The added rows have the value from_stats in the sets column.

  • The weigths used for the weighted average are the confidence given.

Examples

>>> df_with_avg = bib_object.build_additional_set(from_stat='avg')
>>> df_with_avg
                                                           cite_key  ...     short_name
cite_key      technology_name technology_key                 ...
my_paper_name avion           trl             my_paper_name  ...     tech_ready
                              cmaint          my_paper_name  ...  my_param_name
                              cinv            my_paper_name  ...           Cinv
              avion           trl                            ...     tech_ready
                              cmaint                         ...  my_param_name
                              cinv                           ...           Cinv
[6 rows x 28 columns]

The BibDataManagementES class#

BibDataManagementES inherits from BibDataManagement. Because it has an additional field from the note format, some methods are adapted. Moreover, it has an export designed to be used directly in the model.

BibDataManagementES.get_data(set_name=None, entry=None, category_name=None)#

Returns a pandas DataFrame containing all data in the .bib file or a subset based on the given technology, set names or category names.

Parameters:
  • set_name (str, list, optional) – Name(s) of the set to filter by.

  • entry (str, optional) – Name of the technology to filter by.

  • category_name (str, list, optional) – Name(s) of the category to filter by.

Returns:

DataFrame containing the filtered data, or all data if no filters are applied.

Return type:

pandas.DataFrame

BibDataManagementES.export_df(df, output_path, to)#

Exports the given dataframe into a .bib file or an EnergyScope input file.

Parameters:
  • df (pd.DataFrame) – Dataframe to export.

  • output_path (str) – Path of the file to save.

  • to (str) – ‘bib’ for a reference export, ‘energyscope’ for .dat file input.