Python data loader for Solar Orbiter's (SolO) Energetic Particle Detector (EPD).

Overview

solo-epd-loader

Python data loader for Solar Orbiter's (SolO) Energetic Particle Detector (EPD). Provides level 2 (l2) and low latency (ll) data obtained through CDF files from ESA's Solar Orbiter Archive (SOAR) for the following sensors:

  • Electron Proton Telescope (EPT)
  • High Energy Telescope (HET)
  • SupraThermal Electrons and Protons (STEP)

Installation

solo_epd_loader requires Python >= 3.6 and depends on cdflib and heliopy, which are installed automatically. It can be installed from PyPI using:

pip install solo-epd-loader

Usage

The standard use case is to call the epd_load function, which returns Pandas dataframe(s) of the EPD measurements and a dictionary containing information on the energy channels.

from solo_epd_loader import epd_load

df_1, df_2, energies = \
    epd_load(sensor, viewing, level, startdate, enddate, path, autodownload)

Input

  • sensor: ept, het, or step (string)
  • viewing: sun, asun, north, or south (string); not needed for sensor = step
  • level: ll or l2 (string)
  • startdate, enddate: YYYYMMDD, e.g., 20210415 (integer) (if no enddate is provided, enddate = startdate will be used)
  • path: directory in which Solar Orbiter data is/should be organized; e.g. /home/userxyz/solo/data/ (string)
  • autodownload: if True, will try to download missing data files from SOAR (boolean)
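
Since startdate and enddate are integers of the form YYYYMMDD, it can be convenient to derive them from date objects. The helper below is a minimal sketch; to_yyyymmdd is a hypothetical name and not part of solo_epd_loader.

```python
from datetime import date


def to_yyyymmdd(day: date) -> int:
    """Convert a date to the integer YYYYMMDD form, e.g. 2021-04-15 -> 20210415.

    Hypothetical helper for illustration; not part of solo_epd_loader.
    """
    return day.year * 10000 + day.month * 100 + day.day


startdate = to_yyyymmdd(date(2021, 4, 15))
enddate = to_yyyymmdd(date(2021, 4, 16))
print(startdate, enddate)
```

The resulting integers can then be passed as the startdate and enddate arguments.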

Return

  • For sensor = ept or het:
    1. Pandas dataframe with proton fluxes and errors (for EPT also alpha particles) in ‘particles / (s cm^2 sr MeV)’
    2. Pandas dataframe with electron fluxes and errors in ‘particles / (s cm^2 sr MeV)’
    3. Dictionary with energy information for all particles:
      • String with energy channel info
      • Value of lower energy bin edge in MeV
      • Value of energy bin width in MeV
  • For sensor = step:
    1. Pandas dataframe with fluxes and errors in ‘particles / (s cm^2 sr MeV)’
    2. Dictionary with energy information for all particles:
      • String with energy channel info
      • Value of lower energy bin edge in MeV
      • Value of energy bin width in MeV
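
As a rough illustration of how the energy information can be used, the sketch below derives upper bin edges and geometric mean energies from the lower edges and widths. It works on a mock dictionary: the keys Electron_Bins_Text and Electron_Bins_Low_Energy appear in the examples further down, while Electron_Bins_Width is an assumed key name, and all numbers are made up rather than real EPD channel definitions.

```python
import numpy as np

# Mock dictionary standing in for the one returned by epd_load.
# "Electron_Bins_Width" is an ASSUMED key name; the values are made up.
energies = {
    "Electron_Bins_Text": np.array([["0.10 - 0.20 MeV"], ["0.20 - 0.40 MeV"]]),
    "Electron_Bins_Low_Energy": np.array([0.10, 0.20]),
    "Electron_Bins_Width": np.array([0.10, 0.20]),
}

low = energies["Electron_Bins_Low_Energy"]
width = energies["Electron_Bins_Width"]

# Upper edge of each bin and its geometric mean energy, both in MeV.
high = low + width
geometric_mean = np.sqrt(low * high)

for channel, text in enumerate(energies["Electron_Bins_Text"]):
    print(channel, text[0], f"{low[channel]:.2f}-{high[channel]:.2f} MeV")
```

With a real dictionary, inspect energies.keys() first, since the available keys may differ from the ones assumed here.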

Data folder structure

The path variable provided to the module should be the base directory under which the corresponding CDF data files are placed in subdirectories. The first subfolder defines the data product level (l2 or low_latency at the moment), the next one the instrument (so far only epd), and the last one the sensor (ept, het or step).

For example, the folder structure could look like this: /home/userxyz/solo/data/l2/epd/het. In this case, you should call the loader with path='/home/userxyz/solo/data'; i.e., the base directory for the data.

You can use the (automatic) download function described in the following section to create the subfolders automatically on first use. NB: You might need to run the code with sudo or admin privileges to be able to create new folders on your system.
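
The layout described above can be sketched with pathlib. This is only an illustration of the folder convention, not the loader's own code; sensor_dir and ensure_sensor_dir are hypothetical helpers, and the example writes into a temporary directory instead of a real data path.

```python
import tempfile
from pathlib import Path

# Subfolder name for each data level, following the description above.
LEVEL_DIRS = {"l2": "l2", "ll": "low_latency"}


def sensor_dir(base, level, sensor):
    """Return <base>/<level folder>/epd/<sensor> (hypothetical helper)."""
    return Path(base) / LEVEL_DIRS[level] / "epd" / sensor


def ensure_sensor_dir(base, level, sensor):
    """Create the sensor folder if needed and report missing permissions clearly."""
    target = sensor_dir(base, level, sensor)
    try:
        target.mkdir(parents=True, exist_ok=True)
    except PermissionError as err:
        raise PermissionError(
            f"Cannot create {target}; check the write permissions of {base}"
        ) from err
    return target


# Demonstrate with a temporary base directory.
with tempfile.TemporaryDirectory() as base:
    created = ensure_sensor_dir(base, "l2", "het")
    print(created.relative_to(base))
    created_ok = created.is_dir()
```

Creating the folders up front like this is one way to find out about missing write permissions before any download is attempted.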

Data download within Python

While using epd_load() to obtain the data, one can choose to automatically download missing data files from SOAR directly from within Python. They are saved in the folder provided by the path argument (see above). For that, just add autodownload=True to the function call:

import matplotlib.pyplot as plt
from solo_epd_loader import epd_load

df_protons, df_electrons, energies = \
    epd_load(sensor='het', viewing='sun', level='l2',
             startdate=20200820, enddate=20200821,
             path='/home/userxyz/solo/data/', autodownload=True)

# plot protons and alphas
ax = df_protons.plot(logy=True, subplots=True, figsize=(20,60))
plt.show()

# plot electrons
ax = df_electrons.plot(logy=True, subplots=True, figsize=(20,60))
plt.show()

Note: The code will always download the latest version of the file available at SOAR. So in case a file V01.cdf is already locally present, V02.cdf will be downloaded nonetheless.
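
To illustrate what "latest version" means when several versions of a file are present, the sketch below picks the highest version from a list of file names. It assumes that file names end in _V<number>.cdf, as in the V01.cdf and V02.cdf example above; the file names used here are placeholders, and latest_version is a hypothetical helper rather than the loader's implementation.

```python
import re

# Assumed naming convention: the file name ends in "_V<number>.cdf".
VERSION_RE = re.compile(r"_V(\d+)\.cdf$", re.IGNORECASE)


def latest_version(filenames):
    """Return the file name with the highest version number, or None."""
    versioned = []
    for name in filenames:
        match = VERSION_RE.search(name)
        if match:
            versioned.append((int(match.group(1)), name))
    # Tuples compare by version number first, so max() yields the newest file.
    return max(versioned)[1] if versioned else None


# Placeholder file names, not actual SOAR products.
files = ["example_20200820_V01.cdf", "example_20200820_V02.cdf", "notes.txt"]
print(latest_version(files))
```

Comparing the version as an integer avoids string-ordering surprises once version numbers have different lengths.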

Example 1 - low latency data

Example code that loads low latency (ll) electron and proton (+alphas) fluxes (and errors) for the EPT NORTH telescope from Apr 15 2021 to Apr 16 2021 into two Pandas dataframes (one for protons & alphas, one for electrons). In general, the viewing directions ‘sun’, ‘asun’, ‘north’, and ‘south’ are available for the ‘ept’ and ‘het’ telescopes of SolO/EPD.

import matplotlib.pyplot as plt
from solo_epd_loader import epd_load

df_protons, df_electrons, energies = \
    epd_load(sensor='ept', viewing='north', level='ll',
             startdate=20210415, enddate=20210416,
             path='/home/userxyz/solo/data/')

# plot protons and alphas
ax = df_protons.plot(logy=True, subplots=True, figsize=(20,60))
plt.show()

# plot electrons
ax = df_electrons.plot(logy=True, subplots=True, figsize=(20,60))
plt.show()

Example 2 - level 2 data

Example code that loads level 2 (l2) electron and proton (+alphas) fluxes (and errors) for the HET SUN telescope from Aug 20 2020 to Aug 21 2020 into two Pandas dataframes (one for protons & alphas, one for electrons).

import matplotlib.pyplot as plt
from solo_epd_loader import epd_load

df_protons, df_electrons, energies = \
    epd_load(sensor='het', viewing='sun', level='l2',
             startdate=20200820, enddate=20200821,
             path='/home/userxyz/solo/data/')

# plot protons and alphas
ax = df_protons.plot(logy=True, subplots=True, figsize=(20,60))
plt.show()

# plot electrons
ax = df_electrons.plot(logy=True, subplots=True, figsize=(20,60))
plt.show()

Example 3 - reproducing EPT data from Fig. 2 in Gómez-Herrero et al. 2021 [1]

import matplotlib.pyplot as plt
from solo_epd_loader import epd_load

# set your local path here
lpath = '/home/userxyz/solo/data'

# load data
df_protons, df_electrons, energies = \
    epd_load(sensor='ept', viewing='sun', level='l2', startdate=20200708,
             enddate=20200724, path=lpath, autodownload=True)

# change time resolution to get smoother curve (resample with mean)
resample = '60min'

fig, axs = plt.subplots(2, sharex=True)
fig.suptitle('EPT Sun')

# plot selection of channels
for channel in [0, 8, 16, 26]:
    df_electrons['Electron_Flux'][f'Electron_Flux_{channel}']\
        .resample(resample).mean().plot(ax = axs[0], logy=True,
        label=energies["Electron_Bins_Text"][channel][0])
for channel in [6, 22, 32, 48]:
    df_protons['Ion_Flux'][f'Ion_Flux_{channel}']\
        .resample(resample).mean().plot(ax = axs[1], logy=True,
        label=energies["Ion_Bins_Text"][channel][0])

axs[0].set_ylim([0.3, 4e6])
axs[1].set_ylim([0.01, 5e8])

axs[0].set_ylabel("Electron flux\n"+r"(cm$^2$ sr s MeV)$^{-1}$")
axs[1].set_ylabel("Ion flux\n"+r"(cm$^2$ sr s MeV)$^{-1}$")
axs[0].legend()
axs[1].legend()
plt.subplots_adjust(hspace=0)
plt.show()

NB: This is just an approximate reproduction with different energy channels (smaller, not combined) and different time resolution!
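
The resampling step used in this example can be tried in isolation. The sketch below applies the same resample-with-mean call to a synthetic one-minute series; the values are made up and only stand in for a single flux channel.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute series standing in for one flux channel (not real EPD data).
index = pd.date_range("2020-07-08", periods=180, freq="1min")
flux = pd.Series(np.arange(180, dtype=float), index=index, name="Electron_Flux_0")

# Average into 60-minute bins, as done for the plot above.
hourly = flux.resample("60min").mean()
print(hourly)
```

Each resampled value is the mean of all samples falling into the respective 60-minute bin, so three hours of one-minute data collapse into three points.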

Example 4 - reproducing EPT data from Fig. 2 in Wimmer-Schweingruber et al. 2021 [2]

import datetime

import matplotlib.pyplot as plt
import pandas as pd
from solo_epd_loader import epd_load

# set your local path here
lpath = '/home/userxyz/solo/data'

# load data
df_protons_sun, df_electrons_sun, energies = \
    epd_load(sensor='ept', viewing='sun', level='l2',
             startdate=20201210, enddate=20201211,
             path=lpath, autodownload=True)
df_protons_asun, df_electrons_asun, energies = \
    epd_load(sensor='ept', viewing='asun', level='l2',
             startdate=20201210, enddate=20201211,
             path=lpath, autodownload=True)
df_protons_south, df_electrons_south, energies = \
    epd_load(sensor='ept', viewing='south', level='l2',
             startdate=20201210, enddate=20201211,
             path=lpath, autodownload=True)
df_protons_north, df_electrons_north, energies = \
    epd_load(sensor='ept', viewing='north', level='l2',
             startdate=20201210, enddate=20201211,
             path=lpath, autodownload=True)

# plot mean intensities of two energy channels; 'channel' defines the lower one
channel = 6
ax = pd.concat([df_electrons_sun['Electron_Flux'][f'Electron_Flux_{channel}'],
                df_electrons_sun['Electron_Flux'][f'Electron_Flux_{channel+1}']],
                axis=1).mean(axis=1).plot(logy=True, label='sun', color='#d62728')
ax = pd.concat([df_electrons_asun['Electron_Flux'][f'Electron_Flux_{channel}'],
                df_electrons_asun['Electron_Flux'][f'Electron_Flux_{channel+1}']],
                axis=1).mean(axis=1).plot(logy=True, label='asun', color='#ff7f0e')
ax = pd.concat([df_electrons_north['Electron_Flux'][f'Electron_Flux_{channel}'],
                df_electrons_north['Electron_Flux'][f'Electron_Flux_{channel+1}']],
                axis=1).mean(axis=1).plot(logy=True, label='north', color='#1f77b4')
ax = pd.concat([df_electrons_south['Electron_Flux'][f'Electron_Flux_{channel}'],
                df_electrons_south['Electron_Flux'][f'Electron_Flux_{channel+1}']],
                axis=1).mean(axis=1).plot(logy=True, label='south', color='#2ca02c')

plt.xlim([datetime.datetime(2020, 12, 10, 23, 0),
          datetime.datetime(2020, 12, 11, 12, 0)])

ax.set_ylabel("Electron flux\n"+r"(cm$^2$ sr s MeV)$^{-1}$")
plt.title('EPT electrons ('+str(energies['Electron_Bins_Low_Energy'][channel])
          + '-' + str(energies['Electron_Bins_Low_Energy'][channel+2])+' MeV)')
plt.legend()
plt.show()

NB: This is just an approximate reproduction; e.g., the channel combination is an oversimplified approximation!
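
One reason the plain mean of two channels is only an approximation is that the fluxes are differential (per MeV), so channels with different bin widths should not contribute equally. A possible refinement, sketched here under stated assumptions, weights each channel by its energy bin width. The fluxes and widths below are made-up numbers, and this is not necessarily how the package or the paper combines channels.

```python
import numpy as np
import pandas as pd

# Synthetic fluxes for two adjacent channels (made-up numbers, not EPD data).
index = pd.date_range("2020-12-10 23:00", periods=3, freq="1min")
flux = pd.DataFrame(
    {"Electron_Flux_6": [10.0, 20.0, 30.0], "Electron_Flux_7": [4.0, 8.0, 12.0]},
    index=index,
)
# Assumed bin widths in MeV for the two channels (made up).
widths = np.array([0.002, 0.006])

# Unweighted mean, as in the example above.
simple_mean = flux.mean(axis=1)

# Width-weighted mean: sum(flux_i * dE_i) / sum(dE_i).
weighted_mean = (flux * widths).sum(axis=1) / widths.sum()

print(pd.DataFrame({"simple": simple_mean, "weighted": weighted_mean}))
```

With equal bin widths both results coincide; the difference grows as the widths of the combined channels diverge.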

References

[1] First near-relativistic solar electron events observed by EPD onboard Solar Orbiter, Gómez-Herrero et al., A&A, 656 (2021) L3, https://doi.org/10.1051/0004-6361/202039883
[2] First year of energetic particle measurements in the inner heliosphere with Solar Orbiter’s Energetic Particle Detector, Wimmer-Schweingruber et al., A&A, 656 (2021) A22, https://doi.org/10.1051/0004-6361/202140940

License

This project is Copyright (c) Jan Gieseler and licensed under the terms of the BSD 3-clause license. This package is based upon the OpenAstronomy packaging guide, which is licensed under the BSD 3-clause license. See the licenses folder for more information.

Comments
  • Environment variable for path

    Would it be possible to use (optionally) an environment variable for the path (preferably the same for all loaders)? That would make it much easier for multi-user environments to have data in one location only. Granted, it would possibly also need some file permission changing as well...

    enhancement 
    opened by tlml 12
  • Replacing FILLVALUES not working with pandas 1.5.0

    At least until pandas 1.4.4 the replacement of FILLVALUES done by the following code worked: https://github.com/jgieseler/solo-epd-loader/blob/f92e4e995a273d5755792c3f02e4ea3c33cfc675/solo_epd_loader/init.py#L754-L761

    But since pandas 1.5.0 it doesn't work anymore, and the values of -1e+31 are not replaced with np.nan's.

    I don't know the reason, maybe it has to do with the fact that the corresponding DataFrames have a MultiIndex.

    bug 
    opened by jgieseler 1
  • Catch error that python doesn't have rights to create folders

    Data for the different detectors are downloaded in subdirectories of the data directory provided by path. Under some circumstances, the script doesn't have the necessary rights to create these folders if they don't already exist. Then a FileNotFoundError: [Errno 2] No such file or directory: {path+subdir+file} is raised.

    Catch this problem and/or provide a meaningful warning message.

    bug 
    opened by jgieseler 1
  • Change from heliopy's cdf2lib to sunpy's read_cdf

    Change the function to read cdf files from heliopy's cdf2lib() to sunpy's read_cdf() in _read_epd_cdf(); i.e., applies to EPT and HET data, not STEP data. The latter is read in manually using cdflib

    opened by jgieseler 0
  • Make downloading of all viewings optional

    SolO/EPD/EPT has four viewing directions, each delivered in a separate data file. Right now, all viewing files are downloaded for a requested day, even though the call to solo-epd-loader specifically asks for a single viewing direction and only returns that data. This was included in the beginning because usually we have been interested in having all viewing-direction files anyhow. But it makes sense to have this at least as an option, so that you can deactivate this behaviour in case you want to only have e.g. the 'sun' viewing direction.

    enhancement 
    opened by jgieseler 0
  • Include resampling functionality

    Include resampling functionality like https://github.com/serpentine-h2020/SEPpy/blob/bc2e3e0662a019147d25bd554edbceaf7328e25b/seppy/loader/stereo.py#L24-L38

    enhancement 
    opened by jgieseler 0
  • Clean install_requires in setup.cfg

    With https://github.com/jgieseler/solo-epd-loader/commit/8fede59ac7a529cb1189f1ac40ddf20755b5cdaf bz4 and datetime have been added to the install_requires in setup.cfg (in the process of establishing some testing), but the conda-forge version does not accept this and complains when bz4 and datetime are listed as requirements in the meta.yaml file. This needs to be sorted out.

    Until then, pip check has been removed from meta.yaml, cf. https://github.com/jgieseler/solo-epd-loader-feedstock/commit/9d9eda523e1690fc1d520bca4a4a40eba521b6be

    opened by jgieseler 0
  • Set level='l2' as default

    Right now, level is a required positional argument. Set this by default to 'l2' because this should be the standard data product one should use if in doubt.

    opened by jgieseler 0
  • Add calc_av_en_flux_EPD()

    Add a function that averages the flux of several energy channels into a combined energy channel. In principle already available here, but needs to be correctly integrated.

    enhancement 
    opened by jgieseler 1
  • Use sunpy_soar for downloading data from SOAR

    Since v1.4, sunpy_soar also supports low latency data. So it is now able to obtain all the same data we have been downloading until now with solo_epd_loader (the source is in both cases ESA's SOAR). For the future, it would be worthwhile to completely move the downloading process to sunpy_soar to avoid duplication (and sunpy_soar is definitely much better written than my code 😅).

    enhancement 
    opened by jgieseler 1
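
Regarding the fill-value comment above: one possible way to turn fill values into NaN without relying on exact value matching is a threshold comparison. The sketch below applies it to a small synthetic dataframe with MultiIndex columns. It is an untested suggestion with respect to the reported pandas 1.5.0 behaviour, not the project's actual fix, and the threshold of -1e30 is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic float32 frame with MultiIndex columns and -1e31 fill values.
columns = pd.MultiIndex.from_tuples(
    [("Electron_Flux", "Electron_Flux_0"), ("Electron_Flux", "Electron_Flux_1")]
)
df = pd.DataFrame(
    np.array([[1.5, -1e31], [-1e31, 2.5]], dtype=np.float32), columns=columns
)

# Assumed threshold: anything below -1e30 is treated as a fill value.
FILL_THRESHOLD = -1e30
cleaned = df.mask(df < FILL_THRESHOLD)

print(cleaned)
```

A threshold is used here because an exact comparison against -1e31 can be sensitive to the floating-point precision of the stored data.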
Releases(v0.1.11)
Owner
Jan Gieseler