The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

Overview

About

The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way. With the data defined as a set of objects, specialized storage methods give direct access to the separate attributes of the selected objects, without having to touch the bulk of the data. Included are histogramming methods in an arbitrary number of dimensions, curve fitting, function evaluation, minimization, and graphics and visualization classes that allow the easy setup of an analysis system able to query and process the data interactively or in batch mode, as well as a general parallel processing framework, PROOF, that can considerably speed up an analysis.

Thanks to the built-in C++ interpreter cling, the command language, the scripting language and the programming language are all C++. The interpreter allows for fast prototyping of macros since it removes the time-consuming compile/link cycle. It also provides a good environment to learn C++. If more performance is needed, the interactively developed macros can be compiled using a C++ compiler via a machine-independent, transparent compiler interface called ACLiC.

The system has been designed in such a way that it can query its databases in parallel on clusters of workstations or many-core machines. ROOT is an open system that can be dynamically extended by linking external libraries. This makes ROOT a premier platform on which to build data acquisition, simulation and data analysis systems.

License: LGPL v2.1+

Cite

When citing ROOT, please use both the reference reported below and the DOI specific to your ROOT version, available on Zenodo. For example, you can copy-paste and fill in the following citation:

Rene Brun and Fons Rademakers, ROOT - An Object Oriented Data Analysis Framework,
Proceedings AIHENP'96 Workshop, Lausanne, Sep. 1996,
Nucl. Inst. & Meth. in Phys. Res. A 389 (1997) 81-86.
See also "ROOT" [software], Release vX.YY/ZZ, dd/mm/yyyy,
(Select the right link for your release here: https://zenodo.org/search?page=1&size=20&q=conceptrecid:848818&all_versions&sort=-version).

Live Demo for CERN Users

Screenshots

These screenshots show some of the plots (produced using ROOT) presented when the Higgs boson discovery was announced at CERN:

CMS Data MC Ratio Plot

Atlas P0 Trends

See more screenshots on our gallery.

Installation and Getting Started

See https://root.cern/install for installation instructions. For instructions on how to build ROOT from these source files, see https://root.cern/install/build_from_source.

Our "Getting started with ROOT" page is then the perfect place to get familiar with ROOT.

Help and Support

Contribution Guidelines

Comments
  • Update ROOT's llvm to llvm13.


    The following things need to be done before merging this PR; they can probably be done by various people in parallel:

    Cling standalone:

    • [x] Fix cling CUDA tests
    • [ ] Fix the remaining test failures (6, see below)
    • [x] Revert the commit 'FIXME: Undo this change and debug why we have PendingInstances.'
    Cling test failures

    Failures in master on my system:

        Cling :: CodeUnloading/PCH/VTables.C
        Cling :: DynamicLibraryManager/callable_lib_L_AB_order1.C
    

    Remaining failures (excluding the ones above):

      Cling :: CodeGeneration/Symbols.C
      Cling :: CodeUnloading/AtExit.C
      Cling :: CodeUnloading/PCH/VTablesClingPCH.C
      Cling :: CodeUnloading/RereadFile.C
      Cling :: ErrorRecovery/StoredState.C
      Cling :: MultipleInterpreters/MultipleInterpreters.C
    

    ROOT:

    • [x] Compare the build size against master
    • [x] Compare the .pcm file size against master
    • [ ] Add flags to ignore compilation warnings coming from llvm
    • [x] Remove the FIXME from commit 'Add another symbol generator to resolve the generated lazy symbol' - the explanation is in the commit
    • [x] Fix the llvm::StringRef conversion failures on OSX
    Binary size: this PR needs 13% more space (2.3 vs 2.0 GB)
    du -hs root-release-llvm13
    2.3G	.
    (base) vvassilev@vv-nuc /build/vvassilev/root-release-llvm13 $ du -hs ../root-release-master/
    2.0G	../root-release-master/
    
    Module files need ~5% more space on disk (215 vs 206 MB)
    diff -y llvm13 master 
    424K	lib/ASImageGui.pcm				      |	444K	lib/ASImageGui.pcm
    468K	lib/ASImage.pcm					      |	484K	lib/ASImage.pcm
    4.2M	lib/_Builtin_intrinsics.pcm			      |	4.0M	lib/_Builtin_intrinsics.pcm
    48K	lib/_Builtin_stddef_max_align_t.pcm		      |	44K	lib/_Builtin_stddef_max_align_t.pcm
    200K	lib/Cling_Runtime_Extra.pcm			      |	132K	lib/Cling_Runtime_Extra.pcm
    100K	lib/Cling_Runtime.pcm					100K	lib/Cling_Runtime.pcm
    11M	lib/Core.pcm					      |	9.6M	lib/Core.pcm
    564K	lib/EG.pcm					      |	584K	lib/EG.pcm
    5.7M	lib/Eve.pcm					      |	5.4M	lib/Eve.pcm
    652K	lib/FitPanel.pcm				      |	656K	lib/FitPanel.pcm
    504K	lib/Foam.pcm					      |	520K	lib/Foam.pcm
    440K	lib/Fumili.pcm					      |	460K	lib/Fumili.pcm
    1.2M	lib/Gdml.pcm						1.2M	lib/Gdml.pcm
    960K	lib/Ged.pcm					      |	968K	lib/Ged.pcm
    432K	lib/Genetic.pcm					      |	456K	lib/Genetic.pcm
    2.9M	lib/GenVector.pcm				      |	2.8M	lib/GenVector.pcm
    868K	lib/GeomBuilder.pcm				      |	876K	lib/GeomBuilder.pcm
    500K	lib/GeomPainter.pcm				      |	520K	lib/GeomPainter.pcm
    3.4M	lib/Geom.pcm					      |	3.3M	lib/Geom.pcm
    860K	lib/Gpad.pcm						860K	lib/Gpad.pcm
    836K	lib/Graf3d.pcm					      |	844K	lib/Graf3d.pcm
    1.0M	lib/Graf.pcm						1.0M	lib/Graf.pcm
    540K	lib/GuiBld.pcm					      |	556K	lib/GuiBld.pcm
    588K	lib/GuiHtml.pcm					      |	604K	lib/GuiHtml.pcm
    3.5M	lib/Gui.pcm					      |	3.4M	lib/Gui.pcm
    496K	lib/Gviz3d.pcm					      |	516K	lib/Gviz3d.pcm
    468K	lib/GX11.pcm					      |	484K	lib/GX11.pcm
    412K	lib/GX11TTF.pcm					      |	432K	lib/GX11TTF.pcm
    3.6M	lib/HistFactory.pcm				      |	3.4M	lib/HistFactory.pcm
    484K	lib/HistPainter.pcm				      |	500K	lib/HistPainter.pcm
    5.9M	lib/Hist.pcm					      |	5.7M	lib/Hist.pcm
    1.5M	lib/Html.pcm						1.5M	lib/Html.pcm
    1.8M	lib/Imt.pcm					      |	1.7M	lib/Imt.pcm
    1.9M	lib/libc.pcm						1.9M	lib/libc.pcm
    12M	lib/MathCore.pcm				      |	11M	lib/MathCore.pcm
    1.6M	lib/Matrix.pcm						1.6M	lib/Matrix.pcm
    3.1M	lib/Minuit2.pcm					      |	3.0M	lib/Minuit2.pcm
    544K	lib/Minuit.pcm					      |	560K	lib/Minuit.pcm
    476K	lib/MLP.pcm					      |	496K	lib/MLP.pcm
    1.2M	lib/MultiProc.pcm					1.2M	lib/MultiProc.pcm
    1.1M	lib/Net.pcm						1.1M	lib/Net.pcm
    712K	lib/NetxNG.pcm						712K	lib/NetxNG.pcm
    728K	lib/Physics.pcm					      |	736K	lib/Physics.pcm
    492K	lib/Postscript.pcm				      |	508K	lib/Postscript.pcm
    564K	lib/ProofBench.pcm				      |	584K	lib/ProofBench.pcm
    948K	lib/ProofDraw.pcm				      |	940K	lib/ProofDraw.pcm
    1.6M	lib/Proof.pcm						1.6M	lib/Proof.pcm
    732K	lib/ProofPlayer.pcm				      |	744K	lib/ProofPlayer.pcm
    596K	lib/Quadp.pcm					      |	608K	lib/Quadp.pcm
    392K	lib/RCsg.pcm					      |	412K	lib/RCsg.pcm
    536K	lib/Recorder.pcm				      |	556K	lib/Recorder.pcm
    5.4M	lib/RGL.pcm					      |	5.1M	lib/RGL.pcm
    1.6M	lib/RHTTP.pcm					      |	1.5M	lib/RHTTP.pcm
    412K	lib/RHTTPSniff.pcm				      |	436K	lib/RHTTPSniff.pcm
    400K	lib/Rint.pcm					      |	420K	lib/Rint.pcm
    2.6M	lib/RIO.pcm					      |	2.5M	lib/RIO.pcm
    23M	lib/RooFitCore.pcm				      |	22M	lib/RooFitCore.pcm
    1.1M	lib/RooFitHS3.pcm				      |	1008K	lib/RooFitHS3.pcm
    16M	lib/RooFit.pcm					      |	15M	lib/RooFit.pcm
    424K	lib/RooFitRDataFrameHelpers.pcm			      |	448K	lib/RooFitRDataFrameHelpers.pcm
    4.3M	lib/RooStats.pcm				      |	4.1M	lib/RooStats.pcm
    468K	lib/RootAuth.pcm				      |	484K	lib/RootAuth.pcm
    120K	lib/ROOT_Config.pcm					120K	lib/ROOT_Config.pcm
    15M	lib/ROOTDataFrame.pcm				      |	14M	lib/ROOTDataFrame.pcm
    332K	lib/ROOT_Foundation_C.pcm				332K	lib/ROOT_Foundation_C.pcm
    620K	lib/ROOT_Foundation_Stage1_NoRTTI.pcm		      |	600K	lib/ROOT_Foundation_Stage1_NoRTTI.pcm
    140K	lib/ROOT_Rtypes.pcm					140K	lib/ROOT_Rtypes.pcm
    4.1M	lib/ROOTTMVASofie.pcm					4.1M	lib/ROOTTMVASofie.pcm
    412K	lib/ROOTTPython.pcm				      |	432K	lib/ROOTTPython.pcm
    2.6M	lib/ROOTVecOps.pcm				      |	2.5M	lib/ROOTVecOps.pcm
    652K	lib/SessionViewer.pcm				      |	668K	lib/SessionViewer.pcm
    3.0M	lib/Smatrix.pcm					      |	2.9M	lib/Smatrix.pcm
    436K	lib/SpectrumPainter.pcm				      |	456K	lib/SpectrumPainter.pcm
    572K	lib/Spectrum.pcm				      |	584K	lib/Spectrum.pcm
    424K	lib/SPlot.pcm					      |	440K	lib/SPlot.pcm
    624K	lib/SQLIO.pcm					      |	640K	lib/SQLIO.pcm
    18M	lib/std.pcm					      |	17M	lib/std.pcm
    1.6M	lib/Thread.pcm					      |	1.5M	lib/Thread.pcm
    568K	lib/TMVAGui.pcm					      |	588K	lib/TMVAGui.pcm
    18M	lib/TMVA.pcm					      |	17M	lib/TMVA.pcm
    2.6M	lib/Tree.pcm					      |	2.5M	lib/Tree.pcm
    4.5M	lib/TreePlayer.pcm				      |	4.3M	lib/TreePlayer.pcm
    668K	lib/TreeViewer.pcm				      |	684K	lib/TreeViewer.pcm
    536K	lib/Unfold.pcm					      |	552K	lib/Unfold.pcm
    424K	lib/X3d.pcm					      |	448K	lib/X3d.pcm
    1.1M	lib/XMLIO.pcm					      |	1.0M	lib/XMLIO.pcm
    444K	lib/XMLParser.pcm				      |	464K	lib/XMLParser.pcm
    

    cc: @hahnjo, @Axel-Naumann

    opened by vgvassilev 750
  • [Exp PyROOT] Build PyROOT with multiple Python versions


    The commits in this PR contain the steps necessary to allow the user to build PyROOT with more than one version of Python. The version in use can be selected by prefixing the usual source thisroot.sh with the desired Python version, e.g. ROOT_PYTHON_VERSION=3.6 source bin/thisroot.sh, run inside the build directory. Quick summary of the commits: (1) set the necessary CMake variables to build the PyROOT libraries in lib/pythonX.Y; (2) modify thisroot.sh to allow the user to select the Python version; (3) necessary changes to pyunittests and tutorials CMake variables; (4) installation.

    new feature 
    opened by maxgalli 346
  • RooFit::MultiProcess & TestStatistics part 2 redo: RooFitZMQ & MultiProcess


    This PR is a do-over of #8385 and #8412 and, as such, again the second part of a split and clean-up of #8294. The most important blocker in those PRs was the inclusion of a patched libzmq in RooFitZMQ itself. This patch has now been included in libzmq proper. Another blocking review comment was that libzmq symbols must not be allowed to be exported through our libraries. This has been solved in theory, and in practice is pending another PR to libzmq. Having fixed these two blockers, we should now be able to continue.

    To recap:

    In this PR, we introduce two packages: RooFitZMQ and RooFit::MultiProcess. It also adds two builtins for ZeroMQ to ease dependency management: libzmq and cppzmq. The builtin for libzmq is especially necessary at this point because libzmq has recently gained a needed feature that has not been released yet.

    RooFit::MultiProcess is a task-based parallelization framework.

    It uses forked processes for parallelization, as opposed to threads. We chose this approach because A) the existing RooRealMPFE parallelization framework already made use of forks, so we had something to build on, and B) it was at the time deemed infeasible to check the entire RooFit codebase for thread-safety. Moreover, we use MultiProcess to parallelize gradients -- i.e. the tasks to be executed in parallel are partial derivatives -- and these tasks are sufficiently large that communication between tasks is not a big concern in the big fits that we aimed to parallelize.

    The communication between processes is done using ZeroMQ. The ZeroMQ dependency is wrapped in convenience classes contributed by @roelaaij which here are packaged as RooFitZMQ.

    Will un-draft the PR once the following is done (based on previous review comments by @guitargeek @hageboeck @amadio @lmoneta and also some other things from myself):

    • [x] includes: correct order (matching header, RooFit, ROOT, std) and ROOT includes in quotation marks
    • [x] fix ZMQ deprecation warnings
    • [x] refactor member names: underscore suffix
    • [x] document important things with doxygen
    • [x] remove commented out code and TODOs and other junk
    • [x] fix copyright headers + author lists (RooFitZMQ: me, Roel; MP: me, Inti, Vince)
    • [ ] rebase in 2-3 neat commits that all compile and pass tests
    • [x] clang-tidy up
    • [x] change libzmq builtin back to master after PR is merged: https://github.com/zeromq/libzmq/pull/4266
    • [ ] ~use enum class instead of template parameters for minimizer function implementation choice~ -> next PR

    Edit 18 Nov 2021: the following list is to keep track of unaddressed (at time of writing) comments made in this thread (because the thread is so long that it is very inconvenient to navigate on GitHub which doesn't load it all at once):

    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-773656413: only need to rebase, but that is already listed above.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-790026907: we have to double check whether the build issues still exist. They should be gone, because we don't build dictionaries anymore.
    • [x] https://github.com/root-project/root/pull/9078#discussion_r736998615: Related to the issue above, iiuc, because the include was missing from the dictionary, so this can probably also be marked resolved now.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791797535: change inc to res in RooFitZMQ and MultiProcess and only include these zmq header exposing include directories to specific targets that need them using target_include_directories. This way, we don't transitively expose zmq includes to ROOT users.
      • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791786326: The above solution also circumvents this issue with ZMQ_ENABLE_DRAFT preprocessor defines.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791883192: change copyright/license headers.

    Let me know if you find additional items for the todo list.

    in:RooFit/RooStats 
    opened by egpbos 312
  • [cxxmodules] Fix failing runtime_cxxmodules tests by preloading modules


    Currently, 36 tests are failing for runtime modules: https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/ We want to make these tests pass so that we can say that runtime modules are finally working.

    This patch enables ROOT to preload all modules at startup time. In my environment, this patch fixes 14 tests for runtime cxxmodules.

    Preloading all the modules has several advantages: 1. We do not have to rely on rootmap files, which don't support some features (namespaces and templates). 2. Lookup is faster because we don't have to go through the rootmap-file indirection.

    The only disadvantage of preloading all the modules is startup time performance, measured with root.exe -q -l memory.C. This is a release build without modules:

     cpu  time = 0.091694 seconds
     sys  time = 0.026187 seconds
     res  memory = 133.008 Mbytes
     vir  memory = 217.742 Mbytes
    

    This is a release build with modules, with this patch:

     cpu  time = 0.234134 seconds
     sys  time = 0.066774 seconds
     res  memory = 275.301 Mbytes
     vir  memory = 491.832 Mbytes
    

    As you can see, preloading all the modules makes both startup time and memory usage 2 to 3 times worse.

    Edit: with hsimple.C (root.exe -l -b tutorials/hsimple.C -q ~/CERN/ROOT/memory.C), release build without modules:

    Processing tutorials/hsimple.C...                                                                        
    hsimple   : Real Time =   0.04 seconds Cpu Time =   0.05 seconds                        
    (TFile *) 0x555ae2a9d560                                                                  
    Processing /home/yuka/CERN/ROOT/memory.C...                                                              
     cpu  time = 0.173591 seconds                                   
     sys  time = 0.011835 seconds                       
     res  memory = 135.32 Mbytes                                    
     vir  memory = 209.664 Mbytes 
    

    Release build with modules, with this patch:

    Processing tutorials/hsimple.C...
    hsimple   : Real Time =   0.04 seconds Cpu Time =   0.04 seconds
    (TFile *) 0x55d1b036d230
    Processing /home/yuka/CERN/ROOT/memory.C...
     cpu  time = 0.290742 seconds
     sys  time = 0.043851 seconds
     res  memory = 256.844 Mbytes
     vir  memory = 438.484 Mbytes
    

    However, slower startup is expected when we load all the modules at startup rather than on demand. I don't have a good benchmark for this yet but, in theory, it reduces overall execution time instead, since we would be loading those modules after startup anyway.

    opened by yamaguchi1024 282
  • [cmake] use only source dirs as include paths when building ROOT


    Fully exclude ${CMAKE_BUILD_DIR}/include from the include paths when building ROOT libraries.

    Several generated files are placed first in ${CMAKE_BUILD_DIR}/ginclude and then copied to include.

    Dictionary generation still uses only ${CMAKE_BUILD_DIR}/include, otherwise cling complains about similar includes in different places. Once the problem with cling is fixed, the source dirs can be used for it as well.
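A hedged sketch of the per-target approach described above (the target and directory names are illustrative, not ROOT's actual build code):

```cmake
# Give each library its own include paths instead of relying on the global
# ${CMAKE_BUILD_DIR}/include directory that aggregates copies of all headers.
target_include_directories(Hist PRIVATE
   ${CMAKE_SOURCE_DIR}/hist/hist/inc  # headers taken straight from the source tree
   ${CMAKE_BINARY_DIR}/ginclude       # generated headers, before the copy step
)
```

Using PRIVATE scopes keeps the per-library paths from leaking transitively into every consumer, which is the point of dropping the single global include directory.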

    new feature 
    opened by linev 252
  • [cxxmodules] Implement global module indexing to improve performance.


    The global module index represents an efficient on-disk hash table which stores an identifier->module mapping. Every time clang finds an unknown identifier, we are informed and we can load the corresponding module on demand.

    This way we can keep the set of loaded modules minimal. Currently, we see that for hsimple.C only half of the modules are loaded. This can be further improved, because we currently load all modules which contain a given identifier: when looking for TPad, for example, we will load all modules which mention the identifier TPad, including modules which contain only a forward declaration of it.

    Kudos Arpitha Raghunandan (@arpi-r)!

    We still need some performance measurements but the preliminary results are promising.

    Performance

    Methodology

    We have a forwarding root.exe which essentially calls /usr/bin/time -v root.exe $@. We have processed and stored this information in csv files. We have run in three modes:

    1. root master without modules (modulesoff)
    2. root master with modules (moduleson)
    3. root master with this PR with modules (gmi)

    Run on Ubuntu 18.10 on Intel® Core™ i5-8250U CPU @ 1.60GHz × 8

    Results Interpretation

    A general comparison between 2) and 3) shows that this PR makes ROOT about 3% faster and 25% more memory efficient.

    A general comparison between 1) and 3) shows that modules are still less efficient in a few cases, which is expected because the PR loads more modules than it should. This will be addressed in subsequent PRs. A good trend is that some tests already show that 3) is better than 1).

    The raw data can be found here. [Work done by Arpitha Raghunandan (@arpi-r).]

    Depends on #4005.

    opened by vgvassilev 219
  • [VecOps] RVec 2.0: small buffer optimization based on LLVM SmallVector


    • [x] add ARCHITECTURE.md
    • [x] use fCapacity == -1 to indicate memory-adoption mode
    • [x] switch asserts to throws
    • [x] expose the small buffer size as a template parameter (defaulted to sizeof(T)*8 > 1024 ? 0 : 8 or similar, see also https://lists.llvm.org/pipermail/llvm-dev/2020-November/146613.html and the way they currently do it in LLVM: https://llvm.org/doxygen/SmallVector_8h_source.html#l01101)
    • [x] re-check before/after benchmark runtimes (first measurements at https://eguiraud.web.cern.ch/eguiraud/decks/20201112_rvec_redesign_ppp )
    • [x] unit test for exceptions thrown during construction or resizing (and add note about lack of exception safety in docs)
    • [x] confirm that crediting of LLVM is ok (currently only in math/vecops/ARCHITECTURE.md)
    opened by eguiraud 200
  • [CMake] Add automatic FAILREGEX for gtests


    gtests can print errors using ROOT's message system, but these get ignored completely. Several problems could have been caught automatically, but they went undetected.

    This adds a default regex to all gtests that checks for "(Fatal|Error|Warning) in <", unless an explicit FAILREGEX is passed to ROOT_ADD_GTEST.

    How to fix the tests:

    • [Easy, but unsafe] Add FAILREGEX "" to ROOT_ADD_GTEST. In that case, we will not grep for anything.
    • [Safe] Use the macros from https://github.com/root-project/root/blob/master/test/unit_testing_support/ROOTUnitTestSupport.h and catch the diagnostics
    • Fix what triggers the warnings/errors
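The options above can be sketched as follows (the exact ROOT_ADD_GTEST signature is assumed from the description above, and the test names are invented):

```cmake
# Easy but unsafe: disable the default "(Fatal|Error|Warning) in <" check
# entirely by passing an empty FAILREGEX.
ROOT_ADD_GTEST(mytest_nogrep mytest.cxx FAILREGEX "")

# Safe: keep the default check, and inside the test either catch expected
# diagnostics with the ROOTUnitTestSupport macros or fix the offending code.
ROOT_ADD_GTEST(mytest mytest.cxx)
```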
    opened by hageboeck 200
  • Add vectorized implementations of first batch of TMath functions


    This PR adds vectorized implementations of the following TMath functions using the VecCore backend:

    • Log2
    • Breit-Wigner
    • Gaus
    • LaplaceDist
    • LaplaceDistI
    • Freq
    • Bessel I0, I1, J0, J1

    The first batch includes functions for which a definite speedup is obtained. Left out are the ones with more conditional branches. Work is ongoing to implement them as well.

    Here is the PR for benchmarks.

    Benchmarks from a trial run:

    ----------------------------------------------------------------------
    Benchmark                                Time           CPU Iterations
    -----------------------------------------------------------------------
    BM_TMath_Log2                       340895 ns     340801 ns       2042
    BM_TMath_BreitWigner                 42236 ns      42227 ns      16562
    BM_TMath_Gaus                       280188 ns     280130 ns       2476
    BM_TMath_LaplaceDist                246254 ns     246176 ns       2834
    BM_TMath_LaplaceDistI               291277 ns     291221 ns       2405
    BM_TMath_Freq                       388384 ns     388278 ns       1816
    BM_TMath_BesselI0                   283500 ns     283445 ns       2466
    BM_TMath_BesselI1                   327932 ns     327847 ns       2134
    BM_TMath_BesselJ0                   744044 ns     743897 ns        938
    BM_TMath_BesselJ1                   735381 ns     735235 ns        937
    BM_VectorizedTMath_Log2              97462 ns      97433 ns       7079
    BM_VectorizedTMath_BreitWigner       20773 ns      20769 ns      33494
    BM_VectorizedTMath_Gaus             127413 ns     127385 ns       5519
    BM_VectorizedTMath_LaplaceDist      118903 ns     118870 ns       5845
    BM_VectorizedTMath_LaplaceDistI     130724 ns     130693 ns       5367
    BM_VectorizedTMath_Freq             267444 ns     267389 ns       2590
    BM_VectorizedTMath_BesselI0         177544 ns     177503 ns       3936
    BM_VectorizedTMath_BesselI1         206571 ns     206523 ns       3370
    BM_VectorizedTMath_BesselJ0         326378 ns     326312 ns       2144
    BM_VectorizedTMath_BesselJ1         343600 ns     343531 ns       2014
    
    new contributor 
    opened by ArifAhmed1995 164
  • [cxxmodules] Enable the semantic global module index to boost performance.


    The global module index (GMI) is an optimization which hides the overhead introduced by clang when pre-loading the C++ modules at startup.

    The GMI represents a mapping between an identifier and the set of modules which contain this identifier. This means that if TH1 is undeclared, the GMI will load all modules which contain this identifier, which is usually suboptimal, too.

    The semantic GMI maps identifiers only to modules which contain a definition of the entity behind the identifier. For cases such as typedefs, where the entity introduces a synonym (rather than a declaration), we map the first module we encounter. For namespaces we add all modules which have a namespace partition. The namespace case is still suboptimal and can be further improved by inspecting what exactly is being looked up in the namespace via the qualified lookup facilities.

    opened by vgvassilev 160
  • [tcling] Improve symbol resolution.


    This patch consolidates the symbol resolution facilities throughout TCling into a new singleton class Dyld, part of cling's DynamicLibraryManager.

    The new dyld is responsible for:

    • Symlink resolution -- it implements a memory-efficient representation of the full paths to shared objects, allowing search in constant time O(1). This also fixes issues when resolving symbols on OSX, where the system libraries contain multiple levels of symlinks.
    • Bloom filter optimization -- it uses a stochastic data structure which gives a definitive answer when a symbol is not in the set. The implementation checks for the .gnu.hash section in ELF, which is the GNU implementation of a bloom filter, and uses it. If the symbol is not in that bloom filter, the implementation builds its own and uses it. The measured performance of the bloom filter is a 30% speed-up for 2 MB more memory. The custom bloom filter on top of the .gnu.hash filter gives 1-2% better performance. The advantage of the custom bloom filter is that it works on all implementations which do not support .gnu.hash (Windows and OSX). It is also customizable if we want to further reduce the false positive rate (currently at p=2%).
    • Hash table optimization -- we build a hash table which contains all symbols for a given library. This allows us to avoid the fallback symbol iteration when multiple symbols from the same library are requested. The hash table optimization targets the case where the bloom filter tells us the symbol is maybe in the library.

    Patch by Alexander Penev (@alexander-penev) and me!

    Performance Report

    | platform | test | PCH-time | Module-time | Module-PR-time |
    |:---------|:-----|:--------:|:-----------:|:---------------|
    | osx 10.14 | roottest-python-pythonizations | 22,82 | 26,89 | 20,08 |
    | osx 10.14 | roottest-cling | 589,67 | 452,97 | 307,34 |
    | osx 10.14 | roottest-python | 377,69 | 475,78 | 311,5 |
    | osx 10.14 | roottest-root-hist | 60,59 | 90,98 | 49,65 |
    | osx 10.14 | roottest-root-math | 106,18 | 140,41 | 73,96 |
    | osx 10.14 | roottest-root-tree | 1287,53 | 1861 | 1149,35 |
    | osx 10.14 | roottest-root-treeformula | 568,43 | 907,46 | 531 |
    | osx 10.15 | root-io-stdarray | - | 126.02 | 31.42 |
    | osx 10.15 | roottest-root-treeformula | - | 327.08 | 231.14 |

    The effect of running ctest -j8:

    | platform | test | PCH-time | Module-time | Module-PR-time |
    |:---------|:-----|:--------:|:-----------:|:---------------|
    | osx 10.14 | roottest-python-pythonizations | 14,45 | 18,89 | 13,03 |
    | osx 10.14 | roottest-cling | 88,96 | 118,94 | 100,1 |
    | osx 10.14 | roottest-python | 107,57 | 60,93 | 100,88 |
    | osx 10.14 | roottest-root-hist | 10,25 | 23,25 | 11,77 |
    | osx 10.14 | roottest-root-math | 8,33 | 21,23 | 9,27 |
    | osx 10.14 | roottest-root-tree | 555 | 840,89 | 510,97 |
    | osx 10.14 | roottest-root-treeformula | 235,44 | 402,82 | 228,91 |

    We think that with -j8 we lose the advantage of the new PR because the PCH had the rootmaps read into memory, and restarting the processes allowed the kernel to efficiently reuse that memory, whereas the modules and this PR scan the libraries from disk and build in-memory optimization data structures. Reading from disk seems to be the bottleneck (not verified), but if that becomes an issue in the future we can write out the index, making subsequent runs almost zero cost.

    opened by vgvassilev 159
  • Make ROOT terminology and workings easier to decipher


    Explain what you would like to see improved

    Documentation.

    I am having to spend way too much time trying to figure out what basic stuff is.

    E.g. a TTree is apparently a list of "independent columns", but so far it looks to me very much like the equivalent of a table used to back a higher-level representation of a table, in which case the columns are not independent - they would be related (unless "independent columns" is being used to mean statistically independent variables).

    And I came across "event" in code comments, which sounds very much like an "event" is a "row" of data, which would make sense from a CERN perspective but is ambiguous/meaningless/confusing to a newbie.

    share how it could be improved

    A Glossary with ROOT term equivalents in other frames of reference

    Eg.

    Event ~ row ~ tuple ~ observation (assuming I guessed correctly)

    TTree ~ RDataFrame/TDataFrame ~ dataset ~ table ~ 1- or 2-dimensional array or tensor ~ a grid of data with one row per event/observation/record.
    TBranch ~ column of data in a grid or table of data.
    TLeaf ~ element ~ cell ~ a single observation of a single variable.

    And where these are not correct list the differences between them to clarify what they actually are.

    Without a clear and precise understanding of what the terms mean you are never sure about what you are doing.


    Some (more) high level notes on how the framework works would be very useful at the start of the primer or comments in the code to explain "magic" when it happens - I was scratching my head as to how one particular object knew to use another when no relationship appeared in the code anywhere;

    
       // The canvas on which we'll draw the graph
        auto  mycanvas = new TCanvas();
    
     // lots of code like...
    
        // Draw the graph !
        graph.DrawClone("APE");
    
    // but no mention of mycanvas again until...    
    
        mycanvas->Print("graph_with_law.pdf");
    

    which raises all sorts of questions ( as it is not obvious what is going on ).


    Basic stuff first:

    Most people will want to read in a multi-column file and get stats/analysis on those columns - fromCSV is buried pretty deep considering - why am I reading about "TTree"s when I can get going without them?

    improvement 
    opened by bobOnGitHub 0
  • [RF] Fix and improvements in `testSumW2Error`


    In testSumW2Error, a weighted clone of an unweighted dataset is created, where each event gets the weight 0.5.

    However, in the loop over the original dataset used to fill the new dataset, get(i) is never called, meaning the new weighted dataset is filled repeatedly with the same values. This resulted in an unstable fit, necessitating careful tweaking of the initial parameters to even get convergence. That's why it's better to copy the dataset correctly, even if this is just a test case. I noticed this problem while copy-pasting code to create another new unit test.

    Also, the binned dataset is now a binned clone of the unbinned dataset in the test, reducing the degree of randomness.

    Furthermore, some general code improvements are applied to the source file.

    in:RooFit/RooStats 
    opened by guitargeek 5
  • [RF] Completely implement `Offset("bin")` feature

    Fully implement and test the new Offset("bin") feature over the test matrix that is the tensor product of BatchMode(), doing an extended fit or not, RooDataSet vs. RooDataHist, and the SumW2 correction. The test should compute the likelihood for a template PDF created from the data itself; with the bin offset applied, it should be numerically compatible with zero.

    void testOffsetBin()
    {
       using namespace RooFit;
       using RealPtr = std::unique_ptr<RooAbsReal>;
    
       // Create extended PDF model
       RooRealVar x("x", "x", -10, 10);
       RooRealVar mean("mean", "mean", 0, -10, 10);
       RooRealVar sigma("sigma", "width", 4, 0.1, 10);
       RooRealVar nEvents{"n_events", "n_events", 10000, 100, 100000};
    
       RooGaussian gauss("gauss", "gauss", x, mean, sigma);
       RooAddPdf extGauss("extGauss", "extGauss", RooArgList{gauss}, RooArgList{nEvents});
    
       std::unique_ptr<RooDataSet> data{extGauss.generate(x)};
    
       {
          // Create weighted dataset and hist to test SumW2 feature
          RooRealVar weight("weight", "weight", 0.5, 0.0, 1.0);
          auto dataW = std::make_unique<RooDataSet>("dataW", "dataW", RooArgSet{x, weight}, "weight");
          for (int i = 0; i < data->numEntries(); ++i) {
             dataW->add(*data->get(i), 0.5); // try weights that are different from unity
          }
          std::swap(dataW, data); // try to replace the original dataset with weighted dataset
       }
    
       std::unique_ptr<RooDataHist> hist{data->binnedClone()};
    
       data->Print();
       hist->Print();
    
       // Create template PDF based on data
       RooHistPdf histPdf{"histPdf", "histPdf", x, *hist};
       RooAddPdf extHistPdf("extHistPdf", "extHistPdf", histPdf, nEvents);
    
       auto& pdf = extHistPdf;
    
       auto const bm = "off"; // it should also work with BatchMode("cpu")
    
       double nllVal01 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(false))}->getVal();
       double nllVal02 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(true)) }->getVal();
       double nllVal03 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(false))}->getVal();
       double nllVal04 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(true)) }->getVal();
    
       double nllVal1  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
       double nllVal2  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
       double nllVal3  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
       double nllVal4  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
    
       // The final unit test should also include the SumW2 option in the test matrix
    
       // For all configurations, the bin offset should have the effect of bringing
       // the NLL close to zero:
       std::cout << "Unbinned fit      : " << nllVal01 << "   " << nllVal1 << std::endl;
       std::cout << "Unbinned ext. fit : " << nllVal02 << "   " << nllVal2 << std::endl;
       std::cout << "Binned fit        : " << nllVal03 << "   " << nllVal3 << std::endl;
       std::cout << "Binned ext. fit   : " << nllVal04 << "   " << nllVal4 << std::endl;
    }
    
    new feature in:RooFit/RooStats 
    opened by guitargeek 0
  • [RF] Remove RooFormula code for gcc <= 4.8 when minimum standard is raised to C++17


    This issue serves as a reminder that the code behind #ifndef ROOFORMULA_HAVE_STD_REGEX in RooFormula.cxx can be removed once the minimum C++ standard for ROOT is raised to C++17, since gcc 4.8 will no longer be supported at that point. By then, std::regex will probably also work with Visual Studio, so the #ifndef _MSC_VER check can likely be removed in the same go.

    See #8583 as a reference for what files to check to know what the minimum supported C++ standard of ROOT is.

    in:RooFit/RooStats 
    opened by guitargeek 0
  • [RF] Exclude `RooGrid` class from IO


    RooGrid is a utility class for the RooMCIntegrator, which doesn't support IO itself. Therefore, it doesn't make sense for RooGrid to have a ClassDef(1) macro; it only puts the unnecessary burden of maintaining backwards compatibility on the developers.

    Therefore, this commit leaves the ClassDef macro out of RooGrid and also removes the unnecessary base classes TObject and RooPrintable. Only one printing function makes sense anyway, and it is kept without implementing the full RooPrintable interface.

    in:RooFit/RooStats 
    opened by guitargeek 6
  • [RF] Avoid code duplication with new private `Algorithms.h` file


    The RooMomentMorphND and RooMomentMorphFuncND classes duplicated some copy-pasted code from Stack Overflow. This is now factored out into a new private header file to avoid the code duplication.

    Also, a semicolon is added after TRACE_CREATE and TRACE_DESTROY in order to not confuse clang-format.

    in:RooFit/RooStats 
    opened by guitargeek 6
Releases: v6-26-10