This is the home of an executable book project about using Modern C++ for high-performance data science.
It's a companion to a series of talks by Armin Sobhani for the Compute Ontario Colloquia.
It'll be updated as more talks in the series are delivered.

Try in a Container 🛠️

Using Docker:

    docker run -p 8888:8888 -it --rm asobhani/high-performance-data-science-with-modern-cpp

Or this one with CUDA support:

    docker run --gpus=all -p 8888:8888 -it --rm asobhani/high-performance-data-science-with-modern-cpp:latest-cuda

Using Apptainer:

    apptainer run docker://asobhani/high-performance-data-science-with-modern-cpp:latest

Or this one with CUDA support:

    apptainer run --nv docker://asobhani/high-performance-data-science-with-modern-cpp:latest-cuda

Watch the Recordings 📺
1️⃣ Xeus-Cling and G3P
2️⃣ Ranx
C++ vs. Python for Data Science
Ease of Use
Community and Libraries
| | Winner: Python |
|---|---|
| 🔻 | C++'s ecosystem is not as extensive as Python's for data science |
| ✅ | Python has extensive libraries like NumPy, Pandas, Matplotlib, etc. and a large and active community |
Performance
Concurrency
| | Winner: C++ |
|---|---|
| ✅ | C++ has built-in support for concurrency (C++11) and parallel algorithms (C++17) |
| 🔻 | Python's global interpreter lock can be a limitation for multi-threaded applications |
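
To make the concurrency point concrete, here is a minimal sketch of the C++11 facilities mentioned above. It is not taken from the book or the talks: `chunked_sum` is a hypothetical helper, and the default of four chunks is an arbitrary assumption. It splits a vector into contiguous chunks and sums each chunk in its own `std::async` task.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): sums `data` by splitting it into
// at most `parts` contiguous chunks and summing each chunk in its own task.
double chunked_sum(const std::vector<double>& data, std::size_t parts = 4)
{
    if (parts == 0) parts = 1;
    // Ceiling division, so every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + parts - 1) / parts;
    std::vector<std::future<double>> tasks;

    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        // std::launch::async requests that the task run on a separate thread.
        tasks.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.data() + begin, data.data() + end, 0.0);
        }));
    }

    double total = 0.0;
    for (auto& task : tasks) total += task.get();  // get() waits for that chunk
    return total;
}
```

Calling `chunked_sum(values)` on a `std::vector<double>` returns the same total as a serial `std::accumulate` when the additions are exact, as with small whole numbers. With C++17, a similar reduction can be written as `std::reduce` with the `std::execution::par` policy, though some toolchains need an extra backend library for that, which is why this sketch sticks to C++11.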
Memory Management
Rapid Prototyping
| | Winner: Python |
|---|---|
| 🔻 | C++'s compiled nature makes it a lackluster choice for rapid prototyping |
| ✅ | Python's interpreted nature combined with Project Jupyter makes it a perfect match for the job |