This is the home of an executable book project about using Modern C++ for high-performance data science.
It's a companion to a series of talks by Armin Sobhani for the Compute Ontario Colloquia.
It'll be updated as more talks in the series are delivered.

Try in a Container
With Docker:
docker run -p 8888:8888 -it --rm asobhani/high-performance-data-science-with-modern-cpp
Or this one with CUDA support:
docker run --gpus=all -p 8888:8888 -it --rm asobhani/high-performance-data-science-with-modern-cpp:latest-cuda
With Apptainer:
apptainer run docker://asobhani/high-performance-data-science-with-modern-cpp:latest
Or this one with CUDA support:
apptainer run --nv docker://asobhani/high-performance-data-science-with-modern-cpp:latest-cuda
Watch the video
C++ vs. Python for Data Science
Ease of Use
Community and Libraries
- C++'s ecosystem is not as extensive as Python's for data science
- Python has extensive libraries like NumPy, Pandas, Matplotlib, etc. and a large and active community
Performance
Concurrency
- C++ has built-in support for concurrency (C++11) and parallel algorithms (C++17)
- Python's global interpreter lock can be a limitation for multi-threaded applications
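To make the concurrency point concrete, here is a minimal sketch of a chunked sum built on `std::async` from C++11. The helper name `parallel_sum` and the default of four tasks are assumptions made for illustration; they are not taken from the talks.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the book): sums `data` by splitting it into
// at most `tasks` contiguous chunks, each summed by its own std::async task.
double parallel_sum(const std::vector<double>& data, std::size_t tasks = 4) {
    if (tasks == 0) tasks = 1;
    // Ceiling division so every element lands in some chunk.
    const std::size_t chunk = (data.size() + tasks - 1) / tasks;

    std::vector<std::future<double>> partials;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        // std::launch::async requests asynchronous execution rather than
        // deferring the work until get() is called.
        partials.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0.0);
        }));
    }

    // Combine the partial sums; get() blocks until each task has finished.
    double total = 0.0;
    for (auto& partial : partials) total += partial.get();
    return total;
}
```

Calling `parallel_sum(values)` on a `std::vector<double>` returns the same total as a serial `std::accumulate` when the additions are exact. With C++17 parallel algorithms, the same reduction can be written as `std::reduce(std::execution::par, values.begin(), values.end())` using the `<execution>` header. Toolchain support for execution policies varies, and some standard library implementations rely on an external threading backend.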
Memory Management
Rapid Prototyping
- C++'s compiled nature makes it a lackluster choice for rapid prototyping
- Python's interpreted nature combined with Project Jupyter makes it a perfect match for the job