
High-Performance Data Science with Modern C++

The Executable Book

Ontario Tech University



This is the home of an executable book project about using Modern C++ for high-performance data science.

It's a companion to a series of talks by Armin Sobhani for the Compute Ontario Colloquia.

It'll be updated as more talks in the series are delivered.

Watch the Recordings 📺

C++ vs. Python for Data Science

😌 Ease of Use

Winner πŸ† Python πŸ†
πŸ”»C++ has a steeper learning curve and more complex syntax compared to Python
βœ…Python’s syntax is simple and readable, making it accessible for beginners

📚 Community and Libraries

Winner 🏆 Python 🏆
🔻 C++'s ecosystem is not as extensive as Python's for data science
✅ Python has extensive libraries such as NumPy, pandas, and Matplotlib, plus a large and active community

🏃 Performance

Winner 🏆 C++ 🏆
✅ C++ compiles to native machine code and is known for its high performance and efficiency
🔻 Python is generally slower than C++ because it is interpreted

🔀 Concurrency

Winner 🏆 C++ 🏆
✅ C++ has built-in support for concurrency (C++11) and parallel algorithms (C++17)
🔻 Python's global interpreter lock (GIL) can be a limitation for CPU-bound multi-threaded applications
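
To make the C++11 concurrency point concrete, here is a minimal sketch that sums a vector by handing half of the work to another thread with `std::async`. The function name `concurrent_sum` is a hypothetical helper invented for this illustration and is not taken from the talks.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): sum a vector by splitting it into
// two halves and summing the first half on another thread via std::async.
double concurrent_sum(const std::vector<double>& data)
{
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);

    // std::launch::async asks for the task to run on a separate thread.
    std::future<double> first_half = std::async(std::launch::async,
        [&data, mid] { return std::accumulate(data.begin(), mid, 0.0); });

    // Meanwhile, sum the second half on the calling thread.
    const double second_half = std::accumulate(mid, data.end(), 0.0);

    // A future returned by std::async stays valid until get() is called.
    assert(first_half.valid());

    // get() blocks until the asynchronous task has finished.
    return first_half.get() + second_half;
}
```

Calling `concurrent_sum` on a vector holding 1 through 100 returns 5050. On some toolchains the program must be linked against the platform's thread library (for example with `-pthread`). The C++17 parallel algorithms express a similar idea through execution policies, which may need an additional backend library depending on the compiler.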

💼 Memory Management

Winner 🏆 C++ 🏆
✅ C++ offers fine-grained control over memory management, which can be crucial for large-scale data processing
🔻 Python manages memory automatically and offers less control over it than C++
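
As a small illustration of what fine-grained control can look like, the sketch below shows two standard-library techniques: reserving a vector's storage up front so that filling it triggers no reallocation, and tying a heap buffer's lifetime to a `std::unique_ptr` so it is released deterministically. The names `make_squares` and `make_buffer` are hypothetical helpers chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper (illustration only): build a vector of n squares with
// a single up-front allocation instead of repeated growth.
std::vector<double> make_squares(std::size_t n)
{
    std::vector<double> out;
    out.reserve(n);                          // allocate room for n elements once
    const double* const base = out.data();   // remember where the storage lives

    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<double>(i) * static_cast<double>(i));

    assert(out.data() == base);              // storage never moved while filling
    return out;
}

// Hypothetical helper (illustration only): a zero-initialized heap buffer that
// is freed automatically when the owning unique_ptr goes out of scope.
std::unique_ptr<double[]> make_buffer(std::size_t n)
{
    return std::make_unique<double[]>(n);    // C++14; elements start at 0.0
}
```

With these helpers, `make_squares(5)` yields the values 0, 1, 4, 9, 16, and the buffer returned by `make_buffer(3)` is released as soon as its owner is destroyed, with no garbage collector involved.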

💫 Rapid Prototyping

Winner 🏆 Python 🏆
🔻 C++'s compiled nature makes it a lackluster choice for quick iteration
✅ Python's interpreted nature, combined with Project Jupyter, makes it a perfect match for the job