Profilers help identify performance problems. They are tools designed to provide the metrics needed to find the slowest parts of the code so that we can optimize what really matters. Profilers can gather a wide variety of metrics: wall time, CPU time, memory consumption, network usage, I/O operations, and so on.
Profilers can answer questions such as the following (the cProfile sketch after this list illustrates the first two):
- How many times is each method in my code called?
- How long does each of these methods take?
- How much memory does the method consume?
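For instance, here is a minimal sketch (the toy functions `slow_square` and `run` are invented for illustration) using the standard-library `cProfile` and `pstats` modules; the `ncalls` column answers "how many times is each function called?" and `cumtime` answers "how long does it take?":

```python
import cProfile
import pstats

def slow_square(n):
    # Deliberately unoptimized work so the profiler has something to measure.
    return sum(i * i for i in range(n))

def run():
    return [slow_square(100_000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# ncalls  -> how many times each function was called
# cumtime -> total time spent in the function and everything it called
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```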
There are different types of profilers:
- Deterministic profiling: Deterministic profilers execute trace functions at various points of interest (function calls, function returns) and record precise timings of these events. This means the code runs slower under profiling, so using them in production systems is often impractical.
- Statistical profiling: Instead of tracking every event (a call to every function), statistical profilers interrupt the application periodically and collect samples of the execution state (call-stack snapshots). The call stacks are then analyzed to determine how execution time is distributed across the application. This method is less accurate, but it also reduces the overhead (a toy sampling sketch follows this list).
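To illustrate the sampling idea only (this is a toy sketch relying on CPython's `sys._current_frames()`, not how any of the profilers below are implemented), the following snippet periodically captures the main thread's current frame from a background thread and counts which function was running at each sample:

```python
import collections
import sys
import threading
import time

samples = collections.Counter()

def sampler(main_thread_id, interval=0.01, duration=2.0):
    # Periodically snapshot the main thread's current frame and record
    # the name of the function it is executing.
    end = time.time() + duration
    while time.time() < end:
        frame = sys._current_frames().get(main_thread_id)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)

def busy_work():
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total

main_id = threading.get_ident()
threading.Thread(target=sampler, args=(main_id,), daemon=True).start()
busy_work()

# Functions that show up most often in the samples dominate the run time.
print(samples.most_common(3))
```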
All the profilers discussed here are deterministic profilers because they capture precise timings of events. Note that the Memory Profiler package also ships the mprof tool, which performs statistical (sampling-based) profiling of memory usage over time; it is discussed briefly in the Memory Profiler notebook.
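As a quick taste (a minimal sketch; `build_list` is an invented example function), memory_profiler's `@profile` decorator reports line-by-line memory usage, while the `mprof` command samples total memory over time:

```python
from memory_profiler import profile

@profile  # prints a line-by-line memory report when the function runs
def build_list():
    data = [x ** 2 for x in range(1_000_000)]  # large temporary allocation
    squares_sum = sum(data)
    del data                                   # memory is released here
    return squares_sum

if __name__ == "__main__":
    build_list()

# For the statistical mprof workflow, run from a shell instead:
#   mprof run this_script.py
#   mprof plot
```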
This repository aims to showcase different profilers for Python and explain in detail how to profile different workloads with each of them. Below is the list of all the profilers we will be discussing (a short line-profiling sketch follows the table). Each profiler has a separate folder with a Jupyter Notebook to guide you.
| Performance Profiler | Granularity (lines or functions) | Description of Profiler |
|---|---|---|
| Memory Profiler | Line | Monitors line-by-line memory consumption of Python code. |
| Line Profiler | Line | Measures the execution time of individual lines inside selected functions. |
| cProfile | Function | Deterministic function-level profiler from the Python standard library, implemented as a C extension. |
| Profile | Function | Pure-Python standard-library profiler with the same interface as cProfile, but higher overhead. |
| FunctionTrace | Function | Records a detailed timeline of function calls that can be explored in the Firefox Profiler UI. |
| Scalene | Function and line | Profiles CPU, GPU, and memory usage with both function-level and line-level reporting. |
| VTune | Function and line | Intel® VTune™ Profiler; analyzes hotspots, threading, and hardware-level performance at function and line granularity. |
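To make the line-versus-function distinction concrete, here is a minimal sketch (the function `smooth` is invented for illustration) using line_profiler's programmatic API, which reports how long each individual line of the wrapped function took:

```python
from line_profiler import LineProfiler

profiler = LineProfiler()

@profiler  # wrapping the function tells line_profiler to time each of its lines
def smooth(values, window=5):
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window):i + window]
        result.append(sum(chunk) / len(chunk))
    return result

smooth(list(range(50_000)))
profiler.print_stats()  # per-line hit counts and timings for smooth()
```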
We will also use the following Intel AI Reference Kit in our profiling examples:
- Scikit-Learn Intelligent Indexing for Incoming Correspondence – Ref Kit
Follow the steps in the Intelligent Indexing Ref Kit's GitHub README to set up the environments accordingly.
The process involves the steps below; a small example of profiling such a workload follows the list.
- Setting up a virtual environment for both stock and Intel®-accelerated machine learning packages
- Preprocessing data using Pandas*/Intel® Distribution of Modin and NLTK
- Training an NLP model for text classification using Scikit-Learn*/Intel® Extension for Scikit-Learn*
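As an illustration of profiling this kind of workload (a hypothetical sketch with an invented in-memory dataset, not the actual Ref Kit code or data), the snippet below wraps a small scikit-learn text-classification pipeline in cProfile:

```python
import cProfile
import pstats

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny synthetic corpus standing in for the incoming-correspondence data.
texts = ["invoice payment overdue", "shipment delayed at customs",
         "contract renewal terms", "refund request for order"] * 250
labels = [0, 1, 2, 3] * 250

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

profiler = cProfile.Profile()
profiler.enable()
pipeline.fit(texts, labels)       # the training step we want to profile
profiler.disable()

# Show where training time went, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```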