In this post we will review how to profile the CPU usage of Python functions.
First, let's create some CPU-consuming code:
import math
import random
import time


def calculate(value):
    return math.sqrt(value * 100)


def scan_dict(data):
    for key in data.keys():
        value = calculate(data[key])
        if value > 100:
            print("big")


def main():
    start = time.time()
    data = {}
    for i in range(10000):
        data[i] = random.randint(0, 100)
    for i in range(1000):
        scan_dict(data)
    passed = time.time() - start
    print('total {:.3f} seconds'.format(passed))
This code initializes a dictionary with 10K entries, and then calls a function 1,000 times that scans the dictionary entries and calculates the square root of each element.
As you can see, we time the entire main() function. On my machine, it prints:
total 1.787 seconds
So the program takes about 2 seconds. Now let's find out where the time is spent. We will use cProfile for this.
import cProfile
import pstats
import io

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()

s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumtime')
ps.print_stats()
with open('test.txt', 'w+') as f:
    f.write(s.getvalue())
The first thing we notice when wrapping main() with cProfile is the response-time degradation:
total 3.572 seconds
So profiling has its price. The cProfile analysis is saved to a file, sorted by the cumulative time spent in each function:
20054767 function calls in 3.572 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 3.572 3.572 /home/alon/git/a.py:17(main)
1000 1.527 0.002 3.557 0.004 /home/alon/git/a.py:10(scan_dict)
10000000 1.455 0.000 2.030 0.000 /home/alon/git/a.py:6(calculate)
10000000 0.575 0.000 0.575 0.000 {built-in method math.sqrt}
10000 0.002 0.000 0.011 0.000 /usr/lib/python3.8/random.py:244(randint)
10000 0.004 0.000 0.009 0.000 /usr/lib/python3.8/random.py:200(randrange)
10000 0.003 0.000 0.005 0.000 /usr/lib/python3.8/random.py:250(_randbelow_with_getrandbits)
12761 0.001 0.000 0.001 0.000 {method 'getrandbits' of '_random.Random' objects}
10000 0.001 0.000 0.001 0.000 {method 'bit_length' of 'int' objects}
1000 0.000 0.000 0.000 0.000 {method 'keys' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The table contains a line per function. For each function, we have the following information:
- ncalls - number of calls to the function
- tottime - time spent in the function itself, sub-functions excluded
- percall - tottime/ncalls
- cumtime - time spent in the function, sub-functions included
- percall - cumtime/ncalls
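On a real program the full listing can get long, and pstats lets you trim it: print_stats() accepts a row limit, and sort_stats() accepts other keys such as 'tottime'. A small self-contained sketch with a toy function:

```python
import cProfile
import io
import pstats


def hot():
    return [x ** 0.5 for x in range(50000)]


pr = cProfile.Profile()
pr.enable()
hot()
pr.disable()

s = io.StringIO()
# sort by time spent inside each function (sub-functions excluded)
ps = pstats.Stats(pr, stream=s).sort_stats('tottime')
ps.print_stats(5)  # limit the output to the 5 most expensive rows
print(s.getvalue())
```

Sorting by tottime is handy when you want the functions that burn CPU themselves, rather than those that merely sit high in the call chain.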
Final Note
The results are quite surprising. I had expected the square root function to consume most of the CPU, but it turns out that most of the time was spent scanning the dictionary. Apparently Python is not a very efficient tool for scanning a big dictionary; for example, the same code in Go would run roughly 15 times faster.
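One way to see the cost of the dictionary scan itself is to compare looking up every value by key against iterating the values directly. This is a rough micro-benchmark sketch (timings vary by machine), not taken from the profile above:

```python
import time

data = {i: i % 100 for i in range(10000)}


def scan_by_key(d):
    total = 0
    for key in d:  # one hash lookup per element
        total += d[key]
    return total


def scan_values(d):
    total = 0
    for value in d.values():  # direct iteration, no per-element lookups
        total += value
    return total


for fn in (scan_by_key, scan_values):
    start = time.time()
    for _ in range(1000):
        fn(data)
    print('{}: {:.3f} seconds'.format(fn.__name__, time.time() - start))
```

Iterating values() skips the repeated hash lookups, which is one of the overheads the profile above attributes to scan_dict().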