Monday, April 11, 2022

Python Profiling



In this post we will review how to profile the CPU usage of Python functions.


First, let us create some CPU-consuming code:



import math
import random
import time


def calculate(value):
    return math.sqrt(value * 100)


def scan_dict(data):
    for key in data.keys():
        value = calculate(data[key])
        if value > 100:
            print("big")


def main():
    start = time.time()
    data = {}
    for i in range(10000):
        data[i] = random.randint(0, 100)

    for i in range(1000):
        scan_dict(data)

    passed = time.time() - start
    print('total {:.3f} seconds'.format(passed))



This code initializes a dictionary with 10K entries, and then calls a function 1000 times; the function scans the dictionary entries and calculates the square root of each element.


As you can see, we print the timing for the entire main() function. On my machine, it prints:


total 1.787 seconds


So this program takes about 2 seconds. Now let's try to find out where the time is spent. We will use cProfile for this.



import cProfile
import pstats
import io

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumtime')
ps.print_stats()

with open('test.txt', 'w+') as f:
    f.write(s.getvalue())
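As a side note, cProfile also provides one-liner entry points, run() and runctx(), which profile a single statement and print the sorted table directly, with no pstats boilerplate. A minimal sketch, where work() is just a stand-in workload:

```python
import cProfile


def work():
    # stand-in workload, similar in spirit to main() above
    return sum(i * i for i in range(100000))


# profile the statement and print stats sorted by cumulative time;
# runctx() takes explicit globals, so name lookup works in any context
cProfile.runctx('work()', globals(), None, sort='cumtime')
```

This is handy for quick checks; the Profile object used above remains the better choice when you want to post-process the results.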


The first thing we notice when wrapping the main() function with cProfile is the response time degradation:


total 3.572 seconds


So profiling has its price. The cProfile analysis is saved to a file, sorted by the cumulative time spent in each function.



        20054767 function calls in 3.572 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 3.572 3.572 /home/alon/git/a.py:17(main)
1000 1.527 0.002 3.557 0.004 /home/alon/git/a.py:10(scan_dict)
10000000 1.455 0.000 2.030 0.000 /home/alon/git/a.py:6(calculate)
10000000 0.575 0.000 0.575 0.000 {built-in method math.sqrt}
10000 0.002 0.000 0.011 0.000 /usr/lib/python3.8/random.py:244(randint)
10000 0.004 0.000 0.009 0.000 /usr/lib/python3.8/random.py:200(randrange)
10000 0.003 0.000 0.005 0.000 /usr/lib/python3.8/random.py:250(_randbelow_with_getrandbits)
12761 0.001 0.000 0.001 0.000 {method 'getrandbits' of '_random.Random' objects}
10000 0.001 0.000 0.001 0.000 {method 'bit_length' of 'int' objects}
1000 0.000 0.000 0.000 0.000 {method 'keys' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}



The table contains a line per function. For each function, we have the following information:

  • ncalls - number of calls to the function
  • tottime - time spent in the function itself, sub-functions excluded
  • percall - tottime/ncalls
  • cumtime - time spent in the function, sub-functions included
  • percall - cumtime/ncalls
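When the full table is too noisy, pstats can restrict the output. A minimal sketch, assuming we only want the five most expensive functions by tottime, with hot() as a toy workload:

```python
import cProfile
import io
import pstats


def hot():
    # toy workload so the profiler has something to measure
    return sum(range(1000000))


pr = cProfile.Profile()
pr.enable()
hot()
pr.disable()

s = io.StringIO()
# sort by time spent inside each function itself, print only the top 5 rows
pstats.Stats(pr, stream=s).sort_stats('tottime').print_stats(5)
print(s.getvalue())
```

print_stats() also accepts a string argument, which is treated as a regular expression filter on the function names.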


Final Note

The results are quite surprising. I had expected the square root function to consume most of the CPU, but it turns out that most of the time is spent scanning the dictionary. Python is not a very efficient tool for scanning a big dictionary; for example, the same code in Go would run roughly 15 times faster.
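One easy win the profile hints at: scan_dict pays for a key iteration plus a dictionary lookup per entry. Iterating over the values directly removes the extra lookup; a sketch of the idea:

```python
import math


def scan_values(data):
    # iterate the values directly instead of keys() plus a per-key lookup
    for value in data.values():
        if math.sqrt(value * 100) > 100:
            print("big")
```

This will not close the gap with Go, but it should trim some of the per-entry overhead of the scanning loop.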










