Monday, April 11, 2022

Python Profiling



In this post we will review how to profile the CPU usage of Python functions.


First, let us create some CPU-consuming code:



import math
import random
import time


def calculate(value):
    return math.sqrt(value * 100)


def scan_dict(data):
    for key in data.keys():
        value = calculate(data[key])
        if value > 100:
            print("big")


def main():
    start = time.time()
    data = {}
    for i in range(10000):
        data[i] = random.randint(0, 100)

    for i in range(1000):
        scan_dict(data)

    passed = time.time() - start
    print('total {:.3f} seconds'.format(passed))



This code initializes a dictionary with 10K entries, and then calls a function 1000 times; the function scans the dictionary entries and calculates the square root of each element.


As you can see, we print the timing for the entire main() function. On my machine, it prints:


total 1.787 seconds


So this program takes about 2 seconds. Now let's try to find out where the time is spent. We will use cProfile for this.



import cProfile
import pstats
import io

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumtime')
ps.print_stats()

with open('test.txt', 'w+') as f:
    f.write(s.getvalue())
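As a side note, cProfile also provides one-liner entry points, run() and runctx(), which profile a single statement and print the sorted table directly, with no pstats boilerplate. A minimal sketch, where work() is just a stand-in workload:

```python
import cProfile


def work():
    # stand-in workload, similar in spirit to main() above
    return sum(i * i for i in range(100000))


# profile the statement and print stats sorted by cumulative time;
# runctx() takes explicit globals, so name lookup works in any context
cProfile.runctx('work()', globals(), None, sort='cumtime')
```

This is handy for quick checks; the Profile object used above remains the better choice when you want to post-process the results.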


The first thing we notice when wrapping the main() function with cProfile is the response time degradation:


total 3.572 seconds


So profiling has its price. The cProfile analysis is saved to a file, sorted by the cumulative time spent in each function.



        20054767 function calls in 3.572 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 3.572 3.572 /home/alon/git/a.py:17(main)
1000 1.527 0.002 3.557 0.004 /home/alon/git/a.py:10(scan_dict)
10000000 1.455 0.000 2.030 0.000 /home/alon/git/a.py:6(calculate)
10000000 0.575 0.000 0.575 0.000 {built-in method math.sqrt}
10000 0.002 0.000 0.011 0.000 /usr/lib/python3.8/random.py:244(randint)
10000 0.004 0.000 0.009 0.000 /usr/lib/python3.8/random.py:200(randrange)
10000 0.003 0.000 0.005 0.000 /usr/lib/python3.8/random.py:250(_randbelow_with_getrandbits)
12761 0.001 0.000 0.001 0.000 {method 'getrandbits' of '_random.Random' objects}
10000 0.001 0.000 0.001 0.000 {method 'bit_length' of 'int' objects}
1000 0.000 0.000 0.000 0.000 {method 'keys' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}



The table contains a line per function. For each function, we have the following information:

  • ncalls - number of calls to the function
  • tottime - time spent in the function itself, sub-functions excluded
  • percall - tottime/ncalls
  • cumtime - time spent in the function, sub-functions included
  • percall - cumtime/ncalls
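When the full table is too noisy, pstats can restrict the output. A minimal sketch, assuming we only want the five most expensive functions by tottime, with hot() as a toy workload:

```python
import cProfile
import io
import pstats


def hot():
    # toy workload so the profiler has something to measure
    return sum(range(1000000))


pr = cProfile.Profile()
pr.enable()
hot()
pr.disable()

s = io.StringIO()
# sort by time spent inside each function itself, print only the top 5 rows
pstats.Stats(pr, stream=s).sort_stats('tottime').print_stats(5)
print(s.getvalue())
```

print_stats() also accepts a string argument, which is treated as a regular expression filter on the function names.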


Final Note

The results are quite surprising. I had expected the square root function to consume most of the CPU, but it turns out that most of the time is spent scanning the dictionary. Python is not a very efficient tool for scanning a big dictionary; for example, the same code in Go would run roughly 15 times faster.
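One easy win the profile hints at: scan_dict pays for a key iteration plus a dictionary lookup per entry. Iterating over the values directly removes the extra lookup; a sketch of the idea:

```python
import math


def scan_values(data):
    # iterate the values directly instead of keys() plus a per-key lookup
    for value in data.values():
        if math.sqrt(value * 100) > 100:
            print("big")
```

This will not close the gap with Go, but it should trim some of the per-entry overhead of the scanning loop.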










