
Python Performance Analysis and Optimization, Part 1: Profiling Basics



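# Opening Note
>This series is adapted from the book 《Python性能分析与优化》 (Python Performance Analysis and Optimization). The original book targets Python 2; I have made some modifications on top of it and use Python 3 for the code demonstrations. This series has no commercial purpose. If you find any infringement, please leave a message for the author and I will take the content down immediately.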

# **Chapter 1: Profiling Basics**

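Just as someone who can run the 100-meter hurdles in 12 seconds first had to learn to crawl as a baby, programmers need to learn some basics before they can master profiling. So before we explore performance optimization and profiling techniques for Python programs, we need a clear picture of the underlying fundamentals.

Once you have these basics down, you can move on to specific tools and techniques. This chapter therefore covers everything you should know about profiling but were too embarrassed to ask. It is organized as follows.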

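* A precise definition of profiling and an overview of the main profiling techniques.

* The role profiling plays in the development cycle. Profiling is not something you do once and then forget about; it should be an integral part of the development process, just like writing tests.

* What can be profiled. We look at which resources can be measured and how those measurements help us find bottlenecks.

* The risk of premature optimization, that is, why optimizing code before profiling it is usually a bad idea.

* Running-time complexity. Understanding profiling techniques is one step toward successful optimization, but we also need to understand how algorithmic complexity is measured in order to tell whether an algorithm needs optimizing at all.

* Good practices. The chapter closes with some habits worth keeping in mind when profiling a project.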

## **1.1 What Is Profiling**

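An unoptimized program usually spends most of its CPU cycles in a few subroutines. Profiling is the analysis of how code relates to the resources it uses. For example, profiling can tell you how much CPU time an instruction takes, or how much memory the whole program consumes. It is done with a tool called a profiler, which instruments either the program's source code or, when possible, its binary executable.

Developers usually profile their programs when they need to optimize performance, or when the program shows some strange bug (typically related to a memory leak). In those cases, profiling gives them a detailed view of how the program uses the computer's resources, down to how many times a given function is called.

With that information, plus a solid understanding of the source code, developers can locate the program's bottlenecks or memory leaks and then fix the offending code.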
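To make the "how many times was a function called" idea concrete, here is a minimal sketch. It assumes the standard library's `cProfile` and `pstats` modules, which the text has not introduced at this point; the helper `count_calls` is a hypothetical name used only for illustration.

```python
import cProfile
import pstats


def fib(n):
    """Deliberately inefficient recursive Fibonacci."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def count_calls(func, *args):
    """Run func(*args) under cProfile.

    Returns (result, {function name: total number of calls}).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, line, function name) to a tuple whose
    # first two items are (primitive calls, total calls).
    calls = {key[2]: value[1] for key, value in stats.stats.items()}
    return result, calls


result, calls = count_calls(fib, 10)
print(result, calls["fib"])  # prints: 55 177
```

A single call to `fib(10)` turns into 177 calls once the recursion is counted, which is exactly the kind of detail that is hard to guess from reading the source alone.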

Profiling software follows one of two methodologies: event-based profiling and statistical profiling. When using either kind, keep its particular strengths and weaknesses in mind.

### **1.1.1 Event-Based Profiling**

Not every programming language supports this kind of profiling. The main ones that do are listed below.

* **Java**: JVMTI (JVM Tools Interface) gives profilers hooks for tracking events such as function calls, thread-related events, and class loading.

* **.NET**: as with Java, the .NET runtime provides event tracing ([https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Testing/Profiling#Methods_of_data_gathering](https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Testing/Profiling#Methods_of_data_gathering)).

* **Python**: developers can use the `sys.setprofile` function to track the events `call` and `return` (for Python functions) and `c_call`, `c_return`, and `c_exception` (for C functions).

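An **event-based profiler** (also called a **tracing profiler**) works by collecting specific events during program execution. These profilers generate a large amount of data; basically, the more events they listen for, the more data they produce. That makes them somewhat impractical, and they are not the first choice when you start profiling a program. They are, however, a good last resort when other profiling methods are not enough or not precise enough. If you want to profile every return statement in a program, for instance, this kind of profiler gives you the granularity the task needs, while other profilers cannot provide results that fine-grained.

A simple example of an event-based profiler in Python is shown below (you will understand this topic better once you have worked through the later chapters):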

```
import sys

def profiler(frame, event, arg):
    # Called by the interpreter for every profiling event
    print('profiler: %r %r' % (event, arg))

sys.setprofile(profiler)

# A simple (and very inefficient) way to compute the Fibonacci sequence
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

print(fib_seq(2))

```

The program above produces the following output:

```
profiler: 'call' None
profiler: 'call' None
profiler: 'call' None
profiler: 'call' None
profiler: 'return' 0
profiler: 'c_call' <built-in method append of list object at 0x113d5a888>
profiler: 'c_return' <built-in method append of list object at 0x113d5a888>
profiler: 'return' [0]
profiler: 'c_call' <built-in method extend of list object at 0x113d5a8c8>
profiler: 'c_return' <built-in method extend of list object at 0x113d5a8c8>
profiler: 'call' None
profiler: 'return' 1
profiler: 'c_call' <built-in method append of list object at 0x113d5a8c8>
profiler: 'c_return' <built-in method append of list object at 0x113d5a8c8>
profiler: 'return' [0, 1]
profiler: 'c_call' <built-in method extend of list object at 0x10c1b0f88>
profiler: 'c_return' <built-in method extend of list object at 0x10c1b0f88>
profiler: 'call' None
profiler: 'call' None
profiler: 'return' 1
profiler: 'call' None
profiler: 'return' 0
profiler: 'return' 1
profiler: 'c_call' <built-in method append of list object at 0x10c1b0f88>
profiler: 'c_return' <built-in method append of list object at 0x10c1b0f88>
profiler: 'return' [0, 1, 1]
profiler: 'c_call' <built-in function print>
[0, 1, 1]
profiler: 'c_return' <built-in function print>
profiler: 'return' None
profiler: 'call' None
profiler: 'c_call' <built-in method locked of _thread.lock object at 0x10c5420d0>
profiler: 'c_return' <built-in method locked of _thread.lock object at 0x10c5420d0>
profiler: 'c_call' <built-in method release of _thread.lock object at 0x10c5420d0>
profiler: 'c_return' <built-in method release of _thread.lock object at 0x10c5420d0>
profiler: 'call' None
profiler: 'c_call' <built-in method locked of _thread.lock object at 0x10c5420d0>
profiler: 'c_return' <built-in method locked of _thread.lock object at 0x10c5420d0>
profiler: 'return' None
profiler: 'call' None
profiler: 'call' None
profiler: 'c_call' <built-in method values of dict object at 0x10c52b2d0>
profiler: 'c_return' <built-in method values of dict object at 0x10c52b2d0>
profiler: 'c_call' <built-in method values of dict object at 0x10c52b678>
profiler: 'c_return' <built-in method values of dict object at 0x10c52b678>
profiler: 'return' [<_MainThread(MainThread, stopped 4649584064)>]
profiler: 'call' None
profiler: 'return' False
profiler: 'call' None
profiler: 'return' False
profiler: 'return' None
profiler: 'return' None
profiler: 'call' None
profiler: 'call' None
profiler: 'c_call' <built-in method acquire of _thread.RLock object at 0x10c51bc00>
profiler: 'c_return' <built-in method acquire of _thread.RLock object at 0x10c51bc00>
profiler: 'return' None
profiler: 'call' None
profiler: 'call' None
profiler: 'c_call' <built-in method acquire of _thread.RLock object at 0x10c51bc00>
profiler: 'c_return' <built-in method acquire of _thread.RLock object at 0x10c51bc00>
profiler: 'return' None
profiler: 'call' None
profiler: 'return' <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
profiler: 'call' None
profiler: 'return' <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
profiler: 'c_call' <built-in function hasattr>
profiler: 'c_return' <built-in function hasattr>
profiler: 'call' None
profiler: 'return' <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
profiler: 'c_call' <built-in method flush of _io.TextIOWrapper object at 0x10c1a6708>
profiler: 'c_return' <built-in method flush of _io.TextIOWrapper object at 0x10c1a6708>
profiler: 'call' None
profiler: 'c_call' <built-in method release of _thread.RLock object at 0x10c51bc00>
profiler: 'c_return' <built-in method release of _thread.RLock object at 0x10c51bc00>
profiler: 'return' None
profiler: 'return' None
profiler: 'call' None
profiler: 'call' None
profiler: 'c_call' <built-in method acquire of _thread.RLock object at 0x10c51b750>
profiler: 'c_return' <built-in method acquire of _thread.RLock object at 0x10c51b750>
profiler: 'return' None
profiler: 'call' None
profiler: 'c_call' <built-in method release of _thread.RLock object at 0x10c51b750>
profiler: 'c_return' <built-in method release of _thread.RLock object at 0x10c51b750>
profiler: 'return' None
profiler: 'return' None
profiler: 'call' None
profiler: 'c_call' <built-in method release of _thread.RLock object at 0x10c51bc00>
profiler: 'c_return' <built-in method release of _thread.RLock object at 0x10c51bc00>
profiler: 'return' None
profiler: 'return' None
profiler: 'call' None
profiler: 'return' None
```

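As you can see, `profiler` is called on every event. Inside the `profiler` function we can print or collect whatever we consider relevant. In the sample code above, the last line, a single call to `fib_seq(2)`, already generates close to a hundred lines of output. If we were dealing with a more realistic program, the profiling output could be several orders of magnitude larger. This is why event-based profiling is usually the last option when profiling. Other profilers (as we will see shortly) produce much less output, but at a lower level of precision.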
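Since the hook can collect data instead of printing it, the following sketch tallies events in counters rather than dumping one line per event. The names `collecting_profiler`, `event_counts`, and `call_counts` are illustrative assumptions, not part of the original example; the hook is also removed with `sys.setprofile(None)` so that interpreter shutdown is not traced.

```python
import sys
from collections import Counter

event_counts = Counter()  # number of events seen, per event type
call_counts = Counter()   # number of Python-level calls, per function name


def collecting_profiler(frame, event, arg):
    """Profile hook that tallies events instead of printing them."""
    event_counts[event] += 1
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)


def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq


sys.setprofile(collecting_profiler)
try:
    result = fib_seq(2)
finally:
    sys.setprofile(None)  # always remove the hook when done

print(result)                                       # prints: [0, 1, 1]
print(call_counts["fib"], call_counts["fib_seq"])   # prints: 5 3
```

Aggregating inside the hook keeps the amount of stored data proportional to the number of distinct functions rather than the number of events, at the cost of losing the exact ordering that the printed trace preserves.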

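### **1.1.2 Statistical Profiling**

A statistical profiler samples the program counter at regular intervals. This lets the developer see how much time the target program spends in each function. Because it works by sampling the program counter, the resulting numbers are a statistical approximation of the real values. Even so, this kind of tool is good enough to give a glimpse of the profiled program's performance and to find where the bottlenecks are.

This kind of profiling software has the following advantages.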

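* **Less data to analyze**: since we only sample the program's execution instead of recording every single event, the amount of information to analyze is significantly smaller.

* **Smaller performance footprint**: because sampling is used (driven by OS interrupts), the target program's performance suffers less interference. Profiling can never be 100% free of interference, but a statistical profiler disturbs the program less than an event-based one.

Below is the output of OProfile ([http://oprofile.sourceforge.net/news/](http://oprofile.sourceforge.net/news/)), a statistical profiler for Linux: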
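The sampling idea can be sketched in pure Python. Note the substitutions: instead of OS interrupts and the hardware program counter, this toy version uses a background thread and `sys._current_frames()` (a CPython function intended for specialized use) to look at the innermost Python frame of the calling thread. The names `sample_calling_thread` and `busy` are hypothetical, and sample counts will vary from run to run.

```python
import collections
import sys
import threading
import time


def sample_calling_thread(target, interval=0.001):
    """Run target() while a background thread samples the caller's stack.

    Returns a Counter mapping function name to the number of samples in
    which that function was the innermost frame of the calling thread.
    """
    samples = collections.Counter()
    caller_id = threading.get_ident()
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            frame = sys._current_frames().get(caller_id)
            if frame is not None:
                samples[frame.f_code.co_name] += 1
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        stop.set()
        thread.join()
    return samples


def busy(duration=0.2):
    """Burn CPU for roughly `duration` seconds."""
    deadline = time.perf_counter() + duration
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


samples = sample_calling_thread(busy)
print(samples.most_common(3))
```

Only one counter entry per function is kept, no matter how many iterations `busy` performs, which illustrates why a sampling profiler has far less data to analyze than an event-based one.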

```
Function name,File name,Times Encountered,Percentage
"func80000","statistical_profiling.c",30760,48.96%
"func40000","statistical_profiling.c",17515,27.88%
"func20000","static_functions.c",7141,11.37%
"func10000","static_functions.c",3572,5.69%
"func5000","static_functions.c",1787,2.84%
"func2000","static_functions.c",768,1.22%
"func1500","statistical_profiling.c",701,1.12%
"func1000","static_functions.c",385,0.61%
"func500","statistical_profiling.c",194,0.31%

```

The following profile was produced by running statprof, a statistical profiler for Python, on the earlier code:

```
%     cumulative      self
time     seconds   seconds  name
100.00      0.01      0.01  B02088_01_03.py:11:fib
  0.00      0.01      0.00  B02088_01_03.py:17:fib_seq
  0.00      0.01      0.00  B02088_01_03.py:21:<module>
---
Sample count: 1
Total time: 0.010000 seconds

```

As you can see, the two profilers (the event-based tracer and statprof) produce very different output for the same code.

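## **1.2 The Importance of Profiling**

Now that we know what profiling means, we should also understand why it matters and what it is actually good for within the product development cycle.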

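Profiling is not something every program needs, and for small, non-critical software it is often hardly worth the effort (unlike mission-critical embedded software, or programs written specifically to demonstrate profiling). Profiling takes time, and it normally pays off only once something has been found to be wrong with the program. It can still be done before that point, however, to catch latent bugs and save debugging time later.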

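With hardware getting more advanced, faster, and cheaper all the time, it is understandably harder and harder for developers to see why we should spend resources (mostly time) profiling what we build. After all, we already have test-driven development, code review, pair programming, and other practices that make code more reliable and predictable. Don't we?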

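What we fail to realize, however, is that as our programming languages climb to ever higher levels (we went from assembly language to JavaScript in the span of a few decades), we care less and less about low-level details such as CPU cycles, memory allocation, and CPU registers. New generations of programmers learn their craft through high-level languages because these are easier to understand and work out of the box. Yet these languages are still abstractions over the hardware and our interaction with it. As this trend grows, new developers are less and less likely to treat profiling as a step in software development.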

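Consider the following scenario.

As we already know, profiling measures the resources a program uses. And as noted earlier, those resources keep getting cheaper. So the cost of producing software and getting it into the hands of more customers keeps falling.

These days, almost any piece of software can pick up thousands of users. Promote it on social networks and the user base may grow exponentially almost at once. When the number of users surges, the program often crashes or becomes unbearably slow, and customers end up abandoning it without a second thought.

A situation like this can obviously be caused by poor software design and an architecture that does not scale. After all, a single server's limited memory and CPU can themselves become the bottleneck. But another possible cause, one that has been proven many times over, is that the program was never stress tested. We never considered resource consumption; we only made sure the tests passed, and we were happy with that. In other words, we were shortsighted, and the result is a project that crashes and dies.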

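Profiling can help us avoid that fate, because it shows quite accurately how the program behaves, whatever the load. So if profiling under very light load reveals that the software spends 80% of its time on I/O operations, that is a warning sign. Some might argue that a program that runs fine during testing should also hold up under heavy load. Think about memory leaks, though. In that case small tests will not expose a bug that only shows up under heavy load: the leak is present all along, but its effects become visible only once the production load is high. Profiling can give us enough evidence to spot this kind of hazard before the load actually gets heavy.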
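As a rough illustration of catching a leak under light load, the sketch below uses the standard library's `tracemalloc` module (an assumption; the text does not name a tool here). The handler `handle_request` and its module-level `_cache` are hypothetical stand-ins for code that accidentally keeps data alive.

```python
import tracemalloc

_cache = []  # hypothetical list that is never cleared: the "leak"


def handle_request(payload):
    """Hypothetical handler that accidentally keeps every payload alive."""
    _cache.append(payload * 100)
    return len(payload)


def leaked_bytes(requests):
    """Return how many traced bytes remain allocated after `requests` calls."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for i in range(requests):
        handle_request("x%d" % i)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


small = leaked_bytes(100)
large = leaked_bytes(1000)
print(small, large)
```

Even at 100 requests the retained memory is nonzero, and it keeps growing with the request count. A functional test that only checks return values would pass in both cases, while the measurement makes the trend visible long before production traffic does.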