列表¶

列表是一种常用的数据结构，用于存储对象。程序员通常关注获取、设置、搜索、过滤和排序操作。此外，有时我们还会遇到内存管理方面的一些常见陷阱。因此，本速查表的主要目标是收集一些常见的操作和陷阱。

从零开始¶

在 Python 中，我们可以使用多种方法来操作列表。在开始学习这些多功能操作之前，以下代码片段展示了列表最常见的操作。

>>> a = [1, 2, 3, 4, 5]
>>> # contains
>>> 2 in a
True
>>> # positive index
>>> a[0]
1
>>> # negative index
>>> a[-1]
5
>>> # slicing list[start:end:step]
>>> a[1:]
[2, 3, 4, 5]
>>> a[1:-1]
[2, 3, 4]
>>> a[1:-1:2]
[2, 4]
>>> # reverse
>>> a[::-1]
[5, 4, 3, 2, 1]
>>> a[:0:-1]
[5, 4, 3, 2]
>>> # set an item
>>> a[0] = 0
>>> a
[0, 2, 3, 4, 5]
>>> # append items to list
>>> a.append(6)
>>> a
[0, 2, 3, 4, 5, 6]
>>> a.extend([7, 8, 9])
>>> a
[0, 2, 3, 4, 5, 6, 7, 8, 9]
>>> # delete an item
>>> del a[-1]
>>> a
[0, 2, 3, 4, 5, 6, 7, 8]
>>> # list comprehension
>>> b = [x for x in range(3)]
>>> b
[0, 1, 2]
>>> # add two lists
>>> a + b
[0, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2]

初始化¶

通常情况下，如果列表表达式中的元素是不可变对象，我们可以通过 * 运算符来创建一个列表。

>>> a = [None] * 3
>>> a
[None, None, None]
>>> a[0] = "foo"
>>> a
['foo', None, None]

但是，如果列表表达式中的元素是可变对象，则 * 运算符会将该元素的引用复制 N 次。为了避免此陷阱，我们应该使用列表推导式来初始化列表。

>>> a = [[]] * 3
>>> b = [[] for _ in range(3)]
>>> a[0].append("Hello")
>>> a
[['Hello'], ['Hello'], ['Hello']]
>>> b[0].append("Python")
>>> b
[['Python'], [], []]

复制¶

将列表赋值给变量是一种常见的陷阱。此赋值操作不会将列表复制到变量中。变量仅引用列表并增加列表的引用计数。

import sys
>>> a = [1, 2, 3]
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> b[2] = 123456  # a[2] = 123456
>>> b
[1, 2, 123456]
>>> a
[1, 2, 123456]

复制操作有两种类型。第一种称为浅拷贝（非递归复制），第二种称为深拷贝（递归复制）。大多数情况下，浅拷贝足以复制列表。但是，如果列表嵌套，则必须使用深拷贝。

>>> # shallow copy
>>> a = [1, 2]
>>> b = list(a)
>>> b[0] = 123
>>> a
[1, 2]
>>> b
[123, 2]
>>> a = [[1], [2]]
>>> b = list(a)
>>> b[0][0] = 123
>>> a
[[123], [2]]
>>> b
[[123], [2]]
>>> # deep copy
>>> import copy
>>> a = [[1], [2]]
>>> b = copy.deepcopy(a)
>>> b[0][0] = 123
>>> a
[[1], [2]]
>>> b
[[123], [2]]

使用 `slice`¶

有时，我们的数据可能像数据包一样连接成一个大的片段。在这种情况下，我们将使用 slice 对象作为解释变量来表示数据的范围，而不是使用切片表达式。

>>> icmp = (
...     b"080062988e2100005bff49c20005767c"
...     b"08090a0b0c0d0e0f1011121314151617"
...     b"18191a1b1c1d1e1f2021222324252627"
...     b"28292a2b2c2d2e2f3031323334353637"
... )
>>> head = slice(0, 32)
>>> data = slice(32, len(icmp))
>>> icmp[head]
b'080062988e2100005bff49c20005767c'

列表推导式¶

在 PEP 202 中提出的列表推导式提供了一种优雅的方式来根据另一个列表、序列或某些可迭代对象创建新的列表。此外，我们有时可以使用此表达式来替代 map 和 filter。

>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [(lambda x: x**2)(i) for i in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x for x in range(10) if x > 5]
[6, 7, 8, 9]
>>> [x if x > 5 else 0 for x in range(10)]
[0, 0, 0, 0, 0, 0, 6, 7, 8, 9]
>>> [x + 1 if x < 5 else x + 2 if x > 5 else x + 5 for x in range(10)]
[1, 2, 3, 4, 5, 10, 8, 9, 10, 11]
>>> [(x, y) for x in range(3) for y in range(2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

解包¶

有时，我们希望将列表解包到变量中，以使我们的代码更易读。在这种情况下，我们将 N 个元素分配给 N 个变量，如下例所示。

>>> arr = [1, 2, 3]
>>> a, b, c = arr
>>> a, b, c
(1, 2, 3)

根据 PEP 3132，在 Python 3 中，我们可以使用单个星号将 N 个元素解包到少于 N 个变量中。

>>> arr = [1, 2, 3, 4, 5]
>>> a, b, *c, d = arr
>>> a, b, d
(1, 2, 5)
>>> c
[3, 4]

使用 `enumerate`¶

enumerate 是一个内置函数。它帮助我们同时获取索引（或计数）和元素，而无需使用 range(len(list))。更多信息可以在循环技巧中找到。

>>> for i, v in enumerate(range(3)):
...     print(i, v)
...
0 0
1 1
2 2
>>> for i, v in enumerate(range(3), 1): # start = 1
...     print(i, v)
...
1 0
2 1
3 2

合并列表¶

zip 使我们能够一次迭代多个列表中包含的元素。当其中一个列表耗尽时，迭代停止。因此，迭代的长度与最短列表相同。如果不需要此行为，我们可以使用Python 3中的 itertools.zip_longest 或Python 2中的 itertools.izip_longest。

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> list(zip(a, b))
[(1, 4), (2, 5), (3, 6)]
>>> c = [1]
>>> list(zip(a, b, c))
[(1, 4, 1)]
>>> from itertools import zip_longest
>>> list(zip_longest(a, b, c))
[(1, 4, 1), (2, 5, None), (3, 6, None)]

过滤元素¶

filter 是一个内置函数，用于帮助我们删除不需要的元素。在Python 2中，filter 返回一个列表。但是，在Python 3中，filter 返回一个可迭代对象。请注意，列表推导式或生成器表达式提供了一种更简洁的方法来删除元素。

>>> [x for x in range(5) if x > 1]
[2, 3, 4]
>>> l = ['1', '2', 3, 'Hello', 4]
>>> f = lambda x: isinstance(x, int)
>>> filter(f, l)
<filter object at 0x10bee2198>
>>> list(filter(f, l))
[3, 4]
>>> list((i for i in l if f(i)))
[3, 4]

栈¶

在 Python 中，不需要额外的栈数据结构，因为 list 提供了 append 和 pop 方法，使我们能够将列表用作栈。

>>> stack = []
>>> stack.append(1)
>>> stack.append(2)
>>> stack.append(3)
>>> stack
[1, 2, 3]
>>> stack.pop()
3
>>> stack.pop()
2
>>> stack
[1]

`in` 操作符¶

我们可以实现 __contains__ 方法，使类能够进行 in 操作。对于程序员来说，这是模拟自定义类成员资格测试操作的常用方法。

class Stack:

    def __init__(self):
        self.__list = []

    def push(self, val):
        self.__list.append(val)

    def pop(self):
        return self.__list.pop()

    def __contains__(self, item):
        return True if item in self.__list else False

stack = Stack()
stack.push(1)
print(1 in stack)
print(0 in stack)

示例

python stack.py
True
False

访问元素¶

使自定义类像列表一样执行获取和设置操作很简单。我们可以实现一个 __getitem__ 方法和一个 __setitem__ 方法，使类能够通过索引检索和覆盖数据。此外，如果我们想使用函数 len 来计算元素的数量，我们可以实现一个 __len__ 方法。

class Stack:

    def __init__(self):
        self.__list = []

    def push(self, val):
        self.__list.append(val)

    def pop(self):
        return self.__list.pop()

    def __repr__(self):
        return "{}".format(self.__list)

    def __len__(self):
        return len(self.__list)

    def __getitem__(self, idx):
        return self.__list[idx]

    def __setitem__(self, idx, val):
        self.__list[idx] = val


stack = Stack()
stack.push(1)
stack.push(2)
print("stack:", stack)

stack[0] = 3
print("stack:", stack)
print("num items:", len(stack))

示例

$ python stack.py
stack: [1, 2]
stack: [3, 2]
num items: 2

委托迭代¶

如果自定义容器类持有列表，并且我们希望迭代在容器上工作，则可以实现 __iter__ 方法以将迭代委托给列表。请注意，__iter__ 方法应该返回一个迭代器对象，因此我们不能直接返回列表；否则，Python 会引发 TypeError。

class Stack:

    def __init__(self):
        self.__list = []

    def push(self, val):
        self.__list.append(val)

    def pop(self):
        return self.__list.pop()

    def __iter__(self):
        return iter(self.__list)

stack = Stack()
stack.push(1)
stack.push(2)
for s in stack:
    print(s)

示例

$ python stack.py
1
2

排序¶

Python 列表提供了一个内置的 list.sort 方法，该方法就地对列表进行排序，无需使用额外的内存。此外，list.sort 的返回值为 None，以避免与 sorted 混淆，并且该函数只能用于 list。

>>> l = [5, 4, 3, 2, 1]
>>> l.sort()
>>> l
[1, 2, 3, 4, 5]
>>> l.sort(reverse=True)
>>> l
[5, 4, 3, 2, 1]

sorted 函数不会就地修改任何可迭代对象。相反，它返回一个新的已排序列表。如果某些列表的元素是只读或不可变的，则使用 sorted 比 list.sort 更安全。此外，list.sort 和 sorted 之间的另一个区别是 sorted 接受任何可迭代对象。

>>> l = [5, 4, 3, 2, 1]
>>> new = sorted(l)
>>> new
[1, 2, 3, 4, 5]
>>> l
[5, 4, 3, 2, 1]
>>> d = {3: 'andy', 2: 'david', 1: 'amy'}
>>> sorted(d)  # sort iterable
[1, 2, 3]

要对元素为元组的列表进行排序，使用 operator.itemgetter 会很有帮助，因为它将键函数分配给 sorted 的 key 参数。请注意，键应该是可比较的；否则，它将引发 TypeError。

>>> from operator import itemgetter
>>> l = [('andy', 10), ('david', 8), ('amy', 3)]
>>> l.sort(key=itemgetter(1))
>>> l
[('amy', 3), ('david', 8), ('andy', 10)]

operator.itemgetter 很有用，因为该函数返回一个 getter 方法，该方法可以应用于具有 __getitem__ 方法的其他对象。例如，对元素为字典的列表进行排序可以通过使用 operator.itemgetter 来实现，因为所有元素都具有 __getitem__。

>>> from pprint import pprint
>>> from operator import itemgetter
>>> l = [
...     {'name': 'andy', 'age': 10},
...     {'name': 'david', 'age': 8},
...     {'name': 'amy', 'age': 3},
... ]
>>> l.sort(key=itemgetter('age'))
>>> pprint(l)
[{'age': 3, 'name': 'amy'},
 {'age': 8, 'name': 'david'},
 {'age': 10, 'name': 'andy'}]

如果需要对元素既不可比较又没有 __getitem__ 方法的列表进行排序，则分配自定义键函数是可行的。

>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=lambda x: x.val)
>>> nodes
[Node(1), Node(2), Node(3)]
>>> nodes.sort(key=lambda x: x.val, reverse=True)
>>> nodes
[Node(3), Node(2), Node(1)]

以上代码片段可以使用 operator.attrgetter 简化。该函数根据属性名称返回属性 getter。请注意，属性应该是可比较的；否则，sorted 或 list.sort 将引发 TypeError。

>>> from operator import attrgetter
>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=attrgetter('val'))
>>> nodes
[Node(1), Node(2), Node(3)]

如果对象具有 __lt__ 方法，则表示该对象是可比较的，并且 sorted 或 list.sort 不需要将其 key 参数输入键函数。列表或可迭代序列可以直接排序。

>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...     def __lt__(self, other):
...         return self.val - other.val < 0
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort()
>>> nodes
[Node(1), Node(2), Node(3)]

如果对象没有 __lt__ 方法，则可能在对象类声明后修补该方法。换句话说，修补后，对象变得可比较。

>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> Node.__lt__ = lambda s, o: s.val < o.val
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort()
>>> nodes
[Node(1), Node(2), Node(3)]

请注意，Python3 中的 sorted 或 list.sort 不支持 cmp 参数，该参数是 Python2 中唯一有效的参数。如果需要使用旧的比较函数，例如某些遗留代码，则 functools.cmp_to_key 很有用，因为它将比较函数转换为键函数。

>>> from functools import cmp_to_key
>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=cmp_to_key(lambda x,y: x.val - y.val))
>>> nodes
[Node(1), Node(2), Node(3)]

已排序列表¶

import bisect

class Foo(object):
    def __init__(self, k):
        self.k = k

    def __eq__(self, rhs):
        return self.k == rhs.k

    def __ne__(self, rhs):
        return self.k != rhs.k

    def __lt__(self, rhs):
        return self.k < rhs.k

    def __gt__(self, rhs):
        return self.k > rhs.k

    def __le__(self, rhs):
        return self.k <= rhs.k

    def __ge__(self, rhs):
        return self.k >= rhs.k

    def __repr__(self):
        return f"Foo({self.k})"

    def __str__(self):
        return self.__repr__()

foo = [Foo(1), Foo(3), Foo(2), Foo(0)]
bar = []
for x in foo:
    bisect.insort(bar, x)

print(bar) # [Foo(0), Foo(1), Foo(2), Foo(3)]

新建列表¶

# new a list with size = 3

>>> [0] * 3
[0, 0, 0]

# new a 2d list with size 3x3

>>> [[0] * 3 for _ in range(3)]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

请注意，我们应该避免通过以下代码片段创建多维列表，因为列表中的所有对象都指向同一个地址。

>>> a = [[0] * 3] * 3
>>> a
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> a[1][1] = 2
>>> a
[[0, 2, 0], [0, 2, 0], [0, 2, 0]]

循环缓冲区¶

>>> from collections import deque
>>> d = deque(maxlen=8)
>>> for x in range(9):
...     d.append(x)
...
>>> d
deque([1, 2, 3, 4, 5, 6, 7, 8], maxlen=8)

>>> from collections import deque
>>> def tail(path, n=10):
...     with open(path) as f:
...         return deque(f, n)
...
>>> tail("/etc/hosts")

分块¶

>>> def chunk(lst, n):
...     for i in range(0, len(lst), n):
...         yield lst[i:i+n]
...
>>> a = [1, 2, 3, 4, 5, 6, 7, 8]
>>> list(chunk(a, 3))
[[1, 2, 3], [4, 5, 6], [7, 8]]

分组¶

>>> import itertools
>>> s = "AAABBCCCCC"
>>> for k, v in itertools.groupby(s):
...     print(k, list(v))
...
A ['A', 'A', 'A']
B ['B', 'B']
C ['C', 'C', 'C', 'C', 'C']

# group by key

>>> x = [('gp1', 'a'), ('gp2', 'b'), ('gp2', 'c')]
>>> for k, v in itertools.groupby(x, lambda x: x[0]):
...     print(k, list(v))
...
gp1 [('gp1', 'a')]
gp2 [('gp2', 'b'), ('gp2', 'c')]

二分查找¶

>>> def binary_search(arr, x, lo=0, hi=None):
...     if not hi: hi = len(arr)
...     pos = bisect_left(arr, x, lo, hi)
...     return pos if pos != hi and arr[pos] == x else -1
...
>>> a = [1, 1, 1, 2, 3]
>>> binary_search(a, 1)
0
>>> binary_search(a, 2)
3

下界¶

>>> import bisect
>>> a = [1,2,3,3,4,5]
>>> bisect.bisect_left(a, 3)
2
>>> bisect.bisect_left(a, 3.5)
4

上界¶

>>> import bisect
>>> a = [1,2,3,3,4,5]
>>> bisect.bisect_right(a, 3)
4
>>> bisect.bisect_right(a, 3.5)
4

字典序¶

# python compare lists lexicographically

>>> a = [(1,2), (1,1), (1,0), (2,1)]
>>> a.sort()
>>> a
[(1, 0), (1, 1), (1, 2), (2, 1)]

Trie¶

>>> from functools import reduce
>>> from collections import defaultdict
>>> Trie = lambda: defaultdict(Trie)
>>> prefixes = ['abc', 'de', 'g']
>>> trie = Trie()
>>> end = True
>>> for p in prefixes:
...     reduce(dict.__getitem__, p, trie)[end] = p
...

# search prefix

>>> def find(trie, word):
...     curr = trie
...     for c in word:
...         if c not in curr:
...             return False
...         curr = curr[c]
...     return True
...
>>> find(trie, "abcdef")
False
>>> find(trie, "abc")
True
>>> find(trie, "ab")
True

# search word

>>> def find(trie, p):
...     curr = trie
...     for c in p:
...         if c not in curr or True in curr:
...             break
...         curr = curr[c]
...     return True if True in curr else False
...
>>> find(trie, "abcdef")
True
>>> find(trie, "abc")
True
>>> find(trie, "ab")
False

列表¶

从零开始¶

初始化¶

复制¶

使用 `slice`¶

列表推导式¶

解包¶

使用 `enumerate`¶

合并列表¶

过滤元素¶

栈¶

`in` 操作符¶

访问元素¶

委托迭代¶

排序¶

已排序列表¶

新建列表¶

循环缓冲区¶

分块¶

分组¶

二分查找¶

下界¶

上界¶

字典序¶

Trie¶

有用链接

速查表

目录

列表¶

从零开始¶

初始化¶

复制¶

使用 slice¶

列表推导式¶

解包¶

使用 enumerate¶

合并列表¶

过滤元素¶

栈¶

in 操作符¶

访问元素¶

委托迭代¶

排序¶

已排序列表¶

新建列表¶

循环缓冲区¶

分块¶

分组¶

二分查找¶

下界¶

上界¶

字典序¶

Trie¶

使用 `slice`¶

使用 `enumerate`¶

`in` 操作符¶