10. Lists and Tuples
10.1. Motivation of composite data type
The following code calculates the average of five numbers:
def average_five_numbers(n1, n2, n3, n4, n5):
    return (n1 + n2 + n3 + n4 + n5) / 5
average_five_numbers(1, 2, 3, 4, 5)
3.0
What about using the above function to compute the average household income in Hong Kong?
The labor force in Hong Kong in 2018 was close to 4 million.
Should we create a variable to store the income of each individual?
Should we recursively apply the function to groups of five numbers?
What we need is a composite data type that can keep a variable number of items, so that we can then define a function that takes an object of the composite data type and returns the average of all items in the object.
How to store a sequence of items in Python?
tuple and list are two built-in classes for ordered collections of objects of possibly different types.
Indeed, we have already used tuples and lists before.
%%mytutor -h 300
a_list = '1 2 3'.split()
a_tuple = (lambda *args: args)(1,2,3)
a_list[0] = 0
a_tuple[0] = 0
What is the difference between tuple and list?
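One way to answer the question: a list is mutable, so the item assignment succeeds, whereas a tuple is immutable, so the same assignment raises a TypeError. The minimal sketch below uses plain Python instead of the %%mytutor visualization; the variable names are only illustrative.

```python
a_list = [1, 2, 3]
a_tuple = (1, 2, 3)

a_list[0] = 0  # lists are mutable: the first item is replaced in place
print(a_list)  # [0, 2, 3]

try:
    a_tuple[0] = 0  # tuples are immutable: item assignment is not supported
except TypeError as error:
    print('TypeError:', error)
print(a_tuple)  # (1, 2, 3): the tuple is unchanged
```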
10.2. Constructing sequences
How to create a tuple/list?
Mathematicians often represent a set of items in two different ways:
Roster notation, which enumerates the elements, e.g., $\{0, 1, 4, 9, 16, 25, 36, 49, 64, 81\}$.
Set-builder notation, which describes the content using a rule for constructing the elements, e.g., $\{x^2 \mid x \in \mathbb{N}, x < 10\}$, namely the set of perfect squares less than 100.
Python also provides two corresponding ways to create a tuple/list:
How to create a tuple/list by enumerating its items?
To create a tuple, we enclose a comma-separated sequence in parentheses:
%%mytutor -h 450
empty_tuple = ()
singleton_tuple = (0,) # why not (0)?
heterogeneous_tuple = (singleton_tuple,
                       (1, 2.0),
                       print)
enclosed_starred_tuple = (*range(2),
                          *'23')
Note that:
If the enclosed sequence has one term, there must be a comma after the term.
The elements of a tuple can have different types.
The unpacking operator * can unpack an iterable into a sequence of items in an enclosure.
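A quick check of these rules (a sketch with illustrative names): parentheses alone only group an expression, the comma is what makes a tuple, and the empty tuple is the one case that needs no comma.

```python
not_a_tuple = (0)             # parentheses only group the expression: the int 0
singleton = (0,)              # the trailing comma makes a 1-item tuple
bare_tuple = 0, 1             # a comma-separated sequence without parentheses is also a tuple
empty = ()                    # the empty tuple needs no comma
starred = (*range(2), *'23')  # unpacking a range and a str inside the enclosure

print(type(not_a_tuple).__name__, type(singleton).__name__,
      type(bare_tuple).__name__, type(empty).__name__)  # int tuple tuple tuple
print(starred)  # (0, 1, '2', '3')
```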
To create a list, we use square brackets to enclose a comma-separated sequence of objects.
%%mytutor -h 450
empty_list = []
singleton_list = [0] # no need to write [0,]
heterogeneous_list = [singleton_list,
                      (1, 2.0),
                      print]
enclosed_starred_list = [*range(2),
                         *'23']
We can also create a tuple/list from other iterables using the constructors tuple/list, and from other tuples/lists using addition and multiplication, similar to str.
%%mytutor -h 950
str2list = list('Hello')
str2tuple = tuple('Hello')
range2list = list(range(5))
range2tuple = tuple(range(5))
tuple2list = list((1, 2, 3))
list2tuple = tuple([1, 2, 3])
concatenated_tuple = (1,) + (2, 3)
concatenated_list = [1, 2] + [3]
duplicated_tuple = (1,) * 2
duplicated_list = 2 * [1]
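A detail the examples above do not show: + concatenates two tuples or two lists, but not a tuple with a list. A short sketch, with illustrative names, that converts one side with a constructor first:

```python
pair = (1, 2)
more = [3, 4]

# + only concatenates sequences of the same type, so mixing tuple and list fails.
try:
    pair + more
except TypeError as error:
    print('TypeError:', error)

# Converting one side with a constructor first makes the types match.
print(pair + tuple(more))  # (1, 2, 3, 4)
print(list(pair) + more)   # [1, 2, 3, 4]
```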
Exercise Explain the difference between the following two expressions. Why must a singleton tuple have a comma after the item?
print((1+2)*2,
(1+2,)*2, sep='\n')
6
(3, 3)
(1+2)*2 evaluates to 6, but (1+2,)*2 evaluates to (3, 3).
The parentheses in (1+2) indicate that the addition needs to be performed first, but the parentheses in (1+2,) create a tuple.
Hence, a singleton tuple must have a comma after the item to distinguish these two use cases.
How to use a rule to construct a tuple/list?
We can specify the rule using a comprehension, which we have already used in generator expressions.
E.g., the following Python one-liner defines a function that returns a generator of the prime numbers smaller than stop.
all?
prime_sequence = lambda stop: (x for x in range(2, stop)
                               if all(x % divisor for divisor in range(2, x)))
print(*prime_sequence(100))
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are two comprehensions used:
In all(x % divisor for divisor in range(2, x)), the comprehension creates a generator of the remainders of x divided by 2, 3, ..., x-1 and passes it to the function all, which returns True if all the remainders are True in a boolean context, i.e., nonzero.
In the return value (x for x in range(2, stop) if ...) of the anonymous function, the comprehension creates a generator of the numbers from 2 to stop-1 that satisfy the condition of the if clause.
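Two properties of all explain the output above: it returns True for an empty iterable, which is why 2 is reported as prime, and a zero remainder is falsy, which is why a composite number fails the test. A small sketch with hand-picked values:

```python
# For x = 2, range(2, x) is empty, so the generator of remainders yields nothing.
print(list(range(2, 2)))                 # []
print(all(2 % d for d in range(2, 2)))   # True: all() of an empty iterable is True

# For x = 9, the remainder 9 % 3 == 0 is falsy, so all() returns False.
remainders = [9 % d for d in range(2, 9)]
print(remainders)                        # [1, 0, 1, 4, 3, 2, 1]
print(all(remainders))                   # False

# Dually, any() of an empty iterable is False, so 2 is not reported as composite.
print(any(2 % d == 0 for d in range(2, 2)))  # False
```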
Exercise Use a comprehension to define a function composite_sequence that takes a non-negative integer stop and returns a generator of the composite numbers strictly smaller than stop. Use any instead of all to check whether a number is composite.
any?
### BEGIN SOLUTION
composite_sequence = lambda stop: (x for x in range(2, stop)
                                   if any(x % divisor == 0 for divisor in range(2, x)))
### END SOLUTION
print(*composite_sequence(100))
4 6 8 9 10 12 14 15 16 18 20 21 22 24 25 26 27 28 30 32 33 34 35 36 38 39 40 42 44 45 46 48 49 50 51 52 54 55 56 57 58 60 62 63 64 65 66 68 69 70 72 74 75 76 77 78 80 81 82 84 85 86 87 88 90 91 92 93 94 95 96 98 99
We can construct a list instead of a generator using a comprehension:
print(list(x**2 for x in range(10))) # Use the list constructor
print([x**2 for x in range(10)]) # Enclose comprehension by brackets
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
We can also use a comprehension to construct a tuple:
print(tuple(x**2 for x in range(10))) # Use the tuple constructor
(0, 1, 4, 9, 16, 25, 36, 49, 64, 81)
Exercise Explain the difference between the following expressions.
print((x**2 for x in range(10)),
(*(x**2 for x in range(10)),), sep='\n')
<generator object <genexpr> at 0x7f8ef52bbe50>
(0, 1, 4, 9, 16, 25, 36, 49, 64, 81)
The first is a generator expression, not a tuple.
The second is a tuple constructed by enclosing the items obtained from unpacking the generator.
There must be a comma after the starred term since there is only one enclosed term, even though that term unpacks into multiple items.
Exercise Explain the difference between the following expressions.
print([x for x in range(10)],
[(lambda arg: arg)(x for x in range(10))], sep='\n')
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[<generator object <genexpr> at 0x7f8ef4a3a050>]
In the second expression, a comprehension passed as the sole argument of a function call becomes a generator expression, so the anonymous function receives and returns a generator object, which is then enclosed to form a singleton list.
In the first expression, the square brackets enclose the comprehension directly, forming a list comprehension that evaluates to a list of all the generated items.
With a list comprehension, we can simulate a sequence of biased coin flips.
from random import random as rand
p = rand() # unknown bias
coin_flips = ['H' if rand() <= p else 'T' for i in range(1000)]
print('Chance of head:', p)
print('Coin flips:',*coin_flips)
Chance of head: 0.8731471316362651
Coin flips: H H H H H H H T T T H H H H H T T H T H H H H H H H H T H H H H H H H H H H H H H H T H T H T H H H T H H T T H H H T H H H H H H H H H H H H H T H H H H H H H H H H H H H H H H T H H H T H H H H H H H H H H H T H H H H H H H H H H H H H H T H H H H H H H H H T T T H H H H H H H H H H H H H H T H T H H H H T H H H H H H H H H H H H T T H H H H H H H T H H T H H T H T H H H H H H H H H H H H H H H H H H H H H H H H H T H H H H H H H T H T H H T H H H T H T H H T T H H H H H T H H H H H H H H H H H H H H H H H T H H H H H H H H T T H H H H H H T H H H H H H H H H H H H H H H H H H T T H H H T H T H H H H H T T H H H H T H H H H H H H H H H H H H H H H H T T H T H H H H H T H H H H T H H H H H H H H T H H T H H H H H H H H H H H H H H T H H H H H H H H H H H H H H T H H H H H H H H H T H T H H H H H T H H H H H T H H H T H H T
H H H H H H H H H H H H H H H T H T H H H H T H H H H T H H H H H H T H H H H H T H H H H H H H H T H H H H H H T H H H H T H H H H H H H H H H H H H H H H H H H H H H H H H T H H H H H H H H T H H H H H H H H T H H H T H H H H H H H H H H T H H T H H H H H H H H H H T H H T H H H H H H H T H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H T T H H H H H H H H H H T H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H T H H H H H T H H H T H H H H H H H H H H H T H H T H H T H H H H H H H T H H H T H H H H H H H T H H H H H H H H H H H H H H H H H H H H H H T H H T H H H H H H H H H H H H T H H H H H H H T H H H H H H H H H H H T H T H H H H T H H H H H H T H H H H H H H H H H H T H H H H H H H H H T H H H H H H T H H H H H H T T H H H T H H H H H H H H H H H H H H H H H T H T H H H H H T H H H H H H H H H H H H H H H H H H T H H H H H H H H H H H H H H H H H H T H H H H H H H H H T H H H H T T H H H H H H H H H T H H H H H H T H H H H T H H H H H H H H H T H H H H H H H H H T H H H H H H H H H H H H T H H H T H T H H H H H H H H H H H H H H H H H H H H
H H H H H H H H T H H T H H H H H H
We can then estimate the bias by the fraction of heads coming up.
def average(seq):
    return sum(seq)/len(seq)
head_indicators = [1 if outcome == 'H' else 0 for outcome in coin_flips]
fraction_of_heads = average(head_indicators)
print('Fraction of heads:', fraction_of_heads)
Fraction of heads: 0.872
Note that sum and len return the sum and the length of the sequence, respectively.
Exercise Define a function variance that takes in a sequence seq and returns the variance of the sequence.
def variance(seq):
    ### BEGIN SOLUTION
    return sum(i**2 for i in seq)/len(seq) - average(seq)**2
    ### END SOLUTION
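# Standard error of the estimated fraction of heads. The interval is centered at the
# observed fraction, since the true bias p is unknown in practice.
delta = (variance(head_indicators)/len(head_indicators))**0.5
print('95% confidence interval: [{:.2f},{:.2f}]'.format(fraction_of_heads-2*delta, fraction_of_heads+2*delta))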
95% confidence interval: [0.85,0.89]
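Why delta is computed this way: for a sequence of 0/1 indicators with a fraction f of ones, each i**2 equals i, so the mean of the squares is f and the variance reduces to f - f**2 = f(1 - f). Then delta = sqrt(f(1 - f)/n) is the usual standard error of the estimated fraction, and f plus or minus 2 delta is an approximate 95% interval under a normal approximation. The sketch below uses made-up counts (872 heads in 1000 flips) rather than the random flips above.

```python
import math

def average(seq):
    return sum(seq) / len(seq)

def variance(seq):
    return sum(i**2 for i in seq) / len(seq) - average(seq)**2

# Made-up data: 872 heads (1) and 128 tails (0) in n = 1000 flips.
head_indicators = [1] * 872 + [0] * 128
n = len(head_indicators)
f = average(head_indicators)

# For 0/1 data, i**2 == i, so the mean of squares is f and the variance is f - f**2.
print(math.isclose(variance(head_indicators), f * (1 - f)))  # True

# Standard error of the fraction and an approximate 95% interval (2 standard errors).
delta = math.sqrt(f * (1 - f) / n)
print('[{:.2f},{:.2f}]'.format(f - 2 * delta, f + 2 * delta))  # [0.85,0.89]
```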
10.3. Selecting items in a sequence
How to traverse a tuple/list?
Instead of calling dunder methods such as __iter__ and __next__ directly, we can use a for loop to iterate over all the items in order.
a = (*range(5),)
for item in a: print(item, end=' ')
0 1 2 3 4
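For comparison, here is a rough sketch of the iteration that the for loop performs, written with the built-ins iter and next, which call the dunder methods __iter__ and __next__. The list items is only there to record what was visited.

```python
a = (*range(5),)

# A sketch of what the for loop does behind the scenes:
items = []
iterator = iter(a)               # calls a.__iter__()
while True:
    try:
        item = next(iterator)    # calls iterator.__next__()
    except StopIteration:        # raised once the items run out
        break
    items.append(item)
    print(item, end=' ')
```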
To do it in reverse, we can use the reversed function.
reversed?
a = [*range(5)]
for item in reversed(a): print(item, end=' ')
4 3 2 1 0
We can also traverse multiple tuples/lists simultaneously by zipping them with zip.
zip?
a = (*range(5),)
b = reversed(a)
for item1, item2 in zip(a,b):
    print(item1,item2)
0 4
1 3
2 2
3 1
4 0
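Two behaviors are worth knowing when combining zip and reversed (a small sketch with illustrative data): zip stops at the shortest input, and reversed returns an iterator that can be traversed only once, so b = reversed(a) above is exhausted after the loop.

```python
a = (0, 1, 2, 3, 4)
letters = ['x', 'y', 'z']

# zip stops as soon as the shortest iterable is exhausted.
print(list(zip(a, letters)))  # [(0, 'x'), (1, 'y'), (2, 'z')]

# reversed returns an iterator, which can only be traversed once.
r = reversed(a)
print(list(r))  # [4, 3, 2, 1, 0]
print(list(r))  # []
```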
How to select an item in a sequence?
Sequence objects such as str/tuple/list implement the getter method __getitem__ to return their items.
We can select an item by subscription a[i], where a is a sequence and i is an integer index.
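To see that subscription goes through __getitem__, the small check below (illustrative values) calls the method directly on a tuple and a str:

```python
a = ('a', 'b', 'c')
s = 'Hello'

# Subscription is carried out by __getitem__, for tuples and strings alike.
print(a[1], a.__getitem__(1))  # b b
print(s[0], s.__getitem__(0))  # H H
```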
A non-negative index indicates the distance from the beginning.
a = (*range(10),)
print(a)
print('Length:', len(a))
print('First element:',a[0])
print('Second element:',a[1])
print('Last element:',a[len(a)-1])
print(a[len(a)]) # IndexError
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
Length: 10
First element: 0
Second element: 1
Last element: 9
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-20-8793aa5ed482> in <module>
5 print('Second element:',a[1])
6 print('Last element:',a[len(a)-1])
----> 7 print(a[len(a)]) # IndexError
IndexError: tuple index out of range
a[i] with i >= len(a) results in an IndexError.
A negative index represents a negative offset from an imaginary element one past the end of the sequence.
a = [*range(10)]
print(a)
print('Last element:',a[-1])
print('Second last element:',a[-2])
print('First element:',a[-len(a)])
print(a[-len(a)-1]) # IndexError
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Last element: 9
Second last element: 8
First element: 0
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-21-6f6376dfba21> in <module>
4 print('Second last element:',a[-2])
5 print('First element:',a[-len(a)])
----> 6 print(a[-len(a)-1]) # IndexError
IndexError: list index out of range
a[i] with i < -len(a) results in an IndexError.
How to select multiple items?
We can use slicing to select a range of items:
a[start:stop]
a[start:stop:step]
where
a is a sequence such as a tuple or list;
start is an integer representing the index of the first item in the selection;
stop is an integer index at which the selection ends, and the item at index stop itself is not selected; and
step is an integer that specifies the step/stride size through the sequence.
a = (*range(10),)
print(a[1:4])
print(a[1:4:2])
(1, 2, 3)
(1, 3)
The parameters take their default values if missing or equal to None.
a = [*range(10)]
print(a[:4]) # start defaults to 0
print(a[1:]) # stop defaults to len(a)
print(a[1:4:]) # step defaults to 1
[0, 1, 2, 3]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3]
They can take negative values.
print(a[-1:])
print(a[:-1])
print(a[::-1])
[9]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
They can also take a mixture of negative and positive values.
print(a[-1:1]) # equal [a[-1], a[0]]?
print(a[1:-1]) # equal []?
print(a[1:-1:-1]) # equal [a[1], a[0]]?
print(a[-100:100]) # result in IndexError like subscription?
[]
[1, 2, 3, 4, 5, 6, 7, 8]
[]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
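The outputs above follow from one rule: negative start/stop values are first offset by len(a), and out-of-range values are then clipped to the boundaries of the sequence instead of raising an IndexError. The indices method of the built-in slice object shows the resulting (start, stop, step): a[-1:1] becomes a[9:1], which is empty for a positive step; a[1:-1:-1] becomes a[1:9:-1], which is empty because stepping down from 1 never reaches 9; and a[-100:100] is clipped to a[0:10].

```python
a = [*range(10)]

# slice(start, stop, step).indices(len(a)) returns the normalized and clipped
# (start, stop, step) that the slicing a[start:stop:step] actually uses.
for s in (slice(-1, 1), slice(1, -1), slice(1, -1, -1), slice(-100, 100)):
    start, stop, step = s.indices(len(a))
    selected = [a[i] for i in range(start, stop, step)]
    print((s.start, s.stop, s.step), '->', (start, stop, step), '->', selected)
```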
We can now implement a practical sorting algorithm called quicksort to sort a sequence.
import random
def quicksort(seq):
    '''Return a sorted list of items from seq.'''
    if len(seq) <= 1:
        return list(seq)
    i = random.randint(0, len(seq) - 1)
    pivot, others = seq[i], [*seq[:i], *seq[i + 1:]]
    left = quicksort([x for x in others if x < pivot])
    right = quicksort([x for x in others if x >= pivot])
    return [*left, pivot, *right]
seq = [random.randint(0, 99) for i in range(10)]
print(seq, quicksort(seq), sep='\n')
[82, 98, 50, 74, 7, 75, 14, 65, 65, 62]
[7, 14, 50, 62, 65, 65, 74, 75, 82, 98]
The above recursion creates a sorted list as [*left, pivot, *right], where
pivot is an item picked at random from seq,
left is the sorted list of the other items smaller than pivot, and
right is the sorted list of the other items no smaller than pivot.
The base case happens when seq contains at most one item, in which case seq is already sorted and is simply returned as a list.
There is a built-in function sorted for sorting a sequence. CPython implements it with the Timsort algorithm.
sorted?
sorted(sorted(seq))
[7, 14, 50, 62, 65, 65, 74, 75, 82, 98]
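One way to gain some confidence in the quicksort above is to compare it with sorted on many random inputs. The sketch below repeats the quicksort definition so that it is self-contained; the number of trials and the input sizes are arbitrary choices. It also shows that sorted accepts any iterable, such as a tuple or a str, together with the optional key and reverse keyword arguments listed by sorted?.

```python
import random

def quicksort(seq):
    '''Return a sorted list of items from seq (same algorithm as above).'''
    if len(seq) <= 1:
        return list(seq)
    i = random.randint(0, len(seq) - 1)
    pivot, others = seq[i], [*seq[:i], *seq[i + 1:]]
    left = quicksort([x for x in others if x < pivot])
    right = quicksort([x for x in others if x >= pivot])
    return [*left, pivot, *right]

# Randomized check: quicksort should agree with the built-in sorted.
for trial in range(100):
    seq = [random.randint(0, 99) for _ in range(random.randint(0, 20))]
    assert quicksort(seq) == sorted(seq)

# sorted accepts any iterable plus optional key and reverse arguments.
print(sorted((3, 1, 2)))                # [1, 2, 3]
print(sorted('hello'))                  # ['e', 'h', 'l', 'l', 'o']
print(sorted([3, 1, 2], reverse=True))  # [3, 2, 1]
print(sorted([-3, 1, -2], key=abs))     # [1, -2, -3]
```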
10.4. Mutating a list
For list (but not tuple), subscription and slicing can also be used as the target of an assignment operation to mutate the list.
%%mytutor -h 300
b = [*range(10)] # aliasing
b[::2] = b[:5]
b[0:1] = b[:5]
b[::2] = b[:5] # fails
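The last assignment fails because b[::2], with a step size not equal to 1, is an extended slice, and an extended slice can only be assigned an iterable with the same number of items. After b[0:1] = b[:5] replaces one item by five, b has 14 items, so b[::2] selects 7 items while b[:5] provides only 5, which raises a ValueError.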
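A small sketch of the size rule, using illustrative values: an extended slice accepts any iterable of exactly the right length, while a simple slice lets the list grow or shrink.

```python
b = [*range(10)]

# Extended slice (step != 1): the right-hand side must have exactly as many items.
b[::2] = 'abcde'              # 5 items into a 5-item extended slice: OK
print(b)                      # ['a', 1, 'b', 3, 'c', 5, 'd', 7, 'e', 9]

# Simple slice (step 1): the list can grow or shrink.
b[0:1] = [None, None, None]   # replaces 1 item by 3
print(len(b))                 # 12

try:
    b[::2] = [0] * 5          # b[::2] now selects 6 items, but only 5 are given
except ValueError as error:
    print('ValueError:', error)
```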
What is the difference between mutation and aliasing?
In the previous code:
The first assignment b = [*range(10)] is aliasing, which gives the list the target name/identifier b.
Other assignments such as b[::2] = b[:5] are mutations that call __setitem__, because the target b[::2] is not an identifier.
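The difference shows up when two names refer to the same list. In the sketch below (illustrative names), a mutation through b is visible through c, whereas assigning to the identifier b merely rebinds b and leaves c untouched.

```python
b = [0, 1, 2]
c = b              # c is an alias: both names refer to the same list
b[0] = 'x'         # mutation via __setitem__: visible through both names
print(c)           # ['x', 1, 2]

b = [7, 8, 9]      # assignment to an identifier rebinds b to a new list
print(c)           # ['x', 1, 2]: c still refers to the original list
print(b is c)      # False
```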
Exercise Explain the outcome of the following checks of equivalence.
%%mytutor -h 400
a = [10, 20, 30, 40]
b = a
print('a is b? {}'.format(a is b))
print('{} == {}? {}'.format(a, b, a == b))
b[1:3] = b[2:0:-1]
print('{} == {}? {}'.format(a, b, a == b))
a is b and a == b return True because the assignment b = a makes b an alias of the same object that a points to.
In particular, the operation b[1:3] = b[2:0:-1] mutates the same list that a points to, so a == b remains True afterwards.
Why mutate a list?
The following is another implementation of composite_sequence
that takes advantage of the mutability of list.
def sieve_composite_sequence(stop):
    is_composite = [False] * stop # initialization
    for factor in range(2,stop):
        if is_composite[factor]: continue
        for multiple in range(factor*2,stop,factor):
            is_composite[multiple] = True
    return (x for x in range(4,stop) if is_composite[x])
for x in sieve_composite_sequence(100): print(x, end=' ')
4 6 8 9 10 12 14 15 16 18 20 21 22 24 25 26 27 28 30 32 33 34 35 36 38 39 40 42 44 45 46 48 49 50 51 52 54 55 56 57 58 60 62 63 64 65 66 68 69 70 72 74 75 76 77 78 80 81 82 84 85 86 87 88 90 91 92 93 94 95 96 98 99
The algorithm changes is_composite[x] from False to True if x is a multiple of a smaller number factor >= 2, and returns a generator that generates the composite numbers according to is_composite.
Exercise Is sieve_composite_sequence more efficient than your solution composite_sequence? Why?
for x in composite_sequence(10000): pass
for x in sieve_composite_sequence(1000000): pass
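The sieve never divides: it marks composite numbers by stepping through multiples, and the line if is_composite[factor]: continue avoids redundant computations by skipping composite factors, whose multiples have already been marked by their prime factors. In contrast, composite_sequence tests divisors one by one, and for each prime x it has to try every divisor from 2 to x-1.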
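A sketch for checking the comparison empirically, assuming the two definitions above (repeated here so the block runs on its own). The value of stop is an arbitrary choice, and the printed timings vary from machine to machine; only the final equality check is expected to be the same everywhere.

```python
import time

composite_sequence = lambda stop: (x for x in range(2, stop)
                                   if any(x % divisor == 0 for divisor in range(2, x)))

def sieve_composite_sequence(stop):
    is_composite = [False] * stop  # initialization
    for factor in range(2, stop):
        if is_composite[factor]: continue
        for multiple in range(factor * 2, stop, factor):
            is_composite[multiple] = True
    return (x for x in range(4, stop) if is_composite[x])

stop = 3000
for name, generate in (('trial division', composite_sequence),
                       ('sieve', sieve_composite_sequence)):
    started = time.perf_counter()
    result = list(generate(stop))
    print(name, 'took {:.4f} s'.format(time.perf_counter() - started))

# Both methods should produce exactly the same composites.
print(list(composite_sequence(stop)) == list(sieve_composite_sequence(stop)))  # True
```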
Exercise Note that the multiplication operation * is an efficient way to initialize a 1D list with a specified size, but we should not use it to initialize a 2D list. Fix the following code so that a becomes [[1, 0], [0, 1]].
%%mytutor -h 250
a = [[0] * 2] * 2
a[0][0] = a[1][1] = 1
print(a)
### BEGIN SOLUTION
a = [[0] * 2 for i in range(2)]
### END SOLUTION
a[0][0] = a[1][1] = 1
print(a)
[[1, 0], [0, 1]]
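Why the original code fails: multiplying the outer list repeats a reference to the same inner list rather than copying it. For the 1D case, [0] * 2 is fine because the repeated item, the int 0, is immutable. A quick check:

```python
a = [[0] * 2] * 2
print(a[0] is a[1])  # True: both rows are the same list object
a[0][0] = 1
print(a)             # [[1, 0], [1, 0]]: the change shows up in "both" rows

b = [[0] * 2 for i in range(2)]
print(b[0] is b[1])  # False: the comprehension evaluates [0] * 2 once per row
```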
10.5. Different methods to operate on a sequence
The following compares the lists of public attributes for tuple and list.
We determine membership using the operator in or not in. Unlike the keyword in of a for loop, the operator in calls the method __contains__.
list_attributes = dir(list)
tuple_attributes = dir(tuple)
print(
'Common attributes:', ', '.join([
attr for attr in list_attributes
if attr in tuple_attributes and attr[0] != '_'
]))
print(
'Tuple-specific attributes:', ', '.join([
attr for attr in tuple_attributes
if attr not in list_attributes and attr[0] != '_'
]))
print(
'List-specific attributes:', ', '.join([
attr for attr in list_attributes
if attr not in tuple_attributes and attr[0] != '_'
]))
Common attributes: count, index
Tuple-specific attributes:
List-specific attributes: append, clear, copy, extend, insert, pop, remove, reverse, sort
There are no public tuple-specific attributes, and all the list-specific attributes are methods that mutate the list, except copy.
Of the common attributes, the count method returns the number of occurrences of a value in a tuple/list, and the index method returns the index of the first occurrence of a value in a tuple/list.
%%mytutor -h 300
a = (1,2,2,4,5)
print(a.index(2))
print(a.count(2))
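Two details not covered above, shown with the same tuple: index accepts an optional start position for the search, and it raises a ValueError when the value is absent, whereas count simply returns 0.

```python
a = (1, 2, 2, 4, 5)
print(a.index(2, 2))  # 2: the optional second argument is the start index of the search
print(a.count(3))     # 0: count returns 0 for an absent value

try:
    a.index(3)        # index raises ValueError for an absent value
except ValueError:
    print('3 is not in the tuple')
print(3 in a)         # False: a membership test avoids the exception
```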
The reverse method reverses the list in place and returns None, instead of returning a reversed list.
%%mytutor -h 300
a = [*range(10)]
print(reversed(a))
print(*reversed(a))
print(a.reverse())
The copy method returns a shallow copy of a list. tuple does not have the copy method, but it is easy to create a new tuple from an existing one by slicing, e.g., the reversed copy b[::-1].
%%mytutor -h 400
a = [*range(10)]
b = tuple(a)
a_reversed = a.copy()
a_reversed.reverse()
b_reversed = b[::-1]
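Here "shallow" means that only the outer list is new; nested objects are shared between the original and the copy. A short sketch with an illustrative nested list:

```python
a = [[1, 2], [3, 4]]
b = a.copy()          # a new outer list ...
print(b is a)         # False
print(b[0] is a[0])   # True: ... but the inner lists are shared
b[0].append(99)       # mutating a shared inner list is visible through a
print(a)              # [[1, 2, 99], [3, 4]]
```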
The sort method sorts the list in place and returns None, instead of returning a sorted list.
%%mytutor -h 300
import random
a = [random.randint(0,10) for i in range(10)]
print(sorted(a))
print(a.sort())
The extend method extends a list in place instead of creating a new concatenated list. The append method adds an object to the end of a list. The insert method inserts an object at a specified index.
%%mytutor -h 300
a = b = [*range(5)]
print(a + b)
print(a.extend(b))
print(a.append('stop'))
print(a.insert(0,'start'))
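The difference between append and extend matters when the argument is itself a list. A small sketch with illustrative values:

```python
a = [1, 2]
a.append([3, 4])   # the list [3, 4] is added as a single item
print(a)           # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])   # each item of the iterable is added
print(b)           # [1, 2, 3, 4]

b.insert(1, 'x')   # insert before the item currently at index 1
print(b)           # [1, 'x', 2, 3, 4]
```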
The pop method removes and returns the last item of the list. The remove method removes the first occurrence of a value in the list. The clear method removes all the items in the list.
We can also use the del statement to delete a selection of a list.
%%mytutor -h 300
a = [*range(10)]
del a[::2]
print(a.pop())
print(a.remove(5))
print(a.clear())
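A few further details, sketched with illustrative values: pop also accepts an index, del works with a single index as well as a slice, and remove raises a ValueError when the value is absent.

```python
a = [*range(10)]
print(a.pop(0))     # 0: pop also accepts an index
del a[0]            # del with a single index removes one item
print(a)            # [2, 3, 4, 5, 6, 7, 8, 9]

try:
    a.remove(100)   # remove raises ValueError if the value is absent
except ValueError:
    print('100 is not in the list')
```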