Added sections 2-4

This commit is contained in:
David Beazley
2020-05-25 12:40:11 -05:00
parent 7a4423dee4
commit c06face3a5
21 changed files with 5639 additions and 0 deletions

View File

@@ -0,0 +1,538 @@
# 2.4 Sequences
In this part, we look at some common idioms for working with sequence data.
### Introduction
Python has three *sequences* datatypes.
* String: `'Hello'`. A string is considered a sequence of characters.
* List: `[1, 4, 5]`.
* Tuple: `('GOOG', 100, 490.1)`.
All sequences are ordered and have length.
```python
a = 'Hello' # String
b = [1, 4, 5] # List
c = ('GOOG', 100, 490.1) # Tuple
# Indexed order
a[0] # 'H'
b[-1] # 5
c[1] # 100
# Length of sequence
len(a) # 5
len(b) # 3
len(c) # 3
```
Sequences can be replicated: `s * n`.
```pycon
>>> a = 'Hello'
>>> a * 3
'HelloHelloHello'
>>> b = [1, 2, 3]
>>> b * 2
[1, 2, 3, 1, 2, 3]
>>>
```
Sequences of the same type can be concatenated: `s + t`.
```pycon
>>> a = (1, 2, 3)
>>> b = (4, 5)
>>> a + b
(1, 2, 3, 4, 5)
>>>
>>> c = [1, 5]
>>> a + c
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "list") to tuple
```
### Slicing
Slicing means to take a subsequence from a sequence.
The syntax used is `s[start:end]`. Where `start` and `end` are the indexes of the subsequence you want.
```python
a = [0,1,2,3,4,5,6,7,8]
a[2:5] # [2,3,4]
a[-5:] # [4,5,6,7,8]
a[:3] # [0,1,2]
```
* Indices `start` and `end` must be integers.
* Slices do *not* include the end value.
* If indices are omitted, they default to the beginning or end of the list.
### Slice re-assignment
Slices can also be reassigned and deleted.
```python
# Reassignment
a = [0,1,2,3,4,5,6,7,8]
a[2:4] = [10,11,12] # [0,1,10,11,12,4,5,6,7,8]
```
*Note: The reassigned slice doesn't need to have the same length.*
```python
# Deletion
a = [0,1,2,3,4,5,6,7,8]
del a[2:4] # [0,1,4,5,6,7,8]
```
### Sequence Reductions
There are some functions to reduce a sequence to a single value.
```pycon
>>> s = [1, 2, 3, 4]
>>> sum(s)
10
>>> min(s) 1
>>> max(s) 4
>>> t = ['Hello', 'World']
>>> max(t)
'World'
>>>
```
### Iteration over a sequence
The for-loop iterates over the elements in the sequence.
```pycon
>>> s = [1, 4, 9, 16]
>>> for i in s:
... print(i)
...
1
4
9
16
>>>
```
On each iteration of the loop, you get a new item to work with.
This new value is placed into an iteration variable. In this example, the
iteration variable is `x`:
```python
for x in s: # `x` is an iteration variable
...statements
```
In each iteration, it overwrites the previous value (if any).
After the loop finishes, the variable retains the last value.
### `break` statement
You can use the `break` statement to break out of a loop before it finishes iterating all of the elements.
```python
for name in namelist:
if name == 'Jake':
break
...
...
statements
```
When the `break` statement is executed, it will exit the loop and move
on the next `statements`. The `break` statement only applies to the
inner-most loop. If this loop is within another loop, it will not
break the outer loop.
### `continue` statement
To skip one element and move to the next one you use the `continue` statement.
```python
for line in lines:
if line == '\n': # Skip blank lines
continue
# More statements
...
```
This is useful when the current item is not of interest or needs to be ignored in the processing.
### Looping over integers
If you need to count, use `range()`.
```python
for i in range(100):
# i = 0,1,...,99
```
The syntax is `range([start,] end [,step])`
```python
for i in range(100):
# i = 0,1,...,99
for j in range(10,20):
# j = 10,11,..., 19
for k in range(10,50,2):
# k = 10,12,...,48
# Notice how it counts in steps of 2, not 1.
```
* The ending value is never included. It mirrors the behavior of slices.
* `start` is optional. Default `0`.
* `step` is optional. Default `1`.
### `enumerate()` function
The `enumerate` function provides a loop with an extra counter value.
```python
names = ['Elwood', 'Jake', 'Curtis']
for i, name in enumerate(names):
# Loops with i = 0, name = 'Elwood'
# i = 1, name = 'Jake'
# i = 2, name = 'Curtis'
```
How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional.
A good example of using `enumerate()` is tracking line numbers while reading a file:
```python
with open(filename) as f:
for lineno, line in enumerate(f, start=1):
...
```
In the end, `enumerate` is just a nice shortcut for:
```python
i = 0
for x in s:
statements
i += 1
```
Using `enumerate` is less typing and runs slightly faster.
### For and tuples
You can loop with multiple iteration variables.
```python
points = [
(1, 4),(10, 40),(23, 14),(5, 6),(7, 8)
]
for x, y in points:
# Loops with x = 1, y = 4
# x = 10, y = 40
# x = 23, y = 14
# ...
```
When using multiple variables, each tuple will be *unpacked* into a set of iteration variables.
### `zip()` function
The `zip` function takes sequences and makes an iterator that combines them.
```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1 ]
pairs = zip(a, b)
# ('name','GOOG'), ('shares',100), ('price',490.1)
```
To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier.
```python
for column, value in pairs:
...
```
A common use of `zip` is to create key/value pairs for constructing dictionaries.
```python
d = dict(zip(columns, values))
```
## Exercises
### (a) Counting
Try some basic counting examples:
```pycon
>>> for n in range(10): # Count 0 ... 9
print(n, end=' ')
0 1 2 3 4 5 6 7 8 9
>>> for n in range(10,0,-1): # Count 10 ... 1
print(n, end=' ')
10 9 8 7 6 5 4 3 2 1
>>> for n in range(0,10,2): # Count 0, 2, ... 8
print(n, end=' ')
0 2 4 6 8
>>>
```
### (b) More sequence operations
Interactively experiment with some of the sequence reduction operations.
```pycon
>>> data = [4, 9, 1, 25, 16, 100, 49]
>>> min(data)
1
>>> max(data)
100
>>> sum(data)
204
>>>
```
Try looping over the data.
```pycon
>>> for x in data:
print(x)
4
9
...
>>> for n, x in enumerate(data):
print(n, x)
0 4
1 9
2 1
...
>>>
```
Sometimes the `for` statement, `len()`, and `range()` get used by
novices in some kind of horrible code fragment that looks like it
emerged from the depths of a rusty C program.
```pycon
>>> for n in range(len(data)):
print(data[n])
4
9
1
...
>>>
```
Dont do that! Not only does reading it make everyones eyes bleed, its inefficient with memory and it runs a lot slower.
Just use a normal `for` loop if you want to iterate over data. Use `enumerate()` if you happen to need the index for some reason.
### (c) A practical `enumerate()` example
Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data.
Using `enumerate()` modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input.
```python
>>> cost = portfolio_cost('Data/missing.csv')
Row 4: Couldn't convert: ['MSFT', '', '51.23']
Row 7: Couldn't convert: ['IBM', '', '70.44']
>>>
```
To do this, youll need to change just a few parts of your code.
```python
...
for rowno, row in enumerate(rows, start=1):
try:
...
except ValueError:
print(f'Row {rowno}: Bad row: {row}')
```
### (d) Using the `zip()` function
In the file `portfolio.csv`, the first line contains column headers. In all previous code, weve been discarding them.
```pycon
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> headers
['name', 'shares', 'price']
>>>
```
However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture.
First try this to pair the file headers with a row of data:
```pycon
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>> list(zip(headers, row))
[ ('name', 'AA'), ('shares', '100'), ('price', '32.20') ]
>>>
```
Notice how `zip()` paired the column headers with the column values.
Weve used `list()` here to turn the result into a list so that you
can see it. Normally, `zip()` creates an iterator that must be
consumed by a for-loop.
This pairing is just an intermediate step to building a dictionary. Now try this:
```pycon
>>> record = dict(zip(headers, row))
>>> record
{'price': '32.20', 'name': 'AA', 'shares': '100'}
>>>
```
This transformation is one of the most useful tricks to know about
when processing a lot of data files. For example, suppose you wanted
to make the `pcost.py` program work with various input files, but
without regard for the actual column number where the name, shares,
and price appear.
Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this:
```python
# pcost.py
def portfolio_cost(filename):
...
for rowno, row in enumerate(rows, start=1):
record = dict(zip(headers, row))
try:
nshares = int(record['shares'])
price = float(record['price'])
total_cost += nshares * price
# This catches errors in int() and float() conversions above
except ValueError:
print(f'Row {rowno}: Bad row: {row}')
...
```
Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this:
```csv
name,date,time,shares,price
"AA","6/11/2007","9:50am",100,32.20
"IBM","5/13/2007","4:20pm",50,91.10
"CAT","9/23/2006","1:30pm",150,83.44
"MSFT","5/17/2007","10:30am",200,51.23
"GE","2/1/2006","10:45am",95,40.37
"MSFT","10/31/2006","12:05pm",50,65.10
"IBM","7/9/2006","3:15pm",100,70.44
```
```python
>>> portfolio_cost('Data/portfoliodate.csv')
44671.15
>>>
```
If you did it right, youll find that your program still works even
though the data file has a completely different column format than
before. Thats cool!
The change made here is subtle, but significant. Instead of
`portfolio_cost()` being hardcoded to read a single fixed file format,
the new version reads any CSV file and picks the values of interest
out of it. As long as the file has the required columns, the code will work.
Modify the `report.py` program you wrote in Section 2.3 that it uses
the same technique to pick out column headers.
Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it
produces the same answer as before.
### (e) Inverting a dictionary
A dictionary maps keys to values. For example, a dictionary of stock prices.
```pycon
>>> prices = {
'GOOG' : 490.1,
'AA' : 23.45,
'IBM' : 91.1,
'MSFT' : 34.23
}
>>>
```
If you use the `items()` method, you can get `(key,value)` pairs:
```pycon
>>> prices.items()
dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)])
>>>
```
However, what if you wanted to get a list of `(value, key)` pairs instead?
*Hint: use `zip()`.*
```pycon
>>> pricelist = list(zip(prices.values(),prices.keys()))
>>> pricelist
[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
>>>
```
Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data.
```pycon
>>> min(pricelist)
(23.45, 'AA')
>>> max(pricelist)
(490.1, 'GOOG')
>>> sorted(pricelist)
[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
>>>
```
This also illustrates an important feature of tuples. When used in
comparisons, tuples are compared element-by-element starting with the
first item. Similar to how strings are compared
character-by-character.
`zip()` is often used in situations like this where you need to pair
up data from different places. For example, pairing up the column
names with column values in order to make a dictionary of named
values.
Note that `zip()` is not limited to pairs. For example, you can use it
with any number of input lists:
```pycon
>>> a = [1, 2, 3, 4]
>>> b = ['w', 'x', 'y', 'z']
>>> c = [0.2, 0.4, 0.6, 0.8]
>>> list(zip(a, b, c))
[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8))]
>>>
```
Also, be aware that `zip()` stops once the shortest input sequence is exhausted.
```pycon
>>> a = [1, 2, 3, 4, 5, 6]
>>> b = ['x', 'y', 'z']
>>> list(zip(a,b))
[(1, 'x'), (2, 'y'), (3, 'z')]
>>>
```
[Next](05_Collections)