link experiment
14
Notes/06_Generators/00_Overview.md
Normal file
@@ -0,0 +1,14 @@
|
||||
# Overview
|
||||
|
||||
A simple definition of *Iteration*: Looping over items.
|
||||
|
||||
```python
|
||||
a = [2,4,10,37,62]
# Iterate over a
for x in a:
    ...
|
||||
```
|
||||
|
||||
This is a very common pattern. Loops, list comprehensions, etc.
|
||||
|
||||
Most programs do a huge amount of iteration.
|
||||
313
Notes/06_Generators/01_Iteration_protocol.md
Normal file
@@ -0,0 +1,313 @@
|
||||
# 6.1 Iteration Protocol
|
||||
|
||||
This section looks at the process of iteration.
|
||||
|
||||
### Iteration Everywhere
|
||||
|
||||
Many different objects support iteration.
|
||||
|
||||
```python
|
||||
a = 'hello'
for c in a:     # Loop over characters in a
    ...

b = { 'name': 'Dave', 'password':'foo'}
for k in b:     # Loop over keys in dictionary
    ...

c = [1,2,3,4]
for i in c:     # Loop over items in a list/tuple
    ...

f = open('foo.txt')
for x in f:     # Loop over lines in a file
    ...
|
||||
```
|
||||
|
||||
### Iteration: Protocol
|
||||
|
||||
Let's take an inside look at the `for` statement.
|
||||
|
||||
```python
|
||||
for x in obj:
    # statements
|
||||
```
|
||||
|
||||
What happens under the hood?
|
||||
|
||||
```python
|
||||
_iter = obj.__iter__()        # Get iterator object
while True:
    try:
        x = _iter.__next__()  # Get next item
    except StopIteration:     # No more items
        break
    # statements ...
|
||||
```
|
||||
|
||||
All the objects that work with the `for-loop` implement this low-level iteration protocol.
|
||||
Example: Manual iteration over a list.
|
||||
|
||||
```python
|
||||
>>> x = [1,2,3]
|
||||
>>> it = x.__iter__()
|
||||
>>> it
|
||||
<list_iterator object at 0x590b0>
|
||||
>>> it.__next__()
|
||||
1
|
||||
>>> it.__next__()
|
||||
2
|
||||
>>> it.__next__()
|
||||
3
|
||||
>>> it.__next__()
|
||||
Traceback (most recent call last):
|
||||
  File "<stdin>", line 1, in <module>
StopIteration
|
||||
>>>
|
||||
```
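The same steps can be written with the built-in `iter()` and `next()` functions, which call `__iter__()` and `__next__()` for you. The sketch below is only an illustration; the `manual_for` helper is a made-up name for this example and is not part of the course code.

```python
def manual_for(obj, action):
    """Emulate `for x in obj: action(x)` step by step (illustrative helper)."""
    it = iter(obj)              # Same as obj.__iter__()
    while True:
        try:
            x = next(it)        # Same as it.__next__()
        except StopIteration:   # No more items
            break
        action(x)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)                # [1, 2, 3]
manual_for('hi', print)         # Prints h, then i
```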
|
||||
|
||||
### Supporting Iteration
|
||||
|
||||
Knowing about iteration is useful if you want to add it to your own objects.
|
||||
For example, making a custom container.
|
||||
|
||||
```python
|
||||
class Portfolio(object):
    def __init__(self):
        self.holdings = []

    def __iter__(self):
        return self.holdings.__iter__()
    ...

port = Portfolio()
for s in port:
    ...
|
||||
```
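Here is a small runnable version of the same idea, assuming the holdings are plain strings just to keep it short. It uses the built-in `iter()` function, which is the usual way to call `__iter__()`. The `append()` method is an assumption added for this example only.

```python
class Portfolio:
    def __init__(self):
        self.holdings = []

    def append(self, holding):      # Hypothetical helper for this example
        self.holdings.append(holding)

    def __iter__(self):
        return iter(self.holdings)  # Delegate to the list's iterator

port = Portfolio()
port.append('AA')
port.append('IBM')
for s in port:
    print(s)
names = list(port)                  # Anything that consumes an iterable now works
```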
|
||||
|
||||
## Exercises
|
||||
|
||||
### (a) Iteration Illustrated
|
||||
|
||||
Create the following list:
|
||||
|
||||
```python
|
||||
a = [1,9,4,25,16]
|
||||
```
|
||||
|
||||
Manually iterate over this list. Call `__iter__()` to get an iterator and
|
||||
call the `__next__()` method to obtain successive elements.
|
||||
|
||||
```python
|
||||
>>> i = a.__iter__()
|
||||
>>> i
|
||||
<list_iterator object at 0x64c10>
|
||||
>>> i.__next__()
|
||||
1
|
||||
>>> i.__next__()
|
||||
9
|
||||
>>> i.__next__()
|
||||
4
|
||||
>>> i.__next__()
|
||||
25
|
||||
>>> i.__next__()
|
||||
16
|
||||
>>> i.__next__()
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
StopIteration
|
||||
>>>
|
||||
```
|
||||
|
||||
The `next()` built-in function is a shortcut for calling
|
||||
the `__next__()` method of an iterator. Try using it on a file:
|
||||
|
||||
```python
|
||||
>>> f = open('Data/portfolio.csv')
|
||||
>>> f.__iter__() # Note: This returns the file itself
|
||||
<_io.TextIOWrapper name='Data/portfolio.csv' mode='r' encoding='UTF-8'>
|
||||
>>> next(f)
|
||||
'name,shares,price\n'
|
||||
>>> next(f)
|
||||
'"AA",100,32.20\n'
|
||||
>>> next(f)
|
||||
'"IBM",50,91.10\n'
|
||||
>>>
|
||||
```
|
||||
|
||||
Keep calling `next(f)` until you reach the end of the
|
||||
file. Watch what happens.
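Once you have seen what happens at the end of the file, note that `next()` also accepts an optional second argument that is returned when the iterator has nothing left. A minimal sketch, using an in-memory list of lines as a stand-in for the real file:

```python
lines = iter(['name,shares,price\n', '"AA",100,32.20\n'])

first = next(lines)            # 'name,shares,price\n'
second = next(lines)           # '"AA",100,32.20\n'
third = next(lines, None)      # Iterator is exhausted, so the default is returned

try:
    next(lines)                # Without a default, StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True
```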
|
||||
|
||||
### (b) Supporting Iteration
|
||||
|
||||
On occasion, you might want to make one of your own objects support
|
||||
iteration--especially if your object wraps around an existing
|
||||
list or other iterable. In a new file `portfolio.py`, define the
|
||||
following class:
|
||||
|
||||
```python
|
||||
# portfolio.py

class Portfolio(object):

    def __init__(self, holdings):
        self._holdings = holdings

    @property
    def total_cost(self):
        return sum([s.cost for s in self._holdings])

    def tabulate_shares(self):
        from collections import Counter
        total_shares = Counter()
        for s in self._holdings:
            total_shares[s.name] += s.shares
        return total_shares
|
||||
```
|
||||
|
||||
This class is meant to be a layer around a list, but with some
|
||||
extra methods such as the `total_cost` property. Modify the `read_portfolio()`
|
||||
function in `report.py` so that it creates a `Portfolio` instance like this:
|
||||
|
||||
```python
|
||||
# report.py
...

import fileparse
from stock import Stock
from portfolio import Portfolio

def read_portfolio(filename):
    '''
    Read a stock portfolio file into a Portfolio of Stock instances
    built from the name, shares, and price columns.
    '''
    with open(filename) as file:
        portdicts = fileparse.parse_csv(file,
                                        select=['name','shares','price'],
                                        types=[str,int,float])

    portfolio = [ Stock(d['name'], d['shares'], d['price']) for d in portdicts ]
    return Portfolio(portfolio)
...
|
||||
```
|
||||
|
||||
Try running the `report.py` program. You will find that it fails spectacularly due to the fact
|
||||
that `Portfolio` instances aren't iterable.
|
||||
|
||||
```python
|
||||
>>> import report
|
||||
>>> report.portfolio_report('Data/portfolio.csv', 'Data/prices.csv')
|
||||
... crashes ...
|
||||
```
|
||||
|
||||
Fix this by modifying the `Portfolio` class to support iteration:
|
||||
|
||||
```python
|
||||
class Portfolio(object):

    def __init__(self, holdings):
        self._holdings = holdings

    def __iter__(self):
        return self._holdings.__iter__()

    @property
    def total_cost(self):
        return sum([s.shares*s.price for s in self._holdings])

    def tabulate_shares(self):
        from collections import Counter
        total_shares = Counter()
        for s in self._holdings:
            total_shares[s.name] += s.shares
        return total_shares
|
||||
```
|
||||
|
||||
After you've made this change, your `report.py` program should work again. While you're
|
||||
at it, fix up your `pcost.py` program to use the new `Portfolio` object. Like this:
|
||||
|
||||
```python
|
||||
# pcost.py

import report

def portfolio_cost(filename):
    '''
    Computes the total cost (shares*price) of a portfolio file
    '''
    portfolio = report.read_portfolio(filename)
    return portfolio.total_cost
...
|
||||
```
|
||||
|
||||
Test it to make sure it works:
|
||||
|
||||
```python
|
||||
>>> import pcost
|
||||
>>> pcost.portfolio_cost('Data/portfolio.csv')
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
### (c) Making a more proper container
|
||||
|
||||
If making a container class, you often want to do more than just
|
||||
iteration. Modify the `Portfolio` class so that it has some other
|
||||
special methods like this:
|
||||
|
||||
```python
|
||||
class Portfolio(object):
    def __init__(self, holdings):
        self._holdings = holdings

    def __iter__(self):
        return self._holdings.__iter__()

    def __len__(self):
        return len(self._holdings)

    def __getitem__(self, index):
        return self._holdings[index]

    def __contains__(self, name):
        return any([s.name == name for s in self._holdings])

    @property
    def total_cost(self):
        return sum([s.shares*s.price for s in self._holdings])

    def tabulate_shares(self):
        from collections import Counter
        total_shares = Counter()
        for s in self._holdings:
            total_shares[s.name] += s.shares
        return total_shares
|
||||
```
|
||||
|
||||
Now, try some experiments using this new class:
|
||||
|
||||
```
|
||||
>>> import report
|
||||
>>> portfolio = report.read_portfolio('Data/portfolio.csv')
|
||||
>>> len(portfolio)
|
||||
7
|
||||
>>> portfolio[0]
|
||||
Stock('AA', 100, 32.2)
|
||||
>>> portfolio[1]
|
||||
Stock('IBM', 50, 91.1)
|
||||
>>> portfolio[0:3]
|
||||
[Stock('AA', 100, 32.2), Stock('IBM', 50, 91.1), Stock('CAT', 150, 83.44)]
|
||||
>>> 'IBM' in portfolio
|
||||
True
|
||||
>>> 'AAPL' in portfolio
|
||||
False
|
||||
>>>
|
||||
```
|
||||
|
||||
One important observation about this--generally code is considered
|
||||
"Pythonic" if it speaks the common vocabulary of how other parts of
|
||||
Python normally work. For container objects, supporting iteration,
|
||||
indexing, containment, and other kinds of operators is an important
|
||||
part of this.
|
||||
|
||||
[Next](02_Customizing_iteration)
|
||||
265
Notes/06_Generators/02_Customizing_iteration.md
Normal file
@@ -0,0 +1,265 @@
|
||||
# 6.2 Customizing Iteration
|
||||
|
||||
This section looks at how you can customize iteration using a generator.
|
||||
|
||||
### A problem
|
||||
|
||||
Suppose you wanted to create your own custom iteration pattern.
|
||||
|
||||
For example, a countdown.
|
||||
|
||||
```python
|
||||
>>> for x in countdown(10):
|
||||
...     print(x, end=' ')
|
||||
...
|
||||
10 9 8 7 6 5 4 3 2 1
|
||||
>>>
|
||||
```
|
||||
|
||||
There is an easy way to do this.
|
||||
|
||||
### Generators
|
||||
|
||||
A generator is a function that defines iteration.
|
||||
|
||||
```python
|
||||
def countdown(n):
    while n > 0:
        yield n
        n -= 1
|
||||
```
|
||||
|
||||
For example:
|
||||
|
||||
```python
|
||||
>>> for x in countdown(10):
|
||||
...     print(x, end=' ')
|
||||
...
|
||||
10 9 8 7 6 5 4 3 2 1
|
||||
>>>
|
||||
```
|
||||
|
||||
A generator is any function that uses the `yield` statement.
|
||||
|
||||
The behavior of generators is different than a normal function.
|
||||
Calling a generator function creates a generator object. It does not execute the function.
|
||||
|
||||
```python
|
||||
def countdown(n):
    # Added a print statement
    print('Counting down from', n)
    while n > 0:
        yield n
        n -= 1
|
||||
```
|
||||
|
||||
```python
|
||||
>>> x = countdown(10)
# Nothing is printed: the function body has not started running
>>> x
# x is a generator object
<generator object countdown at 0x58490>
>>>
|
||||
```
|
||||
|
||||
The function only starts executing on the first `__next__()` call.
|
||||
|
||||
```python
|
||||
>>> x = countdown(10)
|
||||
>>> x
|
||||
<generator object countdown at 0x58490>
|
||||
>>> x.__next__()
|
||||
Counting down from 10
|
||||
10
|
||||
>>>
|
||||
```
|
||||
|
||||
`yield` produces a value, but suspends the function execution.
|
||||
The function resumes on next call to `__next__()`.
|
||||
|
||||
```python
|
||||
>>> x.__next__()
|
||||
9
|
||||
>>> x.__next__()
|
||||
8
|
||||
```
|
||||
|
||||
When the generator function returns, the iteration stops by raising `StopIteration`.
|
||||
|
||||
```python
|
||||
>>> x.__next__()
|
||||
1
|
||||
>>> x.__next__()
|
||||
Traceback (most recent call last):
|
||||
  File "<stdin>", line 1, in <module>
StopIteration
|
||||
>>>
|
||||
```
|
||||
|
||||
*Observation: A generator function implements the same low-level protocol that the `for` statement uses on lists, tuples, dicts, files, etc.*
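One way to check this observation yourself is sketched below. A generator object is its own iterator, so `iter()` hands back the same object, and anything that consumes iterables (such as `list()`) works with it.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
same = iter(gen) is gen        # True: __iter__() returns the generator itself
first = next(gen)              # 3
rest = list(gen)               # [2, 1]  (picks up where next() left off)
again = list(gen)              # []      (the generator is now exhausted)
```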
|
||||
|
||||
## Exercises
|
||||
|
||||
### (a) A Simple Generator
|
||||
|
||||
If you ever find yourself wanting to customize iteration, you should
|
||||
always think generator functions. They're easy to write---make
|
||||
a function that carries out the desired iteration logic and use `yield`
|
||||
to emit values.
|
||||
|
||||
For example, try this generator that searches a file for lines containing
|
||||
a matching substring:
|
||||
|
||||
```python
|
||||
>>> def filematch(filename, substr):
        with open(filename, 'r') as f:
            for line in f:
                if substr in line:
                    yield line

>>> for line in open('Data/portfolio.csv'):
        print(line, end='')

name,shares,price
"AA",100,32.20
"IBM",50,91.10
"CAT",150,83.44
"MSFT",200,51.23
"GE",95,40.37
"MSFT",50,65.10
"IBM",100,70.44
>>> for line in filematch('Data/portfolio.csv', 'IBM'):
        print(line, end='')

"IBM",50,91.10
"IBM",100,70.44
>>>
|
||||
```
|
||||
|
||||
This is kind of interesting--the idea that you can hide a bunch of
|
||||
custom processing in a function and use it to feed a for-loop.
|
||||
The next example looks at a more unusual case.
|
||||
|
||||
### (b) Monitoring a streaming data source
|
||||
|
||||
Generators can be an interesting way to monitor real-time data sources
|
||||
such as log files or stock market feeds. In this part, we'll
|
||||
explore this idea. To start, follow the next instructions carefully.
|
||||
|
||||
The program `Data/stocksim.py` is a program that
|
||||
simulates stock market data. As output, the program constantly writes
|
||||
real-time data to a file `stocklog.csv`. In a
|
||||
separate command window go into the `Data/` directory and run this program:
|
||||
|
||||
```bash
|
||||
bash % python3 stocksim.py
|
||||
```
|
||||
|
||||
If you are on Windows, just locate the `stocksim.py` program and
|
||||
double-click on it to run it. Now, forget about this program (just
|
||||
let it run). Using another window, look at the file
|
||||
`Data/stocklog.csv` being written by the simulator. You should see
|
||||
new lines of text being added to the file every few seconds. Again,
|
||||
just let this program run in the background---it will run for several
|
||||
hours (you shouldn't need to worry about it).
|
||||
|
||||
Once the above program is running, let's write a little program to
|
||||
open the file, seek to the end, and watch for new output. Create a
|
||||
file `follow.py` and put this code in it:
|
||||
|
||||
```python
|
||||
# follow.py
import os
import time

f = open('Data/stocklog.csv')
f.seek(0, os.SEEK_END)   # Move file pointer 0 bytes from end of file

while True:
    line = f.readline()
    if line == '':
        time.sleep(0.1)   # Sleep briefly and retry
        continue
    fields = line.split(',')
    name = fields[0].strip('"')
    price = float(fields[1])
    change = float(fields[4])
    if change < 0:
        print(f'{name:>10s} {price:>10.2f} {change:>10.2f}')
|
||||
```
|
||||
|
||||
If you run the program, you'll see a real-time stock ticker. Under the hood,
|
||||
this code is kind of like the Unix `tail -f` command that's used to watch a log file.
|
||||
|
||||
Note: The use of the `readline()` method in this example is
|
||||
somewhat unusual in that it is not the usual way of reading lines from
|
||||
a file (normally you would just use a `for`-loop). However, in
|
||||
this case, we are using it to repeatedly probe the end of the file to
|
||||
see if more data has been added (`readline()` will either
|
||||
return new data or an empty string).
|
||||
|
||||
### (c) Using a generator to produce data
|
||||
|
||||
If you look at the code in part (b), the first part of the code is producing
|
||||
lines of data whereas the statements at the end of the `while` loop are consuming
|
||||
the data. A major feature of generator functions is that you can move all
|
||||
of the data production code into a reusable function.
|
||||
|
||||
Modify the code in part (b) so that the file-reading is performed by
|
||||
a generator function `follow(filename)`. Make it so the following code
|
||||
works:
|
||||
|
||||
```python
|
||||
>>> for line in follow('Data/stocklog.csv'):
        print(line, end='')

... Should see lines of output produced here ...
|
||||
```
|
||||
|
||||
Modify the stock ticker code so that it looks like this:
|
||||
|
||||
|
||||
```python
|
||||
if __name__ == '__main__':
    for line in follow('Data/stocklog.csv'):
        fields = line.split(',')
        name = fields[0].strip('"')
        price = float(fields[1])
        change = float(fields[4])
        if change < 0:
            print(f'{name:>10s} {price:>10.2f} {change:>10.2f}')
|
||||
```
|
||||
|
||||
### (d) Watching your portfolio
|
||||
|
||||
Modify the `follow.py` program so that it watches the stream of stock
|
||||
data and prints a ticker showing information for only those stocks
|
||||
in a portfolio. For example:
|
||||
|
||||
```python
|
||||
if __name__ == '__main__':
    import report

    portfolio = report.read_portfolio('Data/portfolio.csv')

    for line in follow('Data/stocklog.csv'):
        fields = line.split(',')
        name = fields[0].strip('"')
        price = float(fields[1])
        change = float(fields[4])
        if name in portfolio:
            print(f'{name:>10s} {price:>10.2f} {change:>10.2f}')
|
||||
```
|
||||
|
||||
Note: For this to work, your `Portfolio` class must support the
`in` operator. See the last exercise of section 6.1 and make sure you implement the
`__contains__()` method.
|
||||
|
||||
### Discussion
|
||||
|
||||
Something very powerful just happened here. You moved an interesting iteration pattern
|
||||
(reading lines at the end of a file) into its own little function. The `follow()` function
|
||||
is now this completely general purpose utility that you can use in any program. For
|
||||
example, you could use it to watch server logs, debugging logs, and other similar data sources.
|
||||
That's kind of cool.
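If you want to compare notes, here is one possible way to write `follow()`. Treat it as a sketch rather than the official solution: it reuses the loop from part (b), and the `poll_interval` parameter is an addition made for this example.

```python
# follow.py (one possible version)
import os
import time

def follow(filename, poll_interval=0.1):
    '''
    Generator that yields lines as they are appended to a file.
    poll_interval is a made-up parameter for this sketch.
    '''
    with open(filename) as f:
        f.seek(0, os.SEEK_END)             # Start at the current end of file
        while True:
            line = f.readline()
            if line == '':
                time.sleep(poll_interval)  # Nothing new yet, retry shortly
                continue
            yield line
```

Because the loop never ends on its own, the caller decides when to stop, for example with `break` in the consuming `for`-loop or by calling `close()` on the generator, which also closes the file.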
|
||||
|
||||
[Next](03_Producers_consumers)
|
||||
301
Notes/06_Generators/03_Producers_consumers.md
Normal file
@@ -0,0 +1,301 @@
|
||||
# 6.3 Producers, Consumers and Pipelines
|
||||
|
||||
Generators are a useful tool for setting up various kinds of producer/consumer
problems and dataflow pipelines. This section discusses that.
|
||||
|
||||
### Producer-Consumer Problems
|
||||
|
||||
Generators are closely related to various forms of *producer-consumer* problems.
|
||||
|
||||
```python
|
||||
# Producer
def follow(f):
    ...
    while True:
        ...
        yield line        # Produces value in `line` below
        ...

# Consumer
for line in follow(f):    # Consumes value from `yield` above
    ...
|
||||
```
|
||||
|
||||
`yield` produces values that `for` consumes.
|
||||
|
||||
### Generator Pipelines
|
||||
|
||||
You can use this aspect of generators to set up processing pipelines (like Unix pipes).
|
||||
|
||||
*producer* → *processing* → *processing* → *consumer*
|
||||
|
||||
Processing pipes have an initial data producer, some set of intermediate processing stages and a final consumer.
|
||||
|
||||
**producer** → *processing* → *processing* → *consumer*
|
||||
|
||||
```python
|
||||
def producer():
    ...
    yield item
    ...
|
||||
```
|
||||
|
||||
The producer is typically a generator, although it could also be a list or some other sequence.
|
||||
`yield` feeds data into the pipeline.
|
||||
|
||||
*producer* → *processing* → *processing* → **consumer**
|
||||
|
||||
```python
|
||||
def consumer(s):
    for item in s:
        ...
|
||||
```
|
||||
|
||||
The consumer is a for-loop. It gets items and does something with them.
|
||||
|
||||
*producer* → **processing** → **processing** → *consumer*
|
||||
|
||||
```python
|
||||
def processing(s):
    for item in s:
        ...
        yield newitem
        ...
|
||||
```
|
||||
|
||||
Intermediate processing stages simultaneously consume and produce items.
|
||||
They might modify the data stream.
|
||||
They can also filter (discarding items).
|
||||
|
||||
*producer* → *processing* → *processing* → *consumer*
|
||||
|
||||
```python
|
||||
def producer():
    ...
    yield item          # yields the item that is received by the `processing`
    ...

def processing(s):
    for item in s:      # Comes from the `producer`
        ...
        yield newitem   # yields a new item
        ...

def consumer(s):
    for item in s:      # Comes from the `processing`
        ...
|
||||
```
|
||||
|
||||
Code to set up the pipeline:
|
||||
|
||||
```python
|
||||
a = producer()
|
||||
b = processing(a)
|
||||
c = consumer(b)
|
||||
```
|
||||
|
||||
You will notice that data incrementally flows through the different functions.
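To see the incremental flow for yourself, you can fill in the skeleton with concrete stages. The version below is just a sketch; the stage bodies and the `log` list are made up for illustration. Note that `consumer()` is an ordinary function, so calling it is what actually drives the pipeline.

```python
log = []                                # Records the order in which stages run

def producer():
    for item in [1, 2, 3]:
        log.append(f'produce {item}')
        yield item

def processing(s):
    for item in s:
        log.append(f'process {item}')
        yield item * 10

def consumer(s):
    for item in s:
        log.append(f'consume {item}')

a = producer()                          # Nothing runs yet
b = processing(a)                       # Still nothing
consumer(b)                             # Pulls items through one at a time
print(log[:3])    # ['produce 1', 'process 1', 'consume 10']
```

Each item travels through every stage before the next item is produced, which is why no stage ever needs to hold the whole data set.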
|
||||
|
||||
## Exercises
|
||||
|
||||
For this exercise the `stocksim.py` program should still be running in the background.
|
||||
You’re going to use the `follow()` function you wrote in the previous exercise.
|
||||
|
||||
### (a) Setting up a simple pipeline
|
||||
|
||||
Let's see the pipelining idea in action. Write the following
|
||||
function:
|
||||
|
||||
```python
|
||||
>>> def filematch(lines, substr):
        for line in lines:
            if substr in line:
                yield line

>>>
|
||||
```
|
||||
|
||||
This function is almost exactly the same as the first generator
|
||||
example in the previous exercise except that it's no longer
|
||||
opening a file--it merely operates on a sequence of lines given
|
||||
to it as an argument. Now, try this:
|
||||
|
||||
```
|
||||
>>> lines = follow('Data/stocklog.csv')
|
||||
>>> ibm = filematch(lines, 'IBM')
|
||||
>>> for line in ibm:
|
||||
        print(line)
|
||||
|
||||
... wait for output ...
|
||||
```
|
||||
|
||||
It might take a while for output to appear, but eventually you
|
||||
should see some lines containing data for IBM.
|
||||
|
||||
### (b) Setting up a more complex pipeline
|
||||
|
||||
Take the pipelining idea a few steps further by performing
|
||||
more actions.
|
||||
|
||||
```
|
||||
>>> from follow import follow
|
||||
>>> import csv
|
||||
>>> lines = follow('Data/stocklog.csv')
|
||||
>>> rows = csv.reader(lines)
|
||||
>>> for row in rows:
|
||||
        print(row)
|
||||
|
||||
['BA', '98.35', '6/11/2007', '09:41.07', '0.16', '98.25', '98.35', '98.31', '158148']
|
||||
['AA', '39.63', '6/11/2007', '09:41.07', '-0.03', '39.67', '39.63', '39.31', '270224']
|
||||
['XOM', '82.45', '6/11/2007', '09:41.07', '-0.23', '82.68', '82.64', '82.41', '748062']
|
||||
['PG', '62.95', '6/11/2007', '09:41.08', '-0.12', '62.80', '62.97', '62.61', '454327']
|
||||
...
|
||||
```
|
||||
|
||||
Well, that's interesting. What you're seeing here is that the output of the
|
||||
`follow()` function has been piped into the `csv.reader()` function and we're
|
||||
now getting a sequence of split rows.
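This works because `csv.reader()` accepts any iterable that produces lines of text, not just files. A quick way to convince yourself, using a small generator of made-up lines in place of `follow()`:

```python
import csv

def fake_lines():
    # Stand-in for follow(): yields CSV-formatted lines one at a time
    yield '"BA",98.35,"6/11/2007"\n'
    yield '"AA",39.63,"6/11/2007"\n'

rows = csv.reader(fake_lines())
first = next(rows)      # ['BA', '98.35', '6/11/2007']
second = next(rows)     # ['AA', '39.63', '6/11/2007']
```

Every field comes back as a string, which is why a later stage for type conversion is useful.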
|
||||
|
||||
### (c) Making more pipeline components
|
||||
|
||||
Let's extend the whole idea into a larger pipeline. In a separate file `ticker.py`,
|
||||
start by creating a function that reads a CSV file as you did above:
|
||||
|
||||
```python
|
||||
# ticker.py

from follow import follow
import csv

def parse_stock_data(lines):
    rows = csv.reader(lines)
    return rows

if __name__ == '__main__':
    lines = follow('Data/stocklog.csv')
    rows = parse_stock_data(lines)
    for row in rows:
        print(row)
|
||||
```
|
||||
|
||||
Write a new function that selects specific columns:
|
||||
|
||||
```python
|
||||
# ticker.py
...
def select_columns(rows, indices):
    for row in rows:
        yield [row[index] for index in indices]
...
def parse_stock_data(lines):
    rows = csv.reader(lines)
    rows = select_columns(rows, [0, 1, 4])
    return rows
|
||||
```
|
||||
|
||||
Run your program again. You should see output narrowed down like this:
|
||||
|
||||
```
|
||||
['BA', '98.35', '0.16']
|
||||
['AA', '39.63', '-0.03']
|
||||
['XOM', '82.45','-0.23']
|
||||
['PG', '62.95', '-0.12']
|
||||
...
|
||||
```
|
||||
|
||||
Write generator functions that convert data types and build dictionaries.
|
||||
For example:
|
||||
|
||||
```python
|
||||
# ticker.py
...

def convert_types(rows, types):
    for row in rows:
        yield [func(val) for func, val in zip(types, row)]

def make_dicts(rows, headers):
    for row in rows:
        yield dict(zip(headers, row))
...
def parse_stock_data(lines):
    rows = csv.reader(lines)
    rows = select_columns(rows, [0, 1, 4])
    rows = convert_types(rows, [str, float, float])
    rows = make_dicts(rows, ['name', 'price', 'change'])
    return rows
...
|
||||
```
|
||||
|
||||
Run your program again. You should now see a stream of dictionaries like this:
|
||||
|
||||
```
|
||||
{ 'name':'BA', 'price':98.35, 'change':0.16 }
|
||||
{ 'name':'AA', 'price':39.63, 'change':-0.03 }
|
||||
{ 'name':'XOM', 'price':82.45, 'change': -0.23 }
|
||||
{ 'name':'PG', 'price':62.95, 'change':-0.12 }
|
||||
...
|
||||
```
|
||||
|
||||
### (d) Filtering data
|
||||
|
||||
Write a function that filters data. For example:
|
||||
|
||||
```python
|
||||
# ticker.py
...

def filter_symbols(rows, names):
    for row in rows:
        if row['name'] in names:
            yield row
|
||||
```
|
||||
|
||||
Use this to filter stocks to just those in your portfolio:
|
||||
|
||||
```python
|
||||
import report
|
||||
portfolio = report.read_portfolio('Data/portfolio.csv')
|
||||
rows = parse_stock_data(follow('Data/stocklog.csv'))
|
||||
rows = filter_symbols(rows, portfolio)
|
||||
for row in rows:
    print(row)
|
||||
```
|
||||
|
||||
### (e) Putting it all together
|
||||
|
||||
In the `ticker.py` program, write a function `ticker(portfile, logfile, fmt)`
|
||||
that creates a real-time stock ticker from a given portfolio, logfile,
|
||||
and table format. For example:
|
||||
|
||||
```python
|
||||
>>> from ticker import ticker
|
||||
>>> ticker('Data/portfolio.csv', 'Data/stocklog.csv', 'txt')
|
||||
      Name      Price     Change
---------- ---------- ----------
        GE      37.14      -0.18
      MSFT      29.96      -0.09
       CAT      78.03      -0.49
        AA      39.34      -0.32
...
|
||||
|
||||
>>> ticker('Data/portfolio.csv', 'Data/stocklog.csv', 'csv')
|
||||
Name,Price,Change
|
||||
IBM,102.79,-0.28
|
||||
CAT,78.04,-0.48
|
||||
AA,39.35,-0.31
|
||||
CAT,78.05,-0.47
|
||||
...
|
||||
```
|
||||
|
||||
### Discussion
|
||||
|
||||
Some lessons learned: You can create various generator functions and
|
||||
chain them together to perform processing involving data-flow
|
||||
pipelines. In addition, you can create functions that package a
|
||||
series of pipeline stages into a single function call (for example,
|
||||
the `parse_stock_data()` function).
|
||||
|
||||
[Next](04_More_generators)
|
||||
|
||||
|
||||
179
Notes/06_Generators/04_More_generators.md
Normal file
@@ -0,0 +1,179 @@
|
||||
# 6.4 More Generators
|
||||
|
||||
This section introduces a few additional generator related topics including
|
||||
generator expressions and the itertools module.
|
||||
|
||||
### Generator Expressions
|
||||
|
||||
A generator version of a list comprehension.
|
||||
|
||||
```python
|
||||
>>> a = [1,2,3,4]
|
||||
>>> b = (2*x for x in a)
|
||||
>>> b
|
||||
<generator object <genexpr> at 0x58760>
|
||||
>>> for i in b:
|
||||
...     print(i, end=' ')
|
||||
...
|
||||
2 4 6 8
|
||||
>>>
|
||||
```
|
||||
|
||||
Differences with List Comprehensions.
|
||||
|
||||
* Does not construct a list.
|
||||
* Only useful purpose is iteration.
|
||||
* Once consumed, can't be reused.
|
||||
|
||||
General syntax.
|
||||
|
||||
```python
|
||||
(<expression> for i in s if <conditional>)
|
||||
```
|
||||
|
||||
It can also serve as a function argument.
|
||||
|
||||
```python
|
||||
sum(x*x for x in a)
|
||||
```
|
||||
|
||||
It can be applied to any iterable.
|
||||
|
||||
```python
|
||||
>>> a = [1,2,3,4]
|
||||
>>> b = (x*x for x in a)
|
||||
>>> c = (-x for x in b)
|
||||
>>> for i in c:
|
||||
...     print(i, end=' ')
|
||||
...
|
||||
-1 -4 -9 -16
|
||||
>>>
|
||||
```
|
||||
|
||||
The main use of generator expressions is in code that performs some
|
||||
calculation on a sequence, but only uses the result once. For
|
||||
example, strip all comments from a file.
|
||||
|
||||
```python
|
||||
f = open('somefile.txt')
|
||||
lines = (line for line in f if not line.startswith('#'))
|
||||
for line in lines:
    ...
|
||||
f.close()
|
||||
```
|
||||
|
||||
With generators, the code uses little memory because lines are processed one at a time instead of being collected into a list, and it often runs faster as well. It's like a filter applied to a stream.
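The memory side of that claim is easy to check. The sketch below compares the size of a list comprehension with the equivalent generator expression using `sys.getsizeof()`. The exact numbers depend on your Python build, so none are shown here. The speed side is less clear-cut and is worth timing on your own data.

```python
import sys

nums = range(1_000_000)
as_list = [x * x for x in nums]          # Builds all one million results up front
as_gen = (x * x for x in nums)           # Builds nothing until iterated

list_size = sys.getsizeof(as_list)       # Size of the list object (not its items)
gen_size = sys.getsizeof(as_gen)         # Small, regardless of how many items follow
print(gen_size < list_size)              # True
total = sum(as_gen)                      # Values are produced one at a time here
```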
|
||||
|
||||
### Why Generators
|
||||
|
||||
* Many problems are much more clearly expressed in terms of iteration.
  * Looping over a collection of items and performing some kind of operation (searching, replacing, modifying, etc.).
  * Processing pipelines can be applied to a wide range of data processing problems.
* Better memory efficiency.
  * Only produce values when needed.
  * Contrast to constructing giant lists.
  * Can operate on streaming data.
* Generators encourage code reuse.
  * Separates the *iteration* from code that uses the iteration.
  * You can build a toolbox of interesting iteration functions and *mix-n-match*.
|
||||
|
||||
### `itertools` module
|
||||
|
||||
`itertools` is a library module with various functions designed to help with iterators/generators.
|
||||
|
||||
```python
|
||||
itertools.chain(s1,s2)
itertools.count(n)
itertools.cycle(s)
itertools.dropwhile(predicate, s)
itertools.groupby(s)
filter(predicate, s)              # Built-in in Python 3 (was itertools.ifilter)
map(function, s1, ... sN)         # Built-in in Python 3 (was itertools.imap)
itertools.repeat(s, n)
itertools.tee(s, ncopies)
zip(s1, ... , sN)                 # Built-in in Python 3 (was itertools.izip)
|
||||
```
|
||||
|
||||
All functions process data iteratively.
|
||||
They implement various kinds of iteration patterns.
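Here are a few of these in action. This is only a small sample; `islice()` is not in the list above, but it is also part of `itertools` and is handy for taking a finite slice of an endless iterator.

```python
import itertools

# chain: iterate over several sequences as if they were one
chained = list(itertools.chain([1, 2], 'ab'))             # [1, 2, 'a', 'b']

# count: endless counter, trimmed here with islice
counted = list(itertools.islice(itertools.count(10), 3))  # [10, 11, 12]

# dropwhile: skip leading items while the predicate is true
dropped = list(itertools.dropwhile(lambda x: x < 3, [1, 2, 3, 1]))  # [3, 1]

# groupby: group *adjacent* items that share a key
groups = [(k, list(g)) for k, g in itertools.groupby('aabbbc')]
# [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c'])]
```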
|
||||
|
||||
More information at [Generator Tricks for Systems Programmers](http://www.dabeaz.com/generators/) tutorial from PyCon '08.
|
||||
|
||||
## Exercises
|
||||
|
||||
In the previous exercises, you wrote some code that followed lines being written to a log file and parsed them into a sequence of rows.
|
||||
This exercise continues to build upon that. Make sure the `Data/stocksim.py` is still running.
|
||||
|
||||
### (a) Generator Expressions
|
||||
|
||||
Generator expressions are a generator version of a list comprehension.
|
||||
For example:
|
||||
|
||||
```python
|
||||
>>> nums = [1, 2, 3, 4, 5]
|
||||
>>> squares = (x*x for x in nums)
|
||||
>>> squares
|
||||
<generator object <genexpr> at 0x109207e60>
|
||||
>>> for n in squares:
|
||||
...     print(n)
|
||||
...
|
||||
1
|
||||
4
|
||||
9
|
||||
16
|
||||
25
|
||||
```
|
||||
|
||||
Unlike a list comprehension, a generator expression can only be used once.
|
||||
Thus, if you try another for-loop, you get nothing:
|
||||
|
||||
```python
|
||||
>>> for n in squares:
|
||||
...     print(n)
|
||||
...
|
||||
>>>
|
||||
```
|
||||
|
||||
### (b) Generator Expressions in Function Arguments
|
||||
|
||||
Generator expressions are sometimes placed into function arguments.
|
||||
It looks a little weird at first, but try this experiment:
|
||||
|
||||
```python
|
||||
>>> nums = [1,2,3,4,5]
|
||||
>>> sum([x*x for x in nums]) # A list comprehension
|
||||
55
|
||||
>>> sum(x*x for x in nums) # A generator expression
|
||||
55
|
||||
>>>
|
||||
```
|
||||
In the above example, the second version using generators would
|
||||
use significantly less memory if a large list was being manipulated.
|
||||
|
||||
In your `portfolio.py` file, you performed a few calculations
|
||||
involving list comprehensions. Try replacing these with
|
||||
generator expressions.
|
||||
|
||||
### (c) Code simplification
|
||||
|
||||
Generator expressions are often a useful replacement for
|
||||
small generator functions. For example, instead of writing a
|
||||
function like this:
|
||||
|
||||
```python
|
||||
def filter_symbols(rows, names):
    for row in rows:
        if row['name'] in names:
            yield row
|
||||
```
|
||||
|
||||
You could write something like this:
|
||||
|
||||
```python
|
||||
rows = (row for row in rows if row['name'] in names)
|
||||
```
|
||||
|
||||
Modify the `ticker.py` program to use generator expressions
|
||||
as appropriate.
|
||||
|
||||
|
||||