Added sections 2-4

This commit is contained in:
David Beazley
2020-05-25 12:40:11 -05:00
parent 7a4423dee4
commit c06face3a5
21 changed files with 5639 additions and 0 deletions


@@ -0,0 +1,7 @@
# Working With Data Overview
In this section, we look at how Python programmers represent and work with data.
Most programs today work with data. We are going to learn some common programming idioms and how to avoid shooting yourself in the foot.
We will also take a look at part of the object model in Python, which is a big part of understanding most Python programs.


@@ -0,0 +1,431 @@
# 2.1 Datatypes and Data structures
This section introduces data structures in the form of tuples and dicts.
### Primitive Datatypes
Python has a few primitive types of data:
* Integers
* Floating point numbers
* Strings (text)
We have learned about these in the previous section.
### None type
```python
email_address = None
```
This type is often used as a placeholder for an optional or missing value.
```python
if email_address:
    send_email(email_address, msg)
```
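One thing to watch with the test above: an empty string is also treated as false, so it behaves the same as `None`. The following is a minimal sketch of the difference. The `describe()` helper and the sample address are made up for illustration and are not part of the course code.

```python
def describe(email_address):
    # `is None` tests specifically for the missing-value placeholder
    if email_address is None:
        return 'missing'
    # An empty string is a real value, but it is falsy
    if not email_address:
        return 'empty'
    return 'present'

results = [describe(None), describe(''), describe('dave@example.com')]
print(results)
```

If you need to tell "no value at all" apart from "an empty value", test with `is None` instead of relying on truthiness.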
### Data Structures
Real programs have more complex data than the ones that can be easily represented by the datatypes learned so far.
For example information about a stock:
```code
100 shares of GOOG at $490.10
```
This is an "object" with three parts:
* Name or symbol of the stock ("GOOG", a string)
* Number of shares (100, an integer)
* Price (490.10, a float)
### Tuples
A tuple is a collection of values grouped together.
Example:
```python
s = ('GOOG', 100, 490.1)
```
Sometimes the `()` are omitted in the syntax.
```python
s = 'GOOG', 100, 490.1
```
Special cases (0-tuple, 1-tuple).
```python
t = () # An empty tuple
w = ('GOOG', ) # A 1-item tuple
```
Tuples are usually used to represent *simple* records or structures.
Typically, it is a single *object* of multiple parts. A good analogy: *A tuple is like a single row in a database table.*
Tuple contents are ordered (like an array).
```python
s = ('GOOG', 100, 490.1)
name = s[0] # 'GOOG'
shares = s[1] # 100
price = s[2] # 490.1
```
However, the contents can't be modified.
```pycon
>>> s[1] = 75
TypeError: 'tuple' object does not support item assignment
```
You can, however, make a new tuple based on a current tuple.
```python
s = (s[0], 75, s[2])
```
### Tuple Packing
Tuples are focused more on packing related items together into a single *entity*.
```python
s = ('GOOG', 100, 490.1)
```
The tuple is then easy to pass around to other parts of a program as a single object.
### Tuple Unpacking
To use the tuple elsewhere, you can unpack its parts into variables.
```python
name, shares, price = s
print('Cost', shares * price)
```
The number of variables must match the tuple structure.
```python
name, shares = s # ERROR
Traceback (most recent call last):
...
ValueError: too many values to unpack (expected 2)
```
### Tuples vs. Lists
Tuples are NOT just read-only lists. Tuples are most often used for a *single item* consisting of multiple parts.
Lists are usually a collection of distinct items, usually all of the same type.
```python
record = ('GOOG', 100, 490.1) # A tuple representing a stock in a portfolio
symbols = [ 'GOOG', 'AAPL', 'IBM' ] # A list representing three stock symbols
```
### Dictionaries
A dictionary is a hash table or associative array.
It is a collection of values indexed by *keys*. These keys serve as field names.
```python
s = {
    'name': 'GOOG',
    'shares': 100,
    'price': 490.1
}
```
### Common operations
To read values from a dictionary use the key names.
```pycon
>>> print(s['name'], s['shares'])
GOOG 100
>>> s['price']
490.1
>>>
```
To add or modify values assign using the key names.
```pycon
>>> s['shares'] = 75
>>> s['date'] = '6/6/2007'
>>>
```
To delete a value use the `del` statement.
```pycon
>>> del s['date']
>>>
```
### Why dictionaries?
Dictionaries are useful when there are *many* different values and those values
might be modified or manipulated. Dictionaries make your code more readable.
```python
s['price']
# vs
s[2]
```
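To make the readability point concrete, here is a small sketch that computes the cost of the same holding from a tuple and from a dictionary. The variable names are chosen for this illustration only.

```python
as_tuple = ('GOOG', 100, 490.1)
as_dict = {'name': 'GOOG', 'shares': 100, 'price': 490.1}

# With a tuple, you have to remember which position holds which field
cost_from_tuple = as_tuple[1] * as_tuple[2]

# With a dictionary, the key names document the calculation
cost_from_dict = as_dict['shares'] * as_dict['price']

print(cost_from_tuple == cost_from_dict)
```

Both calculations give the same number. The difference is in how easy the code is to read later.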
## Exercises
### Note
In the last few exercises, you wrote a program that read a datafile `Data/portfolio.csv`. Using the `csv` module, it is easy to read the file row-by-row.
```pycon
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> next(rows)
['name', 'shares', 'price']
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>>
```
Although reading the file is easy, you often want to do more with the data than read it.
For instance, perhaps you want to store it and start performing some calculations on it.
Unfortunately, a raw "row" of data doesn't give you enough to work with. For example, even a simple math calculation doesn't work:
```pycon
>>> row = ['AA', '100', '32.20']
>>> cost = row[1] * row[2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'
>>>
```
To do more, you typically want to interpret the raw data in some way and turn it into a more useful kind of object so that you can work with it later.
Two simple options are tuples or dictionaries.
### (a) Tuples
At the interactive prompt, create the following tuple that represents
the above row, but with the numeric columns converted to proper
numbers:
```pycon
>>> t = (row[0], int(row[1]), float(row[2]))
>>> t
('AA', 100, 32.2)
>>>
```
Using this, you can now calculate the total cost by multiplying the shares and the price:
```pycon
>>> cost = t[1] * t[2]
>>> cost
3220.0000000000005
>>>
```
Is math broken in Python? What's the deal with the answer of
3220.0000000000005?
This is an artifact of the floating point hardware on your computer
only being able to accurately represent decimals in Base-2, not
Base-10. For even simple calculations involving base-10 decimals,
small errors are introduced. This is normal, although perhaps a bit
surprising if you haven't seen it before.
This happens in all programming languages that use floating point
decimals, but it often gets hidden when printing. For example:
```pycon
>>> print(f'{cost:0.2f}')
3220.00
>>>
```
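Beyond hiding the error when printing, two other approaches are commonly used. This is an illustrative sketch, not part of the exercise: `math.isclose()` compares floats with a tolerance, and the standard `decimal` module does exact base-10 arithmetic when you construct values from strings.

```python
import math
from decimal import Decimal

cost = 100 * 32.2

exact = (cost == 3220.0)               # exact comparison is fragile with floats
close = math.isclose(cost, 3220.0)     # comparison with a small tolerance

decimal_cost = 100 * Decimal('32.20')  # exact decimal arithmetic
print(exact, close, decimal_cost)
```

For most data processing, plain floats plus sensible formatting are fine. `Decimal` is worth considering when exact decimal results matter, such as in accounting.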
Tuples are read-only. Verify this by trying to change the number of shares to 75.
```pycon
>>> t[1] = 75
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>
```
Although you can't change tuple contents, you can always create a completely new tuple that replaces the old one.
```pycon
>>> t = (t[0], 75, t[2])
>>> t
('AA', 75, 32.2)
>>>
```
Whenever you reassign an existing variable name like this, the old
value is discarded. Although the above assignment might look like you
are modifying the tuple, you are actually creating a new tuple and
throwing the old one away.
Tuples are often used to pack and unpack values into variables. Try the following:
```pycon
>>> name, shares, price = t
>>> name
'AA'
>>> shares
75
>>> price
32.2
>>>
```
Take the above variables and pack them back into a tuple:
```pycon
>>> t = (name, 2*shares, price)
>>> t
('AA', 150, 32.2)
>>>
```
### (b) Dictionaries as a data structure
An alternative to a tuple is to create a dictionary instead.
```pycon
>>> d = {
        'name' : row[0],
        'shares' : int(row[1]),
        'price' : float(row[2])
    }
>>> d
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
Calculate the total cost of this holding:
```pycon
>>> cost = d['shares'] * d['price']
>>> cost
3220.0000000000005
>>>
```
Compare this example with the same calculation involving tuples above. Change the number of shares to 75.
```pycon
>>> d['shares'] = 75
>>> d
{'name': 'AA', 'shares': 75, 'price': 32.2}
>>>
```
Unlike tuples, dictionaries can be freely modified. Add some attributes:
```pycon
>>> d['date'] = (6, 11, 2007)
>>> d['account'] = 12345
>>> d
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007), 'account': 12345}
>>>
```
### (c) Some additional dictionary operations
If you turn a dictionary into a list, you'll get all of its keys:
```pycon
>>> list(d)
['name', 'shares', 'price', 'date', 'account']
>>>
```
Similarly, if you use the `for` statement to iterate on a dictionary, you will get the keys:
```pycon
>>> for k in d:
        print('k =', k)
k = name
k = shares
k = price
k = date
k = account
>>>
```
Try this variant that performs a lookup at the same time:
```pycon
>>> for k in d:
        print(k, '=', d[k])
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
account = 12345
>>>
```
You can also obtain all of the keys using the `keys()` method:
```pycon
>>> keys = d.keys()
>>> keys
dict_keys(['name', 'shares', 'price', 'date', 'account'])
>>>
```
`keys()` is a bit unusual in that it returns a special `dict_keys` object.
This is an overlay on the original dictionary that always gives you the current keys—even if the dictionary changes. For example, try this:
```pycon
>>> del d['account']
>>> keys
dict_keys(['name', 'shares', 'price', 'date'])
>>>
```
Notice that `'account'` disappeared from `keys` even though you didn't call `d.keys()` again.
A more elegant way to work with keys and values together is to use the `items()` method. This gives you `(key, value)` tuples:
```pycon
>>> items = d.items()
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
>>> for k, v in d.items():
        print(k, '=', v)
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
>>>
```
If you have tuples such as `items`, you can create a dictionary using the `dict()` function. Try it:
```pycon
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
>>> d = dict(items)
>>> d
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007)}
>>>
```
[Next](02_Containers)


@@ -0,0 +1,413 @@
# 2.2 Containers
### Overview
Programs often have to work with many objects.
* A portfolio of stocks
* A table of stock prices
There are three main choices to use.
* Lists. Ordered data.
* Dictionaries. Data looked up by key.
* Sets. Unordered collections of unique items.
### Lists as a Container
Use a list when the order of the data matters. Remember that lists can hold any kind of objects.
For example, a list of tuples.
```python
portfolio = [
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.3),
    ('CAT', 150, 83.44)
]
portfolio[0] # ('GOOG', 100, 490.1)
portfolio[2] # ('CAT', 150, 83.44)
```
### List construction
Building a list from scratch.
```python
records = [] # Initial empty list
# Use .append() to add more items
records.append(('GOOG', 100, 490.10))
records.append(('IBM', 50, 91.3))
...
```
An example when reading records from a file.
```python
records = [] # Initial empty list
with open('portfolio.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
```
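If you want to try this loop without a file on disk, here is a self-contained sketch. It uses `io.StringIO` as a stand-in for an open file, which is an assumption made only so the example runs anywhere. The two sample lines are invented and have no header row.

```python
import io

# Stand-in for an open file; a real program would use open('portfolio.csv', 'rt')
f = io.StringIO('GOOG,100,490.1\nIBM,50,91.3\n')

records = []
for line in f:
    row = line.split(',')
    # float() ignores the trailing newline left on the last field
    records.append((row[0], int(row[1]), float(row[2])))

print(records)
```

Each line becomes one tuple, and the list keeps the tuples in the order they were read.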
### Dicts as a Container
Dictionaries are useful if you want fast random lookups (by key name). For
example, a dictionary of stock prices:
```python
prices = {
'GOOG': 513.25,
'CAT': 87.22,
'IBM': 93.37,
'MSFT': 44.12
}
```
Here are some simple lookups:
```pycon
>>> prices['IBM']
93.37
>>> prices['GOOG']
513.25
>>>
```
### Dict Construction
Example of building a dict from scratch.
```python
prices = {} # Initial empty dict
# Insert new items
prices['GOOG'] = 513.25
prices['CAT'] = 87.22
prices['IBM'] = 93.37
```
An example populating the dict from the contents of a file.
```python
prices = {} # Initial empty dict
with open('prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])
```
### Dictionary Lookups
You can test the existence of a key.
```python
if key in d:
    # YES
else:
    # NO
```
You can look up a value that might not exist and provide a default value in case it doesn't.
```python
name = d.get(key, default)
```
An example:
```python
>>> prices.get('IBM', 0.0)
93.37
>>> prices.get('SCOX', 0.0)
0.0
>>>
```
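One place where `get()` is handy is accumulating totals, because the first time a key is seen there is no existing value to add to. A minimal sketch, with sample holdings invented for illustration:

```python
holdings = [('IBM', 50), ('MSFT', 200), ('IBM', 100)]

total_shares = {}
for name, shares in holdings:
    # get() supplies 0 the first time a name is seen
    total_shares[name] = total_shares.get(name, 0) + shares

print(total_shares)
```

Without the default, the first lookup of each name would raise a `KeyError`.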
### Composite keys
Almost any type of value can be used as a dictionary key in Python. A dictionary key must be of a type that is immutable.
For example, tuples:
```python
holidays = {
    (1, 1) : 'New Years',
    (3, 14) : 'Pi day',
    (9, 13) : "Programmer's day",
}
```
Then to access:
```pycon
>>> holidays[3, 14]
'Pi day'
>>>
```
*Neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.*
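The following sketch shows both halves of this rule: a tuple works as a key (with or without parentheses in the lookup), while a list is rejected. The variable names `same` and `error` are only for this illustration.

```python
holidays = {(1, 1): 'New Years', (3, 14): 'Pi day'}

# The parentheses are optional when indexing with a tuple
same = holidays[3, 14] == holidays[(3, 14)]

try:
    holidays[[3, 14]] = 'Pi day'    # a list is mutable, so it can't be a key
    error = None
except TypeError as e:
    error = str(e)

print(same, error)
```

The exact wording of the error message can vary between Python versions, but it is always a `TypeError`.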
### Sets
A set is an unordered collection of unique items.
```python
tech_stocks = { 'IBM','AAPL','MSFT' }
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])
```
Sets are useful for membership tests.
```pycon
>>> tech_stocks
{'AAPL', 'IBM', 'MSFT'}
>>> 'IBM' in tech_stocks
True
>>> 'FB' in tech_stocks
False
>>>
```
Sets are also useful for duplicate elimination.
```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)
# unique = set(['IBM', 'AAPL','GOOG','YHOO'])
```
Additional set operations:
```python
unique.add('CAT') # Add an item
unique.remove('YHOO') # Remove an item
s1 | s2 # Set union
s1 & s2 # Set intersection
s1 - s2 # Set difference
```
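Here is a short worked sketch of the set operators. The two sets are sample data made up for this example.

```python
tech_stocks = {'IBM', 'AAPL', 'MSFT'}
portfolio_names = {'IBM', 'CAT', 'MSFT', 'GE'}

both = tech_stocks & portfolio_names        # in both sets
either = tech_stocks | portfolio_names      # in at least one set
only_tech = tech_stocks - portfolio_names   # in the first set but not the second

print(both, either, only_tech)
```

Since sets are unordered, the items may print in a different order each time you run it.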
## Exercises
### (a) A list of tuples
The file `Data/portfolio.csv` contains a list of stocks in a portfolio.
In [Section 1.7](), you wrote a function `portfolio_cost(filename)` that read this file and performed a simple calculation.
Your code should have looked something like this:
```python
# pcost.py
import csv
def portfolio_cost(filename):
    '''Computes the total cost (shares*price) of a portfolio file'''
    total_cost = 0.0
    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            nshares = int(row[1])
            price = float(row[2])
            total_cost += nshares * price
    return total_cost
```
Using this code as a rough guide, create a new file `report.py`. In
that file, define a function `read_portfolio(filename)` that opens a
given portfolio file and reads it into a list of tuples. To do this,
you're going to make a few minor modifications to the above code.
First, instead of defining `total_cost = 0`, you'll make a variable that's initially set to an empty list. For example:
```python
portfolio = []
```
Next, instead of totaling up the cost, you'll turn each row into a
tuple exactly as you just did in the last exercise and append it to
this list. For example:
```python
for row in rows:
    holding = (row[0], int(row[1]), float(row[2]))
    portfolio.append(holding)
```
Finally, you'll return the resulting `portfolio` list.
Experiment with your function interactively (just a reminder that in order to do this, you first have to run the `report.py` program in the interpreter):
*Hint: Use `-i` when executing the file in the terminal*
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23),
('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)]
>>>
>>> portfolio[0]
('AA', 100, 32.2)
>>> portfolio[1]
('IBM', 50, 91.1)
>>> portfolio[1][1]
50
>>> total = 0.0
>>> for s in portfolio:
        total += s[1] * s[2]
>>> print(total)
44671.15
>>>
```
This list of tuples that you have created is very similar to a 2-D array.
For example, you can access a specific column and row using a lookup such as `portfolio[row][column]` where `row` and `column` are integers.
That said, you can also rewrite the last for-loop using a statement like this:
```python
>>> total = 0.0
>>> for name, shares, price in portfolio:
        total += shares*price
>>> print(total)
44671.15
>>>
```
### (b) List of Dictionaries
Take the function you wrote in part (a) and modify it to represent each stock in the portfolio with a dictionary instead of a tuple.
In this dictionary use the field names of "name", "shares", and "price" to represent the different columns in the input file.
Experiment with this new function in the same manner as you did in part (a).
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1},
{'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
{'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1},
{'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> portfolio[0]
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>> portfolio[1]
{'name': 'IBM', 'shares': 50, 'price': 91.1}
>>> portfolio[1]['shares']
50
>>> total = 0.0
>>> for s in portfolio:
        total += s['shares']*s['price']
>>> print(total)
44671.15
>>>
```
Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers.
This is often preferred because the resulting code is easier to read later.
Viewing large dictionaries and lists can be messy. To clean up the output for debugging, consider using the `pprint` function.
```pycon
>>> from pprint import pprint
>>> pprint(portfolio)
[{'name': 'AA', 'price': 32.2, 'shares': 100},
 {'name': 'IBM', 'price': 91.1, 'shares': 50},
 {'name': 'CAT', 'price': 83.44, 'shares': 150},
 {'name': 'MSFT', 'price': 51.23, 'shares': 200},
 {'name': 'GE', 'price': 40.37, 'shares': 95},
 {'name': 'MSFT', 'price': 65.1, 'shares': 50},
 {'name': 'IBM', 'price': 70.44, 'shares': 100}]
>>>
```
### (c) Dictionaries as a container
A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer.
In the Python shell, try playing with a dictionary:
```pycon
>>> prices = { }
>>> prices['IBM'] = 92.45
>>> prices['MSFT'] = 45.12
>>> prices
... look at the result ...
>>> prices['IBM']
92.45
>>> prices['AAPL']
... look at the result ...
>>> 'AAPL' in prices
False
>>>
```
The file `Data/prices.csv` contains a series of lines with stock prices.
The file looks something like this:
```csv
"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
"C",3.72
...
```
Write a function `read_prices(filename)` that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices.
To do this, start with an empty dictionary and start inserting values into it just
as you did above. However, you are reading the values from a file now.
We'll use this data structure to quickly look up the price of a given stock name.
A few little tips that you'll need for this part. First, make sure you use the `csv` module just as you did before; there's no need to reinvent the wheel here.
```pycon
>>> import csv
>>> f = open('Data/prices.csv', 'r')
>>> rows = csv.reader(f)
>>> for row in rows:
        print(row)
['AA', '9.22']
['AXP', '24.85']
...
[]
>>>
```
The other little complication is that the `Data/prices.csv` file may have some blank lines in it. Notice how the last row of data above is an empty list—meaning no data was present on that line.
There's a possibility that this could cause your program to die with an exception.
Use the `try` and `except` statements to catch this as appropriate.
Once you have written your `read_prices()` function, test it interactively to make sure it works:
```python
>>> prices = read_prices('Data/prices.csv')
>>> prices['IBM']
106.28
>>> prices['MSFT']
20.89
>>>
```
### (d) Finding out if you can retire
Tie all of this work together by adding the statements to your `report.py` program.
It takes the list of stocks in part (b) and the dictionary of prices in part (c) and
computes the current value of the portfolio along with the gain/loss.
[Next](03_Formatting)


@@ -0,0 +1,276 @@
# 2.3 Formatting
This is a slight digression, but when you work with data, you often want to
produce structured output (tables, etc.). For example:
```code
      Name     Shares       Price
---------- ---------- -----------
        AA        100       32.20
       IBM         50       91.10
       CAT        150       83.44
      MSFT        200       51.23
        GE         95       40.37
      MSFT         50       65.10
       IBM        100       70.44
```
### String Formatting
One way to format strings in Python 3.6+ is with `f-strings`.
```python
>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
'       IBM        100      91.10'
>>>
```
The part `{expression:format}` is replaced by the value of `expression`, formatted according to `format`.
It is commonly used with `print`.
```python
print(f'{name:>10s} {shares:>10d} {price:>10.2f}')
```
### Format codes
Format codes (after the `:` inside the `{}`) are similar to C `printf()`. Common codes
include:
```code
d       Decimal integer
b       Binary integer
x       Hexadecimal integer
f       Float as [-]m.dddddd
e       Float as [-]m.dddddde+-xx
g       Float, but selective use of E notation
s       String
c       Character (from integer)
```
Common modifiers adjust the field width and decimal precision. This is a partial list:
```code
:>10d   Integer right aligned in 10-character field
:<10d   Integer left aligned in 10-character field
:^10d   Integer centered in 10-character field
:0.2f   Float with 2 digit precision
```
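To see what these modifiers do, here is a small sketch you can run. The variable names are chosen for this illustration.

```python
shares = 100
price = 91.1

right = f'{shares:>10d}'       # padded on the left
left = f'{shares:<10d}'        # padded on the right
centered = f'{shares:^10d}'    # padded on both sides
rounded = f'{price:0.2f}'      # two digits after the decimal point

# repr() shows the quotes, which makes the padding visible
print(repr(right), repr(left), repr(centered), repr(rounded))
```

Each of the first three results is exactly 10 characters wide. Only the position of the padding changes.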
### Dictionary Formatting
You can use the `format_map()` method on strings.
```python
>>> s = {
    'name': 'IBM',
    'shares': 100,
    'price': 91.1
}
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
'       IBM        100      91.10'
>>>
```
It uses the same format codes as f-strings, but takes the values from the supplied dictionary.
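Since the template is an ordinary string, you can define it once and reuse it for many dictionaries. A minimal sketch, assuming a list of dictionaries shaped like the stock records used in this section (the `template` and `lines` names are only for illustration):

```python
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
]

template = '{name:>10s} {shares:>10d} {price:>10.2f}'

lines = []
for holding in portfolio:
    # The field names in the template are looked up as keys in the dictionary
    lines.append(template.format_map(holding))

for line in lines:
    print(line)
```

This is something f-strings can't do directly, because an f-string is evaluated right where it is written.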
### C-Style Formatting
You can also use the formatting operator `%`.
```python
>>> 'The value is %d' % 3
'The value is 3'
>>> '%5d %-5d %10d' % (3,4,5)
'    3 4              5'
>>> '%0.2f' % (3.1415926,)
'3.14'
```
This requires a single item or a tuple on the right. Format codes are modeled after the C `printf()` as well.
*Note: This is the only formatting available on byte strings.*
```python
>>> b'%s has %d messages' % (b'Dave', 37)
b'Dave has 37 messages'
>>>
```
## Exercises
In the previous exercise, you wrote a program called `report.py` that computed the gain/loss of a
stock portfolio. In this exercise, you're going to modify it to produce a table like this:
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
```
In this report, "Price" is the current share price of the stock and "Change" is the change in the share price from the initial purchase price.
### (a) How to format numbers
A common problem with printing numbers is specifying the number of decimal places. One way to fix this is to use f-strings. Try
these examples:
```python
>>> value = 42863.1
>>> print(value)
42863.1
>>> print(f'{value:0.4f}')
42863.1000
>>> print(f'{value:>16.2f}')
        42863.10
>>> print(f'{value:<16.2f}')
42863.10
>>> print(f'{value:*>16,.2f}')
*******42,863.10
>>>
```
Full documentation on the formatting codes used with f-strings can be found
[here](https://docs.python.org/3/library/string.html#format-specification-mini-language). Formatting
is also sometimes performed using the `%` operator of strings.
```pycon
>>> print('%0.4f' % value)
42863.1000
>>> print('%16.2f' % value)
        42863.10
>>>
```
Documentation on various codes used with `%` can be found [here](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting).
Although it's commonly used with `print`, string formatting is not tied to printing.
If you want to save a formatted string, just assign it to a variable.
```pycon
>>> f = '%0.4f' % value
>>> f
'42863.1000'
>>>
```
### (b) Collecting Data
In order to generate the above report, you'll first want to collect
all of the data shown in the table. Write a function `make_report()`
that takes a list of stocks and dictionary of prices as input and
returns a list of tuples containing the rows of the above table.
Add this function to your `report.py` file. Here's how it should work if you try it interactively:
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> prices = read_prices('Data/prices.csv')
>>> report = make_report(portfolio, prices)
>>> for r in report:
        print(r)
('AA', 100, 9.22, -22.980000000000004)
('IBM', 50, 106.28, 15.180000000000007)
('CAT', 150, 35.46, -47.98)
('MSFT', 200, 20.89, -30.339999999999996)
('GE', 95, 13.48, -26.889999999999997)
...
>>>
```
### (c) Printing a formatted table
Redo the above for-loop, but change the print statement to format the tuples.
```pycon
>>> for r in report:
        print('%10s %10d %10.2f %10.2f' % r)
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
...
>>>
```
You can also expand the values and use f-strings. For example:
```pycon
>>> for name, shares, price, change in report:
        print(f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}')
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
...
>>>
```
Take the above statements and add them to your `report.py` program.
Have your program take the output of the `make_report()` function and print a nicely formatted table as shown.
### (d) Adding some headers
Suppose you had a tuple of header names like this:
```python
headers = ('Name', 'Shares', 'Price', 'Change')
```
Add code to your program that takes the above tuple of headers and
creates a string where each header name is right-aligned in a
10-character wide field and each field is separated by a single space.
```python
'      Name     Shares      Price     Change'
```
Write code that takes the headers and creates the separator string between the headers and data to follow.
This string is just a bunch of "-" characters under each field name. For example:
```python
'---------- ---------- ---------- ----------'
```
When you're done, your program should produce the table shown at the top of this exercise.
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
```
### (e) Formatting Challenge
How would you modify your code so that the price includes the currency symbol ($) and the output looks like this:
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      $9.22     -22.98
       IBM         50    $106.28      15.18
       CAT        150     $35.46     -47.98
      MSFT        200     $20.89     -30.34
        GE         95     $13.48     -26.89
      MSFT         50     $20.89     -44.21
       IBM        100    $106.28      35.84
```
[Next](04_Sequences)


@@ -0,0 +1,538 @@
# 2.4 Sequences
In this part, we look at some common idioms for working with sequence data.
### Introduction
Python has three *sequence* datatypes.
* String: `'Hello'`. A string is considered a sequence of characters.
* List: `[1, 4, 5]`.
* Tuple: `('GOOG', 100, 490.1)`.
All sequences are ordered and have length.
```python
a = 'Hello' # String
b = [1, 4, 5] # List
c = ('GOOG', 100, 490.1) # Tuple
# Indexed order
a[0] # 'H'
b[-1] # 5
c[1] # 100
# Length of sequence
len(a) # 5
len(b) # 3
len(c) # 3
```
Sequences can be replicated: `s * n`.
```pycon
>>> a = 'Hello'
>>> a * 3
'HelloHelloHello'
>>> b = [1, 2, 3]
>>> b * 2
[1, 2, 3, 1, 2, 3]
>>>
```
Sequences of the same type can be concatenated: `s + t`.
```pycon
>>> a = (1, 2, 3)
>>> b = (4, 5)
>>> a + b
(1, 2, 3, 4, 5)
>>>
>>> c = [1, 5]
>>> a + c
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "list") to tuple
```
### Slicing
Slicing means to take a subsequence from a sequence.
The syntax used is `s[start:end]`, where `start` and `end` are the indexes of the subsequence you want.
```python
a = [0,1,2,3,4,5,6,7,8]
a[2:5] # [2,3,4]
a[-5:] # [4,5,6,7,8]
a[:3] # [0,1,2]
```
* Indices `start` and `end` must be integers.
* Slices do *not* include the end value.
* If indices are omitted, they default to the beginning or end of the list.
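A few more cases, as a quick sketch of the rules above. The variable names are for illustration only.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8]

middle = a[2:5]        # items at index 2, 3, 4 (the end index is excluded)
copy = a[:]            # both indices omitted: a copy of the whole list
clipped = a[6:100]     # an end index past the list is clipped, not an error
text = 'Hello World'[:5]   # slicing works on any sequence, including strings

print(middle, copy, clipped, text)
```

Note that `a[:]` creates a new list. Changing `copy` afterward does not change `a`.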
### Slice re-assignment
Slices can also be reassigned and deleted.
```python
# Reassignment
a = [0,1,2,3,4,5,6,7,8]
a[2:4] = [10,11,12] # [0,1,10,11,12,4,5,6,7,8]
```
*Note: The reassigned slice doesn't need to have the same length.*
```python
# Deletion
a = [0,1,2,3,4,5,6,7,8]
del a[2:4] # [0,1,4,5,6,7,8]
```
### Sequence Reductions
There are some functions to reduce a sequence to a single value.
```pycon
>>> s = [1, 2, 3, 4]
>>> sum(s)
10
>>> min(s)
1
>>> max(s)
4
>>> t = ['Hello', 'World']
>>> max(t)
'World'
>>>
```
### Iteration over a sequence
The for-loop iterates over the elements in the sequence.
```pycon
>>> s = [1, 4, 9, 16]
>>> for i in s:
...     print(i)
...
1
4
9
16
>>>
```
On each iteration of the loop, you get a new item to work with.
This new value is placed into an iteration variable. In the following example, the
iteration variable is `x`:
```python
for x in s: # `x` is an iteration variable
    ...statements
```
In each iteration, it overwrites the previous value (if any).
After the loop finishes, the variable retains the last value.
### `break` statement
You can use the `break` statement to break out of a loop before it finishes iterating all of the elements.
```python
for name in namelist:
    if name == 'Jake':
        break
    ...
    ...
statements
```
When the `break` statement is executed, it will exit the loop and move
on to the next `statements`. The `break` statement only applies to the
inner-most loop. If this loop is within another loop, it will not
break the outer loop.
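A small sketch of that last point, using made-up data. The `break` stops the inner loop each time, but the outer loop still runs for every row.

```python
visited = []
for row in [1, 2]:
    for col in ['a', 'b', 'c']:
        if col == 'b':
            break              # leaves the inner loop only
        visited.append((row, col))

print(visited)
```

Only the `'a'` column is recorded for each row, because the inner loop stops as soon as it reaches `'b'`.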
### `continue` statement
To skip one element and move to the next one you use the `continue` statement.
```python
for line in lines:
    if line == '\n':    # Skip blank lines
        continue
    # More statements
    ...
```
This is useful when the current item is not of interest or needs to be ignored in the processing.
### Looping over integers
If you need to count, use `range()`.
```python
for i in range(100):
    # i = 0,1,...,99
```
The syntax is `range([start,] end [,step])`
```python
for i in range(100):
    # i = 0,1,...,99
for j in range(10,20):
    # j = 10,11,..., 19
for k in range(10,50,2):
    # k = 10,12,...,48
    # Notice how it counts in steps of 2, not 1.
```
* The ending value is never included. It mirrors the behavior of slices.
* `start` is optional. Default `0`.
* `step` is optional. Default `1`.
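An easy way to check what a given `range()` produces is to turn it into a list. A quick sketch (the variable names are for illustration):

```python
up = list(range(5))                # start defaults to 0
window = list(range(10, 20))       # 20 is not included
evens = list(range(10, 50, 2))     # counts in steps of 2
down = list(range(10, 0, -1))      # a negative step counts backwards

print(up)
print(window[0], window[-1])
print(evens[0], evens[-1])
print(down)
```

When counting down, the ending value is still excluded, so `range(10, 0, -1)` stops at 1.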
### `enumerate()` function
The `enumerate` function provides a loop with an extra counter value.
```python
names = ['Elwood', 'Jake', 'Curtis']
for i, name in enumerate(names):
    # Loops with i = 0, name = 'Elwood'
    #            i = 1, name = 'Jake'
    #            i = 2, name = 'Curtis'
```
How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional.
A good example of using `enumerate()` is tracking line numbers while reading a file:
```python
with open(filename) as f:
    for lineno, line in enumerate(f, start=1):
        ...
```
In the end, `enumerate` is just a nice shortcut for:
```python
i = 0
for x in s:
    statements
    i += 1
```
Using `enumerate` is less typing and runs slightly faster.
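Here is a runnable sketch of the line-number idea. It uses a list of strings in place of a file, which is an assumption made so the example is self-contained, and the sample lines are invented.

```python
lines = ['name,shares,price', 'AA,100,32.20', 'IBM,,91.10']

bad_lines = []
for lineno, line in enumerate(lines, start=1):
    fields = line.split(',')
    if '' in fields:
        # Record the 1-based line number of any line with an empty field
        bad_lines.append(lineno)

print(bad_lines)
```

Passing `start=1` makes the counter match the way people usually number lines in a file.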
### For and tuples
You can loop with multiple iteration variables.
```python
points = [
  (1, 4),(10, 40),(23, 14),(5, 6),(7, 8)
]
for x, y in points:
    # Loops with x = 1, y = 4
    #            x = 10, y = 40
    #            x = 23, y = 14
    #            ...
```
When using multiple variables, each tuple will be *unpacked* into a set of iteration variables.
### `zip()` function
The `zip` function takes sequences and makes an iterator that combines them.
```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1 ]
pairs = zip(columns, values)
# ('name','GOOG'), ('shares',100), ('price',490.1)
```
To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier.
```python
for column, value in pairs:
    ...
```
A common use of `zip` is to create key/value pairs for constructing dictionaries.
```python
d = dict(zip(columns, values))
```
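Two details of `zip()` are worth seeing in action: the result can only be iterated once, and pairing stops at the shorter input. A minimal sketch (the names `first_pass`, `second_pass`, and `short` are only for illustration):

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

pairs = zip(columns, values)
first_pass = list(pairs)
second_pass = list(pairs)     # empty: the iterator is already used up

record = dict(zip(columns, values))

# With inputs of different lengths, zip() stops at the shorter one
short = list(zip(columns, ['GOOG', 100]))

print(first_pass, second_pass, record, short)
```

If you need the pairs more than once, convert the result to a list or a dictionary first.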
## Exercises
### (a) Counting
Try some basic counting examples:
```pycon
>>> for n in range(10): # Count 0 ... 9
        print(n, end=' ')
0 1 2 3 4 5 6 7 8 9
>>> for n in range(10,0,-1): # Count 10 ... 1
        print(n, end=' ')
10 9 8 7 6 5 4 3 2 1
>>> for n in range(0,10,2): # Count 0, 2, ... 8
        print(n, end=' ')
0 2 4 6 8
>>>
```
### (b) More sequence operations
Interactively experiment with some of the sequence reduction operations.
```pycon
>>> data = [4, 9, 1, 25, 16, 100, 49]
>>> min(data)
1
>>> max(data)
100
>>> sum(data)
204
>>>
```
Try looping over the data.
```pycon
>>> for x in data:
        print(x)
4
9
...
>>> for n, x in enumerate(data):
        print(n, x)
0 4
1 9
2 1
...
>>>
```
Sometimes the `for` statement, `len()`, and `range()` get used by
novices in some kind of horrible code fragment that looks like it
emerged from the depths of a rusty C program.
```pycon
>>> for n in range(len(data)):
        print(data[n])
4
9
1
...
>>>
```
Don't do that! Not only does reading it make everyone's eyes bleed, it's inefficient with memory and it runs a lot slower.
Just use a normal `for` loop if you want to iterate over data. Use `enumerate()` if you happen to need the index for some reason.
### (c) A practical `enumerate()` example
Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data.
Using `enumerate()`, modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input.
```python
>>> cost = portfolio_cost('Data/missing.csv')
Row 4: Bad row: ['MSFT', '', '51.23']
Row 7: Bad row: ['IBM', '', '70.44']
>>>
```
To do this, you'll need to change just a few parts of your code.
```python
...
for rowno, row in enumerate(rows, start=1):
    try:
        ...
    except ValueError:
        print(f'Row {rowno}: Bad row: {row}')
```
### (d) Using the `zip()` function
In the file `Data/portfolio.csv`, the first line contains column headers. In all previous code, we've been discarding them.
```pycon
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> headers
['name', 'shares', 'price']
>>>
```
However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture.
First try this to pair the file headers with a row of data:
```pycon
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>> list(zip(headers, row))
[ ('name', 'AA'), ('shares', '100'), ('price', '32.20') ]
>>>
```
Notice how `zip()` paired the column headers with the column values.
We've used `list()` here to turn the result into a list so that you
can see it. Normally, `zip()` creates an iterator that must be
consumed by a for-loop.
This pairing is just an intermediate step to building a dictionary. Now try this:
```pycon
>>> record = dict(zip(headers, row))
>>> record
{'name': 'AA', 'shares': '100', 'price': '32.20'}
>>>
```
This transformation is one of the most useful tricks to know about
when processing a lot of data files. For example, suppose you wanted
to make the `pcost.py` program work with various input files, but
without regard for the actual column number where the name, shares,
and price appear.
Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this:
```python
# pcost.py

def portfolio_cost(filename):
    ...
        for rowno, row in enumerate(rows, start=1):
            record = dict(zip(headers, row))
            try:
                nshares = int(record['shares'])
                price = float(record['price'])
                total_cost += nshares * price
            # This catches errors in int() and float() conversions above
            except ValueError:
                print(f'Row {rowno}: Bad row: {row}')
        ...
```
Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this:
```csv
name,date,time,shares,price
"AA","6/11/2007","9:50am",100,32.20
"IBM","5/13/2007","4:20pm",50,91.10
"CAT","9/23/2006","1:30pm",150,83.44
"MSFT","5/17/2007","10:30am",200,51.23
"GE","2/1/2006","10:45am",95,40.37
"MSFT","10/31/2006","12:05pm",50,65.10
"IBM","7/9/2006","3:15pm",100,70.44
```
```python
>>> portfolio_cost('Data/portfoliodate.csv')
44671.15
>>>
```
If you did it right, you'll find that your program still works even
though the data file has a completely different column format than
before. That's cool!
The change made here is subtle, but significant. Instead of
`portfolio_cost()` being hardcoded to read a single fixed file format,
the new version reads any CSV file and picks the values of interest
out of it. As long as the file has the required columns, the code will work.
Modify the `report.py` program you wrote in Section 2.3 so that it uses
the same technique to pick out column headers.
Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it
produces the same answer as before.
### (e) Inverting a dictionary
A dictionary maps keys to values. For example, a dictionary of stock prices.
```pycon
>>> prices = {
        'GOOG' : 490.1,
        'AA' : 23.45,
        'IBM' : 91.1,
        'MSFT' : 34.23
    }
>>>
```
If you use the `items()` method, you can get `(key,value)` pairs:
```pycon
>>> prices.items()
dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)])
>>>
```
However, what if you wanted to get a list of `(value, key)` pairs instead?
*Hint: use `zip()`.*
```pycon
>>> pricelist = list(zip(prices.values(),prices.keys()))
>>> pricelist
[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
>>>
```
Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data.
```pycon
>>> min(pricelist)
(23.45, 'AA')
>>> max(pricelist)
(490.1, 'GOOG')
>>> sorted(pricelist)
[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
>>>
```
This also illustrates an important feature of tuples. When used in
comparisons, tuples are compared element-by-element starting with the
first item. Similar to how strings are compared
character-by-character.
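A minimal sketch of that comparison rule follows. The variable names are only for illustration.

```python
# The first items differ, so they decide the result
first_differs = (23.45, 'AA') < (490.1, 'GOOG')

# The first items tie, so the second items are compared
tie_broken = (91.1, 'IBM') < (91.1, 'MSFT')

# Strings work the same way: 'a' == 'a', 'p' == 'p', then 'p' < 'r'
strings = 'apple' < 'apricot'

print(first_differs, tie_broken, strings)
```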
`zip()` is often used in situations like this where you need to pair
up data from different places. For example, pairing up the column
names with column values in order to make a dictionary of named
values.
Note that `zip()` is not limited to pairs. For example, you can use it
with any number of input lists:
```pycon
>>> a = [1, 2, 3, 4]
>>> b = ['w', 'x', 'y', 'z']
>>> c = [0.2, 0.4, 0.6, 0.8]
>>> list(zip(a, b, c))
[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8)]
>>>
```
Also, be aware that `zip()` stops once the shortest input sequence is exhausted.
```pycon
>>> a = [1, 2, 3, 4, 5, 6]
>>> b = ['x', 'y', 'z']
>>> list(zip(a,b))
[(1, 'x'), (2, 'y'), (3, 'z')]
>>>
```
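If dropping the extra items is not what you want, one option from the standard library is `itertools.zip_longest`, which pads the shorter inputs instead. This is a side note rather than something the exercises rely on, and the `'?'` fill value is an arbitrary choice for illustration.

```python
from itertools import zip_longest

a = [1, 2, 3, 4, 5, 6]
b = ['x', 'y', 'z']

# Missing items from the shorter input are replaced by fillvalue
padded = list(zip_longest(a, b, fillvalue='?'))
print(padded)
```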
[Next](05_Collections)

View File

@@ -0,0 +1,160 @@
# 2.5 `collections` module
The `collections` module provides a number of useful objects for data handling.
This part briefly introduces some of these features.
### Example: Counting Things
Let's say you want to tabulate the total shares of each stock.
```python
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
```
There are two `IBM` entries and two `GOOG` entries in this list. The shares need to be combined together somehow.
Solution: Use a `Counter`.
```python
from collections import Counter
total_shares = Counter()
for name, shares, price in portfolio:
    total_shares[name] += shares

total_shares['IBM']     # 150
```
### Example: One-Many Mappings
Problem: You want to map a key to multiple values.
```python
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
```
As in the previous example, `IBM` appears twice, so the key `IBM` should map to two different tuples.
Solution: Use a `defaultdict`.
```python
from collections import defaultdict
holdings = defaultdict(list)
for name, shares, price in portfolio:
    holdings[name].append((shares, price))

holdings['IBM'] # [ (50, 91.1), (100, 45.23) ]
```
The `defaultdict` ensures that every time you access a missing key, a default value is created for it (here, an empty list).
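The default-value behavior can be seen in a short sketch. The key `'XYZ'` is a made-up name used only to show what happens on a lookup of a key that was never stored.

```python
from collections import defaultdict

holdings = defaultdict(list)
holdings['IBM'].append((50, 91.1))     # Missing key: list() is called to create []
holdings['IBM'].append((100, 45.23))   # Existing key: the same list is reused

missing = holdings['XYZ']              # Merely reading a missing key creates it
print(holdings['IBM'])
print(missing, 'XYZ' in holdings)
```

Note the side effect on the last lookup: the key is added to the dictionary even though nothing was appended.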
### Example: Keeping a History
Problem: We want a history of the last N things.
Solution: Use a `deque`.
```python
from collections import deque
history = deque(maxlen=N)
with open(filename) as f:
    for line in f:
        history.append(line)
        ...
```
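To see the effect of `maxlen` without needing a file, here is a small sketch that feeds a list of made-up lines into a `deque` with `N` assumed to be 3:

```python
from collections import deque

lines = ['line 1', 'line 2', 'line 3', 'line 4', 'line 5']  # Stand-in for a file

history = deque(maxlen=3)
for line in lines:
    history.append(line)    # Once full, the oldest entry is discarded

print(list(history))
```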
## Exercises
The `collections` module might be one of the most useful library
modules for dealing with special purpose kinds of data handling
problems such as tabulating and indexing.
In this exercise, we'll look at a few simple examples. Start by
running your `report.py` program so that you have the portfolio of
stocks loaded in the interactive mode.
```bash
bash % python3 -i report.py
```
### (a) Tabulating with Counters
Suppose you wanted to tabulate the total number of shares of each stock.
This is easy using `Counter` objects. Try it:
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> from collections import Counter
>>> holdings = Counter()
>>> for s in portfolio:
        holdings[s['name']] += s['shares']

>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
```
Carefully observe how the multiple entries for `MSFT` and `IBM` in `portfolio` get combined into a single entry here.
You can use a Counter just like a dictionary to retrieve individual values:
```python
>>> holdings['IBM']
150
>>> holdings['MSFT']
250
>>>
```
If you want to rank the values, do this:
```python
>>> # Get three most held stocks
>>> holdings.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>>
```
Let's grab another portfolio of stocks and make a new Counter:
```pycon
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
        holdings2[s['name']] += s['shares']

>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>
```
Finally, let's combine all of the holdings doing one simple operation:
```pycon
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>> combined = holdings + holdings2
>>> combined
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
>>>
```
This is only a small taste of what counters provide. However, if you
ever find yourself needing to tabulate values, you should consider
using one.
[Next](06_List_comprehension)

View File

@@ -0,0 +1,316 @@
# 2.6 List Comprehensions
A common task is processing items in a list. This section introduces list comprehensions,
a useful tool for doing just that.
### Creating new lists
A list comprehension creates a new list by applying an operation to each element of a sequence.
```pycon
>>> a = [1, 2, 3, 4, 5]
>>> b = [2*x for x in a ]
>>> b
[2, 4, 6, 8, 10]
>>>
```
Another example:
```pycon
>>> names = ['Elwood', 'Jake']
>>> a = [name.lower() for name in names]
>>> a
['elwood', 'jake']
>>>
```
The general syntax is: `[ <expression> for <variable_name> in <sequence> ]`.
### Filtering
You can also filter during the list comprehension.
```pycon
>>> a = [1, -5, 4, 2, -2, 10]
>>> b = [2*x for x in a if x > 0 ]
>>> b
[2, 8, 4, 20]
>>>
```
### Use cases
List comprehensions are hugely useful. For example, you can collect values of a specific
record field:
```python
stocknames = [s['name'] for s in stocks]
```
You can perform database-like queries on sequences.
```python
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]
```
You can also combine a list comprehension with a sequence reduction:
```python
cost = sum([s['shares']*s['price'] for s in stocks])
```
### General Syntax
```code
[ <expression> for <variable_name> in <sequence> if <condition>]
```
What it means:
```python
result = []
for variable_name in sequence:
    if condition:
        result.append(expression)
```
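As a concrete check of that translation, the sketch below runs the filtering example from earlier both ways and compares the results:

```python
a = [1, -5, 4, 2, -2, 10]

# List comprehension
b = [2*x for x in a if x > 0]

# Equivalent explicit loop
result = []
for x in a:
    if x > 0:
        result.append(2*x)

print(b == result)
```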
### Historical Digression
List comprehensions come from math (set-builder notation).
```code
a = [ x * x for x in s if x > 0 ] # Python
a = { x^2 | x ∈ s, x > 0 } # Math
```
It is also implemented in several other languages. Most
coders probably aren't thinking about their math class though. So,
it's fine to view it as a cool list shortcut.
## Exercises
Start by running your `report.py` program so that you have the portfolio of stocks loaded in the interactive mode.
```bash
bash % python3 -i report.py
```
Now, at the Python interactive prompt, type statements to perform the operations described below.
These operations perform various kinds of data reductions, transforms, and queries on the portfolio data.
### (a) List comprehensions
Try a few simple list comprehensions just to become familiar with the syntax.
```pycon
>>> nums = [1,2,3,4]
>>> squares = [ x * x for x in nums ]
>>> squares
[1, 4, 9, 16]
>>> twice = [ 2 * x for x in nums if x > 2 ]
>>> twice
[6, 8]
>>>
```
Notice how the list comprehensions are creating a new list with the data suitably transformed or filtered.
### (b) Sequence Reductions
Compute the total cost of the portfolio using a single Python statement.
```pycon
>>> cost = sum([ s['shares'] * s['price'] for s in portfolio ])
>>> cost
44671.15
>>>
```
After you have done that, show how you can compute the current value of the portfolio using a single statement.
```pycon
>>> value = sum([ s['shares'] * prices[s['name']] for s in portfolio ])
>>> value
28686.1
>>>
```
Both of the above operations are an example of a map-reduction. The list comprehension is mapping an operation across the list.
```pycon
>>> [ s['shares'] * s['price'] for s in portfolio ]
[3220.0000000000005, 4555.0, 12516.0, 10246.0, 3835.1499999999996, 3254.9999999999995, 7044.0]
>>>
```
The `sum()` function is then performing a reduction across the result:
```python
>>> sum(_)
44671.15
>>>
```
With this knowledge, you are now ready to go launch a big-data startup company.
### (c) Data Queries
Try the following examples of various data queries.
First, a list of all portfolio holdings with more than 100 shares.
```pycon
>>> more100 = [ s for s in portfolio if s['shares'] > 100 ]
>>> more100
[{'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23}]
>>>
```
All portfolio holdings for MSFT and IBM stocks.
```pycon
>>> msftibm = [ s for s in portfolio if s['name'] in {'MSFT','IBM'} ]
>>> msftibm
[{'name': 'IBM', 'shares': 50, 'price': 91.1}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
  {'name': 'MSFT', 'shares': 50, 'price': 65.1}, {'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>>
```
A list of all portfolio holdings that cost more than $10000.
```pycon
>>> cost10k = [ s for s in portfolio if s['shares'] * s['price'] > 10000 ]
>>> cost10k
[{'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23}]
>>>
```
### (d) Data Extraction
Show how you could build a list of tuples `(name, shares)` where `name` and `shares` are taken from `portfolio`.
```pycon
>>> name_shares =[ (s['name'], s['shares']) for s in portfolio ]
>>> name_shares
[('AA', 100), ('IBM', 50), ('CAT', 150), ('MSFT', 200), ('GE', 95), ('MSFT', 50), ('IBM', 100)]
>>>
```
If you change the square brackets (`[`,`]`) to curly braces (`{`, `}`), you get something known as a set comprehension.
This gives you unique or distinct values.
For example, this determines the set of stock names that appear in `portfolio`:
```pycon
>>> names = { s['name'] for s in portfolio }
>>> names
{ 'AA', 'GE', 'IBM', 'MSFT', 'CAT' }
>>>
```
If you specify `key:value` pairs, you can build a dictionary.
For example, make a dictionary that maps the name of a stock to the total number of shares held.
```pycon
>>> holdings = { name: 0 for name in names }
>>> holdings
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
>>>
```
This latter feature is known as a **dictionary comprehension**. Let's tabulate:
```pycon
>>> for s in portfolio:
        holdings[s['name']] += s['shares']

>>> holdings
{ 'AA': 100, 'GE': 95, 'IBM': 150, 'MSFT':250, 'CAT': 150 }
>>>
```
Try this example that filters the `prices` dictionary down to only those names that appear in the portfolio:
```pycon
>>> portfolio_prices = { name: prices[name] for name in names }
>>> portfolio_prices
{'AA': 9.22, 'GE': 13.48, 'IBM': 106.28, 'MSFT': 20.89, 'CAT': 35.46}
>>>
```
### (e) Advanced Bonus: Extracting Data From CSV Files
Knowing how to use various combinations of list, set, and dictionary comprehensions can be useful in various forms of data processing.
Here's an example that shows how to extract selected columns from a CSV file.
First, read a row of header information from a CSV file:
```pycon
>>> import csv
>>> f = open('Data/portfoliodate.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> headers
['name', 'date', 'time', 'shares', 'price']
>>>
```
Next, define a variable that lists the columns that you actually care about:
```pycon
>>> select = ['name', 'shares', 'price']
>>>
```
Now, locate the indices of the above columns in the source CSV file:
```pycon
>>> indices = [ headers.index(colname) for colname in select ]
>>> indices
[0, 3, 4]
>>>
```
Finally, read a row of data and turn it into a dictionary using a dictionary comprehension:
```pycon
>>> row = next(rows)
>>> record = { colname: row[index] for colname, index in zip(select, indices) } # dict-comprehension
>>> record
{'name': 'AA', 'shares': '100', 'price': '32.20'}
>>>
```
If you're feeling comfortable with what just happened, read the rest
of the file:
```pycon
>>> portfolio = [ { colname: row[index] for colname, index in zip(select, indices) } for row in rows ]
>>> portfolio
[{'name': 'IBM', 'shares': '50', 'price': '91.10'}, {'name': 'CAT', 'shares': '150', 'price': '83.44'},
  {'name': 'MSFT', 'shares': '200', 'price': '51.23'}, {'name': 'GE', 'shares': '95', 'price': '40.37'},
  {'name': 'MSFT', 'shares': '50', 'price': '65.10'}, {'name': 'IBM', 'shares': '100', 'price': '70.44'}]
>>>
```
Oh my, you just reduced much of the `read_portfolio()` function to a single statement.
### Commentary
List comprehensions are commonly used in Python as an efficient means
for transforming, filtering, or collecting data. Due to the syntax,
you don't want to go overboard. Try to keep each list comprehension as
simple as possible. It's okay to break things into multiple
steps. For example, it's not clear that you would want to spring that
last example on your unsuspecting co-workers.
That said, knowing how to quickly manipulate data is a skill that's
incredibly useful. There are numerous situations where you might have
to solve some kind of one-off problem involving data imports, exports,
extraction, and so forth. Becoming a guru master of list
comprehensions can substantially reduce the time spent devising a
solution. Also, don't forget about the `collections` module.
[Next](07_Objects)

View File

@@ -0,0 +1,408 @@
# 2.7 Objects
This section introduces more details about Python's internal object model and
discusses some matters related to memory management, copying, and type checking.
### Assignment
Many operations in Python are related to *assigning* or *storing* values.
```python
a = value # Assignment to a variable
s[n] = value            # Assignment to a list
s.append(value) # Appending to a list
d['key'] = value # Adding to a dictionary
```
*A caution: assignment operations **never make a copy** of the value being assigned.*
All assignments are merely reference copies (or pointer copies if you prefer).
### Assignment example
Consider this code fragment.
```python
a = [1,2,3]
b = a
c = [a,b]
```
Consider the underlying memory operations. In this example, there
is only one list object `[1,2,3]`, but there are four different
references to it: `a`, `b`, `c[0]`, and `c[1]`.
This means that modifying a value affects *all* references.
```pycon
>>> a.append(999)
>>> a
[1,2,3,999]
>>> b
[1,2,3,999]
>>> c
[[1,2,3,999], [1,2,3,999]]
>>>
```
Notice how a change in the original list shows up everywhere else (yikes!).
This is because no copies were ever made. Everything is pointing to the same thing.
### Reassigning values
Reassigning a value *never* overwrites the memory used by the previous value.
```python
a = [1,2,3]
b = a
a = [4,5,6]
print(a) # [4, 5, 6]
print(b) # [1, 2, 3] Holds the original value
```
Remember: **Variables are names, not memory locations.**
### Some Dangers
If you don't know about this sharing, you will shoot yourself in the
foot at some point. A typical scenario: you modify some data thinking
that it's your own private copy, and it accidentally corrupts some data
in some other part of the program.
*Comment: This is one of the reasons why the primitive datatypes (int, float, string) are immutable (read-only).*
### Identity and References
Use the `is` operator to check if two values are exactly the same object.
```pycon
>>> a = [1,2,3]
>>> b = a
>>> a is b
True
>>>
```
`is` compares the object identity (an integer). The identity can be
obtained using `id()`.
```pycon
>>> id(a)
3588944
>>> id(b)
3588944
>>>
```
### Shallow copies
Lists and dicts have methods for copying.
```pycon
>>> a = [2,3,[100,101],4]
>>> b = list(a) # Make a copy
>>> a is b
False
```
It's a new list, but the list items are shared.
```python
>>> a[2].append(102)
>>> b[2]
[100,101,102]
>>>
>>> a[2] is b[2]
True
>>>
```
For example, the inner list `[100, 101]` is being shared.
This is known as a shallow copy.
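Besides `list(a)`, a few other common spellings produce the same kind of shallow copy. The sketch below uses made-up data to show that the outer containers are independent while the inner lists stay shared.

```python
a = [2, 3, [100, 101], 4]

b = a.copy()      # List method
c = a[:]          # Full slice
d = {'nums': [1, 2]}
e = d.copy()      # Dict method

a.append(5)           # Outer list: only a changes
a[2].append(102)      # Inner list: visible through a, b, and c
d['nums'].append(3)   # Shared inner list: visible through d and e

print(a)
print(b)
print(c)
print(e)
```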
### Deep copies
Sometimes you need to make a copy of an object and all the objects contained within it.
You can use the `copy` module for this:
```pycon
>>> a = [2,3,[100,101],4]
>>> import copy
>>> b = copy.deepcopy(a)
>>> a[2].append(102)
>>> b[2]
[100,101]
>>> a[2] is b[2]
False
>>>
```
### Names, Values, Types
Variable names do not have a *type*. It's only a name.
However, values *do* have an underlying type.
```pycon
>>> a = 42
>>> b = 'Hello World'
>>> type(a)
<class 'int'>
>>> type(b)
<class 'str'>
```
`type()` will tell you what it is. The type name is usually a function
that creates or converts a value to that type.
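A short sketch of that last point: the object returned by `type()` can itself be called to create or convert a value. The variable names are only for illustration.

```python
a = 42
t = type(a)           # The int type itself

x = t('123')          # Same as int('123')
y = type('hi')(3.5)   # Same as str(3.5)

print(t is int, x, repr(y))
```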
### Type Checking
How to tell if an object is a specific type.
```python
if isinstance(a, list):
    print('a is a list')
```
Checking for one of many types.
```python
if isinstance(a, (list, tuple)):
    print('a is a list or tuple')
```
*Caution: Don't go overboard with type checking. It can lead to excessive complexity.*
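One place where a single type check can be reasonable is normalizing input. The helper below, `as_list`, is a hypothetical function written for this illustration and is not part of the course code.

```python
def as_list(value):
    # Hypothetical helper: accept a single item or a list/tuple of items
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]

print(as_list(('GOOG', 'IBM')))
print(as_list('GOOG'))
```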
### Everything is an object
Numbers, strings, lists, functions, exceptions, classes, instances,
etc. are all objects. It means that all objects that can be named can
be passed around as data, placed in containers, etc., without any
restrictions. There are no *special* kinds of objects. Sometimes it
is said that all objects are "first-class".
A simple example:
```pycon
>>> import math
>>> items = [abs, math, ValueError ]
>>> items
[<built-in function abs>,
  <module 'math' (built-in)>,
  <class 'ValueError'>]
>>> items[0](-45)
45
>>> items[1].sqrt(2)
1.4142135623730951
>>> try:
        x = int('not a number')
    except items[2]:
        print('Failed!')
Failed!
>>>
```
Here, `items` is a list containing a function, a module and an exception.
You can use the items in the list in place of the original names:
```python
items[0](-45) # abs
items[1].sqrt(2) # math
except items[2]: # ValueError
```
## Exercises
In this set of exercises, we look at some of the power that comes from first-class
objects.
### (a) First-class Data
In the file `Data/portfolio.csv`, we read data organized as columns that look like this:
```csv
name,shares,price
"AA",100,32.20
"IBM",50,91.10
...
```
In previous code, we used the `csv` module to read the file, but still had to perform manual type conversions. For example:
```python
for row in rows:
    name = row[0]
    shares = int(row[1])
    price = float(row[2])
```
This kind of conversion can also be performed in a more clever manner using some basic list operations.
Make a Python list that contains the names of the conversion functions you would use to convert each column into the appropriate type:
```pycon
>>> types = [str, int, float]
>>>
```
The reason you can even create this list is that everything in Python
is *first-class*. So, if you want to have a list of functions, that's
fine. The items in the list you created are functions for converting
a value `x` into a given type (e.g., `str(x)`, `int(x)`, `float(x)`).
Now, read a row of data from the above file:
```pycon
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>>
```
As noted, this row isn't enough to do calculations because the types are wrong. For example:
```pycon
>>> row[1] * row[2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'
>>>
```
However, maybe the data can be paired up with the types you specified in `types`. For example:
```pycon
>>> types[1]
<class 'int'>
>>> row[1]
'100'
>>>
```
Try converting one of the values:
```pycon
>>> types[1](row[1]) # Same as int(row[1])
100
>>>
```
Try converting a different value:
```pycon
>>> types[2](row[2]) # Same as float(row[2])
32.2
>>>
```
Try the calculation with converted values:
```pycon
>>> types[1](row[1])*types[2](row[2])
3220.0000000000005
>>>
```
Zip the column types with the fields and look at the result:
```pycon
>>> r = list(zip(types, row))
>>> r
[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')]
>>>
```
You will notice that this has paired a type conversion with a
value. For example, `int` is paired with the value `'100'`.
The zipped list is useful if you want to perform conversions on all of the values, one
after the other. Try this:
```pycon
>>> converted = []
>>> for func, val in zip(types, row):
        converted.append(func(val))
...
>>> converted
['AA', 100, 32.2]
>>> converted[1] * converted[2]
3220.0000000000005
>>>
```
Make sure you understand what's happening in the above code.
In the loop, the `func` variable is one of the type conversion functions (e.g.,
`str`, `int`, etc.) and the `val` variable is one of the values like
`'AA'`, `'100'`. The expression `func(val)` is converting a value (kind of like a type cast).
The above code can be compressed into a single list comprehension.
```pycon
>>> converted = [func(val) for func, val in zip(types, row)]
>>> converted
['AA', 100, 32.2]
>>>
```
### (b) Making dictionaries
Remember how the `dict()` function can easily make a dictionary if you have a sequence of key names and values?
Let's make a dictionary from the column headers:
```pycon
>>> headers
['name', 'shares', 'price']
>>> converted
['AA', 100, 32.2]
>>> dict(zip(headers, converted))
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
Of course, if you're up on your list-comprehension fu, you can do the whole conversion in a single shot using a dict-comprehension:
```pycon
>>> { name: func(val) for name, func, val in zip(headers, types, row) }
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
### (c) The Big Picture
Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary.
Just to illustrate, suppose you read data from a different datafile like this:
```pycon
>>> f = open('Data/dowstocks.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> row = next(rows)
>>> headers
['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
>>> row
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']
>>>
```
Let's convert the fields using a similar trick:
```pycon
>>> types = [str, float, str, str, float, float, float, float, int]
>>> converted = [func(val) for func, val in zip(types, row)]
>>> record = dict(zip(headers, converted))
>>> record
{'name': 'AA', 'price': 39.48, 'date': '6/11/2007', 'time': '9:36am',
 'change': -0.18, 'open': 39.67, 'high': 39.69, 'low': 39.45,
 'volume': 181800}
>>> record['name']
'AA'
>>> record['price']
39.48
>>>
```
Spend some time to ponder what you've done in this exercise. We'll revisit these ideas a little later.

View File

@@ -0,0 +1,11 @@
# Overview
In this section you will learn:
* How to organize larger programs.
* Defining and working with functions.
* Exceptions and Error handling.
* Basic module management.
* Script writing.
Python is great for short scripts, one-off problems, prototyping, testing, etc.

View File

@@ -0,0 +1,275 @@
# 3.1 Python Scripting
In this part we look more closely at the practice of writing Python
scripts.
### What is a Script?
A *script* is a program that runs a series of statements and stops.
```python
# program.py
statement1
statement2
statement3
...
```
We have been writing scripts up to this point.
### A Problem
If you write a useful script, it will grow in features and
functionality. You may want to apply it to other related problems.
Over time, it might become a critical application. And if you don't
take care, it might turn into a huge tangled mess. So, let's get
organized.
### Defining Things
You must always define things before they get used later on in a program.
```python
def square(x):
    return x*x

a = 42
b = a + 2     # Requires that `a` is defined
z = square(b) # Requires `square` and `b` to be defined
```
**The order is important.**
You almost always put the definitions of variables and functions near the beginning.
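A small sketch of why the order matters, reusing the `square` function from above: calling it before its `def` statement has executed fails with a `NameError`. The `error` variable is only there to capture the message.

```python
try:
    z = square(4)          # square has not been defined yet
except NameError as e:
    error = str(e)

def square(x):
    return x * x

z = square(4)              # Works now that the definition has executed
print(error)
print(z)
```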
### Defining Functions
It is a good idea to put all of the code related to a single *task* all in one place.
```python
import csv

def read_prices(filename):
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
A function also simplifies repeated operations.
```python
oldprices = read_prices('oldprices.csv')
newprices = read_prices('newprices.csv')
```
### What is a Function?
A function is a named sequence of statements.
```python
def funcname(args):
    statement
    statement
    ...
    return result
```
*Any* Python statement can be used inside.
```python
def foo():
    import math
    print(math.sqrt(2))
    help(math)
```
There are no *special* statements in Python.
### Function Definition
Functions can be *defined* in any order.
```python
def foo(x):
    bar(x)

def bar(x):
    statements

# OR
def bar(x):
    statements

def foo(x):
    bar(x)
```
Functions must only be defined before they are actually *used* (or called) during program execution.
```python
foo(3) # foo must be defined already
```
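A runnable sketch of the same idea, with small made-up bodies for `foo` and `bar`: the name `bar` is only looked up when `foo` actually runs, so it does not matter that `bar` is defined second.

```python
def foo(x):
    return bar(x) + 1     # bar is looked up when foo runs, not when it is defined

def bar(x):
    return 2 * x

result = foo(3)           # Both definitions exist by the time of the call
print(result)
```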
Stylistically, it is probably more common to see functions defined in a *bottom-up* fashion.
### Bottom-up Style
Functions are treated as building blocks.
The smaller/simpler blocks go first.
```python
# myprogram.py
def foo(x):
    ...

def bar(x):
    ...
    foo(x)          # Defined above
    ...

def spam(x):
    ...
    bar(x)          # Defined above
    ...

spam(42)            # Code that uses the functions appears at the end
```
Later functions build upon earlier functions.
### Function Design
Ideally, functions should be a *black box*.
They should only operate on passed inputs and avoid global variables
and mysterious side-effects. Main goals: *Modularity* and *Predictability*.
### Doc Strings
A good practice is to include documentation in the form of
doc-strings. Doc-strings are strings written immediately after the
function header (the `def` line). They feed `help()`, IDEs and other tools.
```python
def read_prices(filename):
    '''
    Read prices from a CSV file of name,price
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
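The doc-string is stored on the function object itself, which is where tools such as `help()` find it. A minimal sketch, using a made-up `square` function:

```python
def square(x):
    '''
    Return x multiplied by itself.
    '''
    return x * x

doc = square.__doc__      # The raw doc-string text that help(square) draws on
print(doc)
```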
### Type Annotations
You can also add some optional type annotations to your function definitions.
```python
def read_prices(filename: str) -> dict:
    '''
    Read prices from a CSV file of name,price
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
Annotations do nothing at runtime. They are purely informational.
They may be used by IDEs, code checkers, etc.
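To make the "purely informational" point concrete, here is a sketch with a made-up `double` function. Python does not enforce the annotations, but it does record them on the function.

```python
def double(x: int) -> int:
    return x * 2

a = double(21)        # An int, as the annotation suggests
b = double('ab')      # A str also works: the annotation is not enforced
hints = double.__annotations__

print(a, b)
print(hints)
```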
## Exercises
In section 2, you wrote a program called `report.py` that printed out a report showing the performance of a stock portfolio.
This program consisted of some functions. For example:
```python
# report.py
import csv
def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price.
    '''
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)

        for row in rows:
            record = dict(zip(headers, row))
            stock = {
                'name' : record['name'],
                'shares' : int(record['shares']),
                'price' : float(record['price'])
            }
            portfolio.append(stock)

    return portfolio
...
```
However, there were also portions of the program that just performed a series of scripted calculations.
This code appeared near the end of the program. For example:
```python
...
# Output the report
headers = ('Name', 'Shares', 'Price', 'Change')
print('%10s %10s %10s %10s' % headers)
print(('-' * 10 + ' ') * len(headers))
for row in report:
    print('%10s %10d %10.2f %10.2f' % row)
...
```
In this exercise, we're going to take this program and organize it a little more strongly around the use of functions.
### (a) Structuring a program as a collection of functions
Modify your `report.py` program so that all major operations,
including calculations and output, are carried out by a collection of
functions. Specifically:
* Create a function `print_report(report)` that prints out the report.
* Change the last part of the program so that it is nothing more than a series of function calls and no other computation.
### (b) Creating a function for program execution
Take the last part of your program and package it into a single function `portfolio_report(portfolio_filename, prices_filename)`.
Have the function work so that the following function call creates the report as before:
```python
portfolio_report('Data/portfolio.csv', 'Data/prices.csv')
```
In this final version, your program will be nothing more than a series
of function definitions followed by a single function call to
`portfolio_report()` at the very end (which executes all of the steps
involved in the program).
By turning your program into a single function, it becomes easy to run it on different inputs.
For example, try these statements interactively after running your program:
```python
>>> portfolio_report('Data/portfolio2.csv', 'Data/prices.csv')
... look at the output ...
>>> files = ['Data/portfolio.csv', 'Data/portfolio2.csv']
>>> for name in files:
        print(f'{name:-^43s}')
        portfolio_report(name, 'Data/prices.csv')
        print()
... look at the output ...
>>>
```
[Next](02_More_functions)


@@ -0,0 +1,491 @@
# 3.2 More on Functions
This section fills in a few more details about how functions work and are defined.
### Calling a Function
Consider this function:
```python
def read_prices(filename, debug):
    ...
```
You can call the function with positional arguments:
```python
prices = read_prices('prices.csv', True)
```
Or you can call the function with keyword arguments:
```python
prices = read_prices(filename='prices.csv', debug=True)
```
### Default Arguments
Sometimes you want an optional argument.
```python
def read_prices(filename, debug=False):
    ...
```
If a default value is assigned, the argument is optional in function calls.
```python
d = read_prices('prices.csv')
e = read_prices('prices.dat', True)
```
*Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).*
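A small sketch of the rule, using a hypothetical `describe()` function invented for illustration. The required argument comes first and the optional one last; writing the signature the other way around (`def describe(debug=False, filename)`) is rejected with a `SyntaxError`.

```python
def describe(filename, debug=False):
    # `filename` is required; `debug` is optional and defaults to False
    return f'{filename} (debug={debug})'

print(describe('prices.csv'))               # prices.csv (debug=False)
print(describe('prices.csv', True))         # prices.csv (debug=True)
print(describe('prices.csv', debug=True))   # prices.csv (debug=True)
```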
### Prefer keyword arguments for optional arguments
Compare and contrast these two different calling styles:
```python
parse_data(data, False, True) # ?????
parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)
```
Keyword arguments improve code clarity.
### Design Best Practices
Always give short but meaningful names to function arguments.
Someone using a function may want to use the keyword calling style.
```python
d = read_prices('prices.csv', debug=True)
```
Python development tools will show the names in help features and documentation.
### Return Values
The `return` statement returns a value.
```python
def square(x):
    return x * x
```
If no return value is given or `return` is not specified, `None` is returned.
```python
def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None
```
### Multiple Return Values
Functions can only return one value.
However, a function may return multiple values by returning a tuple.
```python
def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple
```
Usage example:
```python
x, y = divide(37,5) # x = 7, y = 2
x = divide(37, 5) # x = (7, 2)
```
### Variable Scope
Programs assign values to variables.
```python
x = value # Global variable

def foo():
    y = value # Local variable
```
Variable assignments occur outside and inside function definitions.
Variables defined outside are "global". Variables inside a function are "local".
### Local Variables
Variables inside functions are private.
```python
def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split()
        s = (fields[0],int(fields[1]),float(fields[2]))
        portfolio.append(s)
    return portfolio
```
In this example, `filename`, `portfolio`, `line`, `fields` and `s` are local variables.
Those variables are not retained or accessible after the function call.
```pycon
>>> stocks = read_portfolio('stocks.dat')
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fields' is not defined
>>>
```
They also can't conflict with variables found elsewhere.
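To illustrate that last point, here is a minimal sketch. The function `count_fields()` and the sample line are made up for this example; the global `fields` deliberately shares its name with a local variable.

```python
fields = 'global value'          # A global that happens to share a name

def count_fields(line):
    fields = line.split()        # Local; does not touch the global `fields`
    return len(fields)

print(count_fields('GOOG 100 490.1'))   # 3
print(fields)                           # global value
```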
### Global Variables
Functions can freely access the values of globals.
```python
name = 'Dave'

def greeting():
    print('Hello', name) # Using `name` global variable
```
However, functions can't modify globals by simple assignment:
```python
name = 'Dave'

def spam():
    name = 'Guido'

spam()
print(name) # prints 'Dave'
```
**Remember: All assignments in functions are local.**
### Modifying Globals
If you must modify a global variable you must declare it as such.
```python
name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above
```
The global declaration must appear before its use. Having seen this,
know that it is considered poor form. In fact, try to avoid `global` entirely
if you can. If you need a function to modify some kind of state outside
of the function, it's better to use a class instead (more on this later).
### Argument Passing
When you call a function, the argument variables are names for passed values.
If mutable data types are passed (e.g. lists, dicts), they can be modified *in-place*.
```python
def foo(items):
    items.append(42) # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a) # [1, 2, 3, 42]
```
**Key point: Functions don't receive a copy of the input arguments.**
### Reassignment vs Modifying
Make sure you understand the subtle difference between modifying a value and reassigning a variable name.
```python
def foo(items):
    items.append(42) # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a) # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6] # Reassigns `items` variable

b = [1, 2, 3]
bar(b)
print(b) # [1, 2, 3]
```
*Reminder: Variable assignment never overwrites memory. The name is simply bound to a new value.*
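One way to see the difference is to compare object identity with `is`. The sketch below uses hypothetical function names (`append_item`, `rebind`) chosen for illustration.

```python
def append_item(items):
    items.append(42)        # Mutates the object the caller passed in

def rebind(items):
    items = [4, 5, 6]       # Rebinds the local name only
    return items

a = [1, 2, 3]
alias = a                   # A second name for the same list
append_item(a)
print(a is alias, a)        # True [1, 2, 3, 42]

b = [1, 2, 3]
result = rebind(b)
print(b, result, result is b)   # [1, 2, 3] [4, 5, 6] False
```

In the first case there is only ever one list, so every name that refers to it sees the change. In the second case a new list is created and the caller's list is left alone.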
## Exercises
This exercise involves a lot of steps and putting concepts together from past exercises.
The final solution is only about 25 lines of code, but take your time and make sure you understand each part.
A central part of your `report.py` program focuses on the reading of
CSV files. For example, the function `read_portfolio()` reads a file
containing rows of portfolio data and the function `read_prices()`
reads a file containing rows of price data. In both of those
functions, there are a lot of low-level "fiddly" bits and similar
features. For example, they both open a file and wrap it with the
`csv` module and they both convert various fields into new types.
If you were doing a lot of file parsing for real, you'd probably want
to clean some of this up and make it more general purpose. That's
our goal.
Start this exercise by creating a new file called `fileparse.py`. This is where we will be doing our work.
### (a) Reading CSV Files
To start, let's just focus on the problem of reading a CSV file into a
list of dictionaries. In the file `fileparse.py`, define a simple
function that looks like this:
```python
# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records
```
This function reads a CSV file into a list of dictionaries while
hiding the details of opening the file, wrapping it with the `csv`
module, ignoring blank lines, and so forth.
Try it out:
Hint: `python3 -i fileparse.py`.
```pycon
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>
```
This is great except that you can't do any kind of useful calculation with the data because everything is represented as a string.
We'll fix this shortly, but let's keep building on it.
### (b) Building a Column Selector
In many cases, you're only interested in selected columns from a CSV file, not all of the data.
Modify the `parse_csv()` function so that it optionally allows user-specified columns to be picked out as follows:
```python
>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>> # Read some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>
```
An example of a column selector was given in Section 2.5.
However, here's one way to do it:
```python
# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records
```
There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices.
For example, suppose the input file had the following headers:
```pycon
>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>
```
Now, suppose the selected columns were as follows:
```pycon
>>> select = ['name', 'shares']
>>>
```
To perform the proper selection, you have to map the selected column names to column indices in the file.
That's what this step is doing:
```pycon
>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>
```
In other words, "name" is column 0 and "shares" is column 3.
When you read a row of data from the file, the indices are used to filter it:
```pycon
>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>
```
### (c) Performing Type Conversion
Modify the `parse_csv()` function so that it optionally allows type-conversions to be applied to the returned data.
For example:
```pycon
>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>
```
You already explored this in Exercise 2.7. You'll need to insert the
following fragment of code into your solution:
```python
...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
```
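To see what that fragment does in isolation, here is a small worked example on a single sample row (the row values are taken from the portfolio data shown earlier). Each conversion function is paired with the value in the same position.

```python
types = [str, int, float]
row = ['AA', '100', '32.20']

# Pair each conversion function with the matching column value
converted = [func(val) for func, val in zip(types, row)]
print(converted)      # ['AA', 100, 32.2]
```

Note that `zip()` stops at the shorter of its inputs, so a `types` list that is shorter than the row silently drops the trailing columns.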
### (d) Working with Headers
Some CSV files don't include any header information.
For example, the file `prices.csv` looks like this:
```csv
"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...
```
Modify the `parse_csv()` function so that it can work with such files by creating a list of tuples instead.
For example:
```python
>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>
```
To make this change, you'll need to modify the code so that the first
line of data isn't interpreted as a header line. Also, you'll need to
make sure you don't create dictionaries as there are no longer any
column names to use for keys.
### (e) Picking a different column delimiter
Although CSV files are pretty common, it's also possible that you could encounter a file that uses a different column separator such as a tab or space.
For example, the file `Data/portfolio.dat` looks like this:
```csv
name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44
```
The `csv.reader()` function allows a different delimiter to be given as follows:
```python
rows = csv.reader(f, delimiter=' ')
```
Modify your `parse_csv()` function so that it also allows the delimiter to be changed.
For example:
```pycon
>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
```
If you've made it this far, you've created a nice library function that's genuinely useful.
You can use it to parse arbitrary CSV files, select out columns of
interest, and perform type conversions, without having to worry too much
about the inner workings of files or the `csv` module.
Nice!
[Next](03_Error_checking)


@@ -0,0 +1,393 @@
# 3.3 Error Checking
This section discusses some aspects of error checking and exception handling.
### How programs fail
Python performs no checking or validation of function argument types or values.
A function will work on any data that is compatible with the statements in the function.
```python
def add(x, y):
    return x + y

add(3, 4)               # 7
add('Hello', 'World')   # 'HelloWorld'
add('3', '4')           # '34'
```
If there are errors in a function, they will show up at run time (as an exception).
```python
def add(x, y):
    return x + y

>>> add(3, '4')
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
```
To verify code, there is a strong emphasis on testing (covered later).
### Exceptions
Exceptions are used to signal errors.
To raise an exception yourself, use the `raise` statement.
```python
if name not in names:
    raise RuntimeError('Name not found')
```
To catch an exception, use `try-except`.
```python
try:
    authenticate(username)
except RuntimeError as e:
    print(e)
```
### Exception Handling
Exceptions propagate to the first matching `except`.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')   # Exception raised here

def spam():
    grok()                        # Call that will raise exception

def bar():
    try:
        spam()
    except RuntimeError as e:     # Exception caught here
        ...

def foo():
    try:
        bar()
    except RuntimeError as e:     # Exception does NOT arrive here
        ...

foo()
```
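The same call chain can be made runnable by recording what happens in a list. This is only an illustrative sketch; the `trace` list is an addition made for this example.

```python
trace = []

def grok():
    raise RuntimeError('Whoa!')      # Exception raised here

def spam():
    grok()                           # No try-except; exception passes through
    trace.append('spam finished')    # Never reached

def bar():
    try:
        spam()
    except RuntimeError as e:        # First matching except: caught here
        trace.append(f'bar caught: {e}')

def foo():
    try:
        bar()
    except RuntimeError as e:        # Not reached; bar() already handled it
        trace.append(f'foo caught: {e}')

foo()
print(trace)     # ['bar caught: Whoa!']
```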
To handle the exception, use the `except` block. You can add any statements you want to handle the error.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
        grok()
    except RuntimeError as e:   # Exception caught here
        statements              # Use these statements
        statements
        ...

bar()
```
After handling, execution resumes with the first statement after the `try-except`.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
        grok()
    except RuntimeError as e:   # Exception caught here
        statements
        statements
        ...
    statements                  # Resumes execution here
    statements                  # And continues here
    ...

bar()
```
### Built-in Exceptions
There are about two-dozen built-in exceptions.
This is not an exhaustive list. Check the documentation for more.
```python
ArithmeticError
AssertionError
EnvironmentError
EOFError
ImportError
IndexError
KeyboardInterrupt
KeyError
MemoryError
NameError
ReferenceError
RuntimeError
SyntaxError
SystemError
TypeError
ValueError
```
### Exception Values
Most exceptions have an associated value. It contains more information about what's wrong.
```python
raise RuntimeError('Invalid user name')
```
This value is passed to the variable supplied in `except`.
```python
try:
    ...
except RuntimeError as e: # `e` holds the value raised
    ...
```
The value is an instance of the exception type. However, it often looks like a string when
printed.
```python
except RuntimeError as e:
    print('Failed : Reason', e)
```
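A short sketch that inspects the caught value. The name `caught` is introduced here only because Python removes the `as e` variable once the `except` block ends.

```python
try:
    raise RuntimeError('Invalid user name')
except RuntimeError as e:
    caught = e                 # Keep a reference for use after the block

print(type(caught).__name__)   # RuntimeError
print(caught.args)             # ('Invalid user name',)
print(str(caught))             # Invalid user name
```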
### Catching Multiple Errors
You can catch different kinds of exceptions with multiple `except` blocks.
```python
try:
    ...
except LookupError as e:
    ...
except RuntimeError as e:
    ...
except IOError as e:
    ...
except KeyboardInterrupt as e:
    ...
```
Alternatively, if the block to handle them is the same, you can group them:
```python
try:
    ...
except (IOError,LookupError,RuntimeError) as e:
    ...
```
### Catching All Errors
To catch any exception, use `Exception` like this:
```python
try:
    ...
except Exception:
    print('An error occurred')
```
In general, writing code like that is a bad idea because you'll have no idea
why it failed.
### Wrong Way to Catch Errors
Here is the wrong way to use exceptions.
```python
try:
    go_do_something()
except Exception:
    print('Computer says no')
```
This swallows all possible errors. It may make it impossible to debug
when the code is failing for some reason you didn't expect at all
(e.g. uninstalled Python module, etc.).
### Somewhat Better Approach
This is a more sane approach.
```python
try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
```
It reports a specific reason for failure. It is almost always a good
idea to have some mechanism for viewing/reporting errors when you
write code that catches all possible exceptions.
In general though, it's better to catch the error more narrowly. Only
catch the errors you can actually deal with. Let other errors pass to
other code.
### Reraising an Exception
Use `raise` to propagate a caught error.
```python
try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
    raise
```
It allows you to take action (e.g. logging) and pass the error on to the caller.
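A runnable sketch of this pattern is shown below. Here `go_do_something()` is a stand-in that always fails, and the `log` list and `careful()` wrapper are hypothetical names used so the sequence of events can be inspected.

```python
log = []

def go_do_something():
    raise ValueError('bad input')        # Stand-in failure for illustration

def careful():
    try:
        go_do_something()
    except Exception as e:
        log.append(f'Computer says no. Reason : {e}')   # Take some action
        raise                                           # Re-raise the same exception

try:
    careful()
except ValueError as e:
    log.append(f'caller saw: {e}')       # The caller still receives the error

print(log)
# ['Computer says no. Reason : bad input', 'caller saw: bad input']
```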
### Exception Best Practices
Don't catch exceptions. Fail fast and loud. If it's important, someone
else will take care of the problem. Only catch an exception if you
are *that* someone. That is, only catch errors where you can recover
and sanely keep going.
### `finally` statement
It specifies code that must run regardless of whether or not an exception occurs.
```python
lock = Lock()
...
lock.acquire()
try:
    ...
finally:
    lock.release() # this will ALWAYS be executed. With and without exception.
```
Commonly used to properly manage resources (especially locks, files, etc.).
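The following sketch shows the `finally` block running on both the success path and the failure path. The `events` list and the `risky()` function are made up for this illustration and stand in for a real resource such as a lock.

```python
events = []

def risky(fail):
    events.append('acquire')
    try:
        if fail:
            raise RuntimeError('boom')
        events.append('work done')
    finally:
        events.append('release')     # Runs on both paths

risky(False)
try:
    risky(True)
except RuntimeError:
    events.append('error seen by caller')

print(events)
# ['acquire', 'work done', 'release', 'acquire', 'release', 'error seen by caller']
```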
### `with` statement
In modern code, `try-finally` is often replaced with the `with` statement.
```python
lock = Lock()
with lock:
    # lock acquired
    ...
# lock released
```
A more familiar example:
```python
with open(filename) as f:
    # Use the file
    ...
# File closed
```
It defines a usage *context* for a resource. When execution leaves that context,
resources are released. `with` only works with certain objects.
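The "certain objects" are those that implement the context-management protocol, meaning they define `__enter__()` and `__exit__()` methods. As a rough sketch only, here is a hypothetical `Resource` class (classes are covered later) that records when it is acquired and released.

```python
class Resource:
    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append('acquired')     # Runs when the with block is entered
        return self                     # Value bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append('released')     # Runs on exit, even after an exception
        return False                    # Do not suppress exceptions

r = Resource()
try:
    with r as res:
        res.log.append('in use')
        raise RuntimeError('failure inside the block')
except RuntimeError:
    pass

print(r.log)      # ['acquired', 'in use', 'released']
```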
## Exercises
### (a) Raising exceptions
The `parse_csv()` function you wrote in the last section allows
user-specified columns to be selected, but that only works if the
input data file has column headers.
Modify the code so that an exception gets raised if both the `select`
and `has_headers=False` arguments are passed.
For example:
```python
>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 9, in parse_csv
    raise RuntimeError("select argument requires column headers")
RuntimeError: select argument requires column headers
>>>
```
Having added this one check, you might ask if you should be performing
other kinds of sanity checks in the function. For example, should you
check that the filename is a string, that types is a list, or anything
of that nature?
As a general rule, it's usually best to skip such tests and to just
let the program fail on bad inputs. The traceback message will point
at the source of the problem and can assist in debugging.
The main reason for adding the above check is to avoid running the code
in a non-sensical mode (e.g., using a feature that requires column
headers, but simultaneously specifying that there are no headers).
This indicates a programming error on the part of the calling code.
### (b) Catching exceptions
The `parse_csv()` function you wrote is used to process the entire
contents of a file. However, in the real world, it's possible that
input files might have corrupted, missing, or dirty data. Try this
experiment:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 36, in parse_csv
    row = [func(val) for func, val in zip(types, row)]
ValueError: invalid literal for int() with base 10: ''
>>>
```
Modify the `parse_csv()` function to catch all `ValueError` exceptions
generated during record creation and print a warning message for rows
that can't be converted.
The message should include the row number and information about the reason why it failed.
To test your function, try reading the file `Data/missing.csv` above.
For example:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Row 4: Couldn't convert ['MSFT', '', '51.23']
Row 4: Reason invalid literal for int() with base 10: ''
Row 7: Couldn't convert ['IBM', '', '70.44']
Row 7: Reason invalid literal for int() with base 10: ''
>>>
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
```
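If you get stuck, the core idea can be sketched on in-memory rows, without any file handling. This is one possible shape rather than the intended solution; `convert_rows()` is a hypothetical helper, and you would still need to fold the same `try-except` into your own `parse_csv()`.

```python
def convert_rows(rows, types):
    '''
    Convert each row with the given type functions, skipping bad rows.
    '''
    records = []
    for rowno, row in enumerate(rows, start=1):
        try:
            records.append([func(val) for func, val in zip(types, row)])
        except ValueError as e:
            # Report the problem and move on to the next row
            print(f"Row {rowno}: Couldn't convert {row}")
            print(f"Row {rowno}: Reason {e}")
    return records

rows = [['AA', '100', '32.20'], ['MSFT', '', '51.23'], ['GE', '95', '40.37']]
good = convert_rows(rows, [str, int, float])
print(good)     # [['AA', 100, 32.2], ['GE', 95, 40.37]]
```

`enumerate(rows, start=1)` supplies the row number used in the warning messages.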
### (c) Silencing Errors
Modify the `parse_csv()` function so that parsing error messages can be silenced if explicitly desired by the user.
For example:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True)
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
```
Error handling is one of the most difficult things to get right in
most programs. As a general rule, you shouldn't silently ignore
errors. Instead, it's better to report problems and to give the user
an option to silence the error message if they choose to do so.
[Next](04_Modules)


@@ -0,0 +1,317 @@
# 3.4 Modules
This section introduces the concept of modules.
### Modules and import
Any Python source file is a module.
```python
# foo.py
def grok(a):
    ...
def spam(b):
    ...
```
The `import` statement loads and *executes* a module.
```python
# program.py
import foo
a = foo.grok(2)
b = foo.spam('Hello')
...
```
### Namespaces
A module is a collection of named values and is sometimes said to be a *namespace*.
The names are all of the global variables and functions defined in the source file.
After importing, the module name is used as a prefix. Hence the *namespace*.
```python
import foo
a = foo.grok(2)
b = foo.spam('Hello')
...
```
The module name is tied to the file name (foo -> foo.py).
### Global Definitions
Everything defined in the *global* scope is what populates the module
namespace (`foo` in our previous example). Consider two modules
that define the same variable `x`.
```python
# foo.py
x = 42
def grok(a):
    ...
```
```python
# bar.py
x = 37
def spam(a):
    ...
```
In this case, the `x` definitions refer to different variables. One
is `foo.x` and the other is `bar.x`. Different modules can use the
same names and those names won't conflict with each other.
**Modules are isolated.**
### Modules as Environments
Modules form an enclosing environment for all of the code defined inside.
```python
# foo.py
x = 42
def grok(a):
    print(x)
```
*Global* variables are always bound to the enclosing module (same file).
Each source file is its own little universe.
### Module Execution
When a module is imported, *all of the statements in the module
execute* one after another until the end of the file is reached. The
contents of the module namespace are all of the *global* names that
are still defined at the end of the execution process. If there are
scripting statements that carry out tasks in the global scope
(printing, creating files, etc.) you will see them run on import.
### `import as` statement
You can change the name of a module as you import it:
```python
import math as m
def rectangular(r, theta):
    x = r * m.cos(theta)
    y = r * m.sin(theta)
    return x, y
```
It works the same as a normal import. It just renames the module in that one file.
### `from` module import
This picks selected symbols out of a module and makes them available locally.
```python
from math import sin, cos
def rectangular(r, theta):
    x = r * cos(theta)
    y = r * sin(theta)
    return x, y
```
It allows parts of a module to be used without having to type the module prefix.
Useful for frequently used names.
### Comments on importing
Variations on import do *not* change the way that modules work.
```python
import math as m
# vs
from math import cos, sin
...
```
Specifically, `import` always executes the *entire* file and modules
are still isolated environments.
The `import module as` statement is only manipulating the names.
### Module Loading
Each module loads and executes only *once*.
*Note: Repeated imports just return a reference to the previously loaded module.*
`sys.modules` is a dict of all loaded modules.
```python
>>> import sys
>>> sys.modules.keys()
['copy_reg', '__main__', 'site', '__builtin__', 'encodings', 'encodings.encodings', 'posixpath', ...]
>>>
```
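You can confirm the "loads only once" behavior with a quick check. The sketch below uses the standard `math` module simply because it is always available.

```python
import sys
import math
import math as m           # Second import: no reload, same module object

print('math' in sys.modules)          # True
print(sys.modules['math'] is math)    # True
print(m is math)                      # True
```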
### Locating Modules
Python consults a path list (`sys.path`) when looking for modules.
```python
>>> import sys
>>> sys.path
[
'',
'/usr/local/lib/python36/python36.zip',
'/usr/local/lib/python36',
...
]
```
The current working directory is usually first.
### Module Search Path
`sys.path` contains the search paths.
You can manually adjust if you need to.
```python
import sys
sys.path.append('/project/foo/pyfiles')
```
Paths are also added via environment variables.
```shell
% env PYTHONPATH=/project/foo/pyfiles python3
Python 3.6.0 (default, Feb 3 2017, 05:53:21)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]
>>> import sys
>>> sys.path
['','/project/foo/pyfiles', ...]
```
## Exercises
For this exercise involving modules, it is critically important to
make sure you are running Python in a proper environment. Modules
are usually where programmers encounter problems with the current working
directory or with Python's path settings.
### (a) Module imports
In section 3, we created a general purpose function `parse_csv()` for parsing the contents of CSV datafiles.
Now, we're going to see how to use that function in other programs.
First, start in a new shell window. Navigate to the folder where you
have all your files. We are going to import them.
Start Python interactive mode.
```shell
bash % python3
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
Once you've done that, try importing some of the programs you
previously wrote. You should see their output exactly as before.
Just to emphasize, importing a module runs its code.
```python
>>> import bounce
... watch output ...
>>> import mortgage
... watch output ...
>>> import report
... watch output ...
>>>
```
If none of this works, you're probably running Python in the wrong directory.
Now, try importing your `fileparse` module and getting some help on it.
```python
>>> import fileparse
>>> help(fileparse)
... look at the output ...
>>> dir(fileparse)
... look at the output ...
>>>
```
Try using the module to read some data:
```python
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv',select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>> pricelist = fileparse.parse_csv('Data/prices.csv',types=[str,float], has_headers=False)
>>> pricelist
... look at the output ...
>>> prices = dict(pricelist)
>>> prices
... look at the output ...
>>> prices['IBM']
106.28
>>>
```
Try importing a function so that you don't need to include the module name:
```python
>>> from fileparse import parse_csv
>>> portfolio = parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>>
```
### (b) Using your library module
In section 2, you wrote a program `report.py` that produced a stock report like this:
```shell
Name Shares Price Change
---------- ---------- ---------- ----------
AA 100 39.91 7.71
IBM 50 106.11 15.01
CAT 150 78.58 -4.86
MSFT 200 30.47 -20.76
GE 95 37.38 -2.99
MSFT 50 30.47 -34.63
IBM 100 106.11 35.67
```
Take that program and modify it so that all of the input file
processing is done using functions in your `fileparse` module. To do
that, import `fileparse` as a module and change the `read_portfolio()`
and `read_prices()` functions to use the `parse_csv()` function.
Use the interactive example at the start of this exercise as a guide.
Afterwards, you should get exactly the same output as before.
### (c) Using more library imports
In section 1, you wrote a program `pcost.py` that read a portfolio and computed its cost.
```python
>>> import pcost
>>> pcost.portfolio_cost('Data/portfolio.csv')
44671.15
>>>
```
Modify the `pcost.py` file so that it uses the `report.read_portfolio()` function.
### Commentary
When you are done with this exercise, you should have three
programs. `fileparse.py` which contains a general purpose
`parse_csv()` function. `report.py` which produces a nice report, but
also contains `read_portfolio()` and `read_prices()` functions. And
finally, `pcost.py` which computes the portfolio cost, but makes use
of the code written for the `report.py` program.
[Next](05_Main_module)


@@ -0,0 +1,299 @@
# 3.5 Main Module
This section introduces the concept of a main program or main module.
### Main Functions
In many programming languages, there is a concept of a *main* function or method.
```c
// c / c++
int main(int argc, char *argv[]) {
    ...
}
```
```java
// java
class myprog {
    public static void main(String args[]) {
        ...
    }
}
```
This is the first function that executes when an application is launched.
### Python Main Module
Python has no *main* function or method. Instead, there is a *main*
module. The *main module* is the source file that runs first.
```bash
bash % python3 prog.py
...
```
Whatever module you give to the interpreter at startup becomes *main*. Its name doesn't matter.
### `__main__` check
It is standard practice for modules that can run as a main script to use this convention:
```python
# prog.py
...
if __name__ == '__main__':
    # Running as the main program ...
    statements
    ...
```
Statements enclosed inside the `if` statement become the *main* program.
### Main programs vs. library imports
Any file can either run as main or as a library import:
```bash
bash % python3 prog.py # Running as main
```
```python
import prog
```
In both cases, `__name__` is set automatically. When the file is imported, it is the name of the module (`'prog'`). It is only set to `'__main__'` when the file
runs as the main program.
As a general rule, you don't want statements that are part of the main
program to execute on a library import. So, it's common to have an `if` check in code
that might be used either way.
```python
if __name__ == '__main__':
    # Does not execute if loaded with import ...
```
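A minimal sketch of how the check behaves. The file name `whoami.py` and the helper `describe()` are hypothetical and exist only for illustration; they are not part of the course files.

```python
# whoami.py (hypothetical file name, used only for illustration)

def describe(name):
    # Report how a module with the given __name__ value was loaded.
    if name == '__main__':
        return 'running as the main program'
    return f'imported as the module {name!r}'

if __name__ == '__main__':
    # Reached with `python3 whoami.py`, but not with `import whoami`.
    print(describe(__name__))
```

Importing the file from another module would leave the `if` block untouched, because `__name__` would then be `'whoami'`.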
### Program Template
Here is a common program template for writing a Python program:
```python
# prog.py
# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main():
    ...

if __name__ == '__main__':
    main()
```
### Command Line Tools
Python is often used for command-line tools.
```bash
bash % python3 report.py portfolio.csv prices.csv
```
Such scripts are executed from the shell or terminal. Common use cases
include automation, background tasks, etc.
### Command Line Args
The command line is a list of text strings.
```bash
bash % python3 report.py portfolio.csv prices.csv
```
This list of text strings is found in `sys.argv`.
```python
# In the previous bash command
sys.argv # ['report.py', 'portfolio.csv', 'prices.csv']
```
Here is a simple example of processing the arguments:
```python
import sys

if len(sys.argv) != 3:
    raise SystemExit(f'Usage: {sys.argv[0]} ' 'portfile pricefile')
portfile = sys.argv[1]
pricefile = sys.argv[2]
...
```
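The same check can be wrapped in a function so that it is easy to try with different argument lists. This is only a sketch; the name `parse_args()` is an assumption made for illustration and does not appear in the course code.

```python
def parse_args(argv):
    # argv[0] is the script name; the remaining items are the arguments.
    if len(argv) != 3:
        raise SystemExit(f'Usage: {argv[0]} portfile pricefile')
    return argv[1], argv[2]

# Simulate the command line from the example above.
portfile, pricefile = parse_args(['report.py', 'portfolio.csv', 'prices.csv'])
print(portfile, pricefile)
```

Passing the list explicitly, instead of reading `sys.argv` inside the function, makes the logic usable from the interactive prompt as well.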
### Standard I/O
Standard input and output (or stdio) are file objects that work the same way as normal files.
```python
sys.stdout
sys.stderr
sys.stdin
```
By default, print is directed to `sys.stdout`. Input is read from
`sys.stdin`. Tracebacks and errors are directed to `sys.stderr`.
Be aware that *stdio* could be connected to terminals, files, pipes, etc.
```bash
bash % python3 prog.py > results.txt
# or
bash % cmd1 | python3 prog.py | cmd2
```
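Because the standard streams are ordinary file objects, they can be passed to `print()` through its `file` argument. The function `report()` below is a hypothetical example, not part of the course code.

```python
import sys

def report(total, out=sys.stdout, err=sys.stderr):
    # Normal results go to `out`; diagnostics go to `err`.
    print(f'Total cost: {total:0.2f}', file=out)
    if total < 0:
        print('Warning: negative total', file=err)

report(44671.15)
```

When the program's output is redirected with `>`, only what was written to `sys.stdout` ends up in the file; messages on `sys.stderr` still appear on the terminal.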
### Environment Variables
Environment variables are set in the shell.
```bash
bash % export NAME=dave
bash % export RSH=ssh
bash % python3 prog.py
```
`os.environ` is a dictionary that contains these values.
```python
import os
name = os.environ['NAME'] # 'dave'
```
Changes are reflected in any subprocesses later launched by the program.
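A short sketch of reading and changing the environment. The variable names `NAME` and `RSH` come from the shell example above; the fallback value `'unknown'` is an assumption for illustration.

```python
import os

# .get() avoids a KeyError when the variable is not set in the shell.
name = os.environ.get('NAME', 'unknown')

# Assignments modify the environment seen by subprocesses started later.
os.environ['RSH'] = 'ssh'
print(name, os.environ['RSH'])
```

Indexing with `os.environ['NAME']` raises `KeyError` if the variable is missing, so `.get()` is often the safer choice for optional settings.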
### Program Exit
Program exit is handled through exceptions.
```python
raise SystemExit
raise SystemExit(exitcode)
raise SystemExit('Informative message')
```
An alternative.
```python
import sys
sys.exit(exitcode)
```
A non-zero exit code indicates an error.
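Since `SystemExit` is an exception, it can be caught like any other, which is handy when experimenting. This is a sketch with a hypothetical `main()` function.

```python
def main(argv):
    if len(argv) != 2:
        # A string argument is printed to sys.stderr on exit,
        # and the process exit code becomes 1.
        raise SystemExit(f'Usage: {argv[0]} filename')
    return 0

try:
    main(['prog.py'])
except SystemExit as e:
    # The value passed to SystemExit is available as `e.code`.
    print('Exit requested:', e.code)
```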
### The `#!` line
On Unix, the `#!` line can launch a script as Python.
Add the following to the first line of your script file.
```python
#!/usr/bin/env python3
# prog.py
...
```
The script file must also have executable permission.
```bash
bash % chmod +x prog.py
# Then you can execute
bash % ./prog.py
... output ...
```
*Note: The Python Launcher on Windows also looks for the `#!` line to indicate language version.*
### Script Template
Here is a common code template for Python programs that run as command-line scripts:
```python
#!/usr/bin/env python3
# prog.py

# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main(argv):
    # Parse command line args, environment, etc.
    ...

if __name__ == '__main__':
    import sys
    main(sys.argv)
```
## Exercises
### (a) `main()` functions
In the file `report.py` add a `main()` function that accepts a list of command line options and produces the same output as before.
You should be able to run it interactively like this:
```python
>>> import report
>>> report.main(['report.py', 'Data/portfolio.csv', 'Data/prices.csv'])
Name Shares Price Change
---------- ---------- ---------- ----------
AA 100 39.91 7.71
IBM 50 106.11 15.01
CAT 150 78.58 -4.86
MSFT 200 30.47 -20.76
GE 95 37.38 -2.99
MSFT 50 30.47 -34.63
IBM 100 106.11 35.67
>>>
```
Modify the `pcost.py` file so that it has a similar `main()` function:
```python
>>> import pcost
>>> pcost.main(['pcost.py', 'Data/portfolio.csv'])
Total cost: 44671.15
>>>
```
### (b) Making Scripts
Modify the `report.py` and `pcost.py` programs so that they can execute as a script on the command line:
```bash
bash $ python3 report.py Data/portfolio.csv Data/prices.csv
Name Shares Price Change
---------- ---------- ---------- ----------
AA 100 39.91 7.71
IBM 50 106.11 15.01
CAT 150 78.58 -4.86
MSFT 200 30.47 -20.76
GE 95 37.38 -2.99
MSFT 50 30.47 -34.63
IBM 100 106.11 35.67
bash $ python3 pcost.py Data/portfolio.csv
Total cost: 44671.15
```

View File

@@ -0,0 +1,132 @@
# 3.6 Design Discussion
In this section we consider some design decisions made in code so far.
### Filenames versus Iterables
Compare these two programs that return the same output.
```python
# Provide a filename
def read_data(filename):
    records = []
    with open(filename) as f:
        for line in f:
            ...
            records.append(r)
    return records

d = read_data('file.csv')
```
```python
# Provide lines
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records

with open('file.csv') as f:
    d = read_data(f)
```
* Which of these functions do you prefer? Why?
* Which of these functions is more flexible?
### Deep Idea: "Duck Typing"
[Duck Typing](https://en.wikipedia.org/wiki/Duck_typing) is a computer programming concept to determine whether an object can be used for a particular purpose. It is an application of the [duck test](https://en.wikipedia.org/wiki/Duck_test).
> If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
In our previous example that reads the lines, our `read_data` expects
any iterable object. Not just the lines of a file.
```python
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records
```
This means that we can use it with other *lines*.
```python
# A CSV file
lines = open('data.csv')
data = read_data(lines)
# A zipped file
lines = gzip.open('data.csv.gz','rt')
data = read_data(lines)
# The Standard Input
lines = sys.stdin
data = read_data(lines)
# A list of strings
lines = ['ACME,50,91.1','IBM,75,123.45', ... ]
data = read_data(lines)
```
There is considerable flexibility with this design.
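To make this concrete, here is a runnable sketch of `read_data()`. The record layout (name, shares, price separated by commas) is an assumption made for illustration; the original leaves the loop body elided.

```python
import io

def read_data(lines):
    # Works with any iterable that yields lines of text.
    records = []
    for line in lines:
        name, shares, price = line.strip().split(',')
        records.append((name, int(shares), float(price)))
    return records

# A list of strings and a file-like object are handled identically.
from_list = read_data(['ACME,50,91.1', 'IBM,75,123.45'])
from_file_like = read_data(io.StringIO('ACME,50,91.1\nIBM,75,123.45\n'))
print(from_list == from_file_like)
```

The function never checks what kind of object it was given. It only relies on being able to loop over it.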
*Question: Shall we embrace or fight this flexibility?*
### Library Design Best Practices
Code libraries are often better served by embracing flexibility.
Don't restrict your options. With great flexibility comes great power.
## Exercise
### (a) From filenames to file-like objects
In this section, you worked on a file `fileparse.py` that contained a
function `parse_csv()`. The function worked like this:
```pycon
>>> import fileparse
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>>
```
Right now, the function expects to be passed a filename. However, you
can make the code more flexible. Modify the function so that it works
with any file-like/iterable object. For example:
```
>>> import fileparse
>>> import gzip
>>> with gzip.open('Data/portfolio.csv.gz', 'rt') as f:
...     port = fileparse.parse_csv(f, types=[str,int,float])
...
>>> lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1', 'HPE,75,45.1']
>>> port = fileparse.parse_csv(lines, types=[str,int,float])
>>>
```
In this new code, what happens if you pass a filename as before?
```
>>> port = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>> port
... look at output (it should be crazy) ...
>>>
```
With flexibility comes power and with power comes responsibility. Sometimes you'll
need to be careful.
### (b) Fixing existing functions
Fix the `read_portfolio()` and `read_prices()` functions in the
`report.py` file so that they work with the modified version of
`parse_csv()`. This should only involve a minor modification.
Afterwards, your `report.py` and `pcost.py` programs should work
the same way they always did.

View File

@@ -0,0 +1,35 @@
# Overview
## Object Oriented (OO) programming
A programming technique where code is organized as a collection of *objects*.
An *object* consists of:
* Data. Attributes
* Behavior. Methods, functions applied to the object.
You have already been using some OO during this course.
For example, with lists.
```python
>>> nums = [1, 2, 3]
>>> nums.append(4) # Method
>>> nums.insert(1,10) # Method
>>> nums
[1, 10, 2, 3, 4] # Data
>>>
```
`nums` is an *instance* of a list.
Methods (`append` and `insert`) are attached to the instance (`nums`).
## Summary
This will be a high-level overview of classes.
Most code involving classes will involve the topics covered in this section.
If you're merely using existing libraries, the code is typically fairly simple.

View File

@@ -0,0 +1,253 @@
# 4.1 Classes
### The `class` statement
Use the `class` statement to define a new object.
```python
class Player(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.health = 100

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def damage(self, pts):
        self.health -= pts
```
In a nutshell, a class is a set of functions that carry out various operations on so-called *instances*.
### Instances
Instances are the actual *objects* that you manipulate in your program.
They are created by calling the class as a function.
```python
>>> a = Player(2, 3)
>>> b = Player(10, 20)
>>>
```
`a` and `b` are instances of `Player`.
*Emphasize: The class statement is just the definition (it does nothing by itself). Similar to a function definition.*
### Instance Data
Each instance has its own local data.
```python
>>> a.x
2
>>> b.x
10
```
This data is initialized by the `__init__()` method.
```python
class Player(object):
    def __init__(self, x, y):
        # Any value stored on `self` is instance data
        self.x = x
        self.y = y
        self.health = 100
```
There are no restrictions on the total number or type of attributes stored.
### Instance Methods
Instance methods are functions applied to instances of an object.
```python
class Player(object):
    ...
    # `move` is a method
    def move(self, dx, dy):
        self.x += dx
        self.y += dy
```
The object itself is always passed as first argument.
```python
>>> a.move(1, 2)
# matches `a` to `self`
# matches `1` to `dx`
# matches `2` to `dy`
def move(self, dx, dy):
```
By convention, the instance is called `self`. However, the actual name
used is unimportant. The object is always passed as the first
argument. It is simply Python programming style to call this argument
`self`.
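The following sketch shows that a normal method call and a call through the class with the instance passed explicitly do the same thing. It uses a cut-down version of the `Player` class from above.

```python
class Player:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

a = Player(2, 3)
a.move(1, 2)           # The instance `a` is passed as `self` automatically
Player.move(a, 1, 2)   # Equivalent call with the instance passed explicitly
print(a.x, a.y)        # 4 7
```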
### Class Scoping
Classes do not define a scope.
```python
class Player(object):
    ...
    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def left(self, amt):
        move(-amt, 0)       # NO. Calls a global `move` function
        self.move(-amt, 0)  # YES. Calls method `move` from above.
```
If you want to operate on an instance, you always have to refer to it explicitly (e.g., `self`).
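A small sketch of what goes wrong without the explicit `self`. The method name `broken_left()` is hypothetical and exists only to trigger the error; the example assumes that no global function named `move` is defined.

```python
class Player:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def left(self, amt):
        self.move(-amt, 0)     # Explicit reference through `self`

    def broken_left(self, amt):
        move(-amt, 0)          # Looks for a global name `move`

p = Player(10, 20)
p.left(3)
print(p.x, p.y)

try:
    p.broken_left(3)
except NameError as e:
    print('NameError:', e)
```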
## Exercises
### (a) Objects as Data Structures
In sections 2 and 3, we worked with data represented as tuples and dictionaries.
For example, a holding of stock could be represented as a tuple like this:
```python
s = ('GOOG',100,490.10)
```
or as a dictionary like this:
```python
s = { 'name' : 'GOOG',
'shares' : 100,
'price' : 490.10
}
```
You can even write functions for manipulating such data. For example:
```python
def cost(s):
    return s['shares'] * s['price']
```
However, as your program gets large, you might want to create a better sense of organization.
Thus, another approach for representing data would be to define a class.
Create a file called `stock.py` and define a class `Stock` that represents a single holding of stock.
Have the instances of `Stock` have `name`, `shares`, and `price` attributes.
```python
>>> import stock
>>> s = stock.Stock('GOOG',100,490.10)
>>> s.name
'GOOG'
>>> s.shares
100
>>> s.price
490.1
>>>
```
Create a few more `Stock` objects and manipulate them. For example:
```python
>>> a = stock.Stock('AAPL',50,122.34)
>>> b = stock.Stock('IBM',75,91.75)
>>> a.shares * a.price
6117.0
>>> b.shares * b.price
6881.25
>>> stocks = [a,b,s]
>>> stocks
[<stock.Stock object at 0x37d0b0>, <stock.Stock object at 0x37d110>, <stock.Stock object at 0x37d050>]
>>> for t in stocks:
...     print(f'{t.name:>10s} {t.shares:>10d} {t.price:>10.2f}')
... look at the output ...
>>>
```
One thing to emphasize here is that the class `Stock` acts like a factory for creating instances of objects.
Basically, you just call it as a function and it creates a new object for you.
Also, it needs to be emphasized that each object is distinct---they
each have their own data that is separate from other objects that have
been created. An object defined by a class is somewhat similar to a
dictionary, just with somewhat different syntax.
For example, instead of writing `s['name']` or `s['price']`, you now
write `s.name` and `s.price`.
### (b) Reading Data into a List of Objects
In your `stock.py` program, write a function
`read_portfolio(filename)` that reads portfolio data from a file into
a list of `Stock` objects. This function is going to mimic the
behavior of earlier code you have written. Here's how your function
will behave:
```python
>>> import stock
>>> portfolio = stock.read_portfolio('Data/portfolio.csv')
>>> portfolio
[<stock.Stock object at 0x81d70>, <stock.Stock object at 0x81cf0>, <stock.Stock object at 0x81db0>,
<stock.Stock object at 0x81df0>, <stock.Stock object at 0x81e30>, <stock.Stock object at 0x81e70>,
<stock.Stock object at 0x81eb0>]
>>>
```
It is important to emphasize that `read_portfolio()` is a top-level function, not a method of the `Stock` class.
This function is merely creating a list of `Stock` objects; it's not an operation on an individual `Stock` instance.
Try performing some calculations with the above data. First, try printing a formatted table:
```python
>>> for s in portfolio:
...     print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}')
... look at the output ...
>>>
```
Try a list comprehension:
```python
>>> more100 = [s for s in portfolio if s.shares > 100]
>>> for s in more100:
...     print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}')
... look at the output ...
>>>
```
Again, notice the similarity between `Stock` objects and dictionaries. They're basically the same idea, but the syntax for accessing values differs.
### (c) Adding some Methods
With classes, you can attach functions to your objects. These are
known as methods and are functions that operate on the data stored
inside an object.
Add a `cost()` and `sell()` method to your `Stock` object. They should
work like this:
```python
>>> import stock
>>> s = stock.Stock('GOOG',100,490.10)
>>> s.cost()
49010.0
>>> s.shares
100
>>> s.sell(25)
>>> s.shares
75
>>> s.cost()
36757.5
>>>
```
[Next](02_Inheritance)

View File

@@ -0,0 +1,502 @@
# 4.2 Inheritance
Inheritance is a commonly used tool for writing extensible programs. This section explores that idea.
### Introduction
Inheritance is used to specialize existing objects:
```python
class Parent:
    ...

class Child(Parent):       # Note `Parent` between the parentheses
    ...
```
The new class `Child` is called a derived class or subclass.
The `Parent` class is known as base class or superclass.
`Parent` is specified in `()` after the class name, `class Child(Parent):`.
### Extending
With inheritance, you are taking an existing class and:
* Adding new methods
* Redefining some of the existing methods
* Adding new attributes to instances
In the end you are **extending existing code**.
### Example
Suppose that this is your starting class:
```python
class Stock(object):
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

    def sell(self, nshares):
        self.shares -= nshares
```
You can change any part of this via inheritance.
### Add a new method
```python
class MyStock(Stock):
    def panic(self):
        self.sell(self.shares)
```
Usage example.
```python
>>> s = MyStock('GOOG', 100, 490.1)
>>> s.sell(25)
>>> s.shares
75
>>> s.panic()
>>> s.shares
0
>>>
```
### Redefining an existing method
```python
class MyStock(Stock):
    def cost(self):
        return 1.25 * self.shares * self.price
```
Usage example.
```python
>>> s = MyStock('GOOG', 100, 490.1)
>>> s.cost()
61262.5
>>>
```
The new method takes the place of the old one. The other methods are unaffected.
### Overriding
Sometimes a class extends an existing method, but it wants to use the original implementation.
For this, use `super()`:
```python
class Stock(object):
    ...
    def cost(self):
        return self.shares * self.price
    ...

class MyStock(Stock):
    def cost(self):
        # Check the call to `super`
        actual_cost = super().cost()
        return 1.25 * actual_cost
```
Use `super()` to call the previous version.
*Caution: Python 2 is different.*
```python
actual_cost = super(MyStock, self).cost()
```
### `__init__` and inheritance
If `__init__` is redefined, it is essential to initialize the parent.
```python
class Stock(object):
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

class MyStock(Stock):
    def __init__(self, name, shares, price, factor):
        # Check the call to `super` and `__init__`
        super().__init__(name, shares, price)
        self.factor = factor

    def cost(self):
        return self.factor * super().cost()
```
Call `__init__()` through `super()`. As shown earlier, `super()` is the way to invoke the previous version of a method.
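The sketch below puts the pieces together and also shows what happens when the parent is not initialized. The class `BrokenStock` is hypothetical and exists only to demonstrate the failure.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

class MyStock(Stock):
    def __init__(self, name, shares, price, factor):
        super().__init__(name, shares, price)   # Sets name, shares, price
        self.factor = factor

    def cost(self):
        return self.factor * super().cost()

class BrokenStock(Stock):
    def __init__(self, name, shares, price, factor):
        self.factor = factor                    # Parent __init__ never runs

s = MyStock('GOOG', 100, 490.1, 1.25)
print(s.cost())

try:
    BrokenStock('GOOG', 100, 490.1, 1.25).cost()
except AttributeError as e:
    print('Missing parent initialization:', e)
```

Without the call to `super().__init__()`, the attributes that the inherited `cost()` method relies on are never created.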
### Using Inheritance
Inheritance is sometimes used to organize related objects.
```python
class Shape(object):
    ...

class Circle(Shape):
    ...

class Rectangle(Shape):
    ...
```
Think of a logical hierarchy or taxonomy. However, a more common usage is
related to making reusable or extensible code:
```python
class CustomHandler(TCPHandler):
    def handle_request(self):
        ...
        # Custom processing
```
The base class contains some general purpose code.
Your class inherits and customizes specific parts. Maybe it plugs into a framework.
### "is a" relationship
Inheritance establishes a type relationship.
```python
class Shape(object):
    ...

class Circle(Shape):
    ...
```
Check for object instance.
```python
>>> c = Circle(4.0)
>>> isinstance(c, Shape)
True
>>>
```
*Important: Code that works with the parent is also supposed to work with the child.*
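As an illustration of that rule, the sketch below writes a function against the parent type and then uses it with two children. The `area()` method and the constructor arguments are assumptions for the example; the original leaves the class bodies elided.

```python
import math

class Shape:
    def area(self):
        raise NotImplementedError()

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

def total_area(shapes):
    # Written against the parent type; any child of Shape is acceptable.
    return sum(s.area() for s in shapes)

shapes = [Circle(4.0), Rectangle(2, 3)]
print(all(isinstance(s, Shape) for s in shapes))
print(total_area(shapes))
```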
### `object` base class
If a class has no parent, you sometimes see `object` used as the base.
```python
class Shape(object):
    ...
```
`object` is the parent of all objects in Python.
*Note: it's not technically required in Python 3. If omitted in Python 2, it results in an "old style class" which should be avoided.*
### Multiple Inheritance
You can inherit from multiple classes by specifying them in the definition of the class.
```python
class Mother(object):
    ...

class Father(object):
    ...

class Child(Mother, Father):
    ...
```
The class `Child` inherits features from both parents. There are some rather tricky details. Don't do it unless you know what you are doing.
We're not going to explore multiple inheritance further in this course.
## Exercises
### (a) Print Portfolio
A major use of inheritance is in writing code that's meant to be extended or customized in various ways, especially in libraries or frameworks.
To illustrate, start by adding the following function to your `stock.py` program:
```python
# stock.py
...
def print_portfolio(portfolio):
    '''
    Make a nicely formatted table showing portfolio contents.
    '''
    headers = ('Name','Shares','Price')
    for h in headers:
        print(f'{h:>10s}',end=' ')
    print()
    print(('-'*10 + ' ')*len(headers))
    for s in portfolio:
        print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}')
```
Add a little testing section to the bottom of your `stock.py` file that runs the above function:
```python
if __name__ == '__main__':
    portfolio = read_portfolio('Data/portfolio.csv')
    print_portfolio(portfolio)
```
When you run your `stock.py`, you should get this output:
```bash
Name Shares Price
---------- ---------- ----------
AA 100 32.20
IBM 50 91.10
CAT 150 83.44
MSFT 200 51.23
GE 95 40.37
MSFT 50 65.10
IBM 100 70.44
```
### (b) An Extensibility Problem
Suppose that you wanted to modify the `print_portfolio()` function to
support a variety of different output formats such as plain-text,
HTML, CSV, or XML. To do this, you could try to write one gigantic
function that did everything. However, doing so would likely lead to
an unmaintainable mess. Instead, this is a perfect opportunity to use
inheritance.
To start, focus on the steps that are involved in creating a
table. At the top of the table is a set of table headers. After that,
rows of table data appear. Let's take those steps and put them into their own class.
Create a file called `tableformat.py` and define the following class:
```python
# tableformat.py
class TableFormatter(object):
    def headings(self, headers):
        '''
        Emit the table headings.
        '''
        raise NotImplementedError()

    def row(self, rowdata):
        '''
        Emit a single row of table data.
        '''
        raise NotImplementedError()
```
This class does nothing, but it serves as a kind of design specification for additional classes that will be defined shortly.
Modify the `print_portfolio()` function so that it accepts a `TableFormatter` object as input and invokes methods on it to produce the output.
For example, like this:
```python
# stock.py
...
def print_portfolio(portfolio, formatter):
    '''
    Make a nicely formatted table showing portfolio contents.
    '''
    formatter.headings(['Name', 'Shares', 'Price'])
    for s in portfolio:
        # Form a row of output data (as strings)
        rowdata = [s.name, str(s.shares), f'{s.price:0.2f}' ]
        formatter.row(rowdata)
```
Finally, try your new class by modifying the main program like this:
```python
# stock.py
...
if __name__ == '__main__':
    from tableformat import TableFormatter
    portfolio = read_portfolio('Data/portfolio.csv')
    formatter = TableFormatter()
    print_portfolio(portfolio, formatter)
```
When you run this new code, your program will immediately crash with a `NotImplementedError` exception.
That's not too exciting, but continue to the next part.
### (c) Using Inheritance to Produce Different Output
The `TableFormatter` class you defined in part (b) is meant to be extended via inheritance.
In fact, that's the whole idea. To illustrate, define a class `TextTableFormatter` like this:
```python
# tableformat.py
...
class TextTableFormatter(TableFormatter):
    '''
    Emit a table in plain-text format
    '''
    def headings(self, headers):
        for h in headers:
            print(f'{h:>10s}', end=' ')
        print()
        print(('-'*10 + ' ')*len(headers))

    def row(self, rowdata):
        for d in rowdata:
            print(f'{d:>10s}', end=' ')
        print()
```
Modify your main program in `stock.py` like this and try it:
```python
# stock.py
...
if __name__ == '__main__':
    from tableformat import TextTableFormatter
    portfolio = read_portfolio('Data/portfolio.csv')
    formatter = TextTableFormatter()
    print_portfolio(portfolio, formatter)
```
This should produce the same output as before:
```bash
Name Shares Price
---------- ---------- ----------
AA 100 32.20
IBM 50 91.10
CAT 150 83.44
MSFT 200 51.23
GE 95 40.37
MSFT 50 65.10
IBM 100 70.44
```
However, let's change the output to something else. Define a new class `CSVTableFormatter` that produces output in CSV format:
```python
# tableformat.py
...
class CSVTableFormatter(TableFormatter):
    '''
    Output portfolio data in CSV format.
    '''
    def headings(self, headers):
        print(','.join(headers))

    def row(self, rowdata):
        print(','.join(rowdata))
```
Modify your main program as follows:
```python
# stock.py
...
if __name__ == '__main__':
    from tableformat import CSVTableFormatter
    portfolio = read_portfolio('Data/portfolio.csv')
    formatter = CSVTableFormatter()
    print_portfolio(portfolio, formatter)
```
You should now see CSV output like this:
```csv
Name,Shares,Price
AA,100,32.20
IBM,50,91.10
CAT,150,83.44
MSFT,200,51.23
GE,95,40.37
MSFT,50,65.10
IBM,100,70.44
```
Using a similar idea, define a class `HTMLTableFormatter` that produces a table with the following output:
```html
<tr> <th>Name</th> <th>Shares</th> <th>Price</th> </tr>
<tr> <td>AA</td> <td>100</td> <td>32.20</td> </tr>
<tr> <td>IBM</td> <td>50</td> <td>91.10</td> </tr>
```
Test your code by modifying the main program to create a `HTMLTableFormatter` object instead of a `CSVTableFormatter` object.
### (d) Polymorphism in Action
A major feature of object-oriented programming is that you can plug an
object into a program and it will work without having to change any of
the existing code. For example, if you wrote a program that expected
to use a `TableFormatter` object, it would work no matter what kind of
`TableFormatter` you actually gave it.
This behavior is sometimes referred to as *polymorphism*.
One potential problem is making it easier for the user to pick the formatter that they want.
This can sometimes be fixed by defining a helper function.
In the `tableformat.py` file, add a function `create_formatter(name)`
that allows a user to create a formatter given an output name such as
`'txt'`, `'csv'`, or `'html'`.
For example:
```python
# stock.py
...
if __name__ == '__main__':
    from tableformat import create_formatter
    portfolio = read_portfolio('Data/portfolio.csv')
    formatter = create_formatter('csv')
    print_portfolio(portfolio, formatter)
```
When you run this program, you'll see output such as this:
```csv
Name,Shares,Price
AA,100,32.20
IBM,50,91.10
CAT,150,83.44
MSFT,200,51.23
GE,95,40.37
MSFT,50,65.10
IBM,100,70.44
```
Try changing the format to `'txt'` and `'html'` just to make sure your
code is working correctly. If the user provides a bad output format
to the `create_formatter()` function, have it raise a `RuntimeError`
exception. For example:
```python
>>> from tableformat import create_formatter
>>> formatter = create_formatter('xls')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "tableformat.py", line 68, in create_formatter
raise RuntimeError('Unknown table format %s' % name)
RuntimeError: Unknown table format xls
>>>
```
Writing extensible code is one of the most common uses of inheritance in libraries and frameworks.
For example, a framework might instruct you to define your own object that inherits from a provided base class.
You're then told to fill in various methods that implement various bits of functionality.
Designing object oriented programs can be extremely
difficult. For more information, you should probably look for books on
the topic of design patterns.
That said, understanding what happened in this exercise will take you
pretty far in terms of using most library modules and knowing
what inheritance is good for (extensibility).
[Next](03_Special_methods)

View File

@@ -0,0 +1,332 @@
# 4.3 Special Methods
Various parts of Python's behavior can be customized via special or magic methods.
This section introduces that idea.
### Introduction
Classes may define special methods. These have special meaning to the Python interpreter.
They are always preceded and followed by `__`. For example `__init__`.
```python
class Stock(object):
    def __init__(self):
        ...
    def __repr__(self):
        ...
```
There are dozens of special methods, but we will only look at a few specific examples.
### Special methods for String Conversions
Objects have two string representations.
```python
>>> from datetime import date
>>> d = date(2012, 12, 21)
>>> print(d)
2012-12-21
>>> d
datetime.date(2012, 12, 21)
>>>
```
The `str()` function is used to create a nice printable output:
```python
>>> str(d)
'2012-12-21'
>>>
```
The `repr()` function is used to create a more detailed representation
for programmers.
```python
>>> repr(d)
'datetime.date(2012, 12, 21)'
>>>
```
Those functions, `str()` and `repr()`, use a pair of special methods in the class to get the string to be printed.
```python
class Date(object):
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    # Used with `str()`
    def __str__(self):
        return f'{self.year}-{self.month}-{self.day}'

    # Used with `repr()`
    def __repr__(self):
        return f'Date({self.year},{self.month},{self.day})'
```
*Note: The convention for `__repr__()` is to return a string that,
when fed to `eval()`, will recreate the underlying object. If this
is not possible, some kind of easily readable representation is used
instead.*
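A brief sketch of that convention, based on the `Date` class shown above. The use of `!r` in the format string is one possible way to write it, not the only one.

```python
class Date:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    def __str__(self):
        return f'{self.year}-{self.month}-{self.day}'

    def __repr__(self):
        # !r inserts the repr() of each value into the string
        return f'Date({self.year!r}, {self.month!r}, {self.day!r})'

d = Date(2012, 12, 21)
print(str(d))        # 2012-12-21
print(repr(d))       # Date(2012, 12, 21)

# Feeding the repr() string to eval() recreates an equivalent object.
d2 = eval(repr(d))
print(d2.year, d2.month, d2.day)
```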
### Special Methods for Mathematics
Mathematical operators are just calls to special methods.
```python
a + b       a.__add__(b)
a - b       a.__sub__(b)
a * b       a.__mul__(b)
a / b       a.__truediv__(b)
a // b      a.__floordiv__(b)
a % b       a.__mod__(b)
a << b      a.__lshift__(b)
a >> b      a.__rshift__(b)
a & b       a.__and__(b)
a | b       a.__or__(b)
a ^ b       a.__xor__(b)
a ** b      a.__pow__(b)
-a          a.__neg__()
~a          a.__invert__()
abs(a)      a.__abs__()
```
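As a sketch of how these hooks are used, here is a hypothetical `Vector` class (not part of the course code) that supports `+` and unary `-`.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for `self + other`
        return Vector(self.x + other.x, self.y + other.y)

    def __neg__(self):
        # Called for `-self`
        return Vector(-self.x, -self.y)

    def __repr__(self):
        return f'Vector({self.x!r}, {self.y!r})'

a = Vector(1, 2)
b = Vector(3, 4)
print(a + b)     # Vector(4, 6)
print(-a)        # Vector(-1, -2)
```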
### Special Methods for Item Access
These are the methods to implement containers.
```python
len(x)      x.__len__()
x[a]        x.__getitem__(a)
x[a] = v    x.__setitem__(a,v)
del x[a]    x.__delitem__(a)
```
You can use them in your classes.
```python
class Sequence(object):
    def __len__(self):
        ...
    def __getitem__(self,a):
        ...
    def __setitem__(self,a,v):
        ...
    def __delitem__(self,a):
        ...
```
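A minimal sketch of a container that fills in those four methods by delegating to an internal list. The class name `Holdings` and its `_items` attribute are hypothetical.

```python
class Holdings:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, a):
        return self._items[a]

    def __setitem__(self, a, v):
        self._items[a] = v

    def __delitem__(self, a):
        del self._items[a]

h = Holdings(['AA', 'IBM', 'CAT'])
print(len(h))     # 3
print(h[1])       # IBM
h[0] = 'GOOG'     # Calls __setitem__
del h[2]          # Calls __delitem__
print(len(h), h[0])
```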
### Method Invocation
Invoking a method is a two-step process.
1. Lookup: The `.` operator
2. Method call: The `()` operator
```python
>>> s = Stock('GOOG',100,490.10)
>>> c = s.cost # Lookup
>>> c
<bound method Stock.cost of <Stock object at 0x590d0>>
>>> c() # Method call
49010.0
>>>
```
### Bound Methods
A method that has not yet been invoked by the function call operator `()` is known as a *bound method*.
It operates on the instance where it originated.
```python
>>> s = Stock('GOOG', 100, 490.10)
>>> s
<Stock object at 0x590d0>
>>> c = s.cost
>>> c
<bound method Stock.cost of <Stock object at 0x590d0>>
>>> c()
49010.0
>>>
```
Bound methods are often a source of careless non-obvious errors. For example:
```python
>>> s = Stock('GOOG', 100, 490.10)
>>> print('Cost : %0.2f' % s.cost)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be real number, not method
>>>
```
Or devious behavior that's hard to debug.
```python
f = open(filename, 'w')
...
f.close # Oops, Didn't do anything at all. `f` still open.
```
In both of these cases, the error is caused by forgetting to include the
trailing parentheses. For example, `s.cost()` or `f.close()`.
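The sketch below shows that a bound method keeps a reference to its instance and does nothing until it is called. It uses a cut-down `Stock` class.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

s = Stock('GOOG', 100, 490.1)
c = s.cost                  # Lookup only; nothing is called yet
print(c.__self__ is s)      # The bound method remembers its instance
s.shares = 50
print(c() == s.cost())      # Calling it later uses the instance's current data
```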
### Attribute Access
There is an alternative way to access, manipulate and manage attributes.
```python
getattr(obj, 'name') # Same as obj.name
setattr(obj, 'name', value) # Same as obj.name = value
delattr(obj, 'name') # Same as del obj.name
hasattr(obj, 'name') # Tests if attribute exists
```
Example:
```python
if hasattr(obj, 'x'):
    x = getattr(obj, 'x')
else:
    x = None
```
*Note: `getattr()` also accepts a useful default value argument.*
```python
x = getattr(obj, 'x', None)
```
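A runnable sketch of these functions using a cut-down `Stock` class. The attribute name `dividend` is hypothetical and is deliberately missing from the object.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

s = Stock('GOOG', 100, 490.1)

# Attribute names are plain strings, so they can come from a list.
for colname in ['name', 'shares', 'dividend']:
    print(colname, '=', getattr(s, colname, None))

setattr(s, 'shares', 75)           # Same as s.shares = 75
print(hasattr(s, 'dividend'))      # False
```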
## Exercises
### (a) Better output for printing objects
All Python objects have two string representations. The first
representation is created by string conversion via `str()` (which is
called by `print`). The string representation is usually a nicely
formatted version of the object meant for humans. The second
representation is a code representation of the object created by
`repr()` (or by viewing a value in the interactive shell). The code
representation typically shows you the code that you have to type to
get the object.
The two representations of an object are often different. For example, you can see the difference by trying the following:
```python
>>> s = 'Hello\nWorld'
>>> print(str(s)) # Notice nice output (no quotes)
Hello
World
>>> print(repr(s)) # Notice the added quotes and escape codes
'Hello\nWorld'
>>> print(f'{s!r}') # Alternate way to get repr() string
'Hello\nWorld'
>>>
```
Both kinds of string conversions can be redefined in a class if it defines the `__str__()` and `__repr__()` methods.
Modify the `Stock` object that you defined in Exercise 4.1 so that the `__repr__()` method produces more useful output.
```python
>>> goog = Stock('GOOG', 100, 490.1)
>>> goog
Stock('GOOG', 100, 490.1)
>>>
```
See what happens when you read a portfolio of stocks and view the resulting list after you have made these changes.
```python
>>> import stock
>>> portfolio = stock.read_portfolio('Data/portfolio.csv')
>>> portfolio
... see what the output is ...
>>>
```
### (b) An example of using `getattr()`
In Exercise 4.2 you worked with a function `print_portfolio()` that made a table for a stock portfolio.
That function was hard-coded to only work with stock data, which is quite limiting. You can do so much more if you use functions such as `getattr()`.
To begin, try this little example:
```python
>>> import stock
>>> s = stock.Stock('GOOG', 100, 490.1)
>>> columns = ['name', 'shares']
>>> for colname in columns:
        print(colname, '=', getattr(s, colname))
name = GOOG
shares = 100
>>>
```
Carefully observe that the output data is determined entirely by the attribute names listed in the `columns` variable.
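The same lookup works on any object, not just a `Stock`. The sketch below is only an illustration: `extract_row()` is a hypothetical helper, and `SimpleNamespace` from the standard library stands in for an arbitrary object with attributes.

```python
from types import SimpleNamespace

def extract_row(obj, columns):
    # Look up each named attribute on obj, in the order given
    return [getattr(obj, colname) for colname in columns]

# Any object with the named attributes will do
s = SimpleNamespace(name='GOOG', shares=100, price=490.1)

row1 = extract_row(s, ['name', 'shares'])
row2 = extract_row(s, ['price', 'name'])
```

Changing the list of column names changes both which values are returned and their order, with no change to the function itself.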
In the file `tableformat.py`, take this idea and expand it into a
generalized function `print_table()` that prints a table showing
user-specified attributes of a list of arbitrary objects.
As with the earlier `print_portfolio()` function, `print_table()`
should also accept a `TableFormatter` instance to control the output
format. Here's how it should work:
```python
>>> import stock
>>> portfolio = stock.read_portfolio('Data/portfolio.csv')
>>> from tableformat import create_formatter, print_table
>>> formatter = create_formatter('txt')
>>> print_table(portfolio, ['name','shares'], formatter)
      name     shares
---------- ----------
        AA        100
       IBM         50
       CAT        150
      MSFT        200
        GE         95
      MSFT         50
       IBM        100
>>> print_table(portfolio, ['name','shares','price'], formatter)
      name     shares      price
---------- ---------- ----------
        AA        100       32.2
       IBM         50       91.1
       CAT        150      83.44
      MSFT        200      51.23
        GE         95      40.37
      MSFT         50       65.1
       IBM        100      70.44
>>>
```
### (c) Exercise Bonus: Column Formatting
Modify the `print_table()` function in part (b) so that it also
accepts a list of format specifiers for formatting the contents of
each column.
```python
>>> print_table(portfolio,
                ['name','shares','price'],
                ['s','d','0.2f'],
                formatter)
      name     shares      price
---------- ---------- ----------
        AA        100      32.20
       IBM         50      91.10
       CAT        150      83.44
      MSFT        200      51.23
        GE         95      40.37
      MSFT         50      65.10
       IBM        100      70.44
>>>
```
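One possible building block for this, shown here only as a sketch, is the built-in `format()` function, which applies a single format specifier to a single value. The variable names are illustrative.

```python
# One row of values and one format specifier per column
values = ['AA', 100, 32.2]
formats = ['s', 'd', '0.2f']

# format(value, spec) applies a format specifier to a single value,
# so zip() can pair each value with the specifier for its column
formatted = [format(value, spec) for value, spec in zip(values, formats)]
print(formatted)
```

Every item in `formatted` is a string, ready to be passed to a formatter.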
[Next](04_Defining_exceptions)

View File

@@ -0,0 +1,49 @@
# 4.4 Defining Exceptions
User-defined exceptions are defined by classes.
```python
class NetworkError(Exception):
    pass
```
**User-defined exceptions should always inherit from `Exception`.**
Usually they are empty classes. Use `pass` for the body.
You can also make a hierarchy of your exceptions.
```python
class AuthenticationError(NetworkError):
    pass

class ProtocolError(NetworkError):
    pass
```
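A hierarchy is useful because an `except` clause that names a base class also catches all of its subclasses. The following is a minimal sketch; the `connect()` function and its password check are hypothetical and exist only to raise one of the exceptions.

```python
class NetworkError(Exception):
    pass

class AuthenticationError(NetworkError):
    pass

class ProtocolError(NetworkError):
    pass

def connect(password):
    # Hypothetical function used only for illustration
    if password != 'secret':
        raise AuthenticationError('Bad password')
    return 'connected'

try:
    result = connect('guess')
except NetworkError as e:
    # Catches NetworkError and any subclass, including AuthenticationError
    result = f'failed: {e}'
```

Code that needs finer control can list `except AuthenticationError` before the more general `except NetworkError` clause.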
## Exercises
### (a) Defining a custom exception
It is often good practice for libraries to define their own exceptions.
This makes it easier to distinguish between Python exceptions raised
in response to common programming errors versus exceptions
intentionally raised by a library to signal a specific usage
problem.
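The distinction can be sketched as follows. Everything here is hypothetical (`ConfigError`, `load_setting()`, and `describe()` are made up for illustration): the library raises its own exception for a usage problem, while a plain programming mistake still surfaces as a built-in `TypeError`.

```python
class ConfigError(Exception):
    # Hypothetical library-specific exception
    pass

def load_setting(settings, key):
    # Hypothetical library function: signals a usage problem explicitly
    if key not in settings:
        raise ConfigError(f'Unknown setting {key!r}')
    return settings[key]

def describe(call):
    # Report which kind of failure occurred
    try:
        return call()
    except ConfigError as e:
        return f'usage problem: {e}'
    except TypeError as e:
        return f'programming error: {type(e).__name__}'

# Bad setting name: the library raises its own exception
outcome1 = describe(lambda: load_setting({'debug': True}, 'verbose'))

# Passing None instead of a dict: an ordinary Python error
outcome2 = describe(lambda: load_setting(None, 'verbose'))
```

Because the two failures have different types, the caller can handle the usage problem and let genuine bugs propagate.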
Modify the `create_formatter()` function from the last exercise so
that it raises a custom `FormatError` exception when the user provides
a bad format name.
For example:
```python
>>> from tableformat import create_formatter
>>> formatter = create_formatter('xls')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tableformat.py", line 71, in create_formatter
    raise FormatError('Unknown table format %s' % name)
FormatError: Unknown table format xls
>>>
```

View File

@@ -39,6 +39,7 @@
{{ content }}
<footer class="site-footer">
<span class="site-footer">Copyright (c) 2007-2020, David Beazley, All Rights Reserved</span>
{% if site.github.is_project_page %}
<span class="site-footer-owner"><a href="{{ site.github.repository_url }}">{{ site.github.repository_name }}</a> is maintained by <a href="{{ site.github.owner_url }}">{{ site.github.owner_name }}</a>.</span>
{% endif %}