Added sections 2-4
This commit is contained in:
7
Notes/02_Working_with_data/00_Overview.md
Normal file
@@ -0,0 +1,7 @@
|
|||||||
|
# Working With Data Overview
|
||||||
|
|
||||||
|
In this section, we look at how Python programmers represent and work with data.
|
||||||
|
|
||||||
|
Most programs today work with data. We are going to learn common programming idioms and how to avoid shooting yourself in the foot.
|
||||||
|
|
||||||
|
We will also take a look at part of the object model in Python, which is a big part of understanding most Python programs.
|
||||||
431
Notes/02_Working_with_data/01_Datatypes.md
Normal file
@@ -0,0 +1,431 @@
|
|||||||
|
# 2.1 Datatypes and Data structures
|
||||||
|
|
||||||
|
This section introduces data structures in the form of tuples and dicts.
|
||||||
|
|
||||||
|
### Primitive Datatypes
|
||||||
|
|
||||||
|
Python has a few primitive types of data:
|
||||||
|
|
||||||
|
* Integers
|
||||||
|
* Floating point numbers
|
||||||
|
* Strings (text)
|
||||||
|
|
||||||
|
We have learned about these in the previous section.
|
||||||
|
|
||||||
|
### None type
|
||||||
|
|
||||||
|
```python
|
||||||
|
email_address = None
|
||||||
|
```
|
||||||
|
|
||||||
|
This type is often used as a placeholder for an optional or missing value.
|
||||||
|
|
||||||
|
```python
|
||||||
|
if email_address:
    send_email(email_address, msg)
|
||||||
|
```
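
Note that a truthiness test such as `if email_address:` treats an empty string the same way as `None`. When the two cases need to be told apart, an explicit `is None` comparison can be used. The following is a minimal sketch; `describe` is a hypothetical helper introduced only for illustration.

```python
def describe(email_address):
    """Hypothetical helper: classify an optional e-mail address."""
    if email_address is None:        # No value was ever supplied
        return 'missing'
    if not email_address:            # Supplied, but empty ('' is falsy)
        return 'empty'
    return 'present'

print(describe(None))                # missing
print(describe(''))                  # empty
print(describe('dave@example.com'))  # present
```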
|
||||||
|
|
||||||
|
### Data Structures
|
||||||
|
|
||||||
|
Real programs have more complex data than the ones that can be easily represented by the datatypes learned so far.
|
||||||
|
For example, information about a stock:
|
||||||
|
|
||||||
|
```code
|
||||||
|
100 shares of GOOG at $490.10
|
||||||
|
```
|
||||||
|
|
||||||
|
This is an "object" with three parts:
|
||||||
|
|
||||||
|
* Name or symbol of the stock ("GOOG", a string)
|
||||||
|
* Number of shares (100, an integer)
|
||||||
|
* Price (490.10, a float)
|
||||||
|
|
||||||
|
### Tuples
|
||||||
|
|
||||||
|
A tuple is a collection of values grouped together.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = ('GOOG', 100, 490.1)
|
||||||
|
```
|
||||||
|
|
||||||
|
Sometimes the `()` are omitted in the syntax.
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = 'GOOG', 100, 490.1
|
||||||
|
```
|
||||||
|
|
||||||
|
Special cases (0-tuple, 1-tuple).
|
||||||
|
|
||||||
|
```python
|
||||||
|
t = () # An empty tuple
|
||||||
|
w = ('GOOG', ) # A 1-item tuple
|
||||||
|
```
|
||||||
|
|
||||||
|
Tuples are usually used to represent *simple* records or structures.
|
||||||
|
Typically, it is a single *object* of multiple parts. A good analogy: *A tuple is like a single row in a database table.*
|
||||||
|
|
||||||
|
Tuple contents are ordered (like an array).
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = ('GOOG', 100, 490.1)
|
||||||
|
name = s[0] # 'GOOG'
|
||||||
|
shares = s[1] # 100
|
||||||
|
price = s[2] # 490.1
|
||||||
|
```
|
||||||
|
|
||||||
|
However, the contents can't be modified.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> s[1] = 75
|
||||||
|
TypeError: 'tuple' object does not support item assignment
|
||||||
|
```
|
||||||
|
|
||||||
|
You can, however, make a new tuple based on a current tuple.
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = (s[0], 75, s[2])
|
||||||
|
```
|
||||||
|
|
||||||
|
### Tuple Packing
|
||||||
|
|
||||||
|
Tuples are focused more on packing related items together into a single *entity*.
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = ('GOOG', 100, 490.1)
|
||||||
|
```
|
||||||
|
|
||||||
|
The tuple is then easy to pass around to other parts of a program as a single object.
|
||||||
|
|
||||||
|
### Tuple Unpacking
|
||||||
|
|
||||||
|
To use the tuple elsewhere, you can unpack its parts into variables.
|
||||||
|
|
||||||
|
```python
|
||||||
|
name, shares, price = s
|
||||||
|
print('Cost', shares * price)
|
||||||
|
```
|
||||||
|
|
||||||
|
The number of variables must match the tuple structure.
|
||||||
|
|
||||||
|
```python
|
||||||
|
name, shares = s # ERROR
|
||||||
|
Traceback (most recent call last):
|
||||||
|
...
|
||||||
|
ValueError: too many values to unpack (expected 2)
|
||||||
|
```
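
The mismatch can go in either direction. As a small illustrative sketch (the variable names `too_many` and `too_few` are made up for this example), both cases raise a `ValueError` that can be caught and inspected:

```python
s = ('GOOG', 100, 490.1)

try:
    name, shares = s                 # Three values, only two names
except ValueError as err:
    too_many = str(err)

try:
    name, shares, price, date = s    # Three values, four names
except ValueError as err:
    too_few = str(err)

print(too_many)
print(too_few)
```

The exact wording of the messages may differ between Python versions, so code should rely on the exception type rather than on the text.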
|
||||||
|
|
||||||
|
### Tuples vs. Lists
|
||||||
|
|
||||||
|
Tuples are NOT just read-only lists. Tuples are most often used for a *single item* consisting of multiple parts.
|
||||||
|
Lists are usually a collection of distinct items, usually all of the same type.
|
||||||
|
|
||||||
|
```python
|
||||||
|
record = ('GOOG', 100, 490.1) # A tuple representing a stock in a portfolio
|
||||||
|
|
||||||
|
symbols = [ 'GOOG', 'AAPL', 'IBM' ]  # A list representing three stock symbols
|
||||||
|
```
|
||||||
|
|
||||||
|
### Dictionaries
|
||||||
|
|
||||||
|
A dictionary is a hash table or associative array.
|
||||||
|
It is a collection of values indexed by *keys*. These keys serve as field names.
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = {
|
||||||
|
'name': 'GOOG',
|
||||||
|
'shares': 100,
|
||||||
|
'price': 490.1
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common operations
|
||||||
|
|
||||||
|
To read values from a dictionary use the key names.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> print(s['name'], s['shares'])
|
||||||
|
GOOG 100
|
||||||
|
>>> s['price']
|
||||||
|
490.1
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To add or modify values assign using the key names.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> s['shares'] = 75
|
||||||
|
>>> s['date'] = '6/6/2007'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To delete a value use the `del` statement.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> del s['date']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Why dictionaries?
|
||||||
|
|
||||||
|
Dictionaries are useful when there are *many* different values and those values
|
||||||
|
might be modified or manipulated. Dictionaries make your code more readable.
|
||||||
|
|
||||||
|
```python
|
||||||
|
s['price']
|
||||||
|
# vs
|
||||||
|
s[2]
|
||||||
|
```
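
To make the readability argument concrete, here is a short sketch that computes the same cost from both representations. The names `as_tuple` and `as_dict` are illustrative only.

```python
as_tuple = ('GOOG', 100, 490.1)
as_dict = {'name': 'GOOG', 'shares': 100, 'price': 490.1}

# With a tuple, the reader has to remember what positions 1 and 2 mean
cost_from_tuple = as_tuple[1] * as_tuple[2]

# With a dictionary, the key names document the calculation
cost_from_dict = as_dict['shares'] * as_dict['price']

print(cost_from_tuple == cost_from_dict)
```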
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### Note
|
||||||
|
|
||||||
|
In the last few exercises, you wrote a program that read a datafile `Data/portfolio.csv`. Using the `csv` module, it is easy to read the file row-by-row.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> import csv
|
||||||
|
>>> f = open('Data/portfolio.csv')
|
||||||
|
>>> rows = csv.reader(f)
|
||||||
|
>>> next(rows)
|
||||||
|
['name', 'shares', 'price']
|
||||||
|
>>> row = next(rows)
|
||||||
|
>>> row
|
||||||
|
['AA', '100', '32.20']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Although reading the file is easy, you often want to do more with the data than read it.
|
||||||
|
For instance, perhaps you want to store it and start performing some calculations on it.
|
||||||
|
Unfortunately, a raw "row" of data doesn’t give you enough to work with. For example, even a simple math calculation doesn’t work:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> row = ['AA', '100', '32.20']
|
||||||
|
>>> cost = row[1] * row[2]
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
TypeError: can't multiply sequence by non-int of type 'str'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To do more, you typically want to interpret the raw data in some way and turn it into a more useful kind of object so that you can work with it later.
|
||||||
|
Two simple options are tuples or dictionaries.
|
||||||
|
|
||||||
|
### (a) Tuples
|
||||||
|
|
||||||
|
At the interactive prompt, create the following tuple that represents
|
||||||
|
the above row, but with the numeric columns converted to proper
|
||||||
|
numbers:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> t = (row[0], int(row[1]), float(row[2]))
|
||||||
|
>>> t
|
||||||
|
('AA', 100, 32.2)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Using this, you can now calculate the total cost by multiplying the shares and the price:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> cost = t[1] * t[2]
|
||||||
|
>>> cost
|
||||||
|
3220.0000000000005
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Is math broken in Python? What’s the deal with the answer of
|
||||||
|
3220.0000000000005?
|
||||||
|
|
||||||
|
This is an artifact of the floating point hardware on your computer
|
||||||
|
only being able to accurately represent decimals in Base-2, not
|
||||||
|
Base-10. For even simple calculations involving base-10 decimals,
|
||||||
|
small errors are introduced. This is normal, although perhaps a bit
|
||||||
|
surprising if you haven’t seen it before.
|
||||||
|
|
||||||
|
This happens in all programming languages that use floating point
|
||||||
|
decimals, but it often gets hidden when printing. For example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> print(f'{cost:0.2f}')
|
||||||
|
3220.00
|
||||||
|
>>>
|
||||||
|
```
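
Because of these small errors, comparing floats with `==` can give surprising results. One common approach, sketched below under the assumption that a tiny relative tolerance is acceptable for your data, is to compare with `math.isclose()` or to round to a fixed number of decimal places first.

```python
import math

cost = 100 * 32.2                          # May carry a tiny representation error

close_enough = math.isclose(cost, 3220.0)  # Compare within a relative tolerance
rounded = round(cost, 2)                   # Round to two decimal places

print(close_enough)
print(rounded)
```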
|
||||||
|
|
||||||
|
Tuples are read-only. Verify this by trying to change the number of shares to 75.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> t[1] = 75
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
TypeError: 'tuple' object does not support item assignment
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Although you can’t change tuple contents, you can always create a completely new tuple that replaces the old one.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> t = (t[0], 75, t[2])
|
||||||
|
>>> t
|
||||||
|
('AA', 75, 32.2)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Whenever you reassign an existing variable name like this, the old
|
||||||
|
value is discarded. Although the above assignment might look like you
|
||||||
|
are modifying the tuple, you are actually creating a new tuple and
|
||||||
|
throwing the old one away.
|
||||||
|
|
||||||
|
Tuples are often used to pack and unpack values into variables. Try the following:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> name, shares, price = t
|
||||||
|
>>> name
|
||||||
|
'AA'
|
||||||
|
>>> shares
|
||||||
|
75
|
||||||
|
>>> price
|
||||||
|
32.2
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Take the above variables and pack them back into a tuple:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> t = (name, 2*shares, price)
|
||||||
|
>>> t
|
||||||
|
('AA', 150, 32.2)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) Dictionaries as a data structure
|
||||||
|
|
||||||
|
An alternative to a tuple is to create a dictionary instead.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> d = {
|
||||||
|
'name' : row[0],
|
||||||
|
'shares' : int(row[1]),
|
||||||
|
'price' : float(row[2])
|
||||||
|
}
|
||||||
|
>>> d
|
||||||
|
{'name': 'AA', 'shares': 100, 'price': 32.2}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Calculate the total cost of this holding:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> cost = d['shares'] * d['price']
|
||||||
|
>>> cost
|
||||||
|
3220.0000000000005
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Compare this example with the same calculation involving tuples above. Change the number of shares to 75.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> d['shares'] = 75
|
||||||
|
>>> d
|
||||||
|
{'name': 'AA', 'shares': 75, 'price': 32.2}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Unlike tuples, dictionaries can be freely modified. Add some attributes:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> d['date'] = (6, 11, 2007)
|
||||||
|
>>> d['account'] = 12345
|
||||||
|
>>> d
|
||||||
|
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007), 'account': 12345}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) Some additional dictionary operations
|
||||||
|
|
||||||
|
If you turn a dictionary into a list, you’ll get all of its keys:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> list(d)
|
||||||
|
['name', 'shares', 'price', 'date', 'account']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Similarly, if you use the `for` statement to iterate on a dictionary, you will get the keys:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for k in d:
        print('k =', k)
|
||||||
|
|
||||||
|
k = name
|
||||||
|
k = shares
|
||||||
|
k = price
|
||||||
|
k = date
|
||||||
|
k = account
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try this variant that performs a lookup at the same time:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for k in d:
        print(k, '=', d[k])
|
||||||
|
|
||||||
|
name = AA
|
||||||
|
shares = 75
|
||||||
|
price = 32.2
|
||||||
|
date = (6, 11, 2007)
|
||||||
|
account = 12345
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also obtain all of the keys using the `keys()` method:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> keys = d.keys()
|
||||||
|
>>> keys
|
||||||
|
dict_keys(['name', 'shares', 'price', 'date', 'account'])
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
`keys()` is a bit unusual in that it returns a special `dict_keys` object.
|
||||||
|
|
||||||
|
This is an overlay on the original dictionary that always gives you the current keys—even if the dictionary changes. For example, try this:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> del d['account']
|
||||||
|
>>> keys
|
||||||
|
dict_keys(['name', 'shares', 'price', 'date'])
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Notice that `'account'` disappeared from `keys` even though you didn’t call `d.keys()` again.
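
If a fixed copy of the keys is needed instead, the view can be converted to a list. The sketch below contrasts the two; the names `live` and `snapshot` are illustrative.

```python
d = {'name': 'AA', 'shares': 75, 'price': 32.2, 'account': 12345}

live = d.keys()              # A view: tracks later changes to d
snapshot = list(d.keys())    # A list: a copy made at this moment

del d['account']

print(list(live))            # ['name', 'shares', 'price']
print(snapshot)              # ['name', 'shares', 'price', 'account']
```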
|
||||||
|
|
||||||
|
A more elegant way to work with keys and values together is to use the `items()` method. This gives you `(key, value)` tuples:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> items = d.items()
|
||||||
|
>>> items
|
||||||
|
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
|
||||||
|
>>> for k, v in d.items():
        print(k, '=', v)
|
||||||
|
|
||||||
|
name = AA
|
||||||
|
shares = 75
|
||||||
|
price = 32.2
|
||||||
|
date = (6, 11, 2007)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you have tuples such as `items`, you can create a dictionary using the `dict()` function. Try it:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> items
|
||||||
|
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
|
||||||
|
>>> d = dict(items)
|
||||||
|
>>> d
|
||||||
|
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007)}
|
||||||
|
>>>
|
||||||
|
```
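
The same idea works for any sequence of `(key, value)` pairs, not only for `dict_items` objects. As a sketch, the built-in `zip()` function can pair a header row with a data row such as the one read from the CSV file earlier. Note that the values stay strings here, since no type conversion is applied.

```python
headers = ['name', 'shares', 'price']
row = ['AA', '100', '32.20']

pairs = list(zip(headers, row))   # [('name', 'AA'), ('shares', '100'), ('price', '32.20')]
record = dict(pairs)              # Build a dictionary from the pairs

print(record)
```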
|
||||||
|
|
||||||
|
[Next](02_Containers)
|
||||||
413
Notes/02_Working_with_data/02_Containers.md
Normal file
@@ -0,0 +1,413 @@
|
|||||||
|
# 2.2 Containers
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
Programs often have to work with many objects.
|
||||||
|
|
||||||
|
* A portfolio of stocks
|
||||||
|
* A table of stock prices
|
||||||
|
|
||||||
|
There are three main choices to use.
|
||||||
|
|
||||||
|
* Lists. Ordered data.
|
||||||
|
* Dictionaries. Unordered data.
|
||||||
|
* Sets. Unordered collection of unique items.
|
||||||
|
|
||||||
|
### Lists as a Container
|
||||||
|
|
||||||
|
Use a list when the order of the data matters. Remember that lists can hold any kind of objects.
|
||||||
|
For example, a list of tuples.
|
||||||
|
|
||||||
|
```python
|
||||||
|
portfolio = [
|
||||||
|
('GOOG', 100, 490.1),
|
||||||
|
('IBM', 50, 91.3),
|
||||||
|
('CAT', 150, 83.44)
|
||||||
|
]
|
||||||
|
|
||||||
|
portfolio[0] # ('GOOG', 100, 490.1)
|
||||||
|
portfolio[2] # ('CAT', 150, 83.44)
|
||||||
|
```
|
||||||
|
|
||||||
|
### List construction
|
||||||
|
|
||||||
|
Building a list from scratch.
|
||||||
|
|
||||||
|
```python
|
||||||
|
records = [] # Initial empty list
|
||||||
|
|
||||||
|
# Use .append() to add more items
|
||||||
|
records.append(('GOOG', 100, 490.10))
|
||||||
|
records.append(('IBM', 50, 91.3))
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
An example when reading records from a file.
|
||||||
|
|
||||||
|
```python
|
||||||
|
records = [] # Initial empty list
|
||||||
|
|
||||||
|
with open('portfolio.csv', 'rt') as f:
    next(f)                        # Skip the header line
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
|
||||||
|
```
|
||||||
|
|
||||||
|
### Dicts as a Container
|
||||||
|
|
||||||
|
Dictionaries are useful if you want fast random lookups (by key name). For
|
||||||
|
example, a dictionary of stock prices:
|
||||||
|
|
||||||
|
```python
|
||||||
|
prices = {
|
||||||
|
'GOOG': 513.25,
|
||||||
|
'CAT': 87.22,
|
||||||
|
'IBM': 93.37,
|
||||||
|
'MSFT': 44.12
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Here are some simple lookups:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> prices['IBM']
|
||||||
|
93.37
|
||||||
|
>>> prices['GOOG']
|
||||||
|
513.25
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Dict Construction
|
||||||
|
|
||||||
|
Example of building a dict from scratch.
|
||||||
|
|
||||||
|
```python
|
||||||
|
prices = {} # Initial empty dict
|
||||||
|
|
||||||
|
# Insert new items
|
||||||
|
prices['GOOG'] = 513.25
|
||||||
|
prices['CAT'] = 87.22
|
||||||
|
prices['IBM'] = 93.37
|
||||||
|
```
|
||||||
|
|
||||||
|
An example populating the dict from the contents of a file.
|
||||||
|
|
||||||
|
```python
|
||||||
|
prices = {} # Initial empty dict
|
||||||
|
|
||||||
|
with open('prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])
|
||||||
|
```
|
||||||
|
|
||||||
|
### Dictionary Lookups
|
||||||
|
|
||||||
|
You can test the existence of a key.
|
||||||
|
|
||||||
|
```python
|
||||||
|
if key in d:
    # YES
    ...
else:
    # NO
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
You can look up a value that might not exist and provide a default value in case it doesn't.
|
||||||
|
|
||||||
|
```python
|
||||||
|
name = d.get(key, default)
|
||||||
|
```
|
||||||
|
|
||||||
|
An example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> prices.get('IBM', 0.0)
|
||||||
|
93.37
|
||||||
|
>>> prices.get('SCOX', 0.0)
|
||||||
|
0.0
|
||||||
|
>>>
|
||||||
|
```
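
A common use of `get()` with a default is tabulating totals without first checking whether a key exists. The following sketch uses made-up `(name, shares)` pairs; `holdings` is an illustrative name.

```python
portfolio = [('IBM', 50), ('MSFT', 200), ('IBM', 100)]

holdings = {}
for name, shares in portfolio:
    # Start from 0 the first time a name is seen, then accumulate
    holdings[name] = holdings.get(name, 0) + shares

print(holdings)    # {'IBM': 150, 'MSFT': 200}
```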
|
||||||
|
|
||||||
|
### Composite keys
|
||||||
|
|
||||||
|
Almost any type of value can be used as a dictionary key in Python. A dictionary key must be of a type that is immutable.
|
||||||
|
For example, tuples:
|
||||||
|
|
||||||
|
```python
|
||||||
|
holidays = {
|
||||||
|
(1, 1) : 'New Years',
|
||||||
|
(3, 14) : 'Pi day',
|
||||||
|
(9, 13) : "Programmer's day",
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Then to access:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> holidays[3, 14]
'Pi day'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
*Neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.*
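
As a quick sketch of what happens in practice, using a list as a key raises a `TypeError`, and converting the list to a tuple makes it usable. The variable names below are illustrative.

```python
holidays = {(1, 1): 'New Years'}
key = [3, 14]                      # A list: mutable, so not usable as a key

try:
    holidays[key] = 'Pi day'
except TypeError as err:
    message = str(err)             # Reports that lists are unhashable

holidays[tuple(key)] = 'Pi day'    # A tuple with the same contents works

print(message)
print(holidays[3, 14])             # Pi day
```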
|
||||||
|
|
||||||
|
### Sets
|
||||||
|
|
||||||
|
Sets are collections of unordered, unique items.
|
||||||
|
|
||||||
|
```python
|
||||||
|
tech_stocks = { 'IBM','AAPL','MSFT' }
|
||||||
|
# Alternative syntax
|
||||||
|
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])
|
||||||
|
```
|
||||||
|
|
||||||
|
Sets are useful for membership tests.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> tech_stocks
|
||||||
|
{'AAPL', 'IBM', 'MSFT'}
|
||||||
|
>>> 'IBM' in tech_stocks
|
||||||
|
True
|
||||||
|
>>> 'FB' in tech_stocks
|
||||||
|
False
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Sets are also useful for duplicate elimination.
|
||||||
|
|
||||||
|
```python
|
||||||
|
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
|
||||||
|
|
||||||
|
unique = set(names)
|
||||||
|
# unique = set(['IBM', 'AAPL','GOOG','YHOO'])
|
||||||
|
```
|
||||||
|
|
||||||
|
Additional set operations:
|
||||||
|
|
||||||
|
```python
|
||||||
|
unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item
|
||||||
|
|
||||||
|
s1 | s2 # Set union
|
||||||
|
s1 & s2 # Set intersection
|
||||||
|
s1 - s2 # Set difference
|
||||||
|
```
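
Here is a small runnable sketch of these operators. The two sets `s1` and `s2` are made-up examples, and `sorted()` is used only to get a predictable display order, since sets themselves are unordered.

```python
s1 = {'IBM', 'AAPL', 'MSFT'}
s2 = {'IBM', 'GOOG'}

union = s1 | s2        # Items in either set
common = s1 & s2       # Items in both sets
only_s1 = s1 - s2      # Items in s1 but not in s2

print(sorted(union))   # ['AAPL', 'GOOG', 'IBM', 'MSFT']
print(sorted(common))  # ['IBM']
print(sorted(only_s1)) # ['AAPL', 'MSFT']
```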
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### Objectives
|
||||||
|
|
||||||
|
### (a) A list of tuples
|
||||||
|
|
||||||
|
The file `Data/portfolio.csv` contains a list of stocks in a portfolio.
|
||||||
|
In [Section 1.7](), you wrote a function `portfolio_cost(filename)` that read this file and performed a simple calculation.
|
||||||
|
|
||||||
|
Your code should have looked something like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# pcost.py

import csv

def portfolio_cost(filename):
    '''Computes the total cost (shares*price) of a portfolio file'''
    total_cost = 0.0

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            nshares = int(row[1])
            price = float(row[2])
            total_cost += nshares * price
    return total_cost
|
||||||
|
```
|
||||||
|
|
||||||
|
Using this code as a rough guide, create a new file `report.py`. In
|
||||||
|
that file, define a function `read_portfolio(filename)` that opens a
|
||||||
|
given portfolio file and reads it into a list of tuples. To do this,
|
||||||
|
you’re going to make a few minor modifications to the above code.
|
||||||
|
|
||||||
|
First, instead of defining `total_cost = 0`, you’ll make a variable that’s initially set to an empty list. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
portfolio = []
|
||||||
|
```
|
||||||
|
|
||||||
|
Next, instead of totaling up the cost, you’ll turn each row into a
|
||||||
|
tuple exactly as you just did in the last exercise and append it to
|
||||||
|
this list. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
for row in rows:
    holding = (row[0], int(row[1]), float(row[2]))
    portfolio.append(holding)
|
||||||
|
```
|
||||||
|
|
||||||
|
Finally, you’ll return the resulting `portfolio` list.
|
||||||
|
|
||||||
|
Experiment with your function interactively (just a reminder that in order to do this, you first have to run the `report.py` program in the interpreter):
|
||||||
|
|
||||||
|
*Hint: Use `-i` when executing the file in the terminal*
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> portfolio
|
||||||
|
[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23),
|
||||||
|
('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)]
|
||||||
|
>>>
|
||||||
|
>>> portfolio[0]
|
||||||
|
('AA', 100, 32.2)
|
||||||
|
>>> portfolio[1]
|
||||||
|
('IBM', 50, 91.1)
|
||||||
|
>>> portfolio[1][1]
|
||||||
|
50
|
||||||
|
>>> total = 0.0
|
||||||
|
>>> for s in portfolio:
        total += s[1] * s[2]
|
||||||
|
|
||||||
|
>>> print(total)
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
This list of tuples that you have created is very similar to a 2-D array.
|
||||||
|
For example, you can access a specific column and row using a lookup such as `portfolio[row][column]` where `row` and `column` are integers.
|
||||||
|
|
||||||
|
That said, you can also rewrite the last for-loop using a statement like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> total = 0.0
|
||||||
|
>>> for name, shares, price in portfolio:
        total += shares*price
|
||||||
|
|
||||||
|
>>> print(total)
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) List of Dictionaries
|
||||||
|
|
||||||
|
Take the function you wrote in part (a) and modify it to represent each stock in the portfolio with a dictionary instead of a tuple.
|
||||||
|
In this dictionary use the field names of "name", "shares", and "price" to represent the different columns in the input file.
|
||||||
|
|
||||||
|
Experiment with this new function in the same manner as you did in part (a).
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> portfolio
|
||||||
|
[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1},
|
||||||
|
{'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
|
||||||
|
{'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1},
|
||||||
|
{'name': 'IBM', 'shares': 100, 'price': 70.44}]
|
||||||
|
>>> portfolio[0]
|
||||||
|
{'name': 'AA', 'shares': 100, 'price': 32.2}
|
||||||
|
>>> portfolio[1]
|
||||||
|
{'name': 'IBM', 'shares': 50, 'price': 91.1}
|
||||||
|
>>> portfolio[1]['shares']
|
||||||
|
50
|
||||||
|
>>> total = 0.0
|
||||||
|
>>> for s in portfolio:
        total += s['shares']*s['price']
|
||||||
|
|
||||||
|
>>> print(total)
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers.
|
||||||
|
This is often preferred because the resulting code is easier to read later.
|
||||||
|
|
||||||
|
Viewing large dictionaries and lists can be messy. To clean up the output for debugging, consider using the `pprint` function.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> from pprint import pprint
|
||||||
|
>>> pprint(portfolio)
|
||||||
|
[{'name': 'AA', 'price': 32.2, 'shares': 100},
|
||||||
|
{'name': 'IBM', 'price': 91.1, 'shares': 50},
|
||||||
|
{'name': 'CAT', 'price': 83.44, 'shares': 150},
|
||||||
|
{'name': 'MSFT', 'price': 51.23, 'shares': 200},
|
||||||
|
{'name': 'GE', 'price': 40.37, 'shares': 95},
|
||||||
|
{'name': 'MSFT', 'price': 65.1, 'shares': 50},
|
||||||
|
{'name': 'IBM', 'price': 70.44, 'shares': 100}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) Dictionaries as a container
|
||||||
|
|
||||||
|
A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer.
|
||||||
|
In the Python shell, try playing with a dictionary:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> prices = { }
|
||||||
|
>>> prices['IBM'] = 92.45
|
||||||
|
>>> prices['MSFT'] = 45.12
|
||||||
|
>>> prices
|
||||||
|
... look at the result ...
|
||||||
|
>>> prices['IBM']
|
||||||
|
92.45
|
||||||
|
>>> prices['AAPL']
|
||||||
|
... look at the result ...
|
||||||
|
>>> 'AAPL' in prices
|
||||||
|
False
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The file `Data/prices.csv` contains a series of lines with stock prices.
|
||||||
|
The file looks something like this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
"AA",9.22
|
||||||
|
"AXP",24.85
|
||||||
|
"BA",44.85
|
||||||
|
"BAC",11.27
|
||||||
|
"C",3.72
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Write a function `read_prices(filename)` that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices.
|
||||||
|
|
||||||
|
To do this, start with an empty dictionary and start inserting values into it just
|
||||||
|
as you did above. However, you are reading the values from a file now.
|
||||||
|
|
||||||
|
We’ll use this data structure to quickly look up the price of a given stock name.
|
||||||
|
|
||||||
|
A few little tips that you’ll need for this part. First, make sure you use the `csv` module just as you did before—there’s no need to reinvent the wheel here.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> import csv
|
||||||
|
>>> f = open('Data/prices.csv', 'r')
|
||||||
|
>>> rows = csv.reader(f)
|
||||||
|
>>> for row in rows:
        print(row)
|
||||||
|
|
||||||
|
|
||||||
|
['AA', '9.22']
|
||||||
|
['AXP', '24.85']
|
||||||
|
...
|
||||||
|
[]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The other little complication is that the `Data/prices.csv` file may have some blank lines in it. Notice how the last row of data above is an empty list—meaning no data was present on that line.
|
||||||
|
|
||||||
|
There’s a possibility that this could cause your program to die with an exception.
|
||||||
|
Use the `try` and `except` statements to catch this as appropriate.
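
The pattern can be sketched without touching a real file. The example below is only an illustration of the technique, not a full solution: it feeds a small in-memory stand-in for the file (built with `io.StringIO`, an assumption made so the example is self-contained) through `csv.reader` and skips rows that fail with an `IndexError`.

```python
import csv
import io

# In-memory stand-in for a prices file; the trailing blank line is deliberate
sample = io.StringIO('"AA",9.22\n"AXP",24.85\n\n')

prices = {}
for row in csv.reader(sample):
    try:
        prices[row[0]] = float(row[1])
    except IndexError:
        pass    # A blank line produces an empty row, so row[0] fails

print(prices)   # {'AA': 9.22, 'AXP': 24.85}
```

Catching only `IndexError` keeps other problems, such as a non-numeric price, visible instead of silently hiding them.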
|
||||||
|
|
||||||
|
Once you have written your `read_prices()` function, test it interactively to make sure it works:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> prices = read_prices('Data/prices.csv')
|
||||||
|
>>> prices['IBM']
|
||||||
|
106.28
|
||||||
|
>>> prices['MSFT']
|
||||||
|
20.89
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (d) Finding out if you can retire
|
||||||
|
|
||||||
|
Tie all of this work together by adding a few statements to your `report.py` program.
|
||||||
|
It takes the list of stocks in part (b) and the dictionary of prices in part (c) and
|
||||||
|
computes the current value of the portfolio along with the gain/loss.
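
One possible shape for the calculation is sketched below on a two-stock excerpt of the data shown in this section. It is a rough outline under the assumption that every stock in the portfolio also appears in the price dictionary; your own program will read both from files instead.

```python
# Two holdings and their current prices, taken from the sample data above
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
]
prices = {'AA': 9.22, 'IBM': 106.28}

cost = 0.0     # What was originally paid
value = 0.0    # What the holdings are worth now
for s in portfolio:
    cost += s['shares'] * s['price']
    value += s['shares'] * prices[s['name']]

gain = value - cost
print(f'Current value: {value:0.2f}')
print(f'Gain: {gain:0.2f}')
```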
|
||||||
|
|
||||||
|
[Next](03_Formatting)
|
||||||
276
Notes/02_Working_with_data/03_Formatting.md
Normal file
@@ -0,0 +1,276 @@
|
|||||||
|
# 2.3 Formatting
|
||||||
|
|
||||||
|
This is a slight digression, but when you work with data, you often want to
|
||||||
|
produce structured output (tables, etc.). For example:
|
||||||
|
|
||||||
|
```code
|
||||||
|
      Name     Shares       Price
---------- ---------- -----------
        AA        100       32.20
       IBM         50       91.10
       CAT        150       83.44
      MSFT        200       51.23
        GE         95       40.37
      MSFT         50       65.10
       IBM        100       70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
### String Formatting
|
||||||
|
|
||||||
|
One way to format strings in Python 3.6+ is with f-strings.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> name = 'IBM'
|
||||||
|
>>> shares = 100
|
||||||
|
>>> price = 91.1
|
||||||
|
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
|
||||||
|
'       IBM        100      91.10'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The part `{expression:format}` is replaced by the value of `expression`, formatted according to `format`.
|
||||||
|
|
||||||
|
It is commonly used with `print`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
print(f'{name:>10s} {shares:>10d} {price:>10.2f}')
|
||||||
|
```
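
Putting this in a loop produces a simple table. The sketch below uses two rows from the sample table at the top of this section and a uniform column width of 10, which is an arbitrary choice for the example. The lines are collected in a list called `lines` (an illustrative name) before printing.

```python
headers = ('Name', 'Shares', 'Price')
rows = [
    ('AA', 100, 32.2),
    ('IBM', 50, 91.1),
]

lines = []
lines.append(' '.join(f'{h:>10s}' for h in headers))   # Right-aligned headings
lines.append(' '.join('-' * 10 for _ in headers))      # Separator row
for name, shares, price in rows:
    lines.append(f'{name:>10s} {shares:>10d} {price:>10.2f}')

print('\n'.join(lines))
```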
|
||||||
|
|
||||||
|
### Format codes
|
||||||
|
|
||||||
|
Format codes (after the `:` inside the `{}`) are similar to C `printf()`. Common codes
|
||||||
|
include:
|
||||||
|
|
||||||
|
```code
|
||||||
|
d Decimal integer
|
||||||
|
b Binary integer
|
||||||
|
x Hexadecimal integer
|
||||||
|
f Float as [-]m.dddddd
|
||||||
|
e Float as [-]m.dddddde+-xx
|
||||||
|
g Float, but selective use of E notation
s String
|
||||||
|
c Character (from integer)
|
||||||
|
```
|
||||||
|
|
||||||
|
Common modifiers adjust the field width and decimal precision. This is a partial list:
|
||||||
|
|
||||||
|
```code
|
||||||
|
:>10d Integer right aligned in 10-character field
|
||||||
|
:<10d Integer left aligned in 10-character field
|
||||||
|
:^10d Integer centered in 10-character field
:0.2f Float with 2 digit precision
|
||||||
|
```
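
The effect of each modifier is easiest to see side by side. In this short sketch the values `42` and `3.14159` are arbitrary, and `repr()` is used so that the padding spaces are visible in the output.

```python
right = f'{42:>10d}'          # Padded on the left
left = f'{42:<10d}'           # Padded on the right
centered = f'{42:^10d}'       # Padded on both sides
two_places = f'{3.14159:0.2f}'

print(repr(right))            # '        42'
print(repr(left))             # '42        '
print(repr(centered))         # '    42    '
print(repr(two_places))       # '3.14'
```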
|
||||||
|
|
||||||
|
### Dictionary Formatting
|
||||||
|
|
||||||
|
You can use the `format_map()` method on strings.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = {
|
||||||
|
'name': 'IBM',
|
||||||
|
'shares': 100,
|
||||||
|
'price': 91.1
|
||||||
|
}
|
||||||
|
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
|
||||||
|
'       IBM        100      91.10'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
It uses the same format codes as f-strings but takes the values from the supplied dictionary.
|
||||||
|
|
||||||
|
### C-Style Formatting
|
||||||
|
|
||||||
|
You can also use the formatting operator `%`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> 'The value is %d' % 3
|
||||||
|
'The value is 3'
|
||||||
|
>>> '%5d %-5d %10d' % (3,4,5)
|
||||||
|
'    3 4              5'
|
||||||
|
>>> '%0.2f' % (3.1415926,)
|
||||||
|
'3.14'
|
||||||
|
```
|
||||||
|
|
||||||
|
This requires a single item or a tuple on the right. Format codes are modeled after the C `printf()` as well.
|
||||||
|
|
||||||
|
*Note: This is the only formatting available on byte strings.*
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> b'%s has %d messages' % (b'Dave', 37)
|
||||||
|
b'Dave has 37 messages'
|
||||||
|
>>>
|
||||||
|
```
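Returning to the rule that `%` takes a single item or a tuple on its right-hand side: one consequence is that a tuple is always treated as the list of arguments. The sketch below (with a made-up `point` value) shows how to format a tuple as a single item by wrapping it in a one-element tuple.

```python
point = (3, 4)                       # hypothetical sample data

# The tuple supplies one value per format code.
text = 'x=%d y=%d' % point

# To format the tuple itself with a single code, wrap it in a 1-tuple.
whole = 'point=%s' % (point,)

print(text)
print(whole)
```

Writing `'point=%s' % point` instead would raise a `TypeError`, because the two values in `point` would be matched against a single `%s`.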
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
In the previous exercise, you wrote a program called `report.py` that computed the gain/loss of a
|
||||||
|
stock portfolio. In this exercise, you're going to modify it to produce a table like this:
|
||||||
|
|
||||||
|
```code
|
||||||
|
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
|
||||||
|
```
|
||||||
|
|
||||||
|
In this report, "Price" is the current share price of the stock and "Change" is the change in the share price from the initial purchase price.
|
||||||
|
|
||||||
|
### (a) How to format numbers
|
||||||
|
|
||||||
|
A common problem with printing numbers is specifying the number of decimal places. One way to fix this is to use f-strings. Try
|
||||||
|
these examples:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> value = 42863.1
|
||||||
|
>>> print(value)
|
||||||
|
42863.1
|
||||||
|
>>> print(f'{value:0.4f}')
|
||||||
|
42863.1000
|
||||||
|
>>> print(f'{value:>16.2f}')
|
||||||
|
        42863.10
|
||||||
|
>>> print(f'{value:<16.2f}')
|
||||||
|
42863.10
|
||||||
|
>>> print(f'{value:*>16,.2f}')
|
||||||
|
*******42,863.10
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Full documentation on the formatting codes used with f-strings can be found
|
||||||
|
[here](https://docs.python.org/3/library/string.html#format-specification-mini-language). Formatting
|
||||||
|
is also sometimes performed using the `%` operator of strings.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> print('%0.4f' % value)
|
||||||
|
42863.1000
|
||||||
|
>>> print('%16.2f' % value)
|
||||||
|
        42863.10
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Documentation on various codes used with `%` can be found [here](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting).
|
||||||
|
|
||||||
|
Although it’s commonly used with `print`, string formatting is not tied to printing.
|
||||||
|
If you want to save a formatted string, just assign it to a variable.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> f = '%0.4f' % value
|
||||||
|
>>> f
|
||||||
|
'42863.1000'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) Collecting Data
|
||||||
|
|
||||||
|
In order to generate the above report, you’ll first want to collect
|
||||||
|
all of the data shown in the table. Write a function `make_report()`
|
||||||
|
that takes a list of stocks and dictionary of prices as input and
|
||||||
|
returns a list of tuples containing the rows of the above table.
|
||||||
|
|
||||||
|
Add this function to your `report.py` file. Here’s how it should work if you try it interactively:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> prices = read_prices('Data/prices.csv')
|
||||||
|
>>> report = make_report(portfolio, prices)
|
||||||
|
>>> for r in report:
        print(r)
|
||||||
|
|
||||||
|
('AA', 100, 9.22, -22.980000000000004)
|
||||||
|
('IBM', 50, 106.28, 15.180000000000007)
|
||||||
|
('CAT', 150, 35.46, -47.98)
|
||||||
|
('MSFT', 200, 20.89, -30.339999999999996)
|
||||||
|
('GE', 95, 13.48, -26.889999999999997)
|
||||||
|
...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) Printing a formatted table
|
||||||
|
|
||||||
|
Redo the above for-loop, but change the print statement to format the tuples.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for r in report:
        print('%10s %10d %10.2f %10.2f' % r)

        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
|
||||||
|
...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also expand the values and use f-strings. For example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for name, shares, price, change in report:
        print(f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}')

        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
|
||||||
|
...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Take the above statements and add them to your `report.py` program.
|
||||||
|
Have your program take the output of the `make_report()` function and print a nicely formatted table as shown.
|
||||||
|
|
||||||
|
### (d) Adding some headers
|
||||||
|
|
||||||
|
Suppose you had a tuple of header names like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
headers = ('Name', 'Shares', 'Price', 'Change')
|
||||||
|
```
|
||||||
|
|
||||||
|
Add code to your program that takes the above tuple of headers and
|
||||||
|
creates a string where each header name is right-aligned in a
|
||||||
|
10-character wide field and each field is separated by a single space.
|
||||||
|
|
||||||
|
```python
|
||||||
|
'      Name     Shares      Price     Change'
|
||||||
|
```
|
||||||
|
|
||||||
|
Write code that takes the headers and creates the separator string between the headers and data to follow.
|
||||||
|
This string is just a bunch of "-" characters under each field name. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
'---------- ---------- ---------- ----------'
|
||||||
|
```
|
||||||
|
|
||||||
|
When you’re done, your program should produce the table shown at the top of this exercise.
|
||||||
|
|
||||||
|
```code
|
||||||
|
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
|
||||||
|
```
|
||||||
|
|
||||||
|
### (e) Formatting Challenge
|
||||||
|
|
||||||
|
How would you modify your code so that the price includes the currency symbol ($) and the output looks like this:
|
||||||
|
|
||||||
|
```code
|
||||||
|
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      $9.22     -22.98
       IBM         50    $106.28      15.18
       CAT        150     $35.46     -47.98
      MSFT        200     $20.89     -30.34
        GE         95     $13.48     -26.89
      MSFT         50     $20.89     -44.21
       IBM        100    $106.28      35.84
|
||||||
|
```
|
||||||
|
|
||||||
|
[Next](04_Sequences)
|
||||||
538
Notes/02_Working_with_data/04_Sequences.md
Normal file
@@ -0,0 +1,538 @@
|
|||||||
|
# 2.4 Sequences
|
||||||
|
|
||||||
|
In this part, we look at some common idioms for working with sequence data.
|
||||||
|
|
||||||
|
### Introduction
|
||||||
|
|
||||||
|
Python has three *sequence* datatypes.
|
||||||
|
|
||||||
|
* String: `'Hello'`. A string is considered a sequence of characters.
|
||||||
|
* List: `[1, 4, 5]`.
|
||||||
|
* Tuple: `('GOOG', 100, 490.1)`.
|
||||||
|
|
||||||
|
All sequences are ordered and have length.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a = 'Hello' # String
|
||||||
|
b = [1, 4, 5] # List
|
||||||
|
c = ('GOOG', 100, 490.1) # Tuple
|
||||||
|
|
||||||
|
# Indexed order
|
||||||
|
a[0] # 'H'
|
||||||
|
b[-1] # 5
|
||||||
|
c[1] # 100
|
||||||
|
|
||||||
|
# Length of sequence
|
||||||
|
len(a) # 5
|
||||||
|
len(b) # 3
|
||||||
|
len(c) # 3
|
||||||
|
```
|
||||||
|
|
||||||
|
Sequences can be replicated: `s * n`.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = 'Hello'
|
||||||
|
>>> a * 3
|
||||||
|
'HelloHelloHello'
|
||||||
|
>>> b = [1, 2, 3]
|
||||||
|
>>> b * 2
|
||||||
|
[1, 2, 3, 1, 2, 3]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Sequences of the same type can be concatenated: `s + t`.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = (1, 2, 3)
|
||||||
|
>>> b = (4, 5)
|
||||||
|
>>> a + b
|
||||||
|
(1, 2, 3, 4, 5)
|
||||||
|
>>>
|
||||||
|
>>> c = [1, 5]
|
||||||
|
>>> a + c
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
TypeError: can only concatenate tuple (not "list") to tuple
|
||||||
|
```
|
||||||
|
|
||||||
|
### Slicing
|
||||||
|
|
||||||
|
Slicing means to take a subsequence from a sequence.
|
||||||
|
The syntax used is `s[start:end]`, where `start` and `end` are the indexes of the subsequence you want.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a = [0,1,2,3,4,5,6,7,8]
|
||||||
|
|
||||||
|
a[2:5] # [2,3,4]
|
||||||
|
a[-5:] # [4,5,6,7,8]
|
||||||
|
a[:3] # [0,1,2]
|
||||||
|
```
|
||||||
|
|
||||||
|
* Indices `start` and `end` must be integers.
|
||||||
|
* Slices do *not* include the end value.
|
||||||
|
* If indices are omitted, they default to the beginning or end of the list.
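The three rules above can be checked with a short sketch. The list `a` is the same one used in the slicing example; the other variable names are only illustrative.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8]

middle = a[2:5]     # starts at index 2, stops before index 5
tail = a[-3:]       # omitted end: runs through the last element
head = a[:3]        # omitted start: begins at the first element
copy = a[:]         # both omitted: a new list with all elements

# Because the end index is excluded, the slice length is end - start.
print(middle, len(middle) == 5 - 2)
print(tail, head)
print(copy == a, copy is a)
```

Note that `a[:]` produces an equal but distinct list, so changes to `copy` do not affect `a`.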
|
||||||
|
|
||||||
|
### Slice re-assignment
|
||||||
|
|
||||||
|
Slices can also be reassigned and deleted.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Reassignment
|
||||||
|
a = [0,1,2,3,4,5,6,7,8]
|
||||||
|
a[2:4] = [10,11,12] # [0,1,10,11,12,4,5,6,7,8]
|
||||||
|
```
|
||||||
|
|
||||||
|
*Note: The reassigned slice doesn't need to have the same length.*
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Deletion
|
||||||
|
a = [0,1,2,3,4,5,6,7,8]
|
||||||
|
del a[2:4] # [0,1,4,5,6,7,8]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Sequence Reductions
|
||||||
|
|
||||||
|
There are some functions to reduce a sequence to a single value.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> s = [1, 2, 3, 4]
|
||||||
|
>>> sum(s)
|
||||||
|
10
|
||||||
|
>>> min(s)
1
>>> max(s)
4
|
||||||
|
>>> t = ['Hello', 'World']
|
||||||
|
>>> max(t)
|
||||||
|
'World'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Iteration over a sequence
|
||||||
|
|
||||||
|
The for-loop iterates over the elements in the sequence.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> s = [1, 4, 9, 16]
|
||||||
|
>>> for i in s:
|
||||||
|
...     print(i)
|
||||||
|
...
|
||||||
|
1
|
||||||
|
4
|
||||||
|
9
|
||||||
|
16
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
On each iteration of the loop, you get a new item to work with.
|
||||||
|
This new value is placed into an iteration variable. In this example, the
|
||||||
|
iteration variable is `x`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
for x in s:         # `x` is an iteration variable
    ...statements
|
||||||
|
```
|
||||||
|
|
||||||
|
In each iteration, it overwrites the previous value (if any).
|
||||||
|
After the loop finishes, the variable retains the last value.
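A minimal sketch of that last point, using the same list `s` as above: once the loop has ended, `x` is still bound to the final element.

```python
s = [1, 4, 9, 16]

for x in s:
    pass            # the loop body does not matter here

# The iteration variable survives the loop and holds the last item.
print(x)
```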
|
||||||
|
|
||||||
|
### `break` statement
|
||||||
|
|
||||||
|
You can use the `break` statement to break out of a loop before it finishes iterating all of the elements.
|
||||||
|
|
||||||
|
```python
|
||||||
|
for name in namelist:
    if name == 'Jake':
        break
    ...
    ...
statements
|
||||||
|
```
|
||||||
|
|
||||||
|
When the `break` statement is executed, it will exit the loop and move
|
||||||
|
on to the next `statements`. The `break` statement only applies to the
|
||||||
|
inner-most loop. If this loop is within another loop, it will not
|
||||||
|
break the outer loop.
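The inner-most rule can be seen in a small nested loop. This is only a sketch; the `rows` data and the `visited` list are made up for illustration.

```python
rows = [[1, 2, 3], [5, 6, 7]]   # hypothetical sample data
visited = []

for row in rows:
    for value in row:
        if value % 2 == 0:
            break               # leaves only the inner loop
        visited.append(value)
    # execution resumes here, and the outer loop moves to the next row

print(visited)
```

Each row is abandoned at its first even number, but the outer loop still visits every row.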
|
||||||
|
|
||||||
|
### `continue` statement
|
||||||
|
|
||||||
|
To skip one element and move to the next one you use the `continue` statement.
|
||||||
|
|
||||||
|
```python
|
||||||
|
for line in lines:
    if line == '\n':    # Skip blank lines
        continue
    # More statements
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
This is useful when the current item is not of interest or needs to be ignored in the processing.
|
||||||
|
|
||||||
|
### Looping over integers
|
||||||
|
|
||||||
|
If you need to count, use `range()`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
for i in range(100):
    # i = 0,1,...,99
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
The syntax is `range([start,] end [,step])`
|
||||||
|
|
||||||
|
```python
|
||||||
|
for i in range(100):
    # i = 0,1,...,99
    ...
for j in range(10,20):
    # j = 10,11,..., 19
    ...
for k in range(10,50,2):
    # k = 10,12,...,48
    # Notice how it counts in steps of 2, not 1.
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
* The ending value is never included. It mirrors the behavior of slices.
|
||||||
|
* `start` is optional. Default `0`.
|
||||||
|
* `step` is optional. Default `1`.
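Since `range()` produces its values lazily, wrapping it in `list()` is a convenient way to inspect what a given call generates. The sketch below checks the rules listed above; the variable names are only illustrative.

```python
counting = list(range(5))           # start defaults to 0, step to 1
stepped = list(range(10, 20, 5))    # the end value 20 is never included
down = list(range(10, 0, -3))       # a negative step counts downward

print(counting)
print(stepped)
print(down)
```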
|
||||||
|
|
||||||
|
### `enumerate()` function
|
||||||
|
|
||||||
|
The `enumerate` function provides a loop with an extra counter value.
|
||||||
|
|
||||||
|
```python
|
||||||
|
names = ['Elwood', 'Jake', 'Curtis']
|
||||||
|
for i, name in enumerate(names):
    # Loops with i = 0, name = 'Elwood'
    #            i = 1, name = 'Jake'
    #            i = 2, name = 'Curtis'
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional.
|
||||||
|
A good example of using `enumerate()` is tracking line numbers while reading a file:
|
||||||
|
|
||||||
|
```python
|
||||||
|
with open(filename) as f:
    for lineno, line in enumerate(f, start=1):
        ...
|
||||||
|
```
|
||||||
|
|
||||||
|
In the end, `enumerate` is just a nice shortcut for:
|
||||||
|
|
||||||
|
```python
|
||||||
|
i = 0
for x in s:
    statements
    i += 1
|
||||||
|
```
|
||||||
|
|
||||||
|
Using `enumerate` is less typing and runs slightly faster.
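To make the equivalence concrete, here is a small sketch that builds the same `(index, item)` pairs both ways, reusing the `names` list from the earlier example.

```python
names = ['Elwood', 'Jake', 'Curtis']

# Manual counter, as in the longhand version.
manual = []
i = 0
for name in names:
    manual.append((i, name))
    i += 1

# The same pairs produced by enumerate().
with_enumerate = list(enumerate(names))

print(manual == with_enumerate)
```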
|
||||||
|
|
||||||
|
### For and tuples
|
||||||
|
|
||||||
|
You can loop with multiple iteration variables.
|
||||||
|
|
||||||
|
```python
|
||||||
|
points = [
|
||||||
|
(1, 4),(10, 40),(23, 14),(5, 6),(7, 8)
|
||||||
|
]
|
||||||
|
for x, y in points:
    # Loops with x = 1, y = 4
    #            x = 10, y = 40
    #            x = 23, y = 14
    #            ...
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
When using multiple variables, each tuple will be *unpacked* into a set of iteration variables.
|
||||||
|
|
||||||
|
### `zip()` function
|
||||||
|
|
||||||
|
The `zip` function takes sequences and makes an iterator that combines them.
|
||||||
|
|
||||||
|
```python
|
||||||
|
columns = ['name', 'shares', 'price']
|
||||||
|
values = ['GOOG', 100, 490.1 ]
|
||||||
|
pairs = zip(columns, values)
|
||||||
|
# ('name','GOOG'), ('shares',100), ('price',490.1)
|
||||||
|
```
|
||||||
|
|
||||||
|
To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier.
|
||||||
|
|
||||||
|
```python
|
||||||
|
for column, value in pairs:
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
A common use of `zip` is to create key/value pairs for constructing dictionaries.
|
||||||
|
|
||||||
|
```python
|
||||||
|
d = dict(zip(columns, values))
|
||||||
|
```
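One detail worth seeing in action is that the object returned by `zip()` is a one-shot iterator: once it has been consumed, it yields nothing more. The sketch below reuses the `columns` and `values` lists from above; `first_pass`, `second_pass`, and `record` are illustrative names.

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

pairs = zip(columns, values)
first_pass = list(pairs)        # consumes the iterator
second_pass = list(pairs)       # nothing is left the second time

# Call zip() again whenever the pairs are needed once more.
record = dict(zip(columns, values))

print(first_pass)
print(second_pass)
print(record)
```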
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) Counting
|
||||||
|
|
||||||
|
Try some basic counting examples:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for n in range(10):            # Count 0 ... 9
        print(n, end=' ')

0 1 2 3 4 5 6 7 8 9
>>> for n in range(10,0,-1):       # Count 10 ... 1
        print(n, end=' ')

10 9 8 7 6 5 4 3 2 1
>>> for n in range(0,10,2):        # Count 0, 2, ... 8
        print(n, end=' ')

0 2 4 6 8
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) More sequence operations
|
||||||
|
|
||||||
|
Interactively experiment with some of the sequence reduction operations.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> data = [4, 9, 1, 25, 16, 100, 49]
|
||||||
|
>>> min(data)
|
||||||
|
1
|
||||||
|
>>> max(data)
|
||||||
|
100
|
||||||
|
>>> sum(data)
|
||||||
|
204
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try looping over the data.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for x in data:
        print(x)

4
9
...
>>> for n, x in enumerate(data):
        print(n, x)
|
||||||
|
|
||||||
|
0 4
|
||||||
|
1 9
|
||||||
|
2 1
|
||||||
|
...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Sometimes the `for` statement, `len()`, and `range()` get used by
|
||||||
|
novices in some kind of horrible code fragment that looks like it
|
||||||
|
emerged from the depths of a rusty C program.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for n in range(len(data)):
        print(data[n])
|
||||||
|
|
||||||
|
4
|
||||||
|
9
|
||||||
|
1
|
||||||
|
...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Don’t do that! Not only does reading it make everyone’s eyes bleed, it’s clumsier and it typically runs slower because of the extra index lookup on every iteration.
|
||||||
|
Just use a normal `for` loop if you want to iterate over data. Use `enumerate()` if you happen to need the index for some reason.
|
||||||
|
|
||||||
|
### (c) A practical `enumerate()` example
|
||||||
|
|
||||||
|
Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data.
|
||||||
|
Using `enumerate()`, modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> cost = portfolio_cost('Data/missing.csv')
|
||||||
|
Row 4: Bad row: ['MSFT', '', '51.23']
Row 7: Bad row: ['IBM', '', '70.44']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To do this, you’ll need to change just a few parts of your code.
|
||||||
|
|
||||||
|
```python
|
||||||
|
...
|
||||||
|
for rowno, row in enumerate(rows, start=1):
    try:
        ...
    except ValueError:
        print(f'Row {rowno}: Bad row: {row}')
|
||||||
|
```
|
||||||
|
|
||||||
|
### (d) Using the `zip()` function
|
||||||
|
|
||||||
|
In the file `portfolio.csv`, the first line contains column headers. In all previous code, we’ve been discarding them.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> f = open('Data/portfolio.csv')
|
||||||
|
>>> rows = csv.reader(f)
|
||||||
|
>>> headers = next(rows)
|
||||||
|
>>> headers
|
||||||
|
['name', 'shares', 'price']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture.
|
||||||
|
First try this to pair the file headers with a row of data:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> row = next(rows)
|
||||||
|
>>> row
|
||||||
|
['AA', '100', '32.20']
|
||||||
|
>>> list(zip(headers, row))
|
||||||
|
[ ('name', 'AA'), ('shares', '100'), ('price', '32.20') ]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Notice how `zip()` paired the column headers with the column values.
|
||||||
|
We’ve used `list()` here to turn the result into a list so that you
|
||||||
|
can see it. Normally, `zip()` creates an iterator that must be
|
||||||
|
consumed by a for-loop.
|
||||||
|
|
||||||
|
This pairing is just an intermediate step to building a dictionary. Now try this:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> record = dict(zip(headers, row))
|
||||||
|
>>> record
|
||||||
|
{'name': 'AA', 'shares': '100', 'price': '32.20'}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
This transformation is one of the most useful tricks to know about
|
||||||
|
when processing a lot of data files. For example, suppose you wanted
|
||||||
|
to make the `pcost.py` program work with various input files, but
|
||||||
|
without regard for the actual column number where the name, shares,
|
||||||
|
and price appear.
|
||||||
|
|
||||||
|
Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# pcost.py
|
||||||
|
|
||||||
|
def portfolio_cost(filename):
    ...
        for rowno, row in enumerate(rows, start=1):
            record = dict(zip(headers, row))
            try:
                nshares = int(record['shares'])
                price = float(record['price'])
                total_cost += nshares * price
            # This catches errors in int() and float() conversions above
            except ValueError:
                print(f'Row {rowno}: Bad row: {row}')
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
name,date,time,shares,price
|
||||||
|
"AA","6/11/2007","9:50am",100,32.20
|
||||||
|
"IBM","5/13/2007","4:20pm",50,91.10
|
||||||
|
"CAT","9/23/2006","1:30pm",150,83.44
|
||||||
|
"MSFT","5/17/2007","10:30am",200,51.23
|
||||||
|
"GE","2/1/2006","10:45am",95,40.37
|
||||||
|
"MSFT","10/31/2006","12:05pm",50,65.10
|
||||||
|
"IBM","7/9/2006","3:15pm",100,70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> portfolio_cost('Data/portfoliodate.csv')
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you did it right, you’ll find that your program still works even
|
||||||
|
though the data file has a completely different column format than
|
||||||
|
before. That’s cool!
|
||||||
|
|
||||||
|
The change made here is subtle, but significant. Instead of
|
||||||
|
`portfolio_cost()` being hardcoded to read a single fixed file format,
|
||||||
|
the new version reads any CSV file and picks the values of interest
|
||||||
|
out of it. As long as the file has the required columns, the code will work.
|
||||||
|
|
||||||
|
Modify the `report.py` program you wrote in Section 2.3 so that it uses
|
||||||
|
the same technique to pick out column headers.
|
||||||
|
|
||||||
|
Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it
|
||||||
|
produces the same answer as before.
|
||||||
|
|
||||||
|
### (e) Inverting a dictionary
|
||||||
|
|
||||||
|
A dictionary maps keys to values. For example, a dictionary of stock prices.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> prices = {
        'GOOG' : 490.1,
        'AA' : 23.45,
        'IBM' : 91.1,
        'MSFT' : 34.23
    }
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you use the `items()` method, you can get `(key,value)` pairs:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> prices.items()
|
||||||
|
dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)])
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
However, what if you wanted to get a list of `(value, key)` pairs instead?
|
||||||
|
*Hint: use `zip()`.*
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> pricelist = list(zip(prices.values(),prices.keys()))
|
||||||
|
>>> pricelist
|
||||||
|
[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> min(pricelist)
|
||||||
|
(23.45, 'AA')
|
||||||
|
>>> max(pricelist)
|
||||||
|
(490.1, 'GOOG')
|
||||||
|
>>> sorted(pricelist)
|
||||||
|
[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
This also illustrates an important feature of tuples. When used in
|
||||||
|
comparisons, tuples are compared element-by-element starting with the
|
||||||
|
first item, similar to how strings are compared
|
||||||
|
character-by-character.
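A short sketch of that comparison rule, using `(price, name)` tuples like the ones in `pricelist`; the second comparison uses a made-up tie on price to show the next element being consulted.

```python
# The first elements differ, so they decide the result.
by_price = (23.45, 'AA') < (34.23, 'MSFT')

# The first elements tie (hypothetical data), so the second elements decide.
by_name = (91.1, 'IBM') < (91.1, 'MSFT')

# Strings follow the same idea, one character at a time.
strings = 'IBM' < 'MSFT'

print(by_price, by_name, strings)
```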
|
||||||
|
|
||||||
|
`zip()` is often used in situations like this where you need to pair
|
||||||
|
up data from different places. For example, pairing up the column
|
||||||
|
names with column values in order to make a dictionary of named
|
||||||
|
values.
|
||||||
|
|
||||||
|
Note that `zip()` is not limited to pairs. For example, you can use it
|
||||||
|
with any number of input lists:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [1, 2, 3, 4]
|
||||||
|
>>> b = ['w', 'x', 'y', 'z']
|
||||||
|
>>> c = [0.2, 0.4, 0.6, 0.8]
|
||||||
|
>>> list(zip(a, b, c))
|
||||||
|
[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8)]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Also, be aware that `zip()` stops once the shortest input sequence is exhausted.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [1, 2, 3, 4, 5, 6]
|
||||||
|
>>> b = ['x', 'y', 'z']
|
||||||
|
>>> list(zip(a,b))
|
||||||
|
[(1, 'x'), (2, 'y'), (3, 'z')]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
[Next](05_Collections)
|
||||||
160
Notes/02_Working_with_data/05_Collections.md
Normal file
@@ -0,0 +1,160 @@
|
|||||||
|
# 2.5 `collections` module
|
||||||
|
|
||||||
|
The `collections` module provides a number of useful objects for data handling.
|
||||||
|
This part briefly introduces some of these features.
|
||||||
|
|
||||||
|
### Example: Counting Things
|
||||||
|
|
||||||
|
Let's say you want to tabulate the total shares of each stock.
|
||||||
|
|
||||||
|
```python
|
||||||
|
portfolio = [
|
||||||
|
('GOOG', 100, 490.1),
|
||||||
|
('IBM', 50, 91.1),
|
||||||
|
('CAT', 150, 83.44),
|
||||||
|
('IBM', 100, 45.23),
|
||||||
|
('GOOG', 75, 572.45),
|
||||||
|
('AA', 50, 23.15)
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
There are two `IBM` entries and two `GOOG` entries in this list. The shares need to be combined together somehow.
|
||||||
|
|
||||||
|
Solution: Use a `Counter`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
from collections import Counter
|
||||||
|
total_shares = Counter()
|
||||||
|
for name, shares, price in portfolio:
    total_shares[name] += shares
|
||||||
|
|
||||||
|
total_shares['IBM'] # 150
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example: One-Many Mappings
|
||||||
|
|
||||||
|
Problem: You want to map a key to multiple values.
|
||||||
|
|
||||||
|
```python
|
||||||
|
portfolio = [
|
||||||
|
('GOOG', 100, 490.1),
|
||||||
|
('IBM', 50, 91.1),
|
||||||
|
('CAT', 150, 83.44),
|
||||||
|
('IBM', 100, 45.23),
|
||||||
|
('GOOG', 75, 572.45),
|
||||||
|
('AA', 50, 23.15)
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
As in the previous example, `IBM` appears twice. This time, instead of a combined total, the key `IBM` should map to both of its tuples.
|
||||||
|
|
||||||
|
Solution: Use a `defaultdict`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
from collections import defaultdict
|
||||||
|
holdings = defaultdict(list)
|
||||||
|
for name, shares, price in portfolio:
    holdings[name].append((shares, price))
|
||||||
|
holdings['IBM'] # [ (50, 91.1), (100, 45.23) ]
|
||||||
|
```
|
||||||
|
|
||||||
|
The `defaultdict` ensures that every time you access a missing key, you get a default value (here, a new empty list).
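The default-value behavior can be seen directly in a small sketch. The key `'XYZ'` is made up for illustration, and `missing` is just a name for the result of the lookup.

```python
from collections import defaultdict

holdings = defaultdict(list)

# No need to check whether 'IBM' exists before appending.
holdings['IBM'].append((50, 91.1))
holdings['IBM'].append((100, 45.23))

# Looking up an unknown key returns the default and also stores it.
missing = holdings['XYZ']

print(holdings['IBM'])
print(missing, 'XYZ' in holdings)
```

Be aware of that last effect: merely reading a missing key adds it to the dictionary. Use `in` or `.get()` when you only want to test for a key.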
|
||||||
|
|
||||||
|
### Example: Keeping a History
|
||||||
|
|
||||||
|
Problem: We want a history of the last N things.
|
||||||
|
Solution: Use a `deque`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
from collections import deque
|
||||||
|
|
||||||
|
history = deque(maxlen=N)
|
||||||
|
with open(filename) as f:
    for line in f:
        history.append(line)
        ...
|
||||||
|
```
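To see what `maxlen` does without needing a file, here is a self-contained sketch that appends a few integers instead of lines; the choice of `maxlen=3` is arbitrary.

```python
from collections import deque

history = deque(maxlen=3)       # keep only the last 3 items

for n in range(1, 6):           # append 1, 2, 3, 4, 5
    history.append(n)           # the oldest item is dropped when full

print(list(history))
```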
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
The `collections` module might be one of the most useful library
|
||||||
|
modules for dealing with special purpose kinds of data handling
|
||||||
|
problems such as tabulating and indexing.
|
||||||
|
|
||||||
|
In this exercise, we’ll look at a few simple examples. Start by
|
||||||
|
running your `report.py` program so that you have the portfolio of
|
||||||
|
stocks loaded in the interactive mode.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 -i report.py
|
||||||
|
```
|
||||||
|
|
||||||
|
### (a) Tabulating with Counters
|
||||||
|
|
||||||
|
Suppose you wanted to tabulate the total number of shares of each stock.
|
||||||
|
This is easy using `Counter` objects. Try it:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> from collections import Counter
|
||||||
|
>>> holdings = Counter()
|
||||||
|
>>> for s in portfolio:
        holdings[s['name']] += s['shares']
|
||||||
|
|
||||||
|
>>> holdings
|
||||||
|
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Carefully observe how the multiple entries for `MSFT` and `IBM` in `portfolio` get combined into a single entry here.
|
||||||
|
|
||||||
|
You can use a Counter just like a dictionary to retrieve individual values:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> holdings['IBM']
|
||||||
|
150
|
||||||
|
>>> holdings['MSFT']
|
||||||
|
250
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to rank the values, do this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> # Get three most held stocks
|
||||||
|
>>> holdings.most_common(3)
|
||||||
|
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Let’s grab another portfolio of stocks and make a new Counter:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
|
||||||
|
>>> holdings2 = Counter()
|
||||||
|
>>> for s in portfolio2:
        holdings2[s['name']] += s['shares']
|
||||||
|
|
||||||
|
>>> holdings2
|
||||||
|
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Finally, let’s combine all of the holdings doing one simple operation:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> holdings
|
||||||
|
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
|
||||||
|
>>> holdings2
|
||||||
|
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
|
||||||
|
>>> combined = holdings + holdings2
|
||||||
|
>>> combined
|
||||||
|
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
This is only a small taste of what counters provide. However, if you
|
||||||
|
ever find yourself needing to tabulate values, you should consider
|
||||||
|
using one.
|
||||||
|
|
||||||
|
[Next](06_List_comprehension)
|
||||||
316
Notes/02_Working_with_data/06_List_comprehension.md
Normal file
@@ -0,0 +1,316 @@
|
|||||||
|
# 2.6 List Comprehensions
|
||||||
|
|
||||||
|
A common task is processing items in a list. This section introduces list comprehensions,
|
||||||
|
a useful tool for doing just that.
|
||||||
|
|
||||||
|
### Creating new lists
|
||||||
|
|
||||||
|
A list comprehension creates a new list by applying an operation to each element of a sequence.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [1, 2, 3, 4, 5]
|
||||||
|
>>> b = [2*x for x in a ]
|
||||||
|
>>> b
|
||||||
|
[2, 4, 6, 8, 10]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Another example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> names = ['Elwood', 'Jake']
|
||||||
|
>>> a = [name.lower() for name in names]
|
||||||
|
>>> a
|
||||||
|
['elwood', 'jake']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The general syntax is: `[ <expression> for <variable_name> in <sequence> ]`.
|
||||||
|
|
||||||
|
### Filtering
|
||||||
|
|
||||||
|
You can also filter during the list comprehension.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [1, -5, 4, 2, -2, 10]
|
||||||
|
>>> b = [2*x for x in a if x > 0 ]
|
||||||
|
>>> b
|
||||||
|
[2, 8, 4, 20]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Use cases
|
||||||
|
|
||||||
|
List comprehensions are hugely useful. For example, you can collect values of a specific
|
||||||
|
record field:
|
||||||
|
|
||||||
|
```python
|
||||||
|
stocknames = [s['name'] for s in stocks]
|
||||||
|
```
|
||||||
|
|
||||||
|
You can perform database-like queries on sequences.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also combine a list comprehension with a sequence reduction:
|
||||||
|
|
||||||
|
```python
|
||||||
|
cost = sum([s['shares']*s['price'] for s in stocks])
|
||||||
|
```
|
||||||
|
|
||||||
|
### General Syntax
|
||||||
|
|
||||||
|
```code
|
||||||
|
[ <expression> for <variable_name> in <sequence> if <condition>]
|
||||||
|
```
|
||||||
|
|
||||||
|
What it means:
|
||||||
|
|
||||||
|
```python
|
||||||
|
result = []
for variable_name in sequence:
    if condition:
        result.append(expression)
|
||||||
|
```
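To convince yourself that the two forms are equivalent, you can run both on the same input and compare. A quick check with made-up numbers:

```python
s = [3, -1, 4, -1, 5]

# Comprehension form
squares = [x * x for x in s if x > 0]

# Equivalent loop form
result = []
for x in s:
    if x > 0:
        result.append(x * x)

print(squares)             # [9, 16, 25]
print(squares == result)   # True
```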
|
||||||
|
|
||||||
|
### Historical Digression
|
||||||
|
|
||||||
|
List comprehensions come from math (set-builder notation).
|
||||||
|
|
||||||
|
```code
|
||||||
|
a = [ x * x for x in s if x > 0 ] # Python
|
||||||
|
|
||||||
|
a = { x^2 | x ∈ s, x > 0 } # Math
|
||||||
|
```
|
||||||
|
|
||||||
|
It is also implemented in several other languages. Most
|
||||||
|
coders probably aren't thinking about their math class though. So,
|
||||||
|
it's fine to view it as a cool list shortcut.
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
Start by running your `report.py` program so that you have the portfolio of stocks loaded in the interactive mode.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 -i report.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Now, at the Python interactive prompt, type statements to perform the operations described below.
|
||||||
|
These operations perform various kinds of data reductions, transforms, and queries on the portfolio data.
|
||||||
|
|
||||||
|
### (a) List comprehensions
|
||||||
|
|
||||||
|
Try a few simple list comprehensions just to become familiar with the syntax.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> nums = [1,2,3,4]
|
||||||
|
>>> squares = [ x * x for x in nums ]
|
||||||
|
>>> squares
|
||||||
|
[1, 4, 9, 16]
|
||||||
|
>>> twice = [ 2 * x for x in nums if x > 2 ]
|
||||||
|
>>> twice
|
||||||
|
[6, 8]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Notice how the list comprehensions are creating a new list with the data suitably transformed or filtered.
|
||||||
|
|
||||||
|
### (b) Sequence Reductions
|
||||||
|
|
||||||
|
Compute the total cost of the portfolio using a single Python statement.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> cost = sum([ s['shares'] * s['price'] for s in portfolio ])
|
||||||
|
>>> cost
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
After you have done that, show how you can compute the current value of the portfolio using a single statement.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> value = sum([ s['shares'] * prices[s['name']] for s in portfolio ])
|
||||||
|
>>> value
|
||||||
|
28686.1
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Both of the above operations are examples of a map-reduction. The list comprehension is mapping an operation across the list.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> [ s['shares'] * s['price'] for s in portfolio ]
|
||||||
|
[3220.0000000000005, 4555.0, 12516.0, 10246.0, 3835.1499999999996, 3254.9999999999995, 7044.0]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The `sum()` function is then performing a reduction across the result:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> sum(_)
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
With this knowledge, you are now ready to go launch a big-data startup company.
|
||||||
|
|
||||||
|
### (c) Data Queries
|
||||||
|
|
||||||
|
Try the following examples of various data queries.
|
||||||
|
|
||||||
|
First, a list of all portfolio holdings with more than 100 shares.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> more100 = [ s for s in portfolio if s['shares'] > 100 ]
|
||||||
|
>>> more100
|
||||||
|
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
All portfolio holdings for MSFT and IBM stocks.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> msftibm = [ s for s in portfolio if s['name'] in {'MSFT','IBM'} ]
|
||||||
|
>>> msftibm
|
||||||
|
[{'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 51.23, 'name': 'MSFT', 'shares': 200},
|
||||||
|
{'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
A list of all portfolio holdings that cost more than $10000.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> cost10k = [ s for s in portfolio if s['shares'] * s['price'] > 10000 ]
|
||||||
|
>>> cost10k
|
||||||
|
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (d) Data Extraction
|
||||||
|
|
||||||
|
Show how you could build a list of tuples `(name, shares)` where `name` and `shares` are taken from `portfolio`.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> name_shares =[ (s['name'], s['shares']) for s in portfolio ]
|
||||||
|
>>> name_shares
|
||||||
|
[('AA', 100), ('IBM', 50), ('CAT', 150), ('MSFT', 200), ('GE', 95), ('MSFT', 50), ('IBM', 100)]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you change the square brackets (`[`,`]`) to curly braces (`{`, `}`), you get something known as a set comprehension.
|
||||||
|
This gives you unique or distinct values.
|
||||||
|
|
||||||
|
For example, this determines the set of stock names that appear in `portfolio`:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> names = { s['name'] for s in portfolio }
|
||||||
|
>>> names
|
||||||
|
{'AA', 'GE', 'IBM', 'MSFT', 'CAT'}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you specify `key:value` pairs, you can build a dictionary.
|
||||||
|
For example, make a dictionary that maps the name of a stock to the total number of shares held.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> holdings = { name: 0 for name in names }
|
||||||
|
>>> holdings
|
||||||
|
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
This latter feature is known as a **dictionary comprehension**. Let’s tabulate:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> for s in portfolio:
|
||||||
|
...     holdings[s['name']] += s['shares']
...
|
||||||
|
|
||||||
|
>>> holdings
|
||||||
|
{'AA': 100, 'GE': 95, 'IBM': 150, 'MSFT': 250, 'CAT': 150}
|
||||||
|
>>>
|
||||||
|
```
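The same tabulation can also be written with `Counter` from the `collections` module, which treats missing keys as 0 so no initial dictionary is needed. This is only a sketch; the `portfolio` below is a made-up stand-in for the one loaded by `report.py`:

```python
from collections import Counter

# Hypothetical stand-in for the portfolio loaded by report.py
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

holdings = Counter()          # missing keys start at 0
for s in portfolio:
    holdings[s['name']] += s['shares']

print(holdings['IBM'])        # 150
print(holdings['AA'])         # 100
```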
|
||||||
|
|
||||||
|
Try this example that filters the `prices` dictionary down to only those names that appear in the portfolio:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio_prices = { name: prices[name] for name in names }
|
||||||
|
>>> portfolio_prices
|
||||||
|
{'AA': 9.22, 'GE': 13.48, 'IBM': 106.28, 'MSFT': 20.89, 'CAT': 35.46}
|
||||||
|
>>>
|
||||||
|
```
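Note that `prices[name]` raises a `KeyError` if a name in the portfolio has no entry in `prices`. If that can happen in your data, an `if` clause guards against it. A small sketch with made-up values:

```python
# Hypothetical data: 'GE' is held but has no current price
names = {'AA', 'IBM', 'GE'}
prices = {'AA': 9.22, 'IBM': 106.28, 'MSFT': 20.89}

# The if clause skips names that are missing from prices
portfolio_prices = {name: prices[name] for name in names if name in prices}

print(sorted(portfolio_prices))   # ['AA', 'IBM']
```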
|
||||||
|
|
||||||
|
### (e) Advanced Bonus: Extracting Data From CSV Files
|
||||||
|
|
||||||
|
Knowing how to use various combinations of list, set, and dictionary comprehensions can be useful in various forms of data processing.
|
||||||
|
Here’s an example that shows how to extract selected columns from a CSV file.
|
||||||
|
|
||||||
|
First, read a row of header information from a CSV file:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> import csv
|
||||||
|
>>> f = open('Data/portfoliodate.csv')
|
||||||
|
>>> rows = csv.reader(f)
|
||||||
|
>>> headers = next(rows)
|
||||||
|
>>> headers
|
||||||
|
['name', 'date', 'time', 'shares', 'price']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Next, define a variable that lists the columns that you actually care about:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> select = ['name', 'shares', 'price']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Now, locate the indices of the above columns in the source CSV file:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> indices = [ headers.index(colname) for colname in select ]
|
||||||
|
>>> indices
|
||||||
|
[0, 3, 4]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Finally, read a row of data and turn it into a dictionary using a dictionary comprehension:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> row = next(rows)
|
||||||
|
>>> record = { colname: row[index] for colname, index in zip(select, indices) } # dict-comprehension
|
||||||
|
>>> record
|
||||||
|
{'price': '32.20', 'name': 'AA', 'shares': '100'}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you’re feeling comfortable with what just happened, read the rest
|
||||||
|
of the file:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = [ { colname: row[index] for colname, index in zip(select, indices) } for row in rows ]
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'},
|
||||||
|
{'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'},
|
||||||
|
{'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Oh my, you just reduced much of the `read_portfolio()` function to a single statement.
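If you want to replay the whole sequence without the data file, here is a self-contained sketch. It assumes a few made-up rows held in a string (via `io.StringIO`) in place of `Data/portfoliodate.csv`:

```python
import csv
import io

# Made-up rows standing in for Data/portfoliodate.csv
text = '''name,date,time,shares,price
AA,6/11/2007,9:50am,100,32.20
IBM,5/13/2007,4:20pm,50,91.10
'''

select = ['name', 'shares', 'price']

rows = csv.reader(io.StringIO(text))
headers = next(rows)
indices = [headers.index(colname) for colname in select]

# One dictionary per remaining row, restricted to the selected columns
portfolio = [
    {colname: row[index] for colname, index in zip(select, indices)}
    for row in rows
]

print(indices)        # [0, 3, 4]
print(portfolio[0])   # {'name': 'AA', 'shares': '100', 'price': '32.20'}
```

The values are still strings. Converting them is a separate step.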
|
||||||
|
|
||||||
|
### Commentary
|
||||||
|
|
||||||
|
List comprehensions are commonly used in Python as an efficient means
|
||||||
|
for transforming, filtering, or collecting data. Due to the syntax,
|
||||||
|
you don’t want to go overboard—try to keep each list comprehension as
|
||||||
|
simple as possible. It’s okay to break things into multiple
|
||||||
|
steps. For example, it’s not clear that you would want to spring that
|
||||||
|
last example on your unsuspecting co-workers.
|
||||||
|
|
||||||
|
That said, knowing how to quickly manipulate data is a skill that’s
|
||||||
|
incredibly useful. There are numerous situations where you might have
|
||||||
|
to solve some kind of one-off problem involving data imports, exports,
|
||||||
|
extraction, and so forth. Becoming a guru master of list
|
||||||
|
comprehensions can substantially reduce the time spent devising a
|
||||||
|
solution. Also, don't forget about the `collections` module.
|
||||||
|
|
||||||
|
[Next](07_Objects)
|
||||||
408
Notes/02_Working_with_data/07_Objects.md
Normal file
@@ -0,0 +1,408 @@
|
|||||||
|
# 2.7 Objects
|
||||||
|
|
||||||
|
This section introduces more details about Python's internal object model and
|
||||||
|
discusses some matters related to memory management, copying, and type checking.
|
||||||
|
|
||||||
|
### Assignment
|
||||||
|
|
||||||
|
Many operations in Python are related to *assigning* or *storing* values.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a = value # Assignment to a variable
|
||||||
|
s[n] = value # Assignment to a list
|
||||||
|
s.append(value) # Appending to a list
|
||||||
|
d['key'] = value # Adding to a dictionary
|
||||||
|
```
|
||||||
|
|
||||||
|
*A caution: assignment operations **never make a copy** of the value being assigned.*
|
||||||
|
All assignments are merely reference copies (or pointer copies if you prefer).
|
||||||
|
|
||||||
|
### Assignment example
|
||||||
|
|
||||||
|
Consider this code fragment.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a = [1,2,3]
|
||||||
|
b = a
|
||||||
|
c = [a,b]
|
||||||
|
```
|
||||||
|
|
||||||
|
Picture the underlying memory operations. In this example, there
|
||||||
|
is only one list object `[1,2,3]`, but there are four different
|
||||||
|
references to it.
|
||||||
|
|
||||||
|
This means that modifying a value affects *all* references.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a.append(999)
|
||||||
|
>>> a
|
||||||
|
[1,2,3,999]
|
||||||
|
>>> b
|
||||||
|
[1,2,3,999]
|
||||||
|
>>> c
|
||||||
|
[[1,2,3,999], [1,2,3,999]]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Notice how a change in the original list shows up everywhere else (yikes!).
|
||||||
|
This is because no copies were ever made. Everything is pointing to the same thing.
|
||||||
|
|
||||||
|
### Reassigning values
|
||||||
|
|
||||||
|
Reassigning a value *never* overwrites the memory used by the previous value.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a = [1,2,3]
|
||||||
|
b = a
|
||||||
|
a = [4,5,6]
|
||||||
|
|
||||||
|
print(a) # [4, 5, 6]
|
||||||
|
print(b) # [1, 2, 3] Holds the original value
|
||||||
|
```
|
||||||
|
|
||||||
|
Remember: **Variables are names, not memory locations.**
|
||||||
|
|
||||||
|
### Some Dangers
|
||||||
|
|
||||||
|
If you don't know about this sharing, you will shoot yourself in the
|
||||||
|
foot at some point. Typical scenario. You modify some data thinking
|
||||||
|
that it's your own private copy and it accidentally corrupts some data
|
||||||
|
in some other part of the program.
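Here is a sketch of that scenario. The function names `add_bonus` and `add_bonus_copy` are made up for illustration. The first one modifies the list it was given, so the caller's data changes too. The second one builds a new list and leaves the original alone.

```python
def add_bonus(shares_list):
    # Modifies the list that was passed in (no copy is made)
    for i in range(len(shares_list)):
        shares_list[i] += 10
    return shares_list

def add_bonus_copy(shares_list):
    # Builds and returns a new list instead
    return [n + 10 for n in shares_list]

original = [100, 50, 150]
adjusted = add_bonus(original)
print(adjusted)    # [110, 60, 160]
print(original)    # [110, 60, 160]  (the caller's list changed too)

original = [100, 50, 150]
adjusted = add_bonus_copy(original)
print(adjusted)    # [110, 60, 160]
print(original)    # [100, 50, 150]  (unchanged)
```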
|
||||||
|
|
||||||
|
*Comment: This is one of the reasons why the primitive datatypes (int, float, string) are immutable (read-only).*
|
||||||
|
|
||||||
|
### Identity and References
|
||||||
|
|
||||||
|
Use the `is` operator to check if two values are exactly the same object.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [1,2,3]
|
||||||
|
>>> b = a
|
||||||
|
>>> a is b
|
||||||
|
True
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
`is` compares the object identity (an integer). The identity can be
|
||||||
|
obtained using `id()`.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> id(a)
|
||||||
|
3588944
|
||||||
|
>>> id(b)
|
||||||
|
3588944
|
||||||
|
>>>
|
||||||
|
```
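Identity is not the same as equality. Two separate objects can hold equal values, in which case `==` is true but `is` is false. A short illustration:

```python
a = [1, 2, 3]
b = a            # another name for the same object
c = [1, 2, 3]    # a different object with equal contents

print(a == c)            # True: the values are equal
print(a is c)            # False: they are separate objects
print(a is b)            # True
print(id(a) == id(b))    # True
```

For everyday comparisons of values, you almost always want `==`.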
|
||||||
|
|
||||||
|
### Shallow copies
|
||||||
|
|
||||||
|
Lists and dicts have methods for copying.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [2,3,[100,101],4]
|
||||||
|
>>> b = list(a) # Make a copy
|
||||||
|
>>> a is b
|
||||||
|
False
|
||||||
|
```
|
||||||
|
|
||||||
|
It's a new list, but the list items are shared.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a[2].append(102)
|
||||||
|
>>> b[2]
|
||||||
|
[100,101,102]
|
||||||
|
>>>
|
||||||
|
>>> a[2] is b[2]
|
||||||
|
True
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
For example, the inner list `[100, 101]` is being shared.
|
||||||
|
This is known as a shallow copy.
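There are several spellings that all produce this kind of copy. The sketch below (the variable names are arbitrary) checks that each one makes a new outer list while still sharing the inner list:

```python
import copy

a = [2, 3, [100, 101], 4]

b1 = list(a)
b2 = a[:]
b3 = a.copy()
b4 = copy.copy(a)

for b in (b1, b2, b3, b4):
    print(b is a, b[2] is a[2])   # False True (new outer list, shared inner list)

d = {'name': 'AA', 'lots': [100, 50]}
e = d.copy()                      # dicts have a copy() method as well
print(e is d)                     # False
print(e['lots'] is d['lots'])     # True
```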
|
||||||
|
|
||||||
|
### Deep copies
|
||||||
|
|
||||||
|
Sometimes you need to make a copy of an object and all the objects contained within it.
|
||||||
|
You can use the `copy` module for this:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = [2,3,[100,101],4]
|
||||||
|
>>> import copy
|
||||||
|
>>> b = copy.deepcopy(a)
|
||||||
|
>>> a[2].append(102)
|
||||||
|
>>> b[2]
|
||||||
|
[100,101]
|
||||||
|
>>> a[2] is b[2]
|
||||||
|
False
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Names, Values, Types
|
||||||
|
|
||||||
|
Variable names do not have a *type*. It's only a name.
|
||||||
|
However, values *do* have an underlying type.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> a = 42
|
||||||
|
>>> b = 'Hello World'
|
||||||
|
>>> type(a)
|
||||||
|
<class 'int'>
|
||||||
|
>>> type(b)
|
||||||
|
<class 'str'>
|
||||||
|
```
|
||||||
|
|
||||||
|
`type()` will tell you what it is. The type name is usually a function
|
||||||
|
that creates or converts a value to that type.
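For instance, `int`, `float`, and `str` are both type names and callable conversion functions. A brief sketch:

```python
a = 42

print(type(a) is int)      # True
print(int('42'))           # 42
print(float('32.20'))      # 32.2
print(str(42) == '42')     # True

# type(a) gives back int itself, so it can be called just like int()
print(type(a)('17') + 1)   # 18
```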
|
||||||
|
|
||||||
|
### Type Checking
|
||||||
|
|
||||||
|
How to tell if an object is a specific type.
|
||||||
|
|
||||||
|
```python
|
||||||
|
if isinstance(a, list):
    print('a is a list')
|
||||||
|
```
|
||||||
|
|
||||||
|
Checking for one of many types.
|
||||||
|
|
||||||
|
```python
|
||||||
|
if isinstance(a, (list, tuple)):
    print('a is a list or tuple')
|
||||||
|
```
|
||||||
|
|
||||||
|
*Caution: Don't go overboard with type checking. It can lead to excessive complexity.*
|
||||||
|
|
||||||
|
### Everything is an object
|
||||||
|
|
||||||
|
Numbers, strings, lists, functions, exceptions, classes, instances,
|
||||||
|
etc. are all objects. It means that all objects that can be named can
|
||||||
|
be passed around as data, placed in containers, etc., without any
|
||||||
|
restrictions. There are no *special* kinds of objects. Sometimes it
|
||||||
|
is said that all objects are "first-class".
|
||||||
|
|
||||||
|
A simple example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> import math
|
||||||
|
>>> items = [abs, math, ValueError ]
|
||||||
|
>>> items
|
||||||
|
[<built-in function abs>,
 <module 'math' (built-in)>,
 <class 'ValueError'>]
|
||||||
|
>>> items[0](-45)
|
||||||
|
45
|
||||||
|
>>> items[1].sqrt(2)
|
||||||
|
1.4142135623730951
|
||||||
|
>>> try:
|
||||||
|
...     x = int('not a number')
... except items[2]:
...     print('Failed!')
...
|
||||||
|
Failed!
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Here, `items` is a list containing a function, a module and an exception.
|
||||||
|
You can use the items in the list in place of the original names:
|
||||||
|
|
||||||
|
```python
|
||||||
|
items[0](-45) # abs
|
||||||
|
items[1].sqrt(2) # math
|
||||||
|
except items[2]: # ValueError
|
||||||
|
```
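One practical use of this is a lookup table of functions. The sketch below is only an illustration; the `operations` dictionary and the `apply_operation` function are made-up names:

```python
import math

# Hypothetical table that maps a name to a function object
operations = {
    'abs': abs,
    'sqrt': math.sqrt,
    'round': round,
}

def apply_operation(opname, value):
    func = operations[opname]    # look up the function by name
    return func(value)           # then call it

print(apply_operation('abs', -45))     # 45
print(apply_operation('sqrt', 16))     # 4.0
print(apply_operation('round', 2.75))  # 3
```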
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
In this set of exercises, we look at some of the power that comes from first-class
|
||||||
|
objects.
|
||||||
|
|
||||||
|
### (a) First-class Data
|
||||||
|
|
||||||
|
In the file `Data/portfolio.csv`, we read data organized as columns that look like this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
name,shares,price
|
||||||
|
"AA",100,32.20
|
||||||
|
"IBM",50,91.10
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
In previous code, we used the `csv` module to read the file, but still had to perform manual type conversions. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
for row in rows:
    name = row[0]
    shares = int(row[1])
    price = float(row[2])
|
||||||
|
```
|
||||||
|
|
||||||
|
This kind of conversion can also be performed in a more clever manner using some basic list operations.
|
||||||
|
|
||||||
|
Make a Python list that contains the names of the conversion functions you would use to convert each column into the appropriate type:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> types = [str, int, float]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The reason you can even create this list is that everything in Python
|
||||||
|
is *first-class*. So, if you want to have a list of functions, that’s
|
||||||
|
fine. The items in the list you created are functions for converting
|
||||||
|
a value `x` into a given type (e.g., `str(x)`, `int(x)`, `float(x)`).
|
||||||
|
|
||||||
|
Now, read a row of data from the above file:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> import csv
|
||||||
|
>>> f = open('Data/portfolio.csv')
|
||||||
|
>>> rows = csv.reader(f)
|
||||||
|
>>> headers = next(rows)
|
||||||
|
>>> row = next(rows)
|
||||||
|
>>> row
|
||||||
|
['AA', '100', '32.20']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
As noted, this row isn’t enough to do calculations because the types are wrong. For example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> row[1] * row[2]
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
TypeError: can't multiply sequence by non-int of type 'str'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
However, maybe the data can be paired up with the types you specified in `types`. For example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> types[1]
|
||||||
|
<class 'int'>
|
||||||
|
>>> row[1]
|
||||||
|
'100'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try converting one of the values:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> types[1](row[1]) # Same as int(row[1])
|
||||||
|
100
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try converting a different value:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> types[2](row[2]) # Same as float(row[2])
|
||||||
|
32.2
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try the calculation with converted values:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> types[1](row[1])*types[2](row[2])
|
||||||
|
3220.0000000000005
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Zip the column types with the fields and look at the result:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> r = list(zip(types, row))
|
||||||
|
>>> r
|
||||||
|
[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
You will notice that this has paired a type conversion with a
|
||||||
|
value. For example, `int` is paired with the value `'100'`.
|
||||||
|
|
||||||
|
The zipped list is useful if you want to perform conversions on all of the values, one
|
||||||
|
after the other. Try this:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> converted = []
|
||||||
|
>>> for func, val in zip(types, row):
|
||||||
|
...     converted.append(func(val))
...
|
||||||
|
>>> converted
|
||||||
|
['AA', 100, 32.2]
|
||||||
|
>>> converted[1] * converted[2]
|
||||||
|
3220.0000000000005
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Make sure you understand what’s happening in the above code.
|
||||||
|
In the loop, the `func` variable is one of the type conversion functions (e.g.,
|
||||||
|
`str`, `int`, etc.) and the `val` variable is one of the values like
|
||||||
|
`'AA'`, `'100'`. The expression `func(val)` is converting a value (kind of like a type cast).
|
||||||
|
|
||||||
|
The above code can be compressed into a single list comprehension.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> converted = [func(val) for func, val in zip(types, row)]
|
||||||
|
>>> converted
|
||||||
|
['AA', 100, 32.2]
|
||||||
|
>>>
|
||||||
|
```
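The same idea extends from one row to every row by nesting the comprehension inside another one. A self-contained sketch, assuming a couple of made-up rows held in a string in place of `Data/portfolio.csv`:

```python
import csv
import io

# Made-up rows standing in for Data/portfolio.csv
text = '''name,shares,price
"AA",100,32.20
"IBM",50,91.10
'''

types = [str, int, float]

rows = csv.reader(io.StringIO(text))
headers = next(rows)

# Outer comprehension walks the rows, inner one converts each field
converted_rows = [
    [func(val) for func, val in zip(types, row)]
    for row in rows
]

print(converted_rows)   # [['AA', 100, 32.2], ['IBM', 50, 91.1]]
```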
|
||||||
|
|
||||||
|
### (b) Making dictionaries
|
||||||
|
|
||||||
|
Remember how the `dict()` function can easily make a dictionary if you have a sequence of key names and values?
|
||||||
|
Let’s make a dictionary from the column headers:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> headers
|
||||||
|
['name', 'shares', 'price']
|
||||||
|
>>> converted
|
||||||
|
['AA', 100, 32.2]
|
||||||
|
>>> dict(zip(headers, converted))
|
||||||
|
{'price': 32.2, 'name': 'AA', 'shares': 100}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Of course, if you’re up on your list-comprehension fu, you can do the whole conversion in a single shot using a dict-comprehension:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> { name: func(val) for name, func, val in zip(headers, types, row) }
|
||||||
|
{'price': 32.2, 'name': 'AA', 'shares': 100}
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) The Big Picture
|
||||||
|
|
||||||
|
Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary.
|
||||||
|
|
||||||
|
Just to illustrate, suppose you read data from a different datafile like this:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> f = open('Data/dowstocks.csv')
|
||||||
|
>>> rows = csv.reader(f)
|
||||||
|
>>> headers = next(rows)
|
||||||
|
>>> row = next(rows)
|
||||||
|
>>> headers
|
||||||
|
['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
|
||||||
|
>>> row
|
||||||
|
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Let’s convert the fields using a similar trick:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> types = [str, float, str, str, float, float, float, float, int]
|
||||||
|
>>> converted = [func(val) for func, val in zip(types, row)]
|
||||||
|
>>> record = dict(zip(headers, converted))
|
||||||
|
>>> record
|
||||||
|
{'volume': 181800, 'name': 'AA', 'price': 39.48, 'high': 39.69,
|
||||||
|
'low': 39.45, 'time': '9:36am', 'date': '6/11/2007', 'open': 39.67,
|
||||||
|
'change': -0.18}
|
||||||
|
>>> record['name']
|
||||||
|
'AA'
|
||||||
|
>>> record['price']
|
||||||
|
39.48
|
||||||
|
>>>
|
||||||
|
```
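Putting the pieces together, the whole conversion can be packaged as a small function. This is a sketch under stated assumptions, not the course's solution: `read_records` is a made-up name, and the data is a trimmed, invented sample rather than the real `Data/dowstocks.csv`.

```python
import csv
import io

def read_records(lines, types):
    '''
    Hypothetical helper: turn CSV lines into a list of dictionaries,
    converting each column with the matching function in types.
    '''
    rows = csv.reader(lines)
    headers = next(rows)
    return [
        {name: func(val) for name, func, val in zip(headers, types, row)}
        for row in rows
    ]

# Invented sample with fewer columns than the real file
text = '''name,price,change,volume
AA,39.48,-0.18,181800
AXP,62.58,-0.46,935000
'''

records = read_records(io.StringIO(text), [str, float, float, int])
print(records[0]['volume'])   # 181800
print(records[1]['name'])     # AXP
```

Only the list of types changes from one file format to the next.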
|
||||||
|
|
||||||
|
Spend some time to ponder what you’ve done in this exercise. We’ll revisit these ideas a little later.
|
||||||
11
Notes/03_Program_organization/00_Overview.md
Normal file
@@ -0,0 +1,11 @@
|
|||||||
|
# Overview
|
||||||
|
|
||||||
|
In this section you will learn:
|
||||||
|
|
||||||
|
* How to organize larger programs.
|
||||||
|
* Defining and working with functions.
|
||||||
|
* Exceptions and Error handling.
|
||||||
|
* Basic module management.
|
||||||
|
* Script writing.
|
||||||
|
|
||||||
|
Python is great for short scripts, one-off problems, prototyping, testing, etc.
|
||||||
275
Notes/03_Program_organization/01_Script.md
Normal file
@@ -0,0 +1,275 @@
|
|||||||
|
# 3.1 Python Scripting
|
||||||
|
|
||||||
|
In this part we look more closely at the practice of writing Python
|
||||||
|
scripts.
|
||||||
|
|
||||||
|
### What is a Script?
|
||||||
|
|
||||||
|
A *script* is a program that runs a series of statements and stops.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# program.py
|
||||||
|
|
||||||
|
statement1
|
||||||
|
statement2
|
||||||
|
statement3
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
We have been writing scripts to this point.
|
||||||
|
|
||||||
|
### A Problem
|
||||||
|
|
||||||
|
If you write a useful script, it will grow in features and
|
||||||
|
functionality. You may want to apply it to other related problems.
|
||||||
|
Over time, it might become a critical application. And if you don't
|
||||||
|
take care, it might turn into a huge tangled mess. So, let's get
|
||||||
|
organized.
|
||||||
|
|
||||||
|
### Defining Things
|
||||||
|
|
||||||
|
You must always define things before they get used later on in a program.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def square(x):
    return x*x
|
||||||
|
|
||||||
|
a = 42
|
||||||
|
b = a + 2 # Requires that `a` is defined
|
||||||
|
|
||||||
|
z = square(b) # Requires `square` and `b` to be defined
|
||||||
|
```
|
||||||
|
|
||||||
|
**The order is important.**
|
||||||
|
You almost always put the definitions of variables and functions near the beginning.
|
||||||
|
|
||||||
|
### Defining Functions
|
||||||
|
|
||||||
|
It is a good idea to put all of the code related to a single *task* all in one place.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import csv

def read_prices(filename):
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
|
||||||
|
```
|
||||||
|
|
||||||
|
A function also simplifies repeated operations.
|
||||||
|
|
||||||
|
```python
|
||||||
|
oldprices = read_prices('oldprices.csv')
|
||||||
|
newprices = read_prices('newprices.csv')
|
||||||
|
```
|
||||||
|
|
||||||
|
### What is a Function?
|
||||||
|
|
||||||
|
A function is a named sequence of statements.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def funcname(args):
    statement
    statement
    ...
    return result
|
||||||
|
```
|
||||||
|
|
||||||
|
*Any* Python statement can be used inside.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def foo():
    import math
    print(math.sqrt(2))
    help(math)
|
||||||
|
```
|
||||||
|
|
||||||
|
There are no *special* statements in Python.
|
||||||
|
|
||||||
|
### Function Definition
|
||||||
|
|
||||||
|
Functions can be *defined* in any order.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def foo(x):
    bar(x)

def bar(x):
    statements

# OR
def bar(x):
    statements

def foo(x):
    bar(x)
|
||||||
|
```
|
||||||
|
|
||||||
|
Functions only need to be defined before they are actually *used* (or called) during program execution.
|
||||||
|
|
||||||
|
```python
|
||||||
|
foo(3) # foo must be defined already
|
||||||
|
```
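The sketch below shows both sides of this rule. The names `foo`, `bar`, and `baz` are placeholders. `foo` may refer to `bar` before `bar` is defined, because the name is only looked up when `foo` runs. Calling `baz` before its `def` statement has executed fails with a `NameError`.

```python
def foo(x):
    # bar is looked up when foo is called, not when foo is defined
    return bar(x) + 1

def bar(x):
    return x * 2

print(foo(3))             # 7

# Calling a function before its def statement has executed fails
called_too_early = False
try:
    baz(3)
except NameError:
    called_too_early = True

def baz(x):
    return x

print(called_too_early)   # True
print(baz(3))             # 3
```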
|
||||||
|
|
||||||
|
Stylistically, it is probably more common to see functions defined in a *bottom-up* fashion.
|
||||||
|
|
||||||
|
### Bottom-up Style
|
||||||
|
|
||||||
|
Functions are treated as building blocks.
|
||||||
|
The smaller/simpler blocks go first.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# myprogram.py
|
||||||
|
def foo(x):
    ...

def bar(x):
    ...
    foo(x)          # Defined above
    ...

def spam(x):
    ...
    bar(x)          # Defined above
    ...

spam(42)            # Code that uses the functions appears at the end
|
||||||
|
```
|
||||||
|
|
||||||
|
Later functions build upon earlier functions.
|
||||||
|
|
||||||
|
### Function Design
|
||||||
|
|
||||||
|
Ideally, functions should be a *black box*.
|
||||||
|
They should only operate on passed inputs and avoid global variables
|
||||||
|
and mysterious side-effects. Main goals: *Modularity* and *Predictability*.
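To make the contrast concrete, here is a small sketch with made-up data and function names. The first function quietly depends on a global variable. The second works only on what it is given.

```python
# Made-up sample data
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.0},
    {'name': 'IBM', 'shares': 50, 'price': 91.0},
]

def total_cost_global():
    # Depends on a global variable named portfolio existing
    return sum([s['shares'] * s['price'] for s in portfolio])

def total_cost(holdings):
    # Works on whatever is passed in; no hidden dependencies
    return sum([s['shares'] * s['price'] for s in holdings])

print(total_cost_global())          # 7750.0
print(total_cost(portfolio))        # 7750.0
print(total_cost(portfolio[:1]))    # 3200.0
```

The second version is easier to test and reuse because its result depends only on its argument.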
|
||||||
|
|
||||||
|
### Doc Strings
|
||||||
|
|
||||||
|
A good practice is to include documentation in the form of
doc-strings. A doc-string is a string written as the first statement of the
function body, immediately after the `def` line. Doc-strings feed `help()`, IDEs and other tools.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import csv

def read_prices(filename):
    '''
    Read prices from a CSV file of name,price
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
|
||||||
|
```
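The doc-string is stored on the function itself as the `__doc__` attribute, which is what `help()` displays. A tiny sketch with a made-up function:

```python
def square(x):
    '''
    Return x multiplied by itself.
    '''
    return x * x

print(square.__doc__.strip())   # Return x multiplied by itself.

# In interactive mode, help(square) shows the same text
```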
|
||||||
|
|
||||||
|
### Type Annotations
|
||||||
|
|
||||||
|
You can also add some optional type annotations to your function definitions.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import csv

def read_prices(filename: str) -> dict:
    '''
    Read prices from a CSV file of name,price
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
|
||||||
|
```
|
||||||
|
|
||||||
|
These annotations do nothing at runtime. They are purely informational.
|
||||||
|
They may be used by IDEs, code checkers, etc.
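You can check this for yourself. In the sketch below (`double` is a made-up function), passing a string to a parameter annotated as `int` raises no error, and the annotations are simply stored on the function:

```python
def double(x: int) -> int:
    return x * 2

print(double(4))       # 8
print(double('ab'))    # abab  (no error: annotations are not checked at runtime)

# The annotations are kept on the function for tools to inspect
print(double.__annotations__['x'] is int)   # True
```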
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
In section 2, you wrote a program called `report.py` that printed out a report showing the performance of a stock portfolio.
|
||||||
|
This program consisted of some functions. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# report.py
|
||||||
|
import csv
|
||||||
|
|
||||||
|
def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price.
    '''
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)

        for row in rows:
            record = dict(zip(headers, row))
            stock = {
                'name' : record['name'],
                'shares' : int(record['shares']),
                'price' : float(record['price'])
            }
            portfolio.append(stock)
    return portfolio
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
However, there were also portions of the program that just performed a series of scripted calculations.
|
||||||
|
This code appeared near the end of the program. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
...
|
||||||
|
|
||||||
|
# Output the report
|
||||||
|
|
||||||
|
headers = ('Name', 'Shares', 'Price', 'Change')
|
||||||
|
print('%10s %10s %10s %10s' % headers)
|
||||||
|
print(('-' * 10 + ' ') * len(headers))
|
||||||
|
for row in report:
    print('%10s %10d %10.2f %10.2f' % row)
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
In this exercise, we’re going to take this program and organize it a little more strongly around the use of functions.
|
||||||
|
|
||||||
|
### (a) Structuring a program as a collection of functions
|
||||||
|
|
||||||
|
Modify your `report.py` program so that all major operations,
|
||||||
|
including calculations and output, are carried out by a collection of
|
||||||
|
functions. Specifically:
|
||||||
|
|
||||||
|
* Create a function `print_report(report)` that prints out the report.
|
||||||
|
* Change the last part of the program so that it is nothing more than a series of function calls and no other computation.
|
||||||
|
|
||||||
|
### (b) Creating a function for program execution
|
||||||
|
|
||||||
|
Take the last part of your program and package it into a single function `portfolio_report(portfolio_filename, prices_filename)`.
|
||||||
|
Have the function work so that the following function call creates the report as before:
|
||||||
|
|
||||||
|
```python
|
||||||
|
portfolio_report('Data/portfolio.csv', 'Data/prices.csv')
|
||||||
|
```
|
||||||
|
|
||||||
|
In this final version, your program will be nothing more than a series
|
||||||
|
of function definitions followed by a single function call to
|
||||||
|
`portfolio_report()` at the very end (which executes all of the steps
|
||||||
|
involved in the program).
|
||||||
|
|
||||||
|
By turning your program into a single function, it becomes easy to run it on different inputs.
|
||||||
|
For example, try these statements interactively after running your program:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> portfolio_report('Data/portfolio2.csv', 'Data/prices.csv')
|
||||||
|
... look at the output ...
|
||||||
|
>>> files = ['Data/portfolio.csv', 'Data/portfolio2.csv']
|
||||||
|
>>> for name in files:
|
||||||
|
print(f'{name:-^43s}')
|
||||||
|
portfolio_report(name, 'Data/prices.csv')
|
||||||
|
print()
|
||||||
|
|
||||||
|
... look at the output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
[Next](02_More_functions)
|
||||||
491
Notes/03_Program_organization/02_More_functions.md
Normal file
@@ -0,0 +1,491 @@
|
|||||||
|
# 3.2 More on Functions
|
||||||
|
|
||||||
|
This section fills in a few more details about how functions work and are defined.
|
||||||
|
|
||||||
|
### Calling a Function
|
||||||
|
|
||||||
|
Consider this function:
|
||||||
|
|
||||||
|
```python
|
||||||
|
def read_prices(filename, debug):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
You can call the function with positional arguments:
|
||||||
|
|
||||||
|
```python
|
||||||
|
prices = read_prices('prices.csv', True)
|
||||||
|
```
|
||||||
|
|
||||||
|
Or you can call the function with keyword arguments:
|
||||||
|
|
||||||
|
```python
|
||||||
|
prices = read_prices(filename='prices.csv', debug=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Default Arguments
|
||||||
|
|
||||||
|
Sometimes you want an optional argument.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def read_prices(filename, debug=False):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
If a default value is assigned, the argument is optional in function calls.
|
||||||
|
|
||||||
|
```python
|
||||||
|
d = read_prices('prices.csv')
|
||||||
|
e = read_prices('prices.dat', True)
|
||||||
|
```
|
||||||
|
|
||||||
|
*Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).*
|
||||||
|
|
||||||
|
### Prefer keyword arguments for optional arguments
|
||||||
|
|
||||||
|
Compare and contrast these two different calling styles:
|
||||||
|
|
||||||
|
```python
|
||||||
|
parse_data(data, False, True) # ?????
|
||||||
|
|
||||||
|
parse_data(data, ignore_errors=True)
|
||||||
|
parse_data(data, debug=True)
|
||||||
|
parse_data(data, debug=True, ignore_errors=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
Keyword arguments improve code clarity.
|
||||||
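As a small illustration of the two calling styles, here is a sketch built around a hypothetical `parse_data()` function. The function body and its `debug` and `ignore_errors` parameters are assumptions made up for this example, not part of any real library.

```python
# Hypothetical function; the parameter names are assumptions for illustration.
def parse_data(data, debug=False, ignore_errors=False):
    # Return the settings in effect so the two calls can be compared.
    return {'debug': debug, 'ignore_errors': ignore_errors}

data = [1, 2, 3]

positional = parse_data(data, False, True)       # Which flag is which?
keyword = parse_data(data, ignore_errors=True)   # The intent is obvious

print(positional == keyword)
```

Both calls pass the same settings, but only the keyword form can be understood without looking up the function definition.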
|
|
||||||
|
### Design Best Practices
|
||||||
|
|
||||||
|
Always give short but meaningful names to function arguments.
|
||||||
|
|
||||||
|
Someone using a function may want to use the keyword calling style.
|
||||||
|
|
||||||
|
```python
|
||||||
|
d = read_prices('prices.csv', debug=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
Python development tools will show the names in help features and documentation.
|
||||||
|
|
||||||
|
### Return Values
|
||||||
|
|
||||||
|
The `return` statement returns a value.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def square(x):
|
||||||
|
return x * x
|
||||||
|
```
|
||||||
|
|
||||||
|
If no return value is given or `return` is not specified, `None` is returned.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def bar(x):
|
||||||
|
statements
|
||||||
|
return
|
||||||
|
|
||||||
|
a = bar(4) # a = None
|
||||||
|
|
||||||
|
# OR
|
||||||
|
def foo(x):
|
||||||
|
statements # No `return`
|
||||||
|
|
||||||
|
b = foo(4) # b = None
|
||||||
|
```
|
||||||
|
|
||||||
|
### Multiple Return Values
|
||||||
|
|
||||||
|
Functions can only return one value.
|
||||||
|
However, a function may return multiple values by returning a tuple.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def divide(a,b):
|
||||||
|
q = a // b # Quotient
|
||||||
|
r = a % b # Remainder
|
||||||
|
return q, r # Return a tuple
|
||||||
|
```
|
||||||
|
|
||||||
|
Usage example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
x, y = divide(37,5) # x = 7, y = 2
|
||||||
|
|
||||||
|
x = divide(37, 5) # x = (7, 2)
|
||||||
|
```
|
||||||
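The following sketch repeats the `divide()` function from above to show that the "multiple values" are really a single tuple, which can then be unpacked. The comparison with the built-in `divmod()` is added here only as a cross-check.

```python
def divide(a, b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Packs both values into one tuple

result = divide(37, 5)      # A single tuple object
q, r = result               # Unpacking the tuple into two names

print(type(result).__name__, q, r)
print(result == divmod(37, 5))   # The built-in divmod() returns the same pair
```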
|
|
||||||
|
### Variable Scope
|
||||||
|
|
||||||
|
Programs assign values to variables.
|
||||||
|
|
||||||
|
```python
|
||||||
|
x = value # Global variable
|
||||||
|
|
||||||
|
def foo():
|
||||||
|
y = value # Local variable
|
||||||
|
```
|
||||||
|
|
||||||
|
Variable assignments occur outside and inside function definitions.
|
||||||
|
Variables defined outside are "global". Variables inside a function are "local".
|
||||||
|
|
||||||
|
### Local Variables
|
||||||
|
|
||||||
|
Variables inside functions are private.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def read_portfolio(filename):
|
||||||
|
portfolio = []
|
||||||
|
for line in open(filename):
|
||||||
|
fields = line.split()
|
||||||
|
s = (fields[0],int(fields[1]),float(fields[2]))
|
||||||
|
portfolio.append(s)
|
||||||
|
return portfolio
|
||||||
|
```
|
||||||
|
|
||||||
|
In this example, `filename`, `portfolio`, `line`, `fields` and `s` are local variables.
|
||||||
|
Those variables are not retained or accessible after the function call.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> stocks = read_portfolio('stocks.dat')
|
||||||
|
>>> fields
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
NameError: name 'fields' is not defined
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
They also can't conflict with variables found elsewhere.
|
||||||
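A minimal sketch of that last point: a local variable can share a name with a global variable without disturbing it. The names `total` and `compute()` are made up for this example.

```python
total = 100              # Global variable

def compute():
    total = 0            # Local variable; unrelated to the global `total`
    for n in [1, 2, 3]:
        total += n
    return total

result = compute()
print(result, total)     # The global `total` is unchanged by the call
```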
|
|
||||||
|
### Global Variables
|
||||||
|
|
||||||
|
Functions can freely access the values of globals.
|
||||||
|
|
||||||
|
```python
|
||||||
|
name = 'Dave'
|
||||||
|
|
||||||
|
def greeting():
|
||||||
|
print('Hello', name) # Using `name` global variable
|
||||||
|
```
|
||||||
|
|
||||||
|
However, functions can't modify globals by plain assignment:
|
||||||
|
|
||||||
|
```python
|
||||||
|
name = 'Dave'
|
||||||
|
|
||||||
|
def spam():
|
||||||
|
name = 'Guido'
|
||||||
|
|
||||||
|
spam()
|
||||||
|
print(name) # prints 'Dave'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Remember: All assignments in functions are local.**
|
||||||
|
|
||||||
|
### Modifying Globals
|
||||||
|
|
||||||
|
If you must modify a global variable you must declare it as such.
|
||||||
|
|
||||||
|
```python
|
||||||
|
name = 'Dave'
|
||||||
|
def spam():
|
||||||
|
global name
|
||||||
|
name = 'Guido' # Changes the global name above
|
||||||
|
```
|
||||||
|
|
||||||
|
The global declaration must appear before its use. Having seen this,
|
||||||
|
know that it is considered poor form. In fact, try to avoid `global` entirely
|
||||||
|
if you can. If you need a function to modify some kind of state outside
|
||||||
|
of the function, it's better to use a class instead (more on this later).
|
||||||
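To see the difference side by side, here is a short sketch that runs both variants. The function names `spam_local()` and `spam_global()` are invented for this comparison.

```python
name = 'Dave'

def spam_local():
    name = 'Guido'       # Creates a new local variable

def spam_global():
    global name          # Declares that `name` refers to the global
    name = 'Guido'       # Rebinds the global variable

spam_local()
after_local = name       # Still the original value

spam_global()
after_global = name      # Now changed

print(after_local, after_global)
```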
|
|
||||||
|
### Argument Passing
|
||||||
|
|
||||||
|
When you call a function, the argument variables are names for passed values.
|
||||||
|
If mutable data types are passed (e.g. lists, dicts), they can be modified *in-place*.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def foo(items):
|
||||||
|
items.append(42) # Modifies the input object
|
||||||
|
|
||||||
|
a = [1, 2, 3]
|
||||||
|
foo(a)
|
||||||
|
print(a) # [1, 2, 3, 42]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key point: Functions don't receive a copy of the input arguments.**
|
||||||
|
|
||||||
|
### Reassignment vs Modifying
|
||||||
|
|
||||||
|
Make sure you understand the subtle difference between modifying a value and reassigning a variable name.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def foo(items):
|
||||||
|
items.append(42) # Modifies the input object
|
||||||
|
|
||||||
|
a = [1, 2, 3]
|
||||||
|
foo(a)
|
||||||
|
print(a) # [1, 2, 3, 42]
|
||||||
|
|
||||||
|
# VS
|
||||||
|
def bar(items):
|
||||||
|
items = [4,5,6] # Reassigns `items` variable
|
||||||
|
|
||||||
|
b = [1, 2, 3]
|
||||||
|
bar(b)
|
||||||
|
print(b) # [1, 2, 3]
|
||||||
|
```
|
||||||
|
|
||||||
|
*Reminder: Variable assignment never overwrites memory. The name is simply bound to a new value.*
|
||||||
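One practical consequence, sketched below under the assumption that you want to leave the caller's list untouched: rebind the parameter to a copy before modifying it. The function name `append_copy()` is made up for this example.

```python
def append_copy(items):
    items = list(items)   # Rebinds `items` to a new list (a shallow copy)
    items.append(42)      # Modifies only the copy
    return items

a = [1, 2, 3]
b = append_copy(a)

print(a)   # The caller's list is unchanged
print(b)   # The copy has the extra element
```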
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
This exercise involves a lot of steps and putting concepts together from past exercises.
|
||||||
|
The final solution is only about 25 lines of code, but take your time and make sure you understand each part.
|
||||||
|
|
||||||
|
A central part of your `report.py` program focuses on the reading of
|
||||||
|
CSV files. For example, the function `read_portfolio()` reads a file
|
||||||
|
containing rows of portfolio data and the function `read_prices()`
|
||||||
|
reads a file containing rows of price data. In both of those
|
||||||
|
functions, there are a lot of low-level "fiddly" bits and similar
|
||||||
|
features. For example, they both open a file and wrap it with the
|
||||||
|
`csv` module and they both convert various fields into new types.
|
||||||
|
|
||||||
|
If you were doing a lot of file parsing for real, you’d probably want
|
||||||
|
to clean some of this up and make it more general purpose. That's
|
||||||
|
our goal.
|
||||||
|
|
||||||
|
Start this exercise by creating a new file called `fileparse.py`. This is where we will be doing our work.
|
||||||
|
|
||||||
|
### (a) Reading CSV Files
|
||||||
|
|
||||||
|
To start, let’s just focus on the problem of reading a CSV file into a
|
||||||
|
list of dictionaries. In the file `fileparse.py`, define a simple
|
||||||
|
function that looks like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# fileparse.py
|
||||||
|
import csv
|
||||||
|
|
||||||
|
def parse_csv(filename):
|
||||||
|
'''
|
||||||
|
Parse a CSV file into a list of records
|
||||||
|
'''
|
||||||
|
with open(filename) as f:
|
||||||
|
rows = csv.reader(f)
|
||||||
|
|
||||||
|
# Read the file headers
|
||||||
|
headers = next(rows)
|
||||||
|
records = []
|
||||||
|
for row in rows:
|
||||||
|
if not row: # Skip rows with no data
|
||||||
|
continue
|
||||||
|
record = dict(zip(headers, row))
|
||||||
|
records.append(record)
|
||||||
|
|
||||||
|
return records
|
||||||
|
```
|
||||||
|
|
||||||
|
This function reads a CSV file into a list of dictionaries while
|
||||||
|
hiding the details of opening the file, wrapping it with the `csv`
|
||||||
|
module, ignoring blank lines, and so forth.
|
||||||
|
|
||||||
|
Try it out:
|
||||||
|
|
||||||
|
Hint: `python3 -i fileparse.py`.
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = parse_csv('Data/portfolio.csv')
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
This is great except that you can’t do any kind of useful calculation with the data because everything is represented as a string.
|
||||||
|
We’ll fix this shortly, but let’s keep building on it.
|
||||||
|
|
||||||
|
### (b) Building a Column Selector
|
||||||
|
|
||||||
|
In many cases, you’re only interested in selected columns from a CSV file, not all of the data.
|
||||||
|
Modify the `parse_csv()` function so that it optionally allows user-specified columns to be picked out as follows:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> # Read all of the data
|
||||||
|
>>> portfolio = parse_csv('Data/portfolio.csv')
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
|
||||||
|
|
||||||
|
>>> # Read some of the data
|
||||||
|
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
|
||||||
|
>>> shares_held
|
||||||
|
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
An example of a column selector was given in Section 2.5.
|
||||||
|
However, here’s one way to do it:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# fileparse.py
|
||||||
|
import csv
|
||||||
|
|
||||||
|
def parse_csv(filename, select=None):
|
||||||
|
'''
|
||||||
|
Parse a CSV file into a list of records
|
||||||
|
'''
|
||||||
|
with open(filename) as f:
|
||||||
|
rows = csv.reader(f)
|
||||||
|
|
||||||
|
# Read the file headers
|
||||||
|
headers = next(rows)
|
||||||
|
|
||||||
|
# If a column selector was given, find indices of the specified columns.
|
||||||
|
# Also narrow the set of headers used for resulting dictionaries
|
||||||
|
if select:
|
||||||
|
indices = [headers.index(colname) for colname in select]
|
||||||
|
headers = select
|
||||||
|
else:
|
||||||
|
indices = []
|
||||||
|
|
||||||
|
records = []
|
||||||
|
for row in rows:
|
||||||
|
if not row: # Skip rows with no data
|
||||||
|
continue
|
||||||
|
# Filter the row if specific columns were selected
|
||||||
|
if indices:
|
||||||
|
row = [ row[index] for index in indices ]
|
||||||
|
|
||||||
|
# Make a dictionary
|
||||||
|
record = dict(zip(headers, row))
|
||||||
|
records.append(record)
|
||||||
|
|
||||||
|
return records
|
||||||
|
```
|
||||||
|
|
||||||
|
There are a number of tricky bits to this part. Probably the most important one is the mapping of the selected column names to indices within each row.
|
||||||
|
For example, suppose the input file had the following headers:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> headers = ['name', 'date', 'time', 'shares', 'price']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Now, suppose the selected columns were as follows:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> select = ['name', 'shares']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To perform the proper selection, you have to map the selected column names to column indices in the file.
|
||||||
|
That’s what this step is doing:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> indices = [headers.index(colname) for colname in select ]
|
||||||
|
>>> indices
|
||||||
|
[0, 3]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
In other words, "name" is column 0 and "shares" is column 3.
|
||||||
|
When you read a row of data from the file, the indices are used to filter it:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
|
||||||
|
>>> row = [ row[index] for index in indices ]
|
||||||
|
>>> row
|
||||||
|
['AA', '100']
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) Performing Type Conversion
|
||||||
|
|
||||||
|
Modify the `parse_csv()` function so that it optionally allows type-conversions to be applied to the returned data.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
|
||||||
|
|
||||||
|
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
|
||||||
|
>>> shares_held
|
||||||
|
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
You already explored this in Exercise 2.7. You'll need to insert the
|
||||||
|
following fragment of code into your solution:
|
||||||
|
|
||||||
|
```python
|
||||||
|
...
|
||||||
|
if types:
|
||||||
|
row = [func(val) for func, val in zip(types, row) ]
|
||||||
|
...
|
||||||
|
```
|
||||||
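To see what that fragment does in isolation, here is a small sketch applied to a single row. The sample row is taken from the portfolio data shown earlier.

```python
types = [str, int, float]
row = ['AA', '100', '32.20']      # Raw strings, as produced by csv.reader

# Pair each conversion function with the corresponding value and apply it
converted = [func(val) for func, val in zip(types, row)]

print(converted)
```

Each entry in `types` is an ordinary callable, so any function that takes a string and returns a value could be used in its place.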
|
|
||||||
|
### (d) Working with Headers
|
||||||
|
|
||||||
|
Some CSV files don’t include any header information.
|
||||||
|
For example, the file `prices.csv` looks like this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
"AA",9.22
|
||||||
|
"AXP",24.85
|
||||||
|
"BA",44.85
|
||||||
|
"BAC",11.27
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Modify the `parse_csv()` function so that it can work with such files by creating a list of tuples instead.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
|
||||||
|
>>> prices
|
||||||
|
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To make this change, you’ll need to modify the code so that the first
|
||||||
|
line of data isn’t interpreted as a header line. Also, you’ll need to
|
||||||
|
make sure you don’t create dictionaries as there are no longer any
|
||||||
|
column names to use for keys.
|
||||||
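One possible shape for this logic is sketched below. It is not the reference solution: the helper name `parse_rows()` is an assumption, and it reads from an in-memory list of lines rather than a file so that it can be run on its own.

```python
import csv

def parse_rows(lines, has_headers=True):
    '''
    Parse an iterable of CSV lines into dicts (with headers) or tuples (without).
    '''
    rows = csv.reader(lines)
    # Only consume the first line as headers when the data actually has them
    headers = next(rows) if has_headers else []
    records = []
    for row in rows:
        if not row:          # Skip rows with no data
            continue
        if has_headers:
            records.append(dict(zip(headers, row)))
        else:
            records.append(tuple(row))
    return records

prices = parse_rows(['"AA",9.22', '"AXP",24.85'], has_headers=False)
print(prices)
```

Type conversion is left out here to keep the sketch focused on the header handling.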
|
|
||||||
|
### (e) Picking a different column delimiter
|
||||||
|
|
||||||
|
Although CSV files are pretty common, it’s also possible that you could encounter a file that uses a different column separator such as a tab or space.
|
||||||
|
For example, the file `Data/portfolio.dat` looks like this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
name shares price
|
||||||
|
"AA" 100 32.20
|
||||||
|
"IBM" 50 91.10
|
||||||
|
"CAT" 150 83.44
|
||||||
|
"MSFT" 200 51.23
|
||||||
|
"GE" 95 40.37
|
||||||
|
"MSFT" 50 65.10
|
||||||
|
"IBM" 100 70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
The `csv.reader()` function allows a different delimiter to be given as follows:
|
||||||
|
|
||||||
|
```python
|
||||||
|
rows = csv.reader(f, delimiter=' ')
|
||||||
|
```
|
||||||
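As a quick check of how the `delimiter` argument behaves, this sketch feeds two space-separated lines to `csv.reader()` directly. The lines are held in a list instead of a file so the example is self-contained.

```python
import csv

lines = [
    'name shares price',
    '"AA" 100 32.20',
]

# csv.reader accepts any iterable of strings, not only file objects
rows = list(csv.reader(lines, delimiter=' '))

print(rows)
```

Note that the quotes around `"AA"` are still removed, just as they are with the default comma delimiter.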
|
|
||||||
|
Modify your `parse_csv()` function so that it also allows the delimiter to be changed.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you’ve made it this far, you’ve created a nice library function that’s genuinely useful.
|
||||||
|
You can use it to parse arbitrary CSV files, select out columns of
|
||||||
|
interest, and perform type conversions without having to worry too much
|
||||||
|
about the inner workings of files or the `csv` module.
|
||||||
|
|
||||||
|
Nice!
|
||||||
|
|
||||||
|
[Next](03_Error_checking)
|
||||||
393
Notes/03_Program_organization/03_Error_checking.md
Normal file
@@ -0,0 +1,393 @@
|
|||||||
|
# 3.3 Error Checking
|
||||||
|
|
||||||
|
This section discusses some aspects of error checking and exception handling.
|
||||||
|
|
||||||
|
### How programs fail
|
||||||
|
|
||||||
|
Python performs no checking or validation of function argument types or values.
|
||||||
|
A function will work on any data that is compatible with the statements in the function.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def add(x, y):
|
||||||
|
return x + y
|
||||||
|
|
||||||
|
add(3, 4) # 7
|
||||||
|
add('Hello', 'World') # 'HelloWorld'
|
||||||
|
add('3', '4') # '34'
|
||||||
|
```
|
||||||
|
|
||||||
|
If there are errors in a function, they will show up at run time (as an exception).
|
||||||
|
|
||||||
|
```python
|
||||||
|
def add(x, y):
|
||||||
|
return x + y
|
||||||
|
|
||||||
|
>>> add(3, '4')
|
||||||
|
Traceback (most recent call last):
|
||||||
|
...
|
||||||
|
TypeError: unsupported operand type(s) for +:
|
||||||
|
'int' and 'str'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
To verify code, there is a strong emphasis on testing (covered later).
|
||||||
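The failure above can also be observed from within a program. The sketch below reuses the `add()` function and captures the error message in a variable; the name `message` is just for illustration.

```python
def add(x, y):
    return x + y

try:
    add(3, '4')                 # Incompatible types: int + str
except TypeError as e:
    message = str(e)            # The text shown at the end of the traceback

print(message)
```

Nothing is checked when `add()` is defined or called; the error only appears when the `+` operation actually runs.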
|
|
||||||
|
### Exceptions
|
||||||
|
|
||||||
|
Exceptions are used to signal errors.
|
||||||
|
To raise an exception yourself, use the `raise` statement.
|
||||||
|
|
||||||
|
```python
|
||||||
|
if name not in names:
|
||||||
|
raise RuntimeError('Name not found')
|
||||||
|
```
|
||||||
|
|
||||||
|
To catch an exception use `try-except`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
authenticate(username)
|
||||||
|
except RuntimeError as e:
|
||||||
|
print(e)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Exception Handling
|
||||||
|
|
||||||
|
Exceptions propagate to the first matching `except`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def grok():
|
||||||
|
...
|
||||||
|
raise RuntimeError('Whoa!') # Exception raised here
|
||||||
|
|
||||||
|
def spam():
|
||||||
|
grok() # Call that will raise exception
|
||||||
|
|
||||||
|
def bar():
|
||||||
|
try:
|
||||||
|
spam()
|
||||||
|
except RuntimeError as e: # Exception caught here
|
||||||
|
...
|
||||||
|
|
||||||
|
def foo():
|
||||||
|
try:
|
||||||
|
bar()
|
||||||
|
except RuntimeError as e: # Exception does NOT arrive here
|
||||||
|
...
|
||||||
|
|
||||||
|
foo()
|
||||||
|
```
|
||||||
|
|
||||||
|
To handle the exception, use the `except` block. You can add any statements you want to handle the error.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def grok():
|
||||||
|
raise RuntimeError('Whoa!')
|
||||||
|
|
||||||
|
def bar():
|
||||||
|
try:
|
||||||
|
grok()
|
||||||
|
except RuntimeError as e: # Exception caught here
|
||||||
|
statements # Use these statements
|
||||||
|
statements
|
||||||
|
...
|
||||||
|
|
||||||
|
bar()
|
||||||
|
```
|
||||||
|
|
||||||
|
After handling, execution resumes with the first statement after the `try-except`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def grok():
|
||||||
|
raise RuntimeError('Whoa!')
|
||||||
|
|
||||||
|
def bar():
|
||||||
|
try:
|
||||||
|
grok()
|
||||||
|
except RuntimeError as e: # Exception caught here
|
||||||
|
statements
|
||||||
|
statements
|
||||||
|
...
|
||||||
|
statements # Resumes execution here
|
||||||
|
statements # And continues here
|
||||||
|
...
|
||||||
|
|
||||||
|
bar()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Built-in Exceptions
|
||||||
|
|
||||||
|
Python has several dozen built-in exceptions.
|
||||||
|
The list below is not exhaustive. Check the documentation for more.
|
||||||
|
|
||||||
|
```python
|
||||||
|
ArithmeticError
|
||||||
|
AssertionError
|
||||||
|
EnvironmentError
|
||||||
|
EOFError
|
||||||
|
ImportError
|
||||||
|
IndexError
|
||||||
|
KeyboardInterrupt
|
||||||
|
KeyError
|
||||||
|
MemoryError
|
||||||
|
NameError
|
||||||
|
ReferenceError
|
||||||
|
RuntimeError
|
||||||
|
SyntaxError
|
||||||
|
SystemError
|
||||||
|
TypeError
|
||||||
|
ValueError
|
||||||
|
```
|
||||||
|
|
||||||
|
### Exception Values
|
||||||
|
|
||||||
|
Most exceptions have an associated value. It contains more information about what's wrong.
|
||||||
|
|
||||||
|
```python
|
||||||
|
raise RuntimeError('Invalid user name')
|
||||||
|
```
|
||||||
|
|
||||||
|
This value is passed to the variable supplied in `except`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
...
|
||||||
|
except RuntimeError as e: # `e` holds the value raised
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
The value is an instance of the exception type. However, it often looks like a string when
|
||||||
|
printed.
|
||||||
|
|
||||||
|
```python
|
||||||
|
except RuntimeError as e:
|
||||||
|
print('Failed : Reason', e)
|
||||||
|
```
|
||||||
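A short sketch makes the distinction between the exception instance and its printed form concrete. The variable names are made up for this example.

```python
try:
    raise RuntimeError('Invalid user name')
except RuntimeError as e:
    caught = e                  # Keep a reference for inspection below

print(type(caught).__name__)    # The exception type
print(caught.args)              # The values passed when it was raised
print(str(caught))              # What print(e) displays
```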
|
|
||||||
|
### Catching Multiple Errors
|
||||||
|
|
||||||
|
You can catch different kinds of exceptions with multiple `except` blocks.
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
...
|
||||||
|
except LookupError as e:
|
||||||
|
...
|
||||||
|
except RuntimeError as e:
|
||||||
|
...
|
||||||
|
except IOError as e:
|
||||||
|
...
|
||||||
|
except KeyboardInterrupt as e:
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Alternatively, if the block to handle them is the same, you can group them:
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
...
|
||||||
|
except (IOError,LookupError,RuntimeError) as e:
|
||||||
|
...
|
||||||
|
```
|
||||||
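Here is a runnable sketch of the grouped form. The helper `lookup()` and its `default` parameter are assumptions invented for this example; it treats a missing dictionary key and an out-of-range list index the same way.

```python
def lookup(container, key, default=None):
    try:
        return container[key]
    except (KeyError, IndexError):   # Same handling for both error types
        return default

print(lookup({'a': 1}, 'a'))         # Found in a dict
print(lookup({'a': 1}, 'b'))         # Missing key
print(lookup([1, 2], 5, default=0))  # Index out of range
```

Any exception not named in the tuple, such as a `TypeError`, still propagates to the caller.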
|
|
||||||
|
### Catching All Errors
|
||||||
|
|
||||||
|
To catch any exception, use `Exception` like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
...
|
||||||
|
except Exception:
|
||||||
|
print('An error occurred')
|
||||||
|
```
|
||||||
|
|
||||||
|
In general, writing code like that is a bad idea because you'll have no idea
|
||||||
|
why it failed.
|
||||||
|
|
||||||
|
### Wrong Way to Catch Errors
|
||||||
|
|
||||||
|
Here is the wrong way to use exceptions.
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
go_do_something()
|
||||||
|
except Exception:
|
||||||
|
print('Computer says no')
|
||||||
|
```
|
||||||
|
|
||||||
|
This swallows all possible errors. It may make it impossible to debug
|
||||||
|
when the code is failing for some reason you didn't expect at all
|
||||||
|
(e.g. uninstalled Python module, etc.).
|
||||||
|
|
||||||
|
### Somewhat Better Approach
|
||||||
|
|
||||||
|
This is a saner approach.
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
go_do_something()
|
||||||
|
except Exception as e:
|
||||||
|
print('Computer says no. Reason :', e)
|
||||||
|
```
|
||||||
|
|
||||||
|
It reports a specific reason for failure. It is almost always a good
|
||||||
|
idea to have some mechanism for viewing/reporting errors when you
|
||||||
|
write code that catches all possible exceptions.
|
||||||
|
|
||||||
|
In general though, it's better to catch the error more narrowly. Only
|
||||||
|
catch the errors you can actually deal with. Let other errors pass to
|
||||||
|
other code.
|
||||||
|
|
||||||
|
### Reraising an Exception
|
||||||
|
|
||||||
|
Use `raise` to propagate a caught error.
|
||||||
|
|
||||||
|
```python
|
||||||
|
try:
|
||||||
|
go_do_something()
|
||||||
|
except Exception as e:
|
||||||
|
print('Computer says no. Reason :', e)
|
||||||
|
raise
|
||||||
|
```
|
||||||
|
|
||||||
|
It allows you to take action (e.g. logging) and pass the error on to the caller.
|
||||||
|
|
||||||
|
### Exception Best Practices
|
||||||
|
|
||||||
|
Don't catch exceptions. Fail fast and loud. If it's important, someone
|
||||||
|
else will take care of the problem. Only catch an exception if you
|
||||||
|
are *that* someone. That is, only catch errors where you can recover
|
||||||
|
and sanely keep going.
|
||||||
|
|
||||||
|
### `finally` statement
|
||||||
|
|
||||||
|
It specifies code that must run regardless of whether or not an exception occurs.
|
||||||
|
|
||||||
|
```python
|
||||||
|
lock = Lock()
|
||||||
|
...
|
||||||
|
lock.acquire()
|
||||||
|
try:
|
||||||
|
...
|
||||||
|
finally:
|
||||||
|
lock.release() # This ALWAYS executes, with or without an exception.
|
||||||
|
```
|
||||||
|
|
||||||
|
Commonly used to properly manage resources (especially locks, files, etc.).
|
||||||
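The sketch below shows the `finally` block running even when the protected code raises. It assumes the `Lock` in question is `threading.Lock` from the standard library; the function name `risky()` is made up.

```python
import threading

lock = threading.Lock()

def risky():
    lock.acquire()
    try:
        raise RuntimeError('Whoa!')   # Simulate a failure while holding the lock
    finally:
        lock.release()                # Runs before the exception leaves risky()

try:
    risky()
except RuntimeError:
    pass

released = not lock.locked()
print(released)
```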
|
|
||||||
|
### `with` statement
|
||||||
|
|
||||||
|
In modern code, `try-finally` is often replaced with the `with` statement.
|
||||||
|
|
||||||
|
```python
|
||||||
|
lock = Lock()
|
||||||
|
with lock:
|
||||||
|
# lock acquired
|
||||||
|
...
|
||||||
|
# lock released
|
||||||
|
```
|
||||||
|
|
||||||
|
A more familiar example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
with open(filename) as f:
|
||||||
|
# Use the file
|
||||||
|
...
|
||||||
|
# File closed
|
||||||
|
```
|
||||||
|
|
||||||
|
It defines a usage *context* for a resource. When execution leaves that context,
|
||||||
|
resources are released. `with` only works with certain objects (those that implement the context manager protocol).
|
||||||
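For a rough idea of what makes an object usable with `with`, here is a minimal sketch of a class that defines the `__enter__()` and `__exit__()` methods. The `Resource` class and its `events` list are hypothetical and exist only to record what happens.

```python
class Resource:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('acquired')    # Runs when the with-block is entered
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('released')    # Runs when the block is left, even on error
        return False                      # Do not suppress the exception

r = Resource()
try:
    with r:
        raise RuntimeError('Whoa!')
except RuntimeError:
    pass

print(r.events)
```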
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) Raising exceptions
|
||||||
|
|
||||||
|
The `parse_csv()` function you wrote in the last section allows
|
||||||
|
user-specified columns to be selected, but that only works if the
|
||||||
|
input data file has column headers.
|
||||||
|
|
||||||
|
Modify the code so that an exception gets raised if both the `select`
|
||||||
|
and `has_headers=False` arguments are passed.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False)
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
File "fileparse.py", line 9, in parse_csv
|
||||||
|
raise RuntimeError("select argument requires column headers")
|
||||||
|
RuntimeError: select argument requires column headers
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Having added this one check, you might ask if you should be performing
|
||||||
|
other kinds of sanity checks in the function. For example, should you
|
||||||
|
check that the filename is a string, that types is a list, or anything
|
||||||
|
of that nature?
|
||||||
|
|
||||||
|
As a general rule, it’s usually best to skip such tests and to just
|
||||||
|
let the program fail on bad inputs. The traceback message will point
|
||||||
|
at the source of the problem and can assist in debugging.
|
||||||
|
|
||||||
|
The main reason for adding the above check is to avoid running the code
|
||||||
|
in a nonsensical mode (e.g., using a feature that requires column
|
||||||
|
headers, but simultaneously specifying that there are no headers).
|
||||||
|
|
||||||
|
This indicates a programming error on the part of the calling code.
|
||||||
|
|
||||||
|
### (b) Catching exceptions
|
||||||
|
|
||||||
|
The `parse_csv()` function you wrote is used to process the entire
|
||||||
|
contents of a file. However, in the real world, it’s possible that
|
||||||
|
input files might have corrupted, missing, or dirty data. Try this
|
||||||
|
experiment:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
File "fileparse.py", line 36, in parse_csv
|
||||||
|
row = [func(val) for func, val in zip(types, row)]
|
||||||
|
ValueError: invalid literal for int() with base 10: ''
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Modify the `parse_csv()` function to catch all `ValueError` exceptions
|
||||||
|
generated during record creation and print a warning message for rows
|
||||||
|
that can’t be converted.
|
||||||
|
|
||||||
|
The message should include the row number and information about the reason why it failed.
|
||||||
|
To test your function, try reading the file `Data/missing.csv` above.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
|
||||||
|
Row 4: Couldn't convert ['MSFT', '', '51.23']
|
||||||
|
Row 4: Reason invalid literal for int() with base 10: ''
|
||||||
|
Row 7: Couldn't convert ['IBM', '', '70.44']
|
||||||
|
Row 7: Reason invalid literal for int() with base 10: ''
|
||||||
|
>>>
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) Silencing Errors
|
||||||
|
|
||||||
|
Modify the `parse_csv()` function so that parsing error messages can be silenced if explicitly desired by the user.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True)
|
||||||
|
>>> portfolio
|
||||||
|
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Error handling is one of the most difficult things to get right in
|
||||||
|
most programs. As a general rule, you shouldn’t silently ignore
|
||||||
|
errors. Instead, it’s better to report problems and to give the user
|
||||||
|
an option to silence the error message if they choose to do so.
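
As a rough illustration of the "report by default, silence on request" pattern, here is a minimal sketch. The helper name `convert_rows()` and its arguments are hypothetical and are not part of `fileparse.py`; the sketch only shows how a `try`/`except ValueError` around the conversion step might be combined with a `silence_errors` flag.

```python
def convert_rows(rows, types, silence_errors=False):
    """Convert each row with the given type functions, skipping bad rows.

    Hypothetical helper for illustration only.
    """
    records = []
    for rowno, row in enumerate(rows, start=1):
        try:
            records.append([func(val) for func, val in zip(types, row)])
        except ValueError as e:
            # Report the problem unless the caller asked for silence
            if not silence_errors:
                print(f"Row {rowno}: Couldn't convert {row}")
                print(f"Row {rowno}: Reason {e}")
    return records


rows = [['AA', '100', '32.20'], ['MSFT', '', '51.23'], ['IBM', '50', '91.10']]
good = convert_rows(rows, [str, int, float], silence_errors=True)
```

Bad rows are skipped either way; the flag only controls whether the warning is printed.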
|
||||||
|
|
||||||
|
[Next](04_Modules)
|
||||||
317
Notes/03_Program_organization/04_Modules.md
Normal file
@@ -0,0 +1,317 @@
|
|||||||
|
# 3.4 Modules
|
||||||
|
|
||||||
|
This section introduces the concept of modules.
|
||||||
|
|
||||||
|
### Modules and import
|
||||||
|
|
||||||
|
Any Python source file is a module.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# foo.py
def grok(a):
    ...
def spam(b):
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
The `import` statement loads and *executes* a module.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# program.py
|
||||||
|
import foo
|
||||||
|
|
||||||
|
a = foo.grok(2)
|
||||||
|
b = foo.spam('Hello')
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Namespaces
|
||||||
|
|
||||||
|
A module is a collection of named values and is sometimes said to be a *namespace*.
|
||||||
|
The names are all of the global variables and functions defined in the source file.
|
||||||
|
After importing, the module name is used as a prefix. Hence the *namespace*.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import foo
|
||||||
|
|
||||||
|
a = foo.grok(2)
|
||||||
|
b = foo.spam('Hello')
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
The module name is tied to the file name (foo -> foo.py).
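
To make the idea of a namespace more concrete, the following sketch inspects a module's names directly. It uses the standard `math` module instead of the hypothetical `foo.py` so that it can run anywhere; `vars()` returns the dictionary that holds a module's global names.

```python
import math

namespace = vars(math)                       # the module namespace is a dictionary
has_cos = 'cos' in namespace                 # names defined by the module live here
same_function = namespace['cos'] is math.cos # prefix access looks up the same entry
```

Writing `math.cos` is just a lookup of the name `cos` in that namespace.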
|
||||||
|
|
||||||
|
### Global Definitions
|
||||||
|
|
||||||
|
Everything defined in the *global* scope is what populates the module
|
||||||
|
namespace (`foo` in our previous example). Consider two modules
|
||||||
|
that define the same variable `x`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# foo.py
x = 42
def grok(a):
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
# bar.py
x = 37
def spam(a):
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
In this case, the `x` definitions refer to different variables. One
|
||||||
|
is `foo.x` and the other is `bar.x`. Different modules can use the
|
||||||
|
same names and those names won't conflict with each other.
|
||||||
|
|
||||||
|
**Modules are isolated.**
|
||||||
|
|
||||||
|
### Modules as Environments
|
||||||
|
|
||||||
|
Modules form an enclosing environment for all of the code defined inside.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# foo.py
x = 42

def grok(a):
    print(x)
|
||||||
|
```
|
||||||
|
|
||||||
|
*Global* variables are always bound to the enclosing module (same file).
|
||||||
|
Each source file is its own little universe.
|
||||||
|
|
||||||
|
### Module Execution
|
||||||
|
|
||||||
|
When a module is imported, *all of the statements in the module
|
||||||
|
execute* one after another until the end of the file is reached. The
|
||||||
|
contents of the module namespace are all of the *global* names that
|
||||||
|
are still defined at the end of the execution process. If there are
|
||||||
|
scripting statements that carry out tasks in the global scope
|
||||||
|
(printing, creating files, etc.) you will see them run on import.
|
||||||
|
|
||||||
|
### `import as` statement
|
||||||
|
|
||||||
|
You can change the name of a module as you import it:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import math as m
def rectangular(r, theta):
    x = r * m.cos(theta)
    y = r * m.sin(theta)
    return x, y
|
||||||
|
```
|
||||||
|
|
||||||
|
It works the same as a normal import. It just renames the module in that one file.
|
||||||
|
|
||||||
|
### `from` module import
|
||||||
|
|
||||||
|
This picks selected symbols out of a module and makes them available locally.
|
||||||
|
|
||||||
|
```python
|
||||||
|
from math import sin, cos

def rectangular(r, theta):
    x = r * cos(theta)
    y = r * sin(theta)
    return x, y
|
||||||
|
```
|
||||||
|
|
||||||
|
It allows parts of a module to be used without having to type the module prefix.
|
||||||
|
Useful for frequently used names.
|
||||||
|
|
||||||
|
### Comments on importing
|
||||||
|
|
||||||
|
Variations on import do *not* change the way that modules work.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import math as m
|
||||||
|
# vs
|
||||||
|
from math import cos, sin
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Specifically, `import` always executes the *entire* file and modules
|
||||||
|
are still isolated environments.
|
||||||
|
|
||||||
|
The `import module as` statement is only manipulating the names.
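
A small check can make this concrete. The sketch below imports the standard `math` module in three different ways and confirms that every name refers to the same underlying objects.

```python
import math
import math as m
from math import cos, sin

# All three forms refer to the one loaded module and its functions
all_same = (m is math) and (cos is math.cos) and (sin is m.sin)
```

Only the local names differ; nothing is copied or reloaded.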
|
||||||
|
|
||||||
|
### Module Loading
|
||||||
|
|
||||||
|
Each module loads and executes only *once*.
|
||||||
|
*Note: Repeated imports just return a reference to the previously loaded module.*
|
||||||
|
|
||||||
|
`sys.modules` is a dict of all loaded modules.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import sys
|
||||||
|
>>> sys.modules.keys()
|
||||||
|
dict_keys(['sys', 'builtins', '__main__', 'site', 'encodings', 'posixpath', ...])
|
||||||
|
>>>
|
||||||
|
```
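
The caching behavior can be observed directly. This sketch uses the standard `math` module and `importlib.import_module()`; it only illustrates that a repeated import hands back the object already stored in `sys.modules`.

```python
import importlib
import sys

import math                      # first import: loads and executes the module
first = sys.modules['math']      # the cached module object

import math                      # repeated import: no reload, same object
second = importlib.import_module('math')

same_object = (first is math) and (second is math)
```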
|
||||||
|
|
||||||
|
### Locating Modules
|
||||||
|
|
||||||
|
Python consults a path list (sys.path) when looking for modules.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import sys
|
||||||
|
>>> sys.path
|
||||||
|
[
|
||||||
|
'',
|
||||||
|
'/usr/local/lib/python36/python36.zip',
|
||||||
|
'/usr/local/lib/python36',
|
||||||
|
...
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
Current working directory is usually first.
|
||||||
|
|
||||||
|
### Module Search Path
|
||||||
|
|
||||||
|
`sys.path` contains the search paths.
|
||||||
|
|
||||||
|
You can manually adjust if you need to.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import sys
|
||||||
|
sys.path.append('/project/foo/pyfiles')
|
||||||
|
```
|
||||||
|
|
||||||
|
Paths are also added via environment variables.
|
||||||
|
|
||||||
|
```python
|
||||||
|
% env PYTHONPATH=/project/foo/pyfiles python3
|
||||||
|
Python 3.6.0 (default, Feb 3 2017, 05:53:21)
|
||||||
|
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]
|
||||||
|
>>> import sys
|
||||||
|
>>> sys.path
|
||||||
|
['','/project/foo/pyfiles', ...]
|
||||||
|
```
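
The effect of adjusting `sys.path` can be tried without touching a real project. The following sketch creates a throwaway directory with a tiny module in it; the module name `pathdemo_mod` is made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary directory containing a one-line module (hypothetical name)
tmpdir = tempfile.mkdtemp()
Path(tmpdir, 'pathdemo_mod.py').write_text('x = 42\n')

sys.path.append(tmpdir)            # make the directory searchable
importlib.invalidate_caches()      # the file was created while Python was running
pathdemo_mod = importlib.import_module('pathdemo_mod')

value = pathdemo_mod.x
```

Without the `sys.path.append()` call, the import would fail with `ModuleNotFoundError`.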
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
For this exercise involving modules, it is critically important to
|
||||||
|
make sure you are running Python in a proper environment. Modules
|
||||||
|
are usually where programmers encounter problems with the current working
|
||||||
|
directory or with Python's path settings.
|
||||||
|
|
||||||
|
### (a) Module imports
|
||||||
|
|
||||||
|
In section 3, we created a general purpose function `parse_csv()` for parsing the contents of CSV datafiles.
|
||||||
|
|
||||||
|
Now, we’re going to see how to use that function in other programs.
|
||||||
|
First, start in a new shell window. Navigate to the folder where you
|
||||||
|
have all your files. We are going to import them.
|
||||||
|
|
||||||
|
Start Python interactive mode.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
bash % python3
|
||||||
|
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
|
||||||
|
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
|
||||||
|
Type "help", "copyright", "credits" or "license" for more information.
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Once you’ve done that, try importing some of the programs you
|
||||||
|
previously wrote. You should see their output exactly as before.
|
||||||
|
Just to emphasize, importing a module runs its code.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import bounce
|
||||||
|
... watch output ...
|
||||||
|
>>> import mortgage
|
||||||
|
... watch output ...
|
||||||
|
>>> import report
|
||||||
|
... watch output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
If none of this works, you’re probably running Python in the wrong directory.
|
||||||
|
Now, try importing your `fileparse` module and getting some help on it.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import fileparse
|
||||||
|
>>> help(fileparse)
|
||||||
|
... look at the output ...
|
||||||
|
>>> dir(fileparse)
|
||||||
|
... look at the output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try using the module to read some data:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv',select=['name','shares','price'], types=[str,int,float])
|
||||||
|
>>> portfolio
|
||||||
|
... look at the output ...
|
||||||
|
>>> pricelist = fileparse.parse_csv('Data/prices.csv',types=[str,float], has_headers=False)
|
||||||
|
>>> pricelist
|
||||||
|
... look at the output ...
|
||||||
|
>>> prices = dict(pricelist)
|
||||||
|
>>> prices
|
||||||
|
... look at the output ...
|
||||||
|
>>> prices['IBM']
|
||||||
|
106.11
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try importing a function so that you don’t need to include the module name:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> from fileparse import parse_csv
|
||||||
|
>>> portfolio = parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float])
|
||||||
|
>>> portfolio
|
||||||
|
... look at the output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) Using your library module
|
||||||
|
|
||||||
|
In section 2, you wrote a program `report.py` that produced a stock report like this:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
|
||||||
|
```
|
||||||
|
|
||||||
|
Take that program and modify it so that all of the input file
|
||||||
|
processing is done using functions in your `fileparse` module. To do
|
||||||
|
that, import `fileparse` as a module and change the `read_portfolio()`
|
||||||
|
and `read_prices()` functions to use the `parse_csv()` function.
|
||||||
|
|
||||||
|
Use the interactive example at the start of this exercise as a guide.
|
||||||
|
Afterwards, you should get exactly the same output as before.
|
||||||
|
|
||||||
|
### (c) Using more library imports
|
||||||
|
|
||||||
|
In section 1, you wrote a program `pcost.py` that read a portfolio and computed its cost.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import pcost
|
||||||
|
>>> pcost.portfolio_cost('Data/portfolio.csv')
|
||||||
|
44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Modify the `pcost.py` file so that it uses the `report.read_portfolio()` function.
|
||||||
|
|
||||||
|
### Commentary
|
||||||
|
|
||||||
|
When you are done with this exercise, you should have three
|
||||||
|
programs. `fileparse.py` which contains a general purpose
|
||||||
|
`parse_csv()` function. `report.py` which produces a nice report, but
|
||||||
|
also contains `read_portfolio()` and `read_prices()` functions. And
|
||||||
|
finally, `pcost.py` which computes the portfolio cost, but makes use
|
||||||
|
of the code written for the `report.py` program.
|
||||||
|
|
||||||
|
[Next](05_Main_module)
|
||||||
299
Notes/03_Program_organization/05_Main_module.md
Normal file
@@ -0,0 +1,299 @@
|
|||||||
|
# 3.5 Main Module
|
||||||
|
|
||||||
|
This section introduces the concept of a main program or main module.
|
||||||
|
|
||||||
|
### Main Functions
|
||||||
|
|
||||||
|
In many programming languages, there is a concept of a *main* function or method.
|
||||||
|
|
||||||
|
```c
|
||||||
|
// c / c++
int main(int argc, char *argv[]) {
    ...
}
|
||||||
|
```
|
||||||
|
|
||||||
|
```java
|
||||||
|
// java
|
||||||
|
class myprog {
    public static void main(String args[]) {
        ...
    }
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This is the first function that executes when an application is launched.
|
||||||
|
|
||||||
|
### Python Main Module
|
||||||
|
|
||||||
|
Python has no *main* function or method. Instead, there is a *main*
|
||||||
|
module. The *main module* is the source file that runs first.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 prog.py
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Whatever module you give to the interpreter at startup becomes *main*. The name doesn't matter.
|
||||||
|
|
||||||
|
### `__main__` check
|
||||||
|
|
||||||
|
It is standard practice for modules that can run as a main script to use this convention:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# prog.py
...
if __name__ == '__main__':
    # Running as the main program ...
    statements
    ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Statements enclosed inside the `if` statement become the *main* program.
|
||||||
|
|
||||||
|
### Main programs vs. library imports
|
||||||
|
|
||||||
|
Any file can either run as main or as a library import:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 prog.py # Running as main
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
import prog
|
||||||
|
```
|
||||||
|
|
||||||
|
In both cases, `__name__` is the name of the module. However, it will only be set to `__main__` if
|
||||||
|
running as main.
|
||||||
|
|
||||||
|
As a general rule, you don't want statements that are part of the main
|
||||||
|
program to execute on a library import. So, it's common to have an `if` check in code
|
||||||
|
that might be used either way.
|
||||||
|
|
||||||
|
```python
|
||||||
|
if __name__ == '__main__':
    # Does not execute if loaded with import ...
|
||||||
|
```
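
To see both behaviors side by side, the sketch below writes a tiny module to a temporary directory, then loads it once with a normal import and once as a main program. The module name `maindemo_mod` is made up; `runpy.run_path()` from the standard library is used here to simulate `python3 maindemo_mod.py` without starting a new process.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

source = '''
seen_name = __name__          # what the module sees while it executes
ran_main = False
if __name__ == '__main__':
    ran_main = True           # only happens when run as the main program
'''

tmpdir = tempfile.mkdtemp()
path = Path(tmpdir, 'maindemo_mod.py')
path.write_text(source)

# 1. Library import: __name__ is the module name
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
maindemo_mod = importlib.import_module('maindemo_mod')

# 2. Run as main: __name__ is '__main__'
as_main = runpy.run_path(str(path), run_name='__main__')
```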
|
||||||
|
|
||||||
|
### Program Template
|
||||||
|
|
||||||
|
Here is a common program template for writing a Python program:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# prog.py
# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main():
    ...

if __name__ == '__main__':
    main()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Command Line Tools
|
||||||
|
|
||||||
|
Python is often used for command-line tools.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 report.py portfolio.csv prices.csv
|
||||||
|
```
|
||||||
|
|
||||||
|
It means that the scripts are executed from the shell /
|
||||||
|
terminal. Common use cases are for automation, background tasks, etc.
|
||||||
|
|
||||||
|
### Command Line Args
|
||||||
|
|
||||||
|
The command line is a list of text strings.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 report.py portfolio.csv prices.csv
|
||||||
|
```
|
||||||
|
|
||||||
|
This list of text strings is found in `sys.argv`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# In the previous bash command
|
||||||
|
sys.argv # ['report.py', 'portfolio.csv', 'prices.csv']
|
||||||
|
```
|
||||||
|
|
||||||
|
Here is a simple example of processing the arguments:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import sys

if len(sys.argv) != 3:
    raise SystemExit(f'Usage: {sys.argv[0]} portfile pricefile')
portfile = sys.argv[1]
pricefile = sys.argv[2]
...
|
||||||
|
```
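
The same check is easier to experiment with when it takes the argument list as a parameter instead of reading `sys.argv` directly. The function name `parse_args()` below is a hypothetical helper, shown only as one possible arrangement.

```python
def parse_args(argv):
    """Return (portfile, pricefile) from an argv-style list, or exit with usage."""
    if len(argv) != 3:
        raise SystemExit(f'Usage: {argv[0]} portfile pricefile')
    return argv[1], argv[2]


portfile, pricefile = parse_args(['report.py', 'portfolio.csv', 'prices.csv'])

# A wrong number of arguments raises SystemExit carrying the usage message
try:
    parse_args(['report.py'])
except SystemExit as e:
    usage = str(e)
```

In a real script the call would be `parse_args(sys.argv)`.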
|
||||||
|
|
||||||
|
### Standard I/O
|
||||||
|
|
||||||
|
Standard Input / Output (or stdio) are files that work the same as normal files.
|
||||||
|
|
||||||
|
```python
|
||||||
|
sys.stdout
|
||||||
|
sys.stderr
|
||||||
|
sys.stdin
|
||||||
|
```
|
||||||
|
|
||||||
|
By default, print is directed to `sys.stdout`. Input is read from
|
||||||
|
`sys.stdin`. Tracebacks and errors are directed to `sys.stderr`.
|
||||||
|
|
||||||
|
Be aware that *stdio* could be connected to terminals, files, pipes, etc.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % python3 prog.py > results.txt
|
||||||
|
# or
|
||||||
|
bash % cmd1 | python3 prog.py | cmd2
|
||||||
|
```
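
Because the standard streams behave like ordinary files, code written against "any file" works with them unchanged. In this sketch the function `copy_upper()` is a made-up example, and `io.StringIO` objects stand in for `sys.stdin` and `sys.stdout` so that the code can run without a terminal.

```python
import io
import sys
from contextlib import redirect_stdout

def copy_upper(infile, outfile):
    """Copy lines from infile to outfile, converted to upper case."""
    for line in infile:
        outfile.write(line.upper())

# Normal use would be: copy_upper(sys.stdin, sys.stdout)
fake_in = io.StringIO('hello\nworld\n')
fake_out = io.StringIO()
copy_upper(fake_in, fake_out)

# print() goes to sys.stdout unless told otherwise
captured = io.StringIO()
with redirect_stdout(captured):
    print('to stdout')
    print('to stderr', file=sys.stderr)   # not captured: separate stream
```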
|
||||||
|
|
||||||
|
### Environment Variables
|
||||||
|
|
||||||
|
Environment variables are set in the shell.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % export NAME=dave
bash % export RSH=ssh
|
||||||
|
bash % python3 prog.py
|
||||||
|
```
|
||||||
|
|
||||||
|
`os.environ` is a dictionary that contains these values.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import os
|
||||||
|
|
||||||
|
name = os.environ['NAME'] # 'dave'
|
||||||
|
```
|
||||||
|
|
||||||
|
Changes are reflected in any subprocesses later launched by the program.
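
A short sketch of both points follows. The variable name `DEMO_NAME` is invented for the example; `os.environ.get()` avoids a `KeyError` when a variable is not set, and the subprocess call shows that a child process sees the change.

```python
import os
import subprocess
import sys

os.environ['DEMO_NAME'] = 'dave'                       # change the environment
name = os.environ.get('DEMO_NAME', 'unknown')          # 'dave'
missing = os.environ.get('DEMO_NOT_SET_ANYWHERE', 'unknown')

# A subprocess launched afterwards inherits the modified environment
child = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["DEMO_NAME"])'],
    capture_output=True, text=True, check=True,
)
child_name = child.stdout.strip()
```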
|
||||||
|
|
||||||
|
### Program Exit
|
||||||
|
|
||||||
|
Program exit is handled through exceptions.
|
||||||
|
|
||||||
|
```python
|
||||||
|
raise SystemExit
|
||||||
|
raise SystemExit(exitcode)
|
||||||
|
raise SystemExit('Informative message')
|
||||||
|
```
|
||||||
|
|
||||||
|
An alternative.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import sys
|
||||||
|
sys.exit(exitcode)
|
||||||
|
```
|
||||||
|
|
||||||
|
A non-zero exit code indicates an error.
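
Since `sys.exit()` works by raising `SystemExit`, the exit can be observed (or intercepted) like any other exception. The function `run()` below is a hypothetical stand-in for a program's main logic.

```python
import sys

def run(ok):
    """Pretend program: exit with code 1 on failure."""
    if not ok:
        sys.exit(1)          # raises SystemExit(1)
    return 'finished'

try:
    run(False)
except SystemExit as e:
    code = e.code            # the exit code carried by the exception

result = run(True)
```

If the exception is not caught, the interpreter terminates and reports `code` to the shell.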
|
||||||
|
|
||||||
|
### The `#!` line
|
||||||
|
|
||||||
|
On Unix, the `#!` line can launch a script as Python.
|
||||||
|
Add the following to the first line of your script file.
|
||||||
|
|
||||||
|
```python
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
# prog.py
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
It requires the executable permission.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash % chmod +x prog.py
|
||||||
|
# Then you can execute
|
||||||
|
bash % ./prog.py
|
||||||
|
... output ...
|
||||||
|
```
|
||||||
|
|
||||||
|
*Note: The Python Launcher on Windows also looks for the `#!` line to indicate language version.*
|
||||||
|
|
||||||
|
### Script Template
|
||||||
|
|
||||||
|
Here is a common code template for Python programs that run as command-line scripts:
|
||||||
|
|
||||||
|
```python
|
||||||
|
#!/usr/bin/env python3
# prog.py

# Import statements (libraries)
import modules

# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main(argv):
    # Parse command line args, environment, etc.
    ...

if __name__ == '__main__':
    import sys
    main(sys.argv)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) `main()` functions
|
||||||
|
|
||||||
|
In the file `report.py` add a `main()` function that accepts a list of command line options and produces the same output as before.
|
||||||
|
You should be able to run it interactively like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import report
|
||||||
|
>>> report.main(['report.py', 'Data/portfolio.csv', 'Data/prices.csv'])
|
||||||
|
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Modify the `pcost.py` file so that it has a similar `main()` function:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import pcost
|
||||||
|
>>> pcost.main(['pcost.py', 'Data/portfolio.csv'])
|
||||||
|
Total cost: 44671.15
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) Making Scripts
|
||||||
|
|
||||||
|
Modify the `report.py` and `pcost.py` programs so that they can execute as a script on the command line:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash $ python3 report.py Data/portfolio.csv Data/prices.csv
|
||||||
|
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      39.91       7.71
       IBM         50     106.11      15.01
       CAT        150      78.58      -4.86
      MSFT        200      30.47     -20.76
        GE         95      37.38      -2.99
      MSFT         50      30.47     -34.63
       IBM        100     106.11      35.67
|
||||||
|
|
||||||
|
bash $ python3 pcost.py Data/portfolio.csv
|
||||||
|
Total cost: 44671.15
|
||||||
|
```
|
||||||
132
Notes/03_Program_organization/06_Design_discussion.md
Normal file
@@ -0,0 +1,132 @@
|
|||||||
|
# 3.6 Design Discussion
|
||||||
|
|
||||||
|
In this section we consider some design decisions made in code so far.
|
||||||
|
|
||||||
|
### Filenames versus Iterables
|
||||||
|
|
||||||
|
Compare these two programs that return the same output.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Provide a filename
def read_data(filename):
    records = []
    with open(filename) as f:
        for line in f:
            ...
            records.append(r)
    return records

d = read_data('file.csv')
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Provide lines
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records

with open('file.csv') as f:
    d = read_data(f)
|
||||||
|
```
|
||||||
|
|
||||||
|
* Which of these functions do you prefer? Why?
|
||||||
|
* Which of these functions is more flexible?
|
||||||
|
|
||||||
|
|
||||||
|
### Deep Idea: "Duck Typing"
|
||||||
|
|
||||||
|
[Duck Typing](https://en.wikipedia.org/wiki/Duck_typing) is a computer programming concept to determine whether an object can be used for a particular purpose. It is an application of the [duck test](https://en.wikipedia.org/wiki/Duck_test).
|
||||||
|
|
||||||
|
> If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
|
||||||
|
|
||||||
|
In our previous example that reads the lines, our `read_data` expects
|
||||||
|
any iterable object. Not just the lines of a file.
|
||||||
|
|
||||||
|
```python
|
||||||
|
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records
|
||||||
|
```
|
||||||
|
|
||||||
|
This means that we can use it with other *lines*.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# A CSV file
|
||||||
|
lines = open('data.csv')
|
||||||
|
data = read_data(lines)
|
||||||
|
|
||||||
|
# A zipped file
|
||||||
|
lines = gzip.open('data.csv.gz','rt')
|
||||||
|
data = read_data(lines)
|
||||||
|
|
||||||
|
# The Standard Input
|
||||||
|
lines = sys.stdin
|
||||||
|
data = read_data(lines)
|
||||||
|
|
||||||
|
# A list of strings
|
||||||
|
lines = ['ACME,50,91.1','IBM,75,123.45', ... ]
|
||||||
|
data = read_data(lines)
|
||||||
|
```
|
||||||
|
|
||||||
|
There is considerable flexibility with this design.
|
||||||
|
|
||||||
|
*Question: Shall we embrace or fight this flexibility?*
|
||||||
|
|
||||||
|
### Library Design Best Practices
|
||||||
|
|
||||||
|
Code libraries are often better served by embracing flexibility.
|
||||||
|
Don't restrict your options. With great flexibility comes great power.
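
As a concrete, runnable version of this idea, the sketch below fills in the elided body of `read_data()` with the standard `csv` module. That choice is an assumption made for the example; the point is only that the same function accepts a list of strings and a file-like object alike.

```python
import csv
import io

def read_data(lines):
    """Parse any iterable of CSV text lines into a list of rows."""
    records = []
    for row in csv.reader(lines):
        records.append(row)
    return records

# A plain list of strings
from_list = read_data(['ACME,50,91.1', 'IBM,75,123.45'])

# A file-like object (stands in for an open file, gzip file, or sys.stdin)
from_filelike = read_data(io.StringIO('ACME,50,91.1\nIBM,75,123.45\n'))
```

The function never asks what kind of object `lines` is; it only iterates over it.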
|
||||||
|
|
||||||
|
## Exercise
|
||||||
|
|
||||||
|
### (a) From filenames to file-like objects
|
||||||
|
|
||||||
|
In this section, you worked on a file `fileparse.py` that contained a
|
||||||
|
function `parse_csv()`. The function worked like this:
|
||||||
|
|
||||||
|
```pycon
|
||||||
|
>>> import fileparse
|
||||||
|
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Right now, the function expects to be passed a filename. However, you
|
||||||
|
can make the code more flexible. Modify the function so that it works
|
||||||
|
with any file-like/iterable object. For example:
|
||||||
|
|
||||||
|
```
|
||||||
|
>>> import fileparse
|
||||||
|
>>> import gzip
|
||||||
|
>>> with gzip.open('Data/portfolio.csv.gz', 'rt') as f:
...     port = fileparse.parse_csv(f, types=[str,int,float])
...
>>> lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1', 'HPE,75,45.1']
|
||||||
|
>>> port = fileparse.parse_csv(lines, types=[str,int,float])
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
In this new code, what happens if you pass a filename as before?
|
||||||
|
|
||||||
|
```
|
||||||
|
>>> port = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
|
||||||
|
>>> port
|
||||||
|
... look at output (it should be crazy) ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
With flexibility comes power and with power comes responsibility. Sometimes you'll
|
||||||
|
need to be careful.
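
The reason for the strange result is that a string is itself an iterable, one character at a time. A minimal sketch:

```python
filename = 'Data/portfolio.csv'

# Iterating over a string yields single characters, not lines of a file
as_lines = [line for line in filename]
first_three = as_lines[:3]
```

Code that expects "lines" will therefore treat each character of the filename as a separate line of input.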
|
||||||
|
|
||||||
|
### (b) Fixing existing functions
|
||||||
|
|
||||||
|
Fix the `read_portfolio()` and `read_prices()` functions in the
|
||||||
|
`report.py` file so that they work with the modified version of
|
||||||
|
`parse_csv()`. This should only involve a minor modification.
|
||||||
|
Afterwards, your `report.py` and `pcost.py` programs should work
|
||||||
|
the same way they always did.
|
||||||
35
Notes/04_Classes_objects/00_Overview.md
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
# Overview
|
||||||
|
|
||||||
|
## Object Oriented (OO) programming
|
||||||
|
|
||||||
|
A programming technique where code is organized as a collection of *objects*.
|
||||||
|
|
||||||
|
An *object* consists of:
|
||||||
|
|
||||||
|
* Data. Attributes
|
||||||
|
* Behavior. Methods, functions applied to the object.
|
||||||
|
|
||||||
|
You have already been using some OO during this course.
|
||||||
|
|
||||||
|
For example with Lists.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> nums = [1, 2, 3]
|
||||||
|
>>> nums.append(4) # Method
|
||||||
|
>>> nums.insert(1,10) # Method
|
||||||
|
>>> nums
|
||||||
|
[1, 10, 2, 3, 4] # Data
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
`nums` is an *instance* of a list.
|
||||||
|
|
||||||
|
Methods (`append` and `insert`) are attached to the instance (`nums`).
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
This will be a high-level overview of classes.
|
||||||
|
|
||||||
|
Most code involving classes will involve the topics covered in this section.
|
||||||
|
|
||||||
|
If you're merely using existing libraries, the code is typically fairly simple.
|
||||||
253
Notes/04_Classes_objects/01_Class.md
Normal file
@@ -0,0 +1,253 @@
|
|||||||
|
# 4.1 Classes
|
||||||
|
|
||||||
|
### The `class` statement
|
||||||
|
|
||||||
|
Use the `class` statement to define a new object.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Player(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.health = 100

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def damage(self, pts):
        self.health -= pts
|
||||||
|
```
|
||||||
|
|
||||||
|
In a nutshell, a class is a set of functions that carry out various operations on so-called *instances*.
|
||||||
|
|
||||||
|
### Instances
|
||||||
|
|
||||||
|
Instances are the actual *objects* that you manipulate in your program.
|
||||||
|
|
||||||
|
They are created by calling the class as a function.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> a = Player(2, 3)
|
||||||
|
>>> b = Player(10, 20)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
`a` and `b` are instances of `Player`.
|
||||||
|
|
||||||
|
*Emphasize: The class statement is just the definition (it does nothing by itself). Similar to a function definition.*
|
||||||
|
|
||||||
|
### Instance Data
|
||||||
|
|
||||||
|
Each instance has its own local data.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> a.x
|
||||||
|
2
|
||||||
|
>>> b.x
|
||||||
|
10
|
||||||
|
```
|
||||||
|
|
||||||
|
This data is initialized by the `__init__()` method.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Player(object):
    def __init__(self, x, y):
        # Any value stored on `self` is instance data
        self.x = x
        self.y = y
        self.health = 100
|
||||||
|
```
|
||||||
|
|
||||||
|
There are no restrictions on the total number or type of attributes stored.
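
The following sketch shows what "its own local data" means in practice. The attribute `nickname` is invented for the example; `vars()` returns the dictionary in which an instance keeps its attributes.

```python
class Player(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.health = 100

a = Player(2, 3)
b = Player(10, 20)

a.health -= 25            # changes `a` only
a.nickname = 'ace'        # a new attribute, stored on `a` only

a_data = vars(a)          # per-instance dictionary of attributes
b_data = vars(b)
```

Changing or adding data on `a` leaves `b` untouched.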
|
||||||
|
|
||||||
|
### Instance Methods
|
||||||
|
|
||||||
|
Instance methods are functions applied to instances of an object.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Player(object):
    ...
    # `move` is a method
    def move(self, dx, dy):
        self.x += dx
        self.y += dy
|
||||||
|
```
|
||||||
|
|
||||||
|
The object itself is always passed as first argument.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> a.move(1, 2)
|
||||||
|
|
||||||
|
# matches `a` to `self`
|
||||||
|
# matches `1` to `dx`
|
||||||
|
# matches `2` to `dy`
|
||||||
|
def move(self, dx, dy):
|
||||||
|
```
|
||||||
|
|
||||||
|
By convention, the instance is called `self`. However, the actual name
|
||||||
|
used is unimportant. The object is always passed as the first
|
||||||
|
argument. It is simply Python programming style to call this argument
|
||||||
|
`self`.
|
||||||
|
|
||||||
|
### Class Scoping
|
||||||
|
|
||||||
|
Classes do not define a scope.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Player(object):
    ...
    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def left(self, amt):
        move(-amt, 0)       # NO. Calls a global `move` function
        self.move(-amt, 0)  # YES. Calls method `move` from above.
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to operate on an instance, you always have to refer to it explicitly (e.g., `self`).
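
Here is a runnable sketch of that rule. The method name `left_wrong()` is added purely for illustration; it shows that a bare call to `move()` is looked up as a global name and fails when no such global exists.

```python
class Player(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def left_wrong(self, amt):
        move(-amt, 0)            # looks for a global `move`, not the method

    def left(self, amt):
        self.move(-amt, 0)       # explicit reference to the instance

p = Player(2, 3)
try:
    p.left_wrong(1)
except NameError as e:
    error = str(e)               # no global function called `move`

p.left(1)                        # works: p.x goes from 2 to 1
```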
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) Objects as Data Structures
|
||||||
|
|
||||||
|
In sections 2 and 3, we worked with data represented as tuples and dictionaries.
|
||||||
|
For example, a holding of stock could be represented as a tuple like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = ('GOOG',100,490.10)
|
||||||
|
```
|
||||||
|
|
||||||
|
or as a dictionary like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
s = { 'name' : 'GOOG',
|
||||||
|
'shares' : 100,
|
||||||
|
'price' : 490.10
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
You can even write functions for manipulating such data. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
def cost(s):
|
||||||
|
return s['shares'] * s['price']
|
||||||
|
```
|
||||||
|
|
||||||
|
However, as your program gets large, you might want to create a better sense of organization.
|
||||||
|
Thus, another approach for representing data would be to define a class.
|
||||||
|
|
||||||
|
Create a file called `stock.py` and define a class `Stock` that represents a single holding of stock.
|
||||||
|
Have the instances of `Stock` have `name`, `shares`, and `price` attributes.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import stock
|
||||||
|
>>> s = stock.Stock('GOOG',100,490.10)
|
||||||
|
>>> s.name
|
||||||
|
'GOOG'
|
||||||
|
>>> s.shares
|
||||||
|
100
|
||||||
|
>>> s.price
|
||||||
|
490.1
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Create a few more `Stock` objects and manipulate them. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> a = stock.Stock('AAPL',50,122.34)
|
||||||
|
>>> b = stock.Stock('IBM',75,91.75)
|
||||||
|
>>> a.shares * a.price
|
||||||
|
6117.0
|
||||||
|
>>> b.shares * b.price
|
||||||
|
6881.25
|
||||||
|
>>> stocks = [a,b,s]
|
||||||
|
>>> stocks
|
||||||
|
[<stock.Stock object at 0x37d0b0>, <stock.Stock object at 0x37d110>, <stock.Stock object at 0x37d050>]
|
||||||
|
>>> for t in stocks:
|
||||||
|
print(f'{t.name:>10s} {t.shares:>10d} {t.price:>10.2f}')
|
||||||
|
|
||||||
|
... look at the output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
One thing to emphasize here is that the class `Stock` acts like a factory for creating instances of objects.
|
||||||
|
Basically, you just call it as a function and it creates a new object for you.
|
||||||
|
|
||||||
|
Also, it needs to be emphasized that each object is distinct---they
|
||||||
|
each have their own data that is separate from other objects that have
|
||||||
|
been created. An object defined by a class is somewhat similar to a
|
||||||
|
dictionary, just with somewhat different syntax.
|
||||||
|
For example, instead of writing `s['name']` or `s['price']`, you now
|
||||||
|
write `s.name` and `s.price`.
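
To make the dictionary comparison concrete, here is a minimal sketch. It assumes a `Stock` class with the three attributes described above. Each instance keeps its own data in a dictionary, available as `__dict__`.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

s = Stock('GOOG', 100, 490.10)
t = Stock('AAPL', 50, 122.34)

print(s.__dict__)     # the instance data, stored as a dictionary
t.shares = 75         # changing one object...
print(s.shares)       # ...leaves the other untouched: 100
```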
|
||||||
|
|
||||||
|
### (b) Reading Data into a List of Objects
|
||||||
|
|
||||||
|
In your `stock.py` program, write a function
|
||||||
|
`read_portfolio(filename)` that reads portfolio data from a file into
|
||||||
|
a list of `Stock` objects. This function is going to mimic the
|
||||||
|
behavior of earlier code you have written. Here’s how your function
|
||||||
|
will behave:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import stock
|
||||||
|
>>> portfolio = stock.read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> portfolio
|
||||||
|
[<stock.Stock object at 0x81d70>, <stock.Stock object at 0x81cf0>, <stock.Stock object at 0x81db0>,
|
||||||
|
<stock.Stock object at 0x81df0>, <stock.Stock object at 0x81e30>, <stock.Stock object at 0x81e70>,
|
||||||
|
<stock.Stock object at 0x81eb0>]
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
It is important to emphasize that `read_portfolio()` is a top-level function, not a method of the `Stock` class.
|
||||||
|
This function is merely creating a list of `Stock` objects; it’s not an operation on an individual `Stock` instance.
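
One possible shape for such a function is sketched below; treat it as a starting point rather than the reference solution. It assumes the data has a header row followed by name, shares, and price columns. To stay self-contained it uses a hypothetical variant, `read_portfolio_lines()`, that reads from any iterable of lines instead of opening a file.

```python
import csv
import io

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

def read_portfolio_lines(lines):
    '''
    Build a list of Stock objects from an iterable of CSV lines.
    Assumes a header row followed by name, shares, price columns.
    '''
    rows = csv.reader(lines)
    next(rows)                      # skip the assumed header row
    portfolio = []
    for name, shares, price in rows:
        portfolio.append(Stock(name, int(shares), float(price)))
    return portfolio

# In-memory sample standing in for a real file
sample = io.StringIO('name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n')
portfolio = read_portfolio_lines(sample)
print([(s.name, s.shares, s.price) for s in portfolio])
```

A `read_portfolio(filename)` built on this idea would open the file and pass the file object to the same loop.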
|
||||||
|
|
||||||
|
Try performing some calculations with the above data. First, try printing a formatted table:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> for s in portfolio:
|
||||||
|
print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}')
|
||||||
|
|
||||||
|
... look at the output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Try a list comprehension:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> more100 = [s for s in portfolio if s.shares > 100]
|
||||||
|
>>> for s in more100:
|
||||||
|
print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}')
|
||||||
|
|
||||||
|
... look at the output ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Again, notice the similarity between `Stock` objects and dictionaries. They’re basically the same idea, but the syntax for accessing values differs.
|
||||||
|
|
||||||
|
### (c) Adding some Methods
|
||||||
|
|
||||||
|
With classes, you can attach functions to your objects. These are
|
||||||
|
known as methods and are functions that operate on the data stored
|
||||||
|
inside an object.
|
||||||
|
|
||||||
|
Add `cost()` and `sell()` methods to your `Stock` class. They should
|
||||||
|
work like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import stock
|
||||||
|
>>> s = stock.Stock('GOOG',100,490.10)
|
||||||
|
>>> s.cost()
|
||||||
|
49010.0
|
||||||
|
>>> s.shares
|
||||||
|
100
|
||||||
|
>>> s.sell(25)
|
||||||
|
>>> s.shares
|
||||||
|
75
|
||||||
|
>>> s.cost()
|
||||||
|
36757.5
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
[Next](02_Inheritance)
|
||||||
502
Notes/04_Classes_objects/02_Inheritance.md
Normal file
@@ -0,0 +1,502 @@
|
|||||||
|
# 4.2 Inheritance
|
||||||
|
|
||||||
|
Inheritance is a commonly used tool for writing extensible programs. This section explores that idea.
|
||||||
|
|
||||||
|
### Introduction
|
||||||
|
|
||||||
|
Inheritance is used to specialize existing objects:
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Parent:
|
||||||
|
...
|
||||||
|
|
||||||
|
class Child(Parent): # Note how `Parent` appears between the parentheses
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
The new class `Child` is called a derived class or subclass.
|
||||||
|
The `Parent` class is known as the base class or superclass.
|
||||||
|
`Parent` is specified in `()` after the class name, `class Child(Parent):`.
|
||||||
|
|
||||||
|
### Extending
|
||||||
|
|
||||||
|
With inheritance, you are taking an existing class and:
|
||||||
|
|
||||||
|
* Adding new methods
|
||||||
|
* Redefining some of the existing methods
|
||||||
|
* Adding new attributes to instances
|
||||||
|
|
||||||
|
In the end you are **extending existing code**.
|
||||||
|
|
||||||
|
### Example
|
||||||
|
|
||||||
|
Suppose that this is your starting class:
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Stock(object):
|
||||||
|
def __init__(self, name, shares, price):
|
||||||
|
self.name = name
|
||||||
|
self.shares = shares
|
||||||
|
self.price = price
|
||||||
|
|
||||||
|
def cost(self):
|
||||||
|
return self.shares * self.price
|
||||||
|
|
||||||
|
def sell(self, nshares):
|
||||||
|
self.shares -= nshares
|
||||||
|
```
|
||||||
|
|
||||||
|
You can change any part of this via inheritance.
|
||||||
|
|
||||||
|
### Add a new method
|
||||||
|
|
||||||
|
```python
|
||||||
|
class MyStock(Stock):
|
||||||
|
def panic(self):
|
||||||
|
self.sell(self.shares)
|
||||||
|
```
|
||||||
|
|
||||||
|
Usage example.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = MyStock('GOOG', 100, 490.1)
|
||||||
|
>>> s.sell(25)
|
||||||
|
>>> s.shares
75
|
||||||
|
>>> s.panic()
|
||||||
|
>>> s.shares
0
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Redefining an existing method
|
||||||
|
|
||||||
|
```python
|
||||||
|
class MyStock(Stock):
|
||||||
|
def cost(self):
|
||||||
|
return 1.25 * self.shares * self.price
|
||||||
|
```
|
||||||
|
|
||||||
|
Usage example.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = MyStock('GOOG', 100, 490.1)
|
||||||
|
>>> s.cost()
|
||||||
|
61262.5
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The new method takes the place of the old one. The other methods are unaffected.
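
A short sketch that checks this claim, reusing the `Stock` and `MyStock` definitions from this section: the subclass has its own `cost()`, while `sell()` is still the very same function inherited from the parent.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

    def sell(self, nshares):
        self.shares -= nshares

class MyStock(Stock):
    def cost(self):
        return 1.25 * self.shares * self.price

s = MyStock('GOOG', 100, 490.1)
s.sell(25)                           # inherited, works as before
print(s.shares)                      # 75
print(MyStock.sell is Stock.sell)    # True: not redefined
print(MyStock.cost is Stock.cost)    # False: replaced in the subclass
```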
|
||||||
|
|
||||||
|
### Overriding
|
||||||
|
|
||||||
|
Sometimes a class redefines an existing method but still wants to use the original implementation as part of the new one.
|
||||||
|
For this, use `super()`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Stock(object):
|
||||||
|
...
|
||||||
|
def cost(self):
|
||||||
|
return self.shares * self.price
|
||||||
|
...
|
||||||
|
|
||||||
|
class MyStock(Stock):
|
||||||
|
def cost(self):
|
||||||
|
# Check the call to `super`
|
||||||
|
actual_cost = super().cost()
|
||||||
|
return 1.25 * actual_cost
|
||||||
|
```
|
||||||
|
|
||||||
|
Use `super()` to call the previous version.
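
As a runnable sketch of the same idea (Python 3 syntax, with the class bodies filled in from the earlier example), the subclass result is exactly 1.25 times whatever the parent computes:

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

class MyStock(Stock):
    def cost(self):
        actual_cost = super().cost()     # parent's version of cost()
        return 1.25 * actual_cost

s = MyStock('GOOG', 100, 490.1)
print(s.cost() == 1.25 * Stock.cost(s))  # True
```

Because the parent's logic is called rather than copied, a later change to `Stock.cost()` carries over to `MyStock` automatically.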
|
||||||
|
|
||||||
|
*Caution: Python 2 is different.*
|
||||||
|
|
||||||
|
```python
|
||||||
|
actual_cost = super(MyStock, self).cost()
|
||||||
|
```
|
||||||
|
|
||||||
|
### `__init__` and inheritance
|
||||||
|
|
||||||
|
If `__init__` is redefined, it is essential to initialize the parent. Python does not do this automatically.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Stock(object):
|
||||||
|
def __init__(self, name, shares, price):
|
||||||
|
self.name = name
|
||||||
|
self.shares = shares
|
||||||
|
self.price = price
|
||||||
|
|
||||||
|
class MyStock(Stock):
|
||||||
|
def __init__(self, name, shares, price, factor):
|
||||||
|
# Check the call to `super` and `__init__`
|
||||||
|
super().__init__(name, shares, price)
|
||||||
|
self.factor = factor
|
||||||
|
|
||||||
|
def cost(self):
|
||||||
|
return self.factor * super().cost()
|
||||||
|
```
|
||||||
|
|
||||||
|
You should call the `__init__()` method on `super()`, which is the way to call the previous version, as shown previously.
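
To see why this matters, the sketch below compares a subclass that initializes its parent with one that does not. `BrokenStock` is a hypothetical name used only for illustration.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

class MyStock(Stock):
    def __init__(self, name, shares, price, factor):
        super().__init__(name, shares, price)   # parent sets name/shares/price
        self.factor = factor

    def cost(self):
        return self.factor * super().cost()

class BrokenStock(Stock):                        # hypothetical: forgets the parent
    def __init__(self, name, shares, price, factor):
        self.factor = factor

m = MyStock('GOOG', 100, 490.1, 1.25)
b = BrokenStock('GOOG', 100, 490.1, 1.25)
print(hasattr(m, 'shares'))    # True
print(hasattr(b, 'shares'))    # False: Stock.__init__ never ran
```

Calling `b.cost()` would then fail with an `AttributeError`, because the attributes it relies on were never created.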
|
||||||
|
|
||||||
|
### Using Inheritance
|
||||||
|
|
||||||
|
Inheritance is sometimes used to organize related objects.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Shape(object):
|
||||||
|
...
|
||||||
|
|
||||||
|
class Circle(Shape):
|
||||||
|
...
|
||||||
|
|
||||||
|
class Rectangle(Shape):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Think of a logical hierarchy or taxonomy. However, a more common usage is
|
||||||
|
related to making reusable or extensible code:
|
||||||
|
|
||||||
|
```python
|
||||||
|
class CustomHandler(TCPHandler):
|
||||||
|
def handle_request(self):
|
||||||
|
...
|
||||||
|
# Custom processing
|
||||||
|
```
|
||||||
|
|
||||||
|
The base class contains some general purpose code.
|
||||||
|
Your class inherits and customizes specific parts. Maybe it plugs into a framework.
|
||||||
|
|
||||||
|
### "is a" relationship
|
||||||
|
|
||||||
|
Inheritance establishes a type relationship.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Shape(object):
|
||||||
|
...
|
||||||
|
|
||||||
|
class Circle(Shape):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Check for object instance.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> c = Circle(4.0)
|
||||||
|
>>> isinstance(c, Shape)
|
||||||
|
True
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
*Important: Code that works with the parent is also supposed to work with the child.*
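
A minimal sketch of that principle follows. The `area()` method, the `Rectangle` constructor, and the `total_area()` helper are assumptions added for illustration; the notes above leave the class bodies unspecified.

```python
import math

class Shape:
    def area(self):
        raise NotImplementedError()

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

def total_area(shapes):
    # Written against Shape; works for any child of Shape
    return sum(s.area() for s in shapes)

shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
print(all(isinstance(s, Shape) for s in shapes))   # True
print(total_area(shapes))
```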
|
||||||
|
|
||||||
|
### `object` base class
|
||||||
|
|
||||||
|
If a class has no parent, you sometimes see `object` used as the base.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Shape(object):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
`object` is the parent of all objects in Python.
|
||||||
|
|
||||||
|
*Note: it's not technically required in Python 3. If omitted in Python 2, it results in an "old style class" which should be avoided.*
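
In Python 3 this can be checked directly. In the sketch below, a class written without any explicit base still ends up with `object` as its parent.

```python
class Shape:          # no explicit base class
    pass

class Circle(Shape):
    pass

print(Shape.__bases__)                 # (<class 'object'>,)
print(isinstance(Circle(), object))    # True
```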
|
||||||
|
|
||||||
|
### Multiple Inheritance
|
||||||
|
|
||||||
|
You can inherit from multiple classes by specifying them in the definition of the class.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Mother(object):
|
||||||
|
...
|
||||||
|
|
||||||
|
class Father(object):
|
||||||
|
...
|
||||||
|
|
||||||
|
class Child(Mother, Father):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
The class `Child` inherits features from both parents. There are some rather tricky details. Don't do it unless you know what you are doing.
|
||||||
|
We're not going to explore multiple inheritance further in this course.
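
For the curious, one of the tricky details is the order in which parents are searched when both define the same method. The sketch below adds a hypothetical `greet()` method to both parents to show it; Python records the search order in the class attribute `__mro__`.

```python
class Mother:
    def greet(self):
        return 'Mother'

class Father:
    def greet(self):
        return 'Father'

class Child(Mother, Father):
    pass

print([c.__name__ for c in Child.__mro__])   # ['Child', 'Mother', 'Father', 'object']
print(Child().greet())                       # Mother (first parent listed wins)
```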
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) Print Portfolio
|
||||||
|
|
||||||
|
A major use of inheritance is in writing code that’s meant to be extended or customized in various ways—especially in libraries or frameworks.
|
||||||
|
To illustrate, start by adding the following function to your `stock.py` program:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# stock.py
|
||||||
|
...
|
||||||
|
def print_portfolio(portfolio):
|
||||||
|
'''
|
||||||
|
Make a nicely formatted table showing portfolio contents.
|
||||||
|
'''
|
||||||
|
headers = ('Name','Shares','Price')
|
||||||
|
for h in headers:
|
||||||
|
print(f'{h:>10s}',end=' ')
|
||||||
|
print()
|
||||||
|
print(('-'*10 + ' ')*len(headers))
|
||||||
|
for s in portfolio:
|
||||||
|
print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}')
|
||||||
|
```
|
||||||
|
|
||||||
|
Add a little testing section to the bottom of your `stock.py` file that runs the above function:
|
||||||
|
|
||||||
|
```python
|
||||||
|
if __name__ == '__main__':
|
||||||
|
portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
print_portfolio(portfolio)
|
||||||
|
```
|
||||||
|
|
||||||
|
When you run your `stock.py`, you should get this output:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
Name Shares Price
|
||||||
|
---------- ---------- ----------
|
||||||
|
AA 100 32.20
|
||||||
|
IBM 50 91.10
|
||||||
|
CAT 150 83.44
|
||||||
|
MSFT 200 51.23
|
||||||
|
GE 95 40.37
|
||||||
|
MSFT 50 65.10
|
||||||
|
IBM 100 70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) An Extensibility Problem
|
||||||
|
|
||||||
|
Suppose that you wanted to modify the `print_portfolio()` function to
|
||||||
|
support a variety of different output formats such as plain-text,
|
||||||
|
HTML, CSV, or XML. To do this, you could try to write one gigantic
|
||||||
|
function that did everything. However, doing so would likely lead to
|
||||||
|
an unmaintainable mess. Instead, this is a perfect opportunity to use
|
||||||
|
inheritance.
|
||||||
|
|
||||||
|
To start, focus on the steps that are involved in creating a
|
||||||
|
table. At the top of the table is a set of table headers. After that,
|
||||||
|
rows of table data appear. Let’s take those steps and put them into their own class.
|
||||||
|
|
||||||
|
Create a file called `tableformat.py` and define the following class:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# tableformat.py
|
||||||
|
|
||||||
|
class TableFormatter(object):
|
||||||
|
def headings(self, headers):
|
||||||
|
'''
|
||||||
|
Emit the table headings.
|
||||||
|
'''
|
||||||
|
raise NotImplementedError()
|
||||||
|
|
||||||
|
def row(self, rowdata):
|
||||||
|
'''
|
||||||
|
Emit a single row of table data.
|
||||||
|
'''
|
||||||
|
raise NotImplementedError()
|
||||||
|
```
|
||||||
|
|
||||||
|
This class does nothing, but it serves as a kind of design specification for additional classes that will be defined shortly.
|
||||||
|
|
||||||
|
Modify the `print_portfolio()` function so that it accepts a `TableFormatter` object as input and invokes methods on it to produce the output.
|
||||||
|
For example, like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# stock.py
|
||||||
|
...
|
||||||
|
def print_portfolio(portfolio, formatter):
|
||||||
|
'''
|
||||||
|
Make a nicely formatted table showing portfolio contents.
|
||||||
|
'''
|
||||||
|
formatter.headings(['Name', 'Shares', 'Price'])
|
||||||
|
for s in portfolio:
|
||||||
|
# Form a row of output data (as strings)
|
||||||
|
rowdata = [s.name, str(s.shares), f'{s.price:0.2f}' ]
|
||||||
|
formatter.row(rowdata)
|
||||||
|
```
|
||||||
|
|
||||||
|
Finally, try your new class by modifying the main program like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# stock.py
|
||||||
|
...
|
||||||
|
if __name__ == '__main__':
|
||||||
|
from tableformat import TableFormatter
|
||||||
|
portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
formatter = TableFormatter()
|
||||||
|
print_portfolio(portfolio, formatter)
|
||||||
|
```
|
||||||
|
|
||||||
|
When you run this new code, your program will immediately crash with a `NotImplementedError` exception.
|
||||||
|
That’s not too exciting, but continue to the next part.
|
||||||
|
|
||||||
|
### (c) Using Inheritance to Produce Different Output
|
||||||
|
|
||||||
|
The `TableFormatter` class you defined in part (b) is meant to be extended via inheritance.
|
||||||
|
In fact, that’s the whole idea. To illustrate, define a class `TextTableFormatter` like this:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# tableformat.py
|
||||||
|
...
|
||||||
|
class TextTableFormatter(TableFormatter):
|
||||||
|
'''
|
||||||
|
Emit a table in plain-text format
|
||||||
|
'''
|
||||||
|
def headings(self, headers):
|
||||||
|
for h in headers:
|
||||||
|
print(f'{h:>10s}', end=' ')
|
||||||
|
print()
|
||||||
|
print(('-'*10 + ' ')*len(headers))
|
||||||
|
|
||||||
|
def row(self, rowdata):
|
||||||
|
for d in rowdata:
|
||||||
|
print(f'{d:>10s}', end=' ')
|
||||||
|
print()
|
||||||
|
```
|
||||||
|
|
||||||
|
Modify your main program in `stock.py` like this and try it:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# stock.py
|
||||||
|
...
|
||||||
|
if __name__ == '__main__':
|
||||||
|
from tableformat import TextTableFormatter
|
||||||
|
portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
formatter = TextTableFormatter()
|
||||||
|
print_portfolio(portfolio, formatter)
|
||||||
|
```
|
||||||
|
|
||||||
|
This should produce the same output as before:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
Name Shares Price
|
||||||
|
---------- ---------- ----------
|
||||||
|
AA 100 32.20
|
||||||
|
IBM 50 91.10
|
||||||
|
CAT 150 83.44
|
||||||
|
MSFT 200 51.23
|
||||||
|
GE 95 40.37
|
||||||
|
MSFT 50 65.10
|
||||||
|
IBM 100 70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
However, let’s change the output to something else. Define a new class `CSVTableFormatter` that produces output in CSV format:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# tableformat.py
|
||||||
|
...
|
||||||
|
class CSVTableFormatter(TableFormatter):
|
||||||
|
'''
|
||||||
|
Output portfolio data in CSV format.
|
||||||
|
'''
|
||||||
|
def headings(self, headers):
|
||||||
|
print(','.join(headers))
|
||||||
|
|
||||||
|
def row(self, rowdata):
|
||||||
|
print(','.join(rowdata))
|
||||||
|
```
|
||||||
|
|
||||||
|
Modify your main program as follows:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# stock.py
|
||||||
|
...
|
||||||
|
if __name__ == '__main__':
|
||||||
|
from tableformat import CSVTableFormatter
|
||||||
|
portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
formatter = CSVTableFormatter()
|
||||||
|
print_portfolio(portfolio, formatter)
|
||||||
|
```
|
||||||
|
|
||||||
|
You should now see CSV output like this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
Name,Shares,Price
|
||||||
|
AA,100,32.20
|
||||||
|
IBM,50,91.10
|
||||||
|
CAT,150,83.44
|
||||||
|
MSFT,200,51.23
|
||||||
|
GE,95,40.37
|
||||||
|
MSFT,50,65.10
|
||||||
|
IBM,100,70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
Using a similar idea, define a class `HTMLTableFormatter` that produces a table with the following output:
|
||||||
|
|
||||||
|
```html
|
||||||
|
<tr> <th>Name</th> <th>Shares</th> <th>Price</th> </tr>
|
||||||
|
<tr> <td>AA</td> <td>100</td> <td>32.20</td> </tr>
|
||||||
|
<tr> <td>IBM</td> <td>50</td> <td>91.10</td> </tr>
|
||||||
|
```
|
||||||
|
|
||||||
|
Test your code by modifying the main program to create a `HTMLTableFormatter` object instead of a `CSVTableFormatter` object.
|
||||||
|
|
||||||
|
### (d) Polymorphism in Action
|
||||||
|
|
||||||
|
A major feature of object-oriented programming is that you can plug an
|
||||||
|
object into a program and it will work without having to change any of
|
||||||
|
the existing code. For example, if you wrote a program that expected
|
||||||
|
to use a `TableFormatter` object, it would work no matter what kind of
|
||||||
|
`TableFormatter` you actually gave it.
|
||||||
|
|
||||||
|
This behavior is sometimes referred to as *polymorphism*.
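
Here is a compact sketch of polymorphism using the formatter idea from this exercise. `TabTableFormatter` and `print_rows()` are hypothetical names introduced only for this illustration. The point is that `print_rows()` never checks which formatter it was given.

```python
class TableFormatter:
    def headings(self, headers):
        raise NotImplementedError()

    def row(self, rowdata):
        raise NotImplementedError()

class CSVTableFormatter(TableFormatter):
    def headings(self, headers):
        print(','.join(headers))

    def row(self, rowdata):
        print(','.join(rowdata))

class TabTableFormatter(TableFormatter):      # hypothetical extra format
    def headings(self, headers):
        print('\t'.join(headers))

    def row(self, rowdata):
        print('\t'.join(rowdata))

def print_rows(rows, headers, formatter):
    # Only relies on the TableFormatter interface
    formatter.headings(headers)
    for rowdata in rows:
        formatter.row(rowdata)

rows = [['AA', '100'], ['IBM', '50']]
for formatter in (CSVTableFormatter(), TabTableFormatter()):
    print_rows(rows, ['Name', 'Shares'], formatter)
```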
|
||||||
|
|
||||||
|
One potential problem is making it easier for the user to pick the formatter that they want.
|
||||||
|
This can sometimes be fixed by defining a helper function.
|
||||||
|
|
||||||
|
In the `tableformat.py` file, add a function `create_formatter(name)`
|
||||||
|
that allows a user to create a formatter given an output name such as
|
||||||
|
`'txt'`, `'csv'`, or `'html'`.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# stock.py
|
||||||
|
...
|
||||||
|
if __name__ == '__main__':
|
||||||
|
from tableformat import create_formatter
|
||||||
|
portfolio = read_portfolio('Data/portfolio.csv')
|
||||||
|
formatter = create_formatter('csv')
|
||||||
|
print_portfolio(portfolio, formatter)
|
||||||
|
```
|
||||||
|
|
||||||
|
When you run this program, you’ll see output such as this:
|
||||||
|
|
||||||
|
```csv
|
||||||
|
Name,Shares,Price
|
||||||
|
AA,100,32.20
|
||||||
|
IBM,50,91.10
|
||||||
|
CAT,150,83.44
|
||||||
|
MSFT,200,51.23
|
||||||
|
GE,95,40.37
|
||||||
|
MSFT,50,65.10
|
||||||
|
IBM,100,70.44
|
||||||
|
```
|
||||||
|
|
||||||
|
Try changing the format to `'txt'` and `'html'` just to make sure your
|
||||||
|
code is working correctly. If the user provides a bad output format
|
||||||
|
to the `create_formatter()` function, have it raise a `RuntimeError`
|
||||||
|
exception. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> from tableformat import create_formatter
|
||||||
|
>>> formatter = create_formatter('xls')
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
File "tableformat.py", line 68, in create_formatter
|
||||||
|
raise RuntimeError('Unknown table format %s' % name)
|
||||||
|
RuntimeError: Unknown table format xls
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Writing extensible code is one of the most common uses of inheritance in libraries and frameworks.
|
||||||
|
For example, a framework might instruct you to define your own object that inherits from a provided base class.
|
||||||
|
You’re then told to fill in various methods that implement various bits of functionality.
|
||||||
|
That said, designing object-oriented programs can be extremely
|
||||||
|
difficult. For more information, you should probably look for books on
|
||||||
|
the topic of design patterns.
|
||||||
|
|
||||||
|
Still, understanding what happened in this exercise will take you
|
||||||
|
pretty far in terms of using most library modules and knowing
|
||||||
|
what inheritance is good for (extensibility).
|
||||||
|
|
||||||
|
[Next](03_Special_methods)
|
||||||
332
Notes/04_Classes_objects/03_Special_methods.md
Normal file
@@ -0,0 +1,332 @@
|
|||||||
|
# 4.3 Special Methods
|
||||||
|
|
||||||
|
Various parts of Python's behavior can be customized via special or magic methods.
|
||||||
|
This section introduces that idea.
|
||||||
|
|
||||||
|
### Introduction
|
||||||
|
|
||||||
|
Classes may define special methods. These have special meaning to the Python interpreter.
|
||||||
|
They are always preceded and followed by `__`, for example `__init__`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Stock(object):
|
||||||
|
def __init__(self):
|
||||||
|
...
|
||||||
|
def __repr__(self):
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
There are dozens of special methods, but we will only look at a few specific examples.
|
||||||
|
|
||||||
|
### Special methods for String Conversions
|
||||||
|
|
||||||
|
Objects have two string representations.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> from datetime import date
|
||||||
|
>>> d = date(2012, 12, 21)
|
||||||
|
>>> print(d)
|
||||||
|
2012-12-21
|
||||||
|
>>> d
|
||||||
|
datetime.date(2012, 12, 21)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The `str()` function is used to create a nice printable output:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> str(d)
|
||||||
|
'2012-12-21'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
The `repr()` function is used to create a more detailed representation
|
||||||
|
for programmers.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> repr(d)
|
||||||
|
'datetime.date(2012, 12, 21)'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Those functions, `str()` and `repr()`, use a pair of special methods in the class to get the string to be printed.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Date(object):
|
||||||
|
def __init__(self, year, month, day):
|
||||||
|
self.year = year
|
||||||
|
self.month = month
|
||||||
|
self.day = day
|
||||||
|
|
||||||
|
# Used with `str()`
|
||||||
|
def __str__(self):
|
||||||
|
return f'{self.year}-{self.month}-{self.day}'
|
||||||
|
|
||||||
|
# Used with `repr()`
|
||||||
|
def __repr__(self):
|
||||||
|
return f'Date({self.year},{self.month},{self.day})'
|
||||||
|
```
|
||||||
|
|
||||||
|
*Note: The convention for `__repr__()` is to return a string that,
|
||||||
|
when fed to `eval()`, will recreate the underlying object. If this
|
||||||
|
is not possible, some kind of easily readable representation is used
|
||||||
|
instead.*
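
The round trip described in the note can be tried with the `Date` class from above. This is only a sketch: `eval()` is used here to illustrate the convention, not as a recommended way to copy objects.

```python
class Date:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    def __str__(self):
        return f'{self.year}-{self.month}-{self.day}'

    def __repr__(self):
        return f'Date({self.year},{self.month},{self.day})'

d = Date(2012, 12, 21)
print(str(d))             # 2012-12-21
print(repr(d))            # Date(2012,12,21)

copy = eval(repr(d))      # the repr string is valid code for a new Date
print(copy.year, copy.month, copy.day)   # 2012 12 21
```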
|
||||||
|
|
||||||
|
### Special Methods for Mathematics
|
||||||
|
|
||||||
|
Mathematical operators are just calls to special methods.
|
||||||
|
|
||||||
|
```python
|
||||||
|
a + b a.__add__(b)
|
||||||
|
a - b a.__sub__(b)
|
||||||
|
a * b a.__mul__(b)
|
||||||
|
a / b a.__truediv__(b)
|
||||||
|
a // b a.__floordiv__(b)
|
||||||
|
a % b a.__mod__(b)
|
||||||
|
a << b a.__lshift__(b)
|
||||||
|
a >> b a.__rshift__(b)
|
||||||
|
a & b a.__and__(b)
|
||||||
|
a | b a.__or__(b)
|
||||||
|
a ^ b a.__xor__(b)
|
||||||
|
a ** b a.__pow__(b)
|
||||||
|
-a a.__neg__()
|
||||||
|
~a a.__invert__()
|
||||||
|
abs(a) a.__abs__()
|
||||||
|
```
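
As an illustration of the table above, the hypothetical `Vector` class below defines three of these methods. Once they exist, the ordinary operators work on its instances.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):          # a + b
        return Vector(self.x + other.x, self.y + other.y)

    def __neg__(self):                 # -a
        return Vector(-self.x, -self.y)

    def __abs__(self):                 # abs(a)
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __repr__(self):
        return f'Vector({self.x}, {self.y})'

a = Vector(3, 4)
b = Vector(1, 2)
print(a + b)      # Vector(4, 6)
print(-a)         # Vector(-3, -4)
print(abs(a))     # 5.0
```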
|
||||||
|
|
||||||
|
### Special Methods for Item Access
|
||||||
|
|
||||||
|
These are the methods to implement containers.
|
||||||
|
|
||||||
|
```python
|
||||||
|
len(x) x.__len__()
|
||||||
|
x[a] x.__getitem__(a)
|
||||||
|
x[a] = v x.__setitem__(a,v)
|
||||||
|
del x[a] x.__delitem__(a)
|
||||||
|
```
|
||||||
|
|
||||||
|
You can use them in your classes.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class Sequence(object):
|
||||||
|
def __len__(self):
|
||||||
|
...
|
||||||
|
def __getitem__(self,a):
|
||||||
|
...
|
||||||
|
def __setitem__(self,a,v):
|
||||||
|
...
|
||||||
|
def __delitem__(self,a):
|
||||||
|
...
|
||||||
|
```
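
To show the four methods doing real work, here is a sketch of a small container. The `Portfolio` class and its `_holdings` dictionary are hypothetical and exist only for this example.

```python
class Portfolio:
    def __init__(self):
        self._holdings = {}            # maps stock name -> number of shares

    def __len__(self):                 # len(p)
        return len(self._holdings)

    def __getitem__(self, name):       # p[name]
        return self._holdings[name]

    def __setitem__(self, name, shares):   # p[name] = shares
        self._holdings[name] = shares

    def __delitem__(self, name):       # del p[name]
        del self._holdings[name]

p = Portfolio()
p['GOOG'] = 100
p['IBM'] = 50
print(len(p))       # 2
print(p['GOOG'])    # 100
del p['IBM']
print(len(p))       # 1
```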
|
||||||
|
|
||||||
|
### Method Invocation
|
||||||
|
|
||||||
|
Invoking a method is a two-step process.
|
||||||
|
|
||||||
|
1. Lookup: The `.` operator
|
||||||
|
2. Method call: The `()` operator
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = Stock('GOOG',100,490.10)
|
||||||
|
>>> c = s.cost # Lookup
|
||||||
|
>>> c
|
||||||
|
<bound method Stock.cost of <Stock object at 0x590d0>>
|
||||||
|
>>> c() # Method call
|
||||||
|
49010.0
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Bound Methods
|
||||||
|
|
||||||
|
A method that has not yet been invoked by the function call operator `()` is known as a *bound method*.
|
||||||
|
It operates on the instance where it originated.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = Stock('GOOG', 100, 490.10)
>>> s
|
||||||
|
<Stock object at 0x590d0>
|
||||||
|
>>> c = s.cost
|
||||||
|
>>> c
|
||||||
|
<bound method Stock.cost of <Stock object at 0x590d0>>
|
||||||
|
>>> c()
|
||||||
|
49010.0
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Bound methods are often a source of careless, non-obvious errors. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = Stock('GOOG', 100, 490.10)
|
||||||
|
>>> print('Cost : %0.2f' % s.cost)
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
TypeError: float argument required
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Or devious behavior that's hard to debug.
|
||||||
|
|
||||||
|
```python
|
||||||
|
f = open(filename, 'w')
|
||||||
|
...
|
||||||
|
f.close # Oops, Didn't do anything at all. `f` still open.
|
||||||
|
```
|
||||||
|
|
||||||
|
In both of these cases, the error is caused by forgetting to include the
|
||||||
|
trailing parentheses. For example, `s.cost()` or `f.close()`.
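
A short sketch of the difference, assuming the `Stock` class used throughout this section. The bound method remembers its instance in `__self__`, and nothing is computed until it is actually called.

```python
class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

    def cost(self):
        return self.shares * self.price

s = Stock('GOOG', 100, 490.10)
c = s.cost                     # lookup only, no call yet
print(c.__self__ is s)         # True: bound to the instance `s`
print(callable(c))             # True: it is a method, not a number

s.shares = 50
print(c() == s.cost())         # True: the call uses the current state of `s`
```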
|
||||||
|
|
||||||
|
### Attribute Access
|
||||||
|
|
||||||
|
There is an alternative way to access, manipulate and manage attributes.
|
||||||
|
|
||||||
|
```python
|
||||||
|
getattr(obj, 'name') # Same as obj.name
|
||||||
|
setattr(obj, 'name', value) # Same as obj.name = value
|
||||||
|
delattr(obj, 'name') # Same as del obj.name
|
||||||
|
hasattr(obj, 'name') # Tests if attribute exists
|
||||||
|
```
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
if hasattr(obj, 'x'):
|
||||||
|
x = getattr(obj, 'x')
|
||||||
|
else:
|
||||||
|
x = None
|
||||||
|
```
|
||||||
|
|
||||||
|
*Note: `getattr()` also accepts an optional default value argument.*
|
||||||
|
|
||||||
|
```python
|
||||||
|
x = getattr(obj, 'x', None)
|
||||||
|
```
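
The four functions can be exercised together in a few lines. The `Point` class here is a hypothetical stand-in for any object.

```python
class Point:
    def __init__(self, x):
        self.x = x

p = Point(3)
print(getattr(p, 'x', None))    # 3
print(getattr(p, 'y', None))    # None (attribute missing, default returned)

setattr(p, 'y', 7)              # same as p.y = 7
print(p.y)                      # 7

delattr(p, 'x')                 # same as del p.x
print(hasattr(p, 'x'))          # False
```

Because the attribute name is an ordinary string, it can come from a variable, a list of column names, or user input.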
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) Better output for printing objects
|
||||||
|
|
||||||
|
All Python objects have two string representations. The first
|
||||||
|
representation is created by string conversion via `str()` (which is
|
||||||
|
called by `print`). The string representation is usually a nicely
|
||||||
|
formatted version of the object meant for humans. The second
|
||||||
|
representation is a code representation of the object created by
|
||||||
|
`repr()` (or by viewing a value in the interactive shell). The code
|
||||||
|
representation typically shows you the code that you have to type to
|
||||||
|
get the object.
|
||||||
|
|
||||||
|
The two representations of an object are often different. For example, you can see the difference by trying the following:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> s = 'Hello\nWorld'
|
||||||
|
>>> print(str(s)) # Notice nice output (no quotes)
|
||||||
|
Hello
|
||||||
|
World
|
||||||
|
>>> print(repr(s)) # Notice the added quotes and escape codes
|
||||||
|
'Hello\nWorld'
|
||||||
|
>>> print(f'{s!r}') # Alternate way to get repr() string
|
||||||
|
'Hello\nWorld'
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Both kinds of string conversions can be redefined in a class if it defines the `__str__()` and `__repr__()` methods.
|
||||||
|
|
||||||
|
Modify the `Stock` object that you defined in Exercise 4.1 so that the `__repr__()` method produces more useful output.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> goog = Stock('GOOG', 100, 490.1)
|
||||||
|
>>> goog
|
||||||
|
Stock('GOOG', 100, 490.1)
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
See what happens when you read a portfolio of stocks and view the resulting list after you have made these changes.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import stock
|
||||||
|
>>> portfolio = stock.read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> portfolio
|
||||||
|
... see what the output is ...
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (b) An example of using `getattr()`
|
||||||
|
|
||||||
|
In Exercise 4.2 you worked with a function `print_portfolio()` that made a table for a stock portfolio.
|
||||||
|
That function was hard-coded to only work with stock data, which is limiting. You can do so much more if you use functions such as `getattr()`.
|
||||||
|
|
||||||
|
To begin, try this little example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import stock
|
||||||
|
>>> s = stock.Stock('GOOG', 100, 490.1)
|
||||||
|
>>> columns = ['name', 'shares']
|
||||||
|
>>> for colname in columns:
|
||||||
|
print(colname, '=', getattr(s, colname))
|
||||||
|
|
||||||
|
name = GOOG
|
||||||
|
shares = 100
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
Carefully observe that the output data is determined entirely by the attribute names listed in the `columns` variable.
|
||||||
|
|
||||||
|
In the file `tableformat.py`, take this idea and expand it into a
|
||||||
|
generalized function `print_table()` that prints a table showing
|
||||||
|
user-specified attributes of a list of arbitrary objects.
|
||||||
|
|
||||||
|
As with the earlier `print_portfolio()` function, `print_table()`
|
||||||
|
should also accept a `TableFormatter` instance to control the output
|
||||||
|
format. Here’s how it should work:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> import stock
|
||||||
|
>>> portfolio = stock.read_portfolio('Data/portfolio.csv')
|
||||||
|
>>> from tableformat import create_formatter, print_table
|
||||||
|
>>> formatter = create_formatter('txt')
|
||||||
|
>>> print_table(portfolio, ['name','shares'], formatter)
|
||||||
|
name shares
|
||||||
|
---------- ----------
|
||||||
|
AA 100
|
||||||
|
IBM 50
|
||||||
|
CAT 150
|
||||||
|
MSFT 200
|
||||||
|
GE 95
|
||||||
|
MSFT 50
|
||||||
|
IBM 100
|
||||||
|
|
||||||
|
>>> print_table(portfolio, ['name','shares','price'], formatter)
|
||||||
|
name shares price
|
||||||
|
---------- ---------- ----------
|
||||||
|
AA 100 32.2
|
||||||
|
IBM 50 91.1
|
||||||
|
CAT 150 83.44
|
||||||
|
MSFT 200 51.23
|
||||||
|
GE 95 40.37
|
||||||
|
MSFT 50 65.1
|
||||||
|
IBM 100 70.44
|
||||||
|
>>>
|
||||||
|
```
|
||||||
|
|
||||||
|
### (c) Exercise Bonus: Column Formatting
|
||||||
|
|
||||||
|
Modify the `print_table()` function in part (b) so that it also
|
||||||
|
accepts a list of format specifiers for formatting the contents of
|
||||||
|
each column.
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> print_table(portfolio,
|
||||||
|
['name','shares','price'],
|
||||||
|
['s','d','0.2f'],
|
||||||
|
formatter)
|
||||||
|
name shares price
|
||||||
|
---------- ---------- ----------
|
||||||
|
AA 100 32.20
|
||||||
|
IBM 50 91.10
|
||||||
|
CAT 150 83.44
|
||||||
|
MSFT 200 51.23
|
||||||
|
GE 95 40.37
|
||||||
|
MSFT 50 65.10
|
||||||
|
IBM 100 70.44
|
||||||
|
>>>
|
||||||
|
```
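
One possible building block for this exercise, offered as a sketch rather than the intended solution, is the built-in `format()` function, which applies a format specifier held in a string. The `values` and `specs` lists below are illustrative sample data.

```python
# Pair each value with its format specifier and apply format()
values = ['AA', 100, 32.2]
specs = ['s', 'd', '0.2f']

formatted = [format(value, spec) for value, spec in zip(values, specs)]
print(formatted)    # ['AA', '100', '32.20']
```

The result is a list of strings, which is convenient if the formatter expects every cell to already be text.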
|
||||||
|
|
||||||
|
[Next](04_Defining_exceptions)
|
||||||
49
Notes/04_Classes_objects/04_Defining_exceptions.md
Normal file
@@ -0,0 +1,49 @@
|
|||||||
|
# 4.4 Defining Exceptions
|
||||||
|
|
||||||
|
User-defined exceptions are defined by classes.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class NetworkError(Exception):
|
||||||
|
    pass
|
||||||
|
```
|
||||||
|
|
||||||
|
**User-defined exceptions should always inherit from `Exception`.**
|
||||||
|
Usually they are empty classes. Use `pass` for the body.
|
||||||
|
|
||||||
|
You can also make a hierarchy of your exceptions.
|
||||||
|
|
||||||
|
```python
|
||||||
|
class AuthenticationError(NetworkError):
|
||||||
|
    pass
|
||||||
|
|
||||||
|
class ProtocolError(NetworkError):
|
||||||
|
    pass
|
||||||
|
```
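
A hierarchy is useful because an `except` clause that names a base class also catches its subclasses. The sketch below repeats the classes above so that it runs on its own; the `connect()` function is a hypothetical stand-in for real network code.

```python
class NetworkError(Exception):
    pass

class AuthenticationError(NetworkError):
    pass

class ProtocolError(NetworkError):
    pass

def connect(password):
    # Hypothetical function used only for illustration
    if password != 'secret':
        raise AuthenticationError('Bad password')
    return 'connected'

try:
    connect('guess')
except NetworkError as e:
    # Catches AuthenticationError and ProtocolError as well
    print(type(e).__name__, e)
```

Callers that need finer control can list the more specific `except AuthenticationError` clause before the general `except NetworkError` clause.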
|
||||||
|
|
||||||
|
## Exercises
|
||||||
|
|
||||||
|
### (a) Defining a custom exception
|
||||||
|
|
||||||
|
It is often good practice for libraries to define their own exceptions.
|
||||||
|
|
||||||
|
This makes it easier to distinguish between Python exceptions raised
|
||||||
|
in response to common programming errors versus exceptions
|
||||||
|
intentionally raised by a library to signal a specific usage
|
||||||
|
problem.
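
To make the distinction concrete, here is a small sketch under assumed names: `ConfigError` and `set_mode()` are hypothetical and stand in for a library's own exception and API. The first call triggers a deliberate usage error; the second is an ordinary programming mistake that Python itself reports.

```python
class ConfigError(Exception):
    """Hypothetical library-specific exception."""

def set_mode(mode):
    # Hypothetical library function that validates its argument
    if mode not in ('fast', 'safe'):
        raise ConfigError(f'Unknown mode {mode!r}')
    return mode.upper()

try:
    set_mode('turbo')        # usage problem signaled by the library
except ConfigError as e:
    print('Usage problem:', e)

try:
    set_mode()               # programming error: missing argument
except ConfigError:
    print('not reached')
except TypeError as e:
    print('Bug in calling code:', e)
```

Because the two kinds of failure have different types, calling code can handle the library's errors without accidentally hiding its own bugs.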
|
||||||
|
|
||||||
|
Modify the `create_formatter()` function from the last exercise so
|
||||||
|
that it raises a custom `FormatError` exception when the user provides
|
||||||
|
a bad format name.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
>>> from tableformat import create_formatter
|
||||||
|
>>> formatter = create_formatter('xls')
|
||||||
|
Traceback (most recent call last):
|
||||||
|
  File "<stdin>", line 1, in <module>
|
||||||
|
  File "tableformat.py", line 71, in create_formatter
|
||||||
|
    raise FormatError('Unknown table format %s' % name)
|
||||||
|
FormatError: Unknown table format xls
|
||||||
|
>>>
|
||||||
|
```
|
||||||
@@ -39,6 +39,7 @@
|
|||||||
{{ content }}
|
{{ content }}
|
||||||
|
|
||||||
<footer class="site-footer">
|
<footer class="site-footer">
|
||||||
|
<span class="site-footer">Copyright (c) 2007-2020, David Beazley, All Rights Reserved</span>
|
||||||
{% if site.github.is_project_page %}
|
{% if site.github.is_project_page %}
|
||||||
<span class="site-footer-owner"><a href="{{ site.github.repository_url }}">{{ site.github.repository_name }}</a> is maintained by <a href="{{ site.github.owner_url }}">{{ site.github.owner_name }}</a>.</span>
|
<span class="site-footer-owner"><a href="{{ site.github.repository_url }}">{{ site.github.repository_name }}</a> is maintained by <a href="{{ site.github.owner_url }}">{{ site.github.owner_name }}</a>.</span>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
|||||||