Added sections 2-4

This commit is contained in:
David Beazley
2020-05-25 12:40:11 -05:00
parent 7a4423dee4
commit c06face3a5
21 changed files with 5639 additions and 0 deletions

@@ -0,0 +1,7 @@
# Working With Data Overview
In this section, we look at how Python programmers represent and work with data.
Most programs today work with data. We are going to learn common programming idioms and how to avoid shooting yourself in the foot.
We will also look at part of Python's object model, which is a big part of understanding most Python programs.

@@ -0,0 +1,431 @@
# 2.1 Datatypes and Data structures
This section introduces data structures in the form of tuples and dicts.
### Primitive Datatypes
Python has a few primitive types of data:
* Integers
* Floating point numbers
* Strings (text)
We have learned about these in the previous section.
### None type
```python
email_address = None
```
This type is often used as a placeholder for an optional or missing value.
```python
if email_address:
    send_email(email_address, msg)
```
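Note that a plain truth test treats `None` and an empty string the same way. If the difference matters to your program, an explicit `is None` check is one option. The following is a minimal sketch; `describe` is a hypothetical helper made up for this illustration, not part of the course code.

```python
def describe(email_address):
    # Hypothetical helper, used only to illustrate the two kinds of test.
    if email_address is None:
        return 'missing'      # No value was ever supplied
    if not email_address:
        return 'empty'        # A value was supplied, but it is an empty string
    return 'present'

print(describe(None))
print(describe(''))
print(describe('dave@example.com'))
```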
### Data Structures
Real programs work with data that is more complex than the simple datatypes covered so far.
For example, information about a stock:
```code
100 shares of GOOG at $490.10
```
This is an "object" with three parts:
* Name or symbol of the stock ("GOOG", a string)
* Number of shares (100, an integer)
* Price (490.10, a float)
### Tuples
A tuple is a collection of values grouped together.
Example:
```python
s = ('GOOG', 100, 490.1)
```
Sometimes the `()` are omitted in the syntax.
```python
s = 'GOOG', 100, 490.1
```
Special cases (0-tuple, 1-tuple).
```python
t = () # An empty tuple
w = ('GOOG', ) # A 1-item tuple
```
Tuples are usually used to represent *simple* records or structures.
Typically, it is a single *object* of multiple parts. A good analogy: *A tuple is like a single row in a database table.*
Tuple contents are ordered (like an array).
```python
s = ('GOOG', 100, 490.1)
name = s[0] # 'GOOG'
shares = s[1] # 100
price = s[2] # 490.1
```
However, the contents can't be modified.
```pycon
>>> s[1] = 75
TypeError: 'tuple' object does not support item assignment
```
You can, however, make a new tuple based on a current tuple.
```python
s = (s[0], 75, s[2])
```
### Tuple Packing
Tuples are focused more on packing related items together into a single *entity*.
```python
s = ('GOOG', 100, 490.1)
```
The tuple is then easy to pass around to other parts of a program as a single object.
### Tuple Unpacking
To use the tuple elsewhere, you can unpack its parts into variables.
```python
name, shares, price = s
print('Cost', shares * price)
```
The number of variables must match the tuple structure.
```python
name, shares = s # ERROR
Traceback (most recent call last):
...
ValueError: too many values to unpack (expected 2)
```
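If you only need the first part of a tuple, one option in Python 3 is a starred name, which collects the remaining values into a list. This is a small sketch that reuses the tuple `s` from above; the names `message` and `rest` are chosen just for this illustration.

```python
s = ('GOOG', 100, 490.1)

# Too few names on the left: Python raises ValueError.
try:
    name, shares = s
except ValueError as err:
    message = str(err)

# A starred name collects whatever is left over into a list.
name, *rest = s
print(message)
print(name, rest)
```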
### Tuples vs. Lists
Tuples are NOT just read-only lists. Tuples are most often used for a *single item* consisting of multiple parts.
Lists are usually a collection of distinct items, usually all of the same type.
```python
record = ('GOOG', 100, 490.1)       # A tuple representing a stock in a portfolio
symbols = ['GOOG', 'AAPL', 'IBM']   # A list representing three stock symbols
```
### Dictionaries
A dictionary is a hash table or associative array.
It is a collection of values indexed by *keys*. These keys serve as field names.
```python
s = {
    'name': 'GOOG',
    'shares': 100,
    'price': 490.1
}
```
### Common operations
To read values from a dictionary use the key names.
```pycon
>>> print(s['name'], s['shares'])
GOOG 100
>>> s['price']
490.1
>>>
```
To add or modify values assign using the key names.
```pycon
>>> s['shares'] = 75
>>> s['date'] = '6/6/2007'
>>>
```
To delete a value use the `del` statement.
```pycon
>>> del s['date']
>>>
```
### Why dictionaries?
Dictionaries are useful when there are *many* different values and those values
might be modified or manipulated. Dictionaries make your code more readable.
```python
s['price']
# vs
s[2]
```
## Exercises
### Note
In the last few exercises, you wrote a program that read a datafile `Data/portfolio.csv`. Using the `csv` module, it is easy to read the file row-by-row.
```pycon
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> next(rows)
['name', 'shares', 'price']
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>>
```
Although reading the file is easy, you often want to do more with the data than read it.
For instance, perhaps you want to store it and start performing some calculations on it.
Unfortunately, a raw "row" of data doesn't give you enough to work with. For example, even a simple math calculation doesn't work:
```pycon
>>> row = ['AA', '100', '32.20']
>>> cost = row[1] * row[2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'
>>>
```
To do more, you typically want to interpret the raw data in some way and turn it into a more useful kind of object so that you can work with it later.
Two simple options are tuples or dictionaries.
### (a) Tuples
At the interactive prompt, create the following tuple that represents
the above row, but with the numeric columns converted to proper
numbers:
```pycon
>>> t = (row[0], int(row[1]), float(row[2]))
>>> t
('AA', 100, 32.2)
>>>
```
Using this, you can now calculate the total cost by multiplying the shares and the price:
```pycon
>>> cost = t[1] * t[2]
>>> cost
3220.0000000000005
>>>
```
Is math broken in Python? What's the deal with the answer of
3220.0000000000005?
This is an artifact of the floating point hardware on your computer,
which represents numbers in base 2, not base 10. Most base-10 decimals
(such as 32.2) have no exact base-2 representation, so even simple
calculations introduce small errors. This is normal, although perhaps a bit
surprising if you haven't seen it before.
This happens in all programming languages that use floating point
numbers, but it often gets hidden when printing. For example:
```pycon
>>> print(f'{cost:0.2f}')
3220.00
>>>
```
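If the small error matters (for example, when comparing values), two common options from the standard library are `math.isclose()` for approximate comparison and the `decimal` module for exact base-10 arithmetic. The sketch below is only an illustration of those two tools; the variable names are made up, and the course itself continues with plain floats.

```python
import math
from decimal import Decimal

cost = 100 * 32.2                      # Float arithmetic: carries a tiny error
print(cost == 3220.0)                  # Exact comparison is unreliable here
close_enough = math.isclose(cost, 3220.0)
print(close_enough)

# Decimal works in base 10, so the string '32.20' is represented exactly.
exact_cost = 100 * Decimal('32.20')
print(exact_cost)
```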
Tuples are read-only. Verify this by trying to change the number of shares to 75.
```pycon
>>> t[1] = 75
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>
```
Although you can't change tuple contents, you can always create a completely new tuple that replaces the old one.
```pycon
>>> t = (t[0], 75, t[2])
>>> t
('AA', 75, 32.2)
>>>
```
Whenever you reassign an existing variable name like this, the old
value is discarded. Although the above assignment might look like you
are modifying the tuple, you are actually creating a new tuple and
throwing the old one away.
Tuples are often used to pack and unpack values into variables. Try the following:
```pycon
>>> name, shares, price = t
>>> name
'AA'
>>> shares
75
>>> price
32.2
>>>
```
Take the above variables and pack them back into a tuple:
```pycon
>>> t = (name, 2*shares, price)
>>> t
('AA', 150, 32.2)
>>>
```
### (b) Dictionaries as a data structure
An alternative to a tuple is to create a dictionary instead.
```pycon
>>> d = {
        'name'   : row[0],
        'shares' : int(row[1]),
        'price'  : float(row[2])
    }
>>> d
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
Calculate the total cost of this holding:
```pycon
>>> cost = d['shares'] * d['price']
>>> cost
3220.0000000000005
>>>
```
Compare this example with the same calculation involving tuples above. Change the number of shares to 75.
```pycon
>>> d['shares'] = 75
>>> d
{'name': 'AA', 'shares': 75, 'price': 32.2}
>>>
```
Unlike tuples, dictionaries can be freely modified. Add some attributes:
```pycon
>>> d['date'] = (6, 11, 2007)
>>> d['account'] = 12345
>>> d
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007), 'account': 12345}
>>>
```
### (c) Some additional dictionary operations
If you turn a dictionary into a list, you'll get all of its keys:
```pycon
>>> list(d)
['name', 'shares', 'price', 'date', 'account']
>>>
```
Similarly, if you use the `for` statement to iterate on a dictionary, you will get the keys:
```pycon
>>> for k in d:
        print('k =', k)
k = name
k = shares
k = price
k = date
k = account
>>>
```
Try this variant that performs a lookup at the same time:
```pycon
>>> for k in d:
        print(k, '=', d[k])
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
account = 12345
>>>
```
You can also obtain all of the keys using the `keys()` method:
```pycon
>>> keys = d.keys()
>>> keys
dict_keys(['name', 'shares', 'price', 'date', 'account'])
>>>
```
`keys()` is a bit unusual in that it returns a special `dict_keys` object.
This is an overlay on the original dictionary that always gives you the current keys—even if the dictionary changes. For example, try this:
```pycon
>>> del d['account']
>>> keys
dict_keys(['name', 'shares', 'price', 'date'])
>>>
```
Notice that `'account'` disappeared from `keys` even though you didn't call `d.keys()` again.
A more elegant way to work with keys and values together is to use the `items()` method. This gives you `(key, value)` tuples:
```pycon
>>> items = d.items()
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
>>> for k, v in d.items():
        print(k, '=', v)
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
>>>
```
If you have tuples such as `items`, you can create a dictionary using the `dict()` function. Try it:
```pycon
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
>>> d = dict(items)
>>> d
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007)}
>>>
```
[Next](02_Containers)

@@ -0,0 +1,413 @@
# 2.2 Containers
### Overview
Programs often have to work with many objects.
* A portfolio of stocks
* A table of stock prices
There are three main choices to use.
* Lists. Ordered data.
* Dictionaries. Data looked up by key.
* Sets. Unordered collection of unique items.
### Lists as a Container
Use a list when the order of the data matters. Remember that lists can hold any kind of objects.
For example, a list of tuples.
```python
portfolio = [
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.3),
    ('CAT', 150, 83.44)
]
portfolio[0] # ('GOOG', 100, 490.1)
portfolio[2] # ('CAT', 150, 83.44)
```
### List construction
Building a list from scratch.
```python
records = [] # Initial empty list
# Use .append() to add more items
records.append(('GOOG', 100, 490.10))
records.append(('IBM', 50, 91.3))
...
```
An example when reading records from a file.
```python
records = [] # Initial empty list
with open('portfolio.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
```
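If the file starts with a header line, the header has to be skipped before converting values, or `int()` will fail on it. Here is a self-contained sketch of the same idea using the `csv` module. The `io.StringIO` object and its contents are stand-ins invented for this illustration so the example runs without a real file.

```python
import csv
import io

# Stand-in for a real file; the contents are made up for this illustration.
sample = io.StringIO('name,shares,price\nGOOG,100,490.1\nIBM,50,91.3\n')

records = []                # Initial empty list
rows = csv.reader(sample)
headers = next(rows)        # Skip the header line
for row in rows:
    records.append((row[0], int(row[1]), float(row[2])))

print(records)
```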
### Dicts as a Container
Dictionaries are useful if you want fast random lookups (by key name). For
example, a dictionary of stock prices:
```python
prices = {
    'GOOG': 513.25,
    'CAT': 87.22,
    'IBM': 93.37,
    'MSFT': 44.12
}
```
Here are some simple lookups:
```pycon
>>> prices['IBM']
93.37
>>> prices['GOOG']
513.25
>>>
```
### Dict Construction
Example of building a dict from scratch.
```python
prices = {} # Initial empty dict
# Insert new items
prices['GOOG'] = 513.25
prices['CAT'] = 87.22
prices['IBM'] = 93.37
```
An example populating the dict from the contents of a file.
```python
prices = {} # Initial empty dict
with open('prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])
```
### Dictionary Lookups
You can test the existence of a key.
```python
if key in d:
    # YES
else:
    # NO
```
You can look up a value that might not exist and provide a default value in case it doesn't.
```python
name = d.get(key, default)
```
An example:
```pycon
>>> prices.get('IBM', 0.0)
93.37
>>> prices.get('SCOX', 0.0)
0.0
>>>
```
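One place where `get()` with a default is handy is tabulating totals, because the first occurrence of a key needs a starting value. A minimal sketch, using a small made-up list of `(name, shares)` pairs:

```python
# Made-up sample data for this illustration
holdings = [('IBM', 50), ('MSFT', 200), ('IBM', 100)]

total_shares = {}
for name, shares in holdings:
    # Start from 0 the first time a name is seen
    total_shares[name] = total_shares.get(name, 0) + shares

print(total_shares)
```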
### Composite keys
Almost any type of value can be used as a dictionary key in Python. A dictionary key must be of a type that is immutable.
For example, tuples:
```python
holidays = {
    (1, 1): 'New Years',
    (3, 14): 'Pi day',
    (9, 13): "Programmer's day",
}
```
Then to access:
```pycon
>>> holidays[3, 14]
'Pi day'
>>>
```
*Neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.*
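To see what happens in practice, here is a short sketch that tries a list as a key, catches the resulting `TypeError`, and then converts the list to a tuple instead. The variable names are chosen only for this illustration.

```python
holidays = {(1, 1): 'New Years'}

date = [3, 14]                      # A list is mutable, so it can't be a key
try:
    holidays[date] = 'Pi day'
except TypeError as err:
    message = str(err)

holidays[tuple(date)] = 'Pi day'    # Converting to a tuple works
print(message)
print(holidays[3, 14])
```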
### Sets
A set is an unordered collection of unique items.
```python
tech_stocks = {'IBM', 'AAPL', 'MSFT'}
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])
```
Sets are useful for membership tests.
```pycon
>>> tech_stocks
{'AAPL', 'IBM', 'MSFT'}
>>> 'IBM' in tech_stocks
True
>>> 'FB' in tech_stocks
False
>>>
```
Sets are also useful for duplicate elimination.
```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)
# unique = {'IBM', 'AAPL', 'GOOG', 'YHOO'} (in no particular order)
```
Additional set operations:
```python
unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item
s1 | s2                  # Set union
s1 & s2                  # Set intersection
s1 - s2                  # Set difference
```
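Here is a small runnable version of the operators above. The two sets `s1` and `s2` are made-up examples.

```python
s1 = {'IBM', 'AAPL', 'MSFT'}
s2 = {'MSFT', 'GOOG'}

union = s1 | s2         # Items in either set
common = s1 & s2        # Items in both sets
only_s1 = s1 - s2       # Items in s1 but not in s2

print(sorted(union))
print(sorted(common))
print(sorted(only_s1))
```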
## Exercises
### Objectives
### (a) A list of tuples
The file `Data/portfolio.csv` contains a list of stocks in a portfolio.
In [Section 1.7](), you wrote a function `portfolio_cost(filename)` that read this file and performed a simple calculation.
Your code should have looked something like this:
```python
# pcost.py
import csv

def portfolio_cost(filename):
    '''Computes the total cost (shares*price) of a portfolio file'''
    total_cost = 0.0

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            nshares = int(row[1])
            price = float(row[2])
            total_cost += nshares * price
    return total_cost
```
Using this code as a rough guide, create a new file `report.py`. In
that file, define a function `read_portfolio(filename)` that opens a
given portfolio file and reads it into a list of tuples. To do this,
you're going to make a few minor modifications to the above code.
First, instead of defining `total_cost = 0`, you'll make a variable that's initially set to an empty list. For example:
```python
portfolio = []
```
Next, instead of totaling up the cost, you'll turn each row into a
tuple exactly as you just did in the last exercise and append it to
this list. For example:
```python
for row in rows:
    holding = (row[0], int(row[1]), float(row[2]))
    portfolio.append(holding)
```
Finally, you'll return the resulting `portfolio` list.
Experiment with your function interactively (just a reminder that in order to do this, you first have to run the `report.py` program in the interpreter):
*Hint: Use `-i` when executing the file in the terminal*
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23),
('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)]
>>>
>>> portfolio[0]
('AA', 100, 32.2)
>>> portfolio[1]
('IBM', 50, 91.1)
>>> portfolio[1][1]
50
>>> total = 0.0
>>> for s in portfolio:
        total += s[1] * s[2]
>>> print(total)
44671.15
>>>
```
This list of tuples that you have created is very similar to a 2-D array.
For example, you can access a specific column and row using a lookup such as `portfolio[row][column]` where `row` and `column` are integers.
That said, you can also rewrite the last for-loop using a statement like this:
```pycon
>>> total = 0.0
>>> for name, shares, price in portfolio:
        total += shares*price
>>> print(total)
44671.15
>>>
```
### (b) List of Dictionaries
Take the function you wrote in part (a) and modify it to represent each stock in the portfolio with a dictionary instead of a tuple.
In this dictionary use the field names of "name", "shares", and "price" to represent the different columns in the input file.
Experiment with this new function in the same manner as you did in part (a).
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1},
{'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
{'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1},
{'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> portfolio[0]
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>> portfolio[1]
{'name': 'IBM', 'shares': 50, 'price': 91.1}
>>> portfolio[1]['shares']
50
>>> total = 0.0
>>> for s in portfolio:
        total += s['shares']*s['price']
>>> print(total)
44671.15
>>>
```
Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers.
This is often preferred because the resulting code is easier to read later.
Viewing large dictionaries and lists can be messy. To clean up the output for debugging, consider using the `pprint` function.
```pycon
>>> from pprint import pprint
>>> pprint(portfolio)
[{'name': 'AA', 'price': 32.2, 'shares': 100},
 {'name': 'IBM', 'price': 91.1, 'shares': 50},
 {'name': 'CAT', 'price': 83.44, 'shares': 150},
 {'name': 'MSFT', 'price': 51.23, 'shares': 200},
 {'name': 'GE', 'price': 40.37, 'shares': 95},
 {'name': 'MSFT', 'price': 65.1, 'shares': 50},
 {'name': 'IBM', 'price': 70.44, 'shares': 100}]
>>>
```
### (c) Dictionaries as a container
A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer.
In the Python shell, try playing with a dictionary:
```pycon
>>> prices = { }
>>> prices['IBM'] = 92.45
>>> prices['MSFT'] = 45.12
>>> prices
... look at the result ...
>>> prices['IBM']
92.45
>>> prices['AAPL']
... look at the result ...
>>> 'AAPL' in prices
False
>>>
```
The file `Data/prices.csv` contains a series of lines with stock prices.
The file looks something like this:
```csv
"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
"C",3.72
...
```
Write a function `read_prices(filename)` that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices.
To do this, start with an empty dictionary and start inserting values into it just
as you did above. However, you are reading the values from a file now.
We'll use this data structure to quickly look up the price of a given stock name.
A few little tips that you'll need for this part. First, make sure you use the `csv` module just as you did before. There's no need to reinvent the wheel here.
```pycon
>>> import csv
>>> f = open('Data/prices.csv', 'r')
>>> rows = csv.reader(f)
>>> for row in rows:
        print(row)
['AA', '9.22']
['AXP', '24.85']
...
[]
>>>
```
The other little complication is that the `Data/prices.csv` file may have some blank lines in it. Notice how the last row of data above is an empty list, meaning no data was present on that line.
There's a possibility that this could cause your program to die with an exception.
Use the `try` and `except` statements to catch this as appropriate.
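As a hint, here is one possible shape for the blank-line handling. It is a sketch, not the full `read_prices()` function: it reads from an `io.StringIO` stand-in with made-up contents instead of the real file, and it assumes that indexing an empty row is the failure you need to catch.

```python
import csv
import io

# Stand-in for Data/prices.csv; made-up contents that end with a blank line.
sample = io.StringIO('"AA",9.22\n"AXP",24.85\n\n')

prices = {}
for row in csv.reader(sample):
    try:
        prices[row[0]] = float(row[1])
    except IndexError:
        pass        # Blank line: row is an empty list, so skip it

print(prices)
```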
Once you have written your `read_prices()` function, test it interactively to make sure it works:
```python
>>> prices = read_prices('Data/prices.csv')
>>> prices['IBM']
106.28
>>> prices['MSFT']
20.89
>>>
```
### (d) Finding out if you can retire
Tie all of this work together by adding a few statements to your `report.py` program.
These statements should take the list of stocks in part (b) and the dictionary of prices in part (c) and
compute the current value of the portfolio along with the gain/loss.
[Next](03_Formatting)

@@ -0,0 +1,276 @@
# 2.3 Formatting
This is a slight digression, but when you work with data, you often want to
produce structured output (tables, etc.). For example:
```code
      Name     Shares      Price
---------- ---------- ----------
        AA        100      32.20
       IBM         50      91.10
       CAT        150      83.44
      MSFT        200      51.23
        GE         95      40.37
      MSFT         50      65.10
       IBM        100      70.44
```
### String Formatting
One way to format strings in Python 3.6+ is with `f-strings`.
```python
>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
'       IBM        100      91.10'
>>>
```
The part `{expression:format}` is replaced with the value of `expression`, formatted according to `format`.
It is commonly used with `print`.
```python
print(f'{name:>10s} {shares:>10d} {price:>10.2f}')
```
### Format codes
Format codes (after the `:` inside the `{}`) are similar to C `printf()`. Common codes
include:
```code
d       Decimal integer
b       Binary integer
x       Hexadecimal integer
f       Float as [-]m.dddddd
e       Float as [-]m.dddddde+-xx
g       Float, but selective use of E notation
s       String
c       Character (from integer)
```
Common modifiers adjust the field width and decimal precision. This is a partial list:
```code
:>10d   Integer right aligned in 10-character field
:<10d   Integer left aligned in 10-character field
:^10d   Integer centered in 10-character field
:0.2f   Float with 2 digit precision
```
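A quick way to get a feel for these codes is to try a few of them on the same value. The following sketch uses made-up values and variable names.

```python
value = 255

as_hex = f'{value:x}'           # Hexadecimal integer
as_binary = f'{value:b}'        # Binary integer
right = f'{value:>10d}'         # Right aligned in a 10-character field
left = f'{value:<10d}'          # Left aligned in a 10-character field
centered = f'{value:^10d}'      # Centered in a 10-character field
rounded = f'{3.14159:0.2f}'     # Float with 2 digits of precision
letter = f'{65:c}'              # Character from an integer code

print(as_hex, as_binary, rounded, letter)
print(repr(right), repr(left), repr(centered))
```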
### Dictionary Formatting
You can use the `format_map()` method on strings.
```python
>>> s = {
        'name': 'IBM',
        'shares': 100,
        'price': 91.1
    }
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
'       IBM        100      91.10'
>>>
```
It uses the same format codes as `f-strings` but takes the values from the supplied dictionary.
### C-Style Formatting
You can also use the formatting operator `%`.
```python
>>> 'The value is %d' % 3
'The value is 3'
>>> '%5d %-5d %10d' % (3,4,5)
'    3 4              5'
>>> '%0.2f' % (3.1415926,)
'3.14'
```
This requires a single item or a tuple on the right. Format codes are modeled after the C `printf()` as well.
*Note: This is the only formatting available on byte strings.*
```python
>>> b'%s has %d messages' % (b'Dave', 37)
b'Dave has 37 messages'
>>>
```
## Exercises
In the previous exercise, you wrote a program called `report.py` that computed the gain/loss of a
stock portfolio. In this exercise, you're going to modify it to produce a table like this:
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
```
In this report, "Price" is the current share price of the stock and "Change" is the change in the share price from the initial purchase price.
### (a) How to format numbers
A common problem with printing numbers is specifying the number of decimal places. One way to fix this is to use f-strings. Try
these examples:
```python
>>> value = 42863.1
>>> print(value)
42863.1
>>> print(f'{value:0.4f}')
42863.1000
>>> print(f'{value:>16.2f}')
        42863.10
>>> print(f'{value:<16.2f}')
42863.10
>>> print(f'{value:*>16,.2f}')
*******42,863.10
>>>
```
Full documentation on the formatting codes used with f-strings can be found
[here](https://docs.python.org/3/library/string.html#format-specification-mini-language). Formatting
is also sometimes performed using the `%` operator of strings.
```pycon
>>> print('%0.4f' % value)
42863.1000
>>> print('%16.2f' % value)
        42863.10
>>>
```
Documentation on various codes used with `%` can be found [here](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting).
Although it's commonly used with `print`, string formatting is not tied to printing.
If you want to save a formatted string, just assign it to a variable.
```pycon
>>> f = '%0.4f' % value
>>> f
'42863.1000'
>>>
```
### (b) Collecting Data
In order to generate the above report, you'll first want to collect
all of the data shown in the table. Write a function `make_report()`
that takes a list of stocks and dictionary of prices as input and
returns a list of tuples containing the rows of the above table.
Add this function to your `report.py` file. Here's how it should work if you try it interactively:
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> prices = read_prices('Data/prices.csv')
>>> report = make_report(portfolio, prices)
>>> for r in report:
        print(r)
('AA', 100, 9.22, -22.980000000000004)
('IBM', 50, 106.28, 15.180000000000007)
('CAT', 150, 35.46, -47.98)
('MSFT', 200, 20.89, -30.339999999999996)
('GE', 95, 13.48, -26.889999999999997)
...
>>>
```
### (c) Printing a formatted table
Redo the above for-loop, but change the print statement to format the tuples.
```pycon
>>> for r in report:
        print('%10s %10d %10.2f %10.2f' % r)
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
...
>>>
```
You can also expand the values and use f-strings. For example:
```pycon
>>> for name, shares, price, change in report:
        print(f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}')
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
...
>>>
```
Take the above statements and add them to your `report.py` program.
Have your program take the output of the `make_report()` function and print a nicely formatted table as shown.
### (d) Adding some headers
Suppose you had a tuple of header names like this:
```python
headers = ('Name', 'Shares', 'Price', 'Change')
```
Add code to your program that takes the above tuple of headers and
creates a string where each header name is right-aligned in a
10-character wide field and each field is separated by a single space.
```python
'      Name     Shares      Price     Change'
```
Write code that takes the headers and creates the separator string between the headers and data to follow.
This string is just a bunch of "-" characters under each field name. For example:
```python
'---------- ---------- ---------- ----------'
```
When you're done, your program should produce the table shown at the top of this exercise.
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
```
### (e) Formatting Challenge
How would you modify your code so that the price includes the currency symbol ($) and the output looks like this:
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      $9.22     -22.98
       IBM         50    $106.28      15.18
       CAT        150     $35.46     -47.98
      MSFT        200     $20.89     -30.34
        GE         95     $13.48     -26.89
      MSFT         50     $20.89     -44.21
       IBM        100    $106.28      35.84
```
[Next](04_Sequences)

@@ -0,0 +1,538 @@
# 2.4 Sequences
In this part, we look at some common idioms for working with sequence data.
### Introduction
Python has three *sequence* datatypes.
* String: `'Hello'`. A string is considered a sequence of characters.
* List: `[1, 4, 5]`.
* Tuple: `('GOOG', 100, 490.1)`.
All sequences are ordered and have length.
```python
a = 'Hello' # String
b = [1, 4, 5] # List
c = ('GOOG', 100, 490.1) # Tuple
# Indexed order
a[0] # 'H'
b[-1] # 5
c[1] # 100
# Length of sequence
len(a) # 5
len(b) # 3
len(c) # 3
```
Sequences can be replicated: `s * n`.
```pycon
>>> a = 'Hello'
>>> a * 3
'HelloHelloHello'
>>> b = [1, 2, 3]
>>> b * 2
[1, 2, 3, 1, 2, 3]
>>>
```
Sequences of the same type can be concatenated: `s + t`.
```pycon
>>> a = (1, 2, 3)
>>> b = (4, 5)
>>> a + b
(1, 2, 3, 4, 5)
>>>
>>> c = [1, 5]
>>> a + c
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "list") to tuple
```
### Slicing
Slicing means to take a subsequence from a sequence.
The syntax used is `s[start:end]`, where `start` and `end` are the indexes of the subsequence you want.
```python
a = [0,1,2,3,4,5,6,7,8]
a[2:5] # [2,3,4]
a[-5:] # [4,5,6,7,8]
a[:3] # [0,1,2]
```
* Indices `start` and `end` must be integers.
* Slices do *not* include the end value.
* If indices are omitted, they default to the beginning or end of the list.
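A small sketch of these rules in action, reusing the list `a` from above. The other names are made up for this illustration.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8]

middle = a[2:5]         # Items at indexes 2, 3, 4; index 5 is not included
whole = a[:]            # Both indices omitted: a copy of the whole list

print(middle, len(middle))      # The length is end - start
print(a[:3] + a[3:] == a)       # Splitting at any index loses nothing
print(whole == a, whole is a)   # Equal contents, but a different list object
```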
### Slice re-assignment
Slices can also be reassigned and deleted.
```python
# Reassignment
a = [0,1,2,3,4,5,6,7,8]
a[2:4] = [10,11,12] # [0,1,10,11,12,4,5,6,7,8]
```
*Note: The reassigned slice doesn't need to have the same length.*
```python
# Deletion
a = [0,1,2,3,4,5,6,7,8]
del a[2:4] # [0,1,4,5,6,7,8]
```
### Sequence Reductions
There are some functions to reduce a sequence to a single value.
```pycon
>>> s = [1, 2, 3, 4]
>>> sum(s)
10
>>> min(s)
1
>>> max(s)
4
>>> t = ['Hello', 'World']
>>> max(t)
'World'
>>>
```
### Iteration over a sequence
The for-loop iterates over the elements in the sequence.
```pycon
>>> s = [1, 4, 9, 16]
>>> for i in s:
...     print(i)
...
1
4
9
16
>>>
```
On each iteration of the loop, you get a new item to work with.
This new value is placed into an iteration variable. In this example, the
iteration variable is `x`:
```python
for x in s:         # `x` is an iteration variable
    statements
```
In each iteration, it overwrites the previous value (if any).
After the loop finishes, the variable retains the last value.
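For instance, in this small sketch the variable `x` is still available after the loop and holds the final item. The name `last` is made up for the illustration.

```python
s = [1, 4, 9, 16]

for x in s:
    pass            # Nothing to do; we only care about x afterwards

last = x            # x still refers to the final item of s
print(last)
```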
### `break` statement
You can use the `break` statement to break out of a loop before it finishes iterating all of the elements.
```python
for name in namelist:
    if name == 'Jake':
        break
    ...
    ...
statements
```
When the `break` statement is executed, it will exit the loop and move
on to the next `statements`. The `break` statement only applies to the
inner-most loop. If this loop is within another loop, it will not
break the outer loop.
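The inner-most rule can be seen in a short nested loop. In this sketch (variable names made up), the `break` ends each pass of the inner loop early, but the outer loop still runs all three times.

```python
pairs = []
for i in range(3):
    for j in range(3):
        if j == 1:
            break               # Leaves only the inner loop
        pairs.append((i, j))

print(pairs)
```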
### `continue` statement
To skip one element and move to the next one, use the `continue` statement.
```python
for line in lines:
    if line == '\n':    # Skip blank lines
        continue
    # More statements
    ...
```
This is useful when the current item is not of interest or needs to be ignored in the processing.
### Looping over integers
If you need to count, use `range()`.
```python
for i in range(100):
    # i = 0,1,...,99
```
The syntax is `range([start,] end [,step])`
```python
for i in range(100):
    # i = 0,1,...,99
for j in range(10,20):
    # j = 10,11,..., 19
for k in range(10,50,2):
    # k = 10,12,...,48
    # Notice how it counts in steps of 2, not 1.
```
* The ending value is never included. It mirrors the behavior of slices.
* `start` is optional. Default `0`.
* `step` is optional. Default `1`.
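Wrapping `range()` in `list()` is an easy way to see exactly which values it produces. A short sketch with made-up variable names:

```python
first = list(range(5))              # start defaults to 0
evens = list(range(10, 20, 2))      # The ending value 20 is not included
countdown = list(range(5, 0, -1))   # A negative step counts down

print(first)
print(evens)
print(countdown)
```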
### `enumerate()` function
The `enumerate` function provides a loop with an extra counter value.
```python
names = ['Elwood', 'Jake', 'Curtis']
for i, name in enumerate(names):
    # Loops with i = 0, name = 'Elwood'
    #            i = 1, name = 'Jake'
    #            i = 2, name = 'Curtis'
```
How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional.
A good example of using `enumerate()` is tracking line numbers while reading a file:
```python
with open(filename) as f:
    for lineno, line in enumerate(f, start=1):
        ...
```
In the end, `enumerate` is just a nice shortcut for:
```python
i = 0
for x in s:
    statements
    i += 1
```
Using `enumerate` is less typing and runs slightly faster.
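As a quick check of the `start` argument, here is a sketch that collects the pairs produced by `enumerate()` into a list. The name `numbered` is made up for this illustration.

```python
names = ['Elwood', 'Jake', 'Curtis']

# Each item is paired with a counter that begins at start.
numbered = list(enumerate(names, start=1))
print(numbered)
```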
### For and tuples
You can loop with multiple iteration variables.
```python
points = [
  (1, 4),(10, 40),(23, 14),(5, 6),(7, 8)
]
for x, y in points:
    # Loops with x = 1, y = 4
    #            x = 10, y = 40
    #            x = 23, y = 14
    #            ...
```
When using multiple variables, each tuple will be *unpacked* into a set of iteration variables.
### `zip()` function
The `zip` function takes sequences and makes an iterator that combines them.
```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]
pairs = zip(columns, values)
# ('name','GOOG'), ('shares',100), ('price',490.1)
```
To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier.
```python
for column, value in pairs:
    ...
```
A common use of `zip` is to create key/value pairs for constructing dictionaries.
```python
d = dict(zip(columns, values))
```
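One detail worth knowing is that `zip()` stops as soon as the shortest input runs out, so a row with missing values silently produces fewer pairs. A small sketch; the short row is made-up data:

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

record = dict(zip(columns, values))
print(record)

# With a shorter second sequence, the extra column name is simply dropped.
short = list(zip(columns, ['GOOG', 100]))
print(short)
```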
## Exercises
### (a) Counting
Try some basic counting examples:
```pycon
>>> for n in range(10):            # Count 0 ... 9
        print(n, end=' ')
0 1 2 3 4 5 6 7 8 9
>>> for n in range(10,0,-1):       # Count 10 ... 1
        print(n, end=' ')
10 9 8 7 6 5 4 3 2 1
>>> for n in range(0,10,2):        # Count 0, 2, ... 8
        print(n, end=' ')
0 2 4 6 8
>>>
```
### (b) More sequence operations
Interactively experiment with some of the sequence reduction operations.
```pycon
>>> data = [4, 9, 1, 25, 16, 100, 49]
>>> min(data)
1
>>> max(data)
100
>>> sum(data)
204
>>>
```
Try looping over the data.
```pycon
>>> for x in data:
        print(x)
4
9
...
>>> for n, x in enumerate(data):
        print(n, x)
0 4
1 9
2 1
...
>>>
```
Sometimes the `for` statement, `len()`, and `range()` get used by
novices in some kind of horrible code fragment that looks like it
emerged from the depths of a rusty C program.
```pycon
>>> for n in range(len(data)):
        print(data[n])
4
9
1
...
>>>
```
Don't do that! Not only does reading it make everyone's eyes bleed, it's inefficient with memory and it runs a lot slower.
Just use a normal `for` loop if you want to iterate over data. Use `enumerate()` if you happen to need the index for some reason.
### (c) A practical `enumerate()` example
Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data.
Using `enumerate()`, modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input.
```python
>>> cost = portfolio_cost('Data/missing.csv')
Row 4: Bad row: ['MSFT', '', '51.23']
Row 7: Bad row: ['IBM', '', '70.44']
>>>
```
To do this, you'll need to change just a few parts of your code.
```python
...
for rowno, row in enumerate(rows, start=1):
    try:
        ...
    except ValueError:
        print(f'Row {rowno}: Bad row: {row}')
```
### (d) Using the `zip()` function
In the file `Data/portfolio.csv`, the first line contains column headers. In all previous code, we've been discarding them.
```pycon
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> headers
['name', 'shares', 'price']
>>>
```
However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture.
First try this to pair the file headers with a row of data:
```pycon
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>> list(zip(headers, row))
[('name', 'AA'), ('shares', '100'), ('price', '32.20')]
>>>
```
Notice how `zip()` paired the column headers with the column values.
We've used `list()` here to turn the result into a list so that you
can see it. Normally, `zip()` creates an iterator that must be
consumed by a for-loop.
This pairing is just an intermediate step to building a dictionary. Now try this:
```pycon
>>> record = dict(zip(headers, row))
>>> record
{'name': 'AA', 'shares': '100', 'price': '32.20'}
>>>
```
This transformation is one of the most useful tricks to know about
when processing a lot of data files. For example, suppose you wanted
to make the `pcost.py` program work with various input files, but
without regard for the actual column number where the name, shares,
and price appear.
Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this:
```python
# pcost.py
def portfolio_cost(filename):
    ...
    for rowno, row in enumerate(rows, start=1):
        record = dict(zip(headers, row))
        try:
            nshares = int(record['shares'])
            price = float(record['price'])
            total_cost += nshares * price
        # This catches errors in int() and float() conversions above
        except ValueError:
            print(f'Row {rowno}: Bad row: {row}')
    ...
```
Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this:
```csv
name,date,time,shares,price
"AA","6/11/2007","9:50am",100,32.20
"IBM","5/13/2007","4:20pm",50,91.10
"CAT","9/23/2006","1:30pm",150,83.44
"MSFT","5/17/2007","10:30am",200,51.23
"GE","2/1/2006","10:45am",95,40.37
"MSFT","10/31/2006","12:05pm",50,65.10
"IBM","7/9/2006","3:15pm",100,70.44
```
```python
>>> portfolio_cost('Data/portfoliodate.csv')
44671.15
>>>
```
If you did it right, you'll find that your program still works even
though the data file has a completely different column format than
before. That's cool!
The change made here is subtle, but significant. Instead of
`portfolio_cost()` being hardcoded to read a single fixed file format,
the new version reads any CSV file and picks the values of interest
out of it. As long as the file has the required columns, the code will work.
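To make the point concrete, here is a self-contained sketch that runs the same header-driven logic over two different column layouts. It reads from in-memory strings through `io.StringIO` instead of files, and the helper name `portfolio_cost_from` and the sample data are assumptions for illustration, not part of `pcost.py`.

```python
import csv
import io

def portfolio_cost_from(f):
    '''Hypothetical helper: total cost from any file-like CSV source that
    has 'shares' and 'price' columns, in any position.'''
    rows = csv.reader(f)
    headers = next(rows)
    total_cost = 0.0
    for rowno, row in enumerate(rows, start=1):
        record = dict(zip(headers, row))
        try:
            total_cost += int(record['shares']) * float(record['price'])
        except ValueError:
            print(f'Row {rowno}: Bad row: {row}')
    return total_cost

# Two made-up inputs with the same holdings but different columns
layout1 = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n'
layout2 = 'name,date,shares,price\n"AA","6/11/2007",100,32.20\n"IBM","5/13/2007",50,91.10\n'

cost1 = portfolio_cost_from(io.StringIO(layout1))
cost2 = portfolio_cost_from(io.StringIO(layout2))
print(cost1 == cost2)   # True
```

Because each value is looked up by column name, the extra `date` column in the second layout is simply ignored.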
Modify the `report.py` program you wrote in Section 2.3 so that it uses
the same technique to pick out column headers.
Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it
produces the same answer as before.
### (e) Inverting a dictionary
A dictionary maps keys to values. For example, here is a dictionary of stock prices:
```pycon
>>> prices = {
        'GOOG' : 490.1,
        'AA' : 23.45,
        'IBM' : 91.1,
        'MSFT' : 34.23
    }
>>>
```
If you use the `items()` method, you can get `(key,value)` pairs:
```pycon
>>> prices.items()
dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)])
>>>
```
However, what if you wanted to get a list of `(value, key)` pairs instead?
*Hint: use `zip()`.*
```pycon
>>> pricelist = list(zip(prices.values(),prices.keys()))
>>> pricelist
[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
>>>
```
Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data.
```pycon
>>> min(pricelist)
(23.45, 'AA')
>>> max(pricelist)
(490.1, 'GOOG')
>>> sorted(pricelist)
[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
>>>
```
This also illustrates an important feature of tuples. When used in
comparisons, tuples are compared element-by-element starting with the
first item, similar to how strings are compared
character-by-character.
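A few small comparisons illustrate the rule. The last part of this sketch shows an alternative that avoids building the `(value, key)` list by passing a `key` function to `min()`; the name `cheapest` is made up for illustration.

```python
# The first items differ, so they decide the comparison
print((23.45, 'AA') < (490.1, 'GOOG'))    # True

# The first items are equal, so the second items are compared
print((91.1, 'IBM') < (91.1, 'MSFT'))     # True

# Strings follow the same rule, one character at a time
print('IBM' < 'IBN')                      # True

# Alternative: ask min() to compare dictionary keys by their values
prices = {'GOOG': 490.1, 'AA': 23.45, 'IBM': 91.1, 'MSFT': 34.23}
cheapest = min(prices, key=prices.get)
print(cheapest)                           # AA
```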
`zip()` is often used in situations like this where you need to pair
up data from different places. For example, pairing up the column
names with column values in order to make a dictionary of named
values.
Note that `zip()` is not limited to pairs. For example, you can use it
with any number of input lists:
```pycon
>>> a = [1, 2, 3, 4]
>>> b = ['w', 'x', 'y', 'z']
>>> c = [0.2, 0.4, 0.6, 0.8]
>>> list(zip(a, b, c))
[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8)]
>>>
```
Also, be aware that `zip()` stops once the shortest input sequence is exhausted.
```pycon
>>> a = [1, 2, 3, 4, 5, 6]
>>> b = ['x', 'y', 'z']
>>> list(zip(a,b))
[(1, 'x'), (2, 'y'), (3, 'z')]
>>>
```
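If silently dropping the extra items is not what you want, the standard library's `itertools.zip_longest()` pads the shorter input instead. This is a small sketch for comparison; the names `short` and `padded` are made up for illustration.

```python
from itertools import zip_longest

a = [1, 2, 3, 4, 5, 6]
b = ['x', 'y', 'z']

short = list(zip(a, b))
padded = list(zip_longest(a, b, fillvalue=None))

print(short)    # [(1, 'x'), (2, 'y'), (3, 'z')]
print(padded)   # [(1, 'x'), (2, 'y'), (3, 'z'), (4, None), (5, None), (6, None)]
```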
[Next](05_Collections)


@@ -0,0 +1,160 @@
# 2.5 `collections` module
The `collections` module provides a number of useful objects for data handling.
This part briefly introduces some of these features.
### Example: Counting Things
Let's say you want to tabulate the total shares of each stock.
```python
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
```
There are two `IBM` entries and two `GOOG` entries in this list. The shares need to be combined together somehow.
Solution: Use a `Counter`.
```python
from collections import Counter
total_shares = Counter()
for name, shares, price in portfolio:
    total_shares[name] += shares
total_shares['IBM'] # 150
```
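For comparison, here is a sketch of the same tabulation written with a plain dictionary next to the `Counter` version. The name `totals` is made up for illustration.

```python
from collections import Counter

portfolio = [
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.1),
    ('CAT', 150, 83.44),
    ('IBM', 100, 45.23),
    ('GOOG', 75, 572.45),
    ('AA', 50, 23.15)
]

# Plain dict: the first occurrence of each name has to be handled explicitly
totals = {}
for name, shares, price in portfolio:
    if name not in totals:
        totals[name] = 0
    totals[name] += shares

# Counter: missing keys count as 0, so no check is needed
total_shares = Counter()
for name, shares, price in portfolio:
    total_shares[name] += shares

print(totals == dict(total_shares))   # True
print(total_shares['GOOG'])           # 175
print(total_shares['MSFT'])           # 0 (no KeyError for a missing name)
```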
### Example: One-Many Mappings
Problem: You want to map a key to multiple values.
```python
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
```
As in the previous example, `IBM` appears more than once. This time, the key `IBM` should map to both of its `(shares, price)` tuples instead of a single total.
Solution: Use a `defaultdict`.
```python
from collections import defaultdict
holdings = defaultdict(list)
for name, shares, price in portfolio:
    holdings[name].append((shares, price))
holdings['IBM'] # [ (50, 91.1), (100, 45.23) ]
```
The `defaultdict` ensures that every time you access a missing key, you get a default value (here, a new empty list) that is also stored under that key.
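The following sketch shows that behavior directly, including the side effect that merely looking up a missing key creates an entry for it.

```python
from collections import defaultdict

holdings = defaultdict(list)
holdings['IBM'].append((50, 91.1))
holdings['IBM'].append((100, 45.23))
print(holdings['IBM'])        # [(50, 91.1), (100, 45.23)]

# Merely looking up a missing key creates it
print('CAT' in holdings)      # False
print(holdings['CAT'])        # []
print('CAT' in holdings)      # True

# .get() reads without creating an entry
print(holdings.get('AA'))     # None
print('AA' in holdings)       # False
```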
### Example: Keeping a History
Problem: We want a history of the last N things.
Solution: Use a `deque`.
```python
from collections import deque
history = deque(maxlen=N)
with open(filename) as f:
    for line in f:
        history.append(line)
        ...
```
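Here is a runnable sketch of the same idea, using a range of numbers as a stand-in for the lines of a file. Once the `deque` is full, each `append()` discards the oldest item.

```python
from collections import deque

history = deque(maxlen=3)
for n in range(1, 8):      # Stand-in for lines read from a file
    history.append(n)

print(history)             # deque([5, 6, 7], maxlen=3)
print(list(history))       # [5, 6, 7]
```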
## Exercises
The `collections` module might be one of the most useful library
modules for dealing with special purpose kinds of data handling
problems such as tabulating and indexing.
In this exercise, we'll look at a few simple examples. Start by
running your `report.py` program so that you have the portfolio of
stocks loaded in the interactive mode.
```bash
bash % python3 -i report.py
```
### (a) Tabulating with Counters
Suppose you wanted to tabulate the total number of shares of each stock.
This is easy using `Counter` objects. Try it:
```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> from collections import Counter
>>> holdings = Counter()
>>> for s in portfolio:
        holdings[s['name']] += s['shares']
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
```
Carefully observe how the multiple entries for `MSFT` and `IBM` in `portfolio` get combined into a single entry here.
You can use a Counter just like a dictionary to retrieve individual values:
```python
>>> holdings['IBM']
150
>>> holdings['MSFT']
250
>>>
```
If you want to rank the values, do this:
```python
>>> # Get three most held stocks
>>> holdings.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>>
```
Let's grab another portfolio of stocks and make a new Counter:
```pycon
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
        holdings2[s['name']] += s['shares']
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>
```
Finally, let's combine all of the holdings using one simple operation:
```pycon
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>> combined = holdings + holdings2
>>> combined
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
>>>
```
This is only a small taste of what counters provide. However, if you
ever find yourself needing to tabulate values, you should consider
using one.
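As one more taste, counters also support subtraction. The sketch below rebuilds the two counters from the values shown above so that it runs on its own; the name `diff` is made up for illustration.

```python
from collections import Counter

holdings = Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
holdings2 = Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})

# Subtraction keeps only the positive differences
diff = holdings - holdings2
print(diff['MSFT'])      # 225
print(diff['AA'])        # 50
print('GE' in diff)      # False (95 - 125 is negative, so it is dropped)
print('HPQ' in diff)     # False

# Total number of shares in the first portfolio
print(sum(holdings.values()))   # 745
```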
[Next](06_List_comprehension)


@@ -0,0 +1,316 @@
# 2.6 List Comprehensions
A common task is processing items in a list. This section introduces list comprehensions,
a useful tool for doing just that.
### Creating new lists
A list comprehension creates a new list by applying an operation to each element of a sequence.
```pycon
>>> a = [1, 2, 3, 4, 5]
>>> b = [2*x for x in a ]
>>> b
[2, 4, 6, 8, 10]
>>>
```
Another example:
```pycon
>>> names = ['Elwood', 'Jake']
>>> a = [name.lower() for name in names]
>>> a
['elwood', 'jake']
>>>
```
The general syntax is: `[ <expression> for <variable_name> in <sequence> ]`.
### Filtering
You can also filter during the list comprehension.
```pycon
>>> a = [1, -5, 4, 2, -2, 10]
>>> b = [2*x for x in a if x > 0 ]
>>> b
[2, 8, 4, 20]
>>>
```
### Use cases
List comprehensions are hugely useful. For example, you can collect values of a specific
record field:
```python
stocknames = [s['name'] for s in stocks]
```
You can perform database-like queries on sequences.
```python
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]
```
You can also combine a list comprehension with a sequence reduction:
```python
cost = sum([s['shares']*s['price'] for s in stocks])
```
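As a side note, when the list is only needed as input to the reduction, the square brackets can be dropped. The result is a generator expression, which produces the values one at a time. The sketch below uses a made-up `stocks` list to show that both forms give the same total.

```python
# Made-up sample data
stocks = [
    {'name': 'GOOG', 'shares': 100, 'price': 490.1},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
]

# List comprehension: builds the intermediate list, then sums it
cost_list = sum([s['shares'] * s['price'] for s in stocks])

# Generator expression: same result, no intermediate list
cost_gen = sum(s['shares'] * s['price'] for s in stocks)

print(cost_list == cost_gen)   # True
```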
### General Syntax
```code
[ <expression> for <variable_name> in <sequence> if <condition>]
```
What it means:
```python
result = []
for variable_name in sequence:
    if condition:
        result.append(expression)
```
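To check that translation on real values, this sketch runs the filtering example from earlier in both forms and compares the results.

```python
a = [1, -5, 4, 2, -2, 10]

# Comprehension form
b = [2 * x for x in a if x > 0]

# Equivalent loop form
result = []
for x in a:
    if x > 0:
        result.append(2 * x)

print(b)             # [2, 8, 4, 20]
print(b == result)   # True
```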
### Historical Digression
List comprehensions come from math (set-builder notation).
```code
a = [ x * x for x in s if x > 0 ] # Python
a = { x^2 | x ∈ s, x > 0 } # Math
```
It is also implemented in several other languages. Most
coders probably aren't thinking about their math class though. So,
it's fine to view it as a cool list shortcut.
## Exercises
Start by running your `report.py` program so that you have the portfolio of stocks loaded in the interactive mode.
```bash
bash % python3 -i report.py
```
Now, at the Python interactive prompt, type statements to perform the operations described below.
These operations perform various kinds of data reductions, transforms, and queries on the portfolio data.
### (a) List comprehensions
Try a few simple list comprehensions just to become familiar with the syntax.
```pycon
>>> nums = [1,2,3,4]
>>> squares = [ x * x for x in nums ]
>>> squares
[1, 4, 9, 16]
>>> twice = [ 2 * x for x in nums if x > 2 ]
>>> twice
[6, 8]
>>>
```
Notice how the list comprehensions are creating a new list with the data suitably transformed or filtered.
### (b) Sequence Reductions
Compute the total cost of the portfolio using a single Python statement.
```pycon
>>> cost = sum([ s['shares'] * s['price'] for s in portfolio ])
>>> cost
44671.15
>>>
```
After you have done that, show how you can compute the current value of the portfolio using a single statement.
```pycon
>>> value = sum([ s['shares'] * prices[s['name']] for s in portfolio ])
>>> value
28686.1
>>>
```
Both of the above operations are examples of a map-reduction. The list comprehension is mapping an operation across the list.
```pycon
>>> [ s['shares'] * s['price'] for s in portfolio ]
[3220.0000000000005, 4555.0, 12516.0, 10246.0, 3835.1499999999996, 3254.9999999999995, 7044.0]
>>>
```
The `sum()` function is then performing a reduction across the result:
```python
>>> sum(_)
44671.15
>>>
```
With this knowledge, you are now ready to go launch a big-data startup company.
### (c) Data Queries
Try the following examples of various data queries.
First, a list of all portfolio holdings with more than 100 shares.
```pycon
>>> more100 = [ s for s in portfolio if s['shares'] > 100 ]
>>> more100
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
>>>
```
All portfolio holdings for MSFT and IBM stocks.
```pycon
>>> msftibm = [ s for s in portfolio if s['name'] in {'MSFT','IBM'} ]
>>> msftibm
[{'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 51.23, 'name': 'MSFT', 'shares': 200},
{'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
```
A list of all portfolio holdings that cost more than $10000.
```pycon
>>> cost10k = [ s for s in portfolio if s['shares'] * s['price'] > 10000 ]
>>> cost10k
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
>>>
```
### (d) Data Extraction
Show how you could build a list of tuples `(name, shares)` where `name` and `shares` are taken from `portfolio`.
```pycon
>>> name_shares =[ (s['name'], s['shares']) for s in portfolio ]
>>> name_shares
[('AA', 100), ('IBM', 50), ('CAT', 150), ('MSFT', 200), ('GE', 95), ('MSFT', 50), ('IBM', 100)]
>>>
```
If you change the square brackets (`[`,`]`) to curly braces (`{`, `}`), you get something known as a set comprehension.
This gives you unique or distinct values.
For example, this determines the set of stock names that appear in `portfolio`:
```pycon
>>> names = { s['name'] for s in portfolio }
>>> names
{'AA', 'GE', 'IBM', 'MSFT', 'CAT'}
>>>
```
If you specify `key:value` pairs, you can build a dictionary.
For example, make a dictionary that maps the name of a stock to the total number of shares held.
```pycon
>>> holdings = { name: 0 for name in names }
>>> holdings
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
>>>
```
This latter feature is known as a **dictionary comprehension**. Let's tabulate:
```pycon
>>> for s in portfolio:
        holdings[s['name']] += s['shares']
>>> holdings
{'AA': 100, 'GE': 95, 'IBM': 150, 'MSFT': 250, 'CAT': 150}
>>>
```
Try this example that filters the `prices` dictionary down to only those names that appear in the portfolio:
```pycon
>>> portfolio_prices = { name: prices[name] for name in names }
>>> portfolio_prices
{'AA': 9.22, 'GE': 13.48, 'IBM': 106.28, 'MSFT': 20.89, 'CAT': 35.46}
>>>
```
### (e) Advanced Bonus: Extracting Data From CSV Files
Knowing how to use various combinations of list, set, and dictionary comprehensions can be useful in various forms of data processing.
Here's an example that shows how to extract selected columns from a CSV file.
First, read a row of header information from a CSV file:
```pycon
>>> import csv
>>> f = open('Data/portfoliodate.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> headers
['name', 'date', 'time', 'shares', 'price']
>>>
```
Next, define a variable that lists the columns that you actually care about:
```pycon
>>> select = ['name', 'shares', 'price']
>>>
```
Now, locate the indices of the above columns in the source CSV file:
```pycon
>>> indices = [ headers.index(colname) for colname in select ]
>>> indices
[0, 3, 4]
>>>
```
Finally, read a row of data and turn it into a dictionary using a dictionary comprehension:
```pycon
>>> row = next(rows)
>>> record = { colname: row[index] for colname, index in zip(select, indices) } # dict-comprehension
>>> record
{'name': 'AA', 'shares': '100', 'price': '32.20'}
>>>
```
If you're feeling comfortable with what just happened, read the rest
of the file:
```pycon
>>> portfolio = [ { colname: row[index] for colname, index in zip(select, indices) } for row in rows ]
>>> portfolio
[{'name': 'IBM', 'shares': '50', 'price': '91.10'}, {'name': 'CAT', 'shares': '150', 'price': '83.44'},
 {'name': 'MSFT', 'shares': '200', 'price': '51.23'}, {'name': 'GE', 'shares': '95', 'price': '40.37'},
 {'name': 'MSFT', 'shares': '50', 'price': '65.10'}, {'name': 'IBM', 'shares': '100', 'price': '70.44'}]
>>>
```
Oh my, you just reduced much of the `read_portfolio()` function to a single statement.
### Commentary
List comprehensions are commonly used in Python as an efficient means
for transforming, filtering, or collecting data. Due to the syntax,
you don't want to go overboard. Try to keep each list comprehension as
simple as possible. It's okay to break things into multiple
steps. For example, it's not clear that you would want to spring that
last example on your unsuspecting co-workers.
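As one possible way of breaking that last example into steps, the sketch below does the same column extraction with an ordinary loop and two small intermediate values. It reads from an in-memory string through `io.StringIO` so that it runs on its own; the sample text and the name `fields` are assumptions for illustration.

```python
import csv
import io

# Made-up sample in the same layout as Data/portfoliodate.csv
text = '''name,date,time,shares,price
"AA","6/11/2007","9:50am",100,32.20
"IBM","5/13/2007","4:20pm",50,91.10
'''

select = ['name', 'shares', 'price']

rows = csv.reader(io.StringIO(text))
headers = next(rows)
indices = [headers.index(colname) for colname in select]

portfolio = []
for row in rows:
    # Step 1: pick out the wanted fields
    fields = [row[index] for index in indices]
    # Step 2: pair them with their column names
    record = dict(zip(select, fields))
    portfolio.append(record)

print(portfolio[0])   # {'name': 'AA', 'shares': '100', 'price': '32.20'}
```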
That said, knowing how to quickly manipulate data is a skill that's
incredibly useful. There are numerous situations where you might have
to solve some kind of one-off problem involving data imports, exports,
extraction, and so forth. Becoming a guru master of list
comprehensions can substantially reduce the time spent devising a
solution. Also, don't forget about the `collections` module.
[Next](07_Objects)


@@ -0,0 +1,408 @@
# 2.7 Objects
This section introduces more details about Python's internal object model and
discusses some matters related to memory management, copying, and type checking.
### Assignment
Many operations in Python are related to *assigning* or *storing* values.
```python
a = value # Assignment to a variable
s[n] = value         # Assignment to a list
s.append(value) # Appending to a list
d['key'] = value # Adding to a dictionary
```
*A caution: assignment operations **never make a copy** of the value being assigned.*
All assignments are merely reference copies (or pointer copies if you prefer).
### Assignment example
Consider this code fragment.
```python
a = [1,2,3]
b = a
c = [a,b]
```
Consider the underlying memory operations. In this example, there
is only one list object `[1,2,3]`, but there are four different
references to it: `a`, `b`, `c[0]`, and `c[1]`.
This means that modifying a value affects *all* references.
```pycon
>>> a.append(999)
>>> a
[1, 2, 3, 999]
>>> b
[1, 2, 3, 999]
>>> c
[[1, 2, 3, 999], [1, 2, 3, 999]]
>>>
```
Notice how a change in the original list shows up everywhere else (yikes!).
This is because no copies were ever made. Everything is pointing to the same thing.
### Reassigning values
Reassigning a value *never* overwrites the memory used by the previous value.
```python
a = [1,2,3]
b = a
a = [4,5,6]
print(a) # [4, 5, 6]
print(b) # [1, 2, 3] Holds the original value
```
Remember: **Variables are names, not memory locations.**
### Some Dangers
If you don't know about this sharing, you will shoot yourself in the
foot at some point. A typical scenario: you modify some data thinking
that it's your own private copy, and it accidentally corrupts some data
in some other part of the program.
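A minimal sketch of that scenario follows. The function `add_bonus_shares` and its data are hypothetical. Passing `list(original)` hands the function a new list, so the caller's list is left alone.

```python
def add_bonus_shares(portfolio):
    # Hypothetical function: mutates the list it was given
    portfolio.append(('BONUS', 10, 0.0))
    return portfolio

original = [('GOOG', 100, 490.1)]
updated = add_bonus_shares(original)
print(len(original))         # 2: the caller's data changed too

# Passing a copy keeps the caller's list intact
original = [('GOOG', 100, 490.1)]
updated = add_bonus_shares(list(original))
print(len(original))         # 1
print(len(updated))          # 2
```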
*Comment: This is one of the reasons why the primitive datatypes (int, float, string) are immutable (read-only).*
### Identity and References
Use the `is` operator to check if two values are exactly the same object.
```pycon
>>> a = [1,2,3]
>>> b = a
>>> a is b
True
>>>
```
`is` compares the object identity (an integer). The identity can be
obtained using `id()`.
```pycon
>>> id(a)
3588944
>>> id(b)
3588944
>>>
```
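It helps to contrast `is` with `==`, which compares values rather than identity. In this sketch, `c` is a made-up third list with the same contents as `a`.

```python
a = [1, 2, 3]
b = a              # Another name for the same list
c = [1, 2, 3]      # A separate list with equal contents

print(a == b, a is b)    # True True
print(a == c, a is c)    # True False
print(id(a) == id(b))    # True
print(id(a) == id(c))    # False
```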
### Shallow copies
Lists and dicts have methods for copying.
```pycon
>>> a = [2,3,[100,101],4]
>>> b = list(a) # Make a copy
>>> a is b
False
```
It's a new list, but the list items are shared.
```python
>>> a[2].append(102)
>>> b[2]
[100, 101, 102]
>>>
>>> a[2] is b[2]
True
>>>
```
For example, the inner list (originally `[100, 101]`) is being shared.
This is known as a shallow copy.
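For reference, here is a sketch of several common ways to make a shallow copy. All of them behave the same way with respect to sharing; the `prices` dictionary is made up for illustration.

```python
a = [2, 3, [100, 101], 4]

b = list(a)     # Constructor
c = a[:]        # Full slice
d = a.copy()    # list.copy() method

print(b == c == d == a)                            # True
print(a[2] is b[2], a[2] is c[2], a[2] is d[2])    # True True True

prices = {'GOOG': [490.1], 'IBM': [91.1]}
prices_copy = prices.copy()          # dict.copy() is shallow as well
prices_copy['GOOG'].append(500.0)
print(prices['GOOG'])                # [490.1, 500.0]
```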
### Deep copies
Sometimes you need to make a copy of an object and all the objects contained within it.
You can use the `copy` module for this:
```pycon
>>> a = [2,3,[100,101],4]
>>> import copy
>>> b = copy.deepcopy(a)
>>> a[2].append(102)
>>> b[2]
[100, 101]
>>> a[2] is b[2]
False
>>>
```
### Names, Values, Types
A variable name does not have a *type*; it's only a name.
However, values *do* have an underlying type.
```pycon
>>> a = 42
>>> b = 'Hello World'
>>> type(a)
<class 'int'>
>>> type(b)
<class 'str'>
```
`type()` will tell you what it is. The type name is usually a function
that creates or converts a value to that type.
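A short sketch of both points: `type()` returns the type object itself, and that object can be called to create or convert a value.

```python
a = 42
b = 'Hello World'

print(type(a) is int)      # True
print(type(b) is str)      # True

# The type itself is callable and converts values
print(int('42') + 1)       # 43
print(str(42) + '!')       # 42!
print(type(a)('100'))      # 100 (same as int('100'))
```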
### Type Checking
To tell if an object is a specific type, use `isinstance()`.
```python
if isinstance(a, list):
    print('a is a list')
```
Checking for one of many types.
```python
if isinstance(a, (list, tuple)):
    print('a is a list or tuple')
```
*Caution: Don't go overboard with type checking. It can lead to excessive complexity.*
### Everything is an object
Numbers, strings, lists, functions, exceptions, classes, instances,
etc. are all objects. It means that all objects that can be named can
be passed around as data, placed in containers, etc., without any
restrictions. There are no *special* kinds of objects. Sometimes it
is said that all objects are "first-class".
A simple example:
```pycon
>>> import math
>>> items = [abs, math, ValueError ]
>>> items
[<built-in function abs>,
 <module 'math' (built-in)>,
 <class 'ValueError'>]
>>> items[0](-45)
45
>>> items[1].sqrt(2)
1.4142135623730951
>>> try:
        x = int('not a number')
    except items[2]:
        print('Failed!')
Failed!
>>>
```
Here, `items` is a list containing a function, a module and an exception.
You can use the items in the list in place of the original names:
```python
items[0](-45) # abs
items[1].sqrt(2) # math
except items[2]: # ValueError
```
## Exercises
In this set of exercises, we look at some of the power that comes from first-class
objects.
### (a) First-class Data
In the file `Data/portfolio.csv`, we read data organized as columns that look like this:
```csv
name,shares,price
"AA",100,32.20
"IBM",50,91.10
...
```
In previous code, we used the `csv` module to read the file, but still had to perform manual type conversions. For example:
```python
for row in rows:
name = row[0]
shares = int(row[1])
price = float(row[2])
```
This kind of conversion can also be performed in a more clever manner using some basic list operations.
Make a Python list that contains the names of the conversion functions you would use to convert each column into the appropriate type:
```pycon
>>> types = [str, int, float]
>>>
```
The reason you can even create this list is that everything in Python
is *first-class*. So, if you want to have a list of functions, that's
fine. The items in the list you created are functions for converting
a value `x` into a given type (e.g., `str(x)`, `int(x)`, `float(x)`).
Now, read a row of data from the above file:
```pycon
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> row = next(rows)
>>> row
['AA', '100', '32.20']
>>>
```
As noted, this row isn't enough to do calculations because the types are wrong. For example:
```pycon
>>> row[1] * row[2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'
>>>
```
However, maybe the data can be paired up with the types you specified in `types`. For example:
```pycon
>>> types[1]
<class 'int'>
>>> row[1]
'100'
>>>
```
Try converting one of the values:
```pycon
>>> types[1](row[1]) # Same as int(row[1])
100
>>>
```
Try converting a different value:
```pycon
>>> types[2](row[2]) # Same as float(row[2])
32.2
>>>
```
Try the calculation with converted values:
```pycon
>>> types[1](row[1])*types[2](row[2])
3220.0000000000005
>>>
```
Zip the column types with the fields and look at the result:
```pycon
>>> r = list(zip(types, row))
>>> r
[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')]
>>>
```
You will notice that this has paired a type conversion with a
value. For example, `int` is paired with the value `'100'`.
The zipped list is useful if you want to perform conversions on all of the values, one
after the other. Try this:
```pycon
>>> converted = []
>>> for func, val in zip(types, row):
        converted.append(func(val))
...
>>> converted
['AA', 100, 32.2]
>>> converted[1] * converted[2]
3220.0000000000005
>>>
```
Make sure you understand what's happening in the above code.
In the loop, the `func` variable is one of the type conversion functions (e.g.,
`str`, `int`, etc.) and the `val` variable is one of the values like
`'AA'`, `'100'`. The expression `func(val)` is converting a value (kind of like a type cast).
The above code can be compressed into a single list comprehension.
```pycon
>>> converted = [func(val) for func, val in zip(types, row)]
>>> converted
['AA', 100, 32.2]
>>>
```
### (b) Making dictionaries
Remember how the `dict()` function can easily make a dictionary if you have a sequence of key names and values?
Let's make a dictionary from the column headers:
```pycon
>>> headers
['name', 'shares', 'price']
>>> converted
['AA', 100, 32.2]
>>> dict(zip(headers, converted))
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
Of course, if you're up on your list-comprehension fu, you can do the whole conversion in a single shot using a dict-comprehension:
```pycon
>>> { name: func(val) for name, func, val in zip(headers, types, row) }
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
### (c) The Big Picture
Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary.
Just to illustrate, suppose you read data from a different datafile like this:
```pycon
>>> f = open('Data/dowstocks.csv')
>>> rows = csv.reader(f)
>>> headers = next(rows)
>>> row = next(rows)
>>> headers
['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
>>> row
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']
>>>
```
Let's convert the fields using a similar trick:
```pycon
>>> types = [str, float, str, str, float, float, float, float, int]
>>> converted = [func(val) for func, val in zip(types, row)]
>>> record = dict(zip(headers, converted))
>>> record
{'name': 'AA', 'price': 39.48, 'date': '6/11/2007', 'time': '9:36am',
 'change': -0.18, 'open': 39.67, 'high': 39.69, 'low': 39.45,
 'volume': 181800}
>>> record['name']
'AA'
>>> record['price']
39.48
>>>
```
Spend some time to ponder what you've done in this exercise. We'll revisit these ideas a little later.
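As something to ponder, here is a sketch that packages the technique into a small function. The name `read_records` and the sample text are assumptions for illustration, and the input comes from an in-memory string through `io.StringIO` so that the code runs on its own.

```python
import csv
import io

def read_records(f, types):
    '''Hypothetical helper: read CSV rows from a file-like object and
    convert each column with the matching function in types.'''
    rows = csv.reader(f)
    headers = next(rows)
    return [
        {name: func(val) for name, func, val in zip(headers, types, row)}
        for row in rows
    ]

# Made-up sample in the same layout as Data/portfolio.csv
text = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n'
records = read_records(io.StringIO(text), [str, int, float])

print(records[0])   # {'name': 'AA', 'shares': 100, 'price': 32.2}
print(records[1]['shares'] * records[1]['price'])   # 4555.0
```

Supporting a different file layout then only requires passing a different list of conversion functions.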