Added sections 2-4
Notes/02_Working_with_data/02_Containers.md
# Containers

### Overview

Programs often have to work with many objects.

* A portfolio of stocks
* A table of stock prices

There are three main choices to use.

* Lists. Ordered data.
* Dictionaries. Key-value data, looked up by key.
* Sets. Unordered collection of unique items.

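As a rough side-by-side sketch of the three choices, the snippet below builds one container of each kind. The sample values are borrowed from later examples in these notes, and the variable names on the left of the last three lines are illustrative only.

```python
# A list keeps records in order and is indexed by position
portfolio = [('GOOG', 100, 490.1), ('IBM', 50, 91.3)]

# A dictionary maps keys to values and is indexed by key
prices = {'GOOG': 513.25, 'IBM': 93.37}

# A set holds unique items and is mainly used for membership tests
tech_stocks = {'IBM', 'AAPL', 'MSFT'}

first_holding = portfolio[0]        # ('GOOG', 100, 490.1)
ibm_price = prices['IBM']           # 93.37
has_goog = 'GOOG' in tech_stocks    # False
```
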
### Lists as a Container

Use a list when the order of the data matters. Remember that lists can hold any kind of object.
For example, a list of tuples.

```python
portfolio = [
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.3),
    ('CAT', 150, 83.44)
]

portfolio[0]            # ('GOOG', 100, 490.1)
portfolio[2]            # ('CAT', 150, 83.44)
```

### List construction

Building a list from scratch.

```python
records = []  # Initial empty list

# Use .append() to add more items
records.append(('GOOG', 100, 490.10))
records.append(('IBM', 50, 91.3))
...
```

An example when reading records from a file.

```python
records = []  # Initial empty list

with open('portfolio.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        # Append one (name, shares, price) tuple per line
        records.append((row[0], int(row[1]), float(row[2])))
```

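To try the file-reading pattern without a file on disk, here is a minimal sketch that uses `io.StringIO` as a stand-in for `open('portfolio.csv', 'rt')`. The sample text, including its header line, is made up for illustration; the real file may be laid out differently.

```python
import io

# Stand-in for open('portfolio.csv', 'rt'); the contents are made up
f = io.StringIO('name,shares,price\nGOOG,100,490.1\nIBM,50,91.3\n')

records = []
next(f)                               # Skip the assumed header line
for line in f:
    row = line.strip().split(',')     # Drop the newline, then split
    records.append((row[0], int(row[1]), float(row[2])))

print(records)   # [('GOOG', 100, 490.1), ('IBM', 50, 91.3)]
```

If the header line were not skipped, `int(row[1])` would fail on the column title, which is one reason the exercises below use the `csv` module and `next(rows)`.
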
### Dicts as a Container

Dictionaries are useful if you want fast random lookups (by key name). For
example, a dictionary of stock prices:

```python
prices = {
    'GOOG': 513.25,
    'CAT': 87.22,
    'IBM': 93.37,
    'MSFT': 44.12
}
```

Here are some simple lookups:

```pycon
>>> prices['IBM']
93.37
>>> prices['GOOG']
513.25
>>>
```

### Dict Construction

Example of building a dict from scratch.

```python
prices = {} # Initial empty dict

# Insert new items
prices['GOOG'] = 513.25
prices['CAT'] = 87.22
prices['IBM'] = 93.37
```

An example populating the dict from the contents of a file.

```python
prices = {} # Initial empty dict

with open('prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])
```

### Dictionary Lookups

You can test the existence of a key.

```python
if key in d:
    # YES
else:
    # NO
```

You can look up a value that might not exist and provide a default value in case it doesn't.

```python
name = d.get(key, default)
```

An example:

```pycon
>>> prices.get('IBM', 0.0)
93.37
>>> prices.get('SCOX', 0.0)
0.0
>>>
```

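One common use of a default value is tabulation. The sketch below, using made-up holdings, adds up shares per stock name; `total_shares` is an illustrative name, not something defined elsewhere in these notes.

```python
portfolio = [('IBM', 50, 91.1), ('MSFT', 200, 51.23), ('IBM', 100, 70.44)]

total_shares = {}
for name, shares, price in portfolio:
    # get() returns 0 the first time a name is seen
    total_shares[name] = total_shares.get(name, 0) + shares

print(total_shares)   # {'IBM': 150, 'MSFT': 200}
```

Without `get()`, the first occurrence of each name would need a separate `if name in total_shares` test.
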
### Composite keys

Almost any type of value can be used as a dictionary key in Python. The one requirement is that a key must be of a type that is immutable (more precisely, hashable).
For example, tuples:

```python
holidays = {
    (1, 1) : 'New Years',
    (3, 14) : 'Pi day',
    (9, 13) : "Programmer's day",
}
```

Then to access:

```pycon
>>> holidays[3, 14]
'Pi day'
>>>
```

*Neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.*

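A short sketch of what that restriction looks like in practice: using a list as a key raises `TypeError`, while the equivalent tuple is accepted. The `date` and `rejected` names are illustrative.

```python
holidays = {(1, 1): 'New Years'}
date = [3, 14]              # A list is mutable, so it cannot be hashed

rejected = False
try:
    holidays[date] = 'Pi day'
except TypeError:
    rejected = True         # Lists are refused as dictionary keys

holidays[tuple(date)] = 'Pi day'    # The equivalent tuple is accepted
print(rejected, holidays[3, 14])    # True Pi day
```
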
### Sets

A set is an unordered collection of unique items.

```python
tech_stocks = { 'IBM','AAPL','MSFT' }
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])
```

Sets are useful for membership tests. (The order in which a set displays its items may vary.)

```pycon
>>> tech_stocks
{'AAPL', 'IBM', 'MSFT'}
>>> 'IBM' in tech_stocks
True
>>> 'FB' in tech_stocks
False
>>>
```

Sets are also useful for duplicate elimination.

```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']

unique = set(names)
# unique = {'IBM', 'AAPL', 'GOOG', 'YHOO'} (in some order)
```

Additional set operations:

```python
unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item

s1 | s2                 # Set union
s1 & s2                 # Set intersection
s1 - s2                 # Set difference
```

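To make the three operators concrete, here is a small sketch with two made-up sets, `tech` and `owned`. The results are sorted before printing only because sets have no fixed display order.

```python
tech = {'IBM', 'AAPL', 'MSFT'}
owned = {'IBM', 'CAT', 'MSFT'}

either = tech | owned      # In at least one of the sets
both = tech & owned        # In both sets
tech_only = tech - owned   # In tech but not in owned

print(sorted(either))      # ['AAPL', 'CAT', 'IBM', 'MSFT']
print(sorted(both))        # ['IBM', 'MSFT']
print(sorted(tech_only))   # ['AAPL']
```
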
## Exercises

### Objectives

### (a) A list of tuples

The file `Data/portfolio.csv` contains a list of stocks in a portfolio.
In [Section 1.7](), you wrote a function `portfolio_cost(filename)` that read this file and performed a simple calculation.

Your code should have looked something like this:

```python
# pcost.py

import csv

def portfolio_cost(filename):
    '''Computes the total cost (shares*price) of a portfolio file'''
    total_cost = 0.0

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            nshares = int(row[1])
            price = float(row[2])
            total_cost += nshares * price
    return total_cost
```

Using this code as a rough guide, create a new file `report.py`. In
that file, define a function `read_portfolio(filename)` that opens a
given portfolio file and reads it into a list of tuples. To do this,
you’re going to make a few minor modifications to the above code.

First, instead of defining `total_cost = 0.0`, you’ll make a variable that’s initially set to an empty list. For example:

```python
portfolio = []
```

Next, instead of totaling up the cost, you’ll turn each row into a
tuple exactly as you just did in the last exercise and append it to
this list. For example:

```python
for row in rows:
    holding = (row[0], int(row[1]), float(row[2]))
    portfolio.append(holding)
```

Finally, you’ll return the resulting `portfolio` list.

Experiment with your function interactively (just a reminder that in order to do this, you first have to run the `report.py` program in the interpreter):

*Hint: Use `-i` when executing the file in the terminal, for example `python3 -i report.py`.*

```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23),
    ('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)]
>>>
>>> portfolio[0]
('AA', 100, 32.2)
>>> portfolio[1]
('IBM', 50, 91.1)
>>> portfolio[1][1]
50
>>> total = 0.0
>>> for s in portfolio:
        total += s[1] * s[2]

>>> print(total)
44671.15
>>>
```

This list of tuples that you have created is very similar to a 2-D array.
For example, you can access a specific column and row using a lookup such as `portfolio[row][column]` where `row` and `column` are integers.

That said, you can also rewrite the last for-loop using a statement like this:

```pycon
>>> total = 0.0
>>> for name, shares, price in portfolio:
        total += shares*price

>>> print(total)
44671.15
>>>
```

### (b) List of Dictionaries

Take the function you wrote in part (a) and modify it to represent each stock in the portfolio with a dictionary instead of a tuple.
In this dictionary use the field names of "name", "shares", and "price" to represent the different columns in the input file.

Experiment with this new function in the same manner as you did in part (a).

```pycon
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> portfolio[0]
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>> portfolio[1]
{'name': 'IBM', 'shares': 50, 'price': 91.1}
>>> portfolio[1]['shares']
50
>>> total = 0.0
>>> for s in portfolio:
        total += s['shares']*s['price']

>>> print(total)
44671.15
>>>
```

Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers.
This is often preferred because the resulting code is easier to read later.

Viewing large dictionaries and lists can be messy. To clean up the output for debugging, consider using the `pprint` function.

```pycon
>>> from pprint import pprint
>>> pprint(portfolio)
[{'name': 'AA', 'price': 32.2, 'shares': 100},
    {'name': 'IBM', 'price': 91.1, 'shares': 50},
    {'name': 'CAT', 'price': 83.44, 'shares': 150},
    {'name': 'MSFT', 'price': 51.23, 'shares': 200},
    {'name': 'GE', 'price': 40.37, 'shares': 95},
    {'name': 'MSFT', 'price': 65.1, 'shares': 50},
    {'name': 'IBM', 'price': 70.44, 'shares': 100}]
>>>
```

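For the per-row change this part asks for, a minimal sketch of turning one row into a dictionary might look like the following. The `row` value is a made-up example of what `csv.reader` yields for one data line; it is not read from the actual file.

```python
# One data row as csv.reader would yield it (all fields are strings)
row = ['AA', '100', '32.20']

holding = {
    'name': row[0],
    'shares': int(row[1]),      # Convert the share count to an integer
    'price': float(row[2]),     # Convert the price to a float
}
print(holding)   # {'name': 'AA', 'shares': 100, 'price': 32.2}
```
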
### (c) Dictionaries as a container

A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer.
In the Python shell, try playing with a dictionary:

```pycon
>>> prices = { }
>>> prices['IBM'] = 92.45
>>> prices['MSFT'] = 45.12
>>> prices
... look at the result ...
>>> prices['IBM']
92.45
>>> prices['AAPL']
... look at the result ...
>>> 'AAPL' in prices
False
>>>
```

The file `Data/prices.csv` contains a series of lines with stock prices.
The file looks something like this:

```csv
"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
"C",3.72
...
```

Write a function `read_prices(filename)` that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices.

To do this, start with an empty dictionary and start inserting values into it just
as you did above. However, you are reading the values from a file now.

We’ll use this data structure to quickly look up the price of a given stock name.

A few little tips that you’ll need for this part. First, make sure you use the `csv` module just as you did before. There’s no need to reinvent the wheel here.

```pycon
>>> import csv
>>> f = open('Data/prices.csv', 'r')
>>> rows = csv.reader(f)
>>> for row in rows:
        print(row)


['AA', '9.22']
['AXP', '24.85']
...
[]
>>>
```

The other little complication is that the `Data/prices.csv` file may have some blank lines in it. Notice how the last row of data above is an empty list, meaning no data was present on that line.

There’s a possibility that this could cause your program to die with an exception (indexing an empty row raises an `IndexError`).
Use the `try` and `except` statements to catch this as appropriate.

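One way the blank-line handling could look is sketched below. To stay self-contained it reads from an `io.StringIO` object holding made-up sample text rather than from `Data/prices.csv`, and the helper name `read_prices_from` is hypothetical; your `read_prices(filename)` would open the file itself.

```python
import csv
import io

def read_prices_from(f):
    '''Read name,price rows from an open file into a dict (sketch).'''
    prices = {}
    for row in csv.reader(f):
        try:
            prices[row[0]] = float(row[1])
        except IndexError:
            pass        # Blank line: row is [], so there is nothing to store
    return prices

# Made-up sample text standing in for Data/prices.csv, ending in a blank line
sample = io.StringIO('"AA",9.22\n"AXP",24.85\n\n')
prices = read_prices_from(sample)
print(prices)   # {'AA': 9.22, 'AXP': 24.85}
```

Catching only `IndexError` keeps other problems, such as a price that is not a number, visible instead of silently hiding them.
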
Once you have written your `read_prices()` function, test it interactively to make sure it works:

```pycon
>>> prices = read_prices('Data/prices.csv')
>>> prices['IBM']
106.28
>>> prices['MSFT']
20.89
>>>
```

### (d) Finding out if you can retire

Tie all of this work together by adding a few statements to your `report.py` program.
These statements should take the list of stocks from part (b) and the dictionary of prices from part (c) and
compute the current value of the portfolio along with the gain/loss.

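As a rough sketch of that calculation, the snippet below uses a shortened, hand-written portfolio and price dictionary in place of the results of `read_portfolio()` and `read_prices()`. The `cost` and `value` names are illustrative.

```python
# Hand-written stand-ins for read_portfolio() and read_prices() results
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
]
prices = {'AA': 9.22, 'IBM': 106.28}

cost = 0.0
value = 0.0
for s in portfolio:
    cost += s['shares'] * s['price']            # What was paid
    value += s['shares'] * prices[s['name']]    # What it is worth now

print('Current value:', value)
print('Gain/loss:', value - cost)
```

A negative gain/loss means the portfolio is currently worth less than what was paid for it.
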
[Next](03_Formatting)