Added sections 2-4
7
Notes/02_Working_with_data/00_Overview.md
Normal file
@@ -0,0 +1,7 @@
|
||||
# Working With Data Overview
|
||||
|
||||
In this section, we look at how Python programmers represent and work with data.
|
||||
|
||||
Most programs today work with data. We are going to learn common programming idioms and how to avoid shooting yourself in the foot.
|
||||
|
||||
We will also take a look at part of the object model in Python, which is a big part of understanding most Python programs.
|
||||
431
Notes/02_Working_with_data/01_Datatypes.md
Normal file
@@ -0,0 +1,431 @@
|
||||
# 2.1 Datatypes and Data structures
|
||||
|
||||
This section introduces data structures in the form of tuples and dicts.
|
||||
|
||||
### Primitive Datatypes
|
||||
|
||||
Python has a few primitive types of data:
|
||||
|
||||
* Integers
|
||||
* Floating point numbers
|
||||
* Strings (text)
|
||||
|
||||
We have learned about these in the previous section.
|
||||
|
||||
### None type
|
||||
|
||||
```python
|
||||
email_address = None
|
||||
```
|
||||
|
||||
This type is often used as a placeholder for an optional or missing value.
|
||||
|
||||
```python
if email_address:
    send_email(email_address, msg)
```
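
Note that a plain truth test such as `if email_address:` also skips an empty string, not only `None`. The following is a minimal sketch, assuming you want to tell "no value" apart from "empty value". The `describe()` helper is hypothetical and only exists for illustration.

```python
def describe(email_address):
    # `is None` checks specifically for the missing-value placeholder
    if email_address is None:
        return 'no address on file'
    # An empty string is a real value, but it is falsy
    if not email_address:
        return 'address is empty'
    return 'address is ' + email_address

print(describe(None))
print(describe(''))
print(describe('guido@example.com'))
```

Comparing with `is None` is the usual idiom when an empty string or `0` should still count as a value.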
|
||||
|
||||
### Data Structures
|
||||
|
||||
Real programs have more complex data than can be easily represented by the datatypes learned so far.
|
||||
For example, information about a stock:
|
||||
|
||||
```code
|
||||
100 shares of GOOG at $490.10
|
||||
```
|
||||
|
||||
This is an "object" with three parts:
|
||||
|
||||
* Name or symbol of the stock ("GOOG", a string)
|
||||
* Number of shares (100, an integer)
|
||||
* Price (490.10, a float)
|
||||
|
||||
### Tuples
|
||||
|
||||
A tuple is a collection of values grouped together.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
s = ('GOOG', 100, 490.1)
|
||||
```
|
||||
|
||||
Sometimes the `()` are omitted in the syntax.
|
||||
|
||||
```python
|
||||
s = 'GOOG', 100, 490.1
|
||||
```
|
||||
|
||||
Special cases (0-tuple, 1-tuple).
|
||||
|
||||
```python
|
||||
t = () # An empty tuple
|
||||
w = ('GOOG', ) # A 1-item tuple
|
||||
```
|
||||
|
||||
Tuples are usually used to represent *simple* records or structures.
|
||||
Typically, it is a single *object* of multiple parts. A good analogy: *A tuple is like a single row in a database table.*
|
||||
|
||||
Tuple contents are ordered (like an array).
|
||||
|
||||
```python
|
||||
s = ('GOOG', 100, 490.1)
|
||||
name = s[0] # 'GOOG'
|
||||
shares = s[1] # 100
|
||||
price = s[2] # 490.1
|
||||
```
|
||||
|
||||
However, the contents can't be modified.
|
||||
|
||||
```pycon
|
||||
>>> s[1] = 75
|
||||
TypeError: 'tuple' object does not support item assignment
|
||||
```
|
||||
|
||||
You can, however, make a new tuple based on a current tuple.
|
||||
|
||||
```python
|
||||
s = (s[0], 75, s[2])
|
||||
```
|
||||
|
||||
### Tuple Packing
|
||||
|
||||
Tuples are focused more on packing related items together into a single *entity*.
|
||||
|
||||
```python
|
||||
s = ('GOOG', 100, 490.1)
|
||||
```
|
||||
|
||||
The tuple is then easy to pass around to other parts of a program as a single object.
|
||||
|
||||
### Tuple Unpacking
|
||||
|
||||
To use the tuple elsewhere, you can unpack its parts into variables.
|
||||
|
||||
```python
|
||||
name, shares, price = s
|
||||
print('Cost', shares * price)
|
||||
```
|
||||
|
||||
The number of variables must match the tuple structure.
|
||||
|
||||
```pycon
>>> name, shares = s     # ERROR
Traceback (most recent call last):
...
ValueError: too many values to unpack (expected 2)
```
|
||||
|
||||
### Tuples vs. Lists
|
||||
|
||||
Tuples are NOT just read-only lists. Tuples are most often used for a *single item* consisting of multiple parts.
|
||||
Lists are usually a collection of distinct items, usually all of the same type.
|
||||
|
||||
```python
|
||||
record = ('GOOG', 100, 490.1) # A tuple representing a stock in a portfolio
|
||||
|
||||
symbols = [ 'GOOG', 'AAPL', 'IBM' ]  # A list representing three stock symbols
|
||||
```
|
||||
|
||||
### Dictionaries
|
||||
|
||||
A dictionary is a hash table or associative array.
|
||||
It is a collection of values indexed by *keys*. These keys serve as field names.
|
||||
|
||||
```python
|
||||
s = {
|
||||
'name': 'GOOG',
|
||||
'shares': 100,
|
||||
'price': 490.1
|
||||
}
|
||||
```
|
||||
|
||||
### Common operations
|
||||
|
||||
To read values from a dictionary, use the key names.
|
||||
|
||||
```pycon
|
||||
>>> print(s['name'], s['shares'])
|
||||
GOOG 100
|
||||
>>> s['price']
|
||||
490.1
|
||||
>>>
|
||||
```
|
||||
|
||||
To add or modify values, assign using the key names.
|
||||
|
||||
```pycon
|
||||
>>> s['shares'] = 75
|
||||
>>> s['date'] = '6/6/2007'
|
||||
>>>
|
||||
```
|
||||
|
||||
To delete a value, use the `del` statement.
|
||||
|
||||
```pycon
|
||||
>>> del s['date']
|
||||
>>>
|
||||
```
|
||||
|
||||
### Why dictionaries?
|
||||
|
||||
Dictionaries are useful when there are *many* different values and those values
|
||||
might be modified or manipulated. Dictionaries make your code more readable.
|
||||
|
||||
```python
|
||||
s['price']
|
||||
# vs
|
||||
s[2]
|
||||
```
|
||||
|
||||
## Exercises
|
||||
|
||||
### Note
|
||||
|
||||
In the last few exercises, you wrote a program that read a datafile `Data/portfolio.csv`. Using the `csv` module, it is easy to read the file row-by-row.
|
||||
|
||||
```pycon
|
||||
>>> import csv
|
||||
>>> f = open('Data/portfolio.csv')
|
||||
>>> rows = csv.reader(f)
|
||||
>>> next(rows)
|
||||
['name', 'shares', 'price']
|
||||
>>> row = next(rows)
|
||||
>>> row
|
||||
['AA', '100', '32.20']
|
||||
>>>
|
||||
```
|
||||
|
||||
Although reading the file is easy, you often want to do more with the data than read it.
|
||||
For instance, perhaps you want to store it and start performing some calculations on it.
|
||||
Unfortunately, a raw "row" of data doesn’t give you enough to work with. For example, even a simple math calculation doesn’t work:
|
||||
|
||||
```pycon
|
||||
>>> row = ['AA', '100', '32.20']
|
||||
>>> cost = row[1] * row[2]
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: can't multiply sequence by non-int of type 'str'
|
||||
>>>
|
||||
```
|
||||
|
||||
To do more, you typically want to interpret the raw data in some way and turn it into a more useful kind of object so that you can work with it later.
|
||||
Two simple options are tuples or dictionaries.
|
||||
|
||||
### (a) Tuples
|
||||
|
||||
At the interactive prompt, create the following tuple that represents
|
||||
the above row, but with the numeric columns converted to proper
|
||||
numbers:
|
||||
|
||||
```pycon
|
||||
>>> t = (row[0], int(row[1]), float(row[2]))
|
||||
>>> t
|
||||
('AA', 100, 32.2)
|
||||
>>>
|
||||
```
|
||||
|
||||
Using this, you can now calculate the total cost by multiplying the shares and the price:
|
||||
|
||||
```pycon
|
||||
>>> cost = t[1] * t[2]
|
||||
>>> cost
|
||||
3220.0000000000005
|
||||
>>>
|
||||
```
|
||||
|
||||
Is math broken in Python? What’s the deal with the answer of
|
||||
3220.0000000000005?
|
||||
|
||||
This is an artifact of the floating point hardware on your computer
|
||||
only being able to accurately represent decimals in Base-2, not
|
||||
Base-10. For even simple calculations involving base-10 decimals,
|
||||
small errors are introduced. This is normal, although perhaps a bit
|
||||
surprising if you haven’t seen it before.
|
||||
|
||||
This happens in all programming languages that use floating point
|
||||
decimals, but it often gets hidden when printing. For example:
|
||||
|
||||
```pycon
|
||||
>>> print(f'{cost:0.2f}')
|
||||
3220.00
|
||||
>>>
|
||||
```
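
Because of these small errors, comparing floats with `==` can surprise you. A minimal sketch of the usual workarounds, assuming a tolerance-based comparison or rounding is acceptable for your calculation:

```python
import math

cost = 100 * 32.2

print(cost == 3220.0)               # Exact comparison is affected by the tiny error
print(math.isclose(cost, 3220.0))   # Compares within a relative tolerance
print(round(cost, 2))               # Rounds to two decimal places
```

`math.isclose()` is part of the standard library. Rounding changes the stored value, whereas the f-string formatting shown above only changes how the value is displayed.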
|
||||
|
||||
Tuples are read-only. Verify this by trying to change the number of shares to 75.
|
||||
|
||||
```pycon
|
||||
>>> t[1] = 75
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: 'tuple' object does not support item assignment
|
||||
>>>
|
||||
```
|
||||
|
||||
Although you can’t change tuple contents, you can always create a completely new tuple that replaces the old one.
|
||||
|
||||
```pycon
|
||||
>>> t = (t[0], 75, t[2])
|
||||
>>> t
|
||||
('AA', 75, 32.2)
|
||||
>>>
|
||||
```
|
||||
|
||||
Whenever you reassign an existing variable name like this, the old
|
||||
value is discarded. Although the above assignment might look like you
|
||||
are modifying the tuple, you are actually creating a new tuple and
|
||||
throwing the old one away.
|
||||
|
||||
Tuples are often used to pack and unpack values into variables. Try the following:
|
||||
|
||||
```pycon
|
||||
>>> name, shares, price = t
|
||||
>>> name
|
||||
'AA'
|
||||
>>> shares
|
||||
75
|
||||
>>> price
|
||||
32.2
|
||||
>>>
|
||||
```
|
||||
|
||||
Take the above variables and pack them back into a tuple:
|
||||
|
||||
```pycon
|
||||
>>> t = (name, 2*shares, price)
|
||||
>>> t
|
||||
('AA', 150, 32.2)
|
||||
>>>
|
||||
```
|
||||
|
||||
### (b) Dictionaries as a data structure
|
||||
|
||||
An alternative to a tuple is to create a dictionary instead.
|
||||
|
||||
```pycon
>>> d = {
        'name' : row[0],
        'shares' : int(row[1]),
        'price' : float(row[2])
    }
>>> d
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
```
|
||||
|
||||
Calculate the total cost of this holding:
|
||||
|
||||
```pycon
|
||||
>>> cost = d['shares'] * d['price']
|
||||
>>> cost
|
||||
3220.0000000000005
|
||||
>>>
|
||||
```
|
||||
|
||||
Compare this example with the same calculation involving tuples above. Change the number of shares to 75.
|
||||
|
||||
```pycon
|
||||
>>> d['shares'] = 75
|
||||
>>> d
|
||||
{'name': 'AA', 'shares': 75, 'price': 32.2}
|
||||
>>>
|
||||
```
|
||||
|
||||
Unlike tuples, dictionaries can be freely modified. Add some attributes:
|
||||
|
||||
```pycon
|
||||
>>> d['date'] = (6, 11, 2007)
|
||||
>>> d['account'] = 12345
|
||||
>>> d
|
||||
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007), 'account': 12345}
|
||||
>>>
|
||||
```
|
||||
|
||||
### (c) Some additional dictionary operations
|
||||
|
||||
If you turn a dictionary into a list, you’ll get all of its keys:
|
||||
|
||||
```pycon
|
||||
>>> list(d)
|
||||
['name', 'shares', 'price', 'date', 'account']
|
||||
>>>
|
||||
```
|
||||
|
||||
Similarly, if you use the `for` statement to iterate on a dictionary, you will get the keys:
|
||||
|
||||
```pycon
|
||||
>>> for k in d:
|
||||
        print('k =', k)
|
||||
|
||||
k = name
|
||||
k = shares
|
||||
k = price
|
||||
k = date
|
||||
k = account
|
||||
>>>
|
||||
```
|
||||
|
||||
Try this variant that performs a lookup at the same time:
|
||||
|
||||
```pycon
|
||||
>>> for k in d:
|
||||
        print(k, '=', d[k])
|
||||
|
||||
name = AA
|
||||
shares = 75
|
||||
price = 32.2
|
||||
date = (6, 11, 2007)
|
||||
account = 12345
|
||||
>>>
|
||||
```
|
||||
|
||||
You can also obtain all of the keys using the `keys()` method:
|
||||
|
||||
```pycon
|
||||
>>> keys = d.keys()
|
||||
>>> keys
|
||||
dict_keys(['name', 'shares', 'price', 'date', 'account'])
|
||||
>>>
|
||||
```
|
||||
|
||||
`keys()` is a bit unusual in that it returns a special `dict_keys` object.
|
||||
|
||||
This is an overlay on the original dictionary that always gives you the current keys—even if the dictionary changes. For example, try this:
|
||||
|
||||
```pycon
|
||||
>>> del d['account']
|
||||
>>> keys
|
||||
dict_keys(['name', 'shares', 'price', 'date'])
|
||||
>>>
|
||||
```
|
||||
|
||||
Notice that `'account'` disappeared from `keys` even though you didn’t call `d.keys()` again.
|
||||
|
||||
A more elegant way to work with keys and values together is to use the `items()` method. This gives you `(key, value)` tuples:
|
||||
|
||||
```pycon
|
||||
>>> items = d.items()
|
||||
>>> items
|
||||
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
|
||||
>>> for k, v in d.items():
|
||||
        print(k, '=', v)
|
||||
|
||||
name = AA
|
||||
shares = 75
|
||||
price = 32.2
|
||||
date = (6, 11, 2007)
|
||||
>>>
|
||||
```
|
||||
|
||||
If you have tuples such as `items`, you can create a dictionary using the `dict()` function. Try it:
|
||||
|
||||
```pycon
|
||||
>>> items
|
||||
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
|
||||
>>> d = dict(items)
|
||||
>>> d
|
||||
{'name': 'AA', 'shares': 75, 'price': 32.2, 'date': (6, 11, 2007)}
|
||||
>>>
|
||||
```
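
The same `dict()` call works on any sequence of `(key, value)` pairs. As a small sketch (the `headers` and `row` values below are sample data in the shape `csv.reader` produced earlier), you can pair column names with a row of data using the built-in `zip()`:

```python
headers = ['name', 'shares', 'price']
row = ['AA', '100', '32.20']

# zip() pairs each header with the value in the same position
record = dict(zip(headers, row))
print(record)
```

The values are still strings at this point. Converting them with `int()` and `float()` is a separate step.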
|
||||
|
||||
[Next](02_Containers)
|
||||
413
Notes/02_Working_with_data/02_Containers.md
Normal file
@@ -0,0 +1,413 @@
|
||||
# 2.2 Containers
|
||||
|
||||
### Overview
|
||||
|
||||
Programs often have to work with many objects.
|
||||
|
||||
* A portfolio of stocks
|
||||
* A table of stock prices
|
||||
|
||||
There are three main choices to use.
|
||||
|
||||
* Lists. Ordered data.
|
||||
* Dictionaries. Key-value data, looked up by key rather than by position.
|
||||
* Sets. Unordered collection of unique items.
|
||||
|
||||
### Lists as a Container
|
||||
|
||||
Use a list when the order of the data matters. Remember that lists can hold any kind of objects.
|
||||
For example, a list of tuples.
|
||||
|
||||
```python
|
||||
portfolio = [
|
||||
('GOOG', 100, 490.1),
|
||||
('IBM', 50, 91.3),
|
||||
('CAT', 150, 83.44)
|
||||
]
|
||||
|
||||
portfolio[0] # ('GOOG', 100, 490.1)
|
||||
portfolio[2] # ('CAT', 150, 83.44)
|
||||
```
|
||||
|
||||
### List construction
|
||||
|
||||
Building a list from scratch.
|
||||
|
||||
```python
|
||||
records = [] # Initial empty list
|
||||
|
||||
# Use .append() to add more items
|
||||
records.append(('GOOG', 100, 490.10))
|
||||
records.append(('IBM', 50, 91.3))
|
||||
...
|
||||
```
|
||||
|
||||
An example when reading records from a file.
|
||||
|
||||
```python
records = []  # Initial empty list

with open('portfolio.csv', 'rt') as f:
    next(f)   # Skip the header line
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
```
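
Splitting each line on commas by hand does not strip quotes around fields. A minimal sketch of the same loop using the `csv` module instead, with an in-memory `io.StringIO` object standing in for the file (the sample rows are made up for illustration):

```python
import csv
import io

# In-memory stand-in for a file such as portfolio.csv (made-up sample data)
sample = io.StringIO('name,shares,price\n"GOOG",100,490.1\n"IBM",50,91.3\n')

records = []
rows = csv.reader(sample)
next(rows)                      # Skip the header row
for row in rows:
    # Convert the numeric columns while building each tuple
    records.append((row[0], int(row[1]), float(row[2])))

print(records)
```

With a real file, you would pass the object returned by `open()` to `csv.reader()` in place of `sample`.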
|
||||
|
||||
### Dicts as a Container
|
||||
|
||||
Dictionaries are useful if you want fast random lookups (by key name). For
|
||||
example, a dictionary of stock prices:
|
||||
|
||||
```python
|
||||
prices = {
|
||||
'GOOG': 513.25,
|
||||
'CAT': 87.22,
|
||||
'IBM': 93.37,
|
||||
'MSFT': 44.12
|
||||
}
|
||||
```
|
||||
|
||||
Here are some simple lookups:
|
||||
|
||||
```pycon
|
||||
>>> prices['IBM']
|
||||
93.37
|
||||
>>> prices['GOOG']
|
||||
513.25
|
||||
>>>
|
||||
```
|
||||
|
||||
### Dict Construction
|
||||
|
||||
Example of building a dict from scratch.
|
||||
|
||||
```python
|
||||
prices = {} # Initial empty dict
|
||||
|
||||
# Insert new items
|
||||
prices['GOOG'] = 513.25
|
||||
prices['CAT'] = 87.22
|
||||
prices['IBM'] = 93.37
|
||||
```
|
||||
|
||||
An example populating the dict from the contents of a file.
|
||||
|
||||
```python
prices = {} # Initial empty dict

with open('prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])
```
|
||||
|
||||
### Dictionary Lookups
|
||||
|
||||
You can test the existence of a key.
|
||||
|
||||
```python
if key in d:
    ...   # YES: the key exists
else:
    ...   # NO: the key is missing
```
|
||||
|
||||
You can look up a value that might not exist and provide a default value in case it doesn't.
|
||||
|
||||
```python
|
||||
name = d.get(key, default)
|
||||
```
|
||||
|
||||
An example:
|
||||
|
||||
```python
|
||||
>>> prices.get('IBM', 0.0)
|
||||
93.37
|
||||
>>> prices.get('SCOX', 0.0)
|
||||
0.0
|
||||
>>>
|
||||
```
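
A common use of `get()` with a default is accumulating totals by key. The following is a small sketch with made-up holdings. The name `total_shares` is only illustrative.

```python
portfolio = [('GOOG', 100), ('IBM', 50), ('GOOG', 25)]

total_shares = {}
for name, shares in portfolio:
    # get() returns 0 the first time a name is seen
    total_shares[name] = total_shares.get(name, 0) + shares

print(total_shares)
```

Without the default, the first lookup of each name would raise a `KeyError`.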
|
||||
|
||||
### Composite keys
|
||||
|
||||
Almost any type of value can be used as a dictionary key in Python. The one requirement is that a dictionary key must be of a type that is immutable (more precisely, hashable).
|
||||
For example, tuples:
|
||||
|
||||
```python
|
||||
holidays = {
|
||||
(1, 1) : 'New Years',
|
||||
(3, 14) : 'Pi day',
|
||||
(9, 13) : "Programmer's day",
|
||||
}
|
||||
```
|
||||
|
||||
Then to access:
|
||||
|
||||
```pycon
>>> holidays[3, 14]
'Pi day'
>>>
```
|
||||
|
||||
*Neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.*
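
A short sketch of what happens if you try a list as a key anyway, and one way around it. The exact wording of the error message depends on the Python version, so it is printed rather than shown here.

```python
holidays = {(1, 1): 'New Years'}

try:
    holidays[[3, 14]] = 'Pi day'       # A list is mutable, so it can't be a key
except TypeError as e:
    error = str(e)
    print('TypeError:', error)

holidays[tuple([3, 14])] = 'Pi day'    # Converting the list to a tuple works
print(holidays[3, 14])
```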
|
||||
|
||||
### Sets
|
||||
|
||||
A set is an unordered collection of unique items.
|
||||
|
||||
```python
|
||||
tech_stocks = { 'IBM','AAPL','MSFT' }
|
||||
# Alternative syntax
|
||||
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])
|
||||
```
|
||||
|
||||
Sets are useful for membership tests.
|
||||
|
||||
```pycon
|
||||
>>> tech_stocks
|
||||
{'AAPL', 'IBM', 'MSFT'}
|
||||
>>> 'IBM' in tech_stocks
|
||||
True
|
||||
>>> 'FB' in tech_stocks
|
||||
False
|
||||
>>>
|
||||
```
|
||||
|
||||
Sets are also useful for duplicate elimination.
|
||||
|
||||
```python
|
||||
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
|
||||
|
||||
unique = set(names)
|
||||
# unique = {'IBM', 'AAPL', 'GOOG', 'YHOO'}
|
||||
```
|
||||
|
||||
Additional set operations:
|
||||
|
||||
```python
|
||||
unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item
|
||||
|
||||
s1 | s2 # Set union
|
||||
s1 & s2 # Set intersection
|
||||
s1 - s2 # Set difference
|
||||
```
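
A concrete sketch of the three operators, using two made-up sets of stock symbols. The results are passed through `sorted()` only so the printed order is predictable, since sets themselves have no order.

```python
mine = {'IBM', 'AAPL', 'GOOG'}
yours = {'GOOG', 'CAT'}

print(sorted(mine | yours))   # Symbols in either set
print(sorted(mine & yours))   # Symbols in both sets
print(sorted(mine - yours))   # Symbols in mine but not in yours
```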
|
||||
|
||||
## Exercises
|
||||
|
||||
### Objectives
|
||||
|
||||
### (a) A list of tuples
|
||||
|
||||
The file `Data/portfolio.csv` contains a list of stocks in a portfolio.
|
||||
In [Section 1.7](), you wrote a function `portfolio_cost(filename)` that read this file and performed a simple calculation.
|
||||
|
||||
Your code should have looked something like this:
|
||||
|
||||
```python
# pcost.py

import csv

def portfolio_cost(filename):
    '''Computes the total cost (shares*price) of a portfolio file'''
    total_cost = 0.0

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            nshares = int(row[1])
            price = float(row[2])
            total_cost += nshares * price
    return total_cost
```
|
||||
|
||||
Using this code as a rough guide, create a new file `report.py`. In
|
||||
that file, define a function `read_portfolio(filename)` that opens a
|
||||
given portfolio file and reads it into a list of tuples. To do this,
|
||||
you’re going to make a few minor modifications to the above code.
|
||||
|
||||
First, instead of defining `total_cost = 0`, you’ll make a variable that’s initially set to an empty list. For example:
|
||||
|
||||
```python
|
||||
portfolio = []
|
||||
```
|
||||
|
||||
Next, instead of totaling up the cost, you’ll turn each row into a
|
||||
tuple exactly as you just did in the last exercise and append it to
|
||||
this list. For example:
|
||||
|
||||
```python
for row in rows:
    holding = (row[0], int(row[1]), float(row[2]))
    portfolio.append(holding)
```
|
||||
|
||||
Finally, you’ll return the resulting `portfolio` list.
|
||||
|
||||
Experiment with your function interactively (just a reminder that in order to do this, you first have to run the `report.py` program in the interpreter):
|
||||
|
||||
*Hint: Use `-i` when executing the file in the terminal*
|
||||
|
||||
```pycon
|
||||
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||
>>> portfolio
|
||||
[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23),
|
||||
('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)]
|
||||
>>>
|
||||
>>> portfolio[0]
|
||||
('AA', 100, 32.2)
|
||||
>>> portfolio[1]
|
||||
('IBM', 50, 91.1)
|
||||
>>> portfolio[1][1]
|
||||
50
|
||||
>>> total = 0.0
|
||||
>>> for s in portfolio:
|
||||
        total += s[1] * s[2]
|
||||
|
||||
>>> print(total)
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
This list of tuples that you have created is very similar to a 2-D array.
|
||||
For example, you can access a specific column and row using a lookup such as `portfolio[row][column]` where `row` and `column` are integers.
|
||||
|
||||
That said, you can also rewrite the last for-loop using a statement like this:
|
||||
|
||||
```python
|
||||
>>> total = 0.0
|
||||
>>> for name, shares, price in portfolio:
|
||||
        total += shares*price
|
||||
|
||||
>>> print(total)
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
### (b) List of Dictionaries
|
||||
|
||||
Take the function you wrote in part (a) and modify it to represent each stock in the portfolio with a dictionary instead of a tuple.
|
||||
In this dictionary use the field names of "name", "shares", and "price" to represent the different columns in the input file.
|
||||
|
||||
Experiment with this new function in the same manner as you did in part (a).
|
||||
|
||||
```pycon
|
||||
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||
>>> portfolio
|
||||
[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1},
|
||||
{'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
|
||||
{'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1},
|
||||
{'name': 'IBM', 'shares': 100, 'price': 70.44}]
|
||||
>>> portfolio[0]
|
||||
{'name': 'AA', 'shares': 100, 'price': 32.2}
|
||||
>>> portfolio[1]
|
||||
{'name': 'IBM', 'shares': 50, 'price': 91.1}
|
||||
>>> portfolio[1]['shares']
|
||||
50
|
||||
>>> total = 0.0
|
||||
>>> for s in portfolio:
|
||||
        total += s['shares']*s['price']
|
||||
|
||||
>>> print(total)
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers.
|
||||
This is often preferred because the resulting code is easier to read later.
|
||||
|
||||
Viewing large dictionaries and lists can be messy. To clean up the output for debugging, consider using the `pprint` function.
|
||||
|
||||
```pycon
|
||||
>>> from pprint import pprint
|
||||
>>> pprint(portfolio)
|
||||
[{'name': 'AA', 'price': 32.2, 'shares': 100},
|
||||
{'name': 'IBM', 'price': 91.1, 'shares': 50},
|
||||
{'name': 'CAT', 'price': 83.44, 'shares': 150},
|
||||
{'name': 'MSFT', 'price': 51.23, 'shares': 200},
|
||||
{'name': 'GE', 'price': 40.37, 'shares': 95},
|
||||
{'name': 'MSFT', 'price': 65.1, 'shares': 50},
|
||||
{'name': 'IBM', 'price': 70.44, 'shares': 100}]
|
||||
>>>
|
||||
```
|
||||
|
||||
### (c) Dictionaries as a container
|
||||
|
||||
A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer.
|
||||
In the Python shell, try playing with a dictionary:
|
||||
|
||||
```pycon
|
||||
>>> prices = { }
|
||||
>>> prices['IBM'] = 92.45
|
||||
>>> prices['MSFT'] = 45.12
|
||||
>>> prices
|
||||
... look at the result ...
|
||||
>>> prices['IBM']
|
||||
92.45
|
||||
>>> prices['AAPL']
|
||||
... look at the result ...
|
||||
>>> 'AAPL' in prices
|
||||
False
|
||||
>>>
|
||||
```
|
||||
|
||||
The file `Data/prices.csv` contains a series of lines with stock prices.
|
||||
The file looks something like this:
|
||||
|
||||
```csv
|
||||
"AA",9.22
|
||||
"AXP",24.85
|
||||
"BA",44.85
|
||||
"BAC",11.27
|
||||
"C",3.72
|
||||
...
|
||||
```
|
||||
|
||||
Write a function `read_prices(filename)` that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices.
|
||||
|
||||
To do this, start with an empty dictionary and start inserting values into it just
|
||||
as you did above. However, you are reading the values from a file now.
|
||||
|
||||
We’ll use this data structure to quickly look up the price of a given stock name.
|
||||
|
||||
A few little tips that you’ll need for this part. First, make sure you use the `csv` module just as you did before—there’s no need to reinvent the wheel here.
|
||||
|
||||
```pycon
|
||||
>>> import csv
|
||||
>>> f = open('Data/prices.csv', 'r')
|
||||
>>> rows = csv.reader(f)
|
||||
>>> for row in rows:
|
||||
        print(row)
|
||||
|
||||
|
||||
['AA', '9.22']
|
||||
['AXP', '24.85']
|
||||
...
|
||||
[]
|
||||
>>>
|
||||
```
|
||||
|
||||
The other little complication is that the `Data/prices.csv` file may have some blank lines in it. Notice how the last row of data above is an empty list—meaning no data was present on that line.
|
||||
|
||||
There’s a possibility that this could cause your program to die with an exception.
|
||||
Use the `try` and `except` statements to catch this as appropriate.
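
To see why a blank line is a problem and what catching the exception looks like, here is a minimal sketch. It is not a complete `read_prices()` function: it reads from an in-memory `io.StringIO` stand-in (made-up sample data) rather than a real file, and it assumes `IndexError` is the exception you want to ignore.

```python
import csv
import io

# In-memory stand-in for Data/prices.csv, including a blank line (made-up sample)
sample = io.StringIO('"AA",9.22\n"AXP",24.85\n\n')

prices = {}
for row in csv.reader(sample):
    try:
        prices[row[0]] = float(row[1])
    except IndexError:
        # A blank line produces an empty list, so row[0] does not exist
        pass

print(prices)
```

Catching only `IndexError` keeps other problems, such as a price that isn't a number, visible instead of silently skipping them.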
|
||||
|
||||
Once you have written your `read_prices()` function, test it interactively to make sure it works:
|
||||
|
||||
```python
|
||||
>>> prices = read_prices('Data/prices.csv')
|
||||
>>> prices['IBM']
|
||||
106.28
|
||||
>>> prices['MSFT']
|
||||
20.89
|
||||
>>>
|
||||
```
|
||||
|
||||
### (d) Finding out if you can retire
|
||||
|
||||
Tie all of this work together by adding a few statements to your `report.py` program. These statements should take the list of stocks from part (b) and the dictionary of prices from part (c) and compute the current value of the portfolio along with the gain/loss.
|
||||
|
||||
[Next](03_Formatting)
|
||||
276
Notes/02_Working_with_data/03_Formatting.md
Normal file
@@ -0,0 +1,276 @@
|
||||
# 2.3 Formatting
|
||||
|
||||
This is a slight digression, but when you work with data, you often want to
|
||||
produce structured output (tables, etc.). For example:
|
||||
|
||||
```code
      Name     Shares       Price
---------- ---------- -----------
        AA        100       32.20
       IBM         50       91.10
       CAT        150       83.44
      MSFT        200       51.23
        GE         95       40.37
      MSFT         50       65.10
       IBM        100       70.44
```
|
||||
|
||||
### String Formatting
|
||||
|
||||
One way to format strings in Python 3.6+ is with f-strings.
|
||||
|
||||
```python
|
||||
>>> name = 'IBM'
|
||||
>>> shares = 100
|
||||
>>> price = 91.1
|
||||
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
|
||||
'       IBM        100      91.10'
|
||||
>>>
|
||||
```
|
||||
|
||||
The part `{expression:format}` is replaced by the value of the expression, formatted according to the format code.
|
||||
|
||||
It is commonly used with `print`.
|
||||
|
||||
```python
|
||||
print(f'{name:>10s} {shares:>10d} {price:>10.2f}')
|
||||
```
|
||||
|
||||
### Format codes
|
||||
|
||||
Format codes (after the `:` inside the `{}`) are similar to C `printf()`. Common codes
|
||||
include:
|
||||
|
||||
```code
|
||||
d Decimal integer
|
||||
b Binary integer
|
||||
x Hexadecimal integer
|
||||
f Float as [-]m.dddddd
|
||||
e Float as [-]m.dddddde+-xx
|
||||
g Float, but selective use of E notation
s String
|
||||
c Character (from integer)
|
||||
```
|
||||
|
||||
Common modifiers adjust the field width and decimal precision. This is a partial list:
|
||||
|
||||
```code
|
||||
:>10d Integer right aligned in 10-character field
|
||||
:<10d Integer left aligned in 10-character field
|
||||
:^10d Integer centered in 10-character field
:0.2f Float with 2 digit precision
|
||||
```
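
A quick sketch of these modifiers in action. The square brackets are only there to make the field edges visible, and the `shares` and `pi` values are arbitrary.

```python
shares = 42
pi = 3.14159

print(f'[{shares:>10d}]')   # Right aligned:  [        42]
print(f'[{shares:<10d}]')   # Left aligned:   [42        ]
print(f'[{shares:^10d}]')   # Centered:       [    42    ]
print(f'[{pi:0.2f}]')       # Two decimals:   [3.14]
```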
|
||||
|
||||
### Dictionary Formatting
|
||||
|
||||
You can use the `format_map()` method on strings.
|
||||
|
||||
```python
|
||||
>>> s = {
|
||||
'name': 'IBM',
|
||||
'shares': 100,
|
||||
'price': 91.1
|
||||
}
|
||||
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
|
||||
'       IBM        100      91.10'
|
||||
>>>
|
||||
```
|
||||
|
||||
It uses the same format codes as f-strings but takes the values from the supplied dictionary.
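
One reason to reach for `format_map()` is that the template is an ordinary string, so it can be defined once and reused. A small sketch, assuming a list of dictionaries shaped like the portfolio from the previous section (the `template` and `lines` names are illustrative):

```python
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
]

# An ordinary string, not an f-string, so it is not evaluated immediately
template = '{name:>10s} {shares:>10d} {price:>10.2f}'

lines = [template.format_map(s) for s in portfolio]
for line in lines:
    print(line)
```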
|
||||
|
||||
### C-Style Formatting
|
||||
|
||||
You can also use the formatting operator `%`.
|
||||
|
||||
```python
|
||||
>>> 'The value is %d' % 3
|
||||
'The value is 3'
|
||||
>>> '%5d %-5d %10d' % (3,4,5)
|
||||
'    3 4              5'
|
||||
>>> '%0.2f' % (3.1415926,)
|
||||
'3.14'
|
||||
```
|
||||
|
||||
This requires a single item or a tuple on the right. Format codes are modeled after the C `printf()` as well.
|
||||
|
||||
*Note: This is the only formatting available on byte strings.*
|
||||
|
||||
```python
|
||||
>>> b'%s has %d messages' % (b'Dave', 37)
|
||||
b'Dave has 37 messages'
|
||||
>>>
|
||||
```
|
||||
|
||||
## Exercises
|
||||
|
||||
In the previous exercise, you wrote a program called `report.py` that computed the gain/loss of a
|
||||
stock portfolio. In this exercise, you're going to modify it to produce a table like this:
|
||||
|
||||
```code
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
```
|
||||
|
||||
In this report, "Price" is the current share price of the stock and "Change" is the change in the share price from the initial purchase price.
|
||||
|
||||
### (a) How to format numbers
|
||||
|
||||
A common problem with printing numbers is specifying the number of decimal places. One way to fix this is to use f-strings. Try
|
||||
these examples:
|
||||
|
||||
```python
|
||||
>>> value = 42863.1
|
||||
>>> print(value)
|
||||
42863.1
|
||||
>>> print(f'{value:0.4f}')
|
||||
42863.1000
|
||||
>>> print(f'{value:>16.2f}')
|
||||
        42863.10
|
||||
>>> print(f'{value:<16.2f}')
|
||||
42863.10
|
||||
>>> print(f'{value:*>16,.2f}')
|
||||
*******42,863.10
|
||||
>>>
|
||||
```
|
||||
|
||||
Full documentation on the formatting codes used with f-strings can be found
|
||||
[here](https://docs.python.org/3/library/string.html#format-specification-mini-language). Formatting
|
||||
is also sometimes performed using the `%` operator of strings.
|
||||
|
||||
```pycon
|
||||
>>> print('%0.4f' % value)
|
||||
42863.1000
|
||||
>>> print('%16.2f' % value)
|
||||
        42863.10
|
||||
>>>
|
||||
```
|
||||
|
||||
Documentation on various codes used with `%` can be found [here](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting).
|
||||
|
||||
Although it’s commonly used with `print`, string formatting is not tied to printing.
|
||||
If you want to save a formatted string, just assign it to a variable.
|
||||
|
||||
```pycon
|
||||
>>> f = '%0.4f' % value
|
||||
>>> f
|
||||
'42863.1000'
|
||||
>>>
|
||||
```
|
||||
|
||||
### (b) Collecting Data
|
||||
|
||||
In order to generate the above report, you’ll first want to collect
|
||||
all of the data shown in the table. Write a function `make_report()`
|
||||
that takes a list of stocks and a dictionary of prices as input and
|
||||
returns a list of tuples containing the rows of the above table.
|
||||
|
||||
Add this function to your `report.py` file. Here’s how it should work if you try it interactively:
|
||||
|
||||
```pycon
|
||||
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||
>>> prices = read_prices('Data/prices.csv')
|
||||
>>> report = make_report(portfolio, prices)
|
||||
>>> for r in report:
|
||||
        print(r)
|
||||
|
||||
('AA', 100, 9.22, -22.980000000000004)
|
||||
('IBM', 50, 106.28, 15.180000000000007)
|
||||
('CAT', 150, 35.46, -47.98)
|
||||
('MSFT', 200, 20.89, -30.339999999999996)
|
||||
('GE', 95, 13.48, -26.889999999999997)
|
||||
...
|
||||
>>>
|
||||
```
|
||||
|
||||
### (c) Printing a formatted table
|
||||
|
||||
Redo the above for-loop, but change the print statement to format the tuples.
|
||||
|
||||
```pycon
>>> for r in report:
        print('%10s %10d %10.2f %10.2f' % r)

        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
...
>>>
```
|
||||
|
||||
You can also expand the values and use f-strings. For example:
|
||||
|
||||
```pycon
>>> for name, shares, price, change in report:
        print(f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}')

        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
...
>>>
```
|
||||
|
||||
Take the above statements and add them to your `report.py` program.
|
||||
Have your program take the output of the `make_report()` function and print a nicely formatted table as shown.
|
||||
|
||||
### (d) Adding some headers
|
||||
|
||||
Suppose you had a tuple of header names like this:
|
||||
|
||||
```python
|
||||
headers = ('Name', 'Shares', 'Price', 'Change')
|
||||
```
|
||||
|
||||
Add code to your program that takes the above tuple of headers and
|
||||
creates a string where each header name is right-aligned in a
|
||||
10-character wide field and each field is separated by a single space.
|
||||
|
||||
```python
|
||||
'      Name     Shares      Price     Change'
|
||||
```
|
||||
|
||||
Write code that takes the headers and creates the separator string between the headers and data to follow.
|
||||
This string is just a bunch of "-" characters under each field name. For example:
|
||||
|
||||
```python
|
||||
'---------- ---------- ---------- ----------'
|
||||
```
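
One way to build both strings is sketched here, using `str.join()` with a generator expression. The names `header_line` and `separator` are illustrative, not part of the exercise.

```python
headers = ('Name', 'Shares', 'Price', 'Change')

# Right-align each header in a 10-character field, joined by single spaces
header_line = ' '.join(f'{h:>10s}' for h in headers)

# One run of 10 dashes per column, joined the same way
separator = ' '.join('-' * 10 for _ in headers)

print(header_line)
print(separator)
```

Because both lines are derived from `headers`, adding or removing a column keeps the header and the separator in step.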
|
||||
|
||||
When you’re done, your program should produce the table shown at the top of this exercise.
|
||||
|
||||
```code
|
||||
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100       9.22     -22.98
       IBM         50     106.28      15.18
       CAT        150      35.46     -47.98
      MSFT        200      20.89     -30.34
        GE         95      13.48     -26.89
      MSFT         50      20.89     -44.21
       IBM        100     106.28      35.84
|
||||
```
|
||||
|
||||
### (e) Formatting Challenge
|
||||
|
||||
How would you modify your code so that the price includes the currency symbol ($) and the output looks like this:
|
||||
|
||||
```code
|
||||
      Name     Shares      Price     Change
---------- ---------- ---------- ----------
        AA        100      $9.22     -22.98
       IBM         50    $106.28      15.18
       CAT        150     $35.46     -47.98
      MSFT        200     $20.89     -30.34
        GE         95     $13.48     -26.89
      MSFT         50     $20.89     -44.21
       IBM        100    $106.28      35.84
|
||||
```
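
If you get stuck, one possible approach (certainly not the only one) is to format the price in two steps: first build the text with the dollar sign attached, then right-align that text as a string. The two sample rows are copied from the table above; `dollars` is an illustrative name.

```python
report = [('AA', 100, 9.22, -22.98), ('IBM', 50, 106.28, 15.18)]

for name, shares, price, change in report:
    # Build '$9.22' first, then right-align the whole string in 10 characters
    dollars = f'${price:0.2f}'
    print(f'{name:>10s} {shares:>10d} {dollars:>10s} {change:>10.2f}')
```

Formatting in two steps keeps the `$` next to the digits. Putting a literal `$` in front of a `{price:>10.2f}` field would instead leave padding between the symbol and the number.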
|
||||
|
||||
[Next](04_Sequences)
|
||||
538
Notes/02_Working_with_data/04_Sequences.md
Normal file
@@ -0,0 +1,538 @@
|
||||
# 2.4 Sequences
|
||||
|
||||
In this part, we look at some common idioms for working with sequence data.
|
||||
|
||||
### Introduction
|
||||
|
||||
Python has three *sequence* datatypes.
|
||||
|
||||
* String: `'Hello'`. A string is considered a sequence of characters.
|
||||
* List: `[1, 4, 5]`.
|
||||
* Tuple: `('GOOG', 100, 490.1)`.
|
||||
|
||||
All sequences are ordered and have length.
|
||||
|
||||
```python
|
||||
a = 'Hello' # String
|
||||
b = [1, 4, 5] # List
|
||||
c = ('GOOG', 100, 490.1) # Tuple
|
||||
|
||||
# Indexed order
|
||||
a[0] # 'H'
|
||||
b[-1] # 5
|
||||
c[1] # 100
|
||||
|
||||
# Length of sequence
|
||||
len(a) # 5
|
||||
len(b) # 3
|
||||
len(c) # 3
|
||||
```
|
||||
|
||||
Sequences can be replicated: `s * n`.
|
||||
|
||||
```pycon
|
||||
>>> a = 'Hello'
|
||||
>>> a * 3
|
||||
'HelloHelloHello'
|
||||
>>> b = [1, 2, 3]
|
||||
>>> b * 2
|
||||
[1, 2, 3, 1, 2, 3]
|
||||
>>>
|
||||
```
|
||||
|
||||
Sequences of the same type can be concatenated: `s + t`.
|
||||
|
||||
```pycon
|
||||
>>> a = (1, 2, 3)
|
||||
>>> b = (4, 5)
|
||||
>>> a + b
|
||||
(1, 2, 3, 4, 5)
|
||||
>>>
|
||||
>>> c = [1, 5]
|
||||
>>> a + c
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: can only concatenate tuple (not "list") to tuple
|
||||
```
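
If you do need to combine a tuple and a list, convert one of them first so that both operands have the same type. A minimal sketch, reusing `a` and `c` from the session above:

```python
a = (1, 2, 3)
c = [1, 5]

as_tuple = a + tuple(c)   # convert the list; the result is a tuple
as_list = list(a) + c     # convert the tuple; the result is a list

print(as_tuple)
print(as_list)
```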
|
||||
|
||||
### Slicing
|
||||
|
||||
Slicing means to take a subsequence from a sequence.
|
||||
The syntax is `s[start:end]`, where `start` and `end` are the indexes of the subsequence you want.
|
||||
|
||||
```python
|
||||
a = [0,1,2,3,4,5,6,7,8]
|
||||
|
||||
a[2:5] # [2,3,4]
|
||||
a[-5:] # [4,5,6,7,8]
|
||||
a[:3] # [0,1,2]
|
||||
```
|
||||
|
||||
* Indices `start` and `end` must be integers.
|
||||
* Slices do *not* include the end value.
|
||||
* If indices are omitted, they default to the beginning or end of the list.
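
Slicing is not limited to lists. The same rules apply to strings and tuples, as this short sketch shows (the sample values are illustrative):

```python
text = 'Hello World'
row = ('GOOG', 100, 490.1)

first_word = text[:5]      # 'Hello'
last_word = text[-5:]      # 'World'
numbers = row[1:]          # (100, 490.1)

# Slice bounds past the end are clipped rather than raising IndexError
clipped = text[6:100]      # 'World'
```

Note that a slice of a string is a string and a slice of a tuple is a tuple: the result always has the same type as the original sequence.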
|
||||
|
||||
### Slice re-assignment
|
||||
|
||||
Slices can also be reassigned and deleted.
|
||||
|
||||
```python
|
||||
# Reassignment
|
||||
a = [0,1,2,3,4,5,6,7,8]
|
||||
a[2:4] = [10,11,12] # [0,1,10,11,12,4,5,6,7,8]
|
||||
```
|
||||
|
||||
*Note: The reassigned slice doesn't need to have the same length.*
|
||||
|
||||
```python
|
||||
# Deletion
|
||||
a = [0,1,2,3,4,5,6,7,8]
|
||||
del a[2:4] # [0,1,4,5,6,7,8]
|
||||
```
|
||||
|
||||
### Sequence Reductions
|
||||
|
||||
There are some functions to reduce a sequence to a single value.
|
||||
|
||||
```pycon
|
||||
>>> s = [1, 2, 3, 4]
|
||||
>>> sum(s)
|
||||
10
|
||||
>>> min(s)
1
>>> max(s)
4
|
||||
>>> t = ['Hello', 'World']
|
||||
>>> max(t)
|
||||
'World'
|
||||
>>>
|
||||
```
|
||||
|
||||
### Iteration over a sequence
|
||||
|
||||
The for-loop iterates over the elements in the sequence.
|
||||
|
||||
```pycon
|
||||
>>> s = [1, 4, 9, 16]
|
||||
>>> for i in s:
|
||||
... print(i)
|
||||
...
|
||||
1
|
||||
4
|
||||
9
|
||||
16
|
||||
>>>
|
||||
```
|
||||
|
||||
On each iteration of the loop, you get a new item to work with.
|
||||
This new value is placed into an iteration variable. In this example, the
|
||||
iteration variable is `x`:
|
||||
|
||||
```python
|
||||
for x in s: # `x` is an iteration variable
|
||||
    ...                 # statements
|
||||
```
|
||||
|
||||
In each iteration, it overwrites the previous value (if any).
|
||||
After the loop finishes, the variable retains the last value.
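
A quick way to see this for yourself (a minimal sketch):

```python
s = [1, 4, 9, 16]

for x in s:
    pass              # nothing to do; we only care about x afterwards

print(x)              # x still refers to the last item
```

One caveat: if the sequence is empty, the loop body never runs and the iteration variable is never assigned at all.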
|
||||
|
||||
### `break` statement
|
||||
|
||||
You can use the `break` statement to break out of a loop before it finishes iterating all of the elements.
|
||||
|
||||
```python
|
||||
for name in namelist:
|
||||
    if name == 'Jake':
        break
    ...
    ...
statements
|
||||
```
|
||||
|
||||
When the `break` statement is executed, it will exit the loop and move
|
||||
on to the next `statements`. The `break` statement only applies to the
|
||||
inner-most loop. If this loop is within another loop, it will not
|
||||
break the outer loop.
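
The inner-most rule is easiest to see with two nested loops. In this sketch (the names and groups are made up for illustration), `break` abandons the rest of the current group, but the outer loop carries on with the next one:

```python
groups = [['Elwood', 'Jake', 'Curtis'], ['Ray', 'Jake', 'Aretha']]
visited = []

for group in groups:          # outer loop
    for name in group:        # inner loop
        if name == 'Jake':
            break             # leaves only the inner loop
        visited.append(name)
    # execution continues here, and the outer loop moves on to the next group

print(visited)
```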
|
||||
|
||||
### `continue` statement
|
||||
|
||||
To skip one element and move to the next one you use the `continue` statement.
|
||||
|
||||
```python
|
||||
for line in lines:
|
||||
    if line == '\n':    # Skip blank lines
        continue
    # More statements
    ...
|
||||
```
|
||||
|
||||
This is useful when the current item is not of interest or needs to be ignored in the processing.
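
Here is a runnable version of the same idea, using a small in-memory list in place of a file (the sample lines are made up):

```python
lines = ['name,shares\n', '\n', 'AA,100\n', '\n', 'IBM,50\n']
kept = []

for line in lines:
    if line == '\n':       # Skip blank lines
        continue
    kept.append(line.strip())

print(kept)
```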
|
||||
|
||||
### Looping over integers
|
||||
|
||||
If you need to count, use `range()`.
|
||||
|
||||
```python
|
||||
for i in range(100):
|
||||
    ...  # i = 0,1,...,99
|
||||
```
|
||||
|
||||
The syntax is `range([start,] end [,step])`
|
||||
|
||||
```python
|
||||
for i in range(100):
    ...  # i = 0,1,...,99
for j in range(10,20):
    ...  # j = 10,11,...,19
for k in range(10,50,2):
    ...  # k = 10,12,...,48
    # Notice how it counts in steps of 2, not 1.
|
||||
```
|
||||
|
||||
* The ending value is never included. It mirrors the behavior of slices.
|
||||
* `start` is optional. Default `0`.
|
||||
* `step` is optional. Default `1`.
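
A `range` object produces its values on demand, so printing it directly is not very informative. Wrapping it in `list()` is a convenient way to check what a particular combination of arguments generates. A small sketch:

```python
default_start = list(range(5))          # start defaults to 0
explicit = list(range(10, 15))          # the end value 15 is not included
stepped = list(range(10, 50, 10))       # step of 10
countdown = list(range(5, 0, -1))       # a negative step counts down

print(default_start, explicit, stepped, countdown)
```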
|
||||
|
||||
### `enumerate()` function
|
||||
|
||||
The `enumerate` function provides a loop with an extra counter value.
|
||||
|
||||
```python
|
||||
names = ['Elwood', 'Jake', 'Curtis']
|
||||
for i, name in enumerate(names):
|
||||
    # Loops with i = 0, name = 'Elwood'
    #            i = 1, name = 'Jake'
    #            i = 2, name = 'Curtis'
    ...
|
||||
```
|
||||
|
||||
How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional.
|
||||
A good example of using `enumerate()` is tracking line numbers while reading a file:
|
||||
|
||||
```python
|
||||
with open(filename) as f:
|
||||
    for lineno, line in enumerate(f, start=1):
        ...
|
||||
```
|
||||
|
||||
In the end, `enumerate` is just a nice shortcut for:
|
||||
|
||||
```python
|
||||
i = 0
|
||||
for x in s:
|
||||
    statements
    i += 1
|
||||
```
|
||||
|
||||
Using `enumerate` is less typing and runs slightly faster.
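
To make the equivalence concrete, this sketch collects `(counter, item)` pairs both ways and counts from 1 instead of 0. The variable names are illustrative.

```python
names = ['Elwood', 'Jake', 'Curtis']

# Manual counter
manual = []
i = 1
for name in names:
    manual.append((i, name))
    i += 1

# Same result with enumerate(), counting from 1
automatic = list(enumerate(names, start=1))

print(manual == automatic)
```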
|
||||
|
||||
### For and tuples
|
||||
|
||||
You can loop with multiple iteration variables.
|
||||
|
||||
```python
|
||||
points = [
|
||||
    (1, 4), (10, 40), (23, 14), (5, 6), (7, 8)
]
for x, y in points:
    # Loops with x = 1, y = 4
    #            x = 10, y = 40
    #            x = 23, y = 14
    #            ...
    ...
|
||||
```
|
||||
|
||||
When using multiple variables, each tuple will be *unpacked* into a set of iteration variables.
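
The number of iteration variables has to match the number of items in each tuple. The following sketch shows a working loop and then what happens on a mismatch; the three-item tuple is made up purely to trigger the error.

```python
points = [(1, 4), (10, 40), (23, 14)]

total_x = 0
total_y = 0
for x, y in points:
    total_x += x
    total_y += y

# Each item must have exactly as many parts as there are variables
try:
    for x, y in [(1, 2, 3)]:
        pass
except ValueError as err:
    message = str(err)

print(total_x, total_y, message)
```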
|
||||
|
||||
### `zip()` function
|
||||
|
||||
The `zip` function takes sequences and makes an iterator that combines them.
|
||||
|
||||
```python
|
||||
columns = ['name', 'shares', 'price']
|
||||
values = ['GOOG', 100, 490.1 ]
|
||||
pairs = zip(columns, values)
|
||||
# ('name','GOOG'), ('shares',100), ('price',490.1)
|
||||
```
|
||||
|
||||
To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier.
|
||||
|
||||
```python
|
||||
for column, value in pairs:
|
||||
    ...
|
||||
```
|
||||
|
||||
A common use of `zip` is to create key/value pairs for constructing dictionaries.
|
||||
|
||||
```python
|
||||
d = dict(zip(columns, values))
|
||||
```
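
One detail that is easy to trip over: the iterator returned by `zip()` can only be consumed once. This sketch reuses the `columns` and `values` lists from above to show the effect.

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

pairs = zip(columns, values)
first_pass = list(pairs)     # consumes the iterator
second_pass = list(pairs)    # nothing left

record = dict(zip(columns, values))   # make a fresh zip for the dict

print(first_pass, second_pass, record)
```

If you need the pairs more than once, either call `zip()` again or store the result in a list.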
|
||||
|
||||
## Exercises
|
||||
|
||||
### (a) Counting
|
||||
|
||||
Try some basic counting examples:
|
||||
|
||||
```pycon
|
||||
>>> for n in range(10): # Count 0 ... 9
|
||||
        print(n, end=' ')
|
||||
|
||||
0 1 2 3 4 5 6 7 8 9
|
||||
>>> for n in range(10,0,-1): # Count 10 ... 1
|
||||
        print(n, end=' ')
|
||||
|
||||
10 9 8 7 6 5 4 3 2 1
|
||||
>>> for n in range(0,10,2): # Count 0, 2, ... 8
|
||||
        print(n, end=' ')
|
||||
|
||||
0 2 4 6 8
|
||||
>>>
|
||||
```
|
||||
|
||||
### (b) More sequence operations
|
||||
|
||||
Interactively experiment with some of the sequence reduction operations.
|
||||
|
||||
```pycon
|
||||
>>> data = [4, 9, 1, 25, 16, 100, 49]
|
||||
>>> min(data)
|
||||
1
|
||||
>>> max(data)
|
||||
100
|
||||
>>> sum(data)
|
||||
204
|
||||
>>>
|
||||
```
|
||||
|
||||
Try looping over the data.
|
||||
|
||||
```pycon
|
||||
>>> for x in data:
|
||||
        print(x)
|
||||
|
||||
4
|
||||
9
|
||||
...
|
||||
>>> for n, x in enumerate(data):
|
||||
        print(n, x)
|
||||
|
||||
0 4
|
||||
1 9
|
||||
2 1
|
||||
...
|
||||
>>>
|
||||
```
|
||||
|
||||
Sometimes the `for` statement, `len()`, and `range()` get used by
|
||||
novices in some kind of horrible code fragment that looks like it
|
||||
emerged from the depths of a rusty C program.
|
||||
|
||||
```pycon
|
||||
>>> for n in range(len(data)):
|
||||
        print(data[n])
|
||||
|
||||
4
|
||||
9
|
||||
1
|
||||
...
|
||||
>>>
|
||||
```
|
||||
|
||||
Don’t do that! Not only does reading it make everyone’s eyes bleed, it also tends to run slower because of the extra index lookup on every iteration.
|
||||
Just use a normal `for` loop if you want to iterate over data. Use `enumerate()` if you happen to need the index for some reason.
|
||||
|
||||
### (c) A practical `enumerate()` example
|
||||
|
||||
Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data.
|
||||
Using `enumerate()`, modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input.
|
||||
|
||||
```pycon
|
||||
>>> cost = portfolio_cost('Data/missing.csv')
|
||||
Row 4: Couldn't convert: ['MSFT', '', '51.23']
|
||||
Row 7: Couldn't convert: ['IBM', '', '70.44']
|
||||
>>>
|
||||
```
|
||||
|
||||
To do this, you’ll need to change just a few parts of your code.
|
||||
|
||||
```python
|
||||
...
|
||||
for rowno, row in enumerate(rows, start=1):
    try:
        ...
    except ValueError:
        print(f"Row {rowno}: Couldn't convert: {row}")
|
||||
```
|
||||
|
||||
### (d) Using the `zip()` function
|
||||
|
||||
In the file `portfolio.csv`, the first line contains column headers. In all previous code, we’ve been discarding them.
|
||||
|
||||
```pycon
|
||||
>>> import csv
>>> f = open('Data/portfolio.csv')
|
||||
>>> rows = csv.reader(f)
|
||||
>>> headers = next(rows)
|
||||
>>> headers
|
||||
['name', 'shares', 'price']
|
||||
>>>
|
||||
```
|
||||
|
||||
However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture.
|
||||
First try this to pair the file headers with a row of data:
|
||||
|
||||
```pycon
|
||||
>>> row = next(rows)
|
||||
>>> row
|
||||
['AA', '100', '32.20']
|
||||
>>> list(zip(headers, row))
|
||||
[('name', 'AA'), ('shares', '100'), ('price', '32.20')]
|
||||
>>>
|
||||
```
|
||||
|
||||
Notice how `zip()` paired the column headers with the column values.
|
||||
We’ve used `list()` here to turn the result into a list so that you
|
||||
can see it. Normally, `zip()` creates an iterator that must be
|
||||
consumed by a for-loop.
|
||||
|
||||
This pairing is just an intermediate step to building a dictionary. Now try this:
|
||||
|
||||
```pycon
|
||||
>>> record = dict(zip(headers, row))
|
||||
>>> record
|
||||
{'name': 'AA', 'shares': '100', 'price': '32.20'}
|
||||
>>>
|
||||
```
|
||||
|
||||
This transformation is one of the most useful tricks to know about
|
||||
when processing a lot of data files. For example, suppose you wanted
|
||||
to make the `pcost.py` program work with various input files, but
|
||||
without regard for the actual column number where the name, shares,
|
||||
and price appear.
|
||||
|
||||
Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this:
|
||||
|
||||
```python
|
||||
# pcost.py
|
||||
|
||||
def portfolio_cost(filename):
    ...
    for rowno, row in enumerate(rows, start=1):
        record = dict(zip(headers, row))
        try:
            nshares = int(record['shares'])
            price = float(record['price'])
            total_cost += nshares * price
        # This catches errors in int() and float() conversions above
        except ValueError:
            print(f"Row {rowno}: Couldn't convert: {row}")
    ...
|
||||
```
|
||||
|
||||
Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this:
|
||||
|
||||
```csv
|
||||
name,date,time,shares,price
|
||||
"AA","6/11/2007","9:50am",100,32.20
|
||||
"IBM","5/13/2007","4:20pm",50,91.10
|
||||
"CAT","9/23/2006","1:30pm",150,83.44
|
||||
"MSFT","5/17/2007","10:30am",200,51.23
|
||||
"GE","2/1/2006","10:45am",95,40.37
|
||||
"MSFT","10/31/2006","12:05pm",50,65.10
|
||||
"IBM","7/9/2006","3:15pm",100,70.44
|
||||
```
|
||||
|
||||
```pycon
|
||||
>>> portfolio_cost('Data/portfoliodate.csv')
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
If you did it right, you’ll find that your program still works even
|
||||
though the data file has a completely different column format than
|
||||
before. That’s cool!
|
||||
|
||||
The change made here is subtle, but significant. Instead of
|
||||
`portfolio_cost()` being hardcoded to read a single fixed file format,
|
||||
the new version reads any CSV file and picks the values of interest
|
||||
out of it. As long as the file has the required columns, the code will work.
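
To experiment with this idea without any data files, here is a self-contained sketch. It is a hypothetical variant of `portfolio_cost()` that accepts any iterable of CSV lines instead of a filename, and the two in-memory "files" contain only the first two rows of the sample data. Your real function will open a file and may be organized differently.

```python
import csv
import io

def portfolio_cost(lines):
    """Total cost from an iterable of CSV lines (hypothetical variant that
    takes lines instead of a filename so it can run without data files)."""
    rows = csv.reader(lines)
    headers = next(rows)
    total_cost = 0.0
    for rowno, row in enumerate(rows, start=1):
        record = dict(zip(headers, row))
        try:
            total_cost += int(record['shares']) * float(record['price'])
        except ValueError:
            print(f"Row {rowno}: Couldn't convert: {row}")
    return total_cost

plain = io.StringIO('name,shares,price\n'
                    '"AA",100,32.20\n'
                    '"IBM",50,91.10\n')
dated = io.StringIO('name,date,time,shares,price\n'
                    '"AA","6/11/2007","9:50am",100,32.20\n'
                    '"IBM","5/13/2007","4:20pm",50,91.10\n')

print(portfolio_cost(plain))
print(portfolio_cost(dated))
```

Both calls produce the same total even though the columns sit in different positions, because the values are looked up by header name rather than by index.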
|
||||
|
||||
Modify the `report.py` program you wrote in Section 2.3 so that it uses
|
||||
the same technique to pick out column headers.
|
||||
|
||||
Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it
|
||||
produces the same answer as before.
|
||||
|
||||
### (e) Inverting a dictionary
|
||||
|
||||
A dictionary maps keys to values. For example, a dictionary of stock prices.
|
||||
|
||||
```pycon
|
||||
>>> prices = {
|
||||
        'GOOG' : 490.1,
        'AA'   : 23.45,
        'IBM'  : 91.1,
        'MSFT' : 34.23
    }
|
||||
>>>
|
||||
```
|
||||
|
||||
If you use the `items()` method, you can get `(key,value)` pairs:
|
||||
|
||||
```pycon
|
||||
>>> prices.items()
|
||||
dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)])
|
||||
>>>
|
||||
```
|
||||
|
||||
However, what if you wanted to get a list of `(value, key)` pairs instead?
|
||||
*Hint: use `zip()`.*
|
||||
|
||||
```pycon
|
||||
>>> pricelist = list(zip(prices.values(),prices.keys()))
|
||||
>>> pricelist
|
||||
[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
|
||||
>>>
|
||||
```
|
||||
|
||||
Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data.
|
||||
|
||||
```pycon
|
||||
>>> min(pricelist)
|
||||
(23.45, 'AA')
|
||||
>>> max(pricelist)
|
||||
(490.1, 'GOOG')
|
||||
>>> sorted(pricelist)
|
||||
[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
|
||||
>>>
|
||||
```
|
||||
|
||||
This also illustrates an important feature of tuples. When used in
|
||||
comparisons, tuples are compared element-by-element starting with the
|
||||
first item. Similar to how strings are compared
|
||||
character-by-character.
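
The following sketch makes the comparison rule explicit and also shows an alternative that skips the `(value, key)` pairs entirely by passing a `key` function to `min()` and `max()`. The `prices` dictionary is the one from this exercise; the other names are illustrative.

```python
prices = {'GOOG': 490.1, 'AA': 23.45, 'IBM': 91.1, 'MSFT': 34.23}

# Tuples compare by the first item; the second item only matters on a tie
first_decides = (23.45, 'AA') < (34.23, 'MSFT')
tie_break = (23.45, 'AA') < (23.45, 'IBM')

# Alternative: ask min()/max() to rank the keys by their associated value
cheapest = min(prices, key=prices.get)
priciest = max(prices, key=prices.get)

print(first_decides, tie_break, cheapest, priciest)
```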
|
||||
|
||||
`zip()` is often used in situations like this where you need to pair
|
||||
up data from different places. For example, pairing up the column
|
||||
names with column values in order to make a dictionary of named
|
||||
values.
|
||||
|
||||
Note that `zip()` is not limited to pairs. For example, you can use it
|
||||
with any number of input lists:
|
||||
|
||||
```pycon
|
||||
>>> a = [1, 2, 3, 4]
|
||||
>>> b = ['w', 'x', 'y', 'z']
|
||||
>>> c = [0.2, 0.4, 0.6, 0.8]
|
||||
>>> list(zip(a, b, c))
|
||||
[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8)]
|
||||
>>>
|
||||
```
|
||||
|
||||
Also, be aware that `zip()` stops once the shortest input sequence is exhausted.
|
||||
|
||||
```pycon
|
||||
>>> a = [1, 2, 3, 4, 5, 6]
|
||||
>>> b = ['x', 'y', 'z']
|
||||
>>> list(zip(a,b))
|
||||
[(1, 'x'), (2, 'y'), (3, 'z')]
|
||||
>>>
|
||||
```
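
If silently dropping the extra items is not what you want, the standard library offers `itertools.zip_longest()`, which pads the shorter input instead. A brief sketch using the same two lists:

```python
from itertools import zip_longest

a = [1, 2, 3, 4, 5, 6]
b = ['x', 'y', 'z']

# Missing positions are filled with fillvalue (None by default)
padded = list(zip_longest(a, b, fillvalue=None))

print(padded)
```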
|
||||
|
||||
[Next](05_Collections)
|
||||
160
Notes/02_Working_with_data/05_Collections.md
Normal file
@@ -0,0 +1,160 @@
|
||||
# 2.5 `collections` module
|
||||
|
||||
The `collections` module provides a number of useful objects for data handling.
|
||||
This part briefly introduces some of these features.
|
||||
|
||||
### Example: Counting Things
|
||||
|
||||
Let's say you want to tabulate the total shares of each stock.
|
||||
|
||||
```python
|
||||
portfolio = [
|
||||
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.1),
    ('CAT', 150, 83.44),
    ('IBM', 100, 45.23),
    ('GOOG', 75, 572.45),
    ('AA', 50, 23.15)
|
||||
]
|
||||
```
|
||||
|
||||
There are two `IBM` entries and two `GOOG` entries in this list. The shares need to be combined together somehow.
|
||||
|
||||
Solution: Use a `Counter`.
|
||||
|
||||
```python
|
||||
from collections import Counter
|
||||
total_shares = Counter()
|
||||
for name, shares, price in portfolio:
|
||||
    total_shares[name] += shares
|
||||
|
||||
total_shares['IBM'] # 150
|
||||
```
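
To see what `Counter` saves you, compare it with the same tabulation written using a plain dictionary. This sketch uses a shortened version of the portfolio; `plain` and `counted` are illustrative names.

```python
from collections import Counter

portfolio = [('GOOG', 100, 490.1), ('IBM', 50, 91.1), ('IBM', 100, 45.23)]

# Plain dict: the first time a name is seen needs special handling
plain = {}
for name, shares, price in portfolio:
    if name not in plain:
        plain[name] = 0
    plain[name] += shares

# Counter: missing keys start at 0 automatically
counted = Counter()
for name, shares, price in portfolio:
    counted[name] += shares

print(plain, dict(counted))
```

A `Counter` also returns `0` when you look up a name it has never seen, instead of raising `KeyError`.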
|
||||
|
||||
### Example: One-Many Mappings
|
||||
|
||||
Problem: You want to map a key to multiple values.
|
||||
|
||||
```python
|
||||
portfolio = [
|
||||
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.1),
    ('CAT', 150, 83.44),
    ('IBM', 100, 45.23),
    ('GOOG', 75, 572.45),
    ('AA', 50, 23.15)
|
||||
]
|
||||
```
|
||||
|
||||
As in the previous example, `IBM` appears twice. This time, the key `IBM` should map to both tuples rather than to a combined total.
|
||||
|
||||
Solution: Use a `defaultdict`.
|
||||
|
||||
```python
|
||||
from collections import defaultdict
|
||||
holdings = defaultdict(list)
|
||||
for name, shares, price in portfolio:
|
||||
    holdings[name].append((shares, price))
|
||||
holdings['IBM'] # [ (50, 91.1), (100, 45.23) ]
|
||||
```
|
||||
|
||||
A `defaultdict` ensures that every time you access a missing key, you get a default value (here, a new empty list).
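
The default is created on any lookup with square brackets, not just when appending. The sketch here demonstrates that side effect and how to avoid it; the stock names are illustrative.

```python
from collections import defaultdict

holdings = defaultdict(list)
holdings['IBM'].append((50, 91.1))     # 'IBM' is created with [] on first access

# Merely reading a missing key also creates it
empty = holdings['CAT']                # [] and 'CAT' is now a key
known = sorted(holdings)               # ['CAT', 'IBM']

# Use .get() or `in` to look without creating an entry
missing = holdings.get('AA')           # None, and 'AA' is not added

print(empty, known, missing)
```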
|
||||
|
||||
### Example: Keeping a History
|
||||
|
||||
Problem: We want a history of the last N things.
|
||||
Solution: Use a `deque`.
|
||||
|
||||
```python
|
||||
from collections import deque
|
||||
|
||||
history = deque(maxlen=N)
|
||||
with open(filename) as f:
    for line in f:
        history.append(line)
        ...
|
||||
```
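
Here is a runnable sketch of the same pattern that uses a short list of strings in place of a file. The value `N = 3` and the sample lines are chosen only for illustration.

```python
from collections import deque

N = 3                                  # illustrative history size
history = deque(maxlen=N)

for line in ['line 1', 'line 2', 'line 3', 'line 4', 'line 5']:
    history.append(line)               # the oldest item is dropped when full

print(list(history))
```

Once the deque holds `N` items, each `append()` discards the item at the opposite end, so only the most recent `N` remain.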
|
||||
|
||||
## Exercises
|
||||
|
||||
The `collections` module might be one of the most useful library
|
||||
modules for dealing with special purpose kinds of data handling
|
||||
problems such as tabulating and indexing.
|
||||
|
||||
In this exercise, we’ll look at a few simple examples. Start by
|
||||
running your `report.py` program so that you have the portfolio of
|
||||
stocks loaded in the interactive mode.
|
||||
|
||||
```bash
|
||||
bash % python3 -i report.py
|
||||
```
|
||||
|
||||
### (a) Tabulating with Counters
|
||||
|
||||
Suppose you wanted to tabulate the total number of shares of each stock.
|
||||
This is easy using `Counter` objects. Try it:
|
||||
|
||||
```pycon
|
||||
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||||
>>> from collections import Counter
|
||||
>>> holdings = Counter()
|
||||
>>> for s in portfolio:
|
||||
        holdings[s['name']] += s['shares']
|
||||
|
||||
>>> holdings
|
||||
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
|
||||
>>>
|
||||
```
|
||||
|
||||
Carefully observe how the multiple entries for `MSFT` and `IBM` in `portfolio` get combined into a single entry here.
|
||||
|
||||
You can use a Counter just like a dictionary to retrieve individual values:
|
||||
|
||||
```pycon
|
||||
>>> holdings['IBM']
|
||||
150
|
||||
>>> holdings['MSFT']
|
||||
250
|
||||
>>>
|
||||
```
|
||||
|
||||
If you want to rank the values, do this:
|
||||
|
||||
```pycon
|
||||
>>> # Get three most held stocks
|
||||
>>> holdings.most_common(3)
|
||||
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
|
||||
>>>
|
||||
```
|
||||
|
||||
Let’s grab another portfolio of stocks and make a new Counter:
|
||||
|
||||
```pycon
|
||||
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
|
||||
>>> holdings2 = Counter()
|
||||
>>> for s in portfolio2:
|
||||
        holdings2[s['name']] += s['shares']
|
||||
|
||||
>>> holdings2
|
||||
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
|
||||
>>>
|
||||
```
|
||||
|
||||
Finally, let’s combine all of the holdings doing one simple operation:
|
||||
|
||||
```pycon
|
||||
>>> holdings
|
||||
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
|
||||
>>> holdings2
|
||||
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
|
||||
>>> combined = holdings + holdings2
|
||||
>>> combined
|
||||
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
|
||||
>>>
|
||||
```
|
||||
|
||||
This is only a small taste of what counters provide. However, if you
|
||||
ever find yourself needing to tabulate values, you should consider
|
||||
using one.
|
||||
|
||||
[Next](06_List_comprehension)
|
||||
316
Notes/02_Working_with_data/06_List_comprehension.md
Normal file
@@ -0,0 +1,316 @@
|
||||
# 2.6 List Comprehensions
|
||||
|
||||
A common task is processing items in a list. This section introduces list comprehensions,
|
||||
a useful tool for doing just that.
|
||||
|
||||
### Creating new lists
|
||||
|
||||
A list comprehension creates a new list by applying an operation to each element of a sequence.
|
||||
|
||||
```pycon
|
||||
>>> a = [1, 2, 3, 4, 5]
|
||||
>>> b = [2*x for x in a ]
|
||||
>>> b
|
||||
[2, 4, 6, 8, 10]
|
||||
>>>
|
||||
```
|
||||
|
||||
Another example:
|
||||
|
||||
```pycon
|
||||
>>> names = ['Elwood', 'Jake']
|
||||
>>> a = [name.lower() for name in names]
|
||||
>>> a
|
||||
['elwood', 'jake']
|
||||
>>>
|
||||
```
|
||||
|
||||
The general syntax is: `[ <expression> for <variable_name> in <sequence> ]`.
|
||||
|
||||
### Filtering
|
||||
|
||||
You can also filter during the list comprehension.
|
||||
|
||||
```pycon
|
||||
>>> a = [1, -5, 4, 2, -2, 10]
|
||||
>>> b = [2*x for x in a if x > 0 ]
|
||||
>>> b
|
||||
[2, 8, 4, 20]
|
||||
>>>
|
||||
```
|
||||
|
||||
### Use cases
|
||||
|
||||
List comprehensions are hugely useful. For example, you can collect values of a specific
|
||||
record field:
|
||||
|
||||
```python
|
||||
stocknames = [s['name'] for s in stocks]
|
||||
```
|
||||
|
||||
You can perform database-like queries on sequences.
|
||||
|
||||
```python
|
||||
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]
|
||||
```
|
||||
|
||||
You can also combine a list comprehension with a sequence reduction:
|
||||
|
||||
```python
|
||||
cost = sum([s['shares']*s['price'] for s in stocks])
|
||||
```
|
||||
|
||||
### General Syntax
|
||||
|
||||
```code
|
||||
[ <expression> for <variable_name> in <sequence> if <condition>]
|
||||
```
|
||||
|
||||
What it means:
|
||||
|
||||
```python
|
||||
result = []
|
||||
for variable_name in sequence:
|
||||
    if condition:
        result.append(expression)
|
||||
```
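
As a concrete check of this equivalence, the sketch here runs the filtering example from earlier in this section both ways. The names `doubled` and `result` are illustrative.

```python
a = [1, -5, 4, 2, -2, 10]

# Comprehension form
doubled = [2 * x for x in a if x > 0]

# Equivalent explicit loop
result = []
for x in a:
    if x > 0:
        result.append(2 * x)

print(doubled == result)
```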
|
||||
|
||||
### Historical Digression
|
||||
|
||||
List comprehensions come from math (set-builder notation).
|
||||
|
||||
```code
|
||||
a = [ x * x for x in s if x > 0 ] # Python
|
||||
|
||||
a = { x^2 | x ∈ s, x > 0 } # Math
|
||||
```
|
||||
|
||||
It is also implemented in several other languages. Most
|
||||
coders probably aren't thinking about their math class though. So,
|
||||
it's fine to view it as a cool list shortcut.
|
||||
|
||||
## Exercises
|
||||
|
||||
Start by running your `report.py` program so that you have the portfolio of stocks loaded in the interactive mode.
|
||||
|
||||
```bash
|
||||
bash % python3 -i report.py
|
||||
```
|
||||
|
||||
Now, at the Python interactive prompt, type statements to perform the operations described below.
|
||||
These operations perform various kinds of data reductions, transforms, and queries on the portfolio data.
|
||||
|
||||
### (a) List comprehensions
|
||||
|
||||
Try a few simple list comprehensions just to become familiar with the syntax.
|
||||
|
||||
```pycon
|
||||
>>> nums = [1,2,3,4]
|
||||
>>> squares = [ x * x for x in nums ]
|
||||
>>> squares
|
||||
[1, 4, 9, 16]
|
||||
>>> twice = [ 2 * x for x in nums if x > 2 ]
|
||||
>>> twice
|
||||
[6, 8]
|
||||
>>>
|
||||
```
|
||||
|
||||
Notice how the list comprehensions are creating a new list with the data suitably transformed or filtered.
|
||||
|
||||
### (b) Sequence Reductions
|
||||
|
||||
Compute the total cost of the portfolio using a single Python statement.
|
||||
|
||||
```pycon
|
||||
>>> cost = sum([ s['shares'] * s['price'] for s in portfolio ])
|
||||
>>> cost
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
After you have done that, show how you can compute the current value of the portfolio using a single statement.
|
||||
|
||||
```pycon
|
||||
>>> value = sum([ s['shares'] * prices[s['name']] for s in portfolio ])
|
||||
>>> value
|
||||
28686.1
|
||||
>>>
|
||||
```
|
||||
|
||||
Both of the above operations are an example of a map-reduction. The list comprehension is mapping an operation across the list.
|
||||
|
||||
```pycon
|
||||
>>> [ s['shares'] * s['price'] for s in portfolio ]
|
||||
[3220.0000000000005, 4555.0, 12516.0, 10246.0, 3835.1499999999996, 3254.9999999999995, 7044.0]
|
||||
>>>
|
||||
```
|
||||
|
||||
The `sum()` function is then performing a reduction across the result:
|
||||
|
||||
```pycon
|
||||
>>> sum(_)
|
||||
44671.15
|
||||
>>>
|
||||
```
|
||||
|
||||
With this knowledge, you are now ready to go launch a big-data startup company.
|
||||
|
||||
### (c) Data Queries
|
||||
|
||||
Try the following examples of various data queries.
|
||||
|
||||
First, a list of all portfolio holdings with more than 100 shares.
|
||||
|
||||
```pycon
|
||||
>>> more100 = [ s for s in portfolio if s['shares'] > 100 ]
|
||||
>>> more100
|
||||
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
|
||||
>>>
|
||||
```
|
||||
|
||||
All portfolio holdings for MSFT and IBM stocks.
|
||||
|
||||
```pycon
|
||||
>>> msftibm = [ s for s in portfolio if s['name'] in {'MSFT','IBM'} ]
|
||||
>>> msftibm
|
||||
[{'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 51.23, 'name': 'MSFT', 'shares': 200},
|
||||
{'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
|
||||
>>>
|
||||
```
|
||||
|
||||
A list of all portfolio holdings that cost more than $10000.
|
||||
|
||||
```pycon
|
||||
>>> cost10k = [ s for s in portfolio if s['shares'] * s['price'] > 10000 ]
|
||||
>>> cost10k
|
||||
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
|
||||
>>>
|
||||
```
|
||||
|
||||
### (d) Data Extraction
|
||||
|
||||
Show how you could build a list of tuples `(name, shares)` where `name` and `shares` are taken from `portfolio`.
|
||||
|
||||
```pycon
|
||||
>>> name_shares = [ (s['name'], s['shares']) for s in portfolio ]
|
||||
>>> name_shares
|
||||
[('AA', 100), ('IBM', 50), ('CAT', 150), ('MSFT', 200), ('GE', 95), ('MSFT', 50), ('IBM', 100)]
|
||||
>>>
|
||||
```
|
||||
|
||||
If you change the square brackets (`[`,`]`) to curly braces (`{`, `}`), you get something known as a set comprehension.
|
||||
This gives you unique or distinct values.
|
||||
|
||||
For example, this determines the set of stock names that appear in `portfolio`:
|
||||
|
||||
```pycon
|
||||
>>> names = { s['name'] for s in portfolio }
|
||||
>>> names
|
||||
{'AA', 'GE', 'IBM', 'MSFT', 'CAT'}
|
||||
>>>
|
||||
```
|
||||
|
||||
If you specify `key:value` pairs, you can build a dictionary.
|
||||
For example, make a dictionary that maps the name of a stock to the total number of shares held.
|
||||
|
||||
```pycon
|
||||
>>> holdings = { name: 0 for name in names }
|
||||
>>> holdings
|
||||
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
|
||||
>>>
|
||||
```
|
||||
|
||||
This latter feature is known as a **dictionary comprehension**. Let’s tabulate:
|
||||
|
||||
```pycon
|
||||
>>> for s in portfolio:
|
||||
        holdings[s['name']] += s['shares']
|
||||
|
||||
>>> holdings
|
||||
{'AA': 100, 'GE': 95, 'IBM': 150, 'MSFT': 250, 'CAT': 150}
|
||||
>>>
|
||||
```
|
||||
|
||||
Try this example that filters the `prices` dictionary down to only those names that appear in the portfolio:
|
||||
|
||||
```pycon
|
||||
>>> portfolio_prices = { name: prices[name] for name in names }
|
||||
>>> portfolio_prices
|
||||
{'AA': 9.22, 'GE': 13.48, 'IBM': 106.28, 'MSFT': 20.89, 'CAT': 35.46}
|
||||
>>>
|
||||
```
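
The comprehension above raises `KeyError` if a name in the portfolio has no entry in `prices`. If that can happen in your data, one option is to add a filter, as in this sketch. The small `prices` dictionary and the symbol `'ZZZ'` are made up for illustration.

```python
prices = {'AA': 9.22, 'IBM': 106.28, 'MSFT': 20.89}
names = {'AA', 'IBM', 'ZZZ'}          # 'ZZZ' is a made-up symbol with no price

# Keep only the names that actually have a price
portfolio_prices = {name: prices[name] for name in names if name in prices}

print(portfolio_prices)
```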
|
||||
|
||||
### (e) Advanced Bonus: Extracting Data From CSV Files
|
||||
|
||||
Knowing how to use various combinations of list, set, and dictionary comprehensions can be useful in various forms of data processing.
|
||||
Here’s an example that shows how to extract selected columns from a CSV file.
|
||||
|
||||
First, read a row of header information from a CSV file:
|
||||
|
||||
```pycon
|
||||
>>> import csv
|
||||
>>> f = open('Data/portfoliodate.csv')
|
||||
>>> rows = csv.reader(f)
|
||||
>>> headers = next(rows)
|
||||
>>> headers
|
||||
['name', 'date', 'time', 'shares', 'price']
|
||||
>>>
|
||||
```
|
||||
|
||||
Next, define a variable that lists the columns that you actually care about:
|
||||
|
||||
```pycon
|
||||
>>> select = ['name', 'shares', 'price']
|
||||
>>>
|
||||
```
|
||||
|
||||
Now, locate the indices of the above columns in the source CSV file:
|
||||
|
||||
```pycon
|
||||
>>> indices = [ headers.index(colname) for colname in select ]
|
||||
>>> indices
|
||||
[0, 3, 4]
|
||||
>>>
|
||||
```
|
||||
|
||||
Finally, read a row of data and turn it into a dictionary using a dictionary comprehension:
|
||||
|
||||
```pycon
|
||||
>>> row = next(rows)
|
||||
>>> record = { colname: row[index] for colname, index in zip(select, indices) } # dict-comprehension
|
||||
>>> record
|
||||
{'name': 'AA', 'shares': '100', 'price': '32.20'}
|
||||
>>>
|
||||
```
|
||||
|
||||
If you’re feeling comfortable with what just happened, read the rest
|
||||
of the file:
|
||||
|
||||
```pycon
|
||||
>>> portfolio = [ { colname: row[index] for colname, index in zip(select, indices) } for row in rows ]
|
||||
>>> portfolio
|
||||
[{'name': 'IBM', 'shares': '50', 'price': '91.10'}, {'name': 'CAT', 'shares': '150', 'price': '83.44'},
 {'name': 'MSFT', 'shares': '200', 'price': '51.23'}, {'name': 'GE', 'shares': '95', 'price': '40.37'},
 {'name': 'MSFT', 'shares': '50', 'price': '65.10'}, {'name': 'IBM', 'shares': '100', 'price': '70.44'}]
|
||||
>>>
|
||||
```
|
||||
|
||||
Oh my, you just reduced much of the `read_portfolio()` function to a single statement.
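
If you would rather keep the steps readable, they can be wrapped in a small helper. The following is only a sketch: the function name `read_selected` is hypothetical (it is not part of the course code), and it reads from an in-memory string with made-up sample rows instead of `Data/portfoliodate.csv` so that it runs on its own.

```python
import csv
import io

# Made-up sample data standing in for a CSV file (assumed layout)
sample = io.StringIO(
    'name,date,time,shares,price\n'
    '"AA","6/11/2007","9:50am",100,32.20\n'
    '"IBM","5/13/2007","4:20pm",50,91.10\n'
)

def read_selected(lines, select):
    '''Read CSV rows from lines, keeping only the columns named in select.'''
    rows = csv.reader(lines)
    headers = next(rows)
    # Position of each selected column in the file
    indices = [headers.index(colname) for colname in select]
    return [{colname: row[index] for colname, index in zip(select, indices)}
            for row in rows]

portfolio = read_selected(sample, ['name', 'shares', 'price'])
print(portfolio)
```

Note that the values are still strings at this point. Nothing here performs type conversion.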
|
||||
|
||||
### Commentary
|
||||
|
||||
List comprehensions are commonly used in Python as an efficient means
|
||||
for transforming, filtering, or collecting data. Due to the syntax,
|
||||
you don’t want to go overboard—try to keep each list comprehension as
|
||||
simple as possible. It’s okay to break things into multiple
|
||||
steps. For example, it’s not clear that you would want to spring that
|
||||
last example on your unsuspecting co-workers.
|
||||
|
||||
That said, knowing how to quickly manipulate data is a skill that’s
|
||||
incredibly useful. There are numerous situations where you might have
|
||||
to solve some kind of one-off problem involving data imports, exports,
|
||||
extraction, and so forth. Becoming a guru master of list
|
||||
comprehensions can substantially reduce the time spent devising a
|
||||
solution. Also, don't forget about the `collections` module.
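
As one illustration of the `collections` module, a `Counter` can tabulate totals without an explicit "is the key already there?" check. This is a small sketch using made-up records, with `shares` assumed to be already converted to integers.

```python
from collections import Counter

# Made-up records for illustration
portfolio = [
    {'name': 'IBM', 'shares': 50},
    {'name': 'MSFT', 'shares': 200},
    {'name': 'IBM', 'shares': 100},
]

holdings = Counter()
for s in portfolio:
    # Missing keys start at 0, so += works on the first occurrence too
    holdings[s['name']] += s['shares']

print(holdings['IBM'])
print(holdings.most_common(1))
```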
|
||||
|
||||
[Next](07_Objects)
|
||||
408
Notes/02_Working_with_data/07_Objects.md
Normal file
@@ -0,0 +1,408 @@
|
||||
# 2.7 Objects
|
||||
|
||||
This section introduces more details about Python's internal object model and
|
||||
discusses some matters related to memory management, copying, and type checking.
|
||||
|
||||
### Assignment
|
||||
|
||||
Many operations in Python are related to *assigning* or *storing* values.
|
||||
|
||||
```python
|
||||
a = value # Assignment to a variable
|
||||
s[n] = value # Assignment to a list
|
||||
s.append(value) # Appending to a list
|
||||
d['key'] = value # Adding to a dictionary
|
||||
```
|
||||
|
||||
*A caution: assignment operations **never make a copy** of the value being assigned.*
|
||||
All assignments are merely reference copies (or pointer copies if you prefer).
|
||||
|
||||
### Assignment example
|
||||
|
||||
Consider this code fragment.
|
||||
|
||||
```python
|
||||
a = [1,2,3]
|
||||
b = a
|
||||
c = [a,b]
|
||||
```
|
||||
|
||||
Picture the underlying memory operations. In this example, there
|
||||
is only one list object `[1,2,3]`, but there are four different
|
||||
references to it.
|
||||
|
||||
This means that modifying a value affects *all* references.
|
||||
|
||||
```pycon
|
||||
>>> a.append(999)
|
||||
>>> a
|
||||
[1, 2, 3, 999]
|
||||
>>> b
|
||||
[1, 2, 3, 999]
|
||||
>>> c
|
||||
[[1, 2, 3, 999], [1, 2, 3, 999]]
|
||||
>>>
|
||||
```
|
||||
|
||||
Notice how a change in the original list shows up everywhere else (yikes!).
|
||||
This is because no copies were ever made. Everything is pointing to the same thing.
|
||||
|
||||
### Reassigning values
|
||||
|
||||
Reassigning a value *never* overwrites the memory used by the previous value.
|
||||
|
||||
```python
|
||||
a = [1,2,3]
|
||||
b = a
|
||||
a = [4,5,6]
|
||||
|
||||
print(a) # [4, 5, 6]
|
||||
print(b) # [1, 2, 3] Holds the original value
|
||||
```
|
||||
|
||||
Remember: **Variables are names, not memory locations.**
|
||||
|
||||
### Some Dangers
|
||||
|
||||
If you don't know about this sharing, you will shoot yourself in the
|
||||
foot at some point. A typical scenario: you modify some data thinking
|
||||
that it's your own private copy and it accidentally corrupts some data
|
||||
in some other part of the program.
|
||||
|
||||
*Comment: This is one of the reasons why the primitive datatypes (int, float, string) are immutable (read-only).*
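
The scenario often shows up with function arguments, since passing a list to a function does not copy it either. Here is a minimal sketch. The function names `add_bonus` and `add_bonus_copy` are made up for illustration.

```python
def add_bonus(prices):
    # Modifies the caller's list in place; no copy is made
    for i in range(len(prices)):
        prices[i] = prices[i] + 1.0
    return prices

original = [10.0, 20.0]
updated = add_bonus(original)
print(original)              # The caller's list has changed too
print(updated is original)   # Both names refer to the same list

def add_bonus_copy(prices):
    # Builds a new list and leaves the argument untouched
    return [p + 1.0 for p in prices]

safe = [10.0, 20.0]
result = add_bonus_copy(safe)
print(safe)
print(result)
```

The second version is usually the safer default when the caller might still need the original data.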
|
||||
|
||||
### Identity and References
|
||||
|
||||
Use the `is` operator to check if two values are exactly the same object.
|
||||
|
||||
```pycon
|
||||
>>> a = [1,2,3]
|
||||
>>> b = a
|
||||
>>> a is b
|
||||
True
|
||||
>>>
|
||||
```
|
||||
|
||||
`is` compares the object identity (an integer). The identity can be
|
||||
obtained using `id()`.
|
||||
|
||||
```pycon
|
||||
>>> id(a)
|
||||
3588944
|
||||
>>> id(b)
|
||||
3588944
|
||||
>>>
|
||||
```
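
It's worth contrasting `is` with `==`. The following short sketch shows two lists that are equal in value but are different objects.

```python
a = [1, 2, 3]
b = a            # Same object as a
c = [1, 2, 3]    # Equal value, but a separate object

print(a == c)    # Compares values
print(a is c)    # Compares identities
print(id(a) == id(b))
```

In practice, `is` is most often seen in checks such as `x is None`. Use `==` when you care about the value.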
|
||||
|
||||
### Shallow copies
|
||||
|
||||
Lists and dicts have methods for copying.
|
||||
|
||||
```pycon
|
||||
>>> a = [2,3,[100,101],4]
|
||||
>>> b = list(a) # Make a copy
|
||||
>>> a is b
|
||||
False
|
||||
```
|
||||
|
||||
It's a new list, but the list items are shared.
|
||||
|
||||
```pycon
|
||||
>>> a[2].append(102)
|
||||
>>> b[2]
|
||||
[100, 101, 102]
|
||||
>>>
|
||||
>>> a[2] is b[2]
|
||||
True
|
||||
>>>
|
||||
```
|
||||
|
||||
For example, the inner list `[100, 101]` is being shared.
|
||||
This is known as a shallow copy.
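
There are several common ways to spell a shallow copy. The sketch below lists a few of them. Each one makes a new outer container but shares the inner items.

```python
import copy

a = [2, 3, [100, 101], 4]

b1 = list(a)        # Constructor
b2 = a[:]           # Full slice
b3 = a.copy()       # List method
b4 = copy.copy(a)   # Generic shallow copy from the copy module

for b in (b1, b2, b3, b4):
    print(b is a, b[2] is a[2])   # New outer list, shared inner list

d = {'name': 'GOOG', 'lots': [100, 50]}
e = d.copy()        # Dict copies are shallow as well
print(e is d, e['lots'] is d['lots'])
```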
|
||||
|
||||
### Deep copies
|
||||
|
||||
Sometimes you need to make a copy of an object and all the objects contained within it.
|
||||
You can use the `copy` module for this:
|
||||
|
||||
```pycon
|
||||
>>> a = [2,3,[100,101],4]
|
||||
>>> import copy
|
||||
>>> b = copy.deepcopy(a)
|
||||
>>> a[2].append(102)
|
||||
>>> b[2]
|
||||
[100, 101]
|
||||
>>> a[2] is b[2]
|
||||
False
|
||||
>>>
|
||||
```
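
The difference matters most with nested records, such as a list of dictionaries. The sketch below uses made-up data to compare a shallow and a deep copy of the same list.

```python
import copy

# Made-up records for illustration
portfolio = [{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}]

shallow = list(portfolio)          # New list, same dictionaries
deep = copy.deepcopy(portfolio)    # New list, new dictionaries

portfolio[0]['shares'] = 75

print(shallow[0]['shares'])   # Sees the change (shared dictionary)
print(deep[0]['shares'])      # Unaffected (independent dictionary)
```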
|
||||
|
||||
### Names, Values, Types
|
||||
|
||||
A variable name does not have a *type*. It's only a name.
|
||||
However, values *do* have an underlying type.
|
||||
|
||||
```pycon
|
||||
>>> a = 42
|
||||
>>> b = 'Hello World'
|
||||
>>> type(a)
|
||||
<class 'int'>
|
||||
>>> type(b)
|
||||
<class 'str'>
|
||||
```
|
||||
|
||||
`type()` will tell you what it is. The type name is usually a function
|
||||
that creates or converts a value to that type.
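
To make that last point concrete, here is a short sketch showing that the object returned by `type()` can be called just like `int` or `str`.

```python
a = 42
t = type(a)             # The int type itself
print(t is int)

n = t('123')            # Same as int('123')
s = type('hello')(a)    # Same as str(42)
print(n, s)
```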
|
||||
|
||||
### Type Checking
|
||||
|
||||
Use `isinstance()` to tell if an object is a specific type.
|
||||
|
||||
```python
|
||||
if isinstance(a,list):
|
||||
print('a is a list')
|
||||
```
|
||||
|
||||
To check for one of many types, pass a tuple of types.
|
||||
|
||||
```python
|
||||
if isinstance(a, (list,tuple)):
|
||||
print('a is a list or tuple')
|
||||
```
|
||||
|
||||
*Caution: Don't go overboard with type checking. It can lead to excessive complexity.*
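
One place where a small amount of type checking is reasonable is a function that accepts either a single item or a collection of items. The following is a sketch only, and the function name `normalize_names` is made up for illustration.

```python
def normalize_names(names):
    '''Accept a single name or a list/tuple of names and return a list.'''
    if isinstance(names, str):
        return [names]
    if isinstance(names, (list, tuple)):
        return list(names)
    raise TypeError('expected a str, list, or tuple')

print(normalize_names('AA'))
print(normalize_names(('AA', 'IBM')))
```

The string check comes first on purpose. A string is itself a sequence of characters, so code that treats it like a list of names would silently do the wrong thing.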
|
||||
|
||||
### Everything is an object
|
||||
|
||||
Numbers, strings, lists, functions, exceptions, classes, instances,
|
||||
etc. are all objects. It means that all objects that can be named can
|
||||
be passed around as data, placed in containers, etc., without any
|
||||
restrictions. There are no *special* kinds of objects. Sometimes it
|
||||
is said that all objects are "first-class".
|
||||
|
||||
A simple example:
|
||||
|
||||
```pycon
|
||||
>>> import math
|
||||
>>> items = [abs, math, ValueError ]
|
||||
>>> items
|
||||
[<built-in function abs>,
|
||||
<module 'math' (built-in)>,
|
||||
<class 'ValueError'>]
|
||||
>>> items[0](-45)
|
||||
45
|
||||
>>> items[1].sqrt(2)
|
||||
1.4142135623730951
|
||||
>>> try:
...     x = int('not a number')
... except items[2]:
...     print('Failed!')
...
Failed!
|
||||
>>>
|
||||
```
|
||||
|
||||
Here, `items` is a list containing a function, a module and an exception.
|
||||
You can use the items in the list in place of the original names:
|
||||
|
||||
```python
|
||||
items[0](-45) # abs
|
||||
items[1].sqrt(2) # math
|
||||
except items[2]: # ValueError
|
||||
```
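
A common use of this idea is a lookup table of functions. The sketch below stores functions in a dictionary and picks one by name at run time. The names `operations` and `apply_operation` are made up for illustration.

```python
import math

# Functions stored as ordinary dictionary values
operations = {
    'abs': abs,
    'sqrt': math.sqrt,
    'round': round,
}

def apply_operation(name, value):
    func = operations[name]   # Look up the function object
    return func(value)        # Call it like any other function

print(apply_operation('abs', -45))
print(apply_operation('sqrt', 16))
```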
|
||||
|
||||
## Exercises
|
||||
|
||||
In this set of exercises, we look at some of the power that comes from first-class
|
||||
objects.
|
||||
|
||||
### (a) First-class Data
|
||||
|
||||
In the file `Data/portfolio.csv`, we read data organized as columns that look like this:
|
||||
|
||||
```csv
|
||||
name,shares,price
|
||||
"AA",100,32.20
|
||||
"IBM",50,91.10
|
||||
...
|
||||
```
|
||||
|
||||
In previous code, we used the `csv` module to read the file, but still had to perform manual type conversions. For example:
|
||||
|
||||
```python
|
||||
for row in rows:
|
||||
name = row[0]
|
||||
shares = int(row[1])
|
||||
price = float(row[2])
|
||||
```
|
||||
|
||||
This kind of conversion can also be performed in a more clever manner using some basic list operations.
|
||||
|
||||
Make a Python list that contains the names of the conversion functions you would use to convert each column into the appropriate type:
|
||||
|
||||
```pycon
|
||||
>>> types = [str, int, float]
|
||||
>>>
|
||||
```
|
||||
|
||||
The reason you can even create this list is that everything in Python
|
||||
is *first-class*. So, if you want to have a list of functions, that’s
|
||||
fine. The items in the list you created are functions for converting
|
||||
a value `x` into a given type (e.g., `str(x)`, `int(x)`, `float(x)`).
|
||||
|
||||
Now, read a row of data from the above file:
|
||||
|
||||
```pycon
|
||||
>>> import csv
|
||||
>>> f = open('Data/portfolio.csv')
|
||||
>>> rows = csv.reader(f)
|
||||
>>> headers = next(rows)
|
||||
>>> row = next(rows)
|
||||
>>> row
|
||||
['AA', '100', '32.20']
|
||||
>>>
|
||||
```
|
||||
|
||||
As noted, this row isn’t enough to do calculations because the types are wrong. For example:
|
||||
|
||||
```pycon
|
||||
>>> row[1] * row[2]
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: can't multiply sequence by non-int of type 'str'
|
||||
>>>
|
||||
```
|
||||
|
||||
However, maybe the data can be paired up with the types you specified in `types`. For example:
|
||||
|
||||
```pycon
|
||||
>>> types[1]
|
||||
<class 'int'>
|
||||
>>> row[1]
|
||||
'100'
|
||||
>>>
|
||||
```
|
||||
|
||||
Try converting one of the values:
|
||||
|
||||
```pycon
|
||||
>>> types[1](row[1]) # Same as int(row[1])
|
||||
100
|
||||
>>>
|
||||
```
|
||||
|
||||
Try converting a different value:
|
||||
|
||||
```pycon
|
||||
>>> types[2](row[2]) # Same as float(row[2])
|
||||
32.2
|
||||
>>>
|
||||
```
|
||||
|
||||
Try the calculation with converted values:
|
||||
|
||||
```pycon
|
||||
>>> types[1](row[1])*types[2](row[2])
|
||||
3220.0000000000005
|
||||
>>>
|
||||
```
|
||||
|
||||
Zip the column types with the fields and look at the result:
|
||||
|
||||
```pycon
|
||||
>>> r = list(zip(types, row))
|
||||
>>> r
|
||||
[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')]
|
||||
>>>
|
||||
```
|
||||
|
||||
You will notice that this has paired a type conversion with a
|
||||
value. For example, `int` is paired with the value `'100'`.
|
||||
|
||||
The zipped list is useful if you want to perform conversions on all of the values, one
|
||||
after the other. Try this:
|
||||
|
||||
```pycon
|
||||
>>> converted = []
|
||||
>>> for func, val in zip(types, row):
...     converted.append(func(val))
...
|
||||
>>> converted
|
||||
['AA', 100, 32.2]
|
||||
>>> converted[1] * converted[2]
|
||||
3220.0000000000005
|
||||
>>>
|
||||
```
|
||||
|
||||
Make sure you understand what’s happening in the above code.
|
||||
In the loop, the `func` variable is one of the type conversion functions (e.g.,
|
||||
`str`, `int`, etc.) and the `val` variable is one of the values like
|
||||
`'AA'`, `'100'`. The expression `func(val)` is converting a value (kind of like a type cast).
|
||||
|
||||
The above code can be compressed into a single list comprehension.
|
||||
|
||||
```pycon
|
||||
>>> converted = [func(val) for func, val in zip(types, row)]
|
||||
>>> converted
|
||||
['AA', 100, 32.2]
|
||||
>>>
|
||||
```
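
The same comprehension works on many rows if you wrap it in a function. The following is a sketch with rows typed in by hand rather than read from the file. The function name `convert_row` is made up for illustration.

```python
def convert_row(types, row):
    '''Apply each conversion function to the matching field.'''
    return [func(val) for func, val in zip(types, row)]

types = [str, int, float]
rows = [['AA', '100', '32.20'], ['IBM', '50', '91.10']]

converted_rows = [convert_row(types, row) for row in rows]
print(converted_rows)

# With real numbers in hand, calculations now work
total = sum(shares * price for name, shares, price in converted_rows)
print(total)
```

Keep in mind that `zip()` stops at the shortest input, so a row with extra fields is silently truncated to the length of `types`.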
|
||||
|
||||
### (b) Making dictionaries
|
||||
|
||||
Remember how the `dict()` function can easily make a dictionary if you have a sequence of key names and values?
|
||||
Let’s make a dictionary from the column headers:
|
||||
|
||||
```pycon
|
||||
>>> headers
|
||||
['name', 'shares', 'price']
|
||||
>>> converted
|
||||
['AA', 100, 32.2]
|
||||
>>> dict(zip(headers, converted))
|
||||
{'name': 'AA', 'shares': 100, 'price': 32.2}
|
||||
>>>
|
||||
```
|
||||
|
||||
Of course, if you’re up on your list-comprehension fu, you can do the whole conversion in a single shot using a dict-comprehension:
|
||||
|
||||
```pycon
|
||||
>>> { name: func(val) for name, func, val in zip(headers, types, row) }
|
||||
{'name': 'AA', 'shares': 100, 'price': 32.2}
|
||||
>>>
|
||||
```
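
Nesting that dict-comprehension inside a list comprehension converts every row of a file in one statement. Here is a sketch that uses an in-memory string in place of `Data/portfolio.csv` so that it runs on its own.

```python
import csv
import io

# Stand-in for open('Data/portfolio.csv'), using the first two rows shown earlier
f = io.StringIO('name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n')

rows = csv.reader(f)
headers = next(rows)
types = [str, int, float]

portfolio = [{name: func(val) for name, func, val in zip(headers, types, row)}
             for row in rows]
print(portfolio)
```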
|
||||
|
||||
### (c) The Big Picture
|
||||
|
||||
Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary.
|
||||
|
||||
Just to illustrate, suppose you read data from a different datafile like this:
|
||||
|
||||
```pycon
|
||||
>>> f = open('Data/dowstocks.csv')
|
||||
>>> rows = csv.reader(f)
|
||||
>>> headers = next(rows)
|
||||
>>> row = next(rows)
|
||||
>>> headers
|
||||
['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
|
||||
>>> row
|
||||
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']
|
||||
>>>
|
||||
```
|
||||
|
||||
Let’s convert the fields using a similar trick:
|
||||
|
||||
```pycon
|
||||
>>> types = [str, float, str, str, float, float, float, float, int]
|
||||
>>> converted = [func(val) for func, val in zip(types, row)]
|
||||
>>> record = dict(zip(headers, converted))
|
||||
>>> record
|
||||
{'name': 'AA', 'price': 39.48, 'date': '6/11/2007', 'time': '9:36am',
 'change': -0.18, 'open': 39.67, 'high': 39.69, 'low': 39.45,
 'volume': 181800}
|
||||
>>> record['name']
|
||||
'AA'
|
||||
>>> record['price']
|
||||
39.48
|
||||
>>>
|
||||
```
|
||||
|
||||
Spend some time to ponder what you’ve done in this exercise. We’ll revisit these ideas a little later.
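
As one thing to ponder, the list of types does not have to be positional. The sketch below is a hypothetical variation, not something from the exercise: it looks up converters by column name and falls back to `str` for any column not listed. It reuses the sample row shown above, fed from an in-memory string.

```python
import csv
import io

sample = io.StringIO(
    'name,price,date,time,change,open,high,low,volume\n'
    '"AA",39.48,"6/11/2007","9:36am",-0.18,39.67,39.69,39.45,181800\n'
)

# Hypothetical mapping of column name to conversion function
converters = {'price': float, 'change': float, 'open': float,
              'high': float, 'low': float, 'volume': int}

rows = csv.reader(sample)
headers = next(rows)

# Columns without an entry in converters are kept as strings
records = [{name: converters.get(name, str)(val) for name, val in zip(headers, row)}
           for row in rows]
print(records[0])
```

The trade-off is that the positional list is shorter to write, while the name-based mapping keeps working if the columns are reordered.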
|
||||