From c06face3a5b5160f7e313b0e199276467faf76e4 Mon Sep 17 00:00:00 2001 From: David Beazley Date: Mon, 25 May 2020 12:40:11 -0500 Subject: [PATCH] Added sections 2-4 --- Notes/02_Working_with_data/00_Overview.md | 7 + Notes/02_Working_with_data/01_Datatypes.md | 431 ++++++++++++++ Notes/02_Working_with_data/02_Containers.md | 413 ++++++++++++++ Notes/02_Working_with_data/03_Formatting.md | 276 +++++++++ Notes/02_Working_with_data/04_Sequences.md | 538 ++++++++++++++++++ Notes/02_Working_with_data/05_Collections.md | 160 ++++++ .../06_List_comprehension.md | 316 ++++++++++ Notes/02_Working_with_data/07_Objects.md | 408 +++++++++++++ Notes/03_Program_organization/00_Overview.md | 11 + Notes/03_Program_organization/01_Script.md | 275 +++++++++ .../02_More_functions.md | 491 ++++++++++++++++ .../03_Error_checking.md | 393 +++++++++++++ Notes/03_Program_organization/04_Modules.md | 317 +++++++++++ .../03_Program_organization/05_Main_module.md | 299 ++++++++++ .../06_Design_discussion.md | 132 +++++ Notes/04_Classes_objects/00_Overview.md | 35 ++ Notes/04_Classes_objects/01_Class.md | 253 ++++++++ Notes/04_Classes_objects/02_Inheritance.md | 502 ++++++++++++++++ .../04_Classes_objects/03_Special_methods.md | 332 +++++++++++ .../04_Defining_exceptions.md | 49 ++ _layouts/default.html | 1 + 21 files changed, 5639 insertions(+) create mode 100644 Notes/02_Working_with_data/00_Overview.md create mode 100644 Notes/02_Working_with_data/01_Datatypes.md create mode 100644 Notes/02_Working_with_data/02_Containers.md create mode 100644 Notes/02_Working_with_data/03_Formatting.md create mode 100644 Notes/02_Working_with_data/04_Sequences.md create mode 100644 Notes/02_Working_with_data/05_Collections.md create mode 100644 Notes/02_Working_with_data/06_List_comprehension.md create mode 100644 Notes/02_Working_with_data/07_Objects.md create mode 100644 Notes/03_Program_organization/00_Overview.md create mode 100644 Notes/03_Program_organization/01_Script.md create mode 100644 
Notes/03_Program_organization/02_More_functions.md create mode 100644 Notes/03_Program_organization/03_Error_checking.md create mode 100644 Notes/03_Program_organization/04_Modules.md create mode 100644 Notes/03_Program_organization/05_Main_module.md create mode 100644 Notes/03_Program_organization/06_Design_discussion.md create mode 100644 Notes/04_Classes_objects/00_Overview.md create mode 100644 Notes/04_Classes_objects/01_Class.md create mode 100644 Notes/04_Classes_objects/02_Inheritance.md create mode 100644 Notes/04_Classes_objects/03_Special_methods.md create mode 100644 Notes/04_Classes_objects/04_Defining_exceptions.md diff --git a/Notes/02_Working_with_data/00_Overview.md b/Notes/02_Working_with_data/00_Overview.md new file mode 100644 index 0000000..e05cd35 --- /dev/null +++ b/Notes/02_Working_with_data/00_Overview.md @@ -0,0 +1,7 @@ +# Working With Data Overview + +In this section, we look at how Python programmers represent and work with data. + +Most programs today work with data. We are going to learn to common programming idioms and how to not shoot yourself in the foot. + +We will take a look at part of the object-model in Python. Which is a big part of understanding most Python programs. diff --git a/Notes/02_Working_with_data/01_Datatypes.md b/Notes/02_Working_with_data/01_Datatypes.md new file mode 100644 index 0000000..000c6c5 --- /dev/null +++ b/Notes/02_Working_with_data/01_Datatypes.md @@ -0,0 +1,431 @@ +# 2.1 Datatypes and Data structures + +This section introduces data structures in the form of tuples and dicts. + +### Primitive Datatypes + +Python has a few primitive types of data: + +* Integers +* Floating point numbers +* Strings (text) + +We have learned about these in the previous section. + +### None type + +```python +email_address = None +``` + +This type is often used as a placeholder for optional or missing value. 
+ +```python +if email_address: + send_email(email_address, msg) +``` + +### Data Structures + +Real programs have more complex data than can be easily represented by the datatypes learned so far. +For example, information about a stock: + +```code +100 shares of GOOG at $490.10 +``` + +This is an "object" with three parts: + +* Name or symbol of the stock ("GOOG", a string) +* Number of shares (100, an integer) +* Price (490.10, a float) + +### Tuples + +A tuple is a collection of values grouped together. + +Example: + +```python +s = ('GOOG', 100, 490.1) +``` + +Sometimes the `()` are omitted in the syntax. + +```python +s = 'GOOG', 100, 490.1 +``` + +Special cases (0-tuple, 1-tuple). + +```python +t = () # An empty tuple +w = ('GOOG', ) # A 1-item tuple +``` + +Tuples are usually used to represent *simple* records or structures. +Typically, a tuple is a single *object* consisting of multiple parts. A good analogy: *A tuple is like a single row in a database table.* + +Tuple contents are ordered (like an array). + +```python +s = ('GOOG', 100, 490.1) +name = s[0] # 'GOOG' +shares = s[1] # 100 +price = s[2] # 490.1 +``` + +However, the contents can't be modified. + +```pycon +>>> s[1] = 75 +TypeError: 'tuple' object does not support item assignment +``` + +You can, however, make a new tuple based on a current tuple. + +```python +s = (s[0], 75, s[2]) +``` + +### Tuple Packing + +Tuples are focused more on packing related items together into a single *entity*. + +```python +s = ('GOOG', 100, 490.1) +``` + +The tuple is then easy to pass around to other parts of a program as a single object. + +### Tuple Unpacking + +To use the tuple elsewhere, you can unpack its parts into variables. + +```python +name, shares, price = s +print('Cost', shares * price) +``` + +The number of variables must match the tuple structure. + +```python +name, shares = s # ERROR +Traceback (most recent call last): +... +ValueError: too many values to unpack +``` + +### Tuples vs. 
Lists + +Tuples are NOT just read-only lists. Tuples are most often used for a *single item* consisting of multiple parts. +Lists are usually a collection of distinct items, usually all of the same type. + +```python +record = ('GOOG', 100, 490.1) # A tuple representing a stock in a portfolio + +symbols = [ 'GOOG', 'AAPL', 'IBM' ] # A list representing three stock symbols +``` + +### Dictionaries + +A dictionary is a hash table or associative array. +It is a collection of values indexed by *keys*. These keys serve as field names. + +```python +s = { + 'name': 'GOOG', + 'shares': 100, + 'price': 490.1 +} +``` + +### Common operations + +To read values from a dictionary, use the key names. + +```pycon +>>> print(s['name'], s['shares']) +GOOG 100 +>>> s['price'] +490.1 +>>> +``` + +To add or modify values, assign using the key names. + +```pycon +>>> s['shares'] = 75 +>>> s['date'] = '6/6/2007' +>>> +``` + +To delete a value, use the `del` statement. + +```pycon +>>> del s['date'] +>>> +``` + +### Why dictionaries? + +Dictionaries are useful when there are *many* different values and those values +might be modified or manipulated. Dictionaries make your code more readable. + +```python +s['price'] +# vs +s[2] +``` + +## Exercises + +### Note + +In the last few exercises, you wrote a program that read a datafile `Data/portfolio.csv`. Using the `csv` module, it is easy to read the file row-by-row. + +```pycon +>>> import csv +>>> f = open('Data/portfolio.csv') +>>> rows = csv.reader(f) +>>> next(rows) +['name', 'shares', 'price'] +>>> row = next(rows) +>>> row +['AA', '100', '32.20'] +>>> +``` + +Although reading the file is easy, you often want to do more with the data than read it. +For instance, perhaps you want to store it and start performing some calculations on it. +Unfortunately, a raw "row" of data doesn’t give you enough to work with. 
For example, even a simple math calculation doesn’t work: + +```pycon +>>> row = ['AA', '100', '32.20'] +>>> cost = row[1] * row[2] +Traceback (most recent call last): + File "", line 1, in +TypeError: can't multiply sequence by non-int of type 'str' +>>> +``` + +To do more, you typically want to interpret the raw data in some way and turn it into a more useful kind of object so that you can work with it later. +Two simple options are tuples or dictionaries. + +### (a) Tuples + +At the interactive prompt, create the following tuple that represents +the above row, but with the numeric columns converted to proper +numbers: + +```pycon +>>> t = (row[0], int(row[1]), float(row[2])) +>>> t +('AA', 100, 32.2) +>>> +``` + +Using this, you can now calculate the total cost by multiplying the shares and the price: + +```pycon +>>> cost = t[1] * t[2] +>>> cost +3220.0000000000005 +>>> +``` + +Is math broken in Python? What’s the deal with the answer of +3220.0000000000005? + +This is an artifact of the floating point hardware on your computer +only being able to accurately represent decimals in Base-2, not +Base-10. For even simple calculations involving base-10 decimals, +small errors are introduced. This is normal, although perhaps a bit +surprising if you haven’t seen it before. + +This happens in all programming languages that use floating point +decimals, but it often gets hidden when printing. For example: + +```pycon +>>> print(f'{cost:0.2f}') +3220.00 +>>> +``` + +Tuples are read-only. Verify this by trying to change the number of shares to 75. + +```pycon +>>> t[1] = 75 +Traceback (most recent call last): + File "", line 1, in +TypeError: 'tuple' object does not support item assignment +>>> +``` + +Although you can’t change tuple contents, you can always create a completely new tuple that replaces the old one. 
+ +```pycon +>>> t = (t[0], 75, t[2]) +>>> t +('AA', 75, 32.2) +>>> +``` + +Whenever you reassign an existing variable name like this, the old +value is discarded. Although the above assignment might look like you +are modifying the tuple, you are actually creating a new tuple and +throwing the old one away. + +Tuples are often used to pack and unpack values into variables. Try the following: + +```pycon +>>> name, shares, price = t +>>> name +'AA' +>>> shares +75 +>>> price +32.2 +>>> +``` + +Take the above variables and pack them back into a tuple: + +```pycon +>>> t = (name, 2*shares, price) +>>> t +('AA', 150, 32.2) +>>> +``` + +### (b) Dictionaries as a data structure + +An alternative to a tuple is to create a dictionary instead. + +```pycon +>>> d = { + 'name' : row[0], + 'shares' : int(row[1]), + 'price' : float(row[2]) + } +>>> d +{'name': 'AA', 'shares': 100, 'price': 32.2} +>>> +``` + +Calculate the total cost of this holding: + +```pycon +>>> cost = d['shares'] * d['price'] +>>> cost +3220.0000000000005 +>>> +``` + +Compare this example with the same calculation involving tuples above. Change the number of shares to 75. + +```pycon +>>> d['shares'] = 75 +>>> d +{'name': 'AA', 'shares': 75, 'price': 32.2} +>>> +``` + +Unlike tuples, dictionaries can be freely modified. 
Add some attributes: + +```pycon +>>> d['date'] = (6, 11, 2007) +>>> d['account'] = 12345 +>>> d +{'name': 'AA', 'shares': 75, 'price':32.2, 'date': (6, 11, 2007), 'account': 12345} +>>> +``` + +### (c) Some additional dictionary operations + +If you turn a dictionary into a list, you’ll get all of its keys: + +```pycon +>>> list(d) +['name', 'shares', 'price', 'date', 'account'] +>>> +``` + +Similarly, if you use the `for` statement to iterate on a dictionary, you will get the keys: + +```pycon +>>> for k in d: + print('k =', k) + +k = name +k = shares +k = price +k = date +k = account +>>> +``` + +Try this variant that performs a lookup at the same time: + +```pycon +>>> for k in d: + print(k, '=', d[k]) + +name = AA +shares = 75 +price = 32.2 +date = (6, 11, 2007) +account = 12345 +>>> +``` + +You can also obtain all of the keys using the `keys()` method: + +```pycon +>>> keys = d.keys() +>>> keys +dict_keys(['name', 'shares', 'price', 'date', 'account']) +>>> +``` + +`keys()` is a bit unusual in that it returns a special `dict_keys` object. + +This is an overlay on the original dictionary that always gives you the current keys—even if the dictionary changes. For example, try this: + +```pycon +>>> del d['account'] +>>> keys +dict_keys(['name', 'shares', 'price', 'date']) +>>> +``` + +Carefully notice that the `'account'` disappeared from `keys` even though you didn’t call `d.keys()` again. + +A more elegant way to work with keys and values together is to use the `items()` method. This gives you `(key, value)` tuples: + +```pycon +>>> items = d.items() +>>> items +dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))]) +>>> for k, v in d.items(): + print(k, '=', v) + +name = AA +shares = 75 +price = 32.2 +date = (6, 11, 2007) +>>> +``` + +If you have tuples such as `items`, you can create a dictionary using the `dict()` function. 
Try it: + +```pycon +>>> items +dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))]) +>>> d = dict(items) +>>> d +{'name': 'AA', 'shares': 75, 'price':32.2, 'date': (6, 11, 2007)} +>>> +``` + +[Next](02_Containers) diff --git a/Notes/02_Working_with_data/02_Containers.md b/Notes/02_Working_with_data/02_Containers.md new file mode 100644 index 0000000..f1ab6fe --- /dev/null +++ b/Notes/02_Working_with_data/02_Containers.md @@ -0,0 +1,413 @@ +# 2.2 Containers + +### Overview + +Programs often have to work with many objects. + +* A portfolio of stocks +* A table of stock prices + +There are three main choices to use. + +* Lists. Ordered data. +* Dictionaries. Unordered data. +* Sets. Unordered collection of unique items. + +### Lists as a Container + +Use a list when the order of the data matters. Remember that lists can hold any kind of object. +For example, a list of tuples. + +```python +portfolio = [ + ('GOOG', 100, 490.1), + ('IBM', 50, 91.3), + ('CAT', 150, 83.44) +] + +portfolio[0] # ('GOOG', 100, 490.1) +portfolio[2] # ('CAT', 150, 83.44) +``` + +### List construction + +Building a list from scratch. + +```python +records = [] # Initial empty list + +# Use .append() to add more items +records.append(('GOOG', 100, 490.10)) +records.append(('IBM', 50, 91.3)) +... +``` + +An example when reading records from a file. + +```python +records = [] # Initial empty list + +with open('portfolio.csv', 'rt') as f: + for line in f: + row = line.split(',') + records.append((row[0], int(row[1]), float(row[2]))) +``` + +### Dicts as a Container + +Dictionaries are useful if you want fast random lookups (by key name). For +example, a dictionary of stock prices: + +```python +prices = { + 'GOOG': 513.25, + 'CAT': 87.22, + 'IBM': 93.37, + 'MSFT': 44.12 +} +``` + +Here are some simple lookups: + +```pycon +>>> prices['IBM'] +93.37 +>>> prices['GOOG'] +513.25 +>>> +``` + +### Dict Construction + +Example of building a dict from scratch. 
+ +```python +prices = {} # Initial empty dict + +# Insert new items +prices['GOOG'] = 513.25 +prices['CAT'] = 87.22 +prices['IBM'] = 93.37 +``` + +An example populating the dict from the contents of a file. + +```python +prices = {} # Initial empty dict + +with open('prices.csv', 'rt') as f: + for line in f: + row = line.split(',') + prices[row[0]] = float(row[1]) +``` + +### Dictionary Lookups + +You can test the existence of a key. + +```python +if key in d: + # YES +else: + # NO +``` + +You can look up a value that might not exist and provide a default value in case it doesn't. + +```python +name = d.get(key, default) +``` + +An example: + +```python +>>> prices.get('IBM', 0.0) +93.37 +>>> prices.get('SCOX', 0.0) +0.0 +>>> +``` + +### Composite keys + +Almost any type of value can be used as a dictionary key in Python, as long as the type is immutable. +For example, tuples: + +```python +holidays = { + (1, 1) : 'New Years', + (3, 14) : 'Pi day', + (9, 13) : "Programmer's day", +} +``` + +Then to access: + +```pycon +>>> holidays[3, 14] +'Pi day' +>>> +``` + +*Neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.* + +### Sets + +Sets are collections of unordered, unique items. + +```python +tech_stocks = { 'IBM','AAPL','MSFT' } +# Alternative syntax +tech_stocks = set(['IBM', 'AAPL', 'MSFT']) +``` + +Sets are useful for membership tests. + +```pycon +>>> tech_stocks +{'AAPL', 'IBM', 'MSFT'} +>>> 'IBM' in tech_stocks +True +>>> 'FB' in tech_stocks +False +>>> +``` + +Sets are also useful for duplicate elimination. 
+ +```python +names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO'] + +unique = set(names) +# unique = {'IBM', 'AAPL', 'GOOG', 'YHOO'} +``` + +Additional set operations: + +```python +unique.add('CAT') # Add an item +unique.remove('YHOO') # Remove an item + +s1 | s2 # Set union +s1 & s2 # Set intersection +s1 - s2 # Set difference +``` + +## Exercises + +### Objectives + +### (a) A list of tuples + +The file `Data/portfolio.csv` contains a list of stocks in a portfolio. +In [Section 1.7](), you wrote a function `portfolio_cost(filename)` that read this file and performed a simple calculation. + +Your code should have looked something like this: + +```python +# pcost.py + +import csv + +def portfolio_cost(filename): + '''Computes the total cost (shares*price) of a portfolio file''' + total_cost = 0.0 + + with open(filename, 'rt') as f: + rows = csv.reader(f) + headers = next(rows) + for row in rows: + nshares = int(row[1]) + price = float(row[2]) + total_cost += nshares * price + return total_cost +``` + +Using this code as a rough guide, create a new file `report.py`. In +that file, define a function `read_portfolio(filename)` that opens a +given portfolio file and reads it into a list of tuples. To do this, +you’re going to make a few minor modifications to the above code. + +First, instead of defining `total_cost = 0.0`, you’ll make a variable that’s initially set to an empty list. For example: + +```python +portfolio = [] +``` + +Next, instead of totaling up the cost, you’ll turn each row into a +tuple exactly as you just did in the last exercise and append it to +this list. For example: + +```python +for row in rows: + holding = (row[0], int(row[1]), float(row[2])) + portfolio.append(holding) +``` + +Finally, you’ll return the resulting `portfolio` list. 
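Putting these three changes together, one possible shape for `read_portfolio()` is sketched below. It assumes the same three-column layout as `Data/portfolio.csv` (name, shares, price, preceded by a header row). Treat it as a starting point to compare against your own attempt, not as the reference solution.

```python
# report.py (a sketch, not the official solution)
import csv

def read_portfolio(filename):
    '''Read a portfolio file into a list of (name, shares, price) tuples.'''
    portfolio = []                     # Replaces total_cost = 0.0
    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        next(rows)                     # Skip the header row
        for row in rows:
            # Convert the numeric columns before storing the row
            holding = (row[0], int(row[1]), float(row[2]))
            portfolio.append(holding)
    return portfolio
```

Because the function returns a plain list, everything you already know about lists and tuples applies to the result.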
+ +Experiment with your function interactively (just a reminder that in order to do this, you first have to run the `report.py` program in the interpreter): + +*Hint: Use `-i` when executing the file in the terminal* + +```pycon +>>> portfolio = read_portfolio('Data/portfolio.csv') +>>> portfolio +[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23), + ('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)] +>>> +>>> portfolio[0] +('AA', 100, 32.2) +>>> portfolio[1] +('IBM', 50, 91.1) +>>> portfolio[1][1] +50 +>>> total = 0.0 +>>> for s in portfolio: + total += s[1] * s[2] + +>>> print(total) +44671.15 +>>> +``` + +This list of tuples that you have created is very similar to a 2-D array. +For example, you can access a specific column and row using a lookup such as `portfolio[row][column]` where `row` and `column` are integers. + +That said, you can also rewrite the last for-loop using a statement like this: + +```python +>>> total = 0.0 +>>> for name, shares, price in portfolio: + total += shares*price + +>>> print(total) +44671.15 +>>> +``` + +### (b) List of Dictionaries + +Take the function you wrote in part (a) and modify to represent each stock in the portfolio with a dictionary instead of a tuple. +In this dictionary use the field names of "name", "shares", and "price" to represent the different columns in the input file. + +Experiment with this new function in the same manner as you did in part (a). 
+ +```pycon +>>> portfolio = read_portfolio('portfolio.csv') +>>> portfolio +[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1}, + {'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23}, + {'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1}, + {'name': 'IBM', 'shares': 100, 'price': 70.44}] +>>> portfolio[0] +{'name': 'AA', 'shares': 100, 'price': 32.2} +>>> portfolio[1] +{'name': 'IBM', 'shares': 50, 'price': 91.1} +>>> portfolio[1]['shares'] +50 +>>> total = 0.0 +>>> for s in portfolio: + total += s['shares']*s['price'] + +>>> print(total) +44671.15 +>>> +``` + +Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers. +This is often preferred because the resulting code is easier to read later. + +Viewing large dictionaries and lists can be messy. To clean up the output for debugging, considering using the `pprint` function. + +```pycon +>>> from pprint import pprint +>>> pprint(portfolio) +[{'name': 'AA', 'price': 32.2, 'shares': 100}, + {'name': 'IBM', 'price': 91.1, 'shares': 50}, + {'name': 'CAT', 'price': 83.44, 'shares': 150}, + {'name': 'MSFT', 'price': 51.23, 'shares': 200}, + {'name': 'GE', 'price': 40.37, 'shares': 95}, + {'name': 'MSFT', 'price': 65.1, 'shares': 50}, + {'name': 'IBM', 'price': 70.44, 'shares': 100}] +>>> +``` + +### (c) Dictionaries as a container + +A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer. +In the Python shell, try playing with a dictionary: + +```pycon +>>> prices = { } +>>> prices['IBM'] = 92.45 +>>> prices['MSFT'] = 45.12 +>>> prices +... look at the result ... +>>> prices['IBM'] +92.45 +>>> prices['AAPL'] +... look at the result ... +>>> 'AAPL' in prices +False +>>> +``` + +The file `Data/prices.csv` contains a series of lines with stock prices. 
+The file looks something like this: + +```csv +"AA",9.22 +"AXP",24.85 +"BA",44.85 +"BAC",11.27 +"C",3.72 +... +``` + +Write a function `read_prices(filename)` that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices. + +To do this, start with an empty dictionary and start inserting values into it just +as you did above. However, you are reading the values from a file now. + +We’ll use this data structure to quickly lookup the price of a given stock name. + +A few little tips that you’ll need for this part. First, make sure you use the `csv` module just as you did before—there’s no need to reinvent the wheel here. + +```pycon +>>> import csv +>>> f = open('Data/prices.csv', 'r') +>>> rows = csv.reader(f) +>>> for row in rows: + print(row) + + +['AA', '9.22'] +['AXP', '24.85'] +... +[] +>>> +``` + +The other little complication is that the `Data/prices.csv` file may have some blank lines in it. Notice how the last row of data above is an empty list—meaning no data was present on that line. + +There’s a possibility that this could cause your program to die with an exception. +Use the `try` and `except` statements to catch this as appropriate. + +Once you have written your `read_prices()` function, test it interactively to make sure it works: + +```python +>>> prices = read_prices('Data/prices.csv') +>>> prices['IBM'] +106.28 +>>> prices['MSFT'] +20.89 +>>> +``` + +### (e) Finding out if you can retire + +Tie all of this work together by adding the statements to your `report.py` program. +It takes the list of stocks in part (b) and the dictionary of prices in part (c) and +computes the current value of the portfolio along with the gain/loss. 
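As a rough sketch of what that final calculation might look like, the fragment below uses a small in-memory portfolio and price dictionary instead of calling `read_portfolio()` and `read_prices()`. The sample values are copied from the examples in these notes; the variable names `total_cost`, `current_value`, and `gain` are only illustrative.

```python
# A small sample in the same shape as the results of parts (b) and (c)
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
]
prices = {'AA': 9.22, 'IBM': 106.28, 'CAT': 35.46}

total_cost = 0.0       # What was originally paid
current_value = 0.0    # What the holdings are worth at current prices
for s in portfolio:
    total_cost += s['shares'] * s['price']
    current_value += s['shares'] * prices[s['name']]

gain = current_value - total_cost   # Negative means a loss
print(f'Current value: {current_value:0.2f}')
print(f'Gain/loss:     {gain:0.2f}')
```

In your own `report.py`, the two hard-coded containers would be replaced by the results of the functions you wrote earlier.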
+ +[Next](03_Formatting) diff --git a/Notes/02_Working_with_data/03_Formatting.md b/Notes/02_Working_with_data/03_Formatting.md new file mode 100644 index 0000000..d8e3158 --- /dev/null +++ b/Notes/02_Working_with_data/03_Formatting.md @@ -0,0 +1,276 @@ +# 2.3 Formatting + +This is a slight digression, but when you work with data, you often want to +produce structured output (tables, etc.). For example: + +```code + Name Shares Price +---------- ---------- ----------- + AA 100 32.20 + IBM 50 91.10 + CAT 150 83.44 + MSFT 200 51.23 + GE 95 40.37 + MSFT 50 65.10 + IBM 100 70.44 +``` + +### String Formatting + +One way to format strings in Python 3.6+ is with `f-strings`. + +```python +>>> name = 'IBM' +>>> shares = 100 +>>> price = 91.1 +>>> f'{name:>10s} {shares:>10d} {price:>10.2f}' +' IBM 100 91.10' +>>> +``` + +Each `{expression:format}` part is replaced by the formatted value of the expression. + +It is commonly used with `print`. + +```python +print(f'{name:>10s} {shares:>10d} {price:>10.2f}') +``` + +### Format codes + +Format codes (after the `:` inside the `{}`) are similar to C `printf()`. Common codes +include: + +```code +d Decimal integer +b Binary integer +x Hexadecimal integer +f Float as [-]m.dddddd +e Float as [-]m.dddddde+-xx +g Float, but selective use of E notation +s String +c Character (from integer) +``` + +Common modifiers adjust the field width and decimal precision. This is a partial list: + +```code +:>10d Integer right aligned in 10-character field +:<10d Integer left aligned in 10-character field +:^10d Integer centered in 10-character field +:0.2f Float with 2 digit precision +``` + +### Dictionary Formatting + +You can use the `format_map()` method on strings. 
+ +```python +>>> s = { + 'name': 'IBM', + 'shares': 100, + 'price': 91.1 +} +>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s) +' IBM 100 91.10' +>>> +``` + +It uses the same format codes as `f-strings` but takes the values from the supplied dictionary. + +### C-Style Formatting + +You can also use the formatting operator `%`. 
+ +```python +>>> 'The value is %d' % 3 +'The value is 3' +>>> '%5d %-5d %10d' % (3,4,5) +' 3 4 5' +>>> '%0.2f' % (3.1415926,) +'3.14' +``` + +This requires a single item or a tuple on the right. Format codes are modeled after the C `printf()` as well. + +*Note: This is the only formatting available on byte strings.* + +```python +>>> b'%s has %d messages' % (b'Dave', 37) +b'Dave has 37 messages' +>>> +``` + +## Exercises + +In the previous exercise, you wrote a program called `report.py` that computed the gain/loss of a +stock portfolio. In this exercise, you're going to modify it to produce a table like this: + +```code + Name Shares Price Change + ---------- ---------- ---------- ---------- + AA 100 9.22 -22.98 + IBM 50 106.28 15.18 + CAT 150 35.46 -47.98 + MSFT 200 20.89 -30.34 + GE 95 13.48 -26.89 + MSFT 50 20.89 -44.21 + IBM 100 106.28 35.84 +``` + +In this report, "Price" is the current share price of the stock and "Change" is the change in the share price from the initial purchase price. + +### (a) How to format numbers + +A common problem with printing numbers is specifying the number of decimal places. One way to fix this is to use f-strings. Try +these examples: + +```python +>>> value = 42863.1 +>>> print(value) +42863.1 +>>> print(f'{value:0.4f}') +42863.1000 +>>> print(f'{value:>16.2f}') + 42863.10 +>>> print(f'{value:<16.2f}') +42863.10 +>>> print(f'{value:*>16,.2f}') +*******42,863.10 +>>> +``` + +Full documentation on the formatting codes used in f-strings can be found +[here](https://docs.python.org/3/library/string.html#format-specification-mini-language). Formatting +is also sometimes performed using the `%` operator of strings. + +```pycon +>>> print('%0.4f' % value) +42863.1000 +>>> print('%16.2f' % value) + 42863.10 +>>> +``` + +Documentation on various codes used with `%` can be found [here](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting). 
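To see how the two styles relate, the short sketch below formats the same number three ways: with an f-string, with the `%` operator, and with the built-in `format()` function. The `format()` function is not otherwise covered in these notes, so treat it as an aside. The variable names are illustrative only.

```python
value = 42863.1

# Right-align in a 16-character field with 2 decimal places
with_fstring = f'{value:>16.2f}'
with_percent = '%16.2f' % value          # % right-aligns by default
with_format = format(value, '>16.2f')    # Same format codes as f-strings

# All three approaches should produce the same string
print(with_fstring == with_percent == with_format)
print(len(with_fstring))
```

Which style to use is mostly a matter of taste, although f-strings tend to be the easiest to read when the values are already in variables.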
+ +Although it’s commonly used with `print`, string formatting is not tied to printing. +If you want to save a formatted string, just assign it to a variable. + +```pycon +>>> f = '%0.4f' % value +>>> f +'42863.1000' +>>> +``` + +### (b) Collecting Data + +In order to generate the above report, you’ll first want to collect +all of the data shown in the table. Write a function `make_report()` +that takes a list of stocks and a dictionary of prices as input and +returns a list of tuples containing the rows of the above table. + +Add this function to your `report.py` file. Here’s how it should work if you try it interactively: + +```pycon +>>> portfolio = read_portfolio('Data/portfolio.csv') +>>> prices = read_prices('Data/prices.csv') +>>> report = make_report(portfolio, prices) +>>> for r in report: + print(r) + +('AA', 100, 9.22, -22.980000000000004) +('IBM', 50, 106.28, 15.180000000000007) +('CAT', 150, 35.46, -47.98) +('MSFT', 200, 20.89, -30.339999999999996) +('GE', 95, 13.48, -26.889999999999997) +... +>>> +``` + +### (c) Printing a formatted table + +Redo the above for-loop, but change the print statement to format the tuples. + +```pycon +>>> for r in report: + print('%10s %10d %10.2f %10.2f' % r) + + AA 100 9.22 -22.98 + IBM 50 106.28 15.18 + CAT 150 35.46 -47.98 + MSFT 200 20.89 -30.34 +... +>>> +``` + +You can also expand the values and use f-strings. For example: + +```pycon +>>> for name, shares, price, change in report: + print(f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}') + + AA 100 9.22 -22.98 + IBM 50 106.28 15.18 + CAT 150 35.46 -47.98 + MSFT 200 20.89 -30.34 +... +>>> +``` + +Take the above statements and add them to your `report.py` program. +Have your program take the output of the `make_report()` function and print a nicely formatted table as shown. 
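For reference, here is one way parts (b) and (c) could fit together. It is only a sketch: it assumes the portfolio is a list of dictionaries (as in the previous set of exercises) and uses a tiny hard-coded sample rather than the real data files.

```python
def make_report(portfolio, prices):
    '''Build a list of (name, shares, price, change) rows.'''
    rows = []
    for s in portfolio:
        current_price = prices[s['name']]
        change = current_price - s['price']   # Current price minus purchase price
        rows.append((s['name'], s['shares'], current_price, change))
    return rows

# Sample data copied from the examples above
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
]
prices = {'AA': 9.22, 'IBM': 106.28}

report = make_report(portfolio, prices)
for name, shares, price, change in report:
    print(f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}')
```

Keeping the data collection in `make_report()` separate from the printing loop means the same rows can later be formatted in a different way without touching the calculation.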
+ +### (d) Adding some headers + +Suppose you had a tuple of header names like this: + +```python +headers = ('Name', 'Shares', 'Price', 'Change') +``` + +Add code to your program that takes the above tuple of headers and +creates a string where each header name is right-aligned in a +10-character wide field and each field is separated by a single space. + +```python +'      Name     Shares      Price     Change' +``` + +Write code that takes the headers and creates the separator string between the headers and data to follow. +This string is just a bunch of "-" characters under each field name. For example: + +```python +'---------- ---------- ---------- ----------' +``` + +When you’re done, your program should produce the table shown at the top of this exercise. + +```code + Name Shares Price Change + ---------- ---------- ---------- ---------- + AA 100 9.22 -22.98 + IBM 50 106.28 15.18 + CAT 150 35.46 -47.98 + MSFT 200 20.89 -30.34 + GE 95 13.48 -26.89 + MSFT 50 20.89 -44.21 + IBM 100 106.28 35.84 +``` + +### (e) Formatting Challenge + +How would you modify your code so that the price includes the currency symbol ($) and the output looks like this: + +```code + Name Shares Price Change + ---------- ---------- ---------- ---------- + AA 100 $9.22 -22.98 + IBM 50 $106.28 15.18 + CAT 150 $35.46 -47.98 + MSFT 200 $20.89 -30.34 + GE 95 $13.48 -26.89 + MSFT 50 $20.89 -44.21 + IBM 100 $106.28 35.84 +``` + +[Next](04_Sequences) \ No newline at end of file diff --git a/Notes/02_Working_with_data/04_Sequences.md b/Notes/02_Working_with_data/04_Sequences.md new file mode 100644 index 0000000..e92ae0d --- /dev/null +++ b/Notes/02_Working_with_data/04_Sequences.md @@ -0,0 +1,538 @@ +# 2.4 Sequences + +In this part, we look at some common idioms for working with sequence data. + +### Introduction + +Python has three *sequence* datatypes. + +* String: `'Hello'`. A string is considered a sequence of characters. +* List: `[1, 4, 5]`. +* Tuple: `('GOOG', 100, 490.1)`. 
+ +All sequences are ordered and have length. + +```python +a = 'Hello' # String +b = [1, 4, 5] # List +c = ('GOOG', 100, 490.1) # Tuple + +# Indexed order +a[0] # 'H' +b[-1] # 5 +c[1] # 100 + +# Length of sequence +len(a) # 5 +len(b) # 3 +len(c) # 3 +``` + +Sequences can be replicated: `s * n`. + +```pycon +>>> a = 'Hello' +>>> a * 3 +'HelloHelloHello' +>>> b = [1, 2, 3] +>>> b * 2 +[1, 2, 3, 1, 2, 3] +>>> +``` + +Sequences of the same type can be concatenated: `s + t`. + +```pycon +>>> a = (1, 2, 3) +>>> b = (4, 5) +>>> a + b +(1, 2, 3, 4, 5) +>>> +>>> c = [1, 5] +>>> a + c +Traceback (most recent call last): + File "", line 1, in +TypeError: can only concatenate tuple (not "list") to tuple +``` + +### Slicing + +Slicing means to take a subsequence from a sequence. +The syntax used is `s[start:end]`. Where `start` and `end` are the indexes of the subsequence you want. + +```python +a = [0,1,2,3,4,5,6,7,8] + +a[2:5] # [2,3,4] +a[-5:] # [4,5,6,7,8] +a[:3] # [0,1,2] +``` + +* Indices `start` and `end` must be integers. +* Slices do *not* include the end value. +* If indices are omitted, they default to the beginning or end of the list. + +### Slice re-assignment + +Slices can also be reassigned and deleted. + +```python +# Reassignment +a = [0,1,2,3,4,5,6,7,8] +a[2:4] = [10,11,12] # [0,1,10,11,12,4,5,6,7,8] +``` + +*Note: The reassigned slice doesn't need to have the same length.* + +```python +# Deletion +a = [0,1,2,3,4,5,6,7,8] +del a[2:4] # [0,1,4,5,6,7,8] +``` + +### Sequence Reductions + +There are some functions to reduce a sequence to a single value. + +```pycon +>>> s = [1, 2, 3, 4] +>>> sum(s) +10 +>>> min(s) 1 +>>> max(s) 4 +>>> t = ['Hello', 'World'] +>>> max(t) +'World' +>>> +``` + +### Iteration over a sequence + +The for-loop iterates over the elements in the sequence. + +```pycon +>>> s = [1, 4, 9, 16] +>>> for i in s: +... print(i) +... +1 +4 +9 +16 +>>> +``` + +On each iteration of the loop, you get a new item to work with. 
+This new value is placed into an iteration variable. In this example, the +iteration variable is `x`: + +```python +for x in s: # `x` is an iteration variable + ...statements +``` + +On each iteration, the new item overwrites the variable's previous value (if any). +After the loop finishes, the variable retains the last value. + +### `break` statement + +You can use the `break` statement to break out of a loop before it finishes iterating over all of the elements. + +```python +for name in namelist: + if name == 'Jake': + break + ... + ... +statements +``` + +When the `break` statement is executed, it will exit the loop and move +on to the next `statements`. The `break` statement only applies to the +inner-most loop. If this loop is within another loop, it will not +break the outer loop. + +### `continue` statement + +To skip one element and move to the next one, use the `continue` statement. + +```python +for line in lines: + if line == '\n': # Skip blank lines + continue + # More statements + ... +``` + +This is useful when the current item is not of interest or needs to be ignored in the processing. + +### Looping over integers + +If you need to count, use `range()`. + +```python +for i in range(100): + # i = 0,1,...,99 +``` + +The syntax is `range([start,] end [,step])` + +```python +for i in range(100): + # i = 0,1,...,99 +for j in range(10,20): + # j = 10,11,..., 19 +for k in range(10,50,2): + # k = 10,12,...,48 + # Notice how it counts in steps of 2, not 1. +``` + +* The ending value is never included. It mirrors the behavior of slices. +* `start` is optional. Default `0`. +* `step` is optional. Default `1`. + +### `enumerate()` function + +The `enumerate` function provides a loop with an extra counter value. + +```python +names = ['Elwood', 'Jake', 'Curtis'] +for i, name in enumerate(names): + # Loops with i = 0, name = 'Elwood' + # i = 1, name = 'Jake' + # i = 2, name = 'Curtis' +``` + +How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional. 
+A good example of using `enumerate()` is tracking line numbers while reading a file: + +```python +with open(filename) as f: + for lineno, line in enumerate(f, start=1): + ... +``` + +In the end, `enumerate` is just a nice shortcut for: + +```python +i = 0 +for x in s: + statements + i += 1 +``` + +Using `enumerate` is less typing and runs slightly faster. + +### For and tuples + +You can loop with multiple iteration variables. + +```python +points = [ + (1, 4),(10, 40),(23, 14),(5, 6),(7, 8) +] +for x, y in points: + # Loops with x = 1, y = 4 + # x = 10, y = 40 + # x = 23, y = 14 + # ... +``` + +When using multiple variables, each tuple will be *unpacked* into a set of iteration variables. + +### `zip()` function + +The `zip` function takes sequences and makes an iterator that pairs up their elements. + +```python +columns = ['name', 'shares', 'price'] +values = ['GOOG', 100, 490.1 ] +pairs = zip(columns, values) +# ('name','GOOG'), ('shares',100), ('price',490.1) +``` + +To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier. + +```python +for column, value in pairs: + ... +``` + +A common use of `zip` is to create key/value pairs for constructing dictionaries. + +```python +d = dict(zip(columns, values)) +``` + +## Exercises + +### (a) Counting + +Try some basic counting examples: + +```pycon +>>> for n in range(10): # Count 0 ... 9 + print(n, end=' ') + +0 1 2 3 4 5 6 7 8 9 +>>> for n in range(10,0,-1): # Count 10 ... 1 + print(n, end=' ') + +10 9 8 7 6 5 4 3 2 1 +>>> for n in range(0,10,2): # Count 0, 2, ... 8 + print(n, end=' ') + +0 2 4 6 8 +>>> +``` + +### (b) More sequence operations + +Interactively experiment with some of the sequence reduction operations. + +```pycon +>>> data = [4, 9, 1, 25, 16, 100, 49] +>>> min(data) +1 +>>> max(data) +100 +>>> sum(data) +204 +>>> +``` + +Try looping over the data. + +```pycon +>>> for x in data: + print(x) + +4 +9 +... 
+>>> for n, x in enumerate(data): + print(n, x) + +0 4 +1 9 +2 1 +... +>>> +``` + +Sometimes the `for` statement, `len()`, and `range()` get used by +novices in some kind of horrible code fragment that looks like it +emerged from the depths of a rusty C program. + +```pycon +>>> for n in range(len(data)): + print(data[n]) + +4 +9 +1 +... +>>> +``` + +Don’t do that! Not only does reading it make everyone’s eyes bleed, it’s also slower than iterating over the data directly. +Just use a normal `for` loop if you want to iterate over data. Use `enumerate()` if you happen to need the index for some reason. + +### (c) A practical `enumerate()` example + +Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data. +Using `enumerate()`, modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input. + +```python +>>> cost = portfolio_cost('Data/missing.csv') +Row 4: Bad row: ['MSFT', '', '51.23'] +Row 7: Bad row: ['IBM', '', '70.44'] +>>> +``` + +To do this, you’ll need to change just a few parts of your code. + +```python +... +for rowno, row in enumerate(rows, start=1): + try: + ... + except ValueError: + print(f'Row {rowno}: Bad row: {row}') +``` + +### (d) Using the `zip()` function + +In the file `portfolio.csv`, the first line contains column headers. In all previous code, we’ve been discarding them. + +```pycon +>>> f = open('Data/portfolio.csv') +>>> rows = csv.reader(f) +>>> headers = next(rows) +>>> headers +['name', 'shares', 'price'] +>>> +``` + +However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture. 
+First try this to pair the file headers with a row of data: + +```pycon +>>> row = next(rows) +>>> row +['AA', '100', '32.20'] +>>> list(zip(headers, row)) +[ ('name', 'AA'), ('shares', '100'), ('price', '32.20') ] +>>> +``` + +Notice how `zip()` paired the column headers with the column values. +We’ve used `list()` here to turn the result into a list so that you +can see it. Normally, `zip()` creates an iterator that must be +consumed by a for-loop. + +This pairing is just an intermediate step to building a dictionary. Now try this: + +```pycon +>>> record = dict(zip(headers, row)) +>>> record +{'price': '32.20', 'name': 'AA', 'shares': '100'} +>>> +``` + +This transformation is one of the most useful tricks to know about +when processing a lot of data files. For example, suppose you wanted +to make the `pcost.py` program work with various input files, but +without regard for the actual column number where the name, shares, +and price appear. + +Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this: + +```python +# pcost.py + +def portfolio_cost(filename): + ... + for rowno, row in enumerate(rows, start=1): + record = dict(zip(headers, row)) + try: + nshares = int(record['shares']) + price = float(record['price']) + total_cost += nshares * price + # This catches errors in int() and float() conversions above + except ValueError: + print(f'Row {rowno}: Bad row: {row}') + ... 
+``` + +Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this: + +```csv +name,date,time,shares,price +"AA","6/11/2007","9:50am",100,32.20 +"IBM","5/13/2007","4:20pm",50,91.10 +"CAT","9/23/2006","1:30pm",150,83.44 +"MSFT","5/17/2007","10:30am",200,51.23 +"GE","2/1/2006","10:45am",95,40.37 +"MSFT","10/31/2006","12:05pm",50,65.10 +"IBM","7/9/2006","3:15pm",100,70.44 +``` + +```python +>>> portfolio_cost('Data/portfoliodate.csv') +44671.15 +>>> +``` + +If you did it right, you’ll find that your program still works even +though the data file has a completely different column format than +before. That’s cool! + +The change made here is subtle, but significant. Instead of +`portfolio_cost()` being hardcoded to read a single fixed file format, +the new version reads any CSV file and picks the values of interest +out of it. As long as the file has the required columns, the code will work. + +Modify the `report.py` program you wrote in Section 2.3 so that it uses +the same technique to pick out column headers. + +Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it +produces the same answer as before. + +### (e) Inverting a dictionary + +A dictionary maps keys to values. For example, here is a dictionary of stock prices: + +```pycon +>>> prices = { + 'GOOG' : 490.1, + 'AA' : 23.45, + 'IBM' : 91.1, + 'MSFT' : 34.23 + } +>>> +``` + +If you use the `items()` method, you can get `(key,value)` pairs: + +```pycon +>>> prices.items() +dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)]) +>>> +``` + +However, what if you wanted to get a list of `(value, key)` pairs instead? +*Hint: use `zip()`.* + +```pycon +>>> pricelist = list(zip(prices.values(),prices.keys())) +>>> pricelist +[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')] +>>> +``` + +Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data. 
+ +```pycon +>>> min(pricelist) +(23.45, 'AA') +>>> max(pricelist) +(490.1, 'GOOG') +>>> sorted(pricelist) +[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')] +>>> +``` + +This also illustrates an important feature of tuples. When used in +comparisons, tuples are compared element-by-element starting with the +first item, similar to how strings are compared +character-by-character. + +`zip()` is often used in situations like this where you need to pair +up data from different places. For example, pairing up the column +names with column values in order to make a dictionary of named +values. + +Note that `zip()` is not limited to pairs. For example, you can use it +with any number of input lists: + +```pycon +>>> a = [1, 2, 3, 4] +>>> b = ['w', 'x', 'y', 'z'] +>>> c = [0.2, 0.4, 0.6, 0.8] +>>> list(zip(a, b, c)) +[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8)] +>>> +``` + +Also, be aware that `zip()` stops once the shortest input sequence is exhausted. + +```pycon +>>> a = [1, 2, 3, 4, 5, 6] +>>> b = ['x', 'y', 'z'] +>>> list(zip(a,b)) +[(1, 'x'), (2, 'y'), (3, 'z')] +>>> +``` + +[Next](05_Collections) \ No newline at end of file diff --git a/Notes/02_Working_with_data/05_Collections.md b/Notes/02_Working_with_data/05_Collections.md new file mode 100644 index 0000000..dc01da9 --- /dev/null +++ b/Notes/02_Working_with_data/05_Collections.md @@ -0,0 +1,160 @@ +# 2.5 `collections` module + +The `collections` module provides a number of useful objects for data handling. +This part briefly introduces some of these features. + +### Example: Counting Things + +Let's say you want to tabulate the total shares of each stock. + +```python +portfolio = [ + ('GOOG', 100, 490.1), + ('IBM', 50, 91.1), + ('CAT', 150, 83.44), + ('IBM', 100, 45.23), + ('GOOG', 75, 572.45), + ('AA', 50, 23.15) +] +``` + +There are two `IBM` entries and two `GOOG` entries in this list. The shares need to be combined together somehow. + +Solution: Use a `Counter`. 
+ +```python +from collections import Counter +total_shares = Counter() +for name, shares, price in portfolio: + total_shares[name] += shares + +total_shares['IBM'] # 150 +``` + +### Example: One-Many Mappings + +Problem: You want to map a key to multiple values. + +```python +portfolio = [ + ('GOOG', 100, 490.1), + ('IBM', 50, 91.1), + ('CAT', 150, 83.44), + ('IBM', 100, 45.23), + ('GOOG', 75, 572.45), + ('AA', 50, 23.15) +] +``` + +Like in the previous example, the key `IBM` should have two different tuples instead. + +Solution: Use a `defaultdict`. + +```python +from collections import defaultdict +holdings = defaultdict(list) +for name, shares, price in portfolio: + holdings[name].append((shares, price)) +holdings['IBM'] # [ (50, 91.1), (100, 45.23) ] +``` + +The `defaultdict` ensures that every time you access a key you get a default value. + +### Example: Keeping a History + +Problem: We want a history of the last N things. +Solution: Use a `deque`. + +```python +from collections import deque + +history = deque(maxlen=N) +with open(filename) as f: + for line in f: + history.append(line) + ... +``` + +## Exercises + +The `collections` module might be one of the most useful library +modules for dealing with special purpose kinds of data handling +problems such as tabulating and indexing. + +In this exercise, we’ll look at a few simple examples. Start by +running your `report.py` program so that you have the portfolio of +stocks loaded in the interactive mode. + +```bash +bash % python3 -i report.py +``` + +### (a) Tabulating with Counters + +Suppose you wanted to tabulate the total number of shares of each stock. +This is easy using `Counter` objects. 
Try it: + +```pycon +>>> portfolio = read_portfolio('Data/portfolio.csv') +>>> from collections import Counter +>>> holdings = Counter() +>>> for s in portfolio: + holdings[s['name']] += s['shares'] + +>>> holdings +Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95}) +>>> +``` + +Carefully observe how the multiple entries for `MSFT` and `IBM` in `portfolio` get combined into a single entry here. + +You can use a Counter just like a dictionary to retrieve individual values: + +```python +>>> holdings['IBM'] +150 +>>> holdings['MSFT'] +250 +>>> +``` + +If you want to rank the values, do this: + +```python +>>> # Get three most held stocks +>>> holdings.most_common(3) +[('MSFT', 250), ('IBM', 150), ('CAT', 150)] +>>> +``` + +Let’s grab another portfolio of stocks and make a new Counter: + +```pycon +>>> portfolio2 = read_portfolio('Data/portfolio2.csv') +>>> holdings2 = Counter() +>>> for s in portfolio2: + holdings2[s['name']] += s['shares'] + +>>> holdings2 +Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25}) +>>> +``` + +Finally, let’s combine all of the holdings doing one simple operation: + +```pycon +>>> holdings +Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95}) +>>> holdings2 +Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25}) +>>> combined = holdings + holdings2 +>>> combined +Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150}) +>>> +``` + +This is only a small taste of what counters provide. However, if you +ever find yourself needing to tabulate values, you should consider +using one. + +[Next](06_List_comprehension) \ No newline at end of file diff --git a/Notes/02_Working_with_data/06_List_comprehension.md b/Notes/02_Working_with_data/06_List_comprehension.md new file mode 100644 index 0000000..8143726 --- /dev/null +++ b/Notes/02_Working_with_data/06_List_comprehension.md @@ -0,0 +1,316 @@ +# 2.6 List Comprehensions + +A common task is processing items in a list. 
This section introduces list comprehensions, +a useful tool for doing just that. + +### Creating new lists + +A list comprehension creates a new list by applying an operation to each element of a sequence. + +```pycon +>>> a = [1, 2, 3, 4, 5] +>>> b = [2*x for x in a ] +>>> b +[2, 4, 6, 8, 10] +>>> +``` + +Another example: + +```pycon +>>> names = ['Elwood', 'Jake'] +>>> a = [name.lower() for name in names] +>>> a +['elwood', 'jake'] +>>> +``` + +The general syntax is: `[ <expression> for <variable_name> in <sequence> ]`. + +### Filtering + +You can also filter during the list comprehension. + +```pycon +>>> a = [1, -5, 4, 2, -2, 10] +>>> b = [2*x for x in a if x > 0 ] +>>> b +[2, 8, 4, 20] +>>> +``` + +### Use cases + +List comprehensions are hugely useful. For example, you can collect values of a specific +record field: + +```python +stocknames = [s['name'] for s in stocks] +``` + +You can perform database-like queries on sequences. + +```python +a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ] +``` + +You can also combine a list comprehension with a sequence reduction: + +```python +cost = sum([s['shares']*s['price'] for s in stocks]) +``` + +### General Syntax + +```code +[ <expression> for <variable_name> in <sequence> if <condition> ] +``` + +What it means: + +```python +result = [] +for variable_name in sequence: + if condition: + result.append(expression) +``` + +### Historical Digression + +List comprehensions come from math (set-builder notation). + +```code +a = [ x * x for x in s if x > 0 ] # Python + +a = { x^2 | x ∈ s, x > 0 } # Math +``` + +The idea is also implemented in several other languages. Most +coders probably aren't thinking about their math class though. So, +it's fine to view it as a cool list shortcut. + +## Exercises + +Start by running your `report.py` program so that you have the portfolio of stocks loaded in the interactive mode. + +```bash +bash % python3 -i report.py +``` + +Now, at the Python interactive prompt, type statements to perform the operations described below. 
+These operations perform various kinds of data reductions, transforms, and queries on the portfolio data. + +### (a) List comprehensions + +Try a few simple list comprehensions just to become familiar with the syntax. + +```pycon +>>> nums = [1,2,3,4] +>>> squares = [ x * x for x in nums ] +>>> squares +[1, 4, 9, 16] +>>> twice = [ 2 * x for x in nums if x > 2 ] +>>> twice +[6, 8] +>>> +``` + +Notice how the list comprehensions are creating a new list with the data suitably transformed or filtered. + +### (b) Sequence Reductions + +Compute the total cost of the portfolio using a single Python statement. + +```pycon +>>> cost = sum([ s['shares'] * s['price'] for s in portfolio ]) +>>> cost +44671.15 +>>> +``` + +After you have done that, show how you can compute the current value of the portfolio using a single statement. + +```pycon +>>> value = sum([ s['shares'] * prices[s['name']] for s in portfolio ]) +>>> value +28686.1 +>>> +``` + +Both of the above operations are an example of a map-reduction. The list comprehension is mapping an operation across the list. + +```pycon +>>> [ s['shares'] * s['price'] for s in portfolio ] +[3220.0000000000005, 4555.0, 12516.0, 10246.0, 3835.1499999999996, 3254.9999999999995, 7044.0] +>>> +``` + +The `sum()` function is then performing a reduction across the result: + +```python +>>> sum(_) +44671.15 +>>> +``` + +With this knowledge, you are now ready to go launch a big-data startup company. + +### (c) Data Queries + +Try the following examples of various data queries. + +First, a list of all portfolio holdings with more than 100 shares. + +```pycon +>>> more100 = [ s for s in portfolio if s['shares'] > 100 ] +>>> more100 +[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}] +>>> +``` + +All portfolio holdings for MSFT and IBM stocks. 
+ +```pycon +>>> msftibm = [ s for s in portfolio if s['name'] in {'MSFT','IBM'} ] +>>> msftibm +[{'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, + {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}] +>>> +``` + +A list of all portfolio holdings that cost more than $10000. + +```pycon +>>> cost10k = [ s for s in portfolio if s['shares'] * s['price'] > 10000 ] +>>> cost10k +[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}] +>>> +``` + +### (d) Data Extraction + +Show how you could build a list of tuples `(name, shares)` where `name` and `shares` are taken from `portfolio`. + +```pycon +>>> name_shares = [ (s['name'], s['shares']) for s in portfolio ] +>>> name_shares +[('AA', 100), ('IBM', 50), ('CAT', 150), ('MSFT', 200), ('GE', 95), ('MSFT', 50), ('IBM', 100)] +>>> +``` + +If you change the square brackets (`[`,`]`) to curly braces (`{`, `}`), you get something known as a set comprehension. +This gives you unique or distinct values. + +For example, this determines the set of stock names that appear in `portfolio`: + +```pycon +>>> names = { s['name'] for s in portfolio } +>>> names +{ 'AA', 'GE', 'IBM', 'MSFT', 'CAT' } +>>> +``` + +If you specify `key:value` pairs, you can build a dictionary. +For example, make a dictionary that maps the name of a stock to the total number of shares held. + +```pycon +>>> holdings = { name: 0 for name in names } +>>> holdings +{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0} +>>> +``` + +This latter feature is known as a **dictionary comprehension**. 
Let’s tabulate: + +```pycon +>>> for s in portfolio: + holdings[s['name']] += s['shares'] + +>>> holdings +{ 'AA': 100, 'GE': 95, 'IBM': 150, 'MSFT':250, 'CAT': 150 } +>>> +``` + +Try this example that filters the `prices` dictionary down to only those names that appear in the portfolio: + +```pycon +>>> portfolio_prices = { name: prices[name] for name in names } +>>> portfolio_prices +{'AA': 9.22, 'GE': 13.48, 'IBM': 106.28, 'MSFT': 20.89, 'CAT': 35.46} +>>> +``` + +### (e) Advanced Bonus: Extracting Data From CSV Files + +Knowing how to use various combinations of list, set, and dictionary comprehensions can be useful in various forms of data processing. +Here’s an example that shows how to extract selected columns from a CSV file. + +First, read a row of header information from a CSV file: + +```pycon +>>> import csv +>>> f = open('Data/portfoliodate.csv') +>>> rows = csv.reader(f) +>>> headers = next(rows) +>>> headers +['name', 'date', 'time', 'shares', 'price'] +>>> +``` + +Next, define a variable that lists the columns that you actually care about: + +```pycon +>>> select = ['name', 'shares', 'price'] +>>> +``` + +Now, locate the indices of the above columns in the source CSV file: + +```pycon +>>> indices = [ headers.index(colname) for colname in select ] +>>> indices +[0, 3, 4] +>>> +``` + +Finally, read a row of data and turn it into a dictionary using a dictionary comprehension: + +```pycon +>>> row = next(rows) +>>> record = { colname: row[index] for colname, index in zip(select, indices) } # dict-comprehension +>>> record +{'price': '32.20', 'name': 'AA', 'shares': '100'} +>>> +``` + +If you’re feeling comfortable with what just happened, read the rest +of the file: + +```pycon +>>> portfolio = [ { colname: row[index] for colname, index in zip(select, indices) } for row in rows ] +>>> portfolio +[{'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, + {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, 
{'price': '40.37', 'name': 'GE', 'shares': '95'}, + {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}] +>>> +``` + +Oh my, you just reduced much of the `read_portfolio()` function to a single statement. + +### Commentary + +List comprehensions are commonly used in Python as an efficient means +for transforming, filtering, or collecting data. Due to the syntax, +you don’t want to go overboard; try to keep each list comprehension as +simple as possible. It’s okay to break things into multiple +steps. For example, it’s not clear that you would want to spring that +last example on your unsuspecting co-workers. + +That said, knowing how to quickly manipulate data is a skill that’s +incredibly useful. There are numerous situations where you might have +to solve some kind of one-off problem involving data imports, exports, +extraction, and so forth. Becoming a guru master of list +comprehensions can substantially reduce the time spent devising a +solution. Also, don't forget about the `collections` module. + +[Next](07_Objects) diff --git a/Notes/02_Working_with_data/07_Objects.md b/Notes/02_Working_with_data/07_Objects.md new file mode 100644 index 0000000..28df397 --- /dev/null +++ b/Notes/02_Working_with_data/07_Objects.md @@ -0,0 +1,408 @@ +# 2.7 Objects + +This section introduces more details about Python's internal object model and +discusses some matters related to memory management, copying, and type checking. + +### Assignment + +Many operations in Python are related to *assigning* or *storing* values. + +```python +a = value # Assignment to a variable +s[n] = value # Assignment to a list +s.append(value) # Appending to a list +d['key'] = value # Adding to a dictionary +``` + +*A caution: assignment operations **never make a copy** of the value being assigned.* +All assignments are merely reference copies (or pointer copies if you prefer). + +### Assignment example + +Consider this code fragment. 
+ +```python +a = [1,2,3] +b = a +c = [a,b] +``` + +In terms of the underlying memory operations, there +is only one list object `[1,2,3]` in this example, but there are four different +references to it (`a`, `b`, `c[0]`, and `c[1]`). + +This means that modifying a value affects *all* references. + +```pycon +>>> a.append(999) +>>> a +[1,2,3,999] +>>> b +[1,2,3,999] +>>> c +[[1,2,3,999], [1,2,3,999]] +>>> +``` + +Notice how a change in the original list shows up everywhere else (yikes!). +This is because no copies were ever made. Everything is pointing to the same thing. + +### Reassigning values + +Reassigning a value *never* overwrites the memory used by the previous value. + +```python +a = [1,2,3] +b = a +a = [4,5,6] + +print(a) # [4, 5, 6] +print(b) # [1, 2, 3] Holds the original value +``` + +Remember: **Variables are names, not memory locations.** + +### Some Dangers + +If you don't know about this sharing, you will shoot yourself in the +foot at some point. A typical scenario: you modify some data thinking +that it's your own private copy and it accidentally corrupts some data +in some other part of the program. + +*Comment: This is one of the reasons why the primitive datatypes (int, float, string) are immutable (read-only).* + +### Identity and References + +Use the `is` operator to check if two values are exactly the same object. + +```pycon +>>> a = [1,2,3] +>>> b = a +>>> a is b +True +>>> +``` + +`is` compares the object identity (an integer). The identity can be +obtained using `id()`. + +```pycon +>>> id(a) +3588944 +>>> id(b) +3588944 +>>> +``` + +### Shallow copies + +Lists and dicts have methods for copying. + +```pycon +>>> a = [2,3,[100,101],4] +>>> b = list(a) # Make a copy +>>> a is b +False +``` + +It's a new list, but the list items are shared. + +```python +>>> a[2].append(102) +>>> b[2] +[100,101,102] +>>> +>>> a[2] is b[2] +True +>>> +``` + +For example, the inner list `[100, 101]` is being shared. +This is known as a shallow copy. 
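The same sharing applies to the other common ways of copying a container. The following is only a small sketch to experiment with, not part of the original notes: the `prices` dictionary and its values are made up for illustration, and `list.copy()`, a full slice `a[:]`, and `dict.copy()` are shown alongside `list(a)` on the assumption that you may run into them in other code.

```python
# Illustrative data only; the nested list stands in for any mutable item.
a = [2, 3, [100, 101], 4]

b = list(a)     # copy via the list constructor
c = a.copy()    # copy via the list method
d = a[:]        # copy via a full slice

# Each copy is a distinct outer list ...
outer_distinct = all(x is not a for x in (b, c, d))

# ... but every copy holds a reference to the same inner list.
inner_shared = all(x[2] is a[2] for x in (b, c, d))

b[0] = 'changed'      # Rebinding a top-level slot affects only b
a[2].append(102)      # Mutating the shared inner list shows up in all four

# dict.copy() is shallow in the same way (hypothetical example data)
prices = {'IBM': [91.1, 70.44]}
prices_copy = prices.copy()
prices_copy['IBM'].append(106.28)

print(outer_distinct, inner_shared)   # True True
print(a[0], b[0])                     # 2 changed
print(d[2])                           # [100, 101, 102]
print(prices['IBM'])                  # [91.1, 70.44, 106.28]
```

Replacing a top-level item is safe with a shallow copy. Mutating a nested object is where the surprise happens.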
+ +### Deep copies + +Sometimes you need to make a copy of an object and all the objects contained within it. +You can use the `copy` module for this: + +```pycon +>>> a = [2,3,[100,101],4] +>>> import copy +>>> b = copy.deepcopy(a) +>>> a[2].append(102) +>>> b[2] +[100,101] +>>> a[2] is b[2] +False +>>> +``` + +### Names, Values, Types + +A variable name does not have a *type*. It's only a name. +However, values *do* have an underlying type. + +```pycon +>>> a = 42 +>>> b = 'Hello World' +>>> type(a) +<class 'int'> +>>> type(b) +<class 'str'> +``` + +`type()` will tell you what the type of a value is. The type name is usually a function +that creates or converts a value to that type. + +### Type Checking + +How to tell if an object is a specific type. + +```python +if isinstance(a,list): + print('a is a list') +``` + +Checking for one of many types. + +```python +if isinstance(a, (list,tuple)): + print('a is a list or tuple') +``` + +*Caution: Don't go overboard with type checking. It can lead to excessive complexity.* + +### Everything is an object + +Numbers, strings, lists, functions, exceptions, classes, instances, +etc. are all objects. It means that all objects that can be named can +be passed around as data, placed in containers, etc., without any +restrictions. There are no *special* kinds of objects. Sometimes it +is said that all objects are "first-class". + +A simple example: + +```pycon +>>> import math +>>> items = [abs, math, ValueError ] +>>> items +[<built-in function abs>, + <module 'math' (built-in)>, + <class 'ValueError'>] +>>> items[0](-45) +45 +>>> items[1].sqrt(2) +1.4142135623730951 +>>> try: + x = int('not a number') + except items[2]: + print('Failed!') +Failed! +>>> +``` + +Here, `items` is a list containing a function, a module and an exception. +You can use the items in the list in place of the original names: + +```python +items[0](-45) # abs +items[1].sqrt(2) # math +except items[2]: # ValueError +``` + +## Exercises + +In this set of exercises, we look at some of the power that comes from first-class +objects. 
+ +### (a) First-class Data + +The file `Data/portfolio.csv` contains data organized as columns that look like this: + +```csv +name,shares,price +"AA",100,32.20 +"IBM",50,91.10 +... +``` + +In previous code, we used the `csv` module to read the file, but still had to perform manual type conversions. For example: + +```python +for row in rows: + name = row[0] + shares = int(row[1]) + price = float(row[2]) +``` + +This kind of conversion can also be performed in a more clever manner using some basic list operations. + +Make a Python list that contains the names of the conversion functions you would use to convert each column into the appropriate type: + +```pycon +>>> types = [str, int, float] +>>> +``` + +The reason you can even create this list is that everything in Python +is *first-class*. So, if you want to have a list of functions, that’s +fine. The items in the list you created are functions for converting +a value `x` into a given type (e.g., `str(x)`, `int(x)`, `float(x)`). + +Now, read a row of data from the above file: + +```pycon +>>> import csv +>>> f = open('Data/portfolio.csv') +>>> rows = csv.reader(f) +>>> headers = next(rows) +>>> row = next(rows) +>>> row +['AA', '100', '32.20'] +>>> +``` + +As noted, this row isn’t enough to do calculations because the types are wrong. For example: + +```pycon +>>> row[1] * row[2] +Traceback (most recent call last): + File "<stdin>", line 1, in <module> +TypeError: can't multiply sequence by non-int of type 'str' +>>> +``` + +However, maybe the data can be paired up with the types you specified in `types`. 
For example: + +```pycon +>>> types[1] +<class 'int'> +>>> row[1] +'100' +>>> +``` + +Try converting one of the values: + +```pycon +>>> types[1](row[1]) # Same as int(row[1]) +100 +>>> +``` + +Try converting a different value: + +```pycon +>>> types[2](row[2]) # Same as float(row[2]) +32.2 +>>> +``` + +Try the calculation with converted values: + +```pycon +>>> types[1](row[1])*types[2](row[2]) +3220.0000000000005 +>>> +``` + +Zip the column types with the fields and look at the result: + +```pycon +>>> r = list(zip(types, row)) +>>> r +[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')] +>>> +``` + +You will notice that this has paired a type conversion with a +value. For example, `int` is paired with the value `'100'`. + +The zipped list is useful if you want to perform conversions on all of the values, one +after the other. Try this: + +```pycon +>>> converted = [] +>>> for func, val in zip(types, row): + converted.append(func(val)) +... +>>> converted +['AA', 100, 32.2] +>>> converted[1] * converted[2] +3220.0000000000005 +>>> +``` + +Make sure you understand what’s happening in the above code. +In the loop, the `func` variable is one of the type conversion functions (e.g., +`str`, `int`, etc.) and the `val` variable is one of the values like +`'AA'`, `'100'`. The expression `func(val)` is converting a value (kind of like a type cast). + +The above code can be compressed into a single list comprehension. + +```pycon +>>> converted = [func(val) for func, val in zip(types, row)] +>>> converted +['AA', 100, 32.2] +>>> +``` + +### (b) Making dictionaries + +Remember how the `dict()` function can easily make a dictionary if you have a sequence of key names and values? 
+Let’s make a dictionary from the column headers: + +```pycon +>>> headers +['name', 'shares', 'price'] +>>> converted +['AA', 100, 32.2] +>>> dict(zip(headers, converted)) +{'price': 32.2, 'name': 'AA', 'shares': 100} +>>> +``` + +Of course, if you’re up on your list-comprehension fu, you can do the whole conversion in a single shot using a dict-comprehension: + +```pycon +>>> { name: func(val) for name, func, val in zip(headers, types, row) } +{'price': 32.2, 'name': 'AA', 'shares': 100} +>>> +``` + +### (c) The Big Picture + +Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary. + +Just to illustrate, suppose you read data from a different datafile like this: + +```pycon +>>> f = open('Data/dowstocks.csv') +>>> rows = csv.reader(f) +>>> headers = next(rows) +>>> row = next(rows) +>>> headers +['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume'] +>>> row +['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800'] +>>> +``` + +Let’s convert the fields using a similar trick: + +```pycon +>>> types = [str, float, str, str, float, float, float, float, int] +>>> converted = [func(val) for func, val in zip(types, row)] +>>> record = dict(zip(headers, converted)) +>>> record +{'volume': 181800, 'name': 'AA', 'price': 39.48, 'high': 39.69, +'low': 39.45, 'time': '9:36am', 'date': '6/11/2007', 'open': 39.67, +'change': -0.18} +>>> record['name'] +'AA' +>>> record['price'] +39.48 +>>> +``` + +Spend some time to ponder what you’ve done in this exercise. We’ll revisit these ideas a little later. diff --git a/Notes/03_Program_organization/00_Overview.md b/Notes/03_Program_organization/00_Overview.md new file mode 100644 index 0000000..446f1c9 --- /dev/null +++ b/Notes/03_Program_organization/00_Overview.md @@ -0,0 +1,11 @@ +# Overview + +In this section you will learn: + +* How to organize larger programs. 
+* Defining and working with functions. +* Exceptions and Error handling. +* Basic module management. +* Script writing. + +Python is great for short scripts, one-off problems, prototyping, testing, etc. diff --git a/Notes/03_Program_organization/01_Script.md b/Notes/03_Program_organization/01_Script.md new file mode 100644 index 0000000..b46c917 --- /dev/null +++ b/Notes/03_Program_organization/01_Script.md @@ -0,0 +1,275 @@ +# 3.1 Python Scripting + +In this part we look more closely at the practice of writing Python +scripts. + +### What is a Script? + +A *script* is a program that runs a series of statements and stops. + +```python +# program.py + +statement1 +statement2 +statement3 +... +``` + +We have been writing scripts up to this point. + +### A Problem + +If you write a useful script, it will grow in features and +functionality. You may want to apply it to other related problems. +Over time, it might become a critical application. And if you don't +take care, it might turn into a huge tangled mess. So, let's get +organized. + +### Defining Things + +You must always define things before they get used later on in a program. + +```python +def square(x): + return x*x + +a = 42 +b = a + 2 # Requires that `a` is defined + +z = square(b) # Requires `square` and `b` to be defined +``` + +**The order is important.** +You almost always put the definitions of variables and functions near the beginning. + +### Defining Functions + +It is a good idea to put all of the code related to a single *task* all in one place. + +```python +def read_prices(filename): + prices = {} + with open(filename) as f: + f_csv = csv.reader(f) + for row in f_csv: + prices[row[0]] = float(row[1]) + return prices +``` + +A function also simplifies repeated operations. + +```python +oldprices = read_prices('oldprices.csv') +newprices = read_prices('newprices.csv') +``` + +### What is a Function? + +A function is a named sequence of statements. 
+ +```python +def funcname(args): + statement + statement + ... + return result +``` + +*Any* Python statement can be used inside. + +```python +def foo(): + import math + print(math.sqrt(2)) + help(math) +``` + +There are no *special* statements in Python. + +### Function Definition + +Functions can be *defined* in any order. + +```python +def foo(x): + bar(x) + +def bar(x): + statements + +# OR +def bar(x): + statements + +def foo(x): + bar(x) +``` + +Functions must only be defined before they are actually *used* (or called) during program execution. + +```python +foo(3) # foo must be defined already +``` + +Stylistically, it is probably more common to see functions defined in a *bottom-up* fashion. + +### Bottom-up Style + +Functions are treated as building blocks. +The smaller/simpler blocks go first. + +```python +# myprogram.py +def foo(x): + ... + +def bar(x): + ... + foo(x) # Defined above + ... + +def spam(x): + ... + bar(x) # Defined above + ... + +spam(42) # Code that uses the functions appears at the end +``` + +Later functions build upon earlier functions. + +### Function Design + +Ideally, functions should be a *black box*. +They should only operate on passed inputs and avoid global variables +and mysterious side-effects. Main goals: *Modularity* and *Predictability*. + +### Doc Strings + +A good practice is to include documentation in the form of +doc-strings. Doc-strings are strings written as the first statement +of the function body. They feed `help()`, IDEs and other tools. + +```python +def read_prices(filename): + ''' + Read prices from a CSV file of name,price + ''' + prices = {} + with open(filename) as f: + f_csv = csv.reader(f) + for row in f_csv: + prices[row[0]] = float(row[1]) + return prices +``` + +### Type Annotations + +You can also add some optional type annotations to your function definitions. 
+ +```python +def read_prices(filename: str) -> dict: + ''' + Read prices from a CSV file of name,price + ''' + prices = {} + with open(filename) as f: + f_csv = csv.reader(f) + for row in f_csv: + prices[row[0]] = float(row[1]) + return prices +``` + +These do nothing at run time. They are purely informational. +They may be used by IDEs, code checkers, etc. + +## Exercises + +In section 2, you wrote a program called `report.py` that printed out a report showing the performance of a stock portfolio. +This program consisted of some functions. For example: + +```python +# report.py +import csv + +def read_portfolio(filename): + ''' + Read a stock portfolio file into a list of dictionaries with keys + name, shares, and price. + ''' + portfolio = [] + with open(filename) as f: + rows = csv.reader(f) + headers = next(rows) + + for row in rows: + record = dict(zip(headers, row)) + stock = { + 'name' : record['name'], + 'shares' : int(record['shares']), + 'price' : float(record['price']) + } + portfolio.append(stock) + return portfolio +... +``` + +However, there were also portions of the program that just performed a series of scripted calculations. +This code appeared near the end of the program. For example: + +```python +... + +# Output the report + +headers = ('Name', 'Shares', 'Price', 'Change') +print('%10s %10s %10s %10s' % headers) +print(('-' * 10 + ' ') * len(headers)) +for row in report: + print('%10s %10d %10.2f %10.2f' % row) +... +``` + +In this exercise, we’re going to take this program and organize it a little more strongly around the use of functions. + +### (a) Structuring a program as a collection of functions + +Modify your `report.py` program so that all major operations, +including calculations and output, are carried out by a collection of +functions. Specifically: + +* Create a function `print_report(report)` that prints out the report. +* Change the last part of the program so that it is nothing more than a series of function calls and no other computation. 
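
One possible shape for the `print_report()` function described above is sketched here. It only wraps the output code shown earlier in this exercise. It assumes, hypothetically, that `report` is a list of `(name, shares, price, change)` tuples; adjust the format string if your report rows look different.

```python
def print_report(report):
    '''
    Print a formatted table from a list of (name, shares, price, change) tuples.
    The tuple layout is an assumption made for this sketch.
    '''
    headers = ('Name', 'Shares', 'Price', 'Change')
    print('%10s %10s %10s %10s' % headers)
    print(('-' * 10 + ' ') * len(headers))
    for row in report:
        # Each row must be a tuple so that % formatting unpacks it
        print('%10s %10d %10.2f %10.2f' % row)

# Example call with made-up data
print_report([('AA', 100, 9.22, -22.98), ('IBM', 50, 106.28, 15.18)])
```

Keeping the formatting inside one function means the rest of the program only has to build the list of rows and hand it over.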
+ +### (b) Creating a function for program execution + +Take the last part of your program and package it into a single function `portfolio_report(portfolio_filename, prices_filename)`. +Have the function work so that the following function call creates the report as before: + +```python +portfolio_report('Data/portfolio.csv', 'Data/prices.csv') +``` + +In this final version, your program will be nothing more than a series +of function definitions followed by a single function call to +`portfolio_report()` at the very end (which executes all of the steps +involved in the program). + +By turning your program into a single function, it becomes easy to run it on different inputs. +For example, try these statements interactively after running your program: + +```python +>>> portfolio_report('Data/portfolio2.csv', 'Data/prices.csv') +... look at the output ... +>>> files = ['Data/portfolio.csv', 'Data/portfolio2.csv'] +>>> for name in files: + print(f'{name:-^43s}') + portfolio_report(name, 'Data/prices.csv') + print() + +... look at the output ... +>>> +``` + +[Next](02_More_functions) diff --git a/Notes/03_Program_organization/02_More_functions.md b/Notes/03_Program_organization/02_More_functions.md new file mode 100644 index 0000000..7109b4e --- /dev/null +++ b/Notes/03_Program_organization/02_More_functions.md @@ -0,0 +1,491 @@ +# 3.2 More on Functions + +This section fills in a few more details about how functions work and are defined. + +### Calling a Function + +Consider this function: + +```python +def read_prices(filename, debug): + ... +``` + +You can call the function with positional arguments: + +```python +prices = read_prices('prices.csv', True) +``` + +Or you can call the function with keyword arguments: + +```python +prices = read_prices(filename='prices.csv', debug=True) +``` + +### Default Arguments + +Sometimes you want an optional argument. + +```python +def read_prices(filename, debug=False): + ... 
+``` + +If a default value is assigned, the argument is optional in function calls. + +```python +d = read_prices('prices.csv') +e = read_prices('prices.dat', True) +``` + +*Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).* + +### Prefer keyword arguments for optional arguments + +Compare and contrast these two different calling styles: + +```python +parse_data(data, False, True) # ????? + +parse_data(data, ignore_errors=True) +parse_data(data, debug=True) +parse_data(data, debug=True, ignore_errors=True) +``` + +Keyword arguments improve code clarity. + +### Design Best Practices + +Always give short, but meaningful names to function arguments. + +Someone using a function may want to use the keyword calling style. + +```python +d = read_prices('prices.csv', debug=True) +``` + +Python development tools will show the names in help features and documentation. + +### Return Values + +The `return` statement returns a value. + +```python +def square(x): + return x * x +``` + +If no return value is given or `return` is not specified, `None` is returned. + +```python +def bar(x): + statements + return + +a = bar(4) # a = None + +# OR +def foo(x): + statements # No `return` + +b = foo(4) # b = None +``` + +### Multiple Return Values + +Functions can only return one value. +However, a function may return multiple values by returning a tuple. + +```python +def divide(a,b): + q = a // b # Quotient + r = a % b # Remainder + return q, r # Return a tuple +``` + +Usage example: + +```python +x, y = divide(37,5) # x = 7, y = 2 + +x = divide(37, 5) # x = (7, 2) +``` + +### Variable Scope + +Programs assign values to variables. + +```python +x = value # Global variable + +def foo(): + y = value # Local variable +``` + +Variable assignments occur outside and inside function definitions. +Variables defined outside are "global". Variables inside a function are "local". 
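
As a small illustration of the distinction, the sketch below uses the built-in `locals()` function to list the names that exist inside a function call. The names `x`, `y`, and `foo` are placeholders chosen for this example.

```python
x = 10                      # Global variable (defined at module level)

def foo():
    y = x + 1               # `y` is local; `x` is read from the global scope
    return y, sorted(locals())

value, local_names = foo()
print(value)                # 11
print(local_names)          # ['y']
```

Only `y` shows up in the function's local names, even though the function body also reads `x`.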
+ +### Local Variables + +Variables inside functions are private. + +```python +def read_portfolio(filename): + portfolio = [] + for line in open(filename): + fields = line.split() + s = (fields[0],int(fields[1]),float(fields[2])) + portfolio.append(s) + return portfolio +``` + +In this example, `filename`, `portfolio`, `line`, `fields` and `s` are local variables. +Those variables are not retained or accessible after the function call. + +```pycon +>>> stocks = read_portfolio('stocks.dat') +>>> fields +Traceback (most recent call last): +File "<stdin>", line 1, in <module> +NameError: name 'fields' is not defined +>>> +``` + +They also can't conflict with variables found elsewhere. + +### Global Variables + +Functions can freely access the values of globals. + +```python +name = 'Dave' + +def greeting(): + print('Hello', name) # Using `name` global variable +``` + +However, functions can't modify globals by plain assignment: + +```python +name = 'Dave' + +def spam(): + name = 'Guido' + +spam() +print(name) # prints 'Dave' +``` + +**Remember: All assignments in functions are local.** + +### Modifying Globals + +If you must modify a global variable you must declare it as such. + +```python +name = 'Dave' +def spam(): + global name + name = 'Guido' # Changes the global name above +``` + +The global declaration must appear before its use. Having seen this, +know that it is considered poor form. In fact, try to avoid it entirely +if you can. If you need a function to modify some kind of state outside +of the function, it's better to use a class instead (more on this later). + +### Argument Passing + +When you call a function, the argument variables are names for passed values. +If mutable data types are passed (e.g. lists, dicts), they can be modified *in-place*. 
+ +```python +def foo(items): + items.append(42) # Modifies the input object + +a = [1, 2, 3] +foo(a) +print(a) # [1, 2, 3, 42] +``` + +**Key point: Functions don't receive a copy of the input arguments.** + +### Reassignment vs Modifying + +Make sure you understand the subtle difference between modifying a value and reassigning a variable name. + +```python +def foo(items): + items.append(42) # Modifies the input object + +a = [1, 2, 3] +foo(a) +print(a) # [1, 2, 3, 42] + +# VS +def bar(items): + items = [4,5,6] # Reassigns `items` variable + +b = [1, 2, 3] +bar(b) +print(b) # [1, 2, 3] +``` + +*Reminder: Variable assignment never overwrites memory. The name is simply bound to a new value.* + +## Exercises + +This exercise involves a lot of steps and putting concepts together from past exercises. +The final solution is only about 25 lines of code, but take your time and make sure you understand each part. + +A central part of your `report.py` program focuses on the reading of +CSV files. For example, the function `read_portfolio()` reads a file +containing rows of portfolio data and the function `read_prices()` +reads a file containing rows of price data. In both of those +functions, there are a lot of low-level "fiddly" bits and similar +features. For example, they both open a file and wrap it with the +`csv` module and they both convert various fields into new types. + +If you were doing a lot of file parsing for real, you’d probably want +to clean some of this up and make it more general purpose. That's +our goal. + +Start this exercise by creating a new file called `fileparse.py`. This is where we will be doing our work. + +### (a) Reading CSV Files + +To start, let’s just focus on the problem of reading a CSV file into a +list of dictionaries. 
In the file `fileparse.py`, define a simple +function that looks like this: + +```python +# fileparse.py +import csv + +def parse_csv(filename): + ''' + Parse a CSV file into a list of records + ''' + with open(filename) as f: + rows = csv.reader(f) + + # Read the file headers + headers = next(rows) + records = [] + for row in rows: + if not row: # Skip rows with no data + continue + record = dict(zip(headers, row)) + records.append(record) + + return records +``` + +This function reads a CSV file into a list of dictionaries while +hiding the details of opening the file, wrapping it with the `csv` +module, ignoring blank lines, and so forth. + +Try it out: + +Hint: `python3 -i fileparse.py`. + +```pycon +>>> portfolio = parse_csv('Data/portfolio.csv') +>>> portfolio +[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}] +>>> +``` + +This is great except that you can’t do any kind of useful calculation with the data because everything is represented as a string. +We’ll fix this shortly, but let’s keep building on it. + +### (b) Building a Column Selector + +In many cases, you’re only interested in selected columns from a CSV file, not all of the data. 
+Modify the `parse_csv()` function so that it optionally allows user-specified columns to be picked out as follows: + +```python +>>> # Read all of the data +>>> portfolio = parse_csv('Data/portfolio.csv') +>>> portfolio +[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}] + +>>> # Read some of the data +>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares']) +>>> shares_held +[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}] +>>> +``` + +An example of a column selector was given in Section 2.5. +However, here’s one way to do it: + +```python +# fileparse.py +import csv + +def parse_csv(filename, select=None): + ''' + Parse a CSV file into a list of records + ''' + with open(filename) as f: + rows = csv.reader(f) + + # Read the file headers + headers = next(rows) + + # If a column selector was given, find indices of the specified columns. + # Also narrow the set of headers used for resulting dictionaries + if select: + indices = [headers.index(colname) for colname in select] + headers = select + else: + indices = [] + + records = [] + for row in rows: + if not row: # Skip rows with no data + continue + # Filter the row if specific columns were selected + if indices: + row = [ row[index] for index in indices ] + + # Make a dictionary + record = dict(zip(headers, row)) + records.append(record) + + return records +``` + +There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices. 
+For example, suppose the input file had the following headers: + +```pycon +>>> headers = ['name', 'date', 'time', 'shares', 'price'] +>>> +``` + +Now, suppose the selected columns were as follows: + +```pycon +>>> select = ['name', 'shares'] +>>> +``` + +To perform the proper selection, you have to map the selected column names to column indices in the file. +That’s what this step is doing: + +```pycon +>>> indices = [headers.index(colname) for colname in select ] +>>> indices +[0, 3] +>>> +``` + +In other words, "name" is column 0 and "shares" is column 3. +When you read a row of data from the file, the indices are used to filter it: + +```pycon +>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ] +>>> row = [ row[index] for index in indices ] +>>> row +['AA', '100'] +>>> +``` + +### (c) Performing Type Conversion + +Modify the `parse_csv()` function so that it optionally allows type-conversions to be applied to the returned data. +For example: + +```pycon +>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float]) +>>> portfolio +[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}] + +>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int]) +>>> shares_held +[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}] +>>> +``` + +You already explored this in Exercise 2.7. You'll need to insert the +following fragment of code into your solution: + +```python +... +if types: + row = [func(val) for func, val in zip(types, row) ] +... 
+``` + +### (d) Working with Headers + +Some CSV files don’t include any header information. +For example, the file `prices.csv` looks like this: + +```csv +"AA",9.22 +"AXP",24.85 +"BA",44.85 +"BAC",11.27 +... +``` + +Modify the `parse_csv()` function so that it can work with such files by creating a list of tuples instead. +For example: + +```python +>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False) +>>> prices +[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)] +>>> +``` + +To make this change, you’ll need to modify the code so that the first +line of data isn’t interpreted as a header line. Also, you’ll need to +make sure you don’t create dictionaries as there are no longer any +column names to use for keys. + +### (e) Picking a different column delimiter + +Although CSV files are pretty common, it’s also possible that you could encounter a file that uses a different column separator such as a tab or space. +For example, the file `Data/portfolio.dat` looks like this: + +```csv +name shares price +"AA" 100 32.20 +"IBM" 50 91.10 +"CAT" 150 83.44 +"MSFT" 200 51.23 +"GE" 95 40.37 +"MSFT" 50 65.10 +"IBM" 100 70.44 +``` + +The `csv.reader()` function allows a different delimiter to be given as follows: + +```python +rows = csv.reader(f, delimiter=' ') +``` + +Modify your `parse_csv()` function so that it also allows the delimiter to be changed. 
+ +For example: + +```pycon +>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ') +>>> portfolio +[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}] +>>> +``` + +If you’ve made it this far, you’ve created a nice library function that’s genuinely useful. +You can use it to parse arbitrary CSV files, select out columns of +interest, and perform type conversions without having to worry too much +about the inner workings of files or the `csv` module. + +Nice! + +[Next](03_Error_checking) \ No newline at end of file diff --git a/Notes/03_Program_organization/03_Error_checking.md b/Notes/03_Program_organization/03_Error_checking.md new file mode 100644 index 0000000..4972c30 --- /dev/null +++ b/Notes/03_Program_organization/03_Error_checking.md @@ -0,0 +1,393 @@ +# 3.3 Error Checking + +This section discusses some aspects of error checking and exception handling. + +### How programs fail + +Python performs no checking or validation of function argument types or values. +A function will work on any data that is compatible with the statements in the function. + +```python +def add(x, y): + return x + y + +add(3, 4) # 7 +add('Hello', 'World') # 'HelloWorld' +add('3', '4') # '34' +``` + +If there are errors in a function, they will show up at run time (as an exception). + +```python +def add(x, y): + return x + y + +>>> add(3, '4') +Traceback (most recent call last): +... +TypeError: unsupported operand type(s) for +: +'int' and 'str' +>>> +``` + +To verify code, there is a strong emphasis on testing (covered later). + +### Exceptions + +Exceptions are used to signal errors. +To raise an exception yourself, use the `raise` statement. 
+ +```python +if name not in names: + raise RuntimeError('Name not found') +``` + +To catch an exception, use `try-except`. + +```python +try: + authenticate(username) +except RuntimeError as e: + print(e) +``` + +### Exception Handling + +Exceptions propagate to the first matching `except`. + +```python +def grok(): + ... + raise RuntimeError('Whoa!') # Exception raised here + +def spam(): + grok() # Call that will raise exception + +def bar(): + try: + spam() + except RuntimeError as e: # Exception caught here + ... + +def foo(): + try: + bar() + except RuntimeError as e: # Exception does NOT arrive here + ... + +foo() +``` + +To handle the exception, use the `except` block. You can add any statements you want to handle the error. + +```python +def grok(): + ... + raise RuntimeError('Whoa!') + +def bar(): + try: + grok() + except RuntimeError as e: # Exception caught here + statements # Use these statements + statements + ... + +bar() +``` + +After handling, execution resumes with the first statement after the `try-except`. + +```python +def grok(): + ... + raise RuntimeError('Whoa!') + +def bar(): + try: + grok() + except RuntimeError as e: # Exception caught here + statements + statements + ... + statements # Resumes execution here + statements # And continues here + ... + +bar() +``` + +### Built-in Exceptions + +There are dozens of built-in exceptions. +The following is not an exhaustive list. Check the documentation for more. + +```python +ArithmeticError +AssertionError +EnvironmentError +EOFError +ImportError +IndexError +KeyboardInterrupt +KeyError +MemoryError +NameError +ReferenceError +RuntimeError +SyntaxError +SystemError +TypeError +ValueError +``` + +### Exception Values + +Most exceptions have an associated value. It contains more information about what's wrong. + +```python +raise RuntimeError('Invalid user name') +``` + +This value is passed to the variable supplied in `except`. + +```python +try: + ... 
+except RuntimeError as e: # `e` holds the value raised + ... +``` + +The value is an instance of the exception type. However, it often looks like a string when +printed. + +```python +except RuntimeError as e: + print('Failed : Reason', e) +``` + +### Catching Multiple Errors + +You can catch different kinds of exceptions with multiple `except` blocks. + +```python +try: + ... +except LookupError as e: + ... +except RuntimeError as e: + ... +except IOError as e: + ... +except KeyboardInterrupt as e: + ... +``` + +Alternatively, if the block to handle them is the same, you can group them: + +```python +try: + ... +except (IOError,LookupError,RuntimeError) as e: + ... +``` + +### Catching All Errors + +To catch any exception, use `Exception` like this: + +```python +try: + ... +except Exception: + print('An error occurred') +``` + +In general, writing code like that is a bad idea because you'll have no idea +why it failed. + +### Wrong Way to Catch Errors + +Here is the wrong way to use exceptions. + +```python +try: + go_do_something() +except Exception: + print('Computer says no') +``` + +This swallows all possible errors. It may make it impossible to debug +when the code is failing for some reason you didn't expect at all +(e.g. uninstalled Python module, etc.). + +### Somewhat Better Approach + +This is a more sane approach. + +```python +try: + go_do_something() +except Exception as e: + print('Computer says no. Reason :', e) +``` + +It reports a specific reason for failure. It is almost always a good +idea to have some mechanism for viewing/reporting errors when you +write code that catches all possible exceptions. + +In general though, it's better to catch the error more narrowly. Only +catch the errors you can actually deal with. Let other errors pass to +other code. + +### Reraising an Exception + +Use `raise` to propagate a caught error. + +```python +try: + go_do_something() +except Exception as e: + print('Computer says no. 
Reason :', e) + raise +``` + +It allows you to take action (e.g. logging) and pass the error on to the caller. + +### Exception Best Practices + +Don't catch exceptions. Fail fast and loud. If it's important, someone +else will take care of the problem. Only catch an exception if you +are *that* someone. That is, only catch errors where you can recover +and sanely keep going. + +### `finally` statement + +It specifies code that must run regardless of whether or not an exception occurs. + +```python +lock = Lock() +... +lock.acquire() +try: + ... +finally: + lock.release() # this will ALWAYS be executed. With and without exception. +``` + +Commonly used to properly manage resources (especially locks, files, etc.). + +### `with` statement + +In modern code, `try-finally` is often replaced with the `with` statement. + +```python +lock = Lock() +with lock: + # lock acquired + ... +# lock released +``` + +A more familiar example: + +```python +with open(filename) as f: + # Use the file + ... +# File closed +``` + +It defines a usage *context* for a resource. When execution leaves that context, +resources are released. `with` only works with certain objects. + +## Exercises + +### (a) Raising exceptions + +The `parse_csv()` function you wrote in the last section allows +user-specified columns to be selected, but that only works if the +input data file has column headers. + +Modify the code so that an exception gets raised if both the `select` +and `has_headers=False` arguments are passed. +For example: + +```python +>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False) +Traceback (most recent call last): + File "<stdin>", line 1, in <module> + File "fileparse.py", line 9, in parse_csv + raise RuntimeError("select argument requires column headers") +RuntimeError: select argument requires column headers +>>> +``` + +Having added this one check, you might ask if you should be performing +other kinds of sanity checks in the function. 
For example, should you +check that the filename is a string, that types is a list, or anything +of that nature? + +As a general rule, it’s usually best to skip such tests and to just +let the program fail on bad inputs. The traceback message will point +at the source of the problem and can assist in debugging. + +The main reason for adding the above check is to avoid running the code +in a nonsensical mode (e.g., using a feature that requires column +headers, but simultaneously specifying that there are no headers). + +This indicates a programming error on the part of the calling code. + +### (b) Catching exceptions + +The `parse_csv()` function you wrote is used to process the entire +contents of a file. However, in the real world, it’s possible that +input files might have corrupted, missing, or dirty data. Try this +experiment: + +```python +>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float]) +Traceback (most recent call last): + File "<stdin>", line 1, in <module> + File "fileparse.py", line 36, in parse_csv + row = [func(val) for func, val in zip(types, row)] +ValueError: invalid literal for int() with base 10: '' +>>> +``` + +Modify the `parse_csv()` function to catch all `ValueError` exceptions +generated during record creation and print a warning message for rows +that can’t be converted. + +The message should include the row number and information about the reason why it failed. +To test your function, try reading the file `Data/missing.csv` above. 
+For example: + +```python +>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float]) +Row 4: Couldn't convert ['MSFT', '', '51.23'] +Row 4: Reason invalid literal for int() with base 10: '' +Row 7: Couldn't convert ['IBM', '', '70.44'] +Row 7: Reason invalid literal for int() with base 10: '' +>>> +>>> portfolio +[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}] +>>> +``` + +### (c) Silencing Errors + +Modify the `parse_csv()` function so that parsing error messages can be silenced if explicitly desired by the user. +For example: + +```python +>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True) +>>> portfolio +[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}] +>>> +``` + +Error handling is one of the most difficult things to get right in +most programs. As a general rule, you shouldn’t silently ignore +errors. Instead, it’s better to report problems and to give the user +an option to silence the error message if they choose to do so. + +[Next](04_Modules) diff --git a/Notes/03_Program_organization/04_Modules.md b/Notes/03_Program_organization/04_Modules.md new file mode 100644 index 0000000..7545c7c --- /dev/null +++ b/Notes/03_Program_organization/04_Modules.md @@ -0,0 +1,317 @@ +# 3.4 Modules + +This section introduces the concept of modules. + +### Modules and import + +Any Python source file is a module. + +```python +# foo.py +def grok(a): + ... +def spam(b): + ... +``` + +The `import` statement loads and *executes* a module. + +```python +# program.py +import foo + +a = foo.grok(2) +b = foo.spam('Hello') +... 
+``` + +### Namespaces + +A module is a collection of named values and is sometimes said to be a *namespace*. +The names are all of the global variables and functions defined in the source file. +After importing, the module name is used as a prefix. Hence the *namespace*. + +```python +import foo + +a = foo.grok(2) +b = foo.spam('Hello') +... +``` + +The module name is tied to the file name (foo -> foo.py). + +### Global Definitions + +Everything defined in the *global* scope is what populates the module +namespace. `foo` in our previous example. Consider two modules +that define the same variable `x`. + +```python +# foo.py +x = 42 +def grok(a): + ... +``` + +```python +# bar.py +x = 37 +def spam(a): + ... +``` + +In this case, the `x` definitions refer to different variables. One +is `foo.x` and the other is `bar.x`. Different modules can use the +same names and those names won't conflict with each other. + +**Modules are isolated.** + +### Modules as Environments + +Modules form an enclosing environment for all of the code defined inside. + +```python +# foo.py +x = 42 + +def grok(a): + print(x) +``` + +*Global* variables are always bound to the enclosing module (same file). +Each source file is its own little universe. + +### Module Execution + +When a module is imported, *all of the statements in the module +execute* one after another until the end of the file is reached. The +contents of the module namespace are all of the *global* names that +are still defined at the end of the execution process. If there are +scripting statements that carry out tasks in the global scope +(printing, creating files, etc.) you will see them run on import. + +### `import as` statement + +You can change the name of a module as you import it: + +```python +import math as m +def rectangular(r, theta): + x = r * m.cos(theta) + y = r * m.sin(theta) + return x, y +``` + +It works the same as a normal import. It just renames the module in that one file. 
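
To make the renaming point concrete, here is a minimal sketch using the standard `math` module. It checks that `import math as m` gives a second name for the very same module object rather than a copy.

```python
import math
import math as m

# Both names refer to the same module object
print(m is math)        # True

# The module itself still knows its real name
print(m.__name__)       # math

# Calls through either name give identical results
print(m.sqrt(16) == math.sqrt(16))   # True
```

The alias `m` only exists in the file that performed the import; other files that import `math` are unaffected.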
+ +### `from` module import + +This picks selected symbols out of a module and makes them available locally. + +```python +from math import sin, cos + +def rectangular(r, theta): + x = r * cos(theta) + y = r * sin(theta) + return x, y +``` + +It allows parts of a module to be used without having to type the module prefix. +Useful for frequently used names. + +### Comments on importing + +Variations on import do *not* change the way that modules work. + +```python +import math as m +# vs +from math import cos, sin +... +``` + +Specifically, `import` always executes the *entire* file and modules +are still isolated environments. + +The `import module as` statement is only manipulating the names. + +### Module Loading + +Each module loads and executes only *once*. +*Note: Repeated imports just return a reference to the previously loaded module.* + +`sys.modules` is a dict of all loaded modules. + +```python +>>> import sys +>>> sys.modules.keys() +['copy_reg', '__main__', 'site', '__builtin__', 'encodings', 'encodings.encodings', 'posixpath', ...] +>>> +``` + +### Locating Modules + +Python consults a path list (sys.path) when looking for modules. + +```python +>>> import sys +>>> sys.path +[ + '', + '/usr/local/lib/python36/python36.zip', + '/usr/local/lib/python36', + ... +] +``` + +Current working directory is usually first. + +### Module Search Path + +`sys.path` contains the search paths. + +You can manually adjust if you need to. + +```python +import sys +sys.path.append('/project/foo/pyfiles') +``` + +Paths are also added via environment variables. + +```python +% env PYTHONPATH=/project/foo/pyfiles python3 +Python 3.6.0 (default, Feb 3 2017, 05:53:21) +[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] +>>> import sys +>>> sys.path +['','/project/foo/pyfiles', ...] +``` + +## Exercises + +For this exercise involving modules, it is critically important to +make sure you are running Python in a proper environment. 
Modules +are usually where programmers encounter problems with the current working +directory or with Python's path settings. + +### (a) Module imports + +In section 3, we created a general purpose function `parse_csv()` for parsing the contents of CSV datafiles. + +Now, we’re going to see how to use that function in other programs. +First, start in a new shell window. Navigate to the folder where you +have all your files. We are going to import them. + +Start Python interactive mode. + +```shell +bash % python3 +Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) +[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin +Type "help", "copyright", "credits" or "license" for more information. +>>> +``` + +Once you’ve done that, try importing some of the programs you +previously wrote. You should see their output exactly as before. +Just to emphasize, importing a module runs its code. + +```python +>>> import bounce +... watch output ... +>>> import mortgage +... watch output ... +>>> import report +... watch output ... +>>> +``` + +If none of this works, you’re probably running Python in the wrong directory. +Now, try importing your `fileparse` module and getting some help on it. + +```python +>>> import fileparse +>>> help(fileparse) +... look at the output ... +>>> dir(fileparse) +... look at the output ... +>>> +``` + +Try using the module to read some data: + +```python +>>> portfolio = fileparse.parse_csv('Data/portfolio.csv',select=['name','shares','price'], types=[str,int,float]) +>>> portfolio +... look at the output ... +>>> pricelist = fileparse.parse_csv('Data/prices.csv',types=[str,float], has_headers=False) +>>> pricelist +... look at the output ... +>>> prices = dict(pricelist) +>>> prices +... look at the output ... 
+>>> prices['IBM'] +106.11 +>>> +``` + +Try importing a function so that you don’t need to include the module name: + +```python +>>> from fileparse import parse_csv +>>> portfolio = parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float]) +>>> portfolio +... look at the output ... +>>> +``` + +### (b) Using your library module + +In section 2, you wrote a program `report.py` that produced a stock report like this: + +```shell + Name Shares Price Change + ---------- ---------- ---------- ---------- + AA 100 39.91 7.71 + IBM 50 106.11 15.01 + CAT 150 78.58 -4.86 + MSFT 200 30.47 -20.76 + GE 95 37.38 -2.99 + MSFT 50 30.47 -34.63 + IBM 100 106.11 35.67 +``` + +Take that program and modify it so that all of the input file +processing is done using functions in your `fileparse` module. To do +that, import `fileparse` as a module and change the `read_portfolio()` +and `read_prices()` functions to use the `parse_csv()` function. + +Use the interactive example at the start of this exercise as a guide. +Afterwards, you should get exactly the same output as before. + +### (c) Using more library imports + +In section 1, you wrote a program `pcost.py` that read a portfolio and computed its cost. + +```python +>>> import pcost +>>> pcost.portfolio_cost('Data/portfolio.csv') +44671.15 +>>> +``` + +Modify the `pcost.py` file so that it uses the `report.read_portfolio()` function. + +### Commentary + +When you are done with this exercise, you should have three +programs. `fileparse.py` which contains a general purpose +`parse_csv()` function. `report.py` which produces a nice report, but +also contains `read_portfolio()` and `read_prices()` functions. And +finally, `pcost.py` which computes the portfolio cost, but makes use +of the code written for the `report.py` program. 
+ +[Next](05_Main_module) \ No newline at end of file diff --git a/Notes/03_Program_organization/05_Main_module.md b/Notes/03_Program_organization/05_Main_module.md new file mode 100644 index 0000000..ecfbe4a --- /dev/null +++ b/Notes/03_Program_organization/05_Main_module.md @@ -0,0 +1,299 @@ +# 3.5 Main Module + +This section introduces the concept of a main program or main module. + +### Main Functions + +In many programming languages, there is a concept of a *main* function or method. + +```c +// c / c++ +int main(int argc, char *argv[]) { + ... +} +``` + +```java +// java +class myprog { + public static void main(String args[]) { + ... + } +} +``` + +This is the first function that executes when an application is launched. + +### Python Main Module + +Python has no *main* function or method. Instead, there is a *main* +module. The *main module* is the source file that runs first. + +```bash +bash % python3 prog.py +... +``` + +Whatever module you give to the interpreter at startup becomes *main*. The name doesn't matter. + +### `__main__` check + +It is standard practice for modules that can run as a main script to use this convention: + +```python +# prog.py +... +if __name__ == '__main__': + # Running as the main program ... + statements + ... +``` + +Statements enclosed inside the `if` statement become the *main* program. + +### Main programs vs. library imports + +Any file can either run as main or as a library import: + +```bash +bash % python3 prog.py # Running as main +``` + +```python +import prog +``` + +In both cases, `__name__` is the name of the module. However, it will only be set to `__main__` if +running as main. + +As a general rule, you don't want statements that are part of the main +program to execute on a library import. So, it's common to have an `if-`check in code +that might be used either way. + +```python +if __name__ == '__main__': + # Does not execute if loaded with import ... 
+``` + +### Program Template + +Here is a common program template for writing a Python program: + +```python +# prog.py +# Import statements (libraries) +import modules + +# Functions +def spam(): + ... + +def blah(): + ... + +# Main function +def main(): + ... + +if __name__ == '__main__': + main() +``` + +### Command Line Tools + +Python is often used for command-line tools. + +```bash +bash % python3 report.py portfolio.csv prices.csv +``` + +It means that the scripts are executed from the shell / +terminal. Common use cases are for automation, background tasks, etc. + +### Command Line Args + +The command line is a list of text strings. + +```bash +bash % python3 report.py portfolio.csv prices.csv +``` + +This list of text strings is found in `sys.argv`. + +```python +# In the previous bash command +sys.argv # ['report.py', 'portfolio.csv', 'prices.csv'] +``` + +Here is a simple example of processing the arguments: + +```python +import sys + +if len(sys.argv) != 3: + raise SystemExit(f'Usage: {sys.argv[0]} ' 'portfile pricefile') +portfile = sys.argv[1] +pricefile = sys.argv[2] +... +``` + +### Standard I/O + +Standard Input / Output (or stdio) are files that work the same as normal files. + +```python +sys.stdout +sys.stderr +sys.stdin +``` + +By default, print is directed to `sys.stdout`. Input is read from +`sys.stdin`. Tracebacks and errors are directed to `sys.stderr`. + +Be aware that *stdio* could be connected to terminals, files, pipes, etc. + +```bash +bash % python3 prog.py > results.txt +# or +bash % cmd1 | python3 prog.py | cmd2 +``` + +### Environment Variables + +Environment variables are set in the shell. + +```bash +bash % export NAME=dave +bash % export RSH=ssh +bash % python3 prog.py +``` + +`os.environ` is a dictionary that contains these values. + +```python +import os + +name = os.environ['NAME'] # 'dave' +``` + +Changes made to `os.environ` are reflected in any subprocesses later launched by the program. 
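As a rough illustration of that last point, the sketch below (not from the original notes) sets a variable in `os.environ` and then reads it back from a child Python process. The variable names `NAME_FOR_DEMO` and `NO_SUCH_VARIABLE_FOR_DEMO` are made up for this example, and `capture_output` assumes Python 3.7 or newer.

```python
import os
import subprocess
import sys

# NAME_FOR_DEMO is a hypothetical variable used only for this illustration
os.environ['NAME_FOR_DEMO'] = 'dave'

# .get() returns a fallback instead of raising KeyError for unset variables
missing = os.environ.get('NO_SUCH_VARIABLE_FOR_DEMO', 'default')

# A child process launched afterwards inherits the modified environment
result = subprocess.run(
    [sys.executable, '-c', "import os; print(os.environ['NAME_FOR_DEMO'])"],
    capture_output=True, text=True, check=True)
child_value = result.stdout.strip()

print(child_value)
print(missing)
```

Note that the change only flows downward to child processes. The shell that launched the program does not see it.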
+ +### Program Exit + +Program exit is handled through exceptions. + +```python +raise SystemExit +raise SystemExit(exitcode) +raise SystemExit('Informative message') +``` + +An alternative is `sys.exit()`. + +```python +import sys +sys.exit(exitcode) +``` + +A non-zero exit code indicates an error. + +### The `#!` line + +On Unix, the `#!` line can launch a script as Python. +Add the following as the first line of your script file. + +```python +#!/usr/bin/env python3 +# prog.py +... +``` + +It requires the executable permission. + +```bash +bash % chmod +x prog.py +# Then you can execute +bash % ./prog.py +... output ... +``` + +*Note: The Python Launcher on Windows also looks for the `#!` line to indicate language version.* + +### Script Template + +Here is a common code template for Python programs that run as command-line scripts: + +```python +#!/usr/bin/env python3 +# prog.py + +# Import statements (libraries) +import modules + +# Functions +def spam(): + ... + +def blah(): + ... + +# Main function +def main(argv): + # Parse command line args, environment, etc. + ... + +if __name__ == '__main__': + import sys + main(sys.argv) +``` + +## Exercises + +### (a) `main()` functions + +In the file `report.py` add a `main()` function that accepts a list of command line options and produces the same output as before. 
+You should be able to run it interactively like this: + +```python +>>> import report +>>> report.main(['report.py', 'Data/portfolio.csv', 'Data/prices.csv']) + Name Shares Price Change +---------- ---------- ---------- ---------- + AA 100 39.91 7.71 + IBM 50 106.11 15.01 + CAT 150 78.58 -4.86 + MSFT 200 30.47 -20.76 + GE 95 37.38 -2.99 + MSFT 50 30.47 -34.63 + IBM 100 106.11 35.67 +>>> +``` + +Modify the `pcost.py` file so that it has a similar `main()` function: + +```python +>>> import pcost +>>> pcost.main(['pcost.py', 'Data/portfolio.csv']) +Total cost: 44671.15 +>>> +``` + +### (b) Making Scripts + +Modify the `report.py` and `pcost.py` programs so that they can execute as a script on the command line: + +```bash +bash $ python3 report.py Data/portfolio.csv Data/prices.csv + Name Shares Price Change +---------- ---------- ---------- ---------- + AA 100 39.91 7.71 + IBM 50 106.11 15.01 + CAT 150 78.58 -4.86 + MSFT 200 30.47 -20.76 + GE 95 37.38 -2.99 + MSFT 50 30.47 -34.63 + IBM 100 106.11 35.67 + +bash $ python3 pcost.py Data/portfolio.csv +Total cost: 44671.15 +``` diff --git a/Notes/03_Program_organization/06_Design_discussion.md b/Notes/03_Program_organization/06_Design_discussion.md new file mode 100644 index 0000000..edbefef --- /dev/null +++ b/Notes/03_Program_organization/06_Design_discussion.md @@ -0,0 +1,132 @@ +# 3.6 Design Discussion + +In this section we consider some design decisions made in code so far. + +### Filenames versus Iterables + +Compare these two programs that return the same output. + +```python +# Provide a filename +def read_data(filename): + records = [] + with open(filename) as f: + for line in f: + ... + records.append(r) + return records + +d = read_data('file.csv') +``` + +```python +# Provide lines +def read_data(lines): + records = [] + for line in lines: + ... + records.append(r) + return records + +with open('file.csv') as f: + d = read_data(f) +``` + +* Which of these functions do you prefer? Why? 
+* Which of these functions is more flexible? + + +### Deep Idea: "Duck Typing" + +[Duck Typing](https://en.wikipedia.org/wiki/Duck_typing) is a computer programming concept to determine whether an object can be used for a particular purpose. It is an application of the [duck test](https://en.wikipedia.org/wiki/Duck_test). + +> If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. + +In our previous example that reads the lines, our `read_data` expects +any iterable object, not just the lines of a file. + +```python +def read_data(lines): + records = [] + for line in lines: + ... + records.append(r) + return records +``` + +This means that we can use it with other *lines*. + +```python +# A CSV file +lines = open('data.csv') +data = read_data(lines) + +# A zipped file +lines = gzip.open('data.csv.gz','rt') +data = read_data(lines) + +# The Standard Input +lines = sys.stdin +data = read_data(lines) + +# A list of strings +lines = ['ACME,50,91.1','IBM,75,123.45', ... ] +data = read_data(lines) +``` + +There is considerable flexibility with this design. + +*Question: Shall we embrace or fight this flexibility?* + +### Library Design Best Practices + +Code libraries are often better served by embracing flexibility. +Don't restrict your options. With great flexibility comes great power. + +## Exercise + +### (a) From filenames to file-like objects + +In this section, you worked on a file `fileparse.py` that contained a +function `parse_csv()`. The function worked like this: + +```pycon +>>> import fileparse +>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float]) +>>> +``` + +Right now, the function expects to be passed a filename. However, you +can make the code more flexible. Modify the function so that it works +with any file-like/iterable object. For example: + +``` +>>> import fileparse +>>> import gzip +>>> with gzip.open('Data/portfolio.csv.gz', 'rt') as f: +... 
port = fileparse.parse_csv(f, types=[str,int,float]) +... +>>> lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1', 'HPE,75,45.1'] +>>> port = fileparse.parse_csv(lines, types=[str,int,float]) +>>> +``` + +In this new code, what happens if you pass a filename as before? + +``` +>>> port = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float]) +>>> port +... look at output (it should be crazy) ... +>>> +``` + +With flexibility comes power and with power comes responsibility. Sometimes you'll +need to be careful. + +### (b) Fixing existing functions + +Fix the `read_portfolio()` and `read_prices()` functions in the +`report.py` file so that they work with the modified version of +`parse_csv()`. This should only involve a minor modification. +Afterwards, your `report.py` and `pcost.py` programs should work +the same way they always did. diff --git a/Notes/04_Classes_objects/00_Overview.md b/Notes/04_Classes_objects/00_Overview.md new file mode 100644 index 0000000..5485eee --- /dev/null +++ b/Notes/04_Classes_objects/00_Overview.md @@ -0,0 +1,35 @@ +# Overview + +## Object Oriented (OO) programming + +A programming technique where code is organized as a collection of *objects*. + +An *object* consists of: + +* Data. Attributes +* Behavior. Methods, functions applied to the object. + +You have already been using some OO during this course. + +For example, with lists. + +```python +>>> nums = [1, 2, 3] +>>> nums.append(4) # Method +>>> nums.insert(1,10) # Method +>>> nums +[1, 10, 2, 3, 4] # Data +>>> +``` + +`nums` is an *instance* of a list. + +Methods (`append` and `insert`) are attached to the instance (`nums`). + +## Summary + +This will be a high-level overview of classes. + +Most code involving classes will involve the topics covered in this section. + +If you're merely using existing libraries, the code is typically fairly simple. 
diff --git a/Notes/04_Classes_objects/01_Class.md b/Notes/04_Classes_objects/01_Class.md new file mode 100644 index 0000000..97e7222 --- /dev/null +++ b/Notes/04_Classes_objects/01_Class.md @@ -0,0 +1,253 @@ +# 4.1 Classes + +### The `class` statement + +Use the `class` statement to define a new type of object. + +```python +class Player(object): + def __init__(self, x, y): + self.x = x + self.y = y + self.health = 100 + + def move(self, dx, dy): + self.x += dx + self.y += dy + + def damage(self, pts): + self.health -= pts +``` + +In a nutshell, a class is a set of functions that carry out various operations on so-called *instances*. + +### Instances + +Instances are the actual *objects* that you manipulate in your program. + +They are created by calling the class as a function. + +```python +>>> a = Player(2, 3) +>>> b = Player(10, 20) +>>> +``` + +`a` and `b` are instances of `Player`. + +*Emphasize: The class statement is just the definition (it does nothing by itself). Similar to a function definition.* + +### Instance Data + +Each instance has its own local data. + +```python +>>> a.x +2 +>>> b.x +10 +``` + +This data is initialized by the `__init__()` method. + +```python +class Player(object): + def __init__(self, x, y): + # Any value stored on `self` is instance data + self.x = x + self.y = y + self.health = 100 +``` + +There are no restrictions on the total number or type of attributes stored. + +### Instance Methods + +Instance methods are functions applied to instances of an object. + +```python +class Player(object): + ... + # `move` is a method + def move(self, dx, dy): + self.x += dx + self.y += dy +``` + +The object itself is always passed as the first argument. + +```python +>>> a.move(1, 2) + +# matches `a` to `self` +# matches `1` to `dx` +# matches `2` to `dy` +def move(self, dx, dy): +``` + +By convention, the instance is called `self`. However, the actual name +used is unimportant. The object is always passed as the first +argument. 
It is simply Python programming style to call this argument +`self`. + +### Class Scoping + +Classes do not define a scope. + +```python +class Player(object): + ... + def move(self, dx, dy): + self.x += dx + self.y += dy + + def left(self, amt): + move(-amt, 0) # NO. Calls a global `move` function + self.move(-amt, 0) # YES. Calls method `move` from above. +``` + +If you want to operate on an instance, you always have to refer to it explicitly (e.g., `self`). + +## Exercises + +### (a) Objects as Data Structures + +In sections 2 and 3, we worked with data represented as tuples and dictionaries. +For example, a holding of stock could be represented as a tuple like this: + +```python +s = ('GOOG',100,490.10) +``` + +or as a dictionary like this: + +```python +s = { 'name' : 'GOOG', + 'shares' : 100, + 'price' : 490.10 +} +``` + +You can even write functions for manipulating such data. For example: + +```python +def cost(s): + return s['shares'] * s['price'] +``` + +However, as your program gets large, you might want to create a better sense of organization. +Thus, another approach for representing data would be to define a class. + +Create a file called `stock.py` and define a class `Stock` that represents a single holding of stock. +Have the instances of `Stock` have `name`, `shares`, and `price` attributes. + +```python +>>> import stock +>>> s = stock.Stock('GOOG',100,490.10) +>>> s.name +'GOOG' +>>> s.shares +100 +>>> s.price +490.1 +>>> +``` + +Create a few more `Stock` objects and manipulate them. For example: + +```python +>>> a = stock.Stock('AAPL',50,122.34) +>>> b = stock.Stock('IBM',75,91.75) +>>> a.shares * a.price +6117.0 +>>> b.shares * b.price +6881.25 +>>> stocks = [a,b,s] +>>> stocks +[<stock.Stock object at 0x...>, <stock.Stock object at 0x...>, <stock.Stock object at 0x...>] +>>> for t in stocks: + print(f'{t.name:>10s} {t.shares:>10d} {t.price:>10.2f}') + +... look at the output ... 
+>>> +``` + +One thing to emphasize here is that the class `Stock` acts like a factory for creating instances of objects. 
+Basically, you just call it as a function and it creates a new object for you. + +Also, it needs to be emphasized that each object is distinct. Each one +has its own data that is separate from other objects that have +been created. An object defined by a class is somewhat similar to a +dictionary, just with somewhat different syntax. +For example, instead of writing `s['name']` or `s['price']`, you now +write `s.name` and `s.price`. + +### (b) Reading Data into a List of Objects + +In your `stock.py` program, write a function +`read_portfolio(filename)` that reads portfolio data from a file into +a list of `Stock` objects. This function is going to mimic the +behavior of earlier code you have written. Here’s how your function +will behave: + +```python +>>> import stock +>>> portfolio = stock.read_portfolio('Data/portfolio.csv') +>>> portfolio +[<stock.Stock object at 0x...>, <stock.Stock object at 0x...>, <stock.Stock object at 0x...>, + <stock.Stock object at 0x...>, <stock.Stock object at 0x...>, <stock.Stock object at 0x...>, + <stock.Stock object at 0x...>] +>>> +``` + +It is important to emphasize that `read_portfolio()` is a top-level function, not a method of the `Stock` class. +This function is merely creating a list of `Stock` objects; it’s not an operation on an individual `Stock` instance. + +Try performing some calculations with the above data. First, try printing a formatted table: + +```python +>>> for s in portfolio: + print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}') + +... look at the output ... +>>> +``` + +Try a list comprehension: + +```python +>>> more100 = [s for s in portfolio if s.shares > 100] +>>> for s in more100: + print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}') + +... look at the output ... +>>> +``` + +Again, notice the similarity between `Stock` objects and dictionaries. They’re basically the same idea, but the syntax for accessing values differs. 
+ +### (c) Adding some Methods + +With classes, you can attach functions to your objects. These are +known as methods and are functions that operate on the data stored +inside an object. + +Add a `cost()` and `sell()` method to your `Stock` object. 
They should +work like this: + +```python +>>> import stock +>>> s = stock.Stock('GOOG',100,490.10) +>>> s.cost() +49010.0 +>>> s.shares +100 +>>> s.sell(25) +>>> s.shares +75 +>>> s.cost() +36757.5 +>>> +``` + +[Next](02_Inheritance) \ No newline at end of file diff --git a/Notes/04_Classes_objects/02_Inheritance.md b/Notes/04_Classes_objects/02_Inheritance.md new file mode 100644 index 0000000..911e5e3 --- /dev/null +++ b/Notes/04_Classes_objects/02_Inheritance.md @@ -0,0 +1,502 @@ +# 4.2 Inheritance + +Inheritance is a commonly used tool for writing extensible programs. This section explores that idea. + +### Introduction + +Inheritance is used to specialize existing objects: + +```python +class Parent: + ... + +class Child(Parent): # Note that `Parent` appears inside the parentheses + ... +``` + +The new class `Child` is called a derived class or subclass. +The `Parent` class is known as a base class or superclass. +`Parent` is specified in `()` after the class name, `class Child(Parent):`. + +### Extending + +With inheritance, you are taking an existing class and: + +* Adding new methods +* Redefining some of the existing methods +* Adding new attributes to instances + +In the end you are **extending existing code**. + +### Example + +Suppose that this is your starting class: + +```python +class Stock(object): + def __init__(self, name, shares, price): + self.name = name + self.shares = shares + self.price = price + + def cost(self): + return self.shares * self.price + + def sell(self, nshares): + self.shares -= nshares +``` + +You can change any part of this via inheritance. + +### Add a new method + +```python +class MyStock(Stock): + def panic(self): + self.sell(self.shares) +``` + +Usage example. 
+ +```python +>>> s = MyStock('GOOG', 100, 490.1) +>>> s.sell(25) +>>> s.shares +75 +>>> s.panic() +>>> s.shares +0 +>>> +``` + +### Redefining an existing method + +```python +class MyStock(Stock): + def cost(self): + return 1.25 * self.shares * self.price +``` + +Usage example. + +```python +>>> s = MyStock('GOOG', 100, 490.1) +>>> s.cost() +61262.5 +>>> +``` + +The new method takes the place of the old one. The other methods are unaffected. + +### Overriding + +Sometimes a class redefines an existing method, but still wants to use the original implementation inside the new one. +For this, use `super()`: + +```python +class Stock(object): + ... + def cost(self): + return self.shares * self.price + ... + +class MyStock(Stock): + def cost(self): + # Check the call to `super` + actual_cost = super().cost() + return 1.25 * actual_cost +``` + +Use `super()` to call the previous version. + +*Caution: Python 2 is different.* + +```python +actual_cost = super(MyStock, self).cost() +``` + +### `__init__` and inheritance + +If `__init__` is redefined, it is essential to initialize the parent. + +```python +class Stock(object): + def __init__(self, name, shares, price): + self.name = name + self.shares = shares + self.price = price + +class MyStock(Stock): + def __init__(self, name, shares, price, factor): + # Check the call to `super` and `__init__` + super().__init__(name, shares, price) + self.factor = factor + + def cost(self): + return self.factor * super().cost() +``` + +You should call the `__init__()` method on `super()`, which is the way to call the previous version, as shown previously. + +### Using Inheritance + +Inheritance is sometimes used to organize related objects. + +```python +class Shape(object): + ... + +class Circle(Shape): + ... + +class Rectangle(Shape): + ... +``` + +Think of a logical hierarchy or taxonomy. However, a more common usage is +related to making reusable or extensible code: + +```python +class CustomHandler(TCPHandler): + def handle_request(self): + ... 
+ # Custom processing +``` + +The base class contains some general purpose code. +Your class inherits and customizes specific parts. Maybe it plugs into a framework. + +### "is a" relationship + +Inheritance establishes a type relationship. + +```python +class Shape(object): + ... + +class Circle(Shape): + ... +``` + +Check the relationship with `isinstance()`. + +```python +>>> c = Circle(4.0) +>>> isinstance(c, Shape) +True +>>> +``` + +*Important: Code that works with the parent is also supposed to work with the child.* + +### `object` base class + +If a class has no parent, you sometimes see `object` used as the base. + +```python +class Shape(object): + ... +``` + +`object` is the parent of all objects in Python. + +*Note: it's not technically required in Python 3. If omitted in Python 2, it results in an "old style class" which should be avoided.* + +### Multiple Inheritance + +You can inherit from multiple classes by specifying them in the definition of the class. + +```python +class Mother(object): + ... + +class Father(object): + ... + +class Child(Mother, Father): + ... +``` + +The class `Child` inherits features from both parents. There are some rather tricky details. Don't do it unless you know what you are doing. +We're not going to explore multiple inheritance further in this course. + +## Exercises + +### (a) Print Portfolio + +A major use of inheritance is in writing code that’s meant to be extended or customized in various ways, especially in libraries or frameworks. +To illustrate, start by adding the following function to your `stock.py` program: + +```python +# stock.py +... +def print_portfolio(portfolio): + ''' + Make a nicely formatted table showing portfolio contents. 
+ ''' + headers = ('Name','Shares','Price') + for h in headers: + print(f'{h:>10s}',end=' ') + print() + print(('-'*10 + ' ')*len(headers)) + for s in portfolio: + print(f'{s.name:>10s} {s.shares:>10d} {s.price:>10.2f}') +``` + +Add a little testing section to the bottom of your `stock.py` file that runs the above function: + +```python +if __name__ == '__main__': + portfolio = read_portfolio('Data/portfolio.csv') + print_portfolio(portfolio) +``` + +When you run your `stock.py`, you should get this output: + +```bash + Name Shares Price + ---------- ---------- ---------- + AA 100 32.20 + IBM 50 91.10 + CAT 150 83.44 + MSFT 200 51.23 + GE 95 40.37 + MSFT 50 65.10 + IBM 100 70.44 +``` + +### (b) An Extensibility Problem + +Suppose that you wanted to modify the `print_portfolio()` function to +support a variety of different output formats such as plain-text, +HTML, CSV, or XML. To do this, you could try to write one gigantic +function that did everything. However, doing so would likely lead to +an unmaintainable mess. Instead, this is a perfect opportunity to use +inheritance. + +To start, focus on the steps that are involved in creating a +table. At the top of the table is a set of table headers. After that, +rows of table data appear. Let’s take those steps and put them into their own class. + +Create a file called `tableformat.py` and define the following class: + +```python +# tableformat.py + +class TableFormatter(object): + def headings(self, headers): + ''' + Emit the table headings. + ''' + raise NotImplementedError() + + def row(self, rowdata): + ''' + Emit a single row of table data. + ''' + raise NotImplementedError() +``` + +This class does nothing, but it serves as a kind of design specification for additional classes that will be defined shortly. + +Modify the `print_portfolio()` function so that it accepts a `TableFormatter` object as input and invokes methods on it to produce the output. 
+For example, like this: + +```python +# stock.py +... +def print_portfolio(portfolio, formatter): + ''' + Make a nicely formatted table showing portfolio contents. + ''' + formatter.headings(['Name', 'Shares', 'Price']) + for s in portfolio: + # Form a row of output data (as strings) + rowdata = [s.name, str(s.shares), f'{s.price:0.2f}' ] + formatter.row(rowdata) +``` + +Finally, try your new class by modifying the main program like this: + +```python +# stock.py +... +if __name__ == '__main__': + from tableformat import TableFormatter + portfolio = read_portfolio('Data/portfolio.csv') + formatter = TableFormatter() + print_portfolio(portfolio, formatter) +``` + +When you run this new code, your program will immediately crash with a `NotImplementedError` exception. +That’s not too exciting, but continue to the next part. + +### (c) Using Inheritance to Produce Different Output + +The `TableFormatter` class you defined in part (b) is meant to be extended via inheritance. +In fact, that’s the whole idea. To illustrate, define a class `TextTableFormatter` like this: + +```python +# tableformat.py +... +class TextTableFormatter(TableFormatter): + ''' + Emit a table in plain-text format + ''' + def headings(self, headers): + for h in headers: + print(f'{h:>10s}', end=' ') + print() + print(('-'*10 + ' ')*len(headers)) + + def row(self, rowdata): + for d in rowdata: + print(f'{d:>10s}', end=' ') + print() +``` + +Modify your main program in `stock.py` like this and try it: + +```python +# stock.py +... 
+if __name__ == '__main__': + from tableformat import TextTableFormatter + portfolio = read_portfolio('Data/portfolio.csv') + formatter = TextTableFormatter() + print_portfolio(portfolio, formatter) +``` + +This should produce the same output as before: + +```bash + Name Shares Price + ---------- ---------- ---------- + AA 100 32.20 + IBM 50 91.10 + CAT 150 83.44 + MSFT 200 51.23 + GE 95 40.37 + MSFT 50 65.10 + IBM 100 70.44 +``` + +However, let’s change the output to something else. Define a new class `CSVTableFormatter` that produces output in CSV format: + +```python +# tableformat.py +... +class CSVTableFormatter(TableFormatter): + ''' + Output portfolio data in CSV format. + ''' + def headings(self, headers): + print(','.join(headers)) + + def row(self, rowdata): + print(','.join(rowdata)) +``` + +Modify your main program as follows: + +```python +# stock.py +... +if __name__ == '__main__': + from tableformat import CSVTableFormatter + portfolio = read_portfolio('Data/portfolio.csv') + formatter = CSVTableFormatter() + print_portfolio(portfolio, formatter) +``` + +You should now see CSV output like this: + +```csv +Name,Shares,Price +AA,100,32.20 +IBM,50,91.10 +CAT,150,83.44 +MSFT,200,51.23 +GE,95,40.37 +MSFT,50,65.10 +IBM,100,70.44 +``` + +Using a similar idea, define a class `HTMLTableFormatter` that produces a table with the following output: + +```html +<tr><th>Name</th><th>Shares</th><th>Price</th></tr> +<tr><td>AA</td><td>100</td><td>32.20</td></tr> +<tr><td>IBM</td><td>50</td><td>91.10</td></tr> +``` + +Test your code by modifying the main program to create an `HTMLTableFormatter` object instead of a `CSVTableFormatter` object. + +### (d) Polymorphism in Action + +A major feature of object-oriented programming is that you can plug an +object into a program and it will work without having to change any of +the existing code. 
For example, if you wrote a program that expected +to use a `TableFormatter` object, it would work no matter what kind of +`TableFormatter` you actually gave it. + +This behavior is sometimes referred to as *polymorphism*. 
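The idea can be seen in miniature with the following sketch. It is not the exercise solution: the classes `Formatter`, `DashFormatter`, and `PipeFormatter` and the function `render()` are hypothetical names made up for this illustration, and they return strings instead of printing.

```python
class Formatter:
    def row(self, rowdata):
        raise NotImplementedError()

class DashFormatter(Formatter):
    def row(self, rowdata):
        return '-'.join(rowdata)

class PipeFormatter(Formatter):
    def row(self, rowdata):
        return '|'.join(rowdata)

def render(rows, formatter):
    # Works with any object that provides a row() method
    return [formatter.row(r) for r in rows]

rows = [['AA', '100'], ['IBM', '50']]
dashed = render(rows, DashFormatter())
piped = render(rows, PipeFormatter())
print(dashed)
print(piped)
```

The `render()` function never checks which kind of formatter it was given. Swapping in a different subclass changes the output without touching the calling code.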
+ +One potential problem is how to make it easy for the user to pick the formatter that they want. +This can sometimes be fixed by defining a helper function. + +In the `tableformat.py` file, add a function `create_formatter(name)` +that allows a user to create a formatter given an output name such as +`'txt'`, `'csv'`, or `'html'`. + +For example: + +```python +# stock.py +... +if __name__ == '__main__': + from tableformat import create_formatter + portfolio = read_portfolio('Data/portfolio.csv') + formatter = create_formatter('csv') + print_portfolio(portfolio, formatter) +``` + +When you run this program, you’ll see output such as this: + +```csv +Name,Shares,Price +AA,100,32.20 +IBM,50,91.10 +CAT,150,83.44 +MSFT,200,51.23 +GE,95,40.37 +MSFT,50,65.10 +IBM,100,70.44 +``` + +Try changing the format to `'txt'` and `'html'` just to make sure your +code is working correctly. If the user provides a bad output format +to the `create_formatter()` function, have it raise a `RuntimeError` +exception. For example: + +```python +>>> from tableformat import create_formatter +>>> formatter = create_formatter('xls') +Traceback (most recent call last): + File "<stdin>", line 1, in <module> + File "tableformat.py", line 68, in create_formatter + raise RuntimeError('Unknown table format %s' % name) +RuntimeError: Unknown table format xls +>>> +``` + +Writing extensible code is one of the most common uses of inheritance in libraries and frameworks. +For example, a framework might instruct you to define your own object that inherits from a provided base class. +You’re then told to fill in various methods that implement various bits of functionality. +That said, designing object-oriented programs can be extremely +difficult. For more information, you should probably look for books on +the topic of design patterns. + +Even so, understanding what happened in this exercise will take you +pretty far in terms of using most library modules and knowing +what inheritance is good for (extensibility). 
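
As one illustration of that extensibility, here is a minimal sketch of how a helper like `create_formatter()` might be written. It is only one possible approach, not the reference solution: the `_FORMATTERS` dictionary is an assumed name, and the formatter classes are empty stand-ins so the example runs on its own.

```python
# Sketch only: empty stand-ins for the classes defined in tableformat.py.
class TableFormatter:
    pass

class TextTableFormatter(TableFormatter):
    pass

class CSVTableFormatter(TableFormatter):
    pass

class HTMLTableFormatter(TableFormatter):
    pass


# Hypothetical lookup table mapping output names to formatter classes.
_FORMATTERS = {
    'txt': TextTableFormatter,
    'csv': CSVTableFormatter,
    'html': HTMLTableFormatter,
}


def create_formatter(name):
    # Look up the class by name and report unknown formats explicitly.
    try:
        formatter_cls = _FORMATTERS[name]
    except KeyError:
        raise RuntimeError('Unknown table format %s' % name) from None
    return formatter_cls()


formatter = create_formatter('csv')
print(type(formatter).__name__)
```

With a table like this, supporting a new format means adding one subclass and one dictionary entry. A chain of `if`/`elif` tests would work just as well for three formats.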
+ +[Next](03_Special_methods) diff --git a/Notes/04_Classes_objects/03_Special_methods.md b/Notes/04_Classes_objects/03_Special_methods.md new file mode 100644 index 0000000..cdbc7fe --- /dev/null +++ b/Notes/04_Classes_objects/03_Special_methods.md @@ -0,0 +1,332 @@ +# 4.3 Special Methods + +Various parts of Python's behavior can be customized via special or magic methods. +This section introduces that idea. + +### Introduction + +Classes may define special methods. These have special meaning to the Python interpreter. +They are always preceded and followed by `__`, for example `__init__`. + +```python +class Stock(object): + def __init__(self): + ... + def __repr__(self): + ... +``` + +There are dozens of special methods, but we will only look at a few specific examples. + +### Special Methods for String Conversions + +Objects have two string representations. + +```python +>>> from datetime import date +>>> d = date(2012, 12, 21) +>>> print(d) +2012-12-21 +>>> d +datetime.date(2012, 12, 21) +>>> +``` + +The `str()` function is used to create a nice printable output: + +```python +>>> str(d) +'2012-12-21' +>>> +``` + +The `repr()` function is used to create a more detailed representation +for programmers. + +```python +>>> repr(d) +'datetime.date(2012, 12, 21)' +>>> +``` + +Those functions, `str()` and `repr()`, use a pair of special methods in the class to get the string to be printed. + +```python +class Date(object): + def __init__(self, year, month, day): + self.year = year + self.month = month + self.day = day + + # Used with `str()` + def __str__(self): + return f'{self.year}-{self.month}-{self.day}' + + # Used with `repr()` + def __repr__(self): + return f'Date({self.year},{self.month},{self.day})' +``` + +*Note: The convention for `__repr__()` is to return a string that, + when fed to `eval()`, will recreate the underlying object. 
If this + is not possible, some kind of easily readable representation is used + instead.* + +### Special Methods for Mathematics + +Mathematical operators are just calls to special methods. + +```python +a + b a.__add__(b) +a - b a.__sub__(b) +a * b a.__mul__(b) +a / b a.__truediv__(b) +a // b a.__floordiv__(b) +a % b a.__mod__(b) +a << b a.__lshift__(b) +a >> b a.__rshift__(b) +a & b a.__and__(b) +a | b a.__or__(b) +a ^ b a.__xor__(b) +a ** b a.__pow__(b) +-a a.__neg__() +~a a.__invert__() +abs(a) a.__abs__() +``` + +### Special Methods for Item Access + +These methods are used to implement containers. + +```python +len(x) x.__len__() +x[a] x.__getitem__(a) +x[a] = v x.__setitem__(a,v) +del x[a] x.__delitem__(a) +``` + +You can use them in your classes. + +```python +class Sequence(object): + def __len__(self): + ... + def __getitem__(self,a): + ... + def __setitem__(self,a,v): + ... + def __delitem__(self,a): + ... +``` + +### Method Invocation + +Invoking a method is a two-step process. + +1. Lookup: The `.` operator +2. Method call: The `()` operator + +```python +>>> s = Stock('GOOG',100,490.10) +>>> c = s.cost # Lookup +>>> c +<bound method Stock.cost of <Stock object at 0x...>> +>>> c() # Method call +49010.0 +>>> +``` + +### Bound Methods + +A method that has not yet been invoked by the function call operator `()` is known as a *bound method*. +It operates on the instance where it originated. + +```python +>>> s = Stock('GOOG', 100, 490.10) +>>> s +<Stock object at 0x...> +>>> c = s.cost +>>> c +<bound method Stock.cost of <Stock object at 0x...>> +>>> c() +49010.0 +>>> +``` + +Bound methods are often a source of careless, non-obvious errors. For example: + +```python +>>> s = Stock('GOOG', 100, 490.10) +>>> print('Cost : %0.2f' % s.cost) +Traceback (most recent call last): + File "<stdin>", line 1, in <module> +TypeError: float argument required +>>> +``` + +Or devious behavior that's hard to debug. + +```python +f = open(filename, 'w') +... +f.close # Oops, Didn't do anything at all. `f` still open. 
+``` + +In both of these cases, the error is caused by forgetting to include the +trailing parentheses. For example, `s.cost()` or `f.close()`. + +### Attribute Access + +There is an alternative way to access, manipulate and manage attributes. + +```python +getattr(obj, 'name') # Same as obj.name +setattr(obj, 'name', value) # Same as obj.name = value +delattr(obj, 'name') # Same as del obj.name +hasattr(obj, 'name') # Tests if attribute exists +``` + +Example: + +```python +if hasattr(obj, 'x'): + x = getattr(obj, 'x') +else: + x = None +``` + +*Note: `getattr()` also accepts a useful default value as an optional third argument.* + +```python +x = getattr(obj, 'x', None) +``` + +## Exercises + +### (a) Better output for printing objects + +All Python objects have two string representations. The first +representation is created by string conversion via `str()` (which is +called by `print`). The string representation is usually a nicely +formatted version of the object meant for humans. The second +representation is a code representation of the object created by +`repr()` (or by viewing a value in the interactive shell). The code +representation typically shows you the code that you have to type to +get the object. + +The two representations of an object are often different. For example, you can see the difference by trying the following: + +```python +>>> s = 'Hello\nWorld' +>>> print(str(s)) # Notice nice output (no quotes) +Hello +World +>>> print(repr(s)) # Notice the added quotes and escape codes +'Hello\nWorld' +>>> print(f'{s!r}') # Alternate way to get repr() string +'Hello\nWorld' +>>> +``` + +Both kinds of string conversions can be redefined in a class if it defines the `__str__()` and `__repr__()` methods. + +Modify the `Stock` object that you defined in Exercise 4.1 so that the `__repr__()` method produces more useful output. 
+ +```python +>>> goog = Stock('GOOG', 100, 490.1) +>>> goog +Stock('GOOG', 100, 490.1) +>>> +``` + +See what happens when you read a portfolio of stocks and view the resulting list after you have made these changes. + +```python +>>> import stock +>>> portfolio = stock.read_portfolio('Data/portfolio.csv') +>>> portfolio +... see what the output is ... +>>> +``` + +### (b) An example of using `getattr()` + +In Exercise 4.2 you worked with a function `print_portfolio()` that made a table for a stock portfolio. +That function was hard-coded to only work with stock data. How limiting! You can do so much more if you use functions such as `getattr()`. + +To begin, try this little example: + +```python +>>> import stock +>>> s = stock.Stock('GOOG', 100, 490.1) +>>> columns = ['name', 'shares'] +>>> for colname in columns: + print(colname, '=', getattr(s, colname)) + +name = GOOG +shares = 100 +>>> +``` + +Carefully observe that the output data is determined entirely by the attribute names listed in the `columns` variable. + +In the file `tableformat.py`, take this idea and expand it into a +generalized function `print_table()` that prints a table showing +user-specified attributes of a list of arbitrary objects. + +As with the earlier `print_portfolio()` function, `print_table()` +should also accept a `TableFormatter` instance to control the output +format. 
Here’s how it should work: + +```python +>>> import stock +>>> portfolio = stock.read_portfolio('Data/portfolio.csv') +>>> from tableformat import create_formatter, print_table +>>> formatter = create_formatter('txt') +>>> print_table(portfolio, ['name','shares'], formatter) + name shares +---------- ---------- + AA 100 + IBM 50 + CAT 150 + MSFT 200 + GE 95 + MSFT 50 + IBM 100 + +>>> print_table(portfolio, ['name','shares','price'], formatter) + name shares price +---------- ---------- ---------- + AA 100 32.2 + IBM 50 91.1 + CAT 150 83.44 + MSFT 200 51.23 + GE 95 40.37 + MSFT 50 65.1 + IBM 100 70.44 +>>> +``` + +### (c) Exercise Bonus: Column Formatting + +Modify the `print_table()` function in part (b) so that it also +accepts a list of format specifiers for formatting the contents of +each column. + +```python +>>> print_table(portfolio, + ['name','shares','price'], + ['s','d','0.2f'], + formatter) + name shares price +---------- ---------- ---------- + AA 100 32.20 + IBM 50 91.10 + CAT 150 83.44 + MSFT 200 51.23 + GE 95 40.37 + MSFT 50 65.10 + IBM 100 70.44 +>>> +``` + +[Next](04_Defining_exceptions) diff --git a/Notes/04_Classes_objects/04_Defining_exceptions.md b/Notes/04_Classes_objects/04_Defining_exceptions.md new file mode 100644 index 0000000..0c31365 --- /dev/null +++ b/Notes/04_Classes_objects/04_Defining_exceptions.md @@ -0,0 +1,49 @@ +# 4.4 Defining Exceptions + +User-defined exceptions are defined by classes. + +```python +class NetworkError(Exception): + pass +``` + +**Exceptions always inherit from `Exception`.** +Usually they are empty classes. Use `pass` for the body. + +You can also make a hierarchy of your exceptions. + +```python +class AuthenticationError(NetworkError): + pass + +class ProtocolError(NetworkError): + pass +``` + +## Exercises + +### (a) Defining a custom exception + +It is often good practice for libraries to define their own exceptions. 
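
As a rough illustration, the sketch below reuses the `NetworkError` hierarchy from this section. The `connect()` function and its password check are hypothetical, invented only to show how a caller can catch a library's base exception.

```python
class NetworkError(Exception):
    pass

class AuthenticationError(NetworkError):
    pass


def connect(password):
    # Hypothetical library function, used only for this sketch.
    if password != 'secret':
        raise AuthenticationError('Bad password')
    return 'connected'


try:
    connect('guess')
except NetworkError as e:
    # Catches AuthenticationError too, because it inherits from NetworkError.
    print('Library problem:', e)
```

Catching the base class handles every exception in the hierarchy, while an unrelated `TypeError` or `KeyError` from a programming mistake would still propagate.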
+ +Defining library-specific exceptions makes it easier to distinguish between Python exceptions raised +in response to common programming errors and exceptions +intentionally raised by a library to signal a specific usage +problem. + +Modify the `create_formatter()` function from the last exercise so +that it raises a custom `FormatError` exception when the user provides +a bad format name. + +For example: + +```python +>>> from tableformat import create_formatter +>>> formatter = create_formatter('xls') +Traceback (most recent call last): + File "<stdin>", line 1, in <module> + File "tableformat.py", line 71, in create_formatter + raise FormatError('Unknown table format %s' % name) +FormatError: Unknown table format xls +>>> +``` diff --git a/_layouts/default.html b/_layouts/default.html index e773aad..0cac6f5 100644 --- a/_layouts/default.html +++ b/_layouts/default.html @@ -39,6 +39,7 @@ {{ content }}