Added sections 2-4

David Beazley
2020-05-25 12:40:11 -05:00
parent 7a4423dee4
commit c06face3a5
21 changed files with 5639 additions and 0 deletions

@@ -0,0 +1,11 @@
# Overview
In this section you will learn:
* How to organize larger programs.
* Defining and working with functions.
* Exceptions and Error handling.
* Basic module management.
* Script writing.
Python is great for short scripts, one-off problems, prototyping, testing, etc.

@@ -0,0 +1,275 @@
# 3.1 Python Scripting
In this part we look more closely at the practice of writing Python
scripts.
### What is a Script?
A *script* is a program that runs a series of statements and stops.
```python
# program.py
statement1
statement2
statement3
...
```
We have been writing scripts to this point.
### A Problem
If you write a useful script, it will grow in features and
functionality. You may want to apply it to other related problems.
Over time, it might become a critical application. And if you don't
take care, it might turn into a huge tangled mess. So, let's get
organized.
### Defining Things
You must always define things before they get used later on in a program.
```python
def square(x):
    return x*x

a = 42
b = a + 2     # Requires that `a` is defined
z = square(b) # Requires `square` and `b` to be defined
```
**The order is important.**
You almost always put the definitions of variables and functions near the beginning.
### Defining Functions
It is a good idea to put all of the code related to a single *task* all in one place.
```python
import csv

def read_prices(filename):
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
A function also simplifies repeated operations.
```python
oldprices = read_prices('oldprices.csv')
newprices = read_prices('newprices.csv')
```
### What is a Function?
A function is a named sequence of statements.
```python
def funcname(args):
    statement
    statement
    ...
    return result
```
*Any* Python statement can be used inside.
```python
def foo():
    import math
    print(math.sqrt(2))
    help(math)
```
There are no *special* statements in Python.
### Function Definition
Functions can be *defined* in any order.
```python
def foo(x):
    bar(x)

def bar(x):
    statements

# OR
def bar(x):
    statements

def foo(x):
    bar(x)
```
Functions must only be defined before they are actually *used* (or called) during program execution.
```python
foo(3) # foo must be defined already
```
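As a small runnable sketch of this rule (the functions here are made up for illustration), the name `bar` is only looked up when `foo()` actually runs, so the first call fails and the second succeeds:
```python
def foo(x):
    return bar(x) + 1      # `bar` is looked up when foo() runs, not when it is defined

try:
    foo(3)                 # bar() does not exist yet
except NameError as e:
    error = str(e)         # The message names the missing function

def bar(x):
    return x * 2

result = foo(3)            # Works now: bar(3) + 1
```
The definition order of `foo` and `bar` did not matter. What mattered is that both existed by the time `foo(3)` was called.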
Stylistically, it is probably more common to see functions defined in a *bottom-up* fashion.
### Bottom-up Style
Functions are treated as building blocks.
The smaller/simpler blocks go first.
```python
# myprogram.py
def foo(x):
...
def bar(x):
...
foo(x) # Defined above
...
def spam(x):
...
bar(x) # Defined above
...
spam(42) # Code that uses the functions appears at the end
```
Later functions build upon earlier functions.
### Function Design
Ideally, functions should be a *black box*.
They should only operate on passed inputs and avoid global variables
and mysterious side-effects. Main goals: *Modularity* and *Predictability*.
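To make the black-box idea concrete, here is a minimal sketch. The names `TAX_RATE`, `total_hidden` and `total` are hypothetical and only serve the illustration:
```python
TAX_RATE = 0.25                       # Hypothetical global setting

def total_hidden(price):
    # Depends on a global: the result can change without the arguments changing
    return price * (1 + TAX_RATE)

def total(price, tax_rate):
    # Black box: the result depends only on the inputs
    return price * (1 + tax_rate)

a = total(100, 0.25)
b = total(100, 0.25)                  # Same inputs, same answer, every time
```
The second version is easier to test and reuse because everything it needs arrives through its arguments.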
### Doc Strings
A good practice is to include documentation in the form of
doc-strings. Doc-strings are strings written immediately after the
function header, as the first statement in the function body. They
feed `help()`, IDEs and other tools.
```python
def read_prices(filename):
    '''
    Read prices from a CSV file of name,price
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
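As a quick illustration of where the doc-string ends up, the sketch below uses a made-up `square()` function and reads the string back through the `__doc__` attribute, which is the same text that `help()` displays:
```python
def square(x):
    '''
    Return x multiplied by itself.
    '''
    return x * x

doc = square.__doc__       # The raw doc-string, including surrounding whitespace
summary = doc.strip()      # 'Return x multiplied by itself.'
```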
### Type Annotations
You can also add some optional type annotations to your function definitions.
```python
def read_prices(filename: str) -> dict:
    '''
    Read prices from a CSV file of name,price
    '''
    prices = {}
    with open(filename) as f:
        f_csv = csv.reader(f)
        for row in f_csv:
            prices[row[0]] = float(row[1])
    return prices
```
These do nothing at run time. They are purely informational.
They may be used by IDEs, code checkers, etc.
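A short sketch shows that annotations are not enforced. The function `double()` is hypothetical. Passing a string to a parameter annotated as `int` runs without complaint, and the annotations are simply stored on the function:
```python
def double(x: int) -> int:
    return x * 2

result = double('ab')              # No error, even though 'ab' is not an int
hints = double.__annotations__     # The annotations are stored, not checked
```
Here `result` is the string `'abab'`. Catching that kind of mismatch is the job of an external code checker, not of Python itself.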
## Exercises
In section 2, you wrote a program called `report.py` that printed out a report showing the performance of a stock portfolio.
This program consisted of some functions. For example:
```python
# report.py
import csv

def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price.
    '''
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)

        for row in rows:
            record = dict(zip(headers, row))
            stock = {
                'name' : record['name'],
                'shares' : int(record['shares']),
                'price' : float(record['price'])
            }
            portfolio.append(stock)
    return portfolio
...
```
However, there were also portions of the program that just performed a series of scripted calculations.
This code appeared near the end of the program. For example:
```python
...
# Output the report
headers = ('Name', 'Shares', 'Price', 'Change')
print('%10s %10s %10s %10s' % headers)
print(('-' * 10 + ' ') * len(headers))
for row in report:
    print('%10s %10d %10.2f %10.2f' % row)
...
```
In this exercise, we're going to take this program and organize it a little more strongly around the use of functions.
### (a) Structuring a program as a collection of functions
Modify your `report.py` program so that all major operations,
including calculations and output, are carried out by a collection of
functions. Specifically:
* Create a function `print_report(report)` that prints out the report.
* Change the last part of the program so that it is nothing more than a series of function calls and no other computation.
### (b) Creating a function for program execution
Take the last part of your program and package it into a single function `portfolio_report(portfolio_filename, prices_filename)`.
Have the function work so that the following function call creates the report as before:
```python
portfolio_report('Data/portfolio.csv', 'Data/prices.csv')
```
In this final version, your program will be nothing more than a series
of function definitions followed by a single function call to
`portfolio_report()` at the very end (which executes all of the steps
involved in the program).
By turning your program into a single function, it becomes easy to run it on different inputs.
For example, try these statements interactively after running your program:
```python
>>> portfolio_report('Data/portfolio2.csv', 'Data/prices.csv')
... look at the output ...
>>> files = ['Data/portfolio.csv', 'Data/portfolio2.csv']
>>> for name in files:
        print(f'{name:-^43s}')
        portfolio_report(name, 'Data/prices.csv')
        print()
... look at the output ...
>>>
```
[Next](02_More_functions)

@@ -0,0 +1,491 @@
# 3.2 More on Functions
This section fills in a few more details about how functions work and are defined.
### Calling a Function
Consider this function:
```python
def read_prices(filename, debug):
    ...
```
You can call the function with positional arguments:
```python
prices = read_prices('prices.csv', True)
```
Or you can call the function with keyword arguments:
```python
prices = read_prices(filename='prices.csv', debug=True)
```
### Default Arguments
Sometimes you want an optional argument.
```python
def read_prices(filename, debug=False):
    ...
```
If a default value is assigned, the argument is optional in function calls.
```python
d = read_prices('prices.csv')
e = read_prices('prices.dat', True)
```
*Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).*
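A minimal runnable sketch of an optional argument, using a hypothetical `greet()` function:
```python
def greet(name, greeting='Hello'):
    return f'{greeting}, {name}'

a = greet('Guido')                      # Default used
b = greet('Guido', 'Goodbye')           # Default overridden by position
c = greet('Guido', greeting='Hi')       # Default overridden by keyword
```
Writing the parameters the other way around, `def greet(greeting='Hello', name)`, is a `SyntaxError`, which is the rule stated in the note above.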
### Prefer keyword arguments for optional arguments
Compare and contrast these two different calling styles:
```python
parse_data(data, False, True) # ?????
parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)
```
Keyword arguments improve code clarity.
### Design Best Practices
Always give short, but meaningful names to function arguments.
Someone using a function may want to use the keyword calling style.
```python
d = read_prices('prices.csv', debug=True)
```
Python development tools will show the names in help features and documentation.
### Return Values
The `return` statement returns a value.
```python
def square(x):
    return x * x
```
If a function has no `return` statement, or uses `return` without a value, it returns `None`.
```python
def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None
```
### Multiple Return Values
Functions can only return one value.
However, a function may return multiple values by returning a tuple.
```python
def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple
```
Usage example:
```python
x, y = divide(37,5) # x = 7, y = 2
x = divide(37, 5) # x = (7, 2)
```
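The same pattern works for any function that needs to hand back more than one result. As a sketch with a hypothetical `min_max()` helper, the single returned object is a tuple, and unpacking it is optional:
```python
def min_max(values):
    return min(values), max(values)    # One tuple holding two results

result = min_max([3, 1, 4])            # Keep the tuple as a single object
low, high = min_max([3, 1, 4])         # Or unpack it into separate names
```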
### Variable Scope
Programs assign values to variables.
```python
x = value # Global variable

def foo():
    y = value # Local variable
```
Variable assignments occur outside and inside function definitions.
Variables defined outside are "global". Variables inside a function are "local".
### Local Variables
Variables inside functions are private.
```python
def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split()
        s = (fields[0],int(fields[1]),float(fields[2]))
        portfolio.append(s)
    return portfolio
```
In this example, `filename`, `portfolio`, `line`, `fields` and `s` are local variables.
Those variables are not retained or accessible after the function call.
```pycon
>>> stocks = read_portfolio('stocks.dat')
>>> fields
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'fields' is not defined
>>>
```
They also can't conflict with variables found elsewhere.
### Global Variables
Functions can freely access the values of globals.
```python
name = 'Dave'

def greeting():
    print('Hello', name) # Using `name` global variable
```
However, functions can't modify globals by assignment. An assignment inside a function creates a new local variable:
```python
name = 'Dave'

def spam():
    name = 'Guido'

spam()
print(name) # prints 'Dave'
```
**Remember: All assignments in functions are local.**
### Modifying Globals
If you must modify a global variable you must declare it as such.
```python
name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above
```
The global declaration must appear before its use. Having seen this,
know that it is considered poor form. In fact, try to avoid it entirely
if you can. If you need a function to modify some kind of state outside
of the function, it's better to use a class instead (more on this later).
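As a preview of that idea, here is a minimal sketch. The `Counter` class is hypothetical and classes are covered properly later. The state lives on an object instead of in a global variable, so no `global` declaration is needed:
```python
class Counter:
    def __init__(self):
        self.count = 0          # State is stored on the object

    def increment(self):
        self.count += 1         # Modifies the object, not a global
        return self.count

c = Counter()
c.increment()
latest = c.increment()          # Two increments so far
```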
### Argument Passing
When you call a function, the argument variables are names for passed values.
If mutable data types are passed (e.g. lists, dicts), they can be modified *in-place*.
```python
def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]
```
**Key point: Functions don't receive a copy of the input arguments.**
### Reassignment vs Modifying
Make sure you understand the subtle difference between modifying a value and reassigning a variable name.
```python
def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6]     # Reassigns `items` variable

b = [1, 2, 3]
bar(b)
print(b)                # [1, 2, 3]
```
*Reminder: Variable assignment never overwrites memory. The name is simply bound to a new value.*
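The sketch below puts both behaviors side by side and adds one common defensive idiom, passing a copy. The function names `modify` and `rebind` are illustrative only:
```python
def modify(items):
    items.append(42)          # Changes the caller's list in place

def rebind(items):
    items = items + [42]      # Builds a new list; the caller's list is untouched
    return items

a = [1, 2, 3]
modify(a)                     # a has been changed

b = [1, 2, 3]
c = rebind(b)                 # b is unchanged, c is a different list

d = [1, 2, 3]
modify(list(d))               # Pass a copy when the caller's data must not change
```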
## Exercises
This exercise involves a lot of steps and putting concepts together from past exercises.
The final solution is only about 25 lines of code, but take your time and make sure you understand each part.
A central part of your `report.py` program focuses on the reading of
CSV files. For example, the function `read_portfolio()` reads a file
containing rows of portfolio data and the function `read_prices()`
reads a file containing rows of price data. In both of those
functions, there are a lot of low-level "fiddly" bits and similar
features. For example, they both open a file and wrap it with the
`csv` module and they both convert various fields into new types.
If you were doing a lot of file parsing for real, you'd probably want
to clean some of this up and make it more general purpose. That's
our goal.
Start this exercise by creating a new file called `fileparse.py`. This is where we will be doing our work.
### (a) Reading CSV Files
To start, let's just focus on the problem of reading a CSV file into a
list of dictionaries. In the file `fileparse.py`, define a simple
function that looks like this:
```python
# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records
```
This function reads a CSV file into a list of dictionaries while
hiding the details of opening the file, wrapping it with the `csv`
module, ignoring blank lines, and so forth.
Try it out:
Hint: `python3 -i fileparse.py`.
```pycon
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>
```
This is great except that you can't do any kind of useful calculation with the data because everything is represented as a string.
We'll fix this shortly, but let's keep building on it.
### (b) Building a Column Selector
In many cases, you're only interested in selected columns from a CSV file, not all of the data.
Modify the `parse_csv()` function so that it optionally allows user-specified columns to be picked out as follows:
```python
>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>> # Read some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>
```
An example of a column selector was given in Section 2.5.
However, here's one way to do it:
```python
# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records
```
There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices.
For example, suppose the input file had the following headers:
```pycon
>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>
```
Now, suppose the selected columns were as follows:
```pycon
>>> select = ['name', 'shares']
>>>
```
To perform the proper selection, you have to map the selected column names to column indices in the file.
That's what this step is doing:
```pycon
>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>
```
In other words, "name" is column 0 and "shares" is column 3.
When you read a row of data from the file, the indices are used to filter it:
```pycon
>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>
```
### (c) Performing Type Conversion
Modify the `parse_csv()` function so that it optionally allows type-conversions to be applied to the returned data.
For example:
```pycon
>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>
```
You already explored this in Exercise 2.7. You'll need to insert the
following fragment of code into your solution:
```python
...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
```
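To see what that fragment does in isolation, here is a small worked example on a single row. The sample row is taken from the portfolio data shown earlier:
```python
types = [str, int, float]
row = ['AA', '100', '32.20']

# zip() pairs each conversion function with the value in the same column
pairs = list(zip(types, row))

# Calling each function on its value converts the whole row
converted = [func(val) for func, val in zip(types, row)]
```
After this runs, `converted` holds a string, an integer and a float instead of three strings.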
### (d) Working with Headers
Some CSV files don't include any header information.
For example, the file `prices.csv` looks like this:
```csv
"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...
```
Modify the `parse_csv()` function so that it can work with such files by creating a list of tuples instead.
For example:
```python
>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>
```
To make this change, you'll need to modify the code so that the first
line of data isn't interpreted as a header line. Also, you'll need to
make sure you don't create dictionaries as there are no longer any
column names to use for keys.
### (e) Picking a different column delimiter
Although CSV files are pretty common, it's also possible that you could encounter a file that uses a different column separator such as a tab or space.
For example, the file `Data/portfolio.dat` looks like this:
```csv
name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44
```
The `csv.reader()` function allows a different delimiter to be given as follows:
```python
rows = csv.reader(f, delimiter=' ')
```
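The `delimiter` argument can be tried without touching any files. This sketch feeds a few space-separated lines, modeled on `Data/portfolio.dat`, through `csv.reader()` using an in-memory `io.StringIO` object in place of an open file:
```python
import csv
import io

text = 'name shares price\n"AA" 100 32.20\n"IBM" 50 91.10\n'

f = io.StringIO(text)                     # Behaves like an open text file
rows = list(csv.reader(f, delimiter=' '))
```
Each row comes back as a list of strings, with the quotes around the names already removed by the `csv` module.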
Modify your `parse_csv()` function so that it also allows the delimiter to be changed.
For example:
```pycon
>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
```
If you've made it this far, you've created a nice library function that's genuinely useful.
You can use it to parse arbitrary CSV files, select out columns of
interest, perform type conversions, without having to worry too much
about the inner workings of files or the `csv` module.
Nice!
[Next](03_Error_checking)

@@ -0,0 +1,393 @@
# 3.3 Error Checking
This section discusses some aspects of error checking and exception handling.
### How programs fail
Python performs no checking or validation of function argument types or values.
A function will work on any data that is compatible with the statements in the function.
```python
def add(x, y):
    return x + y

add(3, 4)               # 7
add('Hello', 'World')   # 'HelloWorld'
add('3', '4')           # '34'
```
If there are errors in a function, they will show up at run time (as an exception).
```python
def add(x, y):
    return x + y

>>> add(3, '4')
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
```
To verify code, there is a strong emphasis on testing (covered later).
### Exceptions
Exceptions are used to signal errors.
To raise an exception yourself, use the `raise` statement.
```python
if name not in names:
    raise RuntimeError('Name not found')
```
To catch an exception use `try-except`.
```python
try:
    authenticate(username)
except RuntimeError as e:
    print(e)
```
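Putting the two fragments together gives a complete, runnable sketch. The `names` list and the body of `authenticate()` are assumptions made for the example:
```python
names = ['Dave', 'Guido']            # Hypothetical list of known users

def authenticate(username):
    if username not in names:
        raise RuntimeError('Name not found')
    return True

try:
    authenticate('Eve')              # Not in the list, so this raises
except RuntimeError as e:
    message = str(e)                 # The text passed to RuntimeError
```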
### Exception Handling
Exceptions propagate to the first matching `except`.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')   # Exception raised here

def spam():
    grok()                        # Call that will raise exception

def bar():
    try:
        spam()
    except RuntimeError as e:     # Exception caught here
        ...

def foo():
    try:
        bar()
    except RuntimeError as e:     # Exception does NOT arrive here
        ...

foo()
```
To handle the exception, use the `except` block. You can add any statements you want to handle the error.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
        grok()
    except RuntimeError as e:   # Exception caught here
        statements              # Use these statements
        statements
        ...

bar()
```
After handling, execution resumes with the first statement after the `try-except`.
```python
def grok():
    ...
    raise RuntimeError('Whoa!')

def bar():
    try:
        grok()
    except RuntimeError as e:   # Exception caught here
        statements
        statements
        ...
    statements                  # Resumes execution here
    statements                  # And continues here
    ...

bar()
```
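The control flow can be traced with a runnable version of the same functions. As an illustrative device, each step appends a note to an `events` list so the order of execution is visible afterwards:
```python
events = []

def grok():
    events.append('grok raises')
    raise RuntimeError('Whoa!')

def spam():
    grok()
    events.append('spam continues')      # Never runs: the exception skips it

def bar():
    try:
        spam()
    except RuntimeError as e:
        events.append(f'bar caught: {e}')
    events.append('bar resumes')         # First statement after the try-except

bar()
```
The line after `grok()` inside `spam()` is skipped, and execution picks up again in `bar()` after the handler finishes.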
### Built-in Exceptions
There are about two dozen built-in exceptions.
The list below is not exhaustive. Check the documentation for more.
```python
ArithmeticError
AssertionError
EnvironmentError
EOFError
ImportError
IndexError
KeyboardInterrupt
KeyError
MemoryError
NameError
ReferenceError
RuntimeError
SyntaxError
SystemError
TypeError
ValueError
```
### Exception Values
Most exceptions have an associated value. It contains more information about what's wrong.
```python
raise RuntimeError('Invalid user name')
```
This value is passed to the variable supplied in `except`.
```python
try:
    ...
except RuntimeError as e:   # `e` holds the value raised
    ...
```
The value is an instance of the exception type. However, it often looks like a string when
printed.
```python
except RuntimeError as e:
    print('Failed : Reason', e)
```
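A short sketch shows the difference between the exception object and its printed form. The results are copied into ordinary variables inside the handler because Python 3 removes the name `e` when the `except` block ends:
```python
try:
    raise RuntimeError('Invalid user name')
except RuntimeError as e:
    kind = type(e).__name__     # Name of the exception class
    text = str(e)               # What print(e) would show
    args = e.args               # The values passed when raising, as a tuple
```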
### Catching Multiple Errors
You can catch different kinds of exceptions with multiple `except` blocks.
```python
try:
    ...
except LookupError as e:
    ...
except RuntimeError as e:
    ...
except IOError as e:
    ...
except KeyboardInterrupt as e:
    ...
```
Alternatively, if the block to handle them is the same, you can group them:
```python
try:
    ...
except (IOError,LookupError,RuntimeError) as e:
    ...
```
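Here is a runnable sketch of a grouped handler, using a hypothetical `lookup()` helper. `LookupError` is the built-in base class of both `KeyError` and `IndexError`, so naming it in the group covers failed dictionary and list lookups alike:
```python
def lookup(container, key):
    try:
        return container[key]
    except (LookupError, TypeError) as e:
        # One handler for several kinds of failure
        return f'{type(e).__name__}: {e}'

found = lookup({'a': 1}, 'a')
missing_key = lookup({'a': 1}, 'b')     # A KeyError, caught via LookupError
bad_index = lookup([1, 2], 5)           # An IndexError, caught via LookupError
bad_type = lookup([1, 2], 'x')          # A TypeError, caught by the second entry
```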
### Catching All Errors
To catch any exception, use `Exception` like this:
```python
try:
    ...
except Exception:
    print('An error occurred')
```
In general, writing code like that is a bad idea because you'll have no idea
why it failed.
### Wrong Way to Catch Errors
Here is the wrong way to use exceptions.
```python
try:
    go_do_something()
except Exception:
    print('Computer says no')
```
This swallows all possible errors. It may make it impossible to debug
when the code is failing for some reason you didn't expect at all
(e.g. uninstalled Python module, etc.).
### Somewhat Better Approach
This is a more sane approach.
```python
try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
```
It reports a specific reason for failure. It is almost always a good
idea to have some mechanism for viewing/reporting errors when you
write code that catches all possible exceptions.
In general though, it's better to catch the error more narrowly. Only
catch the errors you can actually deal with. Let other errors pass to
other code.
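As a sketch of narrow catching, the hypothetical helper below handles only the one error it knows how to recover from, a `ValueError` from bad text. Anything else, such as a `TypeError` from passing `None`, is left to propagate to the caller:
```python
def to_int(text, default=None):
    try:
        return int(text)
    except ValueError:          # Only the error we can deal with here
        return default

good = to_int('42')             # Converts normally
bad = to_int('')                # ValueError is handled, default returned

try:
    to_int(None)                # TypeError is NOT caught inside to_int()
    passed_through = False
except TypeError:
    passed_through = True
```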
### Reraising an Exception
Use `raise` to propagate a caught error.
```python
try:
    go_do_something()
except Exception as e:
    print('Computer says no. Reason :', e)
    raise
```
It allows you to take action (e.g. logging) and pass the error on to the caller.
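A runnable sketch of that pattern follows. The failing `go_do_something()` and the `log` list are stand-ins invented for the example:
```python
log = []

def go_do_something():
    raise ValueError('bad input')        # Stand-in for real work that fails

def run():
    try:
        go_do_something()
    except Exception as e:
        log.append(f'Computer says no. Reason : {e}')   # Take some action
        raise                                           # Pass the same error on

try:
    run()
except ValueError as e:
    caught = str(e)                      # The caller still sees the original error
```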
### Exception Best Practices
Don't catch exceptions. Fail fast and loud. If it's important, someone
else will take care of the problem. Only catch an exception if you
are *that* someone. That is, only catch errors where you can recover
and sanely keep going.
### `finally` statement
It specifies code that must run regardless of whether or not an exception occurs.
```python
from threading import Lock

lock = Lock()
...
lock.acquire()
try:
    ...
finally:
    lock.release()  # this will ALWAYS be executed. With and without exception.
```
Commonly used to properly manage resources (especially locks, files, etc.).
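The guarantee can be checked with a small sketch. The hypothetical `work()` function records its steps in a list, and the cleanup step appears whether or not the body raises:
```python
steps = []

def work(fail):
    steps.append('acquire')
    try:
        if fail:
            raise RuntimeError('Whoa!')
        steps.append('work done')
    finally:
        steps.append('release')      # Runs on success and on failure

work(False)                          # Normal completion

try:
    work(True)                       # Raises, but 'release' is still recorded
except RuntimeError:
    pass
```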
### `with` statement
In modern code, `try-finally` is often replaced with the `with` statement.
```python
lock = Lock()
with lock:
    # lock acquired
    ...
# lock released
```
A more familiar example:
```python
with open(filename) as f:
    # Use the file
    ...
# File closed
```
It defines a usage *context* for a resource. When execution leaves that context,
resources are released. `with` only works with certain objects that have been
specifically programmed to support it.
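One way to see what "leaving the context" means is to build a tiny context of your own. The sketch below uses the standard library's `contextlib.contextmanager` decorator. The `resource()` function and the `events` list are made up for the illustration and are not part of the course code:
```python
from contextlib import contextmanager

events = []

@contextmanager
def resource(name):
    events.append(f'acquire {name}')     # Runs when the with-block is entered
    try:
        yield name                       # The value bound by `as`
    finally:
        events.append(f'release {name}') # Runs when the with-block is left

with resource('db') as r:
    events.append(f'use {r}')
```
The release step is recorded after the body of the `with` block, mirroring the `try-finally` pattern shown earlier.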
## Exercises
### (a) Raising exceptions
The `parse_csv()` function you wrote in the last section allows
user-specified columns to be selected, but that only works if the
input data file has column headers.
Modify the code so that an exception gets raised if both the `select`
and `has_headers=False` arguments are passed.
For example:
```python
>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fileparse.py", line 9, in parse_csv
raise RuntimeError("select argument requires column headers")
RuntimeError: select argument requires column headers
>>>
```
Having added this one check, you might ask if you should be performing
other kinds of sanity checks in the function. For example, should you
check that the filename is a string, that types is a list, or anything
of that nature?
As a general rule, it's usually best to skip such tests and to just
let the program fail on bad inputs. The traceback message will point
at the source of the problem and can assist in debugging.
The main reason for adding the above check is to avoid running the code
in a non-sensical mode (e.g., using a feature that requires column
headers, but simultaneously specifying that there are no headers).
This indicates a programming error on the part of the calling code.
### (b) Catching exceptions
The `parse_csv()` function you wrote is used to process the entire
contents of a file. However, in the real world, it's possible that
input files might have corrupted, missing, or dirty data. Try this
experiment:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fileparse.py", line 36, in parse_csv
row = [func(val) for func, val in zip(types, row)]
ValueError: invalid literal for int() with base 10: ''
>>>
```
Modify the `parse_csv()` function to catch all `ValueError` exceptions
generated during record creation and print a warning message for rows
that can't be converted.
The message should include the row number and information about the reason why it failed.
To test your function, try reading the file `Data/missing.csv` above.
For example:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Row 4: Couldn't convert ['MSFT', '', '51.23']
Row 4: Reason invalid literal for int() with base 10: ''
Row 7: Couldn't convert ['IBM', '', '70.44']
Row 7: Reason invalid literal for int() with base 10: ''
>>>
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
```
### (c) Silencing Errors
Modify the `parse_csv()` function so that parsing error messages can be silenced if explicitly desired by the user.
For example:
```python
>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True)
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
```
Error handling is one of the most difficult things to get right in
most programs. As a general rule, you shouldn't silently ignore
errors. Instead, it's better to report problems and to give the user
an option to silence the error message if they choose to do so.
[Next](04_Modules)

@@ -0,0 +1,317 @@
# 3.4 Modules
This section introduces the concept of modules.
### Modules and import
Any Python source file is a module.
```python
# foo.py
def grok(a):
    ...

def spam(b):
    ...
```
The `import` statement loads and *executes* a module.
```python
# program.py
import foo
a = foo.grok(2)
b = foo.spam('Hello')
...
```
### Namespaces
A module is a collection of named values and is sometimes said to be a *namespace*.
The names are all of the global variables and functions defined in the source file.
After importing, the module name is used as a prefix. Hence the *namespace*.
```python
import foo
a = foo.grok(2)
b = foo.spam('Hello')
...
```
The module name is tied to the file name (foo -> foo.py).
### Global Definitions
Everything defined in the *global* scope is what populates the module
namespace (`foo` in our previous example). Consider two modules
that define the same variable `x`.
```python
# foo.py
x = 42

def grok(a):
    ...
```
```python
# bar.py
x = 37

def spam(a):
    ...
```
In this case, the `x` definitions refer to different variables. One
is `foo.x` and the other is `bar.x`. Different modules can use the
same names and those names won't conflict with each other.
**Modules are isolated.**
### Modules as Environments
Modules form an enclosing environment for all of the code defined inside.
```python
# foo.py
x = 42

def grok(a):
    print(x)
```
*Global* variables are always bound to the enclosing module (same file).
Each source file is its own little universe.
### Module Execution
When a module is imported, *all of the statements in the module
execute* one after another until the end of the file is reached. The
contents of the module namespace are all of the *global* names that
are still defined at the end of the execution process. If there are
scripting statements that carry out tasks in the global scope
(printing, creating files, etc.) you will see them run on import.
### `import as` statement
You can change the name of a module as you import it:
```python
import math as m

def rectangular(r, theta):
    x = r * m.cos(theta)
    y = r * m.sin(theta)
    return x, y
```
It works the same as a normal import. It just renames the module in that one file.
### `from` module import
This picks selected symbols out of a module and makes them available locally.
```python
from math import sin, cos

def rectangular(r, theta):
    x = r * cos(theta)
    y = r * sin(theta)
    return x, y
```
It allows parts of a module to be used without having to type the module prefix.
Useful for frequently used names.
### Comments on importing
Variations on import do *not* change the way that modules work.
```python
import math as m
# vs
from math import cos, sin
...
```
Specifically, `import` always executes the *entire* file and modules
are still isolated environments.
The `import module as` statement is only manipulating the names.
### Module Loading
Each module loads and executes only *once*.
*Note: Repeated imports just return a reference to the previously loaded module.*
`sys.modules` is a dict of all loaded modules.
```python
>>> import sys
>>> sys.modules.keys()
['copy_reg', '__main__', 'site', '__builtin__', 'encodings', 'encodings.encodings', 'posixpath', ...]
>>>
```
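The following sketch checks this behavior using the standard `json` module. The exact contents of `sys.modules` vary between Python versions and sessions, so only the identity check is shown.

```python
import sys
import json                    # First import: the module is located and executed

first = sys.modules['json']    # The loaded module object is cached here

import json                    # Repeated import: just a lookup in sys.modules

print(json is first)           # True: same object, nothing was re-executed
```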
### Locating Modules
Python consults a path list (`sys.path`) when looking for modules.
```python
>>> import sys
>>> sys.path
[
'',
'/usr/local/lib/python36/python36.zip',
'/usr/local/lib/python36',
...
]
```
The empty string `''` stands for the current working directory, which is usually searched first.
### Module Search Path
`sys.path` contains the search paths.
You can manually adjust if you need to.
```python
import sys
sys.path.append('/project/foo/pyfiles')
```
Paths are also added via environment variables.
```bash
% env PYTHONPATH=/project/foo/pyfiles python3
Python 3.6.0 (default, Feb 3 2017, 05:53:21)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)]
>>> import sys
>>> sys.path
['','/project/foo/pyfiles', ...]
```
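If you are unsure which file an import would pick up, one option is to ask the import system directly. This sketch uses `importlib.util.find_spec()` from the standard library; the module name `no_such_module_xyz` is a made-up name used to show the not-found case.

```python
import importlib.util

# Search for a module without importing it.
spec = importlib.util.find_spec('json')
print(spec.origin)             # Full path of the file that would be loaded

# A hypothetical name that should not exist anywhere on sys.path.
missing = importlib.util.find_spec('no_such_module_xyz')
print(missing)                 # None when nothing matches
```

This can be handy when a module of yours accidentally shadows a library module of the same name.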
## Exercises
For this exercise involving modules, it is critically important to
make sure you are running Python in a proper environment. Modules
are usually where programmers first encounter problems with the current
working directory or with Python's path settings.
### (a) Module imports
In section 3, we created a general purpose function `parse_csv()` for parsing the contents of CSV datafiles.
Now, we're going to see how to use that function in other programs.
First, start in a new shell window. Navigate to the folder where you
have all your files. We are going to import them.
Start Python interactive mode.
```shell
bash % python3
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
Once you've done that, try importing some of the programs you
previously wrote. You should see their output exactly as before.
Just to emphasize, importing a module runs its code.
```python
>>> import bounce
... watch output ...
>>> import mortgage
... watch output ...
>>> import report
... watch output ...
>>>
```
If none of this works, you're probably running Python in the wrong directory.
Now, try importing your `fileparse` module and getting some help on it.
```python
>>> import fileparse
>>> help(fileparse)
... look at the output ...
>>> dir(fileparse)
... look at the output ...
>>>
```
Try using the module to read some data:
```python
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv',select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>> pricelist = fileparse.parse_csv('Data/prices.csv',types=[str,float], has_headers=False)
>>> pricelist
... look at the output ...
>>> prices = dict(pricelist)
>>> prices
... look at the output ...
>>> prices['IBM']
106.11
>>>
```
Try importing a function so that you don't need to include the module name:
```python
>>> from fileparse import parse_csv
>>> portfolio = parse_csv('Data/portfolio.csv', select=['name','shares','price'], types=[str,int,float])
>>> portfolio
... look at the output ...
>>>
```
### (b) Using your library module
In section 2, you wrote a program `report.py` that produced a stock report like this:
```shell
Name Shares Price Change
---------- ---------- ---------- ----------
AA 100 39.91 7.71
IBM 50 106.11 15.01
CAT 150 78.58 -4.86
MSFT 200 30.47 -20.76
GE 95 37.38 -2.99
MSFT 50 30.47 -34.63
IBM 100 106.11 35.67
```
Take that program and modify it so that all of the input file
processing is done using functions in your `fileparse` module. To do
that, import `fileparse` as a module and change the `read_portfolio()`
and `read_prices()` functions to use the `parse_csv()` function.
Use the interactive example at the start of this exercise as a guide.
Afterwards, you should get exactly the same output as before.
### (c) Using more library imports
In section 1, you wrote a program `pcost.py` that read a portfolio and computed its cost.
```python
>>> import pcost
>>> pcost.portfolio_cost('Data/portfolio.csv')
44671.15
>>>
```
Modify the `pcost.py` file so that it uses the `report.read_portfolio()` function.
### Commentary
When you are done with this exercise, you should have three
programs. `fileparse.py` which contains a general purpose
`parse_csv()` function. `report.py` which produces a nice report, but
also contains `read_portfolio()` and `read_prices()` functions. And
finally, `pcost.py` which computes the portfolio cost, but makes use
of the code written for the `report.py` program.
[Next](05_Main_module)


@@ -0,0 +1,299 @@
# 3.5 Main Module
This section introduces the concept of a main program or main module.
### Main Functions
In many programming languages, there is a concept of a *main* function or method.
```c
// c / c++
int main(int argc, char *argv[]) {
    ...
}
```
```java
// java
class myprog {
    public static void main(String args[]) {
        ...
    }
}
```
This is the first function that executes when an application is launched.
### Python Main Module
Python has no *main* function or method. Instead, there is a *main*
module. The *main module* is the source file that runs first.
```bash
bash % python3 prog.py
...
```
Whatever module you give to the interpreter at startup becomes *main*. Its name doesn't matter.
### `__main__` check
It is standard practice for modules that can run as a main script to use this convention:
```python
# prog.py
...
if __name__ == '__main__':
    # Running as the main program ...
    statements
    ...
```
Statements enclosed inside the `if` statement become the *main* program.
### Main programs vs. library imports
Any file can either run as main or as a library import:
```bash
bash % python3 prog.py # Running as main
```
```python
import prog
```
In both cases, `__name__` holds the name of the module. However, it is only set to `__main__` when
the file runs as main.
As a general rule, you don't want statements that are part of the main
program to execute on a library import. So, it's common to have an `if` check in code
that might be used either way.
```python
if __name__ == '__main__':
    # Does not execute if loaded with import ...
```
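Both cases can be observed from a single session. The sketch below writes a hypothetical file `name_demo.py` to a temporary directory, runs it once as the main program using the standard `runpy` module (roughly what `python3 name_demo.py` does), and then imports it as a library.

```python
import importlib
import os
import runpy
import sys
import tempfile

# A hypothetical module that records how it was loaded.
source = '''
loaded_as = __name__
ran_main = False
if __name__ == '__main__':
    ran_main = True
'''

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'name_demo.py')
    with open(path, 'w') as f:
        f.write(source)

    # Run the file as the main program; returns its global namespace.
    as_main = runpy.run_path(path, run_name='__main__')

    # Load the same file as a library import.
    sys.path.insert(0, tmpdir)
    try:
        name_demo = importlib.import_module('name_demo')
    finally:
        sys.path.remove(tmpdir)

print(as_main['loaded_as'], as_main['ran_main'])    # __main__ True
print(name_demo.loaded_as, name_demo.ran_main)      # name_demo False
```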
### Program Template
Here is a common program template for writing a Python program:
```python
# prog.py
# Import statements (libraries)
import modules
# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main():
    ...

if __name__ == '__main__':
    main()
```
### Command Line Tools
Python is often used for command-line tools:
```bash
bash % python3 report.py portfolio.csv prices.csv
```
It means that the scripts are executed from the shell /
terminal. Common use cases are for automation, background tasks, etc.
### Command Line Args
The command line is a list of text strings.
```bash
bash % python3 report.py portfolio.csv prices.csv
```
This list of text strings is found in `sys.argv`.
```python
# In the previous bash command
sys.argv # ['report.py', 'portfolio.csv', 'prices.csv']
```
Here is a simple example of processing the arguments:
```python
import sys
if len(sys.argv) != 3:
    raise SystemExit(f'Usage: {sys.argv[0]} portfile pricefile')
portfile = sys.argv[1]
pricefile = sys.argv[2]
...
```
### Standard I/O
Standard Input / Output (or stdio) are files that work the same as normal files.
```python
sys.stdout
sys.stderr
sys.stdin
```
By default, print is directed to `sys.stdout`. Input is read from
`sys.stdin`. Tracebacks and errors are directed to `sys.stderr`.
Be aware that *stdio* could be connected to terminals, files, pipes, etc.
```bash
bash % python3 prog.py > results.txt
# or
bash % cmd1 | python3 prog.py | cmd2
```
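Because the standard streams behave like ordinary files, code that writes to them can be pointed at any file-like object. The following is a minimal sketch; the `report()` function and its `out`/`err` parameters are hypothetical names used for illustration.

```python
import io
import sys

def report(total, out=sys.stdout, err=sys.stderr):
    # Normal results go to `out`; diagnostics go to `err`.
    print(f'Total: {total}', file=out)
    if total < 0:
        print('Warning: negative total', file=err)

report(100)                      # Writes to the real standard output

# Any file-like object can stand in for the standard streams.
out, err = io.StringIO(), io.StringIO()
report(-5, out=out, err=err)
print(repr(out.getvalue()))      # 'Total: -5\n'
print(repr(err.getvalue()))      # 'Warning: negative total\n'
```

Keeping diagnostics on `sys.stderr` means they do not end up in `results.txt` when only standard output is redirected.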
### Environment Variables
Environment variables are set in the shell.
```bash
bash % export NAME=dave
bash % export RSH=ssh
bash % python3 prog.py
```
`os.environ` is a dictionary that contains these values.
```python
import os
name = os.environ['NAME'] # 'dave'
```
Changes are reflected in any subprocesses later launched by the program.
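Here is a small sketch of both points. It reads a variable with a fallback value and then shows that a change to `os.environ` is visible in a child Python process started afterwards. The fallback `'unknown'` is an arbitrary choice for this example.

```python
import os
import subprocess
import sys

# .get() avoids a KeyError when the variable is not set.
name = os.environ.get('NAME', 'unknown')
print(name)

# Changes to os.environ are inherited by subprocesses launched later.
os.environ['NAME'] = 'dave'
result = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["NAME"])'],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print(result.stdout.strip())     # dave
```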
### Program Exit
Program exit is handled through exceptions.
```python
raise SystemExit
raise SystemExit(exitcode)
raise SystemExit('Informative message')
```
An alternative.
```python
import sys
sys.exit(exitcode)
```
A non-zero exit code indicates an error.
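Since exit is just an exception, it can be caught and inspected like any other. The sketch below shows that `sys.exit()` raises `SystemExit` and that the value passed in is available as the `code` attribute.

```python
import sys

try:
    sys.exit(2)                  # Raises SystemExit(2)
except SystemExit as e:
    code = e.code
print(code)                      # 2

try:
    raise SystemExit('Informative message')
except SystemExit as e:
    message = e.code
print(message)                   # Informative message
```

If a `SystemExit` carrying a string is not caught, Python prints the message to `sys.stderr` and exits with status 1.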
### The `#!` line
On Unix, the `#!` line can launch a script as Python.
Add the following to the first line of your script file.
```python
#!/usr/bin/env python3
# prog.py
...
```
It requires the executable permission.
```bash
bash % chmod +x prog.py
# Then you can execute
bash % ./prog.py
... output ...
```
*Note: The Python Launcher on Windows also looks for the `#!` line to indicate language version.*
### Script Template
Here is a common code template for Python programs that run as command-line scripts:
```python
#!/usr/bin/env python3
# prog.py
# Import statements (libraries)
import modules
# Functions
def spam():
    ...

def blah():
    ...

# Main function
def main(argv):
    # Parse command line args, environment, etc.
    ...

if __name__ == '__main__':
    import sys
    main(sys.argv)
```
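One benefit of passing the argument list into `main()` as a parameter, instead of reading `sys.argv` inside it, is that the function can be called directly from the interactive prompt or from a test. As a sketch, here is a hypothetical script `sumargs.py` that adds up numbers given on the command line. The `__main__` check is left out so the example can be pasted and run anywhere.

```python
# sumargs.py (hypothetical example)

def total(values):
    # Convert each text argument to a number and add them up.
    return sum(float(v) for v in values)

def main(argv):
    if len(argv) < 2:
        raise SystemExit(f'Usage: {argv[0]} number [number ...]')
    result = total(argv[1:])
    print('Total:', result)
    return result

# Simulates: python3 sumargs.py 1 2.5 4
main(['sumargs.py', '1', '2.5', '4'])
```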
## Exercises
### (a) `main()` functions
In the file `report.py` add a `main()` function that accepts a list of command line options and produces the same output as before.
You should be able to run it interactively like this:
```python
>>> import report
>>> report.main(['report.py', 'Data/portfolio.csv', 'Data/prices.csv'])
Name Shares Price Change
---------- ---------- ---------- ----------
AA 100 39.91 7.71
IBM 50 106.11 15.01
CAT 150 78.58 -4.86
MSFT 200 30.47 -20.76
GE 95 37.38 -2.99
MSFT 50 30.47 -34.63
IBM 100 106.11 35.67
>>>
```
Modify the `pcost.py` file so that it has a similar `main()` function:
```python
>>> import pcost
>>> pcost.main(['pcost.py', 'Data/portfolio.csv'])
Total cost: 44671.15
>>>
```
### (b) Making Scripts
Modify the `report.py` and `pcost.py` programs so that they can execute as a script on the command line:
```bash
bash $ python3 report.py Data/portfolio.csv Data/prices.csv
Name Shares Price Change
---------- ---------- ---------- ----------
AA 100 39.91 7.71
IBM 50 106.11 15.01
CAT 150 78.58 -4.86
MSFT 200 30.47 -20.76
GE 95 37.38 -2.99
MSFT 50 30.47 -34.63
IBM 100 106.11 35.67
bash $ python3 pcost.py Data/portfolio.csv
Total cost: 44671.15
```


@@ -0,0 +1,132 @@
# 3.6 Design Discussion
In this section we consider some design decisions made in code so far.
### Filenames versus Iterables
Compare these two programs that return the same output.
```python
# Provide a filename
def read_data(filename):
    records = []
    with open(filename) as f:
        for line in f:
            ...
            records.append(r)
    return records

d = read_data('file.csv')
```
```python
# Provide lines
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records

with open('file.csv') as f:
    d = read_data(f)
```
* Which of these functions do you prefer? Why?
* Which of these functions is more flexible?
### Deep Idea: "Duck Typing"
[Duck Typing](https://en.wikipedia.org/wiki/Duck_typing) is a computer programming concept to determine whether an object can be used for a particular purpose. It is an application of the [duck test](https://en.wikipedia.org/wiki/Duck_test).
> If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
In our previous example that reads the lines, our `read_data` expects
any iterable object. Not just the lines of a file.
```python
def read_data(lines):
    records = []
    for line in lines:
        ...
        records.append(r)
    return records
```
This means that we can use it with other *lines*.
```python
# A CSV file
lines = open('data.csv')
data = read_data(lines)
# A zipped file
lines = gzip.open('data.csv.gz','rt')
data = read_data(lines)
# The Standard Input
lines = sys.stdin
data = read_data(lines)
# A list of strings
lines = ['ACME,50,91.1','IBM,75,123.45', ... ]
data = read_data(lines)
```
There is considerable flexibility with this design.
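To make this concrete, here is a runnable sketch with the elided body filled in. The three-column `name,shares,price` layout and the tuple record format are assumptions made for this example only.

```python
import io

def read_data(lines):
    # Works with any iterable that yields comma-separated text lines.
    records = []
    for line in lines:
        name, shares, price = line.strip().split(',')
        records.append((name, int(shares), float(price)))
    return records

# A list of strings
from_list = read_data(['ACME,50,91.1', 'IBM,75,123.45'])

# A file-like object (an in-memory text stream)
from_file_like = read_data(io.StringIO('ACME,50,91.1\nIBM,75,123.45\n'))

print(from_list == from_file_like)    # True
```

The function never checks what kind of object it was given. It only requires that iterating over it produces lines of text.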
*Question: Shall we embrace or fight this flexibility?*
### Library Design Best Practices
Code libraries are often better served by embracing flexibility.
Don't restrict your options. With great flexibility comes great power.
## Exercise
### (a) From filenames to file-like objects
In this section, you worked on a file `fileparse.py` that contained a
function `parse_csv()`. The function worked like this:
```pycon
>>> import fileparse
>>> portfolio = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>>
```
Right now, the function expects to be passed a filename. However, you
can make the code more flexible. Modify the function so that it works
with any file-like/iterable object. For example:
```pycon
>>> import fileparse
>>> import gzip
>>> with gzip.open('Data/portfolio.csv.gz', 'rt') as f:
...     port = fileparse.parse_csv(f, types=[str,int,float])
...
>>> lines = ['name,shares,price', 'AA,100,34.23', 'IBM,50,91.1', 'HPE,75,45.1']
>>> port = fileparse.parse_csv(lines, types=[str,int,float])
>>>
```
In this new code, what happens if you pass a filename as before?
```pycon
>>> port = fileparse.parse_csv('Data/portfolio.csv', types=[str,int,float])
>>> port
... look at output (it should be crazy) ...
>>>
```
With flexibility comes power and with power comes responsibility. Sometimes you'll
need to be careful.
### (b) Fixing existing functions
Fix the `read_portfolio()` and `read_prices()` functions in the
`report.py` file so that they work with the modified version of
`parse_csv()`. This should only involve a minor modification.
Afterwards, your `report.py` and `pcost.py` programs should work
the same way they always did.