Added sections 2-4

2020-05-25 12:40:11 -05:00
parent 7a4423dee4
commit c06face3a5
21 changed files with 5639 additions and 0 deletions
--- a/Notes/02_Working_with_data/04_Sequences.md
+++ b/Notes/02_Working_with_data/04_Sequences.md
@@ -0,0 +1,538 @@
+# 2.4 Sequences
+
+In this part, we look at some common idioms for working with sequence data.
+
+### Introduction
+
+Python has three *sequences* datatypes.
+
+* String: `'Hello'`. A string is considered a sequence of characters.
+* List: `[1, 4, 5]`.
+* Tuple: `('GOOG', 100, 490.1)`.
+
+All sequences are ordered and have length.
+
+```python
+a = 'Hello'               # String
+b = [1, 4, 5]             # List
+c = ('GOOG', 100, 490.1)  # Tuple
+
+# Indexed order
+a[0]                      # 'H'
+b[-1]                     # 5
+c[1]                      # 100
+
+# Length of sequence
+len(a)                    # 5
+len(b)                    # 3
+len(c)                    # 3
+```
+
+Sequences can be replicated: `s * n`.
+
+```pycon
+>>> a = 'Hello'
+>>> a * 3
+'HelloHelloHello'
+>>> b = [1, 2, 3]
+>>> b * 2
+[1, 2, 3, 1, 2, 3]
+>>>
+```
+
+Sequences of the same type can be concatenated: `s + t`.
+
+```pycon
+>>> a = (1, 2, 3)
+>>> b = (4, 5)
+>>> a + b
+(1, 2, 3, 4, 5)
+>>>
+>>> c = [1, 5]
+>>> a + c
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: can only concatenate tuple (not "list") to tuple
+```
+
+### Slicing
+
+Slicing means to take a subsequence from a sequence.
+The syntax used is `s[start:end]`. Where `start` and `end` are the indexes of the subsequence you want.
+
+```python
+a = [0,1,2,3,4,5,6,7,8]
+
+a[2:5]    # [2,3,4]
+a[-5:]    # [4,5,6,7,8]
+a[:3]     # [0,1,2]
+```
+
+* Indices `start` and `end` must be integers.
+* Slices do *not* include the end value.
+* If indices are omitted, they default to the beginning or end of the list.
+
+### Slice re-assignment
+
+Slices can also be reassigned and deleted.
+
+```python
+# Reassignment
+a = [0,1,2,3,4,5,6,7,8]
+a[2:4] = [10,11,12]       # [0,1,10,11,12,4,5,6,7,8]
+```
+
+*Note: The reassigned slice doesn't need to have the same length.*
+
+```python
+# Deletion
+a = [0,1,2,3,4,5,6,7,8]
+del a[2:4]                # [0,1,4,5,6,7,8]
+```
+
+### Sequence Reductions
+
+There are some functions to reduce a sequence to a single value.
+
+```pycon
+>>> s = [1, 2, 3, 4]
+>>> sum(s)
+10
+>>> min(s) 1
+>>> max(s) 4
+>>> t = ['Hello', 'World']
+>>> max(t)
+'World'
+>>>
+```
+
+### Iteration over a sequence
+
+The for-loop iterates over the elements in the sequence.
+
+```pycon
+>>> s = [1, 4, 9, 16]
+>>> for i in s:
+...     print(i)
+...
+1
+4
+9
+16
+>>>
+```
+
+On each iteration of the loop, you get a new item to work with.
+This new value is placed into an iteration variable. In this example, the
+iteration variable is `x`:
+
+```python
+for x in s:         # `x` is an iteration variable
+    ...statements
+```
+
+In each iteration, it overwrites the previous value (if any).
+After the loop finishes, the variable retains the last value.
+
+### `break` statement
+
+You can use the `break` statement to break out of a loop before it finishes iterating all of the elements.
+
+```python
+for name in namelist:
+    if name == 'Jake':
+        break
+    ...
+    ...
+statements
+```
+
+When the `break` statement is executed, it will exit the loop and move
+on the next `statements`.  The `break` statement only applies to the
+inner-most loop. If this loop is within another loop, it will not
+break the outer loop.
+
+### `continue` statement
+
+To skip one element and move to the next one you use the `continue` statement.
+
+```python
+for line in lines:
+    if line == '\n':    # Skip blank lines
+        continue
+    # More statements
+    ...
+```
+
+This is useful when the current item is not of interest or needs to be ignored in the processing.
+
+### Looping over integers
+
+If you need to count, use `range()`.
+
+```python
+for i in range(100):
+    # i = 0,1,...,99
+```
+
+The syntax is `range([start,] end [,step])`
+
+```python
+for i in range(100):
+    # i = 0,1,...,99
+for j in range(10,20):
+    # j = 10,11,..., 19
+for k in range(10,50,2):
+    # k = 10,12,...,48
+    # Notice how it counts in steps of 2, not 1.
+```
+
+* The ending value is never included. It mirrors the behavior of slices.
+* `start` is optional. Default `0`.
+* `step` is optional. Default `1`.
+
+### `enumerate()` function
+
+The `enumerate` function provides a loop with an extra counter value.
+
+```python
+names = ['Elwood', 'Jake', 'Curtis']
+for i, name in enumerate(names):
+    # Loops with i = 0, name = 'Elwood'
+    # i = 1, name = 'Jake'
+    # i = 2, name = 'Curtis'
+```
+
+How to use enumerate: `enumerate(sequence [, start = 0])`. `start` is optional.
+A good example of using `enumerate()` is tracking line numbers while reading a file:
+
+```python
+with open(filename) as f:
+    for lineno, line in enumerate(f, start=1):
+        ...
+```
+
+In the end, `enumerate` is just a nice shortcut for:
+
+```python
+i = 0
+for x in s:
+    statements
+    i += 1
+```
+
+Using `enumerate` is less typing and runs slightly faster.
+
+### For and tuples
+
+You can loop with multiple iteration variables.
+
+```python
+points = [
+  (1, 4),(10, 40),(23, 14),(5, 6),(7, 8)
+]
+for x, y in points:
+    # Loops with x = 1, y = 4
+    #            x = 10, y = 40
+    #            x = 23, y = 14
+    #            ...
+```
+
+When using multiple variables, each tuple will be *unpacked* into a set of iteration variables.
+
+### `zip()` function
+
+The `zip` function takes sequences and makes an iterator that combines them.
+
+```python
+columns = ['name', 'shares', 'price']
+values = ['GOOG', 100, 490.1 ]
+pairs = zip(a, b)
+# ('name','GOOG'), ('shares',100), ('price',490.1)
+```
+
+To get the result you must iterate. You can use multiple variables to unpack the tuples as shown earlier.
+
+```python
+for column, value in pairs:
+    ...
+```
+
+A common use of `zip` is to create key/value pairs for constructing dictionaries.
+
+```python
+d = dict(zip(columns, values))
+```
+
+## Exercises
+
+### (a) Counting
+
+Try some basic counting examples:
+
+```pycon
+>>> for n in range(10):            # Count 0 ... 9
+        print(n, end=' ')
+
+0 1 2 3 4 5 6 7 8 9
+>>> for n in range(10,0,-1):       # Count 10 ... 1
+        print(n, end=' ')
+
+10 9 8 7 6 5 4 3 2 1
+>>> for n in range(0,10,2):        # Count 0, 2, ... 8
+        print(n, end=' ')
+
+0 2 4 6 8
+>>>
+```
+
+### (b) More sequence operations
+
+Interactively experiment with some of the sequence reduction operations.
+
+```pycon
+>>> data = [4, 9, 1, 25, 16, 100, 49]
+>>> min(data)
+1
+>>> max(data)
+100
+>>> sum(data)
+204
+>>>
+```
+
+Try looping over the data.
+
+```pycon
+>>> for x in data:
+        print(x)
+
+4
+9
+...
+>>> for n, x in enumerate(data):
+        print(n, x)
+
+0 4
+1 9
+2 1
+...
+>>>
+```
+
+Sometimes the `for` statement, `len()`, and `range()` get used by
+novices in some kind of horrible code fragment that looks like it
+emerged from the depths of a rusty C program.
+
+```pycon
+>>> for n in range(len(data)):
+        print(data[n])
+
+4
+9
+1
+...
+>>>
+```
+
+Don’t do that! Not only does reading it make everyone’s eyes bleed, it’s inefficient with memory and it runs a lot slower.
+Just use a normal `for` loop if you want to iterate over data.  Use `enumerate()` if you happen to need the index for some reason.
+
+### (c) A practical `enumerate()` example
+
+Recall that the file `Data/missing.csv` contains data for a stock portfolio, but has some rows with missing data.
+Using `enumerate()` modify your `pcost.py` program so that it prints a line number with the warning message when it encounters bad input.
+
+```python
+>>> cost = portfolio_cost('Data/missing.csv')
+Row 4: Couldn't convert: ['MSFT', '', '51.23']
+Row 7: Couldn't convert: ['IBM', '', '70.44']
+>>>
+```
+
+To do this, you’ll need to change just a few parts of your code.
+
+```python
+...
+for rowno, row in enumerate(rows, start=1):
+    try:
+        ...
+    except ValueError:
+        print(f'Row {rowno}: Bad row: {row}')
+```
+
+### (d) Using the `zip()` function
+
+In the file `portfolio.csv`, the first line contains column headers. In all previous code, we’ve been discarding them.
+
+```pycon
+>>> f = open('Data/portfolio.csv')
+>>> rows = csv.reader(f)
+>>> headers = next(rows)
+>>> headers
+['name', 'shares', 'price']
+>>>
+```
+
+However, what if you could use the headers for something useful? This is where the `zip()` function enters the picture.
+First try this to pair the file headers with a row of data:
+
+```pycon
+>>> row = next(rows)
+>>> row
+['AA', '100', '32.20']
+>>> list(zip(headers, row))
+[ ('name', 'AA'), ('shares', '100'), ('price', '32.20') ]
+>>>
+```
+
+Notice how `zip()` paired the column headers with the column values.
+We’ve used `list()` here to turn the result into a list so that you
+can see it. Normally, `zip()` creates an iterator that must be
+consumed by a for-loop.
+
+This pairing is just an intermediate step to building a dictionary. Now try this:
+
+```pycon
+>>> record = dict(zip(headers, row))
+>>> record
+{'price': '32.20', 'name': 'AA', 'shares': '100'}
+>>>
+```
+
+This transformation is one of the most useful tricks to know about
+when processing a lot of data files.  For example, suppose you wanted
+to make the `pcost.py` program work with various input files, but
+without regard for the actual column number where the name, shares,
+and price appear.
+
+Modify the `portfolio_cost()` function in `pcost.py` so that it looks like this:
+
+```python
+# pcost.py
+
+def portfolio_cost(filename):
+    ...
+        for rowno, row in enumerate(rows, start=1):
+            record = dict(zip(headers, row))
+            try:
+                nshares = int(record['shares'])
+                price = float(record['price'])
+                total_cost += nshares * price
+            # This catches errors in int() and float() conversions above
+            except ValueError:
+                print(f'Row {rowno}: Bad row: {row}')
+        ...
+```
+
+Now, try your function on a completely different data file `Data/portfoliodate.csv` which looks like this:
+
+```csv
+name,date,time,shares,price
+"AA","6/11/2007","9:50am",100,32.20
+"IBM","5/13/2007","4:20pm",50,91.10
+"CAT","9/23/2006","1:30pm",150,83.44
+"MSFT","5/17/2007","10:30am",200,51.23
+"GE","2/1/2006","10:45am",95,40.37
+"MSFT","10/31/2006","12:05pm",50,65.10
+"IBM","7/9/2006","3:15pm",100,70.44
+```
+
+```python
+>>> portfolio_cost('Data/portfoliodate.csv')
+44671.15
+>>>
+```
+
+If you did it right, you’ll find that your program still works even
+though the data file has a completely different column format than
+before. That’s cool!
+
+The change made here is subtle, but significant.  Instead of
+`portfolio_cost()` being hardcoded to read a single fixed file format,
+the new version reads any CSV file and picks the values of interest
+out of it.  As long as the file has the required columns, the code will work.
+
+Modify the `report.py` program you wrote in Section 2.3 that it uses
+the same technique to pick out column headers.
+
+Try running the `report.py` program on the `Data/portfoliodate.csv` file and see that it
+produces the same answer as before.
+
+### (e) Inverting a dictionary
+
+A dictionary maps keys to values. For example, a dictionary of stock prices.
+
+```pycon
+>>> prices = {
+        'GOOG' : 490.1,
+        'AA' : 23.45,
+        'IBM' : 91.1,
+        'MSFT' : 34.23
+    }
+>>>
+```
+
+If you use the `items()` method, you can get `(key,value)` pairs:
+
+```pycon
+>>> prices.items()
+dict_items([('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)])
+>>>
+```
+
+However, what if you wanted to get a list of `(value, key)` pairs instead?
+*Hint: use `zip()`.*
+
+```pycon
+>>> pricelist = list(zip(prices.values(),prices.keys()))
+>>> pricelist
+[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
+>>>
+```
+
+Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data.
+
+```pycon
+>>> min(pricelist)
+(23.45, 'AA')
+>>> max(pricelist)
+(490.1, 'GOOG')
+>>> sorted(pricelist)
+[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
+>>>
+```
+
+This also illustrates an important feature of tuples. When used in
+comparisons, tuples are compared element-by-element starting with the
+first item. Similar to how strings are compared
+character-by-character.
+
+`zip()` is often used in situations like this where you need to pair
+up data from different places.  For example, pairing up the column
+names with column values in order to make a dictionary of named
+values.
+
+Note that `zip()` is not limited to pairs. For example, you can use it
+with any number of input lists:
+
+```pycon
+>>> a = [1, 2, 3, 4]
+>>> b = ['w', 'x', 'y', 'z']
+>>> c = [0.2, 0.4, 0.6, 0.8]
+>>> list(zip(a, b, c))
+[(1, 'w', 0.2), (2, 'x', 0.4), (3, 'y', 0.6), (4, 'z', 0.8))]
+>>>
+```
+
+Also, be aware that `zip()` stops once the shortest input sequence is exhausted.
+
+```pycon
+>>> a = [1, 2, 3, 4, 5, 6]
+>>> b = ['x', 'y', 'z']
+>>> list(zip(a,b))
+[(1, 'x'), (2, 'y'), (3, 'z')]
+>>>
+```
+
+[Next](05_Collections)