Programming with Python
Data types
Learning Objectives
- Identify built-in data types in Python
- Differentiate between scalar and structured objects
- Recognize mutable and immutable objects
- Convert between data types in Python
Before we go much further into numerical modeling, we should stop and discuss some of the inner workings of Python. Recognizing the way values can be handled by Python will give you flexibility in programming and help you avoid common errors.
Early in the previous lesson, we saw that we could assign a value to a variable using the symbol =:
elevation_ft = 5430 # elevation of Boulder, CO in feet
The variable name elevation_ft is not itself the value 5430. It is simply a label that points to the place in memory where the object with the value 5430 is stored.
This is different from the way the symbol = is used in algebra. An equation like this one represents different things in Python and in algebra:
x = 4 + 1
In both cases, the letter ‘x’ corresponds to the value 5. In algebra, ‘x’ is equivalent to 5; the symbol is simply taking the place of the number. In Python, ‘x’ is not itself 5; it is a name that points to an object with a value of 5. The variable name ‘x’ is short-hand for the address where the object is stored in the memory.
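As a minimal sketch of the idea that a name is a label for an object, the example below gives one object two names. It uses the built-in function id() and the operator is, neither of which appears earlier in this lesson, and the name y is made up for illustration. The print() calls are written in the function form used by Python 3.

```python
x = 4 + 1   # x is a name for an object with the value 5
y = x       # y is a second name for the same object

# id() returns an identifier for the object that a name points to
same_object = id(x) == id(y)
print(same_object)   # True: both names point to one object
print(x is y)        # the operator 'is' asks the same question directly
```

Assigning y = x does not copy the value; it only attaches another label to the object that x already points to.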
Objects are classified into different classes or data types that define the kinds of things that a program can do with those objects. An integer (like 5430 above) is one type of object, the string “Hello, World!” is also an object, and the numpy array of elevation values in the previous lesson was another type of object.
Scalar objects
Objects are either scalar or non-scalar. Scalar objects are the building blocks of data. They hold a single value and cannot be divided. Non-scalar objects hold sets of elements within some internal structure. Computers operate directly on scalar objects but have to iterate through the elements of a non-scalar object in order to process it.
The term scalar comes from linear algebra, where it is used to differentiate a single number from a vector or matrix.
- Integers
We can use the built-in function type to see what type a particular object is:
type(5430)
int
The number 5430 is an object of type int, or integer. We can also use type to see the type of the object that a variable is assigned to:
type(elevation_ft)
int
The variable name elevation_ft is assigned to an object with the value 5430, which is of type int. Integer is one of several built-in data types in Python. Because they are built in, we don’t need to load a library to use them.
- Floats
Real numbers (potentially with decimals) are floating point numbers or floats:
elevation_m = 1655.064 # elevation of Boulder, CO in meters
type(elevation_m)
float
A number doesn’t need to have a meaningful fractional part to be a float. Just adding a decimal point to a whole number makes it a float:
print '7 is', type(7)
print '-' * 20
print '7. is', type(7.)
print '7.0 is', type(7.0)
7 is <type 'int'>
--------------------
7. is <type 'float'>
7.0 is <type 'float'>
- Booleans
Other types of objects in Python are a bit more unusual. Boolean objects can take one of two values: True or False. We will see in a later lesson that boolean objects are produced by operations that compare values against one another and are used by conditional statements to decide which code to run.
You’ll notice that the words True and False change color when you type them into a Jupyter Notebook. They look different because they are recognized as special keywords. This only works when True and False are capitalized, though! Python does not treat lower case true and false as boolean objects.
i_like_chocolate = True
type(i_like_chocolate)
bool
When used in an arithmetic operation, a boolean object acts like an integer. True takes a value of 1 and False a value of 0:
print '3 * True:', 3 * True
print '3.0 * True:', 3.0 * True
print '3.0 * False:', 3.0 * False
3 * True: 3
3.0 * True: 3.0
3.0 * False: 0.0
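Because True acts like 1 and False like 0, adding booleans together counts how many of them are True. The sketch below illustrates this with made-up variable names; it is only meant as an example of the arithmetic described above, written with print() in the function form used by Python 3.

```python
# Hypothetical answers to the question "do you like chocolate?"
alice_likes_chocolate = True
bob_likes_chocolate = False
carol_likes_chocolate = True

# Each True contributes 1 to the sum and each False contributes 0
number_of_fans = alice_likes_chocolate + bob_likes_chocolate + carol_likes_chocolate
print(number_of_fans)         # 2
print(type(number_of_fans))   # the result of the sum is an int, not a bool
```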
We can cast objects of any type into a boolean using the function bool():
print bool(127.3)
True
- NoneType
The most abstract of data types in Python is the NoneType. NoneType objects can only contain the special constant None. None is the value that an object takes when no value is set or in the absence of a value. None is a null or NoData value. It is not the same as False, it is not 0 and it is not an empty string. None is nothing.
If you compare None to anything other than None, None will always be less than the other value (in Python 3, comparing None to another object with < or > will instead produce an error):
nothing = None
print type(nothing)
print nothing > -4
print nothing == nothing # double == compares for equivalency
<type 'NoneType'>
False
True
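For readers running Python 3, the sketch below shows what happens to the same greater-than comparison there. It assumes Python 3 and uses try and except, which this lesson has not introduced, to catch the error instead of stopping the script. The name comparison_worked is made up for illustration.

```python
nothing = None

try:
    result = nothing > -4       # allowed in Python 2, a TypeError in Python 3
    comparison_worked = True
except TypeError:
    comparison_worked = False

print(comparison_worked)        # False when run with Python 3
print(nothing is None)          # True: 'is None' is the usual way to test for None
```

Testing with is None works the same way in both versions of Python, so it is the safer habit.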
Why would you ever want to create an object that contains nothing at all? As you build more complex programs, you’ll find many situations where you might want to set a variable but don’t want to assign a value to it quite yet. For example, you might want your code to perform one action if the user sets a certain variable but perform a different action if the user does nothing:
input_from_user = None
# The user might or might not provide input here.
# If the user provides input, the value would be
# assigned to the variable input_from_user
if input_from_user is None:
    print "The user hasn't said anything!"
if input_from_user is not None:
    print "The user said:", input_from_user
The user hasn't said anything!
Try assigning an object of a different type to input_from_user to see how the script behaves.
Numeric data types
What type of object are these values?
- 5.6
- 1932
- 7.0000
Solution
- float
- int
- float
- float
Casting and integer division
Think about the operations that occur when running the following statements. Why are their outputs different?
print 'a:', 100/3
print 'b:', float(100)/3
print 'c:', 100/float(3)
print 'd:', float(100/3)
Solution
a: 33
b: 33.3333333333
c: 33.3333333333
d: 33.0
- (a) Dividing two integers results in an integer.
- (b), (c) Casting either the dividend or the divisor as a float means that the operation is no longer integer division.
- (d) The function float() is acting on the output of integer division. The remainder has already been discarded.
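The outputs above are those of Python 2. As a hedged sketch for comparison, the example below assumes Python 3, where the operator / always performs true division and the separate operator // performs the integer (floor) division described above. The variable names are made up for illustration.

```python
true_division = 100 / 3       # 33.333... in Python 3, even with two integers
floor_division = 100 // 3     # 33: the remainder is discarded
remainder = 100 % 3           # 1: the part that floor division throws away

print(true_division)
print(floor_division)
print(float(floor_division))  # 33.0: casting afterwards cannot recover the remainder
```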
Lemonade sales
You get hired to work for a highly successful lemonade stand. Their database is managed by a 7-year-old, though, so their data is a mess. These are their sales reports for FY2017:
sales_1q = ["50.3"] # thousand dollars
sales_2q = 108.52
sales_3q = 79
sales_4q = "82"
- Calculate the total sales for FY2017
Solution
total_sales = float(sales_1q[0]) + sales_2q + sales_3q + float(sales_4q)
print 'Total lemonade sales:', total_sales, 'thousand dollars'
Total lemonade sales: 319.82 thousand dollars
Casting bool
Any type of object can be cast to a boolean with the function bool(). Which of these objects converts to True and which to False?
- a negative float
- None
- the boolean object True
- the integer 0
- the float 0.0
- the string ‘string’
- an empty string
- a string that contains only a space
- 3e-324
- 2e-324
- a list with one item
- an empty list
Solution
- a negative float: True
- None: False
- the boolean object True: True
- the integer 0: False
- the float 0.0: False
- the string ‘string’: True
- an empty string: False
- a string that contains only a space: True
- 3e-324: True
- 2e-324: False
- a list with one item: True
- an empty list: False
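One way to check these answers is to let Python do the casting. The sketch below is only an illustration: the specific values chosen for "a negative float" and "a list with one item" are made up, and it uses a for loop and a list comprehension, which this lesson has not covered yet.

```python
# One example value for each item in the exercise, in the same order
candidates = [-2.5, None, True, 0, 0.0, 'string', '', ' ',
              3e-324, 2e-324, ['item'], []]

# Cast every value to a boolean
results = [bool(value) for value in candidates]

for value, result in zip(candidates, results):
    print(repr(value) + ' -> ' + str(result))
```

The two tiny floats behave differently because 3e-324 is stored as the smallest positive number a float can hold, while 2e-324 is too small to represent and is rounded to 0.0.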
Non-scalar (or structured) objects
Non-scalar objects contain multiple elements that can be separated into parts. Since they have an internal structure, we can use indexing to access the individual parts of a non-scalar object.
There are several built-in types of non-scalar objects in Python. They can be grouped according to their internal structure:
Sequences are structured objects where elements are kept in a known order. We use integer indexing and slicing to access elements based on their position.
Mapping objects map keys to values. Because the elements of a mapping object are not stored in order, we cannot select them based on their position. Instead, the keys serve as indices.
The differences between the two groups will make more sense after looking at some examples.
Sequences
- Strings
Objects of type string are simply sequences of characters with a defined order. Strings have to be enclosed in single quotes (' '), double quotes (" "), or triple single or double quotes (''' ''', """ """). A string enclosed in double quotes can itself contain single quotes (" ' ' "):
print type("The judge said 'Nobody expects the Spanish Inquisition!'")
<type 'str'>
We can cast objects of any type into strings with the function str():
str(bool(6 < 2))
'False'
We can test whether a substring exists within a string using the keyword in:
print 'a' in 'program'
print 'at' not in 'battle'
True
False
There are many methods available for objects of type string:
string = "if it's in caps i'm trying to YELL!"
print string.lower()
print string.upper()
print string.capitalize()
print string.split()
print string.replace('YELL', 'fix my keyboard')
if it's in caps i'm trying to yell!
IF IT'S IN CAPS I'M TRYING TO YELL!
If it's in caps i'm trying to yell!
['if', "it's", 'in', 'caps', "i'm", 'trying', 'to', 'YELL!']
if it's in caps i'm trying to fix my keyboard!
- Lists
A list is exactly what it sounds like – a sequence of things. The objects contained in a list don’t have to be of the same type: one list can simultaneously contain numbers, strings, other lists, numpy arrays, and even commands to run. Like other sequences, lists are ordered. We can access the individual items in a list through an integer index.
Lists are created by putting values, separated by commas, inside square brackets:
shopping_list = ['funions', 'ice cream', 'guacamole']
We can change the individual values in a list using indexing:
shopping_list[0] = 'funyuns' # oops
print shopping_list
['funyuns', 'ice cream', 'guacamole']
There are many ways to change the contents of lists besides assigning new values to individual elements:
shopping_list.append('tortilla chips') # add one item
print shopping_list
['funyuns', 'ice cream', 'guacamole', 'tortilla chips']
del shopping_list[0] # delete the first item
print shopping_list
['ice cream', 'guacamole', 'tortilla chips']
shopping_list.reverse() # reverse the order of the list (in place)
print shopping_list
['tortilla chips', 'guacamole', 'ice cream']
We can use operators to concatenate lists or build lists with repeated elements:
shopping_list = shopping_list + ['coffee', 'cheese']
print shopping_list
['tortilla chips', 'guacamole', 'ice cream', 'coffee', 'cheese']
print 3 * shopping_list[-1:]
['cheese', 'cheese', 'cheese']
There is one very important difference between lists and strings: lists can be modified in place while strings cannot.
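A minimal sketch of that difference, using made-up variable names and try and except to catch the error that the string raises. It is written with print() in the function form used by Python 3.

```python
letters = ['h', 'e', 'l', 'l', 'o']
letters[0] = 'j'              # lists can be changed in place
print(letters)                # ['j', 'e', 'l', 'l', 'o']

word = 'hello'
try:
    word[0] = 'j'             # strings do not support item assignment
    string_changed = True
except TypeError:
    string_changed = False
print(string_changed)         # False

# To "change" a string, build a new one and reassign the name
word = 'j' + word[1:]
print(word)                   # jello
```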
- Tuples
Like lists, tuples are simply sequences of objects. Tuples, however, are immutable objects. We can only change the values in a tuple by assigning the variable name to a new object.
Tuples are created by putting values in a sequence, separated by commas. For easier reading, they are usually inside parentheses:
things = ('toy cars', 42, 'dinosaur')
print type(things)
<type 'tuple'>
Because they are sequences, we can use indexing to access individual values in a tuple:
print things[0]
toy cars
However, because they are immutable objects, we cannot use indexing to change the values of a tuple:
things[0] = 'toy airplanes'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-151-546e5c83c872> in <module>()
----> 1 things[0] = 'toy airplanes'
TypeError: 'tuple' object does not support item assignment
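To get the effect of changing a value, we can build a new tuple and assign the variable name to it. The sketch below is one way to do that; the name original is made up for illustration and is only there to show that the first tuple is left untouched.

```python
things = ('toy cars', 42, 'dinosaur')
original = things                          # a second name for the same tuple

# Build a new tuple from a one-item tuple and a slice of the old one
things = ('toy airplanes',) + things[1:]

print(things)               # ('toy airplanes', 42, 'dinosaur')
print(original)             # ('toy cars', 42, 'dinosaur')
print(things is original)   # False: things now points to a different object
```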
Mapping types
- Dictionaries
Because values in sequences are stored in a known order, individual values in sequence-type objects can be accessed by their position through integer indices. Dictionaries are a type of object where values are not stored in any particular order. Dictionaries are unordered collections of key:value pairs. They map (or match) keys, which can be any immutable type (strings, numbers, tuples), to values, which can be of any type (heterogeneous). Individual values in a dictionary are accessed by their keys.
We create dictionaries with curly brackets and pairs of keys and values. An empty dictionary would simply have no key:value pairs inside the curly brackets:
person = {'name':'Jack', 'age': 32}
print person
{'age': 32, 'name': 'Jack'}
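Notice that the order of the key:value pairs is different in the dictionary definition than in the output! Because values in a dictionary are not stored in a particular order, they take an arbitrary order when the dictionary is displayed. (In Python 3.7 and later, dictionaries keep the order in which keys were added, but values are still accessed by key rather than by position.)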
We can access and modify individual values in a dictionary with their keys:
person['age'] = 33
print person
{'age': 33, 'name': 'Jack'}
We can also use keys to add values to a previously defined dictionary:
person['address'] = 'Downtown Boulder'
print person
{'age': 33, 'name': 'Jack', 'address': 'Downtown Boulder'}
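The rule that keys must be immutable can be checked directly. In the sketch below, the dictionary name and the Denver entry are made up for illustration. It also uses the keyword in and the dictionary method get(), which look up a key without raising an error when the key is missing.

```python
# A tuple is immutable, so it can be used as a key
elevations_ft = {('Boulder', 'CO'): 5430}

try:
    elevations_ft[['Denver', 'CO']] = 5280   # a list is mutable, so this fails
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)                       # False
print(('Boulder', 'CO') in elevations_ft)     # True: 'in' checks the keys
print(elevations_ft.get(('Denver', 'CO')))    # None: the key was never added
```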
String methods
Can you explain what this script does?
string = "if it's in caps i'm trying to YELL!"
print string.find('caps')
Modify the command so that it finds the substring (‘caps’) even if capitalization is different in the string (ex. ‘CAPS’).
What is the output of find if the substring is not in the string? What happens if the substring appears more than once in the string? (ex. ‘in’)
Solution
loc = string.find('caps')
print string[loc:]
caps i'm trying to YELL!
The method find returns the start index of the substring.
# change capitalization for testing
string = "if it's in cAPs i'm trying to YELL!"
print 'If substring not in string:', string.find('caps')
# force string to lowercase
print 'If find substring:', string.lower().find('caps')
print "Returns only first occurrence of substring 'in':", string.lower().find('in')
If substring not in string: -1
If find substring: 11
Returns only first occurrence of substring 'in': 8
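If we need the later occurrences as well, one option is to pass find a second argument with the index where the search should start. The sketch below assumes that approach and also uses the string method count(); the variable names are made up for illustration.

```python
string = "if it's in caps i'm trying to YELL!"

first = string.find('in')               # index of the first 'in'
second = string.find('in', first + 1)   # start searching just after the first match
how_many = string.count('in')           # number of non-overlapping occurrences

print(first)      # 8
print(second)     # 23 (the 'in' inside 'trying')
print(how_many)   # 2
```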
Numbers in strings
You decided to quit research and open a bar. You are using Python to create a sign.
age = 21 # <--- don't change this line!
sign = 'You must be ' + age + '-years-old to enter this bar'
print sign
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-253b33d26e7e> in <module>()
1 age = 21 # <--- don't change this line!
----> 2 sign = 'You must be ' + age + '-years-old to enter this bar'
3 print sign
TypeError: cannot concatenate 'str' and 'int' objects
Fix your code so it prints the text in sign correctly. Don’t change the first line!
Solution
age = 21
sign = 'You must be ' + str(age) + '-years-old to enter this bar'
print sign
You must be 21-years-old to enter this bar
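Casting with str() is one solution. Another common option, sketched below, is the string method format(), which converts the value to text and places it where the curly brackets are. This is an alternative to the solution above, not a replacement for it.

```python
age = 21   # <--- still unchanged

# format() converts age to a string and substitutes it for {}
sign = 'You must be {}-years-old to enter this bar'.format(age)
print(sign)
```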
Cheeeeeeeeese
What is the difference between these two statements? Why are their outputs different?
s1 = 3 * shopping_list[-1:]
s2 = 3 * shopping_list[-1]
Solution
print s1, type(s1)
print s2, type(s2)
['cheese', 'cheese', 'cheese'] <type 'list'>
cheesecheesecheese <type 'str'>
shopping_list[-1] is the last value in the list, which is a string. The second statement is therefore repeating a string three times.
shopping_list[-1:] is a slice of a list, so it is also a list (even if it only has one value). The first statement is therefore repeating a list three times.
Human readable numbers
When we write down a large integer, it’s customary to use commas (or periods, depending on the country) to separate the number into groups of three digits. It’s easier for humans to read a large number with separators but Python sees them as something else. What type of object is this? Why does Python read it as this object type?
my_account_balance = 15,752,000,000
Solution
my_account_balance = 15,752,000,000
type(my_account_balance)
tuple
You don’t actually need the parentheses to create a tuple. Python reads any sequence of objects separated by commas as a tuple.
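If a number arrives as text with separators, one way to turn it into an integer is to remove the commas before casting. The sketch below assumes the number is stored as a string (the name balance_text is made up) and also shows that format() can put the separators back for display.

```python
balance_text = '15,752,000,000'   # the number as a human would write it

# Remove the separators, then cast the remaining digits to an integer
balance = int(balance_text.replace(',', ''))
print(balance)                    # 15752000000

# Going the other way: format an integer with comma separators
readable = '{:,}'.format(balance)
print(readable)                   # 15,752,000,000
```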
Tiny tuples
Create a tuple that contains only one value. Confirm that it’s really a tuple. You might have to experiment!
Hint: Start with a tuple with two values and simplify it.
Solution
lil_tuple = 1,
type(lil_tuple)
tuple
Travel guide
- Create an empty dictionary called “states”
- Add 3 items to the dictionary. Map state names (the keys) to their abbreviations (the values) (ex. ‘Wyoming’:‘WY’). Pick easy ones! You can also use states from another country.
Solution
states = {}
states['Colorado'] = 'CO'
states['California'] = 'CA'
states['Florida'] = 'FL'
- Use a variable in place of a key to access values in your states dictionary. For example, if I set the variable to “Wyoming”, the value should be “WY”.
Solution
selected_state = 'California'
print states[selected_state]
CA
- Create a dictionary called “cities” that contains 3 key:value pairs. The keys should be the state abbreviations in your states dictionary and the values should be the name of one city in each of those states (ex. ‘WY’:‘Laramie’). Don’t start with an empty dictionary and add values to it; initialize the dictionary with all of the key:value pairs already in it.
Solution
cities = {'CO':'Denver', 'FL':'Miami', 'CA':'San Francisco'}
Travel guide, part II (Advanced)
Write a short script to fill in the blanks in this string for any state in your states dictionary:
__________ is abbreviated ____ and has cities like ________
Refactor (rewrite, improve) your code so you only have to change one word in your script to change states.
Hint: The values in one of your dictionaries are the keys for the other dictionary
Solution
selected_state = 'Colorado'
print selected_state + ' is abbreviated ' + states[selected_state] + ' and has cities like ' + cities[states[selected_state]]
Colorado is abbreviated CO and has cities like Denver