# 05_api.py (forked from JamesByers/GA-SEA-DAT2)
'''
CLASS: Getting Data from APIs
What is an API?
- Application Programming Interface
- Structured way to expose specific functionality and data access to users
- Web APIs usually follow the "REST" standard
How to interact with a REST API:
- Make a "request" to a specific URL (an "endpoint"), and get the data back in a "response"
- Most relevant request method for us is GET (other methods: POST, PUT, DELETE)
- Response is often JSON format
- Web console is sometimes available (allows you to explore an API)
'''
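# A minimal sketch of the request/response cycle described above, using only
# the standard library so that it runs without network access. The query
# parameters mirror the OMDb URLs used later in this file; the response body
# is a made-up sample (an assumption), not real API output.
import json
from urllib.parse import urlencode

# a GET request carries its parameters in the URL's query string
sample_params = {'t': 'the shawshank redemption', 'r': 'json', 'type': 'movie'}
sample_url = 'http://www.omdbapi.com/?' + urlencode(sample_params)
sample_url

# a JSON response body is text that decodes into a Python dictionary
sample_body = '{"Title": "Example Movie", "Year": "1994", "Response": "True"}'
sample_info = json.loads(sample_body)
sample_info['Year']    # still a string: convert with int() before doing math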
# read IMDb data into a DataFrame: we want a year column!
import pandas as pd
movies = pd.read_csv('imdb_1000.csv')
movies.head(10)
# use requests library to interact with a URL
import requests
r = requests.get('http://www.omdbapi.com/?t=the shawshank redemption&r=json&type=movie')
# check the status: 200 means success, 4xx means client error, 5xx means server error
r.status_code
# view the raw response text
r.text
# decode the JSON response body into a dictionary
r.json()
# extract the year and the plot from the dictionary
r.json()['Year']
r.json()['Plot']
# what happens if the movie name is not recognized?
r = requests.get('http://www.omdbapi.com/?t=blahblahblah&r=json&type=movie')
r.status_code
r.json()
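# A sketch of why the status code alone may not be enough: an API can answer
# with status 200 and still report a failed lookup inside the JSON body.
# describe_lookup is a hypothetical helper; it checks the 'Response' field
# used in this file, while the 'Error' key and both sample dictionaries are
# assumptions for illustration only.
def describe_lookup(status_code, info):
    if status_code != 200:
        return 'HTTP error ' + str(status_code)
    if info.get('Response') == 'True':
        return 'found: ' + info.get('Year', 'unknown year')
    return 'not found: ' + info.get('Error', 'no error message')

describe_lookup(200, {'Response': 'True', 'Year': '1994'})
describe_lookup(200, {'Response': 'False', 'Error': 'Movie not found!'})
describe_lookup(404, {})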
# define a function to return the year
def get_movie_year(title):
    r = requests.get('http://www.omdbapi.com/?t=' + title + '&r=json&type=movie')
    info = r.json()
    if info['Response'] == 'True':
        return int(info['Year'])
    else:
        # no explicit return here, so the function returns None
        print('Movie not found')
# test the function
get_movie_year('The Shawshank Redemption')
get_movie_year('blahblahblah')
# create a smaller DataFrame for testing
top_movies = movies.head().copy()
# write a for loop to build a list of years
from time import sleep
years = []
for title in top_movies.title:
    years.append(get_movie_year(title))
    sleep(1)
years
top_movies
# check that the DataFrame and the list of years are the same length
assert len(top_movies) == len(years)
# a false statement in the assert would raise an AssertionError
#assert len(top_movies) == 2
# save that list as a new column
top_movies['year'] = years
top_movies
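# Because get_movie_year returns None when a title is not found, the list of
# years can contain gaps even when its length matches the DataFrame. A small
# sketch of locating those gaps, using a made-up list in place of real results:
sample_years = [1994, None, 1972]
missing_positions = [i for i, year in enumerate(sample_years) if year is None]
missing_positions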
'''
Bonus content: Updating the DataFrame as part of a loop
'''
# enumerate allows you to access the item location while iterating
letters = ['a', 'b', 'c']
for index, letter in enumerate(letters):
    print(index, letter)
# iterrows method for DataFrames is similar
for index, row in top_movies.iterrows():
    print(index, row.title)
# create a new column and set a default value
movies['year'] = -1
# loc method allows you to access a DataFrame element by 'label'
movies.loc[0, 'year'] = 1994
# write a for loop to update the year for the first three movies
for index, row in movies.iterrows():
    if index < 3:
        movies.loc[index, 'year'] = get_movie_year(row.title)
        sleep(1)
    else:
        break
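# An alternative sketch for "only the first three": itertools.islice stops
# the iteration itself, so the loop body needs no index check and no break.
# A plain list of made-up titles stands in for the DataFrame rows, and
# fake_lookup is a hypothetical stand-in for get_movie_year so that no API
# calls are made. The same idea should carry over to
# islice(movies.iterrows(), 3).
from itertools import islice

sample_titles = ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E']
sample_year_column = [-1] * len(sample_titles)

def fake_lookup(title):
    # derive a fake year from the last letter of the title
    return 1990 + ord(title[-1]) - ord('A')

for position, sample_title in islice(enumerate(sample_titles), 3):
    sample_year_column[position] = fake_lookup(sample_title)

sample_year_column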
'''
Other considerations when accessing APIs:
- Most APIs require you to have an access key (which you should store outside your code)
- Most APIs limit the number of API calls you can make (per day, hour, minute, etc.)
- Not all APIs are free
- Not all APIs are well-documented
- Pay attention to the API version
Python wrapper is another option for accessing an API:
- Set of functions that "wrap" the API code for ease of use
- Potentially simplifies your code
- But, wrapper could have bugs or be out-of-date or poorly documented
'''
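# A sketch of two of the considerations above. It assumes a hypothetical
# environment variable named MY_API_KEY: the key is read from the environment
# rather than hard-coded, and results are cached so that repeated titles do
# not use up the call limit. slow_lookup is a made-up stand-in for a real
# API call.
import os

api_key = os.environ.get('MY_API_KEY')    # None if the variable is not set

call_count = 0

def slow_lookup(title):
    # pretend this is an expensive API call and count how often it runs
    global call_count
    call_count += 1
    return len(title)

lookup_cache = {}

def cached_lookup(title):
    # only call the "API" for titles that have not been seen before
    if title not in lookup_cache:
        lookup_cache[title] = slow_lookup(title)
    return lookup_cache[title]

cached_lookup('Jaws')
cached_lookup('Jaws')    # second call is served from the cache
call_count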