Scrapy spider to scrape recipe and nutritional data from
Data was used to provide insight of the nutritional value of various recipes.
www.allrecipes.com
. 35,516 recipes scraped, found in /export
.
Data was used to provide insight of the nutritional value of various recipes.
Data scraped for each recipe includes:
# Recipe and related information
'name', 'url', 'category', 'author', 'summary', 'rating', 'rating_count', 'review_count', 'ingredients', 'directions', 'prep', 'cook', 'total', 'Servings', 'Yield',
# Nutritional information
'calories', 'carbohydrates_g', 'sugars_g', 'fat_g', 'saturated_fat_g', 'cholesterol_mg', 'protein_g', 'dietary_fiber_g', 'sodium_mg', 'calories_from_fat',
# Additional nutritional info - Metals
'calcium_mg', 'iron_mg', 'magnesium_mg', 'potassium_mg', 'zinc_mg', 'phosphorus_mg',
# Additional nutritional info - Vitamins
'vitamin_a_iu_IU', 'niacin_equivalents_mg', 'vitamin_b6_mg', 'vitamin_c_mg', 'folate_mcg', 'thiamin_mg', 'riboflavin_mg', 'vitamin_e_iu_IU', 'vitamin_k_mcg', 'biotin_mcg', 'vitamin_b12_mcg',
# Additional nutritional info - Fats
'mono_fat_g', 'poly_fat_g', 'trans_fatty_acid_g', 'omega_3_fatty_acid_g', 'omega_6_fatty_acid_g'
Note: Many recipes do not list most of the nutrients from the 'Additional nutritional info'. A few nutrients aren't scraped and have been omitted from 'Additional nutritional info' due to its listing being extremely scarce.
{
"name":"Simple Macaroni and Cheese",
"url":"https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/",
"category":"main-dish",
"author":"g0dluvsugly",
"summary":"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",
"rating":4.42,
"rating_count":834,
"review_count":575,
"ingredients":"1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese",
"directions":"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",
"prep":"10 mins",
"cook":"20 mins",
"total":"30 mins",
"servings":"4",
"yield":"4 servings",
"calories":630.2,
"carbohydrates_g":55.0,
"sugars_g":7.6,
"fat_g":33.6,
"saturated_fat_g":20.9,
"cholesterol_mg":99.6,
"protein_g":26.5,
"dietary_fiber_g":2.1,
"sodium_mg":777.0,
"calories_from_fat":302.2,
"calcium_mg":567.9,
"iron_mg":2.7,
"magnesium_mg":61.8,
"potassium_mg":380.0,
"vitamin_a_iu_IU":1152.0,
"niacin_equivalents_mg":10.1,
"vitamin_c_mg":0.3,
"folate_mcg":165.6,
"thiamin_mg":0.7
}
First run pip install scrapy
- Set
DEBUG = False
inrecipescrape/spiders/recipes.py
- Run
scrapy crawl recipes
- Set
DEBUG = True
inrecipescrape/spiders/recipes.py
- Run
scrapy crawl recipes -L WARN --logfile=scrapelog.txt -s JOBDIR=recipes/spider-1
- Run
pip install pandas
- Make sure
DEBUG = True
innutrient_list.py
- Run
scrapy crawl nutrients -o nutrients.csv -L WARN -s JOBDIR=nutrients/spider-1
- Outputs 'items_count', 'nutri_count', 'nutri_list', 'url' in
nutrients.csv
in which the last row contains the unified list of nutritients found so far.
- There are many ways to combine CSV files, a sample python file
/extras/combine_csv.py
is included for quick reference