For the list capture assignment, I’ve chosen the Experiences section of the Airbnb website:
This section uses infinite scrolling, fetching 40 more items each time the bottom of the page is reached. To capture every item programmatically, I found the corresponding fetch request in the inspector panel:
which looks something like this:
https://www.airbnb.com/api/v2/explore_tabs?version=1.3.4&_format=for_explore_search_web&experiences_per_grid=20&items_per_grid=18&guidebooks_per_grid=20&auto_ib=true&fetch_filters=true&has_zero_guest_treatment=false&is_guided_search=true&is_new_cards_experiment=true&luxury_pre_launch=false&query_understanding_enabled=true&show_groupings=true&supports_for_you_v3=true&timezone_offset=-300&metadata_only=false&is_standard_search=true&tab_id=experience_tab&section_offset=7&items_offset=120&recommendation_item_cursor=&refinement_paths[]=/experiences&last_search_session_id=&federated_search_session_id=a86f985a-7bac-46e4-8cfe-0e1988808c5a&screen_size=large&_intents=p1&key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=USD&locale=en
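A URL this long is easier to read once the query string is split into its parameters. The following is a minimal sketch, written for Python 3 and using only the standard library. The URL in it is a shortened copy of the request that keeps just a handful of its parameters, and the helper name `query_params` is my own, not part of any API.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the request URL; only a few of its parameters are kept.
url = (
    "https://www.airbnb.com/api/v2/explore_tabs"
    "?version=1.3.4&tab_id=experience_tab&section_offset=7"
    "&items_offset=120&currency=USD&locale=en"
)

def query_params(request_url):
    """Return the query string as a dict mapping each name to a list of values."""
    return parse_qs(urlsplit(request_url).query, keep_blank_values=True)

params = query_params(url)
for name in sorted(params):
    # Each value is a list because a name may repeat in a query string.
    print(name, "=", params[name][0])
```

Listing the parameters this way makes it easier to spot which one changes between two consecutive requests.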
In this URL, the `items_offset` parameter increases by 40 each time the bottom of the page is hit, so stepping it in Python lets me fetch all items:
# Python 2 script (uses print statements)
import requests

def get_page(_offset):
    # same request the page sends while scrolling, with items_offset filled in
    url = "https://www.airbnb.com/api/v2/explore_tabs?version=1.3.3&_format=for_explore_search_web&experiences_per_grid=20&items_per_grid=18&guidebooks_per_grid=20&auto_ib=true&fetch_filters=true&is_guided_search=true&is_new_cards_experiment=true&luxury_pre_launch=false&query_understanding_enabled=false&show_groupings=true&supports_for_you_v3=true&timezone_offset=-300&metadata_only=false&is_standard_search=true&tab_id=experience_tab&section_offset=3&items_offset=" + str(_offset) + "&recommendation_item_cursor=&refinement_paths[]=/experiences&query=&last_search_session_id=&federated_search_session_id=320016fd-d09c-48c0-b7ed-2786432d35fb&screen_size=large&_intents=p1&key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=USD&locale=en"
    responses = requests.get(url).json()
    return responses

offset = 0
while offset <= 280:
    # fetch one page of JSON and pull out its list of experiences
    results = get_page(offset)
    items = results['explore_tabs'][0]['sections'][0]['trip_templates']
    for item in items:
        print item['title'].encode('utf-8')
        print item['kicker_text'].encode('utf-8')
        print item['country'].encode('utf-8')
        print item['picture']['large_ro']
        print item['star_rating']
        print item['lat']
        print item['lng']
        # blank line between items
        print '\n'
    # update offset
    offset = offset + 40
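The loop stops at a hard-coded offset of 280, which only works while the number of listings stays the same. One alternative is to keep requesting pages until one comes back empty. The sketch below is written for Python 3 and rests on an assumption I have not verified against the API: that a request past the last item returns an empty `trip_templates` list. `fetch_all` and `fake_get_page` are hypothetical names, and the fake page function stands in for the real request so the example runs offline.

```python
def fetch_all(get_page, page_size=40, max_pages=50):
    """Collect items page by page until an empty page is returned.

    get_page is any function that takes an offset and returns the parsed JSON.
    max_pages is a safety cap so the loop always terminates.
    """
    collected = []
    for page_number in range(max_pages):
        results = get_page(page_number * page_size)
        items = results['explore_tabs'][0]['sections'][0]['trip_templates']
        if not items:
            # Assumed end-of-data signal: an empty list of experiences.
            break
        collected.extend(items)
    return collected

def fake_get_page(offset, total=95, page_size=40):
    """Offline stand-in for the real request: serves 95 made-up items."""
    last = min(offset + page_size, total)
    titles = [{'title': 'Experience %d' % i} for i in range(offset, last)]
    return {'explore_tabs': [{'sections': [{'trip_templates': titles}]}]}

all_items = fetch_all(fake_get_page)
print(len(all_items))
```

Passing the page function in as an argument keeps the stopping logic separate from the network call, so it can be checked without hitting the site.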
My script prints the following fields for each Experience listing:
- title: “Paris’ Best Kept Secrets Tour”
- kicker_text: “history walk · Paris”
- country: “France”
- picture: “https://a0.muscache.com/im/pictures/571e0e7b-6867-4c44-bd91-776a5d698fae.jpg”
- star_rating: 5.0
- lat: 48.8674018702
- lng: 2.32934203089
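Instead of printing each value on its own line, the same seven fields can be gathered into one flat record per listing. This is a minimal Python 3 sketch under a few assumptions: `extract_fields` is a name I made up, the sample item reuses the example values listed above, and `unused_field` is a hypothetical extra key added only to show that everything else in the item is dropped.

```python
def extract_fields(item):
    """Pick the seven fields of interest out of one item dict."""
    return {
        'title': item['title'],
        'kicker_text': item['kicker_text'],
        'country': item['country'],
        # The image URL is nested one level down in the response.
        'picture': item['picture']['large_ro'],
        'star_rating': item['star_rating'],
        'lat': item['lat'],
        'lng': item['lng'],
    }

sample_item = {
    'title': 'Paris’ Best Kept Secrets Tour',
    'kicker_text': 'history walk · Paris',
    'country': 'France',
    'picture': {
        'large_ro': 'https://a0.muscache.com/im/pictures/571e0e7b-6867-4c44-bd91-776a5d698fae.jpg',
    },
    'star_rating': 5.0,
    'lat': 48.8674018702,
    'lng': 2.32934203089,
    'unused_field': 'ignored',  # hypothetical key, not taken from the real response
}

record = extract_fields(sample_item)
for key, value in record.items():
    print(key, ':', value)
```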
I redirected these results to a .txt file from the command line:
python get_airbnb.py > output.txt
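Redirecting printed output works, but the result is hard to load back into another program. A possible alternative is to write the records as CSV from inside Python. The sketch below is written for Python 3 and uses the standard `csv` module; `write_records` and the `output.csv` filename are my own choices, and the single record reuses the example values shown earlier. It writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io

FIELDS = ['title', 'kicker_text', 'country', 'picture', 'star_rating', 'lat', 'lng']

def write_records(records, handle):
    """Write a header row followed by one CSV row per record."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)

records = [{
    'title': 'Paris’ Best Kept Secrets Tour',
    'kicker_text': 'history walk · Paris',
    'country': 'France',
    'picture': 'https://a0.muscache.com/im/pictures/571e0e7b-6867-4c44-bd91-776a5d698fae.jpg',
    'star_rating': 5.0,
    'lat': 48.8674018702,
    'lng': 2.32934203089,
}]

# In-memory buffer for the demo. For a real file, one option is:
# with open('output.csv', 'w', newline='', encoding='utf-8') as handle:
#     write_records(records, handle)
buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())
```

Opening the file with an explicit UTF-8 encoding keeps characters such as the middle dot in `kicker_text` intact regardless of the terminal's settings.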