Posted on and Updated on

Detourning the Web | Assignment 3

https://youtu.be/tdn8ubicWsY

The youtube-dl commands look like this, except that each video's file name is an incrementing integer (1.mp4, 2.mp4, etc.):

youtube-dl -f worstvideo --max-downloads 5 -o /Users/a/Documents/of_v0.9.8_osx_release/apps/myApps/20180318_detourningTheWeb_assignment3/bin/data/vids_sky/1.mp4 "https://www.youtube.com/results?sp=CAISAhgB&search_query=sky+timelapse"

youtube-dl -f worstvideo --max-downloads 5 -o /Users/a/Documents/of_v0.9.8_osx_release/apps/myApps/20180318_detourningTheWeb_assignment3/bin/data/vids_grass/1.mp4 "https://www.youtube.com/results?search_query=grass+timelapse&sp=CAISAhgB"

 

I called this shell command from within my oF app with a `system()` call in a separate child process (created with the `fork()` function), so the download does not block the app:

void ofApp::downloadVid(int vidType, bool _isVidDownloaded, string dir, string vidUrl, string vidId){
    if(!_isVidDownloaded){
        if(!forkOnce_downloadVid){
            string vidName = "/Users/a/Documents/of_v0.9.8_osx_release/apps/myApps/20180318_detourningTheWeb_assignment3/bin/data/" + dir + vidId + ".mp4";
            
            cout << "vidName = " << vidName << endl;
            
            //youtube-dl -f worstvideo --max-downloads 5 -o /Users/a/Documents/of_v0.9.8_osx_release/apps/myApps/20180318_detourningTheWeb_assignment3/bin/data/vids/sky.mp4 "https://www.youtube.com/results?sp=CAISAhgB&search_query=sky+timelapse"
            
            // wrap the URL in escaped double quotes so the shell does not interpret its '&' characters
            string cmd = "/usr/local/bin/youtube-dl -f worstvideo --max-downloads 1 --playlist-start " + vidId + " --recode-video mp4 -o " + vidName + " \"" + vidUrl + "\"";
            
            cout << "cmd = " << cmd << endl;
            
            int pid = fork();
            switch(pid){
                case -1:{
                    perror("fork");
                    _exit(EXIT_FAILURE);
                    break;
                }
                case 0:{ // child process
                    system(cmd.c_str());
                    _exit(0);
                    break;
                }
                default:{ // parent process
                    // poll the child without blocking (WNOHANG); the SIGCHLD handlers registered below pick up its completion
                    int status = 0;
                    waitpid(-1, &status, WNOHANG);
                    printf("child status:%d\n",status);
                    break;
                }
            }
            forkOnce_downloadVid = true;
        } else {
            if(vidType == 0) {
                signal(SIGCHLD,signalHandler_sky);
            } else {
                signal(SIGCHLD,signalHandler_grass);
            }
        }
        
    }
}
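
For comparison, the same launch-then-poll pattern can be sketched in Python rather than C++. This is only an illustration under assumptions, not the code the app runs: `build_command`, `start`, and `is_done` are hypothetical helper names, and passing the arguments as a list (instead of one shell string) is a swapped-in choice that avoids having to quote the `&` in the search URL.

```python
import subprocess
import sys


def build_command(vid_id, out_path, url, exe="/usr/local/bin/youtube-dl"):
    """Argument list mirroring the flags used in downloadVid() (hypothetical helper)."""
    return [
        exe, "-f", "worstvideo",
        "--max-downloads", "1",
        "--playlist-start", str(vid_id),
        "--recode-video", "mp4",
        "-o", out_path,
        url,
    ]


def start(args):
    """Launch the command without blocking; the caller keeps the handle."""
    return subprocess.Popen(args)


def is_done(proc):
    """True once the child has exited; poll() never blocks."""
    return proc.poll() is not None


if __name__ == "__main__":
    # Stand-in command so the sketch runs without youtube-dl installed.
    proc = start([sys.executable, "-c", "pass"])
    proc.wait()
    print(is_done(proc))
```

`proc.poll()` returns `None` while the child is still running, which plays roughly the role of `waitpid(..., WNOHANG)` in the C++ version.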


Detourning the Web | Assignment 2: AirBnb Experiences, pt 1

For the list capture assignment I’ve chosen the Experiences section of the AirBnb website:

This section uses infinite scrolling, fetching 40 items each time the bottom of the page is hit. To capture all elements programmatically, I found the specific fetch request in the inspector panel, which looks something like this:

https://www.airbnb.com/api/v2/explore_tabs?version=1.3.4&_format=for_explore_search_web&experiences_per_grid=20&items_per_grid=18&guidebooks_per_grid=20&auto_ib=true&fetch_filters=true&has_zero_guest_treatment=false&is_guided_search=true&is_new_cards_experiment=true&luxury_pre_launch=false&query_understanding_enabled=true&show_groupings=true&supports_for_you_v3=true&timezone_offset=-300&metadata_only=false&is_standard_search=true&tab_id=experience_tab&section_offset=7&items_offset=120&recommendation_item_cursor=&refinement_paths[]=/experiences&last_search_session_id=&federated_search_session_id=a86f985a-7bac-46e4-8cfe-0e1988808c5a&screen_size=large&_intents=p1&key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=USD&locale=en

In this URL the `items_offset` parameter increases by 40 each time the bottom of the page is hit, so stepping it in Python lets me fetch all items:

import requests

def get_page(_offset):
	url = "https://www.airbnb.com/api/v2/explore_tabs?version=1.3.3&_format=for_explore_search_web&experiences_per_grid=20&items_per_grid=18&guidebooks_per_grid=20&auto_ib=true&fetch_filters=true&is_guided_search=true&is_new_cards_experiment=true&luxury_pre_launch=false&query_understanding_enabled=false&show_groupings=true&supports_for_you_v3=true&timezone_offset=-300&metadata_only=false&is_standard_search=true&tab_id=experience_tab&section_offset=3&items_offset=" + str(_offset) + "&recommendation_item_cursor=&refinement_paths[]=/experiences&query=&last_search_session_id=&federated_search_session_id=320016fd-d09c-48c0-b7ed-2786432d35fb&screen_size=large&_intents=p1&key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=USD&locale=en"
	responses = requests.get(url).json()
	return responses

offset = 0
while offset <= 280:

	# fetch one page of JSON and print the fields of every item in it
	results = get_page(offset)
	items = results['explore_tabs'][0]['sections'][0]['trip_templates']
	for item in items:
		print(item['title'])
		print(item['kicker_text'])
		print(item['country'])
		print(item['picture']['large_ro'])
		print(item['star_rating'])
		print(item['lat'])
		print(item['lng'])
		print('\n')

	# advance to the next page of 40 items
	offset = offset + 40
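
The paging arithmetic can be checked without touching the network. The sketch below is illustrative only: `page_offsets` and `with_offset` are hypothetical helpers, and the example URL is a shortened stand-in for the real request, kept short for readability.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_offsets(last_offset=280, page_size=40):
    """Offsets the loop requests: 0, 40, ... up to and including last_offset."""
    return list(range(0, last_offset + 1, page_size))


def with_offset(url, offset):
    """Return url with its items_offset query parameter set to offset."""
    parts = urlsplit(url)
    # Keep every other parameter (including empty ones) and drop the old offset.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "items_offset"]
    query.append(("items_offset", str(offset)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Shortened stand-in for the real request URL (assumption for illustration).
example = "https://www.airbnb.com/api/v2/explore_tabs?tab_id=experience_tab&items_offset=120&locale=en"

for offset in page_offsets():
    print(with_offset(example, offset))
```

With the defaults this yields eight requests, matching the `while offset <= 280` loop stepping by 40.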

In the above script I printed the following fields from each Experience post:

  • title: “Paris’ Best Kept Secrets Tour”
  • kicker_text: “history walk · Paris”
  • country: “France”
  • picture: “https://a0.muscache.com/im/pictures/571e0e7b-6867-4c44-bd91-776a5d698fae.jpg”
  • star_rating: 5.0
  • lat: 48.8674018702
  • lng: 2.32934203089
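
As a rough sketch of how those fields map onto one item of the JSON response, the snippet below formats a single record. `format_item` is a hypothetical helper, and `sample` is assembled only from the values listed above; a real item carries many more keys.

```python
def format_item(item):
    """Return the fields printed for one Experience, one text line per field."""
    return [
        item["title"],
        item["kicker_text"],
        item["country"],
        item["picture"]["large_ro"],  # the image URL is nested under 'picture'
        str(item["star_rating"]),
        str(item["lat"]),
        str(item["lng"]),
    ]


# Sample record built from the values listed above (not a full API response).
sample = {
    "title": "Paris’ Best Kept Secrets Tour",
    "kicker_text": "history walk · Paris",
    "country": "France",
    "picture": {
        "large_ro": "https://a0.muscache.com/im/pictures/571e0e7b-6867-4c44-bd91-776a5d698fae.jpg"
    },
    "star_rating": 5.0,
    "lat": 48.8674018702,
    "lng": 2.32934203089,
}

print("\n".join(format_item(sample)))
```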

I piped these results to a .txt file via the command line:

python get_airbnb.py > output.txt

The full output.txt can be seen here.