sungchun12/schedule-python-script-using-Google-Cloud

schedule-python-script-using-Google-Cloud

Use Case: Automates the collection of live Chicago traffic data and loads it into BigQuery for interactive, near-real-time analysis.

Technical Concept: Schedules a simple Python script to append data into BigQuery using Google Cloud's App Engine with a cron job.

Source Data: https://data.cityofchicago.org/Transportation/Chicago-Traffic-Tracker-Congestion-Estimates-by-Se/n4j6-wkkf

Architecture Reference: http://zablo.net/blog/post/python-apache-beam-google-dataflow-cron

Shout out to Mylin Ackerman for all his help. Saved me weeks of research with his personal touch. https://www.linkedin.com/in/mylin-ackermann-25a00445/

Check me out on LinkedIn: https://www.linkedin.com/in/sungwonchung1/

Setup Prerequisites:

  1. Sign up for a Google Cloud account and enable billing
  2. Enable the BigQuery API, Stackdriver API, Google Cloud Deployment Manager V2 API, and Google Compute Engine API

Order of Operations:

  1. Develop scripts with Google Cloud Shell or the Cloud SDK
  2. Deploy the app to App Engine
  3. Deploy the cron job
  4. Check BigQuery for the appended data
  5. Connect a data visualization tool such as Tableau

Development Instructions:

  1. Clone the GitHub repository into your local SDK environment or Google Cloud Shell (Cloud Shell has persistent storage, so you don't have to re-clone the folder structure every session): git clone https://github.com/sungchun12/schedule-python-script-using-Google-Cloud.git
  2. Create a BigQuery dataset named "chicago_traffic"

Deploy Instructions:

  1. Remember to put an __init__.py file into every local package
  2. Change directory: cd ~/chicago-traffic
  3. Install all required packages into local lib folder: pip install -r requirements.txt -t lib
  4. To deploy the App Engine app, run: gcloud app deploy app.yaml
  5. To deploy the App Engine cron job, run: gcloud app deploy cron.yaml

Folder Structure:

__init__.py - needed in each local package so that it deploys properly within App Engine

append_data.py - calls the Chicago live traffic API and appends the results to BigQuery
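The flow described for append_data.py can be sketched roughly as follows. This is a minimal Python 3 illustration, not the repository's actual code: it assumes the dataset is read through the portal's JSON resource endpoint for the ID in the Source Data link (n4j6-wkkf), and the function names, the row limit, and the destination table ID are all hypothetical. The BigQuery step uses the google-cloud-bigquery client and assumes the target table already exists with a matching schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Dataset ID taken from the Source Data link; the endpoint pattern is an assumption.
DATASET_ID = "n4j6-wkkf"
BASE_URL = "https://data.cityofchicago.org/resource/{}.json"


def build_url(dataset_id=DATASET_ID, limit=1000):
    """Build the JSON endpoint URL for the traffic dataset (limit is illustrative)."""
    return BASE_URL.format(dataset_id) + "?" + urlencode({"$limit": limit})


def fetch_rows(url):
    """Download the current traffic estimates as a list of dicts."""
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def append_to_bigquery(rows, table_id):
    """Append rows to an existing table; table_id is a hypothetical
    'project.dataset.table' string chosen by the caller."""
    # Imported lazily so the rest of the module works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError("BigQuery rejected some rows: {}".format(errors))
    return len(rows)
```

A scheduled run would then amount to something like `append_to_bigquery(fetch_rows(build_url()), "my-project.chicago_traffic.my_table")`, where the project and table names are placeholders.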

app.yaml - definition of Google App Engine application

appengine_config.py - makes the locally installed packages (from the lib folder) available to the app as dependencies
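As a rough sketch of what such a file usually does, assuming the first-generation Python 2.7 standard environment: App Engine's `google.appengine.ext.vendor` module registers a folder of third-party packages. The helper name `add_lib_to_path` below is hypothetical, and the fallback to `sys.path` is only there so the snippet also runs outside App Engine.

```python
import os
import sys


def add_lib_to_path(lib_dir="lib"):
    """Make packages installed with `pip install -t lib` importable."""
    lib_dir = os.path.abspath(lib_dir)
    try:
        # Available only inside the App Engine SDK / runtime.
        from google.appengine.ext import vendor
    except ImportError:
        # Outside App Engine, fall back to a plain sys.path entry.
        if lib_dir not in sys.path:
            sys.path.insert(0, lib_dir)
    else:
        vendor.add(lib_dir)
    return lib_dir
```

In an actual appengine_config.py the call `add_lib_to_path("lib")` would sit at module level so it runs before any third-party import.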

cron.yaml - definition of the Google App Engine cron job

main.py - entry point for the web application and calls the function contained within "append_data.py"
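To illustrate the entry-point idea, here is a minimal sketch written as a plain WSGI callable rather than whatever framework the repository actually uses. The URL path `/append` and the stand-in `append_data` function are hypothetical. The check on the `X-Appengine-Cron` header relies on App Engine setting that header on cron-originated requests and stripping it from external ones.

```python
def append_data():
    """Stand-in for the function defined in append_data.py; returns a row count."""
    return 0


def app(environ, start_response):
    """WSGI entry point that the cron job requests on a schedule."""
    if environ.get("PATH_INFO") != "/append":  # hypothetical path
        status, body = "404 Not Found", b"not found"
    elif environ.get("HTTP_X_APPENGINE_CRON") != "true":
        # Reject calls that did not come from the App Engine cron service.
        status, body = "403 Forbidden", b"cron only"
    else:
        count = append_data()
        status = "200 OK"
        body = "appended {} rows".format(count).encode("utf-8")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

The path requested by the cron job in cron.yaml would need to match the path handled here.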

requirements.txt - file for the pip package manager, listing all packages required to run the application and the pipeline

lib - local folder containing all pip-installed packages from the requirements.txt file
