
PYSPARK LEARNING HUB : DAY - 9

WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR

Step - 1 : Problem Statement

09_Game Play Analysis II

Write PySpark code that reports the first device each player logged in with.
Return the result table in any order.
Difficulty Level : EASY
DataFrame:
# Define the schema for the "Activity" table
activity_schema = StructType([
    StructField("player_id", IntegerType(), True),
    StructField("device_id", IntegerType(), True),
    StructField("event_date", StringType(), True),
    StructField("games_played", IntegerType(), True)
])

# Define data for the "Activity" table
activity_data = [
    (1, 2, '2016-03-01', 5),
    (1, 2, '2016-05-02', 6),
    (2, 3, '2017-06-25', 1),
    (3, 1, '2016-03-02', 0),
    (3, 4, '2018-07-03', 5)
]
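
Note that event_date is defined as a string. The window ordering in Step - 3 still works because ISO-format date strings ('yyyy-MM-dd') sort chronologically. If you prefer working with real dates, a minimal sketch (an optional addition, assuming the DataFrame df created in Step - 3) is:

from pyspark.sql.functions import to_date

# Optional: convert the string column to a proper date type
df_with_dates = df.withColumn("event_date", to_date("event_date", "yyyy-MM-dd"))
df_with_dates.printSchema()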


Step - 2 : Identifying The Input Data And Expected Output

INPUT

PLAYER_ID | DEVICE_ID | EVENT_DATE | GAMES_PLAYED
1         | 2         | 2016-03-01 | 5
1         | 2         | 2016-05-02 | 6
2         | 3         | 2017-06-25 | 1
3         | 1         | 2016-03-02 | 0
3         | 4         | 2018-07-03 | 5

OUTPUT

PLAYER_ID | DEVICE_ID
1         | 2
2         | 3
3         | 1


Step - 3 : Writing the PySpark code to solve the problem

# Imports
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import rank
from pyspark.sql.window import Window

# Creating Spark Session
spark = SparkSession. \
    builder. \
    config('spark.shuffle.useOldFetchProtocol', 'true'). \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", "/user/itv008042/warehouse"). \
    enableHiveSupport(). \
    master('yarn'). \
    getOrCreate()
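
The warehouse directory and YARN master above are specific to the author's cluster. If you only want to run the example on your own machine, a minimal local session sketch (an assumption, not part of the original post) is:

# Minimal local session, assuming no Hive/YARN environment is available
spark = SparkSession.builder. \
    appName("game_play_analysis_ii"). \
    master("local[*]"). \
    getOrCreate()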

# Define the schema for the "Activity" table
activity_schema = StructType([
    StructField("player_id", IntegerType(), True),
    StructField("device_id", IntegerType(), True),
    StructField("event_date", StringType(), True),
    StructField("games_played", IntegerType(), True)
])

# Define data for the "Activity" table
activity_data = [
    (1, 2, '2016-03-01', 5),
    (1, 2, '2016-05-02', 6),
    (2, 3, '2017-06-25', 1),
    (3, 1, '2016-03-02', 0),
    (3, 4, '2018-07-03', 5)
]


# Create a PySpark DataFrame
df = spark.createDataFrame(activity_data, activity_schema)
df.show()

# Rank each player's events by event_date (rank 1 = first login)
rank_df = df.withColumn("rk", rank().over(Window.partitionBy(df["player_id"]).orderBy(df["event_date"])))
rank_df.show()
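
The original post shows the console output here. With the sample data above, rank_df.show() should print something along these lines (row order may vary):

+---------+---------+----------+------------+---+
|player_id|device_id|event_date|games_played| rk|
+---------+---------+----------+------------+---+
|        1|        2|2016-03-01|           5|  1|
|        1|        2|2016-05-02|           6|  2|
|        2|        3|2017-06-25|           1|  1|
|        3|        1|2016-03-02|           0|  1|
|        3|        4|2018-07-03|           5|  2|
+---------+---------+----------+------------+---+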


# Keep only each player's first login and report player_id, device_id
rank_df.filter(rank_df["rk"] == 1).select("player_id", "device_id").show()
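
This should print the expected OUTPUT table from Step - 2 (row order may vary). For comparison, an alternative that avoids window functions is to find each player's earliest event_date with groupBy/min and join back to the original rows; a minimal sketch (the names first_login_df and result_df are illustrative, not from the original post):

from pyspark.sql.functions import min as min_

# Earliest event_date per player (ISO-format date strings sort chronologically)
first_login_df = df.groupBy("player_id").agg(min_("event_date").alias("first_date"))

# Join back to recover the device used on that first date
result_df = df.join(
    first_login_df,
    (df["player_id"] == first_login_df["player_id"]) &
    (df["event_date"] == first_login_df["first_date"])
).select(df["player_id"], df["device_id"])

result_df.show()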


WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
