Air BNB Data Analysis

The document discusses importing necessary libraries and installing packages for data analysis in Python. It then downloads airbnb open data from Kaggle and loads it into a Pandas dataframe. Some initial cleaning steps are performed, including removing duplicate rows and checking for null values.

Uploaded by

farrukhbhatti78

First, I imported the libraries needed for the analysis. The data will then be cleaned so that it can be analysed.

In [3]: import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.stats.multicomp as mc

import os

# list every file under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

I installed opendatasets to download the data from Kaggle.

In [4]: !pip install opendatasets

Requirement already satisfied: opendatasets in c:\users\47455\anaconda3\lib\site-packages (0.1.22)
Requirement already satisfied: kaggle in c:\users\47455\anaconda3\lib\site-packages (from opendatasets) (1.5.13)
Requirement already satisfied: tqdm in c:\users\47455\anaconda3\lib\site-packages (from opendatasets) (4.64.0)
Requirement already satisfied: click in c:\users\47455\anaconda3\lib\site-packages (from opendatasets) (8.0.4)
Requirement already satisfied: colorama in c:\users\47455\anaconda3\lib\site-packages (from click->opendatasets) (0.4.4)
Requirement already satisfied: python-slugify in c:\users\47455\anaconda3\lib\site-packages (from kaggle->opendatasets) (5.0.2)
Requirement already satisfied: urllib3 in c:\users\47455\anaconda3\lib\site-packages (from kaggle->opendatasets) (1.26.9)
Requirement already satisfied: requests in c:\users\47455\anaconda3\lib\site-packages (from kaggle->opendatasets) (2.27.1)
Requirement already satisfied: certifi in c:\users\47455\anaconda3\lib\site-packages (from kaggle->opendatasets) (2021.10.8)
Requirement already satisfied: six>=1.10 in c:\users\47455\anaconda3\lib\site-packages (from kaggle->opendatasets) (1.16.0)
Requirement already satisfied: python-dateutil in c:\users\47455\anaconda3\lib\site-packages (from kaggle->opendatasets) (2.8.2)
Requirement already satisfied: text-unidecode>=1.3 in c:\users\47455\anaconda3\lib\site-packages (from python-slugify->kaggle->opendatasets) (1.3)
Requirement already satisfied: idna<4,>=2.5 in c:\users\47455\anaconda3\lib\site-packages (from requests->kaggle->opendatasets) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\47455\anaconda3\lib\site-packages (from requests->kaggle->opendatasets) (2.0.4)

In [5]: import opendatasets as od

In [6]: dataset = 'https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata'

In [7]: od.download(dataset)
Skipping, found downloaded files in ".\airbnbopendata" (use force=True to force download)

In [8]: import os

In [9]: data_dir = './airbnbopendata'

In [10]: os.listdir(data_dir)

Out[10]: ['Airbnb_Open_Data.csv']

In [11]: frame = pd.read_csv(os.path.join(data_dir, 'Airbnb_Open_Data.csv'), low_memory=False)

frame

Out[11]:
        id       NAME                                              host id      host_identity_verified  host name    neighbourhood group  neighbourhood
0       1001254  Clean & quiet apt home by the park                80014485718  unconfirmed             Madaline     Brooklyn             Kensington
1       1002102  Skylit Midtown Castle                             52335172823  verified                Jenna        Manhattan            Midtown
2       1002403  THE VILLAGE OF HARLEM....NEW YORK !               78829239556  NaN                     Elise        Manhattan            Harlem
3       1002755  NaN                                               85098326012  unconfirmed             Garry        Brooklyn             Clinton Hill
4       1003689  Entire Apt: Spacious Studio/Loft by central park  92037596077  verified                Lyndon       Manhattan            East Harlem
...     ...      ...                                               ...          ...                     ...          ...                  ...
102594  6092437  Spare room in Williamsburg                        12312296767  verified                Krik         Brooklyn             Williamsburg
102595  6092990  Best Location near Columbia U                     77864383453  unconfirmed             Mifan        Manhattan            Morningside Heights
102596  6093542  Comfy, bright room in Brooklyn                    69050334417  unconfirmed             Megan        Brooklyn             Park Slope
102597  6094094  Big Studio-One Stop from Midtown                  11160591270  unconfirmed             Christopher  Queens               Long Island City
102598  6094647  585 sf Luxury Studio                              68170633372  unconfirmed             Rebecca      Manhattan            Upper West Side

102599 rows × 26 columns

Removing the duplicates

In [12]: # Count the number of rows in the original DataFrame
print('Number of rows before removing duplicates:', len(frame))

# Drop duplicate rows based on all columns

frame = frame.drop_duplicates()

# Count the number of rows in the updated DataFrame

print('Number of rows after removing duplicates:', len(frame))

Number of rows before removing duplicates: 102599

Number of rows after removing duplicates: 102058
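
To illustrate what drop_duplicates() does with its default arguments, here is a minimal sketch on a toy frame. The values are invented for illustration and are not taken from the Airbnb data. Only rows that match an earlier row in every column are removed.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": ["$100", "$80", "$80", "$80"],
})

# duplicated() flags rows that repeat an earlier row in every column
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each repeated row
deduplicated = toy.drop_duplicates()

print("Duplicate rows:", n_duplicates)
print("Rows before:", len(toy), "Rows after:", len(deduplicated))
```

Rows 2 and 3 share a price but differ in id, so only the exact repeat of id 2 is dropped.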

Checking whether the data contains null values.

In [13]: print(frame.isnull().sum())

id 0
NAME 250
host id 0
host_identity_verified 289
host name 404
neighbourhood group 29
neighbourhood 16
lat 8
long 8
country 532
country code 131
instant_bookable 105
cancellation_policy 76
room type 0
Construction year 214
price 247
service fee 273
minimum nights 400
number of reviews 183
last review 15832
reviews per month 15818
review rate number 319
calculated host listings count 319
availability 365 448
house_rules 51842
license 102056
dtype: int64
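
Raw null counts are easier to compare when expressed as a share of the rows. In the output above, for example, license is null in 102056 of 102058 rows. The sketch below shows one way to compute that share per column. It uses a toy frame with invented values, and the name null_percent is my own.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "NAME": ["a", None, "c", "d"],
    "price": ["$10", "$20", None, None],
    "license": [None, None, None, "x"],
})

# isnull().mean() is the fraction of nulls per column; scale it to a percentage
null_percent = toy.isnull().mean().mul(100).round(1).sort_values(ascending=False)

print(null_percent)
```

Sorting in descending order puts the columns that may be candidates for dropping at the top.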

To see the column names, I ran the code below.

In [14]: print(frame.columns)
Index(['id', 'NAME', 'host id', 'host_identity_verified', 'host name',
'neighbourhood group', 'neighbourhood', 'lat', 'long', 'country',
'country code', 'instant_bookable', 'cancellation_policy', 'room type',
'Construction year', 'price', 'service fee', 'minimum nights',
'number of reviews', 'last review', 'reviews per month',
'review rate number', 'calculated host listings count',
'availability 365', 'house_rules', 'license'],
dtype='object')

In [15]: print(frame.head())
id NAME host id \
0 1001254 Clean & quiet apt home by the park 80014485718
1 1002102 Skylit Midtown Castle 52335172823
2 1002403 THE VILLAGE OF HARLEM....NEW YORK ! 78829239556
3 1002755 NaN 85098326012
4 1003689 Entire Apt: Spacious Studio/Loft by central park 92037596077
host_identity_verified host name neighbourhood group neighbourhood \
0 unconfirmed Madaline Brooklyn Kensington
1 verified Jenna Manhattan Midtown
2 NaN Elise Manhattan Harlem
3 unconfirmed Garry Brooklyn Clinton Hill
4 verified Lyndon Manhattan East Harlem

lat long country ... service fee minimum nights \

0 40.64749 -73.97237 United States ... $193 10.0
1 40.75362 -73.98377 United States ... $28 30.0
2 40.80902 -73.94190 United States ... $124 3.0
3 40.68514 -73.95976 United States ... $74 30.0
4 40.79851 -73.94399 United States ... $41 10.0

number of reviews last review reviews per month review rate number \
0 9.0 10/19/2021 0.21 4.0
1 45.0 5/21/2022 0.38 4.0
2 0.0 NaN NaN 5.0
3 270.0 7/5/2019 4.64 4.0
4 9.0 11/19/2018 0.10 3.0

calculated host listings count availability 365 \

0 6.0 286.0
1 2.0 228.0
2 1.0 352.0
3 1.0 322.0
4 1.0 289.0

house_rules license
0 Clean up and treat the home the way you'd like... NaN
1 Pet friendly but please confirm with me if the... NaN
2 I encourage you to use my kitchen, cooking and... NaN
3 NaN NaN
4 Please no smoking in the house, porch or on th... NaN

[5 rows x 26 columns]

Checking the Shape of the data.

In [16]: print(frame.shape)
(102058, 26)

I filled the null values in each column with that column's mode (its most frequent value).

In [17]: frame_copy = frame.copy()

# columns whose null values are replaced by the column's mode
# ('license' is left as it is)
mode_fill_columns = [
    'NAME', 'host_identity_verified', 'host name', 'neighbourhood group',
    'neighbourhood', 'lat', 'long', 'country', 'country code',
    'instant_bookable', 'cancellation_policy', 'Construction year', 'price',
    'service fee', 'minimum nights', 'number of reviews', 'last review',
    'reviews per month', 'review rate number',
    'calculated host listings count', 'availability 365', 'house_rules',
]

# assign the result back instead of calling fillna(..., inplace=True) on a
# column selection, which newer pandas versions warn about
for column in mode_fill_columns:
    frame_copy[column] = frame_copy[column].fillna(frame_copy[column].mode()[0])

In [18]: frame = frame_copy.copy()
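
As a small check of how mode filling behaves, the sketch below applies the same idea to a toy frame with invented values. Series.mode() returns its result sorted, so mode()[0] picks the smallest value when several values tie for most frequent.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "room type": ["Private room", None, "Private room", "Shared room"],
    "minimum nights": [1.0, 3.0, None, 3.0],
})

filled = toy.copy()
for column in filled.columns:
    # mode() ignores nulls by default; [0] takes the first (smallest) mode
    most_frequent = filled[column].mode()[0]
    filled[column] = filled[column].fillna(most_frequent)

print(filled)
```

The missing room type becomes "Private room" and the missing minimum nights becomes 3.0, the most frequent value in each column.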

In [19]: print(frame.isnull().sum())

id 0
NAME 0
host id 0
host_identity_verified 0
host name 0
neighbourhood group 0
neighbourhood 0
lat 0
long 0
country 0
country code 0
instant_bookable 0
cancellation_policy 0
room type 0
Construction year 0
price 0
service fee 0
minimum nights 0
number of reviews 0
last review 0
reviews per month 0
review rate number 0
calculated host listings count 0
availability 365 0
house_rules 0
license 102056
dtype: int64

The license column is still null in 102056 of 102058 rows, so I removed it with drop.

In [20]: frame = frame.drop("license", axis=1)

In [21]: print(frame.isnull().sum())

The 'types_of_rooms' variable holds each room type's share of the listings as a percentage, rounded to one decimal place.

In [22]: types_of_rooms = frame['room type'].value_counts(normalize=True).mul(100).round(1)

types_of_rooms

Out[22]:
Entire home/apt 52.4
Private room 45.4
Shared room 2.2
Hotel room 0.1
Name: room type, dtype: float64
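
The chain value_counts(normalize=True).mul(100).round(1) can be followed step by step on a toy series. The counts below are invented so that the percentages are easy to verify by hand.

```python
import pandas as pd

# Toy data, invented for illustration only: 5, 4 and 1 listings out of 10
room_types = pd.Series(
    ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"] * 1
)

# normalize=True returns fractions of the total instead of raw counts
fractions = room_types.value_counts(normalize=True)

# scale to percentages and round to one decimal place
shares = fractions.mul(100).round(1)

print(shares)
```

Because each share is rounded separately, the percentages need not sum to exactly 100, as in the output above (52.4 + 45.4 + 2.2 + 0.1 = 100.1).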

The code below works as follows.

The first four lines after the import convert the price column from object to float: they remove the commas and dollar signs and then parse the result as numeric data with pandas. The next two lines split the listings into two groups by room type (Entire home/apt and Private room) and extract the price values of each group. The stats.ttest_ind() function from the scipy library then performs an independent two-sample t-test between the two groups without assuming equal variances (Welch's t-test).

Finally, the t-statistic and p-value are printed using f-strings.

The purpose is to test whether the mean price differs significantly between the two groups of listings. The t-test is a common method for comparing the means of two groups and deciding whether the difference between them is statistically significant.

In [23]: from scipy import stats

frame['price'] = frame['price'].astype(str)
frame['price'] = frame['price'].str.replace(',', '').str.replace('$', '', regex=False)
frame['price'] = frame['price'].astype(float)
frame['price'] = pd.to_numeric(frame['price'])
# Split data into two groups based on room type
group1 = frame[frame['room type'] == 'Entire home/apt']['price']
group2 = frame[frame['room type'] == 'Private room']['price']

# Perform t-test
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Print results
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

T-statistic: 0.10435722164673732
P-value: 0.9168860835285177
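
The same two steps, cleaning the price strings and running the t-test, can be tried on a toy frame. The sketch below uses invented prices chosen so that the groups clearly differ. It is only an illustration of the technique and says nothing about the Airbnb result above.

```python
import pandas as pd
from scipy import stats

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "room type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "price": ["$1,200", "$1,100", "$1,300", "$1,250",
              "$300", "$350", "$280", "$320"],
})

# Strip the dollar sign and the thousands separator, then convert to float
toy["price"] = (
    toy["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

entire = toy.loc[toy["room type"] == "Entire home/apt", "price"]
private = toy.loc[toy["room type"] == "Private room", "price"]

# equal_var=False requests Welch's t-test (no equal-variance assumption)
t_stat, p_value = stats.ttest_ind(entire, private, equal_var=False)

print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4f}")
```

Passing regex=False makes the "$" a literal character rather than a regular-expression anchor.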

In the code below, I performed a one-way ANOVA (analysis of variance) to test whether the mean price differs significantly between three groups of listings defined by the 'neighbourhood group' column (Brooklyn, Manhattan and Queens).
In [24]: # create three groups based on neighbourhood_group
group1 = frame[frame['neighbourhood group'] == 'Brooklyn']['price']
group2 = frame[frame['neighbourhood group'] == 'Manhattan']['price']
group3 = frame[frame['neighbourhood group'] == 'Queens']['price']

# perform ANOVA test

f_statistic, p_value = stats.f_oneway(group1, group2, group3)

# print results
print("F-statistic:", f_statistic)
print("P-value:", p_value)

F-statistic: 3.1372044124800285
P-value: 0.043408308935618714
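
For reference, here is a minimal sketch of stats.f_oneway on three small invented samples. The numbers are made up so that one group clearly stands apart. A small p-value from a one-way ANOVA only indicates that at least one group mean differs. It does not say which one, which is why a post-hoc test such as Tukey's HSD is usually run afterwards.

```python
from scipy import stats

# Toy samples, invented for illustration only
brooklyn = [100.0, 110.0, 105.0, 95.0]
manhattan = [150.0, 160.0, 155.0, 145.0]
queens = [98.0, 108.0, 103.0, 93.0]

# f_oneway accepts any number of samples as separate arguments
f_statistic, p_value = stats.f_oneway(brooklyn, manhattan, queens)

print("F-statistic:", f_statistic)
print("P-value:", p_value)
```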

In [25]: # perform one-way ANOVA

f_stat, p_val = stats.f_oneway(frame[frame['neighbourhood group'] == 'Brooklyn']['price'],
                               frame[frame['neighbourhood group'] == 'Manhattan']['price'],
                               frame[frame['neighbourhood group'] == 'Queens']['price'],
                               frame[frame['neighbourhood group'] == 'Staten Island']['price'],
                               frame[frame['neighbourhood group'] == 'Bronx']['price'])

# perform Tukey's HSD post-hoc test

m_comp = mc.MultiComparison(frame['price'], frame['neighbourhood group'])
tukey_res = m_comp.tukeyhsd()

print(tukey_res)

Multiple Comparison of Means - Tukey HSD, FWER=0.05

==============================================================================
group1         group2          meandiff   p-adj       lower      upper  reject
------------------------------------------------------------------------------
Bronx          Brooklyn         -1.1063     1.0    -20.5605    18.3478  False
Bronx          Manhattan        -5.0561   0.988    -24.4837    14.3715  False
Bronx          Queens            2.4874  0.9998    -18.2014    23.1761  False
Bronx          Staten Island    -3.9998  0.9999    -40.9394    32.9399  False
Bronx          brookln         -46.6689     1.0  -1025.4278     932.09  False
Bronx          manhatan       -166.6689  0.9988  -1145.4278     812.09  False
Brooklyn       Manhattan        -3.9498  0.5913     -10.656     2.7565  False
Brooklyn       Queens            3.5937   0.933     -6.1821    13.3695  False
Brooklyn       Staten Island    -2.8934     1.0    -35.0194    29.2325  False
Brooklyn       brookln         -45.5626     1.0  -1024.1516   933.0264  False
Brooklyn       manhatan       -165.5626  0.9989  -1144.1516   813.0264  False
Manhattan      Queens            7.5435  0.2499     -2.1794    17.2663  False
Manhattan      Staten Island     1.0563     1.0    -31.0536    33.1663  False
Manhattan      brookln         -41.6128     1.0  -1020.2013   936.9757  False
Manhattan      manhatan       -161.6128   0.999  -1140.2013   816.9757  False
Queens         Staten Island    -6.4871  0.9973    -39.3754    26.4012  False
Queens         brookln         -49.1562     1.0  -1027.7706   929.4581  False
Queens         manhatan       -169.1562  0.9987  -1147.7706   809.4581  False
Staten Island  brookln         -42.6691     1.0  -1021.7618   936.4236  False
Staten Island  manhatan       -162.6691   0.999  -1141.7618   816.4236  False
brookln        manhatan          -120.0     1.0  -1503.9172  1263.9172  False
------------------------------------------------------------------------------

In [26]: print(frame['neighbourhood group'].unique())

['Brooklyn' 'Manhattan' 'brookln' 'manhatan' 'Queens' 'Staten Island'
'Bronx']

In [27]: frame['neighbourhood group'] = frame['neighbourhood group'].replace({'brookln': 'Brooklyn', 'manhatan': 'Manhattan'})

In [28]: print(frame['neighbourhood group'].unique())

['Brooklyn' 'Manhattan' 'Queens' 'Staten Island' 'Bronx']
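
The clean-up step can be reproduced on a toy series. In the sketch below the corrections mapping is an assumption inferred from the two unique() outputs above, where 'brookln' and 'manhatan' disappear and no new category appears. The data itself is invented.

```python
import pandas as pd

# Toy data, invented for illustration only
groups = pd.Series(["Brooklyn", "brookln", "Manhattan", "manhatan", "Queens"])

# Assumed mapping from each misspelling to its correct label
corrections = {"brookln": "Brooklyn", "manhatan": "Manhattan"}

# replace() with a dict swaps exact matches and leaves other values untouched
cleaned = groups.replace(corrections)

print(sorted(cleaned.unique()))
```

Because the Tukey table above was produced before this correction, it still lists 'brookln' and 'manhatan' as separate groups.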

In [29]: print(frame['room type'].unique())

['Private room' 'Entire home/apt' 'Shared room' 'Hotel room']

Next, I explored the data visually using several chart types: bar chart, pie chart, box plot, histogram, scatter plot and violin plot.

In [30]: # group data by room type and calculate average price

avg_price = frame.groupby('room type')['price'].mean()

# one color per room type, used for both the bars and the legend
colors = {'Private room': 'blue', 'Entire home/apt': 'green', 'Shared room': 'red', 'Hotel room': 'orange'}

# create barplot; groupby sorts the room types alphabetically, so each bar's
# color is looked up by name to keep the bars and the legend consistent
# (bar width is left at matplotlib's default)
fig, ax = plt.subplots()
ax.bar(avg_price.index, avg_price.values, color=[colors[room] for room in avg_price.index])

# add labels and title

ax.set_xlabel('Room Type')
ax.set_ylabel('Average Price')
ax.set_title('Barplot of Average Price by Room Type')

# create legend
labels = list(colors.keys())
handles = [plt.Rectangle((0,0),1,1, color=colors[label]) for label in labels]
ax.legend(handles, labels, bbox_to_anchor=(1.05, 1), loc='upper left')

# display plot
plt.show()

In [31]: import matplotlib.pyplot as plt

# create a list of colors for each neighbourhood group

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# count the listings in each neighbourhood group (sorted by count, largest first)

ng_counts = frame['neighbourhood group'].value_counts()

# create a pie chart of neighbourhood group distribution

plt.figure(figsize=(8,6))
plt.pie(ng_counts, labels=ng_counts.index, colors=colors, autopct='%1.1f%%')

# add legend outside the chart, labelled in the same order as the wedges

plt.legend(labels=ng_counts.index, bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
# set title
plt.title('Pie Chart of Neighbourhood Group Distribution', fontsize=14)

# show the plot

plt.show()

In [32]: # create a list of labels for each neighbourhood group

labels = ['Brooklyn', 'Manhattan', 'Queens', 'Staten Island', 'Bronx']

# define colors for each box

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# define box properties

boxprops = dict(linestyle='-', linewidth=2, color='black')

# create a boxplot of price by neighbourhood group

plt.figure(figsize=(8,6))
boxplots = plt.boxplot([frame[frame['neighbourhood group']=='Brooklyn']['price'],
                        frame[frame['neighbourhood group']=='Manhattan']['price'],
                        frame[frame['neighbourhood group']=='Queens']['price'],
                        frame[frame['neighbourhood group']=='Staten Island']['price'],
                        frame[frame['neighbourhood group']=='Bronx']['price']],
                       labels=labels, boxprops=boxprops, patch_artist=True)

# set colors for each box

for patch, color in zip(boxplots['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_label('')

# set axis labels and title

plt.xlabel('Neighbourhood Group', fontsize=12)
plt.ylabel('Price', fontsize=12)
plt.title('Boxplot of Price by Neighbourhood Group', fontsize=14)

# add legend
plt.legend(handles=boxplots['boxes'], labels=labels, bbox_to_anchor=(1.05, 1), loc='upper left')

# show the plot

plt.show()
In [33]: # Create a horizontal bar chart
colors = ['#5DA5DB', '#FAA43C', '#60BD69', '#F17CB1']
plt.figure(figsize=(8, 6))
ax = plt.gca()
ax.barh(types_of_rooms.index, types_of_rooms.values, color=colors)
plt.title('Percentage of Types of Rooms')
plt.xlabel('Percentage')
plt.show()

In [34]: plt.figure(figsize=(8, 6))

sns.histplot(data=frame, x='price', kde=True,color='orange')
plt.title('Histogram of Price')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
In [35]: import matplotlib.pyplot as plt
color = ['purple']
plt.figure(figsize=(8, 6))
plt.scatter(frame['price'], frame['number of reviews'],color=color)
plt.title('Scatter Plot of Price vs. Number of Reviews')
plt.xlabel('price')
plt.ylabel('number of reviews')
plt.show()

In [37]: sns.violinplot(x='room type', y='reviews per month', data=frame, palette='Blues')

plt.title('Distribution of Reviews per Month by Room Type')
plt.show()
