Starbucks Sentiment Analysis Using VADER
Customer Reviews Using VADER Analysis
This is a secondary dataset from Kaggle with 850 observations. The data was collected by web scraping customer reviews and ratings from the ConsumerAffairs website for locations in the USA. Note that this dataset is intended for research and analysis purposes and may be subject to the terms and conditions specified by ConsumerAffairs.
https://www.kaggle.com/datasets/harshalhonde/starbucks-reviews-dataset
The dataset contains the following variables:
a. Name: The reviewer's name (simulated).
b. Location: The location or city associated with the reviewer, if provided.
c. Date: The date when the review was posted.
d. Rating: The star rating given by the reviewer, ranging from 1 to 5.
e. Review: The textual content of the review, capturing the reviewer's experience and opinions.
Flow Analysis
[Figure: analysis flow in seven numbered steps (01 to 07), from Start through EDA and Wordcloud to End]
Data Pre-Processing
A. Split the Location Variable and Drop Unnecessary Variables
The Location variable contains two pieces of information: city and state. Splitting it into two variables lets us explore the data at both the city and the state level.
We also drop the Image_Links variable, since it is not used in the sentiment analysis.
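A minimal pandas sketch of this step, run on made-up rows. It assumes the raw columns are named "location" and "Image_Links" and that locations look like "City, ST"; the new column names "city" and "state" are illustrative, not taken from the original notebook.

```python
import pandas as pd

# Made-up rows; column names and the "City, ST" format are assumptions.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "location": ["Seattle, WA", "New York, NY", "Los Angeles, CA"],
    "rating": [5.0, 1.0, 2.0],
    "review": ["Great coffee", "Cold drink", "Slow service"],
    "Image_Links": ["['No Images']"] * 3,
})

# Split on the last comma into two new columns, then trim whitespace.
df[["city", "state"]] = df["location"].str.rsplit(",", n=1, expand=True)
df["city"] = df["city"].str.strip()
df["state"] = df["state"].str.strip()

# Drop the original location column and the unused Image_Links column.
df = df.drop(columns=["location", "Image_Links"])
print(df.columns.tolist())
```

Splitting on the last comma (`rsplit` with `n=1`) keeps any comma inside a city name on the city side; a location without a comma ends up with a missing state.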
B. Detecting and Handling Missing Values
The next step is to detect and drop missing values. Dropping rows with missing values is a simple way to keep the data consistent and compatible with analyses or models that cannot handle incomplete records. Removing incomplete records also helps keep the resulting statistics accurate.
From the output above, there are 145 missing values in the Rating variable. We remove these rows and check again: no missing values remain, so we can continue the analysis.
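The detect, drop, and re-check sequence can be sketched with pandas as follows; the tiny frame and its "rating" column are made-up stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Made-up data with two missing ratings.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rating": [5.0, np.nan, 1.0, np.nan],
    "review": ["Great", "Okay", "Bad", "No comment"],
})

# 1. Detect: count missing values per column.
print(df.isna().sum())

# 2. Drop every row that has at least one missing value.
df = df.dropna().reset_index(drop=True)

# 3. Check again: the total number of missing values should now be 0.
print(df.isna().sum().sum())
```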
C. Text Data Pre-Processing
In this step, we prepare the review text. Pre-processing consists of the following steps:
a. Convert Text to Lowercase
This ensures uniformity in the dataset, so the same word written in different cases (e.g., "Coffee" and "coffee") is treated as identical.
b. Remove Unnecessary Content
Here we remove links, @ mentions, hashtags, contractions, and punctuation, which are not important for this analysis.
c. Tokenize the Text
Tokenization allows word-level analysis of the text. Breaking sentences down into individual words makes it easier to understand the semantic meaning of, and relationships between, words.
d. Remove the Stopwords
Stopwords are common words (e.g., "and", "the", "is") that occur frequently in a language but contribute little to the overall meaning of a text. Removing stopwords reduces noise in the data, allowing the analysis to focus on more meaningful words.
e. Join the Words Back into a Cleaned Sentence
During pre-processing, the original text is transformed into a list of individual words or tokens. Joining these words back together reconstructs the cleaned text, making it readable and usable for downstream tasks.
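Steps a to e can be sketched as a single cleaning function. This is an illustration under stated assumptions, not the original notebook code: the stopword set is a small hypothetical stand-in for NLTK's `stopwords.words("english")` (which needs `nltk.download("stopwords")`), tokenization is plain whitespace splitting rather than NLTK's `word_tokenize`, and contractions are expanded with a hypothetical mini-map, which is one possible reading of the "contraction" step.

```python
import re
import string

# Hypothetical stopword set; NLTK's English list is much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "was", "it", "i", "my",
             "so", "to", "of", "in", "for", "at", "this", "that"}

# Hypothetical contraction map, applied before punctuation is stripped.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "n't": " not",
                "'re": " are", "'m": " am", "'ll": " will", "'ve": " have"}

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def clean_text(text: str) -> str:
    text = text.lower()                                  # a. lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # b. links
    text = re.sub(r"[@#]\w+", " ", text)                 # b. @mentions and hashtags
    text = text.replace("\u2019", "'")                   # curly to straight apostrophe
    for short, full in CONTRACTIONS.items():             # b. expand contractions
        text = text.replace(short, full)
    text = text.translate(PUNCT_TO_SPACE)                # b. punctuation
    tokens = text.split()                                # c. tokenize on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]   # d. remove stopwords
    return " ".join(tokens)                              # e. join back into a sentence

print(clean_text("The taste is SO bad!!! See https://example.com @barista #fail I can't believe it"))
# prints: taste bad see cannot believe
```

Applied to a DataFrame, this would be `df["clean"] = df["review"].apply(clean_text)`. One design point worth checking: NLTK's English stopword list includes negations such as "not" and "no", which VADER uses to flip polarity, so removing them can weaken the later sentiment scores.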
After text pre-processing, the data is ready for analysis. The data sample below shows that the text is clean enough to analyse.
Exploratory Data Analysis (EDA)
A. Reviews by Rating
The bar chart shows that many customers give low ratings, especially a rating of 1, indicating dissatisfaction. A rating of 2 is the second most common, signalling moderate dissatisfaction among customers.
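A short sketch of the counts behind a bar chart like this one, using made-up ratings and an assumed "rating" column:

```python
import pandas as pd

# Made-up ratings for illustration only, not the dataset's distribution.
df = pd.DataFrame({"rating": [1, 1, 1, 2, 2, 3, 5, 1, 4, 2]})

# Number of reviews per star rating, ordered from 1 to 5.
counts = df["rating"].value_counts().sort_index()
print(counts)
```

With matplotlib installed, `counts.plot(kind="bar")` draws the chart from these counts.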
B. Ratings by State
Looking at ratings by state, the chart shows that among the 10 states with the most ratings, most reviewers gave a rating of 1, with CA (California) as the biggest contributor.
C. Ratings by City
Looking at ratings by city, among the 10 cities with the most ratings, most reviewers also gave a rating of 1, with New York as the biggest contributor.
The difference between the state-level and city-level views is likely due to how the data is aggregated. California contributes the most ratings of 1 when all of its cities are combined, but at the city level New York is the largest single contributor. This can happen when California's reviews are spread across many cities while New York's are concentrated in New York City.
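The aggregation effect described above can be reproduced with a toy example. The rows below are invented, not drawn from the dataset: rating-1 reviews in CA are spread over three cities, while those in NY all come from New York City.

```python
import pandas as pd

# Invented rows: six CA reviews across three cities, four NY reviews in one city.
rows = (
    [("Los Angeles", "CA")] * 2
    + [("San Francisco", "CA")] * 2
    + [("San Diego", "CA")] * 2
    + [("New York", "NY")] * 4
)
df = pd.DataFrame(rows, columns=["city", "state"])
df["rating"] = 1

# Count rating-1 reviews at two levels of aggregation.
ones = df[df["rating"] == 1]
by_state = ones["state"].value_counts()
by_city = ones["city"].value_counts()

print(by_state.idxmax())  # top state
print(by_city.idxmax())   # top city
```

Here CA leads the state counts (6 versus 4) while New York leads the city counts (4 versus 2 for each CA city). For the full rating breakdown within the 10 busiest states, `pd.crosstab` on the state and rating columns, filtered to `df["state"].value_counts().nlargest(10).index`, is one option.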
D. Ratings by Year
The graph below shows that 2017 had the most ratings, suggesting that the public was especially engaged with Starbucks through reviews and ratings that year.
According to Starbucks' 2017 annual meeting press release, Starbucks boosted its digital offerings that year with innovations such as the expanded Mobile Order & Pay platform, which allowed customers to order via Amazon Alexa and Ford vehicles. This is consistent with the high number of people providing ratings and reviews of Starbucks in 2017. After 2017, however, the number of ratings declined.
https://stories.starbucks.com/press/2017/press-release-starbucks-annual-meeting-2017/
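Counting ratings per year can be sketched as below. The date strings are hypothetical, and because the real "Date" column's format is an assumption, the sketch simply extracts the first four-digit number as the year instead of parsing full dates.

```python
import pandas as pd

# Hypothetical date strings; the real column's format may differ.
df = pd.DataFrame({"Date": [
    "Reviewed Jan. 5, 2017",
    "Reviewed Mar. 2, 2017",
    "Reviewed July 9, 2018",
    "Reviewed Oct. 1, 2016",
]})

# Pull out the four-digit year and count reviews per year.
df["year"] = df["Date"].str.extract(r"(\d{4})", expand=False).astype(int)
ratings_per_year = df["year"].value_counts().sort_index()
print(ratings_per_year)
```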
VADER Sentiment Scoring
A. Employ NLTK’s SentimentIntensityAnalyzer
We use NLTK's SentimentIntensityAnalyzer, which implements VADER, to obtain the negativity, neutrality, and positivity scores of the text. VADER is a lexicon- and rule-based method:
a. Each word is looked up in a sentiment lexicon; words that are not in the lexicon (including most stop words) score 0 and count as neutral.
b. The word scores are adjusted by rules for negation, intensifiers (such as "so"), capitalisation, and exclamation marks, and are then aggregated into the final scores.
Once the sia object is created, we can call its polarity_scores method to get the negativity, neutrality, positivity, and compound scores for a given text.
For example, for the sentence “The taste is so Bad!”:
'neg': 0.545 - approximately 54.5% of the text is classified as negative.
'neu': 0.455 - approximately 45.5% of the text is classified as neutral.
'pos': 0.0 - none of the text is classified as positive.
'compound': -0.6988 - the overall sentiment score: the sum of the word valence scores, normalised to lie between -1 (most negative) and +1 (most positive). Here it is negative, indicating an overall negative sentiment.
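How per-word valences become these outputs can be sketched in a few lines. This is a simplified sketch that mirrors VADER's final aggregation step, assuming VADER's default normalisation constant alpha = 15; it skips VADER's rule adjustments (negation, intensifiers, capitalisation, punctuation emphasis), so it will not reproduce the exact numbers in the example, and the input valences are made up.

```python
import math

def normalize(score: float, alpha: float = 15) -> float:
    """Squash an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def proportions(valences):
    """Weighted shares of negative, neutral and positive words.

    Non-zero valences are pushed 1 further from zero to compensate for
    neutral words, which each count as 1.
    """
    pos = sum(v + 1 for v in valences if v > 0)
    neg = sum(1 - v for v in valences if v < 0)
    neu = sum(1 for v in valences if v == 0)
    total = pos + neg + neu
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {"neg": neg / total, "neu": neu / total, "pos": pos / total}

# Hypothetical valences for a five-word sentence with one negative word.
valences = [0, 0, 0, 0, -2.5]
print(proportions(valences))
print(normalize(sum(valences)))
```

This shows why neg, neu, and pos are weighted shares that sum to 1, while compound is computed separately from the summed valences rather than from those three shares.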
B. Employ Sentiment Analysis on the Dataset
Next, we apply the SentimentIntensityAnalyzer (sia) to the entire dataset (df). The code iterates through each row, takes the cleaned text from the 'clean' column, and calculates its polarity scores with the polarity_scores method. The results, including the compound score, are stored in a dictionary (res) with each row's 'name' as the key.
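A sketch of this loop, assuming a DataFrame with 'name' and 'clean' columns. So that it runs without NLTK, it uses a hypothetical stand-in scorer, `toy_polarity_scores`, which returns a dict with the same keys as `polarity_scores`; in the real pipeline that call would be `sia.polarity_scores`, with `sia = SentimentIntensityAnalyzer()` from `nltk.sentiment` after `nltk.download("vader_lexicon")`.

```python
import pandas as pd

def toy_polarity_scores(text):
    # Hypothetical stand-in for sia.polarity_scores: it only flags "bad".
    neg = 1.0 if "bad" in text.split() else 0.0
    return {"neg": neg, "neu": 1.0 - neg, "pos": 0.0,
            "compound": -0.5 if neg else 0.0}

df = pd.DataFrame({"name": ["Ann", "Ben"], "clean": ["taste bad", "nice latte"]})

res = {}
for _, row in df.iterrows():
    # Score the cleaned text and store the result under the reviewer's name.
    res[row["name"]] = toy_polarity_scores(row["clean"])

print(res)
```

Keying by 'name' assumes names are unique: if two reviewers share a name, the later row overwrites the earlier one. Keying by the row index instead avoids that.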
We then create a new DataFrame (vaders) that combines the sentiment analysis results with the original dataset on the 'name' column. The output below shows that the neg, neu, pos, and compound scores are now part of the dataset.
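One way to build vaders from res is sketched below with made-up score values in the shape `polarity_scores` returns: turn the dict into a frame with one row per name, then left-join the original columns on 'name'.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben"], "rating": [1, 5],
                   "clean": ["taste bad", "nice latte"]})

# Made-up scores keyed by name, in the shape polarity_scores returns.
res = {
    "Ann": {"neg": 0.6, "neu": 0.4, "pos": 0.0, "compound": -0.54},
    "Ben": {"neg": 0.0, "neu": 0.3, "pos": 0.7, "compound": 0.42},
}

# Transpose so each name is a row, then turn the index into a 'name' column.
vaders = pd.DataFrame(res).T
vaders = vaders.reset_index().rename(columns={"index": "name"})

# Attach the original columns (rating, cleaned text, ...) by name.
vaders = vaders.merge(df, how="left", on="name")
print(vaders)
```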
C. Sentiment Scoring vs Rating
To check whether the sentiment scores agree with the ratings, we compare the compound score across rating levels.
The boxplot above shows that higher ratings go with higher compound scores: compound scores are highest for positive ratings, sit in the middle for neutral ratings, and are lowest for negative ratings.
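The same check can be made numerically by taking the median compound score per rating. The scores below are made up for illustration and are not results from the Starbucks data.

```python
import pandas as pd

# Made-up compound scores for illustration only.
vaders = pd.DataFrame({
    "rating":   [1, 1, 2, 3, 3, 4, 5, 5],
    "compound": [-0.8, -0.5, -0.3, 0.0, 0.1, 0.4, 0.7, 0.9],
})

# Median compound score per star rating; a rising sequence means the
# sentiment scores move in the same direction as the ratings.
medians = vaders.groupby("rating")["compound"].median()
print(medians)
print(medians.is_monotonic_increasing)
```

With seaborn installed, `sns.boxplot(data=vaders, x="rating", y="compound")` draws the corresponding boxplot.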
Visualise the Classification Using Wordcloud
A. Positive Sentiment
From the word cloud below, “Service”, “Time”, “Order”, and “Employee” are the words that appear most frequently in positive reviews.
In neutral reviews, the most frequent words suggest that customers are mostly discussing their orders, the store location, and specific drink sizes. This information helps identify common topics that customers mention without a strong positive or negative sentiment.
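The slides do not state how reviews were assigned to the positive, negative, and neutral groups. A minimal sketch, assuming the commonly used VADER cut-offs of ±0.05 on the compound score, labels each review and then counts word frequencies per group, which is what a word cloud sizes words by. The rows are made up.

```python
from collections import Counter

import pandas as pd

def label_sentiment(compound, threshold=0.05):
    # Assumed cut-offs; the original analysis may have used different ones.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Made-up cleaned reviews with compound scores.
vaders = pd.DataFrame({
    "clean": ["great service fast order", "order wrong manager rude",
              "ordered grande store downtown", "friendly employee great service"],
    "compound": [0.80, -0.70, 0.0, 0.75],
})
vaders["sentiment"] = vaders["compound"].apply(label_sentiment)

# Word frequencies per sentiment group.
freqs = {
    label: Counter(" ".join(group["clean"]).split())
    for label, group in vaders.groupby("sentiment")
}
print(freqs["positive"].most_common(3))
```

With the wordcloud package installed, `WordCloud().generate_from_frequencies(freqs["positive"])` renders a cloud from one group's counts.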
Conclusion and Suggestions
A. Conclusion
From the analysis that we have carried out, the following conclusions are obtained.
1. Customer Dissatisfaction Hotspots:
- The bar chart shows a large number of low ratings, particularly rating 1, signalling dissatisfaction.
- California (at the state level) and New York City (at the city level) contribute the most rating-1 reviews, making them priority areas for targeted improvement.
2. Digital Innovations Impact:
- The boost in digital offerings in 2017, such as the Mobile Order & Pay platform, coincided with a peak in ratings and reviews, suggesting heightened enthusiasm and customer engagement.
- After 2017, the number of reviews declined noticeably, highlighting the need for continued innovation and adaptation to evolving consumer expectations.
3. Service Excellence and Operational Efficiency:
- "Service" appears frequently in positive reviews, reflecting positive sentiment towards the quality of service provided.
- Positive mentions of "Time" and "Order" suggest satisfaction with efficient order processing and operational excellence.
4. Challenges Leading to Negative Experiences:
- Negative reviews featuring "Time" suggest challenges related to service speed and timely order fulfilment.
- Negative experiences related to "Order" point to potential issues with accuracy, fulfilment processes, or general processing.
- The appearance of "Manager" in negative reviews suggests situations requiring managerial intervention.
5. Insights from Neutral Reviews:
- Neutral reviews highlight discussions around routine topics like orders, store locations, and specific drink sizes.
- These insights can guide Starbucks in understanding common customer concerns that may not strongly lean towards either
positive or negative sentiment.
B. Suggestions
Based on these conclusions, we recommend the following.
1. Addressing Dissatisfaction Hotspots:
- Implement targeted improvement initiatives in the areas with the most dissatisfaction, particularly California and New York City.
- Conduct thorough investigations into low-rated experiences to identify specific pain points and tailor solutions.
2. Revitalising Digital Strategies:
- Enhance public enthusiasm by introducing fresh digital innovations and enhancements.
- Leverage customer feedback to adapt digital offerings, ensuring they align with evolving expectations and
preferences.
3. Sustaining Service Excellence:
- Strengthen the focus on service quality, acknowledging its crucial role in overall customer satisfaction.
- Emphasise efficient order processing, operational excellence, and positive interactions with staff members.
4. Addressing Challenges for Negative Experiences:
- Tackle challenges related to service speed and order fulfilment highlighted in negative reviews.
- Implement corrective measures to address issues with order accuracy, fulfilment processes, and general processing.
5. Managerial Intervention:
- Address situations requiring managerial intervention, as indicated by the appearance of "Manager" in negative reviews.
- Provide additional training and support for staff and managers to handle challenging scenarios effectively.
6. Leveraging Insights from Neutral Reviews:
- Use insights from routine discussions in neutral reviews to enhance customer experiences.
- Consider incorporating feedback on orders, store locations, and specific drink sizes into operational improvements.
Thank You
Yulian Farid Wahyudi
yulianfarid4@gmail.com
https://www.linkedin.com/in/yulianfarid-wahyudi/