Competitor analysis using Latent Dirichlet Allocation
Topic modeling using Latent Dirichlet Allocation (LDA) on reviews of Baldur’s Gate 3 from Steam
Using a custom-built web scraper, I collected a sample of 500 Steam reviews of Baldur’s Gate 3 for an NLP analysis with LDA.
Analysis Summary:
- Exploratory Data Analysis (EDA): The notebook began with an exploratory analysis of the text data to understand the structure and main themes of the reviews.
- Text Preprocessing: Text data was preprocessed to prepare it for modeling. This included steps like tokenization, removal of stop words, stemming or lemmatization, and the creation of a document-term matrix.
- LDA Modeling: The LDA model was fit with several hyperparameter settings and numbers of topics. The aim was to identify a set of topics that are both interpretable and representative of the key themes in the reviews.
- Manual Inspection and Cleaning: Some of the topics initially identified contained strange terms. I manually inspected the reviews containing these terms and removed those reviews as outliers. A new LDA model was then fit on the cleaned dataset.
- Topic Naming: The final step involved naming each topic based on the collection of terms it included. The identified topics were:
- Topic 1: “Game mechanics and storytelling”
- Topic 2: “Player engagement and time investment”
- Topic 3: “RPG elements and world-building”
- Topic 4: “Game development and player expectations”
- Topic 5: “Emotional impact and experience”
- Topic 6: “Developer recognition and genre impact”
- Topic 7: “CRPG mechanics”
- Topic 8: “Game impact on players”
- Topic 9: “General game quality assessment”
- Topic 10: “Controversial content”
- Topic 11: “Character appreciation”
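The preprocessing and modeling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the summary does not say which library was used, so the sketch swaps in a small pure-Python collapsed Gibbs sampler for LDA, where a real analysis would more likely rely on a library such as gensim or scikit-learn. The sample reviews, the stop-word list, the function names, and the values of `alpha`, `beta`, and `n_iter` are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "it", "of", "to", "i",
              "this", "in", "for", "was"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words and 1-letter tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 1]

def build_vocab(docs):
    """Map each distinct token to a column index of the document-term matrix."""
    return {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}

def doc_term_matrix(docs, vocab):
    """One row per document, one column per vocabulary term, cells are counts."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts.get(w, 0) for w in vocab])
    return rows

def fit_lda(docs, vocab, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    n_terms = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_terms for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_terms * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_terms(topic_word, vocab, n=5):
    """Return the n highest-count terms for each topic."""
    words = list(vocab)
    return [
        [words[v] for v in sorted(range(len(words)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Made-up reviews, used only to exercise the pipeline.
reviews = [
    "The story and the characters are amazing",
    "Combat mechanics are deep and the story is great",
    "I played for 200 hours and it was worth it",
    "Hours of content, great combat and characters",
]
docs = [tokenize(r) for r in reviews]
vocab = build_vocab(docs)
dtm = doc_term_matrix(docs, vocab)
doc_topic, topic_word = fit_lda(docs, vocab, n_topics=2)
for k, terms in enumerate(top_terms(topic_word, vocab)):
    print(f"Topic {k + 1}: {', '.join(terms)}")
```

The printed term lists are what would be read and named by hand in the topic naming step. With only four toy reviews the topics are not meaningful, and the number of topics is simply fixed at 2 here rather than tuned.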
Conclusions:
The analysis revealed several key themes in the reviews of Baldur’s Gate 3: strong player engagement, high praise for the game’s storytelling and mechanics, recognition of the developer, and discussion of the game’s impact on the RPG genre. Reviewers also discussed controversial content and expressed appreciation for specific characters, which points to a broad range of player experiences and sentiments. Together, the topics give an overview of what players value and discuss in relation to the game, which can inform marketing, game development, or community engagement strategies.
Click here for the GitHub repo