← Back to Projects
ShopSense: Market Intelligence Analysis

ShopSense: Market Intelligence Analysis

Python MySQL Pandas Plotly Streamlit Scikit-learn MLxtend Random Forest Apriori Gemini 2.0 Flash OpenRouter API
📊 View Live Dashboard

Built an end-to-end retail analytics platform processing over 1 million raw transactional records into a seven-tab interactive Streamlit dashboard. The pipeline cleans and stores 805,549 transactions across 5,878 customers and 36,969 orders into a MySQL database, then surfaces executive KPIs, customer segmentation, market basket intelligence, and ML-powered sales forecasting - all wrapped with a conversational AI Business Analyst powered by Google Gemini 2.0 Flash.

ETL & Data Pipeline
Ingested and cleaned 1,067,371 raw rows from a UK-based online retailer dataset, removing cancellations, invalid stock codes, and null customers to produce a clean 805,549-row dataset. Derived revenue, quantity, and order metrics and loaded everything into a normalized MySQL schema for downstream querying.
Executive KPI Dashboard
Surfaces key business metrics across the full dataset: total revenue of £17.7M, 5,878 unique customers, and 36,969 orders. Built interactive Plotly visualizations for monthly revenue trends, top products by revenue and volume, and country-level performance - driven directly from MySQL queries at render time.
RFM Customer Segmentation
Scored all customers on Recency, Frequency, and Monetary dimensions to produce 8 behavioral segments. Identified that Champions (22% of customers) drive 68% of total revenue (£12.1M), while At Risk customers represent over £1M in revenue at risk of churn - enabling targeted retention campaigns for each segment.
Market Basket Analysis
Applied the Apriori algorithm via MLxtend to mine 60 association rules from transaction data. Surfaced cross-sell opportunities with lift scores up to 29.9 - the strongest rule linking Blue and Pink Spotty Party Candles. Rules are filterable by support, confidence, and lift for product bundling and placement decisions.
ML Sales Forecasting
Trained a Random Forest Regressor on time-enriched features (lag values, rolling means, day-of-week, month) to predict weekly revenue. Achieved an R² of 0.707 on the holdout set, with a forecast for H1 2012 projecting £3.6M in revenue. Feature importance analysis confirmed lag-1 revenue as the dominant predictor.
AI Business Analyst
Integrated Google Gemini 2.0 Flash via OpenRouter API to create a conversational analyst layer. The AI is preloaded with full dataset context - RFM segments, basket rules, forecast figures, and behavioral patterns - and responds to natural language queries with structured, data-grounded business recommendations.
  • Champions segment (22% of customers) generate 68% of total revenue (£12.1M out of £17.7M) - confirming extreme revenue concentration and making retention of this segment the single highest-leverage business priority.
  • At Risk customers represent over £1M in revenue at risk of churn, with clear signals (declining recency and frequency) that allow targeted win-back campaigns before revenue is lost.
  • Saturday orders drop to near-zero (~30 orders vs ~7,773 on Thursdays), confirming the customer base is predominantly B2B - with ordering patterns tightly coupled to business working hours.
  • Netherlands customers average £2,430 per order compared to £439 for UK customers - identifying high-value international segments worth prioritizing for volume-based growth.
  • Market basket lift of 29.9 between Blue and Pink Spotty Party Candles represents the strongest cross-sell opportunity identified, with 60 total actionable association rules available for product bundling and placement strategy.
  • Random Forest R² = 0.707 on the holdout test set, with lag-1 weekly revenue as the most predictive feature - enabling reliable short-horizon revenue forecasting for inventory and procurement planning.