IIT Madras · Business Data Management

Analyzing Ride Patterns & Pricing Strategies in Uber

An academic capstone project that explores demand fluctuations, peak-hour congestion, cancellations, and driver availability using public Uber datasets. It combines a data-cleaning pipeline, exploratory data analysis (EDA), temporal and spatial analysis, and operational recommendations.

Project Author

Achal Deep

Roll No: 22F3000694

IIT Madras BS Degree Program

Business Data Management - Capstone Project

Academic Project · May 2025 Term
Created as part of IIT Madras BS in Data Science & Applications

Organization Background

Uber Rides

Global Ride-hailing Platform (B2C Model)

Founded in 2009 by Garrett Camp and Travis Kalanick, Uber started as a premium black car service in San Francisco, California. It has since expanded to become a leading mobility service provider worldwide, operating under a B2C model in the online transportation sector.

Service Portfolio

  • UberX: Affordable everyday rides
  • Uber Pool: Shared rides for cost savings
  • Uber Green: Electric vehicle rides
  • Uber Comfort: Premium ride experience

Global Presence

  • Countries: 70+
  • Cities: 10K+
  • Founded: 2009
  • Business Model: B2C

Expanded Services

  • Uber Eats: Food delivery platform
  • Uber Freight: Logistics solutions
  • Micro-mobility: Bikes and scooters

Problem Statement & Background

Background

The efficiency of ride-hailing services is influenced by internal factors (driver allocation inefficiencies, suboptimal ride distribution) and external factors (traffic congestion, unpredictable demand fluctuations, geographic disparities). These inefficiencies result in longer trip durations, frequent ride cancellations, and supply-demand imbalances, impacting customer experience and profitability.

Problem Statement

1. Investigating Ride Demand Fluctuations and Peak-Hour Congestion

Analyzing ride demand variations across different times and regions to enhance operational efficiency, reduce wait times, and optimize driver allocation.

2. Assessing Ride Cancellations and Driver Availability Gaps

Identifying key factors influencing ride cancellations and driver shortages that impact customer satisfaction and service reliability.

3. Optimizing Route Planning for Efficiency

Evaluating route optimization strategies by analyzing trip durations, congestion patterns, and fuel efficiency to enhance cost-effective operations.

Analytical Approach & Methodology

Step 1: Data Collection & Overview

Data Collection

Secondary data sourced from Kaggle, verified for academic use and compliance with open-source licensing.

  • Data Type: Secondary (publicly available)
  • Source: Kaggle verified repositories
  • Licensing: Academic use approved

Dataset Summaries

Uber NYC Enriched Pickups

29,101 NYC pickup records enriched with time, weather, and holiday details

City–Airport Request Data

6,745 ride requests between city and airport tracking outcomes and gaps

Step 2: Data Cleaning & Feature Engineering

Data Cleaning

Comprehensive data quality improvement including missing value treatment, datatype standardization, and duplicate removal.

  • Missing value imputation
  • Datetime parsing and timezone handling
  • Duplicate record identification and removal
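The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the field names (`pickup_dt`, `borough`, `pickups`), the sample rows, and the choice of median imputation are all assumptions, and timezone handling is left out for brevity.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw rows; field names are illustrative, not the dataset's real columns.
raw_rows = [
    {"pickup_dt": "2015-01-01 01:00:00", "borough": "Manhattan", "pickups": "152"},
    {"pickup_dt": "2015-01-01 01:00:00", "borough": "Manhattan", "pickups": "152"},  # duplicate
    {"pickup_dt": "2015-01-01 02:00:00", "borough": "Bronx", "pickups": ""},  # missing value
    {"pickup_dt": "2015-01-01 03:00:00", "borough": "Bronx", "pickups": "40"},
]


def clean(rows):
    """Drop exact duplicates, parse datetimes, and impute missing counts."""
    seen, parsed = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:  # exact duplicate record: skip it
            continue
        seen.add(key)
        parsed.append({
            "pickup_dt": datetime.strptime(row["pickup_dt"], "%Y-%m-%d %H:%M:%S"),
            "borough": row["borough"],
            "pickups": int(row["pickups"]) if row["pickups"] else None,
        })
    # Assumed imputation rule: fill missing counts with the median of known counts.
    fill = median(r["pickups"] for r in parsed if r["pickups"] is not None)
    for r in parsed:
        if r["pickups"] is None:
            r["pickups"] = fill
    return parsed


cleaned = clean(raw_rows)
```

Median imputation is used here only because the pickup counts are heavily skewed; another rule (mean, forward fill, or dropping the row) may suit a given analysis better.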

Feature Engineering

Enhanced dataset with derived temporal and contextual features for comprehensive analysis capabilities.

  • Temporal features (hour, day, month, day-of-week)
  • Categorical variable standardization
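As a rough illustration of the derived features listed above, the sketch below extracts hour, day, month, and day-of-week from a timestamp and normalizes a category label. The function names and the sample inputs are hypothetical and chosen only for this example.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def temporal_features(ts: datetime) -> dict:
    """Derive the temporal features named in the list above."""
    return {
        "hour": ts.hour,
        "day": ts.day,
        "month": ts.month,
        "day_of_week": DAY_NAMES[ts.weekday()],  # weekday(): Monday == 0
    }


def standardize_category(label: str) -> str:
    """Collapse whitespace and casing differences in a categorical value."""
    return " ".join(label.split()).title()


features = temporal_features(datetime(2015, 1, 1, 18, 0))
borough = standardize_category("  staten   island ")
```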

Step 3: Analysis Methods & Key Findings

1. Descriptive Statistics

Computed comprehensive summary statistics (mean, median, standard deviation, quartiles) for ride demand and weather variables to understand data distributions and identify patterns in the Uber NYC dataset.

Key Findings:
  • Average hourly pickups ≈ 490, but the distribution is highly skewed (median only 54, maximum up to 7,883)
  • Ride demand shows substantial variability, with peaks during busy borough hours (e.g., Manhattan)
  • Weather variables show limited impact on ride demand patterns

Charts: Summary statistics table

Summary Table
Variable                 Mean     Std Dev   Min    25%     Median   75%      Max
Number of Pickups        490.22   995.65    0.00   1.00    54.00    449.00   7883.00
Temperature (°F)         47.67    19.81     2.00   32.00   46.00    64.50    89.00
Wind Speed (mph)         5.98     3.70      0.00   3.00    6.00     8.00     21.00
Precipitation 1hr (in)   0.0038   0.019     0.00   0.00    0.00     0.00     0.28
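The statistics in the summary table can be computed for any single variable along these lines. The sample values are hypothetical (picked to be right-skewed like the pickup counts), and the quartile interpolation method is an assumption, so results on real data may differ slightly from the table depending on the tool used.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical hourly pickup counts, chosen to be right-skewed.
hourly_pickups = [0, 1, 1, 54, 60, 449, 7883]


def summarize(values):
    """Return mean, standard deviation, min, quartiles, and max for one variable."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "median": median(values),
        "75%": q3,
        "max": max(values),
    }


summary = summarize(hourly_pickups)
# A mean far above the median signals a long right tail.
is_right_skewed = summary["mean"] > summary["median"]
```

Comparing the mean with the median is a quick check for the kind of skew reported above, where a handful of very busy hours pull the average well above the typical hour.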

2. Correlation Analysis

Analyzed relationships between ride demand, temporal patterns, and weather factors using Pearson correlation coefficients to identify significant predictive relationships in the NYC Uber dataset.

Key Findings:
  • Weather variables show very weak or near-zero correlation with ride demand (|r| ≤ 0.05)
  • Hour of Day has a modest positive correlation with ride volumes (r = 0.17), reflecting daily cycles in demand
  • Temperature, dew point, and month are strongly correlated with one another (r = 0.81 to 0.90), consistent with seasonal climate patterns in NYC

Charts: Correlation matrix heatmap

Correlation Matrix
Variable         Pickups  Wind   Visibility  Temp   DewPoint  Pressure  Precip1h  Precip6h  Precip24h  Snow   Hour   Month
NumberOfPickups  1.00     0.01   -0.01       0.05   0.03      -0.02     0.00      -0.00     -0.02      -0.01  0.17   0.05
WindSpeed        0.01     1.00   0.09        -0.29  -0.32     -0.09     -0.00     0.02      -0.01      0.10   0.09   -0.27
Visibility       -0.01    0.09   1.00        0.02   -0.23     0.17      -0.49     -0.12     0.00       -0.05  0.04   0.02
Temp             0.05     -0.29  0.02        1.00   0.90      -0.22     -0.01     -0.04     -0.01      -0.55  0.09   0.85
DewPoint         0.03     -0.32  -0.23       0.90   1.00      -0.31     0.12      0.01      0.00       -0.49  0.01   0.81
Pressure         -0.02    -0.09  0.17        -0.22  -0.31     1.00      -0.12     -0.14     -0.14      0.14   0.08   -0.55
Precip1h         0.00     -0.00  -0.49       -0.01  0.12      -0.12     1.00      0.87      0.78       0.23   -0.01  0.02
Precip6h         -0.00    0.02   -0.12       -0.04  0.01      -0.14     0.87      1.00      0.95       0.26   -0.01  0.02
Precip24h        -0.02    -0.01  0.00        -0.01  0.00      -0.14     0.78      0.95      1.00       0.29   -0.02  0.01
Snow             -0.01    0.10   -0.05       -0.55  -0.49     0.14      0.23      0.26      0.29       1.00   -0.01  -0.04
Hour             0.17     0.09   0.04        0.09   0.01      0.08      -0.01     -0.01     -0.02      -0.01  1.00   -0.01
Month            0.05     -0.27  0.02        0.85   0.81      -0.55     0.02      0.02      0.01       -0.04  -0.01  1.00
Correlation scale: -1 to +1
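For reference, each cell of the matrix is a Pearson correlation coefficient, which can be computed for one pair of variables as in the sketch below. The `hours` and `pickups` values are made up for illustration and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations: hour of day against pickup count.
hours = [0, 6, 12, 18, 23]
pickups = [40, 120, 300, 800, 650]

r_hour_pickups = pearson(hours, pickups)
```

Because Pearson's r only captures linear association, a cyclical daily pattern can produce a modest coefficient even when hour of day is a strong driver of demand.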

3. Temporal Analysis

Examined ride demand patterns across different hours and days using time-series analysis to identify peak periods, trends, and temporal variations in NYC Uber pickups.

Key Findings:
  • Ride demand shows a clear upward growth trend over months, with occasional dips due to low-demand days or disruptions
  • Demand is highly volatile, punctuated by spikes driven by events, holidays, and weather
  • Daily cycle patterns are evident: lowest demand between midnight–5 AM, rising after 6 AM, and peaking between 6 PM–midnight
  • Evening hours consistently record the highest ride activity with periodic surges
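The daily cycle described above comes from grouping pickups by hour of day. A minimal sketch of that aggregation follows; the `(hour, pickups)` records are hypothetical, and the mean is only one possible summary (the hourly boxplot shows the full spread).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (hour_of_day, pickups) observations.
records = [(3, 20), (3, 30), (8, 200), (19, 900), (19, 1100), (23, 700)]


def hourly_profile(rows):
    """Average pickups for each hour of day, ordered by hour."""
    by_hour = defaultdict(list)
    for hour, pickups in rows:
        by_hour[hour].append(pickups)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


profile = hourly_profile(records)
peak_hour = max(profile, key=profile.get)      # hour with the highest average
trough_hour = min(profile, key=profile.get)    # hour with the lowest average
```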

Charts: Daily trend line chart, hourly boxplot

Temporal Analysis Charts
[Figure: Daily Uber Pickups in NYC (Jan-Jun 2015)]
[Figure: Uber Pickups by Hour of Day]

4. Spatial-Temporal Patterns

Mapped demand variations across NYC boroughs and time periods to identify congestion hotspots using hour × borough heatmap analysis of average pickup patterns.

Key Findings:
  • Manhattan dominates ride demand, showing consistently high pickup volumes throughout the day
  • Demand in Manhattan peaks during late afternoon and evening hours, driven by business, commuters, and nightlife
  • Other boroughs (Brooklyn, Bronx, Queens, Staten Island) show much lower and steadier demand, with only slight increases during peak commuting hours
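The hour × borough heatmap is built from a table of average pickups per (hour, borough) cell. One way to sketch that table, using made-up rows rather than the real data:

```python
from collections import defaultdict

# Hypothetical (borough, hour_of_day, pickups) observations.
rows = [
    ("Manhattan", 18, 2400),
    ("Manhattan", 18, 2600),
    ("Brooklyn", 18, 500),
    ("Manhattan", 4, 300),
    ("Brooklyn", 4, 60),
]


def hour_borough_matrix(observations):
    """Average pickups for every (hour, borough) cell that has data."""
    totals = defaultdict(lambda: [0, 0])  # (hour, borough) -> [sum, count]
    for borough, hour, pickups in observations:
        cell = totals[(hour, borough)]
        cell[0] += pickups
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


matrix = hour_borough_matrix(rows)
hotspot = max(matrix, key=matrix.get)  # busiest (hour, borough) cell
```

Each key of `matrix` corresponds to one heatmap cell, so the largest value marks the busiest hour-and-borough combination.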

Charts: Hour × Borough heatmap

Average Pickups by Hour and Borough
[Figure: NYC Uber Pickups: Hourly Patterns by Borough]

5. Operational Gap Analysis

Assessed trip completion rates and supply-demand mismatches between city and airport locations to identify operational bottlenecks and service gaps.

Key Findings:
  • The airport's main issue is a severe supply shortage: "no cars available" is the most common reason a request goes unfulfilled
  • The "no cars available" rate at the airport is consistently high all day, indicating a persistent gap
  • The city's main issue is user behavior, with cancellations being the primary reason for unfulfilled rides
  • Cancellation rates in the city fluctuate sharply, peaking during morning and evening rush hours
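The comparison above rests on the share of each request status at each pickup point. A small sketch of that calculation is shown below; the status labels and the sample requests are assumptions made for illustration, not values from the dataset.

```python
from collections import Counter

# Hypothetical (pickup_point, status) pairs; the status labels are assumed.
requests = [
    ("Airport", "No Cars Available"),
    ("Airport", "No Cars Available"),
    ("Airport", "Trip Completed"),
    ("Airport", "Cancelled"),
    ("City", "Cancelled"),
    ("City", "Cancelled"),
    ("City", "Trip Completed"),
    ("City", "No Cars Available"),
]


def status_shares(pairs):
    """Share of each status among all requests from the same pickup point."""
    counts = Counter(pairs)
    totals = Counter(point for point, _ in pairs)
    return {(point, status): n / totals[point]
            for (point, status), n in counts.items()}


shares = status_shares(requests)
# Supply-demand gap: the fraction of requests that did not end in a completed trip.
airport_gap = 1 - shares.get(("Airport", "Trip Completed"), 0.0)
```

Running the same calculation per hour, rather than over the whole day, gives the hourly supply-demand gap view.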

Charts: Status distribution & hourly supply-demand gap analysis

Operational Gap Analysis Charts
[Figure: Distribution of Ride Request Statuses by Pickup Point]
[Figure: Supply-Demand Gap by Hour and Pickup Point]

Interpreting Results & Recommendations

Peak-Hour Analysis

Demand peaks between 6 PM and midnight, with Manhattan showing the highest concentration. These temporal patterns are predictable.

Key Recommendations:
  • Implement dynamic pricing during evening rush hours
  • Launch promotional campaigns for low-demand time periods
  • Partner with traffic authorities for congestion management

Operational Gaps

More than 50% of airport requests go unfulfilled, mainly because of driver shortages. City requests are more often lost to user cancellations.

Key Recommendations:
  • Create a guaranteed earnings program for airport pickups
  • Deploy real-time tracking with accurate arrival estimates
  • Integrate backup transportation services during peak demand

Route Planning

This analysis was limited by the lack of route-specific data. Route optimization is left as future work.

Future Recommendations:
  • Build machine learning models for intelligent route planning
  • Test new routing algorithms through controlled pilot studies
  • Establish a driver reporting system for navigation improvement

Strategic Implementation Summary

Immediate Actions:
  • Deploy surge pricing during peak hours (6 PM–midnight)
  • Launch airport driver incentive program
  • Implement real-time wait time updates
Long-term Strategy:
  • Develop predictive demand forecasting
  • Integrate traffic data for route optimization
  • Establish performance monitoring framework

© 2025 Achal Deep · IITM BS Degree Program · For academic use only

License: MIT - see LICENSE in repository · Project Repository: bdm-project