IIT Madras · Business Data Management
An academic capstone project that explores demand fluctuations, peak-hour congestion, cancellations, and driver availability using public Uber datasets. It combines a data cleaning pipeline, exploratory data analysis (EDA), temporal and spatial analysis, and operational recommendations.
Roll No: 22F3000694
IIT Madras BS Degree Program
Business Data Management - Capstone Project
Academic Project · May 2025 Term
Created as part of IIT Madras BS in Data Science & Applications
Global Ride-hailing Platform (B2C Model)
Founded in 2009 by Garrett Camp and Travis Kalanick, Uber started as a premium black car service in San Francisco, California. It has since expanded to become a leading mobility service provider worldwide, operating under a B2C model in the online transportation sector.
The efficiency of ride-hailing services is influenced by internal factors (driver allocation inefficiencies, suboptimal ride distribution) and external factors (traffic congestion, unpredictable demand fluctuations, geographic disparities). These inefficiencies result in longer trip durations, frequent ride cancellations, and supply-demand imbalances, impacting customer experience and profitability.
Analyzing ride demand variations across different times and regions to enhance operational efficiency, reduce wait times, and optimize driver allocation.
Identifying key factors influencing ride cancellations and driver shortages that impact customer satisfaction and service reliability.
Evaluating route optimization strategies by analyzing trip durations, congestion patterns, and fuel efficiency to enhance cost-effective operations.
Secondary data sourced from Kaggle, verified for academic use and compliance with open-source licensing.
29,101 NYC pickup records enriched with time, weather, and holiday details
6,745 ride requests between the city and the airport, used to track trip outcomes and supply-demand gaps
Comprehensive data quality improvement including missing value treatment, datatype standardization, and duplicate removal.
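The three cleaning steps named above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the column names (`pickup_dt`, `borough`, `pickups`), the sample rows, and the choice to fill missing boroughs with `"Unknown"` are all hypothetical.

```python
import pandas as pd

# Hypothetical raw sample; column names and values are assumptions.
raw = pd.DataFrame({
    "pickup_dt": ["2015-01-01 01:00:00", "2015-01-01 01:00:00",
                  "2015-01-01 02:00:00", "2015-01-01 03:00:00"],
    "borough": ["Bronx", "Bronx", None, "Queens"],
    "pickups": ["152", "152", "5", None],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply duplicate removal, datatype standardization, and missing-value treatment."""
    out = df.drop_duplicates().copy()                       # duplicate removal
    out["pickup_dt"] = pd.to_datetime(out["pickup_dt"])     # text -> datetime
    out["pickups"] = pd.to_numeric(out["pickups"], errors="coerce")  # text -> number
    out["borough"] = out["borough"].fillna("Unknown")       # label missing boroughs
    out = out.dropna(subset=["pickups"])                    # drop rows with no pickup count
    out["pickups"] = out["pickups"].astype(int)
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Whether to fill or drop a missing value depends on the column: here a missing borough is kept under a placeholder label, while a missing pickup count is dropped because it cannot be analyzed.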
Datasets enriched with derived temporal and contextual features (for example, hour and month) to support the analyses that follow.
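As a rough sketch of how temporal features can be derived from a timestamp column, the snippet below uses the pandas `.dt` accessor. The column name `pickup_dt` and the `day_name` and `is_weekend` features are assumptions for illustration; only hour and month are confirmed by the correlation table in this report.

```python
import pandas as pd

# Hypothetical timestamps; the column name is an assumption.
df = pd.DataFrame({"pickup_dt": pd.to_datetime([
    "2015-01-02 18:00:00",  # a Friday
    "2015-01-03 09:00:00",  # a Saturday
])})

df["hour"] = df["pickup_dt"].dt.hour              # 0-23
df["month"] = df["pickup_dt"].dt.month            # 1-12
df["day_name"] = df["pickup_dt"].dt.day_name()    # e.g. "Friday"
df["is_weekend"] = df["pickup_dt"].dt.dayofweek >= 5  # Monday=0, so 5 and 6 are the weekend

print(df)
```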
Computed summary statistics (mean, median, standard deviation, quartiles) for ride demand and weather variables in the Uber NYC dataset to understand their distributions. Pickups are strongly right-skewed: the mean (490.22) is about nine times the median (54.00).
Charts: Summary statistics table
Variable | Mean | Std Dev | Min | 25% | Median | 75% | Max
---|---|---|---|---|---|---|---
Number of Pickups | 490.22 | 995.65 | 0.00 | 1.00 | 54.00 | 449.00 | 7883.00
Temperature (°F) | 47.67 | 19.81 | 2.00 | 32.00 | 46.00 | 64.50 | 89.00
Wind Speed (mph) | 5.98 | 3.70 | 0.00 | 3.00 | 6.00 | 8.00 | 21.00
Precipitation 1hr (in) | 0.0038 | 0.019 | 0.00 | 0.00 | 0.00 | 0.00 | 0.28
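A table with this layout can be produced with `DataFrame.describe()`. The sketch below uses toy values and hypothetical column names (`pickups`, `temp_f`); it shows the mechanics only and does not reproduce the figures above.

```python
import pandas as pd

# Toy values for illustration only; not the project's data.
df = pd.DataFrame({
    "pickups": [0, 10, 20, 30, 40],
    "temp_f": [30, 40, 50, 60, 70],
})

# describe() returns statistics as rows; transpose so each variable is a row,
# then keep the columns in the same order as the summary table.
summary = df.describe().T[["mean", "std", "min", "25%", "50%", "75%", "max"]]
summary = summary.rename(columns={"50%": "median"})

print(summary.round(2))
```

Note that pandas reports the sample standard deviation (dividing by n - 1) by default.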
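Analyzed linear relationships between ride demand, temporal features, and weather factors in the NYC Uber dataset using Pearson correlation coefficients. In the matrix below, pickups correlate only weakly with every weather variable (|r| ≤ 0.05); hour of day shows the strongest association, at 0.17.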
Charts: Correlation matrix heatmap
Variable | Pickups | Wind | Visibility | Temp | DewPoint | Pressure | Precip1h | Precip6h | Precip24h | Snow | Hour | Month |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NumberOfPickups | 1.00 | 0.01 | -0.01 | 0.05 | 0.03 | -0.02 | 0.00 | -0.00 | -0.02 | -0.01 | 0.17 | 0.05 |
WindSpeed | 0.01 | 1.00 | 0.09 | -0.29 | -0.32 | -0.09 | -0.00 | 0.02 | -0.01 | 0.10 | 0.09 | -0.27 |
Visibility | -0.01 | 0.09 | 1.00 | 0.02 | -0.23 | 0.17 | -0.49 | -0.12 | 0.00 | -0.05 | 0.04 | 0.02 |
Temp | 0.05 | -0.29 | 0.02 | 1.00 | 0.90 | -0.22 | -0.01 | -0.04 | -0.01 | -0.55 | 0.09 | 0.85 |
DewPoint | 0.03 | -0.32 | -0.23 | 0.90 | 1.00 | -0.31 | 0.12 | 0.01 | 0.00 | -0.49 | 0.01 | 0.81 |
Pressure | -0.02 | -0.09 | 0.17 | -0.22 | -0.31 | 1.00 | -0.12 | -0.14 | -0.14 | 0.14 | 0.08 | -0.55 |
Precip1h | 0.00 | -0.00 | -0.49 | -0.01 | 0.12 | -0.12 | 1.00 | 0.87 | 0.78 | 0.23 | -0.01 | 0.02 |
Precip6h | -0.00 | 0.02 | -0.12 | -0.04 | 0.01 | -0.14 | 0.87 | 1.00 | 0.95 | 0.26 | -0.01 | 0.02 |
Precip24h | -0.02 | -0.01 | 0.00 | -0.01 | 0.00 | -0.14 | 0.78 | 0.95 | 1.00 | 0.29 | -0.02 | 0.01 |
Snow | -0.01 | 0.10 | -0.05 | -0.55 | -0.49 | 0.14 | 0.23 | 0.26 | 0.29 | 1.00 | -0.01 | -0.04 |
Hour | 0.17 | 0.09 | 0.04 | 0.09 | 0.01 | 0.08 | -0.01 | -0.01 | -0.02 | -0.01 | 1.00 | -0.01 |
Month | 0.05 | -0.27 | 0.02 | 0.85 | 0.81 | -0.55 | 0.02 | 0.02 | 0.01 | -0.04 | -0.01 | 1.00 |
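A Pearson correlation matrix like the one above can be computed with `DataFrame.corr`. The snippet is a minimal sketch on toy data with hypothetical column names; the values are chosen so the results are easy to check by hand and do not correspond to the project's figures.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "pickups": [10, 20, 30, 40, 50],
    "hour": [1, 2, 3, 4, 5],        # rises in step with pickups -> r = 1
    "temp": [50, 40, 30, 20, 10],   # falls in step with pickups -> r = -1
    "wind": [3, 1, 4, 1, 5],        # only loosely related to pickups
})

# Pairwise Pearson coefficients, rounded to two decimals as in the table.
corr = df.corr(method="pearson").round(2)

print(corr)
```

Pearson's coefficient only measures linear association, so a value near zero does not rule out a non-linear pattern such as demand that peaks at particular hours.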
Examined ride demand patterns across different hours and days using time-series analysis to identify peak periods, trends, and temporal variations in NYC Uber pickups.
Charts: Daily trend line chart, hourly boxplot
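The aggregations behind a daily trend line and an hourly comparison can be sketched as below. Column names and values are hypothetical, and plotting is omitted; the resulting series would be passed to a charting library.

```python
import pandas as pd

# Toy hourly records for illustration only.
df = pd.DataFrame({
    "pickup_dt": pd.to_datetime([
        "2015-01-01 08:00", "2015-01-01 18:00",
        "2015-01-02 08:00", "2015-01-02 18:00",
    ]),
    "pickups": [100, 300, 120, 360],
})

# Daily trend: total pickups per calendar day.
daily_total = df.set_index("pickup_dt")["pickups"].resample("D").sum()

# Hourly profile: average pickups for each hour of the day.
hourly_mean = df.groupby(df["pickup_dt"].dt.hour)["pickups"].mean()
peak_hour = int(hourly_mean.idxmax())

print(daily_total)
print(hourly_mean)
print("Peak hour:", peak_hour)
```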
Mapped demand variations across NYC boroughs and time periods to identify congestion hotspots using hour × borough heatmap analysis of average pickup patterns.
Charts: Hour × Borough heatmap
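The grid behind an hour × borough heatmap is a pivot of average pickups. The following is a minimal sketch with toy values and assumed column names (`hour`, `borough`, `pickups`); rendering the grid as a heatmap is left to a plotting library.

```python
import pandas as pd

# Toy records for illustration only.
df = pd.DataFrame({
    "hour": [8, 8, 18, 18, 8, 18],
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Manhattan", "Queens", "Queens"],
    "pickups": [1000, 1200, 3000, 3400, 100, 300],
})

# Rows are hours, columns are boroughs, cells are average pickups.
heat = df.pivot_table(index="hour", columns="borough", values="pickups", aggfunc="mean")

# Locate the busiest cell: first the borough with the highest peak, then its peak hour.
hotspot_borough = heat.max().idxmax()
hotspot_hour = int(heat[hotspot_borough].idxmax())

print(heat)
print("Hotspot:", hotspot_borough, "at hour", hotspot_hour)
```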
Assessed trip completion rates and supply-demand mismatches between city and airport locations to identify operational bottlenecks and service gaps.
Charts: Status distribution & hourly supply-demand gap analysis
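One way to express the supply-demand gap is to treat every request as demand and every completed trip as supply. The sketch below assumes hypothetical column names (`pickup_point`, `status`) and status labels (`"Trip Completed"`, `"Cancelled"`, `"No Cars Available"`); the rows are toy data, not the project's records.

```python
import pandas as pd

# Toy requests for illustration only; labels and columns are assumptions.
df = pd.DataFrame({
    "pickup_point": ["Airport"] * 4 + ["City"] * 4,
    "status": ["Trip Completed", "No Cars Available", "No Cars Available", "Cancelled",
               "Trip Completed", "Trip Completed", "Cancelled", "Cancelled"],
})

# Status distribution: share of each outcome per pickup point.
status_share = pd.crosstab(df["pickup_point"], df["status"], normalize="index")

# Demand = all requests, supply = completed trips, gap = requests left unserved.
df["fulfilled"] = df["status"].eq("Trip Completed")
gap = df.groupby("pickup_point")["fulfilled"].agg(demand="count", supply="sum")
gap["gap"] = gap["demand"] - gap["supply"]
gap["unfulfilled_rate"] = gap["gap"] / gap["demand"]

print(status_share)
print(gap)
```

Adding an hour column to the `groupby` keys would give the same gap broken down by hour of day.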
Demand peaks between 6 PM and midnight, with the highest concentration in Manhattan. The temporal patterns are regular enough to be predictable.
More than 50% of airport requests go unfulfilled because of driver shortages. City requests show a higher user cancellation rate.
Route optimization could not be analyzed in depth because the datasets lack route-specific data. It is identified as future scope.