BDM Capstone Project

Analytical Approach & Methodology

1 Data Collection & Overview

Data Collection

Both datasets are secondary data sourced from Kaggle and verified for academic use and compliance with open-source licensing.

• Data Type: Secondary (publicly available)
• Source: Kaggle verified repositories
• Licensing: Academic use approved

Dataset Summaries

Uber NYC Enriched Pickups

Dataset Link

29,101 NYC pickup records enriched with time, weather, and holiday details

City–Airport Request Data

Dataset Link

6,745 ride requests between city and airport tracking outcomes and gaps

2 Data Cleaning & Feature Engineering

Data Cleaning

Improved data quality through missing-value treatment, data type standardization, and duplicate removal.

• Missing value imputation
• Datetime parsing and timezone handling
• Duplicate record identification and removal
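The three cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the field names (`pickup_dt`, `borough`, `temp`), the timestamp format, the UTC timezone, and median imputation are all assumptions, since the source does not name the columns or the imputation strategy.

```python
from datetime import datetime, timezone
from statistics import median


def clean_records(rows):
    """Parse timestamps, drop exact duplicates, then impute missing temperatures."""
    seen, unique = set(), []
    for r in rows:
        # Parse the text timestamp and attach a timezone (UTC is an assumption).
        ts = datetime.strptime(r["pickup_dt"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        key = (ts, r["borough"], r["temp"])
        if key in seen:  # exact duplicate of an earlier record
            continue
        seen.add(key)
        unique.append({"pickup_dt": ts, "borough": r["borough"], "temp": r["temp"]})

    # Median imputation is an assumed strategy, computed after de-duplication.
    fill = median(r["temp"] for r in unique if r["temp"] is not None)
    for r in unique:
        if r["temp"] is None:
            r["temp"] = fill
    return unique


# Hypothetical sample rows, not taken from the Kaggle datasets.
raw = [
    {"pickup_dt": "2015-01-01 01:00:00", "borough": "Manhattan", "temp": 30.0},
    {"pickup_dt": "2015-01-01 01:00:00", "borough": "Manhattan", "temp": 30.0},
    {"pickup_dt": "2015-01-01 02:00:00", "borough": "Queens", "temp": None},
    {"pickup_dt": "2015-01-01 03:00:00", "borough": "Bronx", "temp": 34.0},
]
cleaned = clean_records(raw)
```

De-duplicating before imputing keeps repeated rows from distorting the fill value.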

Feature Engineering

Enriched the dataset with derived temporal and contextual features to support the analyses that follow.

• Temporal features (hour, day, month, day-of-week)
• Categorical variable standardization
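A short sketch of how the temporal features listed above might be derived from a parsed timestamp. The record layout and the helper name `add_temporal_features` are hypothetical. Trimming and title-casing is one possible way to standardize a categorical label, assumed here for illustration.

```python
from datetime import datetime


def add_temporal_features(record):
    """Return a copy of the record with hour, day, month, and day-of-week added."""
    ts = record["pickup_dt"]
    return {
        **record,
        "hour": ts.hour,
        "day": ts.day,
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        # Standardize the categorical label: trim whitespace, unify casing.
        "borough": record["borough"].strip().title(),
    }


# Hypothetical record for illustration.
sample = {"pickup_dt": datetime(2015, 3, 14, 18, 0), "borough": " manhattan "}
features = add_temporal_features(sample)
```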

3 Analysis Methods & Key Findings

1. Descriptive Statistics

Computed comprehensive summary statistics (mean, median, standard deviation, quartiles) for ride demand and weather variables to understand data distributions and identify patterns in the Uber NYC dataset.

Key Findings:

• Average hourly pickups ≈ 490, but distribution is highly skewed (median only 54, maximum up to 7,883)
• Ride demand shows substantial variability, with peaks during busy borough hours (e.g., Manhattan)
• Weather variables show limited impact on ride demand patterns

Charts: Summary statistics table

Summary Table

Variable	Mean	Std Dev	Min	25%	Median	75%	Max
Number of Pickups	490.22	995.65	0.00	1.00	54.00	449.00	7883.00
Temperature (°F)	47.67	19.81	2.00	32.00	46.00	64.50	89.00
Wind Speed (mph)	5.98	3.70	0.00	3.00	6.00	8.00	21.00
Precipitation 1hr (in)	0.0038	0.019	0.00	0.00	0.00	0.00	0.28
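The columns of the summary table can be reproduced for any numeric variable with the standard library, as in the sketch below. The input values are illustrative only and are not the project data. The quartile method (linear interpolation between the sample minimum and maximum) is an assumption about how the table was computed. A mean far above the median, as in the pickups row (490.22 against 54.00, roughly nine times larger), signals right skew.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Return mean, sample standard deviation, min, quartiles, and max."""
    # method="inclusive" interpolates linearly between the sample min and max.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "25%": q1,
        "median": q2,
        "75%": q3,
        "max": max(values),
    }


# Illustrative hourly counts only; these are NOT the project data.
hourly_pickups = [0, 1, 4, 50, 60, 400, 7000]
summary = describe(hourly_pickups)

# A ratio well above 1 indicates a long right tail.
skew_ratio = summary["mean"] / summary["median"]
```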

2. Correlation Analysis

Analyzed relationships between ride demand, temporal patterns, and weather factors using Pearson correlation coefficients to identify significant predictive relationships in the NYC Uber dataset.

Key Findings:

• Weather variables show near-zero correlation with ride demand (|r| ≤ 0.05 for every weather variable)
• Hour of Day has a modest positive correlation with ride volumes (r = 0.17), reflecting daily cycles in demand
• Temperature, dew point, and month are strongly correlated with one another (r = 0.81 to 0.90), consistent with seasonal climate patterns in NYC

Charts: Correlation matrix heatmap

Correlation Matrix

Variable	Pickups	Wind	Visibility	Temp	DewPoint	Pressure	Precip1h	Precip6h	Precip24h	Snow	Hour	Month
NumberOfPickups	1.00	0.01	-0.01	0.05	0.03	-0.02	0.00	-0.00	-0.02	-0.01	0.17	0.05
WindSpeed	0.01	1.00	0.09	-0.29	-0.32	-0.09	-0.00	0.02	-0.01	0.10	0.09	-0.27
Visibility	-0.01	0.09	1.00	0.02	-0.23	0.17	-0.49	-0.12	0.00	-0.05	0.04	0.02
Temp	0.05	-0.29	0.02	1.00	0.90	-0.22	-0.01	-0.04	-0.01	-0.55	0.09	0.85
DewPoint	0.03	-0.32	-0.23	0.90	1.00	-0.31	0.12	0.01	0.00	-0.49	0.01	0.81
Pressure	-0.02	-0.09	0.17	-0.22	-0.31	1.00	-0.12	-0.14	-0.14	0.14	0.08	-0.55
Precip1h	0.00	-0.00	-0.49	-0.01	0.12	-0.12	1.00	0.87	0.78	0.23	-0.01	0.02
Precip6h	-0.00	0.02	-0.12	-0.04	0.01	-0.14	0.87	1.00	0.95	0.26	-0.01	0.02
Precip24h	-0.02	-0.01	0.00	-0.01	0.00	-0.14	0.78	0.95	1.00	0.29	-0.02	0.01
Snow	-0.01	0.10	-0.05	-0.55	-0.49	0.14	0.23	0.26	0.29	1.00	-0.01	-0.04
Hour	0.17	0.09	0.04	0.09	0.01	0.08	-0.01	-0.01	-0.02	-0.01	1.00	-0.01
Month	0.05	-0.27	0.02	0.85	0.81	-0.55	0.02	0.02	0.01	-0.04	-0.01	1.00

Correlation Scale: -1 to +1
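For reference, a Pearson correlation matrix like the one above can be computed as sketched below. The column names and values are made up for illustration. The project itself may well have used a data-analysis library rather than hand-written code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> values, rounded to 2 places."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 2) for b in names} for a in names}


# Illustrative values only; NOT the project data.
data = {
    "pickups": [10, 20, 30, 40, 50],
    "hour": [1, 2, 3, 4, 5],
    "wind": [5, 3, 6, 2, 4],
}
matrix = correlation_matrix(data)
```

The matrix is symmetric with ones on the diagonal, which is a quick sanity check on any reported table.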

3. Temporal Analysis

Examined ride demand patterns across different hours and days using time-series analysis to identify peak periods, trends, and temporal variations in NYC Uber pickups.

Key Findings:

• Ride demand shows a clear upward trend from January to June 2015, with occasional dips on low-demand days or during disruptions
• Demand is highly volatile, punctuated by spikes driven by events, holidays, and weather
• Daily cycle patterns are evident: lowest demand between midnight–5 AM, rising after 6 AM, and peaking between 6 PM–midnight
• Evening hours consistently record the highest ride activity with periodic surges

Charts: Daily trend line chart, hourly boxplot

Temporal Analysis Charts

Daily Uber Pickups in NYC (Jan-Jun 2015)

Uber Pickups by Hour of Day
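The two views behind these charts, daily totals and the hour-of-day profile, amount to simple group-by aggregations. The sketch below assumes each record is a `(timestamp, pickups)` pair. That layout and the sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records):
    """Sum pickups per calendar day (the series behind a daily trend line)."""
    totals = defaultdict(int)
    for ts, pickups in records:
        totals[ts.date()] += pickups
    return dict(totals)


def mean_by_hour(records):
    """Average pickups for each hour of the day (0-23)."""
    sums, counts = defaultdict(int), defaultdict(int)
    for ts, pickups in records:
        sums[ts.hour] += pickups
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Illustrative records only; NOT the project data.
records = [
    (datetime(2015, 1, 1, 3), 10),
    (datetime(2015, 1, 1, 19), 200),
    (datetime(2015, 1, 2, 3), 20),
    (datetime(2015, 1, 2, 19), 300),
]
daily = daily_totals(records)
hourly = mean_by_hour(records)
peak_hour = max(hourly, key=hourly.get)  # hour with the highest average demand
```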

4. Spatial-Temporal Patterns

Mapped demand variations across NYC boroughs and time periods to identify congestion hotspots using hour × borough heatmap analysis of average pickup patterns.

Key Findings:

• Manhattan dominates ride demand, showing consistently high pickup volumes throughout the day
• Demand in Manhattan peaks during late afternoon and evening hours, driven by business, commuters, and nightlife
• Other boroughs (Brooklyn, Bronx, Queens, Staten Island) show much lower and steadier demand, with only slight increases during peak commuting hours

Charts: Hour × Borough heatmap

Average Pickups by Hour and Borough

NYC Uber Pickups: Hourly Patterns by Borough
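An hour × borough heatmap is a two-key average: one cell per (borough, hour) pair. A minimal sketch of that grid follows, assuming records of the form `(hour, borough, pickups)`. The layout and values are hypothetical.

```python
from collections import defaultdict


def hour_borough_grid(records):
    """Average pickups for every (borough, hour) cell, as nested dicts."""
    sums, counts = defaultdict(float), defaultdict(int)
    for hour, borough, pickups in records:
        sums[(borough, hour)] += pickups
        counts[(borough, hour)] += 1

    grid = defaultdict(dict)
    for (borough, hour), total in sums.items():
        grid[borough][hour] = total / counts[(borough, hour)]
    # Sort hours within each borough so rows read left to right like a heatmap.
    return {borough: dict(sorted(cells.items())) for borough, cells in grid.items()}


# Illustrative records only; NOT the project data.
records = [
    (18, "Manhattan", 2000),
    (18, "Manhattan", 3000),
    (3, "Manhattan", 400),
    (18, "Bronx", 40),
    (3, "Bronx", 20),
]
grid = hour_borough_grid(records)

# The busiest cell marks a candidate congestion hotspot.
hotspot = max(
    ((borough, hour) for borough in grid for hour in grid[borough]),
    key=lambda cell: grid[cell[0]][cell[1]],
)
```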

5. Operational Gap Analysis

Assessed trip completion rates and supply-demand mismatches between city and airport locations to identify operational bottlenecks and service gaps.

Key Findings:

• The airport's main issue is a severe supply shortage: "no cars available" is its most common reason for unfulfilled requests
• The "no cars available" rate at the airport stays high throughout the day, indicating a persistent supply gap
• The city's main issue is user behavior: cancellations are the primary reason for unfulfilled rides
• Cancellation rates in the city fluctuate sharply, peaking during the morning and evening rush hours

Charts: Status distribution & hourly supply-demand gap analysis

Operational Gap Analysis Charts

Distribution of Ride Request Statuses by Pickup Point

Supply-Demand Gap by Hour and Pickup Point
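The status distribution and the supply-demand gap can be expressed as shares of requests per pickup point. In the sketch below, the status labels, the `(pickup_point, status)` record layout, and the definition of the gap as the share of requests that did not end in a completed trip are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

COMPLETED = "Trip Completed"  # assumed label for a fulfilled request


def status_shares(requests):
    """Share of each request status, computed separately per pickup point."""
    counts = defaultdict(Counter)
    for pickup_point, status in requests:
        counts[pickup_point][status] += 1
    return {
        point: {status: n / sum(by_status.values()) for status, n in by_status.items()}
        for point, by_status in counts.items()
    }


def supply_demand_gap(shares):
    """Gap = share of requests that were not completed, per pickup point."""
    return {point: 1 - by_status.get(COMPLETED, 0.0) for point, by_status in shares.items()}


# Illustrative requests only; NOT the project data.
requests = [
    ("Airport", "Trip Completed"),
    ("Airport", "No Cars Available"),
    ("Airport", "No Cars Available"),
    ("Airport", "Cancelled"),
    ("City", "Trip Completed"),
    ("City", "Trip Completed"),
    ("City", "Cancelled"),
    ("City", "Cancelled"),
]
shares = status_shares(requests)
gap = supply_demand_gap(shares)
```

Adding the request hour as a third key would give the hourly version of the same gap.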

Interpreting Results & Recommendations

Peak-Hour Analysis

Demand peaks between 6 PM and midnight, with the highest concentration in Manhattan. These temporal patterns are predictable.

Key Recommendations:

• Implement dynamic pricing during evening rush hours
• Launch promotional campaigns for low-demand time periods
• Partner with traffic authorities for congestion management

Operational Gaps

At the airport, more than 50% of requests go unfulfilled, mainly because of driver shortages. The city has higher user cancellation rates than the airport.

Key Recommendations:

• Create guaranteed earnings program for airport pickups
• Deploy real-time tracking with accurate arrival estimates
• Integrate backup transportation services during peak demand

Route Planning

Route analysis was limited by the lack of route-specific data. Route optimization is identified as future scope.

Future Recommendations:

• Build machine learning models for intelligent route planning
• Test new routing algorithms through controlled pilot studies
• Establish driver reporting system for navigation improvement

Strategic Implementation Summary

Immediate Actions:

• Deploy surge pricing during peak hours (6 PM-midnight)
• Launch airport driver incentive program
• Implement real-time wait time updates

Long-term Strategy:

• Develop predictive demand forecasting
• Integrate traffic data for route optimization
• Establish performance monitoring framework

Analyzing Ride Patterns & Pricing Strategies in Uber

Project Author

Achal Deep

Organization Background

Uber Rides

Service Portfolio

Global Presence

Expanded Services

Problem Statement & Background

Background

Problem Statement

1 Investigating Ride Demand Fluctuations and Peak-Hour Congestion

2 Assessing Ride Cancellations and Driver Availability Gaps

3 Optimizing Route Planning for Efficiency

Analytical Approach & Methodology

1 Data Collection & Overview

Data Collection

Dataset Summaries

Uber NYC Enriched Pickups

City–Airport Request Data

2 Data Cleaning & Feature Engineering

Data Cleaning

Feature Engineering

3 Analysis Methods & Key Findings

1. Descriptive Statistics

Key Findings:

Summary Table

2. Correlation Analysis

Key Findings:

Correlation Matrix

3. Temporal Analysis

Key Findings:

Temporal Analysis Charts

Daily Uber Pickups in NYC (Jan-Jun 2015)

Uber Pickups by Hour of Day

4. Spatial-Temporal Patterns

Key Findings:

Average Pickups by Hour and Borough

NYC Uber Pickups: Hourly Patterns by Borough

5. Operational Gap Analysis

Key Findings:

Operational Gap Analysis Charts

Distribution of Ride Request Statuses by Pickup Point

Supply-Demand Gap by Hour and Pickup Point

Interpreting Results & Recommendations

Peak-Hour Analysis

Key Recommendations:

Operational Gaps

Key Recommendations:

Route Planning

Future Recommendations:

Strategic Implementation Summary

Immediate Actions:

Long-term Strategy:
