Bike-Sharing Service Demand Analysis

A Business Analysis utilizing Probability Theory

Bagus Guntur Farisa
Apr 8, 2023

Introduction

Bike-sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another.

Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental, and health issues.

To provide a good service to its users, especially during peak hours, the company wants to understand how the number of orders is distributed and how various factors affect that distribution.

Dataset Information

The bike-sharing rental process is highly correlated with environmental and seasonal settings. For instance, weather conditions, precipitation, day of the week, season, and hour of the day can all affect rental behavior.

The core dataset is a two-year historical log, covering 2011 and 2012, from the Capital Bikeshare system in Washington D.C., USA, which is publicly available at http://capitalbikeshare.com/system-data. The data was aggregated on an hourly and a daily basis, and the corresponding weather and seasonal information was then added. The weather information was extracted from http://www.freemeteo.com.

The final dataset can be accessed here.

Question 1: What is the expected value of the total orders of the service?

In this analysis, the total number of orders is broken down into 2 frequencies:

  1. Hourly
  2. Daily

Hourly number of orders

The hourly number of orders has an exponential distribution with a mean (expected value) of 189.46. Based on this article, the standard deviation of an exponential distribution is always equal to its mean.

So, taking one standard deviation around the mean, the hourly number of orders will likely range between 0 and 379 orders per hour.
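The range above can be checked with a few lines of Python. This is only a sketch that assumes the hourly orders follow an ideal exponential distribution with the quoted mean; the helper name `exponential_tail` is made up for illustration. Under that assumption, the chance of exceeding twice the mean is exp(-2), roughly 0.135.

```python
import math

MEAN_HOURLY = 189.46  # mean hourly orders quoted in the text


def exponential_tail(x: float, mean: float) -> float:
    """Return P(X > x) for an exponential distribution with the given mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return 1.0 if x <= 0 else math.exp(-x / mean)


# For an exponential distribution the standard deviation equals the mean.
std_hourly = MEAN_HOURLY
lower = max(0.0, MEAN_HOURLY - std_hourly)  # orders cannot be negative
upper = MEAN_HOURLY + std_hourly
p_inside = 1.0 - exponential_tail(upper, MEAN_HOURLY)

print(f"range: {lower:.0f} to {upper:.0f} orders per hour")
print(f"P(orders <= {upper:.2f}) = {p_inside:.3f}")
print(f"P(orders >  {upper:.2f}) = {exponential_tail(upper, MEAN_HOURLY):.3f}")
```

In other words, if the exponential assumption holds, about 86.5% of hours should fall inside the 0 to 379 range.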

Daily number of orders

On the other hand, the daily number of orders has a normal distribution with a mean (expected value) of 4504.35 orders per day and a standard deviation of 1935.89.

Taking one standard deviation around the mean, the daily number of orders will likely range between 2568 and 6440 orders per day.
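The daily range is simply the mean plus and minus one standard deviation. A minimal sketch with Python's standard library, assuming the daily orders are exactly normal with the quoted parameters, shows how much probability that interval covers:

```python
from statistics import NormalDist

# Parameters quoted in the text for the daily number of orders.
daily = NormalDist(mu=4504.35, sigma=1935.89)

low = daily.mean - daily.stdev
high = daily.mean + daily.stdev

# Probability mass of a normal distribution within one standard deviation.
p_within_one_std = daily.cdf(high) - daily.cdf(low)

print(f"range: {low:.0f} to {high:.0f} orders per day")
print(f"P({low:.0f} <= orders <= {high:.0f}) = {p_within_one_std:.3f}")
```

Under the normal assumption, roughly 68% of days should land inside this range.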

Question 2: How does information about the weather and temperature impact the number of orders?

Weather

Based on the boxplot of total orders by weather situation, the number of orders has the highest median and varies the most in good weather (weathersit = 1).

Weather code description

1: Clear, Few clouds, Partly cloudy

2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

Hourly number of orders given the weather situation

The probability that the number of orders exceeds twice its expected value (orders > 378.92) is 0.135.

However, given good weather (weathersit = 1), the probability of orders exceeding twice the expected value increases to 0.17, indicating that good weather is associated with higher demand for the service.
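A conditional probability like this can be estimated from the data as a simple ratio: among the hours that satisfy the condition, count the share whose orders exceed the threshold. The sketch below illustrates the idea on made-up records, so its numbers are not the article's results. The column name `cnt` for the order count and the helper `exceedance_probability` are assumptions for illustration; only `weathersit` appears in the text.

```python
from typing import Callable, Optional


def exceedance_probability(
    rows: list,
    threshold: float,
    condition: Optional[Callable[[dict], bool]] = None,
) -> float:
    """Estimate P(cnt > threshold | condition) as a relative frequency."""
    selected = [r for r in rows if condition is None or condition(r)]
    if not selected:
        raise ValueError("no rows satisfy the condition")
    hits = sum(1 for r in selected if r["cnt"] > threshold)
    return hits / len(selected)


# Made-up hourly records, NOT taken from the real dataset.
toy_rows = [
    {"weathersit": 1, "cnt": 420},
    {"weathersit": 1, "cnt": 150},
    {"weathersit": 1, "cnt": 390},
    {"weathersit": 1, "cnt": 80},
    {"weathersit": 2, "cnt": 400},
    {"weathersit": 2, "cnt": 120},
    {"weathersit": 3, "cnt": 30},
    {"weathersit": 3, "cnt": 60},
]

threshold = 2 * 189.46  # twice the expected hourly orders, i.e. 378.92

p_all = exceedance_probability(toy_rows, threshold)
p_good = exceedance_probability(
    toy_rows, threshold, lambda r: r["weathersit"] == 1
)

print(f"P(cnt > {threshold:.2f}) = {p_all:.3f}")
print(f"P(cnt > {threshold:.2f} | weathersit = 1) = {p_good:.3f}")
```

The same helper would work for the temperature conditions by passing a different `condition`, for example one that checks a temperature column against 15 or 25 degrees.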

Temperature

The hourly temperature data has a multimodal distribution with peaks at around 15 and 25 degrees Celsius.

As before, the probability that the number of orders exceeds twice its expected value (orders > 378.92) is 0.135.

Given that the temperature is higher than 15 degrees Celsius, the probability of the number of orders exceeding twice its expected value increases to 0.19.

Given that the temperature is higher than 25 degrees Celsius, the probability increases further to 0.26.

Thus, the warmer the temperature, the higher the demand for the bike-sharing service.

Question 3: How varied is the number of orders throughout the year?

By month

The hourly number of orders has the highest median between late spring and early fall (May-September). The order counts also vary the most during those months.

By weekday

On weekdays, the median number of orders is relatively higher than on weekends. In addition, the weekday order counts contain many outliers.

This indicates that “peak hours” tend to happen on weekdays.

Question 4: How does each user segment (registered/casual) contribute to the total orders during weekdays and weekends?

Registered users

The number of orders from registered users has a higher correlation with the total number of orders during weekdays than during the weekend.

This suggests that registered users tend to use the service on weekdays.

Casual users

On the other hand, the number of orders from casual users has a higher correlation with the total number of orders during the weekend than during weekdays.

Thus, casual users tend to use the service on weekends.

It’s good to note that neither insight could be inferred unless we group the data into two categories (weekday and weekend), as showcased in the diagrams above.
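The grouping step can be sketched as follows: split the records into weekday and weekend, then compute the Pearson correlation between one segment's orders and the total orders inside each group. The records below are made up, and the field names `is_weekend`, `registered`, `casual` and `cnt` are assumptions for illustration, so this shows the mechanics rather than the article's actual figures.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_day_type(rows: list, segment: str) -> dict:
    """Correlate a segment's orders with total orders, per day type."""
    groups = {"weekday": [], "weekend": []}
    for row in rows:
        groups["weekend" if row["is_weekend"] else "weekday"].append(row)
    return {
        day_type: pearson([r[segment] for r in g], [r["cnt"] for r in g])
        for day_type, g in groups.items()
    }


# Made-up records, NOT taken from the real dataset (cnt = registered + casual).
toy_rows = [
    {"is_weekend": False, "registered": 100, "casual": 10, "cnt": 110},
    {"is_weekend": False, "registered": 200, "casual": 12, "cnt": 212},
    {"is_weekend": False, "registered": 300, "casual": 9, "cnt": 309},
    {"is_weekend": False, "registered": 150, "casual": 11, "cnt": 161},
    {"is_weekend": True, "registered": 50, "casual": 60, "cnt": 110},
    {"is_weekend": True, "registered": 55, "casual": 120, "cnt": 175},
    {"is_weekend": True, "registered": 45, "casual": 200, "cnt": 245},
    {"is_weekend": True, "registered": 52, "casual": 90, "cnt": 142},
]

registered_corr = correlation_by_day_type(toy_rows, "registered")
casual_corr = correlation_by_day_type(toy_rows, "casual")

print("registered vs total:", registered_corr)
print("casual vs total:    ", casual_corr)
```

Computing a single correlation over all rows would blend the two groups together, which is why the split is needed to see the weekday and weekend patterns separately.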

Question 5: Hypothesis Testing on the mean of daily total orders.

Problem statement

Based on the collected sample (x̄ = 4190.40, n = 73), the company wants to check whether the mean daily number of orders is significantly lower than 4504.35 (std = 1935.89).

Null Hypothesis (H₀): The mean daily number of orders is equal to or more than 4504.35 orders/day.
µ >= 4504.35

Alternate Hypothesis (H₁): The mean daily number of orders is less than 4504.35 orders/day.
µ < 4504.35

Level of significance: 5% (0.05)

Calculation

A one-sample z-test on the mean is carried out using the formula outlined below.

Z-score = (sample mean - population mean) / (population std / n**(1/2))

Z-score = (4190.40 - 4504.35) / (1935.89 / 73**(1/2))

Z-score ≈ -1.38

Conclusion

Using this Z-table, the probability of getting a Z-score <= -1.38 is 0.083: 𝑃(𝑧 ≤ -1.38) = 0.083.

The p-value (0.083) is greater than the level of significance (0.05), so we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mean daily number of orders is less than 4504.35 orders/day.
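The whole test can be reproduced in a few lines with Python's standard library. This is a sketch of a lower-tailed one-sample z-test, assuming the population standard deviation is known and equal to the quoted 1935.89; the function name `one_sample_z_test_lower` is made up for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test_lower(
    sample_mean: float, pop_mean: float, pop_std: float, n: int
) -> tuple:
    """Return (z, p_value) for H0: mu >= pop_mean vs H1: mu < pop_mean."""
    standard_error = pop_std / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Lower-tailed test: the p-value is the area to the left of z.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Figures quoted in the problem statement.
z, p = one_sample_z_test_lower(
    sample_mean=4190.40, pop_mean=4504.35, pop_std=1935.89, n=73
)
alpha = 0.05
reject_null = p < alpha

print(f"z = {z:.3f}, p-value = {p:.3f}, reject H0: {reject_null}")
```

Computed without intermediate rounding, the z-score comes out slightly below -1.38, and the p-value stays at about 0.083, so the conclusion is unchanged.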

Further Research

Due to a lack of time and expertise, I have yet to do any hypothesis testing on the hourly number of orders, since it has an exponential distribution.

Further hypothesis testing on the hourly number of orders could give more granular insights and new understanding of the data.

Reference:

  1. ScienceDirect: Exponential Distribution
  2. Statology: PMF
  3. Statology: Z-test Using Python
  4. Towards Data Science: Z-test
  5. Towards Data Science: Conditional Probability

Working Document
