The fight against coronavirus has data science at its very core. Only by analyzing the data can we understand how the virus spreads and predict the potential outcomes of various actions (or inactions).
Data science is one of the most important tools available to governments and healthcare bodies around the world right now, but it’s also of vital importance to mass transit operators that want to manage profitability, support key workers and plan for recovery.
Based on evidence from countries affected by coronavirus early in the pandemic, it was clear that ridership would drop quickly and significantly. The scale would be on a different level from anything mass transit networks had previously experienced.
Data science challenges
This introduced new challenges for CitySwift's data scientists. There was no historical data that could accurately inform our algorithms, and many of the metrics usually relied on by transit operators were no longer relevant.
To help our clients through the pandemic, we had to be able to provide accurate, relevant insights that would aid their decision-making in a constantly changing environment. This would require new techniques and daily re-training of our machine learning models as additional data became available.
Our forecasts accurately predicted that the decline would bottom out at the end of March, with ridership averaging less than 15% of the seasonal norm. Based on this ‘new normal’, we updated our deep learning architecture and set about quantifying the correlation between demand, run time, and dwell time, along with external big data sources such as confirmed Covid-19 cases, news feeds, and transit data from cities across Europe and Asia.
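As a rough illustration of what measuring the correlation between demand and run time can look like, here is a minimal sketch using a plain Pearson coefficient. It is not the deep learning approach described above; the function name `pearson` and the daily figures are made-up placeholders for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: daily boardings on one route and the observed
# end-to-end run time (minutes) on the same days. Not real figures.
boardings = [1000, 800, 500, 300, 150]
run_time_min = [42.0, 40.5, 37.0, 35.0, 33.5]

r = pearson(boardings, run_time_min)
print(f"demand vs run time correlation: {r:.3f}")
```

A coefficient close to 1 in data like this would suggest that run times fall as demand falls, which is the kind of relationship a timetable model needs to capture.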
Journey times had the potential to become much shorter, due to the huge drop in traffic congestion and the lower number of passengers boarding and alighting. However, without proper timetable adjustments, the only result would be a wasteful increase in dwell time as drivers waited at stops to avoid getting ahead of schedule.
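The effect of an unadjusted timetable can be sketched with a small simulation. It assumes a simplified rule that a bus may never leave a timing point ahead of schedule; the function name `simulate_holding` and the segment times are illustrative assumptions, not output from our models.

```python
def simulate_holding(scheduled_min, actual_min):
    """Minutes a bus is held at each timing point when it may not leave early.

    scheduled_min[i]: timetabled run time for segment i
    actual_min[i]:    driving time actually needed for segment i
    """
    if len(scheduled_min) != len(actual_min):
        raise ValueError("both lists must describe the same segments")
    clock = 0.0        # time elapsed for the bus since leaving the terminus
    sched_clock = 0.0  # timetabled departure time at the current timing point
    holds = []
    for sched, actual in zip(scheduled_min, actual_min):
        sched_clock += sched
        clock += actual                       # bus arrives at the timing point
        hold = max(0.0, sched_clock - clock)  # wait only if early
        holds.append(hold)
        clock += hold                         # depart on schedule, or late
    return holds


# Hypothetical route: timetable written for normal traffic, driven on empty roads.
holds = simulate_holding([10, 12, 8], [7, 9, 6])
print(holds, "total minutes held:", sum(holds))
```

In this toy case every minute saved on the road is spent waiting at a stop, which is why the timetable itself has to change before shorter journey times reach passengers.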
An important consideration was the ability to support healthcare staff, so our analysts examined ridership and ticketing data for all bus routes servicing hospitals. This provided insight into workers’ shift patterns. A simple switch to Sunday timetables would not necessarily ensure buses were available where and when they were most needed, so it was important for emergency timetables to take this information into account.
To support key workers, allow for physical distancing on board (via lower target load factors) and save valuable operating hours, we encouraged clients to optimize their timetables using SwiftSchedule, which we updated to predict run times based on our new algorithms.
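To show how a lower target load factor feeds into service levels, here is a minimal sketch of the underlying arithmetic. It is not how SwiftSchedule works internally; `trips_required`, the capacity, and the demand figure are hypothetical and chosen only to make the calculation easy to follow.

```python
from math import ceil


def trips_required(hourly_demand, vehicle_capacity, target_load_factor):
    """Trips per hour needed so that average load stays within the target.

    target_load_factor is the share of vehicle capacity that may be used,
    for example 0.25 to keep buses at most a quarter full.
    """
    if vehicle_capacity <= 0:
        raise ValueError("vehicle_capacity must be positive")
    if not 0 < target_load_factor <= 1:
        raise ValueError("target_load_factor must be in (0, 1]")
    usable_capacity = vehicle_capacity * target_load_factor
    return ceil(hourly_demand / usable_capacity)


# Hypothetical route: 120 passengers in the busiest hour, 80-place vehicles.
print(trips_required(120, 80, 1.0))   # no distancing target
print(trips_required(120, 80, 0.25))  # distancing target of 25% load
```

The same demand needs more trips once the load factor is capped, so operating hours saved elsewhere can be redirected to the routes and times where key workers actually travel.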
As lockdown restrictions begin to lift, it will be important for clients to continue using SwiftSchedule regularly, so that they can make incremental changes to their timetables as ridership gradually recovers and vehicle speeds drop with traffic returning to pre-coronavirus levels.
From a data science perspective, the process of tagging dates with key events has been vital to our understanding of their impact, particularly given the short timeframes involved and the magnitude of their effect. Key stimuli such as schools restarting; stores, bars, and restaurants reopening; and staged worker reinstatement will all affect predicted ridership and run time.
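Date tagging of this kind can be sketched as turning each key event into a simple on/off feature for every service day. The event names, the dates, and the `event_features` helper below are hypothetical placeholders, not the actual events or dates used in our models.

```python
from datetime import date

# Hypothetical event calendar: event name -> first day the event is in effect.
# These dates are placeholders for illustration only.
EVENTS = {
    "retail_reopens": date(2020, 6, 15),
    "schools_restart": date(2020, 9, 1),
}


def event_features(day, events):
    """Return a 0/1 flag per event: 1 once the event has taken effect."""
    return {name: int(day >= start) for name, start in events.items()}


for day in (date(2020, 6, 1), date(2020, 7, 1), date(2020, 9, 14)):
    print(day.isoformat(), event_features(day, EVENTS))
```

Flags like these can be fed to a forecasting model alongside the ridership history, so that a step change on a tagged date is attributed to the event rather than treated as noise.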
It’s crucial that transit operators are ready to act on these changes. To ensure the right levels of passenger capacity are reintroduced at the right times, we will be incorporating recovery data sources from around the world into our predictive modeling.
Looking forward, the work conducted (from home) during the coronavirus pandemic has enriched our machine learning capabilities, making our predictions more robust to extreme events and giving us a better understanding of their impact. Data modeled during this highly unusual period will act as baselines for the fastest run times and lowest demand, with the insights gained held in the memorandum as code, sweat, and predictions.
Matthew Doodes is a Senior Data Scientist at CitySwift. He has previously helped Barclays predict fraud using big data and worked on a project to optimize the logistics of the UK’s fastest-growing port and deep-sea container terminal.