CitySwift

The bus data revolution: Becoming data-ready


Many transit providers are already largely data-driven, but they will soon need to become even more data-focused because of the pandemic and the wider societal changes we are starting to see, changes that are likely to accelerate over the next few years.

In the second part of a series on the bus data revolution (read part 1), Sarah McCartan, Head of Insights at CitySwift, explains what bus operators should do to ensure they are ‘data-ready’.

As green initiatives come to the fore, data will play a role in optimizing fleets. Meanwhile, many initiatives around the world promise to place revitalized public transit at the heart of modal-shift strategies aimed at countering the threat of climate change.

The upshot is that it will become ever more important for bus operators to focus on their data so that they can make informed decisions about their networks and implement them quickly. That means the right data needs to be easily accessible to them.


Data must be clean and consistent if you’re going to extract usable insights; after all, you must be able to trust your outcomes. Operational data typically arrives in its rawest form, for example GPS pings from AVL (automatic vehicle location) equipment. Every day, systems like this generate thousands of rows of complex, real-world information. While this information can be incredibly useful, its sheer volume brings pitfalls: the more data there is, the more scope there is for problems such as inconsistent, missing or even corrupted records.
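As a rough illustration of those pitfalls, here is a minimal Python sketch that screens raw AVL ping rows for missing, corrupted or impossible values. The column names (`vehicle_id`, `timestamp`, `lat`, `lon`) and the rejection rules are assumptions made for the example, not CitySwift’s actual schema or checks.

```python
import csv
import io
from typing import List, Optional, Tuple

# Hypothetical column names for a raw AVL ping feed.
REQUIRED_FIELDS = ("vehicle_id", "timestamp", "lat", "lon")


def parse_ping(row: dict) -> Optional[dict]:
    """Return a cleaned ping, or None if the row is missing, corrupted or impossible."""
    # Missing data: any required field absent or blank.
    if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
        return None
    # Corrupted data: values that do not parse as numbers.
    try:
        lat = float(row["lat"])
        lon = float(row["lon"])
        timestamp = int(row["timestamp"])
    except ValueError:
        return None
    # Impossible data: coordinates outside the valid range (this also rejects NaN).
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return {"vehicle_id": row["vehicle_id"].strip(), "timestamp": timestamp,
            "lat": lat, "lon": lon}


def clean_feed(csv_text: str) -> Tuple[List[dict], int]:
    """Split a CSV extract into clean pings and a count of rejected rows."""
    clean, rejected = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        ping = parse_ping(row)
        if ping is None:
            rejected += 1
        else:
            clean.append(ping)
    return clean, rejected


sample = """vehicle_id,timestamp,lat,lon
bus_17,1625212800,53.2707,-9.0568
bus_17,,53.2710,-9.0570
bus_17,1625212830,53.27x,-9.0571
bus_17,1625212860,253.27,-9.0572
"""
clean, rejected = clean_feed(sample)
print(len(clean), rejected)  # 1 3
```

The three rejected rows show one of each problem: a blank timestamp (missing), a non-numeric latitude (corrupted) and a latitude beyond 90 degrees (impossible).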

Here’s a simple example of how things can go wrong. How many different ways can you think of to write today’s date? You could use year/month/day, day/month/year or even month/day/year. What if you replace the slashes with dashes, or use words instead of numbers? If different date formats enter your dataset, you have a potential problem: 02/07 means July 2nd when read as day/month but February 7th when read as month/day, so a quick search for data relating to July 2nd could suddenly start returning information for February 7th! Your dataset stops making sense, and it quickly becomes clear that you can no longer trust the outputs. That’s why it’s so important to ensure data consistency.
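One way to enforce consistency is to require every source to declare its date format and to convert everything to a single standard, such as ISO 8601 (year-month-day), on the way in. The minimal Python sketch below uses hypothetical source names and formats; it is an illustration, not a description of any particular system.

```python
from datetime import datetime

# Hypothetical feeds: each source must declare the date format it uses.
SOURCE_DATE_FORMATS = {
    "depot_a": "%d/%m/%Y",  # day/month/year
    "depot_b": "%m/%d/%Y",  # month/day/year
    "depot_c": "%Y-%m-%d",  # year-month-day (ISO 8601)
}


def normalize_date(raw: str, source: str) -> str:
    """Parse a date using its source's declared format and return ISO 8601 (YYYY-MM-DD)."""
    fmt = SOURCE_DATE_FORMATS[source]  # an undeclared source raises KeyError
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


# The same string means different days depending on where it came from...
print(normalize_date("02/07/2021", "depot_a"))  # 2021-07-02
print(normalize_date("02/07/2021", "depot_b"))  # 2021-02-07
# ...but once normalized, different sources can agree on July 2nd.
print(normalize_date("07/02/2021", "depot_b"))  # 2021-07-02
print(normalize_date("2021-07-02", "depot_c"))  # 2021-07-02
```

Guessing the format from the string itself would be unreliable here, since `02/07/2021` is valid in both day-first and month-first order; declaring the format per source removes that ambiguity.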

You may have heard the expression ‘garbage in, garbage out’. If you put bad data into a model, no amount of interrogation or impressive machine learning techniques will deliver a wholly accurate and dependable outcome.

So how can you ensure you have the best possible dataset? The key is to have good data in the first place, and to keep it that way, so that you can wholly trust the outcomes.


How CitySwift can help

CitySwift’s specialist bus data engine has the processes in place to ensure your data is clean and consistent.

It takes in all sorts of data that, on the face of it, should follow the same format but in reality can vary between systems, between operators and even between depots of the same company. After all, people make data (or create the systems that generate it), so it’s only human that quirks and differences creep into datasets. It’s not that any particular format is wrong, far from it; it’s about picking one format and applying it consistently across all of your data so that you can trust the outcomes.
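The idea of picking one format and applying it everywhere can be sketched as a per-source mapping onto a single canonical record. In this minimal Python example, the operator names, the field names and the fact that one source reports speed in miles per hour are all hypothetical.

```python
# Hypothetical source schemas: the same facts arrive under different
# field names and, for speed, in different units.
MPH_TO_KPH = 1.609344

SOURCE_SCHEMAS = {
    "operator_a": {"fields": {"veh_id": "vehicle_id", "ts": "timestamp", "spd_mph": "speed_kph"},
                   "speed_factor": MPH_TO_KPH},
    "operator_b": {"fields": {"vehicle": "vehicle_id", "time": "timestamp", "speed_kph": "speed_kph"},
                   "speed_factor": 1.0},
}


def to_canonical(record: dict, source: str) -> dict:
    """Rename fields to canonical names and coerce types and units."""
    schema = SOURCE_SCHEMAS[source]
    out = {canonical: record[raw] for raw, canonical in schema["fields"].items()}
    out["speed_kph"] = round(float(out["speed_kph"]) * schema["speed_factor"], 2)
    out["vehicle_id"] = str(out["vehicle_id"])  # IDs always stored as text
    out["timestamp"] = int(out["timestamp"])    # Unix seconds
    return out


a = to_canonical({"veh_id": 101, "ts": "1625212800", "spd_mph": 20}, "operator_a")
b = to_canonical({"vehicle": "101", "time": 1625212800, "speed_kph": 32.19}, "operator_b")
print(a == b)  # True
```

Neither source is treated as wrong; each simply gets its own small translation into the shared format, which is what lets the two records be compared or combined.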

Our data experts have carefully scrutinized all of the different data formats and standards and developed techniques to automatically transform everything into a single and consistent format that can be combined with other strands of information. We even have anomaly detection processes that can flag potential data issues that may require further investigation.
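The anomaly detection isn’t described in detail here, so the following is only an illustration of one common, generic technique rather than CitySwift’s method: flagging values whose modified z-score (based on the median and the median absolute deviation, or MAD) is unusually large. The journey times and the 3.5 cut-off, a widely used rule of thumb, are assumptions for the example.

```python
from statistics import median


def flag_anomalies(values: list, threshold: float = 3.5) -> list:
    """Return indices of values whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Using medians keeps a single extreme value from
    distorting the baseline, as it would with a mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical end-to-end journey times (minutes) for one route and time band.
journey_minutes = [22, 24, 23, 25, 21, 24, 23, 95]
print(flag_anomalies(journey_minutes))  # [7]
```

A flagged value is not necessarily wrong (a 95-minute trip might reflect a genuine incident), which is why a flag like this prompts further investigation rather than automatic deletion.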

The data engine also sense-checks data from a business and common-sense perspective. A dataset containing average bus speeds may look correct from a technical standpoint (after all, it’s just a number), but if it suggests an average speed of 90kph was achieved in a congested city center location, it makes no sense at all from a business perspective!
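A common-sense check of this kind can be sketched by computing the average speed implied by consecutive GPS pings and comparing it with a ceiling for the type of area. In this minimal Python sketch, the area categories and speed ceilings are hypothetical; in practice they would have to come from local knowledge of the network.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    lat: float  # degrees
    lon: float  # degrees
    t: float    # seconds since epoch


# Hypothetical plausibility ceilings (km/h) by area type.
MAX_PLAUSIBLE_KPH = {"city_center": 50.0, "suburban": 80.0, "interurban": 110.0}
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Ping, b: Ping) -> float:
    """Great-circle distance between two pings, in kilometers."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def average_speed_kph(pings: list) -> float:
    """Distance along the ping trail divided by elapsed time."""
    hours = (pings[-1].t - pings[0].t) / 3600
    if hours <= 0:
        raise ValueError("pings must span a positive time interval")
    distance = sum(haversine_km(a, b) for a, b in zip(pings, pings[1:]))
    return distance / hours


def is_plausible(pings: list, area: str) -> bool:
    """True if the implied average speed is within the ceiling for the area."""
    return average_speed_kph(pings) <= MAX_PLAUSIBLE_KPH[area]


# About 1.1 km covered in 120 s (~33 km/h) versus in 40 s (~100 km/h).
slow = [Ping(53.2700, -9.0500, 0), Ping(53.2800, -9.0500, 120)]
fast = [Ping(53.2700, -9.0500, 0), Ping(53.2800, -9.0500, 40)]
print(is_plausible(slow, "city_center"), is_plausible(fast, "city_center"))  # True False
```

The same number passes or fails depending on context: roughly 100 km/h is flagged in a city center but would be accepted on an interurban stretch under these assumed ceilings.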

Of course, bus operators’ datasets are continually growing. As the data pipeline refreshes the datasets with new information every few minutes, every new piece of data goes through the same quality checks to ensure it is as clean and accurate as possible.
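Running the same checks on every refresh can be sketched as a small quality gate that each new batch passes through, with failing records quarantined (along with the reasons) rather than silently dropped. The check functions and the 0-120 km/h range below are hypothetical examples, not the checks CitySwift actually runs.

```python
from typing import Callable, List, Optional

# A check returns None if the record passes, or a short reason if it fails.
Check = Callable[[dict], Optional[str]]


def has_vehicle_id(record: dict) -> Optional[str]:
    return None if record.get("vehicle_id") else "missing vehicle_id"


def speed_in_range(record: dict) -> Optional[str]:
    speed = record.get("speed_kph")
    ok = isinstance(speed, (int, float)) and 0 <= speed <= 120  # hypothetical range
    return None if ok else "speed missing or outside 0-120 km/h"


CHECKS: List[Check] = [has_vehicle_id, speed_in_range]


def process_batch(batch: List[dict], checks: List[Check] = CHECKS):
    """Run every check on every new record; return (accepted, quarantined)."""
    accepted, quarantined = [], []
    for record in batch:
        reasons = [r for r in (check(record) for check in checks) if r is not None]
        if reasons:
            quarantined.append((record, reasons))
        else:
            accepted.append(record)
    return accepted, quarantined


# Each pipeline refresh would call process_batch on just the newly arrived records.
new_batch = [
    {"vehicle_id": "101", "speed_kph": 32.0},
    {"vehicle_id": "", "speed_kph": 30.0},
    {"vehicle_id": "102", "speed_kph": 400.0},
]
accepted, quarantined = process_batch(new_batch)
print(len(accepted), [r for _, r in quarantined])
# 1 [['missing vehicle_id'], ['speed missing or outside 0-120 km/h']]
```

Keeping the reasons alongside each quarantined record means someone can later see why it was held back, instead of rediscovering the problem from scratch.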

The result of all this scrutiny is your ability to make important decisions about your network with confidence, and to make them faster. Things can and do change quickly (just look at the impact of Covid), and operators need to make changes, and implement them, fast. If datasets aren’t clean and accurate, somebody will have to spend a lot of time manually extracting the relevant data, getting it into a usable format and checking that what is left is an accurate representation. And that’s before the analysis can even start!


We work hard to ensure those dates end up in the right format, those average speeds reflect conditions on the ground, and nothing is duplicated or missing. In short, we do a lot of work behind the scenes to put the best possible dataset at your fingertips.

That means you can trust any analysis performed on it, and therefore any insights or recommendations that follow, such as how to improve your network or service reliability.
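Checking that nothing is duplicated or missing can be illustrated with AVL pings, assuming (hypothetically) that each vehicle should report roughly every 30 seconds. This minimal Python sketch drops exact duplicate pings and reports gaps much longer than the expected interval; the interval and tolerance are assumptions, not published figures.

```python
# Hypothetical reporting cadence for AVL pings.
EXPECTED_INTERVAL_S = 30
GAP_TOLERANCE = 2.0  # a gap is anything longer than 2x the expected interval


def dedupe_and_find_gaps(pings: list):
    """pings are (vehicle_id, timestamp_s) pairs, in any order.

    Returns the unique pings sorted by vehicle then time, plus a list of
    (vehicle_id, gap_start, gap_end) tuples where data appears to be missing.
    """
    unique = sorted(set(pings))  # removes exact duplicates and groups by vehicle
    gaps = []
    for (v1, t1), (v2, t2) in zip(unique, unique[1:]):
        # Only compare consecutive pings from the same vehicle.
        if v1 == v2 and t2 - t1 > EXPECTED_INTERVAL_S * GAP_TOLERANCE:
            gaps.append((v1, t1, t2))
    return unique, gaps


raw = [("101", 0), ("101", 30), ("101", 30), ("101", 60), ("101", 180), ("102", 0)]
unique, gaps = dedupe_and_find_gaps(raw)
print(len(unique), gaps)  # 5 [('101', 60, 180)]
```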

Sarah McCartan is Head of Insights at CitySwift. Read part 1 of our Bus Data Revolution series here.

Learn more about CitySwift’s specialist bus data engine, request a demo or contact us to discover how CitySwift can help you get data-ready.
