Revolutionizing Traffic Planning with Deep Neural Networks: A Guide to Accurate Real-time Forecasting

Nov 30, 2022

Traffic forecasting is a crucial component of any intelligent traffic system, providing valuable insights for urban planners, traffic managers, and control systems. But despite its importance, traffic forecasting remains a challenging and elusive task. The complex topology of urban road networks and the ever-changing nature of traffic patterns make it difficult to predict traffic with any degree of accuracy. But what if there were a way to unlock the secrets of traffic forecasting and gain a real-time understanding of traffic patterns?

In this post, we’ll look at one specific problem: forecasting the occupancy of bicycle stations with a deep neural network-based traffic forecasting method, the temporal graph convolutional network (T-GCN). T-GCN combines a graph convolutional network (GCN) with a gated recurrent unit (GRU) to capture spatial and temporal dependencies simultaneously. The GCN learns the complex topological structure of the network to capture spatial dependence, and the GRU learns the dynamic changes in traffic data to capture temporal dependence. Stick with me, and we will demystify these terms one by one.

💡If you drive 10 miles to work every day in a car that gets 22 miles per gallon, hop on your bike instead and you’ll save 1,863 pounds of CO2 emissions (according to the EPA), an amount equivalent to about 5% of the average family of two’s yearly CO2 emissions.

The Bicikelj initiative offers environmentally friendly public transportation, and over the years it has expanded to over 80 stations with more than 800 active bikes. This growth comes with its own issues: while the number of active users grew rapidly over the first 5 years, Bicikelj lost about half of its clients in the last 3 years. Part of the cause is changing trends and the coronavirus pandemic, since many people started to work from home. Another reason is that many clients are unhappy: it’s not rare for them to find a station empty with no bikes to rent, or full of bikes with no free dock to return theirs to. In short, a bad user experience.

Proposed solution

In the following sections, we address this issue and propose a deep neural network model that will help manage the network more efficiently, improving the distribution of the bikes and eventually also automating the management of such a system.

As already mentioned, the Bicikelj network poses a challenging task due to its complex spatial and temporal dependencies.

Spatial Dependences

The change in station occupancy is influenced by the topological structure of the urban road network. The station occupancy on the edge of the town impacts stations in the town through the transfer effect, and the station occupancy in the town impacts station occupancy on the edge of the town through the feedback effect. The following network architecture is used to capture and learn these dependencies.

Temporal Dependences

The station occupancy changes dynamically over time, mainly through periodicity and trends, and is affected by weather, working days, time of day, and so on. The following network architecture is used to capture and learn these dependencies.

💡The world population is expected to grow from 8 billion people to 10 billion by 2050 and consequently the number of vehicles on the road is not going to reduce anytime soon. Hence there’s a high demand for developing smart and sustainable transportation systems which often come hand in hand with AI/ML.

If you’re new to AI/ML, I wouldn’t worry too much about the architecture itself, since it can take quite a bit of time and background knowledge to understand why things are the way they are. If you really want to understand it, I highly suggest starting with recurrent neural networks (RNNs) and graph convolutional networks (GCNs) in general. Still, I’m more than happy to answer any questions you might have. In the following subsections, we briefly look at each component in a bit more detail.

Input parameters

In the left-most part of our model above, there are two inputs (matrices) that the model accepts during training. The first is the adjacency matrix, a constant matrix that captures the spatial dependence between Bicikelj stations and is predetermined using statistical methods. The second is the feature matrix, a T x P matrix, where T is the number of Bicikelj stations and P is the number of states sampled from each station at fixed intervals.
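To make the two inputs more concrete, here is a minimal NumPy sketch of how they could be assembled. It is an illustration under assumptions, not the code from the repository: the distance threshold `radius_km`, the planar coordinates, and both helper names are hypothetical, and the adjacency is built from geographic proximity only.

```python
import numpy as np

def geographic_adjacency(coords, radius_km=1.0):
    """Binary adjacency: 1 if two stations lie within radius_km of each other.

    Assumes coords are planar (x, y) positions in kilometres (hypothetical).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all stations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius_km).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops here; a GCN usually adds them itself
    return adj

def feature_matrix(history, n_samples):
    """Keep the last n_samples readings per station -> shape (stations, n_samples)."""
    history = np.asarray(history, dtype=float)
    return history[:, -n_samples:]

# Toy data: 3 stations, 4 occupancy readings each (made-up numbers).
coords = [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0)]
history = [[5, 6, 7, 8], [2, 2, 1, 0], [10, 9, 9, 8]]

A = geographic_adjacency(coords)        # shape (T, T)
X = feature_matrix(history, n_samples=3)  # shape (T, P) with T=3, P=3
```

With these toy values only the first two stations end up connected, since the third one is several kilometres away.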

Modeling Dependences

Given an adjacency matrix A and the feature matrix X, the GCN model constructs a filter in the Fourier domain. The filter, acting on the nodes of a graph, captures spatial features between the nodes by its first-order neighborhood.
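For readers who prefer code to equations, the sketch below shows one graph convolution layer in plain NumPy. It assumes the common first-order formulation, where the adjacency matrix gets self-loops and symmetric normalization before it is multiplied with the features and a weight matrix; whether the project uses exactly this form is an assumption on my side, and the weights here are fixed toy values, not trained ones.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), with D the degree matrix of A + I."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_tilde.sum(axis=1)              # always >= 1 thanks to the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, x, weights):
    """One layer: ReLU(A_hat @ X @ W). Each node mixes in its direct neighbours."""
    return np.maximum(a_hat @ x @ weights, 0.0)

# Toy chain of 3 stations: 0 - 1 - 2, one feature per station (bikes parked).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[4.0], [0.0], [8.0]])
w = np.ones((1, 2))                        # toy weights: 1 input -> 2 hidden units

a_hat = normalize_adjacency(adj)
h = gcn_layer(a_hat, x, w)                 # shape (3, 2)
```

Note that the middle station is empty, yet its hidden value is non-zero because it aggregates information from both neighbours. That is the first-order neighborhood effect in action.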

On the other hand, GRU obtains the stations' occupancy information at time t by taking the hidden state at time t-1 and the current occupancy information as inputs. While capturing the traffic information at the current moment, the model still retains the changing trend of historical traffic information and can capture temporal dependence.
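The recurrence described above can be sketched as a single GRU step in NumPy. This is a simplified illustration with assumptions: the weights are random and untrained, the parameter names are hypothetical, and the raw occupancy is fed in directly, whereas in a T-GCN the input at each step would be the output of the graph convolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU update: combine current input x_t with previous hidden state h_prev."""
    xh = np.concatenate([x_t, h_prev], axis=-1)
    u = sigmoid(xh @ params["W_u"] + params["b_u"])   # update gate
    r = sigmoid(xh @ params["W_r"] + params["b_r"])   # reset gate
    xrh = np.concatenate([x_t, r * h_prev], axis=-1)
    c = np.tanh(xrh @ params["W_c"] + params["b_c"])  # candidate state
    # Keep part of the old state, replace the rest with the candidate.
    return u * h_prev + (1.0 - u) * c

rng = np.random.default_rng(0)
n_stations, n_in, n_hidden = 3, 1, 4
params = {name: rng.normal(scale=0.1, size=(n_in + n_hidden, n_hidden))
          for name in ("W_u", "W_r", "W_c")}
params.update({name: np.zeros(n_hidden) for name in ("b_u", "b_r", "b_c")})

# Toy occupancy series, shape (stations, time steps).
series = np.array([[5.0, 6.0, 7.0],
                   [2.0, 1.0, 0.0],
                   [9.0, 9.0, 8.0]])

h = np.zeros((n_stations, n_hidden))
for t in range(series.shape[1]):
    h = gru_step(series[:, t:t + 1], h, params)  # hidden state carries the history
```

After the loop, `h` summarizes the whole series for each station, which is what lets the model retain the historical trend while processing the current reading.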

I decided to spare you all the equations, but if someone is interested in all the dirty details, I can share the full paper.

Results and Evaluation

Here are some preliminary results of the proof-of-concept design:

the X-axis is time in minutes
the Y-axis is the number of bikes at a station

Evaluation results for 4 bicycle stations

The blue line is the network’s prediction, while the orange line is the ground truth. I intentionally left the prediction continuous and unclipped so that no information is suppressed, even though in reality the number of bikes at a station is never negative.
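If the raw predictions were to be shown to end users, they could be mapped back to valid bike counts in a post-processing step. The snippet below is only a sketch of that idea and is not part of the evaluation above; the function name and the `capacity` value (number of docks at a station) are hypothetical.

```python
import numpy as np

def to_bike_counts(raw_pred, capacity):
    """Clip continuous predictions to [0, capacity] and round to whole bikes."""
    clipped = np.clip(raw_pred, 0.0, capacity)
    return np.rint(clipped).astype(int)

raw = np.array([-0.7, 3.4, 11.6, 21.3])   # made-up raw network outputs
counts = to_bike_counts(raw, capacity=20)
```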

The graphs indicate that the proposed network is able to extract knowledge from the training data. At the same time, I was only able to test on my NVIDIA GeForce GTX 1650 Ti, which is relatively low-powered given the amount of data the network would need to learn the complex nature of Bicikelj station occupancy.

Additionally, I had very little data to work with, so I used regularization techniques such as dropout and weight decay to counteract that. In general, the model trains relatively fast: 200 epochs took about 15 minutes, though this depends heavily on the depth and width of the network. Since the activation function on the last layer is linear, the loss is evaluated using mean squared error (MSE) with optional regularization.
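As a rough illustration of these two regularizers, here is a NumPy sketch of an MSE loss with an L2 weight-decay term, plus inverted dropout. The coefficient `lam`, the dropout rate, and the function names are assumptions chosen for the example, not the values used in the project.

```python
import numpy as np

def mse_with_weight_decay(y_pred, y_true, weights, lam=1e-3):
    """Mean squared error plus lam * (sum of squared weights)."""
    mse = np.mean((y_pred - y_true) ** 2)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return mse + lam * l2

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

y_true = np.array([3.0, 5.0])
y_pred = np.array([2.0, 7.0])
weights = [np.array([[1.0, -1.0]])]        # toy weight matrix

loss = mse_with_weight_decay(y_pred, y_true, weights, lam=0.1)

rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000), rate=0.5, rng=rng)  # training-time only
```

Weight decay penalizes large weights and dropout randomly silences units during training; both make it harder for a network to memorize a small dataset.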

It was somewhat expected that the GCN would underperform, since the adjacency matrix represents the geographical dependency between stations rather than a statistically estimated matrix giving the probability of a bike going from station A to station B. In contrast, the GRU performs fairly well in all tests, successfully capturing the temporal dependence. The GCN’s underperformance also strongly affected the T-GCN, hiding its true potential in such an application. The greatest performance boost was achieved by increasing the complexity of the model, i.e. the number of hidden layers, and by fine-tuning the learning rate, while keeping the number of epochs below 200 to prevent overfitting.

From the graphs, we can also identify several challenges that need to be addressed when constructing the network. For example, in the bottom-left graph, the ground-truth number of bikes jumps sharply, presumably because the Bicikelj management team relocated bikes to that station, and the network can learn such events only from a large amount of data. I also want to emphasize that it’s not easy to debug such large matrices of 7000 and more elements, so the size of the problem itself is quite a challenge.
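A statistically estimated matrix of that kind could, for instance, be derived from trip logs by counting rides between stations and normalizing each row. The sketch below assumes such logs exist as (origin, destination) pairs, which is hypothetical; the function name is made up for the example as well.

```python
import numpy as np

def transition_matrix(trips, n_stations):
    """Estimate P(destination | origin) from a list of (origin, destination) rides."""
    counts = np.zeros((n_stations, n_stations))
    for origin, dest in trips:
        counts[origin, dest] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without any recorded departure stay all-zero instead of dividing by 0.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Made-up ride log for 3 stations.
trips = [(0, 1), (0, 1), (0, 2), (1, 0)]
trans = transition_matrix(trips, n_stations=3)
```

Unlike a distance-based adjacency, this matrix is generally asymmetric, so it may need to be symmetrized or handled with a directed variant of the graph convolution.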

The complete source code is available on my GitHub repository:

GitHub - dorkamotorka/DeepCikelj: GCN for Bicikelj system optimization

Conclusion

This network is only the base model, and many additional features can and should be included. Inspired by the AST-GCN network, an A-cell should be added to enable the network to learn not only from spatial and temporal information but also from data such as weather forecasts, holidays, and working days versus weekends. This information strongly impacts the model performance and should be further investigated.
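One simple way to picture such an extension is to append the external attributes as extra columns of the feature matrix before it enters the network. The snippet below is a sketch of that concatenation only, with made-up attribute values and hypothetical names; it does not reproduce the full AST-GCN design.

```python
import numpy as np

def augment_features(x, static_attrs, dynamic_attrs):
    """Append per-station external attributes as extra feature columns."""
    return np.concatenate([x, static_attrs, dynamic_attrs], axis=1)

# 2 stations, 3 occupancy samples each (toy values).
x = np.array([[5.0, 6.0, 7.0],
              [2.0, 1.0, 0.0]])
static_attrs = np.array([[20.0], [10.0]])      # e.g. number of docks per station
weather = np.array([12.5, 0.0])                # e.g. temperature, rainfall
is_holiday = 1.0
# City-wide attributes are the same for every station, so repeat them per row.
dynamic_attrs = np.tile(np.append(weather, is_holiday), (x.shape[0], 1))

x_aug = augment_features(x, static_attrs, dynamic_attrs)  # shape (2, 7)
```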

Thanks for reading! 😎
Check out the rest of my content under Teodor J. Podobnik, @dorkamotorka, and follow me for more. Cheers!