Remote data abstraction layer for the new VDL Mobile App
In 2020, Apart was tasked with the complete overhaul of the Mobile Application for the City of Luxembourg (VDL).
Alongside the modernisation of the graphical interface, we had the chance to develop a state-of-the-art backend infrastructure that allowed us to provide end users with more information than before.
Challenge
One of the challenges we faced with the backend development of the VDL Mobile Application was the requirement to access a complex ecosystem of remote services while still maintaining a high level of performance for our users.
Not only did the services have a diverse set of output formats and structures, but we also had passive services that only served a static file and others that provided a more dynamic API. Some services allowed us to query single elements; others only returned a list of elements without an option to filter or sort them. A couple of services updated their information almost in real time, while others served the same data for multiple days.
As those services were managed by multiple entities, there was a real possibility that some of them would become unsuitable over time, suffer downtime, or simply stop being available in the future.
Analysis
Based on those challenges, we started our analysis of the software features we would need to develop for this project and identified three main points that would require special attention.
1. Abstraction
Having multiple output formats and data structures required some kind of abstraction layer between the remote data and the application data. This abstraction should automatically convert the needed information into the most suitable data type and sanitise the content by performing complex checks.
2. Caching
From the start of the project, we expected that we would require some kind of cache to smooth out the performance impact of having multiple remote services running on different infrastructures.
But because some services had real-time or near-real-time information, a more advanced system would be required.
3. Simplicity
Dealing with abstractions and caching while cross-processing information from multiple services can be overwhelming. We needed a way to simplify these interactions to both reduce code size and speed up development.
Solution
To solve the main issues pointed out by our analysis, we developed the idea of an abstraction layer between the database storage, the remote services and our developers. We call it a Datasource.
The Datasource controls the flow of information between the remote service and the application stack and automatically stores a cached version on a local database when needed. It ensures that the application developers always get the same format of data, whatever the source of the information is.
Example of a Datasource call:
$data = Datasource::request("serviceA", ["query" => "a"]);
In the scenario where we are still in our cache window, the Datasource returns the pre-formatted data from our local database. This skips the round trip to the remote service and increases performance dramatically.
When the TTL (time to live) of our Datasource is reached and our cache expires, we use connectors to access the remote service. Each service has its own connector that handles authentication, conversion and data sanitisation before forwarding the result to the Datasource. The Datasource stores the information in the database and refreshes the cache expiration timer.
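To make this flow more concrete, below is a simplified sketch of what such a request method could look like. Apart from Datasource::request itself, the names used here (CacheStore, ConnectorRegistry, fetch, put) are illustrative assumptions, not the actual VDL implementation:

class Datasource
{
    public static function request(string $service, array $params = []): array
    {
        // Look up a cached entry for this service and parameter combination.
        $cache = CacheStore::find($service, $params);

        // Cache hit inside the TTL window: return the pre-formatted local data.
        if ($cache !== null && !$cache->isExpired()) {
            return $cache->data;
        }

        // Cache miss or expired entry: the service-specific connector handles
        // authentication, conversion and sanitisation of the remote data.
        $connector = ConnectorRegistry::get($service);
        $data = $connector->fetch($params);

        // Persist the normalised result and reset the cache expiration timer.
        CacheStore::put($service, $params, $data);

        return $data;
    }
}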
Below is an example of a connector field-mapping snippet.
[
    // Maps paths in the remote payload to application fields and target data types.
    'updated' => [
        'path' => 'lastUpdate',
        'type' => self::PARAMETER_STRING
    ],
    'isLive' => [
        'path' => 'connected',
        'type' => self::PARAMETER_BOOL
    ],
    'availableBikes' => [
        'path' => 'totalStands.availabilities.bikes',
        'type' => self::PARAMETER_INT
    ],
    'availableStands' => [
        'path' => 'totalStands.availabilities.stands',
        'type' => self::PARAMETER_INT
    ],
    'remote_reference' => [
        'path' => 'number',
        'type' => BikeStationsConverter::PARAMETER_INT
    ]
]
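To illustrate how such a mapping could be applied, here is a simplified sketch of a converter that resolves the 'path' of each field in the remote payload and casts the value to the configured type. The convert method, the dotted-path resolution and the casting rules shown here are assumptions for illustration, not the actual implementation:

class BikeStationsConverter
{
    public const PARAMETER_STRING = 'string';
    public const PARAMETER_BOOL   = 'bool';
    public const PARAMETER_INT    = 'int';

    public function convert(array $remote, array $mapping): array
    {
        $result = [];

        foreach ($mapping as $field => $rule) {
            // Resolve dotted paths such as "totalStands.availabilities.bikes".
            $value = $remote;
            foreach (explode('.', $rule['path']) as $segment) {
                $value = $value[$segment] ?? null;
            }

            // Cast the raw value into the expected application data type.
            switch ($rule['type']) {
                case self::PARAMETER_INT:
                    $result[$field] = (int) $value;
                    break;
                case self::PARAMETER_BOOL:
                    $result[$field] = (bool) $value;
                    break;
                case self::PARAMETER_STRING:
                    $result[$field] = (string) $value;
                    break;
                default:
                    $result[$field] = $value;
            }
        }

        return $result;
    }
}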
To minimise the processing time required to use a Datasource, we can choose between three different cache expiration strategies. The first strategy is a simple cron job that is executed every N minutes and refreshes the information once the TTL is reached. This allows us to process costly requests overnight and serve the cached version throughout the day. For information that requires more frequent updates per day, but where the data may not change every time, we can avoid unnecessary requests by using our second or third strategy.
In both variants we compare the time of the last Datasource call and only trigger a cache refresh when the time between calls is greater than the TTL. The main difference is that the second strategy forces a cache refresh and then returns the fresh data, whereas the third strategy returns the old data first and then refreshes it asynchronously, as sketched below.
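A rough sketch of how the second and third strategies could differ is shown below; the helper names (CacheStore, Datasource::refresh, RefreshQueue) are illustrative assumptions:

// Strategy 2: blocking refresh – once the TTL is exceeded, the caller waits
// for the cache to be refreshed and receives the fresh data.
function requestWithBlockingRefresh(string $service, array $params): array
{
    $cache = CacheStore::find($service, $params);

    if ($cache === null || $cache->ageInSeconds() > $cache->ttl) {
        return Datasource::refresh($service, $params);
    }

    return $cache->data;
}

// Strategy 3: stale-while-revalidate – the caller immediately receives the
// old data and the refresh runs asynchronously in the background.
function requestWithAsyncRefresh(string $service, array $params): array
{
    $cache = CacheStore::find($service, $params);

    if ($cache === null) {
        return Datasource::refresh($service, $params);
    }

    if ($cache->ageInSeconds() > $cache->ttl) {
        RefreshQueue::dispatch($service, $params);
    }

    return $cache->data;
}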
Based on how often a Datasource is used and how its cache behaves, we can fine-tune the TTL and cache strategy of each Datasource to maximise performance while minimising resource usage.
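For example, this tuning could be expressed as a small configuration map per Datasource; the keys, values and strategy names below are purely illustrative:

return [
    'busLines' => [
        'ttl'      => 86400,           // long-lived data, refreshed overnight by cron
        'strategy' => 'cron',
    ],
    'parkingCapacity' => [
        'ttl'      => 300,             // near-real-time data, refreshed on demand
        'strategy' => 'async_refresh',
    ],
];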
Real-world usage
In the VDL Mobile Application, we have over 20 distinct Datasources, each supporting a multitude of different query parameters.
Some Datasources are long-lived and only update every couple of hours or days, such as the Bus/Tram Lines or the Trash Collection information. Others are short-lived and update every 5 to 10 minutes, such as Parking Capacity, Bike Charging Places or Traffic information.
As for real numbers, here are some examples of how Datasources improved performance:
- Tram Lines List: (1450ms -> 360ms) 4.1x
- Bus Lines List: (2100ms -> 370ms) 5.7x
- Soft Mobility List: (3170ms -> 430ms) 7.3x
- Stop Details: (970ms -> 570ms) 1.7x
- Parking Details: (1080ms -> 380ms) 2.8x
- Construction Details: (1490ms -> 400ms) 3.7x
Note: Testing was done in a development environment.
Final thoughts
As the key technological feature of the VDL Mobile Application backend, Datasources turned out to be even better than expected. We were able to improve performance drastically while simultaneously reducing the amount of code we had to write. The control over the flow of information, and the ability to analyse it, allowed us to fine-tune the Datasources by increasing or reducing TTLs or by changing the caching strategy.