Today, there is a greater demand to produce more timely official statistics at a more granular level. The CSO is looking to novel data sources to meet this demand. For the CSO, ‘Big Data’ sources, such as Automatic Identification System (AIS) data and Transport Infrastructure Ireland data, represent an innovative opportunity to generate experimental maritime and traffic volume statistics.
Transport Infrastructure Ireland (TII) operates over 300 active Traffic Monitoring Units (TMUs) around the country that record the volume of traffic by hour of day and vehicle class. Vehicles are counted when they pass over loops embedded in the road surface.
TII provides the hourly aggregated counts to the CSO daily via an API, and they are then uploaded to and stored in the CSO’s data hub. The daily data files consist of seven variables: in addition to the hour, day, month and year, the data includes the hourly vehicle count, the vehicle classification, and a unique identifier for each TMU. The unique identifier can be used to look up the TMU’s location and description.
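As an illustration only, one row of a daily file could be represented as a small record holding the seven variables listed above. The class name `HourlyCount`, the field names and the example values are hypothetical; the actual column names and codes in the TII feed are not given here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HourlyCount:
    """One row of a daily TII file (hypothetical field names)."""
    year: int
    month: int
    day: int
    hour: int            # hour of day, assumed to run 0-23
    vehicle_class: str   # TII vehicle classification (code format assumed)
    count: int           # vehicles counted in that hour
    tmu_id: str          # unique TMU identifier, used to look up location/description

    def start_time(self) -> datetime:
        """Start of the hour that this count covers."""
        return datetime(self.year, self.month, self.day, self.hour)


# Illustrative values, not real TII data.
row = HourlyCount(2022, 8, 8, 17, "CAR", 412, "TMU-0001")
print(row.start_time())  # 2022-08-08 17:00:00
```

Keeping the four date and time fields separate mirrors the file layout described above, while `start_time()` combines them when the counts need to be ordered or joined by time.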
Given the velocity and volume of the TII data, it falls under the classification of Big Data.
Unfortunately, the TMUs that record data for TII lack the storage capacity typically found in Motorway Incident Detection and Automatic Signalling (MIDAS) sensors. As a result, when a TMU is temporarily out of operation, the traffic counts for that period are lost. The leading causes of temporary inactivity are software updates, roadworks, and sensors failing validation checks. When any of these faults or outages affects a TMU, there will be gaps in its data until the issue is resolved.
A major challenge when working with the TII traffic count data is determining whether a rise or fall in recorded traffic is genuine rather than the result of a TMU failure. The importance of this check increases with spatial resolution: for a single counter, an outage can look like a sharp fall in traffic. Conversely, as the spatial resolution decreases (that is, as the geographical area investigated grows), values are aggregated over several counters, and the aggregate can tolerate a larger margin of error from any one counter.
Although the TII TMUs provide quite comprehensive coverage, it should be noted that there may be a location bias in where they operate. TMUs tend to be clustered in more densely populated areas where greater traffic volumes are expected.
While the TII traffic count data can help to improve the periodicity and timeliness of traffic count statistics, producing those statistics is not simply a matter of summing daily or weekly aggregates. Among the challenges faced are the quality and completeness of the data sets. For example, gaps can occur in the recorded traffic counts when TMUs are inactive because of roadworks or technical difficulties. A major component of this work is therefore preparing the data and developing methods to deal with missing data.
Missing data for each TMU are imputed using the last available value for the same day of the week. For example, if the count for Monday 08/08/22 was missing, the count from the previous Monday (01/08/22) was imputed.
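The imputation rule can be sketched as follows. This is a minimal sketch, not the CSO's production code. It assumes that `counts` maps each date to one TMU's daily count, with missing days simply absent. It also assumes that, if the previous week's value is missing as well, the search keeps stepping back one week at a time (one reading of "last available value"). The function name and the 52-week search limit are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def impute_same_weekday(counts: dict[date, int], day: date,
                        max_weeks_back: int = 52) -> int | None:
    """Return the count for `day`, imputing from the same weekday if missing.

    Steps back one week at a time until an observed value is found
    (assumed behaviour) or the search window is exhausted.
    """
    if day in counts:
        return counts[day]
    for weeks in range(1, max_weeks_back + 1):
        candidate = day - timedelta(weeks=weeks)
        if candidate in counts:
            return counts[candidate]
    return None  # nothing observed within the search window


# Illustrative values for one TMU; Monday 08/08/22 is missing.
counts = {date(2022, 8, 1): 15230, date(2022, 8, 2): 15870}
print(impute_same_weekday(counts, date(2022, 8, 8)))  # 15230 (from Monday 01/08/22)
```

Stepping back in whole weeks keeps the imputed value on the same weekday, which preserves the day-of-week pattern in traffic volumes.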
Automatic Identification System (AIS) data are supplied by the Task Team on AIS Data of the UN Committee of Experts on Big Data and Data Science for Official Statistics (UN-CEBD), and accessed through the UN Global Platform (UNGP), which holds a global repository of live and archived AIS data. The data are provided by ExactEarth, which combines its own satellite data with terrestrial data from FleetMon. In addition to location, bearing and navigation status, the AIS data includes unique identifier information: the International Maritime Organisation (IMO) number and the Maritime Mobile Service Identity (MMSI) number.
In addition to AIS data, the UNGP holds the IHS Shipping Registry. Incorporating SeaWeb and Lloyd’s Register of Ships (published since 1764), the IHS Shipping Registry provides detailed information on all self-propelled, seagoing merchant ships. The information in the registry includes IMO number, MMSI number, ship name, ship type, cargo type, ownership, registration, tonnage, dimensions, and propulsion.
AIS data on the UNGP is enriched by adding a spatial index to each AIS message record (see Figure 1). This spatial index system, known as H3 and initially developed by Uber, has sixteen resolution levels (0 to 15). At each level, the Earth's surface is covered by cells identified by H3 Index values. H3 cells are polygons: almost all are hexagons, plus twelve pentagons at each resolution. An H3 Index identifies an H3 object such as a cell. The system is hierarchical, with each hexagonal cell having seven child cells beneath it, up to the maximum supported resolution. The main benefit of the H3 indexing system is that any position on the Earth, and therefore the position reported in any AIS message, can be assigned an H3 Index.
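The hierarchy can be illustrated by counting cells per resolution. The sketch below uses the published H3 relationship of 2 + 120 × 7^r cells at resolution r. That gives 122 base cells at resolution 0 (110 hexagons and 12 pentagons), and the count grows roughly sevenfold at each finer level, because hexagons have seven children and pentagons six. The function name is hypothetical, and no H3 library is needed.

```python
def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells covering the Earth at a given resolution.

    2 + 120 * 7**r: 110 hexagons with 7 children and 12 pentagons with
    6 children at each step, starting from 122 base cells at resolution 0.
    """
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** resolution


for r in (0, 1, 10, 15):
    print(r, h3_cell_count(r))
# 0 122
# 1 842
# 10 33897029882
# 15 569707381193162
```

The resolution 10 count is the level used for the stationary-ship analysis described below.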
Port polygons define the area in which a port visit can occur and link each visit to a specific port. These polygons also serve as a spatial filter that limits the amount of AIS data to be processed. Port polygons do not overlap and are created using GIS software. It is important to note that the port polygons do not represent the area of port activities but rather an area that contains the port. Data reduction is crucial when dealing with Big Data, so Bounding Box Buffers (BBB) were established around the port polygons to restrict the analysis to this area. The BBB geometry reduced the amount of data to be processed by about 80%.
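A bounding-box filter of this kind can be sketched in a few lines. The sketch assumes that the buffer is a fixed margin in degrees added to the polygon's bounding box; the CSO's actual buffer size and units are not stated. The port polygon, the buffer value, the AIS positions and the names `BoundingBox` and `buffered_bbox` are all hypothetical. The point of the design is that a box test is a few comparisons per message, so most AIS records can be discarded before any polygon-level processing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        """Cheap pre-filter: four comparisons, no polygon geometry."""
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def buffered_bbox(polygon: list[tuple[float, float]], buffer_deg: float) -> BoundingBox:
    """Bounding box of a polygon's (lon, lat) vertices, widened by a buffer.

    Assumption: the buffer is a fixed margin in degrees.
    """
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return BoundingBox(min(lons) - buffer_deg, min(lats) - buffer_deg,
                       max(lons) + buffer_deg, max(lats) + buffer_deg)


# Hypothetical port polygon and AIS positions, as (lon, lat) pairs.
port = [(-6.25, 53.34), (-6.15, 53.34), (-6.15, 53.36), (-6.25, 53.36)]
bbb = buffered_bbox(port, buffer_deg=0.05)
messages = [(-6.20, 53.35), (-5.50, 53.00), (-6.31, 53.38)]
kept = [m for m in messages if bbb.contains(*m)]
print(kept)  # [(-6.2, 53.35)]
```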
Arrivals of ships are calculated using the Stationary Marine Broadcast Method (SMBM).
The Stationary Marine Broadcast Method, developed by the CSO, relies on the concept that ships are stationary for a time during loading and unloading. When a ship remains stationary within a port polygon for an extended period, this is likely to be a port visit. The method uses the H3 Indexes discussed above. Once a ship is identified as stationary (a triggering event), an escape condition is needed to detect when it moves off. H3 Indexes turn this escape condition into an attribute comparison instead of a geographic distance calculation. The escape condition checks whether the test record's H3 Index lies outside a defined neighbourhood of the triggering event, determined by the H3 k-ring function (called gridDisk in H3 version 4). The "k-depth" is the number of rings used to define this neighbourhood.
Table 1: K-depth and number of cells in the k-ring
| K-depth | Number of cells in k-ring |
|---|---|
| 0 | 1 |
| 1 | 7 |
| 2 | 19 |
| 3 | 37 |
| 4 | 61 |
| 5 | 91 |
| 6 | 127 |
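The counts in Table 1 follow a simple closed form. A k-ring of depth k contains the centre cell plus 6i cells in each ring i = 1, …, k, giving 1 + 3k(k + 1) cells. This holds for an all-hexagon neighbourhood; near one of H3's twelve pentagons the disk is slightly smaller. The function name is hypothetical.

```python
def k_ring_size(k: int) -> int:
    """Cells in a hexagonal k-ring of depth k: 1 + 6*(1 + 2 + ... + k) = 1 + 3k(k+1).

    Assumes no pentagon falls inside the neighbourhood.
    """
    return 1 + 3 * k * (k + 1)


print([k_ring_size(k) for k in range(7)])  # [1, 7, 19, 37, 61, 91, 127]
```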
The escape condition involves two hyperparameters: the k-depth for the k-ring and the H3 resolution. A level 10 H3 Index resolution is used, with cells approximately 150 m wide, matching the sizes of the ships under study. A k-depth of three is chosen to allow for position reporting errors, minor shifts due to tide and current, and small operational movements of the ship. In Ireland, a k-depth of three at resolution 10 corresponds to an area of nearly one square kilometre.
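The escape condition can be sketched without the H3 library by using a flat hexagonal grid in axial (q, r) coordinates as a stand-in for H3 Indexes. This is an assumption made so the example runs on its own. With the h3-py package (version 4), `h3.grid_disk(cell, k)` would supply the neighbourhood instead. The neighbourhood is built once per triggering event, and each later record is then tested by set membership rather than by a distance calculation. The names `Hex`, `neighbourhood` and `has_escaped` are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Hex(NamedTuple):
    """Axial (q, r) coordinates of a hexagonal cell; hypothetical stand-in for an H3 Index."""
    q: int
    r: int


def neighbourhood(trigger: Hex, k_depth: int = 3) -> frozenset[Hex]:
    """All cells within k_depth rings of the trigger cell (a k-ring / grid disk)."""
    return frozenset(
        Hex(trigger.q + dq, trigger.r + dr)
        for dq in range(-k_depth, k_depth + 1)
        for dr in range(max(-k_depth, -dq - k_depth), min(k_depth, -dq + k_depth) + 1)
    )


def has_escaped(neigh: frozenset[Hex], test: Hex) -> bool:
    """Escape condition: the test record's cell lies outside the neighbourhood.

    A set-membership (attribute) check, not a geographic distance calculation.
    """
    return test not in neigh


trigger = Hex(0, 0)
neigh = neighbourhood(trigger, k_depth=3)  # computed once per triggering event
print(len(neigh))                      # 37
print(has_escaped(neigh, Hex(2, 1)))   # False: three rings out, still inside
print(has_escaped(neigh, Hex(4, -1)))  # True: four rings out, outside
```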
The data processing for the SMBM consists of the following steps:
This allows us to make upper and lower estimates of the time the ship was stopped:
$$\text{Upper Time Estimate} = U_{\text{escape}} - U_{\text{prior}}$$
$$\text{Lower Time Estimate} = U_{\text{previous}} - U_{\text{reference}}$$
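A minimal sketch of this stage follows. It scans forward from a triggering record until the escape condition holds, counting the records checked, and then applies the two formulas. The meanings of the timestamps are assumptions. U_reference is taken as the triggering record and U_prior as the record just before it. U_previous is taken as the last record still inside the neighbourhood and U_escape as the first record outside it. Timestamps are assumed to be Unix seconds. The names `Ping`, `StoppedShip`, `stopped_ship` and `same_cell_only` are hypothetical, and the toy escape test treats only the trigger cell as the neighbourhood.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    timestamp: int   # Unix time in seconds (assumed)
    cell: Hashable   # H3 Index of the reported position


@dataclass(frozen=True)
class StoppedShip:
    record_count: int     # records checked until the escape condition was met
    upper_estimate: int   # U_escape - U_prior, seconds
    lower_estimate: int   # U_previous - U_reference, seconds


def stopped_ship(pings: Sequence[Ping], trigger_idx: int,
                 escaped: Callable[[Hashable, Hashable], bool]) -> StoppedShip | None:
    """Scan forward from the triggering record (assumed timestamp meanings)."""
    if trigger_idx < 1:
        return None  # no prior record to bound the start of the stop
    reference, prior = pings[trigger_idx], pings[trigger_idx - 1]
    previous = reference
    record_count = 0
    for ping in pings[trigger_idx + 1:]:
        record_count += 1
        if escaped(reference.cell, ping.cell):
            return StoppedShip(record_count,
                               ping.timestamp - prior.timestamp,
                               previous.timestamp - reference.timestamp)
        previous = ping
    return None  # no escape yet: the visit is still open


def same_cell_only(ref: Hashable, cell: Hashable) -> bool:
    """Toy escape test: any move out of the trigger cell counts as escape."""
    return cell != ref


pings = [Ping(0, "X"), Ping(600, "A"), Ping(1200, "A"), Ping(4800, "A"), Ping(5400, "B")]
print(stopped_ship(pings, 1, same_cell_only))
# StoppedShip(record_count=3, upper_estimate=5400, lower_estimate=4200)
```

The upper estimate is bounded by records just outside the stop, and the lower estimate by records inside it. The true stopped duration lies between the two because AIS positions are only reported at intervals.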
Create stopped ship record: the steps above add the following information to each “stopped ship” record:
the record count value: the number of iterations in step 6 above, that is, the number of records checked until the escape condition was met.
the upper and lower time estimates of the stopped duration.

Coverage: The Dashboard summarises the results for seven main Irish port areas:
1. Bantry Bay Maritime Area
2. Drogheda Maritime Area
3. Dublin Maritime Area
4. Shannon Foynes Maritime Area
5. Cork Maritime Area
6. Rosslare Maritime Area
7. Waterford Maritime Area
The AIS-based data source is relatively new. A significant benefit of this data source is that there is one global standard, which makes it easier to compile coherent, standardised maritime statistics. Like many novel data sources, the AIS-based source enables greater temporal granularity. International standards are currently being developed based on experience of using this data in the UNGP and other projects.

Current CSO analysis does show differences between the AIS-based estimates and official statistics: the AIS-based estimates tend to be a little higher. Official statistics are compiled from summary data provided by each port authority on a quarterly basis and are published in the CSO Statistics of Port Traffic release. There are two possible reasons, or a combination thereof, for the higher AIS-based estimates. The first is that the AIS-based estimates may have higher coverage. The second is that the AIS-based estimation procedure may include an element of double counting, indicating over-coverage errors in the AIS-based estimates. In all likelihood, the differences are explained by some combination of these error types.
The CSO continues to develop and produce experimental statistics for port traffic in parallel with the official statistics. This will allow monitoring of how both sets of statistics behave over a longer period. The experimental statistics are compiled at a greater granularity (monthly