September 14, 2017

# TL;DR

In this article, we use Python and QGIS to find an interpolation method for missing coordinates of vessels. It’s an excuse to talk about interesting things, such as:

- Automatic Identification Systems for boats
- Bearing and Haversine formula, two spherical trigonometry concepts very useful in navigation
- Geo-analysis tools: heatmaps, hotspots analysis and polygonization with the open-source software QGIS

# Interpolate missing coordinates for vessels tracking

To avoid collisions on the sea, vessels are equipped with automatic identification system (AIS): they broadcast their position, course, speed and unique identifier. This information can be received by other ships equipped with a receiver, as well as base stations on the coast line and satellites. But this data can also be of interest for many other people. Bloomberg Terminal for example, a system used in the financial industry to access real time stock data, makes it accessible to commodities traders who wish to follow a cargo.

Let’s have a look at a ship in details. Like a car, they all have their own unique identifier, provided by the International Maritime Organisation (IMO). The crude oil tanker LEONIDAS for instance has the following IMO number: 9410234.

The website Marine Traffic gives us more information, such as its deadweight of 318,325t or its year of construction, 2009.

We can map the data provided for the tanker LEONIDAS between December 16, 2016 and March 3, 2017. We try to have one point a day, but some days are missing in the South China sea.

We want to estimate these positions.

# Different missing points scenarios

We want to create a script that interpolates the missing location of a vessel, based on the other points we have. Constraints are:

- the script needs at least two points to have an interpolation
- those two points should be in a (parametrable) range of days from the missing one: we don’t want to use two points 15 days away from the real one.

Thus our script will go through the other days and find the best points before or after the missing one, so we can be in one of those three cases:

- Case 1: we only have data before the missing point
- Case 2: we have data before and after the missing point
- Case 3: we only have data after the missing point

Our best guess is to first think that the vessel is travelling on a straight line and a constant speed. We will have to define what a straight line means on the earth but we can imagine what it means on a piece of paper. Thus for the case 1, we will take the bearing to go from A to B, the speed to go from A to B, and then keep the same bearing on B to go to C.

For case 2, we take the bearing to go from A to C and find the missing point at the date we want if the vessel goes with constant speed.

For case 3, we will take the bearing from C to B, it’s like case 1 but reverse.

But what is a bearing and how to compute it ?

## Bearing

In navigation, the bearing refers to the direction of motion. It’s the angle between the trajectory and the North. Thus, a boat heading east has a bearing of 90°. AIS data also broadcasts bearing, but we will re-calculate it here. The formula:

This formula and many other related to navigation can be found in this great page.

## Arctan2

The function used here is the arctangente function with two arguments. Arctangente is the inverse function of tangent, in the sense that from the tangent, it gives the angle in radians between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$.

For a point $P(x, y)$, we define $\tan(x,y) = \frac{y}{x}$. The $\arctan$ function gives the angle between the positive $x$ axis and the point P. To compute it properly, it's easier to know the sign of $x$ and $y$, information lost when using $\frac{y}{x}$. Then in many softwares, we define the $\arctan$ with two arguments as being

$$\arctan_2(x, y) = \arctan_1(\frac{y}{x})$$.In Python, we can define a function bearing, that will return the bearing between two points.

def calculate_bearing(self, point1, point2): lat1 = point1[0] lon1 = point1[1] lat2 = point2[0] lon2 = point2[1] lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2)) y = sin(lon2-lon1)*cos(lat2); x = cos(lat1)*sin(lat2) - sin(lat1)*cos(lat2)*cos(lon2-lon1); bearing = degrees(atan2(y, x)) return bearing

## Interpolate point

We have the bearing and the initial point, chosen depending on the situation we are among the three discussed previously. The speed is supposed to be constant, so we don’t need to know it, we just need to know the travel time to the interpolate point rapported to the travel time between the two known points.

Boats are travelling on **great circles**. On a plane (flat), the shortest path between two points is a straight line. On a sphere like earth however, shortest surface paths are called great circles. They are diameters of the sphere, or more precisely, they are the intersection of the sphere and a plane that passes through its center point.

For our purpose, we take an itial bearing and then travel for some time on the corresponding great circle. Here is the formula to do that:

def calculate_coordinates(self, departure_point, bearing, distance): R = self.earth_radius d = distance lat1 = radians(departure_point[0]) #Current lat point converted to radians lon1 = radians(departure_point[1]) #Current long point converted to radians bearing = radians(bearing) lat2 = asin( sin(lat1)*cos(d/R) + cos(lat1)*sin(d/R)*cos(bearing)) lon2 = lon1 + atan2(sin(bearing)*sin(d/R)*cos(lat1), cos(d/R)-sin(lat1)*sin(lat2)) lat2 = degrees(lat2) lon2 = degrees(lon2) return (round(lat2, 4), round(lon2, 4))

In this formula, $R$ is the earth radius, 6371km, and $d$ is the distance of travel. We dont know the distance of travel yet, but we know that the speed of the vessel is constant. The distance travelled will be a multiple of the distance between the two known points, proportionnaly of the travel time. Let’s call $t_1$ and $t_2$ the dates of the two known points, and t the date of the missing point. Then for our three cases:

- Case 1: $\frac{t - t_1}{t_2-t_1}$
- Case 2: $\frac{t - t_1}{t_2-t_1}$
- Case 3: $\frac{t_2 - t}{t_2-t_1}$

To simplify, the ratio is just $$ r = \frac{travel \ time \ from \ initial \ point}{travel \ time \ between \ known \ points}$$

We then have our distance $d$:

$$ d = r.d_{12}$$

where $d_{12}$ is the distance between the two known points. We now have to calculate this distance. This is done with Haversine formula.

## Haversine

The haversine formula gives the great circle distance between two points, and our hypothesis is that vessels are moving on great circles.

$$ d = 2 r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1) \cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right) $$

$\varphi_1$ and $\varphi_2$ are latitudes in radians. $\lambda_1$ and $\lambda_2$ are longitudes in radians.

Longitude and Latitude are measure in degrees of the position of a point on the earth.

Degrees are subdivision of one full turn in 360 parts.

Radians however, considers that a full turn is $2\pi$. An angle in radians is numerically equal to the length of the arc of the circle of radius 1, thus the full circle is $2\pi R = 2\pi$.

From that, it's easy to convert degrees to radians:

$$ radians = \frac{d°.\pi}{180}$$def haversine(self, point1, point2): lat1 = point1[0] lon1 = point1[1] lat2 = point2[0] lon2 = point2[1] lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2)) lat = lat2 - lat1 lon = lon2 - lon1 h = sin(lat * 0.5) ** 2 + cos(lat1) * cos(lat2) * sin(lon * 0.5) ** 2 d = 2 * self.earth_radius * asin(sqrt(d)) return d

## Interpolation

We now have everything to calculate the position of our interpolated point. Let’s have a look at some results with QGIS, a great open-source that can display geo-data, create map and many other functions.

The script produces a CSV file with the positions and flag the one that have been interpolated. We can load the CSV, put another layer of land and color in yellow the interpolated points.

Results from the pictures seems plausible. However sometimes, vessels don’t cruise on great circle, as they have to bypass land. Here in south-india, our interpolation puts us right outside of the sea, which is not acceptable.

We will have to add another step to our interpolation, a correction step for this kind of behaviour.

# Find usual trajectories

Vessels don’t navigate randomly. They tend to follow the corridors, and it would be a good correction step to impose our interpolated points to be in these corridors.

Now to find out their shape, let’s dump 2,725,669 historic positions in QGIS and see if a pattern emerges.

It seems clear that some corridors can be identified. Let’s do it automatically by creating a heatmap.

## Heatmap

A **heatmap** is an image where every pixel’s color is related to the concentration of points around it. For our usecase, we reduce the number of points in our map and chose a radius of 30km. We chose a scale of colors from blue to red.

- blue: the number of points in a radius of 30km around is low
- red: the number of points in a radius of 30km around is high

## Polygon from heatmap

The heatmap is visual and allows us to identify corridors. We will use QGIS raster calculator to filter pixels above a specific threshold. The color scale gives us an idea of how low we should filter our map to find corridors of interest.

This filter identifies hotspots, it gives only two possible colors to a pixel, above the thresold and below the thresold. This technique is called **Hotspot Analysis**.

To be used in our script, we need to have a vector polygon, we can use the geoprocessing tools of QGIS to polygonize the hotpspot image we have and create a polygon of points above the treshold.

Then we need to remove rings in the polygon, manually modify and simplify the geometries and be sure that the corridors do not overlapse lands by using the “difference” tool.

We now have our final polygon:

## Correction step in the script

the previous polygon is saved in a geographic file called shapefile. We want our interpolated points to be in those corridors. First we load the shapefile in the script by using Fiona. It’s a powerful library to read and write geographic data.

import fiona with fiona.open("corridors.shp") as corridors_shapefile: corridor_polyg = corridors_shapefile.next() # We only have one polygone corridors = shapely.geometry.asShape( corridor_polyg['geometry'] ) # Used to check if point inside polygon corridors_enveloppe = LinearRing(corridors.exterior.coords) # Used to project the point

Then we check if our interpolated point is inside the corridors, by using the **contains** function of the library shapely. It’s easy, this functions returns true is a point is inside a given polygon. We then check if the known positions were inside the corridors as well. If not, we are facing an outlier and do not correct. But if the known positions were in the corridors, then we project the point on the enveloppe of the corridors with the project function of shapely, giving the closer point on the corridor’s enveloppe to our interpolated point.

import shapely if not corridors.contains(point): initial_point = Point(initial_point[1], initial_point[0]) if corridors.contains(initial_point): d = corridors_enveloppe.project(point) p = corridors_enveloppe.interpolate(d) lon = round(list(p.coords)[0][0], 4) lat = round(list(p.coords)[0][1], 4)

We can now visualize the behaviour of some corrected points.

- blue point: interpolation before correction
- brown point: correction
- black point: known positions

# Test

When running this script on 50 values that we know and can compare, we find an average error of 112km.

# Conclusion

This script is a fun way to do some geo-analysis. Python is very powerful for that, with libraries like Shapely or Fiona a lot of data can be processed.

Also QGIS can go pretty far with a nice interface.