How does Cerulean work?
Potential oil slick identification
Currently, potential oil slicks are detected using a U-Net model trained on a dataset of expert-labeled oil slicks and applied to a single data layer:
The VV polarization of Sentinel-1 GRD data from the European Space Agency’s Copernicus satellite program, accessed via the Registry of Open Data on Amazon Web Services (AWS). Only Sentinel-1 scenes that intersect with the ocean are processed by our model.
Prior to inference, the Sentinel-1 VV polarization imagery is resampled to 80-meter resolution and split into overlapping 512×512 tiles, which are then processed by a ResNet34-based U-Net model. Predictions from overlapping tiles are merged by averaging the model’s confidence scores wherever tiles overlap.
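The tiling and merging step can be sketched in plain Python as follows. This is a minimal illustration, not Cerulean’s code: the stride (and therefore the amount of overlap), the edge handling, and the names `tile_origins` and `merge_tiles` are assumptions, and a small tile size stands in for 512 to keep the example readable.

```python
def tile_origins(length, tile, stride):
    """Start offsets of tiles of size `tile` covering [0, length)."""
    if length < tile:
        raise ValueError("this sketch assumes the image is at least one tile wide")
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        origins.append(length - tile)  # extra tile flush with the far edge
    return origins


def merge_tiles(height, width, tiles, tile):
    """Average per-pixel confidence over every tile covering that pixel.

    `tiles` is a list of (row, col, scores); `scores` is a tile x tile
    nested list of model confidences in [0, 1].
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for row, col, scores in tiles:
        for i in range(tile):
            for j in range(tile):
                total[row + i][col + j] += scores[i][j]
                count[row + i][col + j] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Two 4x4 tiles on a 4x6 image, overlapping in columns 2 and 3.
confident = [[1.0] * 4 for _ in range(4)]
empty = [[0.0] * 4 for _ in range(4)]
merged = merge_tiles(4, 6, [(0, 0, confident), (0, 2, empty)], tile=4)
```

In the overlap, the merged confidence is the mean of the two tile predictions; elsewhere it is the single tile’s prediction.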
These merged oil slick rasters are then vectorized into individual polygons, which are inserted into a cloud-based Postgres database. Additional calculations, such as polygon area, length, perimeter, and other geometric properties, are performed within the database. All oil slick detection results are accessible via our online map interface or our OGC-compliant REST API. A graphical overview of our inference architecture can be seen in Figure 1.
Disclaimer: It is not possible to definitively identify oil slicks using synthetic aperture radar (SAR) satellite data alone. All Cerulean-detected oil slicks should be considered potential oil slicks, not definitive oil slicks.
Vessel source association
Vessels near possible oil slicks are identified using vessel position histories from Automatic Identification System (AIS) broadcast systems. For each possible oil slick with a machine confidence > 0.5, we compute a proximity score to identify vessels whose path and timing best match the path and timing of identified oil slicks.
To calculate proximity scores, we download all available AIS data that intersects each Sentinel-1 scene within a time window running from 12 hours before the Sentinel-1 image collection time to 1 hour after it. Then we calculate three metrics to get a combined proximity score (P), as follows. For complete information on methods, please refer directly to the code in the GitHub repository.
Fréchet distance (F). The Fréchet distance is calculated between the path of a vessel and the path of a slick polygon. The path of the vessel is defined by the linear trajectory of its AIS points. The path of the slick polygon is defined by approximating its central trajectory vector. To approximate this vector, we use Voronoi polygonization to split the polygon, compute the centroid of each resulting shape, then fit a spline curve through those centroids to generate a linestring that can be compared to the vessel path using the Fréchet distance.
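As an illustration of the comparison between the two paths, the sketch below computes the discrete Fréchet distance between two polylines using the standard dynamic-programming recurrence. It is an assumption that a discrete variant is appropriate here; the Cerulean repository may use a different formulation. The function name `discrete_frechet` is hypothetical, and coordinates are assumed to be planar (for example, projected meters).

```python
from math import hypot


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines of (x, y) points."""
    if not p or not q:
        raise ValueError("both paths need at least one point")
    n, m = len(p), len(q)
    # ca[i][j] = coupling distance between p[:i + 1] and q[:j + 1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = hypot(p[i][0] - q[j][0], p[i][1] - q[j][1])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


# Hypothetical vessel track and slick centerline, one unit apart.
vessel_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
slick_path = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
distance = discrete_frechet(vessel_path, slick_path)
```

A smaller distance means the vessel track and the slick centerline follow a more similar course, respecting the order of points along each path.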
Spatial overlap (S). We define spatial overlap as the intersecting area between the oil slick and the buffered vessel path. We use conical buffers to account for the fact that slicks drift over time, meaning that older portions of slicks tend to move a greater distance from their point of origin than newer portions. To do this, we build a cone-shaped buffer around each track, like the one illustrated in the accompanying figure: narrower and more heavily weighted at AIS positions closer to the time of Sentinel-1 image capture, and wider and less heavily weighted at older AIS positions.
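The shape of such a cone can be sketched by assigning each AIS position a buffer radius that grows with its age and a weight that shrinks with it. Everything numeric here is a hypothetical placeholder (the minimum radius, the growth rate, and the linear forms of both functions); only the 12-hour lookback comes from the text. For simplicity, the sketch ignores AIS positions recorded after image capture.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical parameters, chosen only for illustration.
MIN_RADIUS_M = 500.0       # buffer radius at image capture time
GROWTH_M_PER_HOUR = 250.0  # how quickly the buffer widens with age
LOOKBACK_HOURS = 12.0      # from the text: window starts 12 h before capture


def cone_profile(ais_times, scene_time):
    """Return (radius_m, weight) for each AIS fix at or before capture."""
    profile = []
    for t in ais_times:
        age_h = (scene_time - t).total_seconds() / 3600.0
        if not 0.0 <= age_h <= LOOKBACK_HOURS:
            continue  # outside the part of the window this sketch covers
        radius = MIN_RADIUS_M + GROWTH_M_PER_HOUR * age_h
        weight = 1.0 - age_h / LOOKBACK_HOURS  # 1 at capture, 0 at 12 h
        profile.append((radius, weight))
    return profile


scene_time = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fixes = [
    scene_time - timedelta(hours=6),   # older fix: wide, lightly weighted
    scene_time,                        # fix at capture: narrow, full weight
    scene_time - timedelta(hours=13),  # too old: dropped
]
profile = cone_profile(fixes, scene_time)
```

Buffering each track segment by these radii and intersecting the result with the slick polygon would then give a weighted overlap area; that geometric step is left out of the sketch.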
Infrastructure source association
We compute the proximity of oil slicks to nearby pieces of offshore oil and gas infrastructure, such as oil platforms. Infrastructure locations come from the SAR Fixed Infrastructure dataset, which is made available through the Global Fishing Watch API.
Cerulean detections are automatically attributed to potential infrastructure sources using an algorithm which calculates the probability that a nearby point is the terminus or origin of the slick. The algorithm finds points along the perimeter of the slick which are far enough from the centerpoint to be considered a potential terminus. It then applies a distance decay to assign higher probabilities to points closer to the potential terminus, and lower probabilities to points which are further away. This algorithm is run on points obtained from the SAR Fixed Infrastructure Dataset to identify and assign probabilities to the most likely culprits.
Essentially, the algorithm is designed to reveal known infrastructure located near the terminus of an oil slick, indicating that oil could be flowing from it. For more detailed information on this algorithm, please see the corresponding code on our GitHub page.
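The two steps described above (selecting candidate termini on the slick perimeter, then scoring infrastructure by distance decay) can be sketched as follows. This is not the implementation in the repository: the 0.8 cutoff fraction, the exponential decay form, the 1000 m decay scale, and the function names are all assumptions made for illustration, and coordinates are assumed to be planar meters.

```python
from math import exp, hypot

DECAY_SCALE_M = 1000.0  # hypothetical e-folding distance for the decay


def candidate_termini(perimeter, center, min_fraction=0.8):
    """Perimeter points far enough from the center to count as a terminus."""
    cx, cy = center
    dists = [hypot(x - cx, y - cy) for x, y in perimeter]
    cutoff = min_fraction * max(dists)  # hypothetical "far enough" rule
    return [pt for pt, d in zip(perimeter, dists) if d >= cutoff]


def infrastructure_scores(termini, infrastructure):
    """Score each infrastructure point by distance to its nearest terminus."""
    scores = {}
    for name, (x, y) in infrastructure.items():
        nearest = min(hypot(x - tx, y - ty) for tx, ty in termini)
        scores[name] = exp(-nearest / DECAY_SCALE_M)  # closer scores higher
    return scores


# A long, thin slick centered on the origin, with two hypothetical platforms.
perimeter = [(-5000.0, 0.0), (0.0, 100.0), (5000.0, 0.0), (0.0, -100.0)]
termini = candidate_termini(perimeter, (0.0, 0.0))
scores = infrastructure_scores(
    termini,
    {"platform_a": (5000.0, 0.0), "platform_b": (0.0, 3000.0)},
)
```

Here `platform_a` sits exactly on one end of the slick and receives the highest score, while `platform_b`, located off to the side of the slick’s midpoint, scores close to zero.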
Sources of data used by Cerulean
As of 2025-01-01
Data | Purpose | Source |
---|---|---|
Sentinel-1 GRD | Oil slick detection training and inference | European Space Agency, via Registry of Open Data on AWS |
Offshore infrastructure locations | Assessing proximity of potential sources and slicks | Global Fishing Watch |
AIS vessel location data | Assessing proximity of potential sources and slicks | Spire Maritime, via Global Fishing Watch |
Marine Protected Areas | Map display and slick intersection | Protected Planet, World Database on Protected Areas |
Exclusive Economic Zone boundaries | Map display and slick intersection | Marineregions.org |
Ocean current and wind | Map display | NOAA GFS 0.25-degree data, via WeatherLayers |
Bathymetry | Map display | Mapbox |
Global political boundaries | Map display | Mapbox |