Leveling Up Cerulean’s Ability to Reveal Stationary Polluters
Recent advancements in Cerulean’s infrastructure association algorithms are bringing us closer than ever to accurately identifying stationary polluters at sea.
A primary goal of SkyTruth’s global oil pollution detection tool, Cerulean, is to automatically attribute oil slicks to potential polluters. Cerulean is designed not only to find oil slicks in satellite imagery but also to uncover who, and what, appears to be the source of that oil. Thanks to our partnership with Global Fishing Watch, we’ve historically been able to incorporate vessel positions from Automatic Identification System (AIS) broadcasts and tie Cerulean slick detections to nearby vessel tracks. This AIS data has enabled us to design a robust algorithm that automatically reveals potential polluting vessels.
Lately, however, there has been growing interest in monitoring chronic pollution from stationary oil infrastructure, as well as in the potential impacts of expanding oil and gas operations at sea. New infrastructure is coming online as deepwater oil and gas exploration moves farther offshore, driven in part by the depletion of wells closer to shore.
Because many of these stationary structures do not broadcast AIS signals or produce AIS tracks, we’ve developed a separate algorithm to automatically identify offshore infrastructure as a potential pollution source.
Until recently, our vessel association algorithm outperformed its infrastructure counterpart at automatically attributing oil slicks to a likely source. Our recent redesign significantly improves the accuracy of infrastructure attributions, paving the way for their use in future SkyTruth analyses and deepening our shared understanding of the environmental threat posed by offshore oil facilities.
Enhancing Cerulean’s Infrastructure Association

Figure 1: Sentinel-1 radar oil slick detection with an automatic source association, captured from the Cerulean user interface.
How we improved our infrastructure association algorithm:
Fresh Data Sources
As the offshore oil industry grows, our first step was to upgrade how we source infrastructure data. Previously, we used a circa-2023 global map of offshore infrastructure created in collaboration with our partners at Global Fishing Watch, and this dataset had to be maintained manually to stay up to date in Cerulean. We now use the Global Fishing Watch API to access SAR Fixed Infrastructure locations, so our data is refreshed each month as new infrastructure is detected. This also means our source attribution will improve as Global Fishing Watch continues to advance the quality and breadth of its dataset.
Smarter Processing of Oil Slicks
Most significantly, we developed a new method to process slick geometries and assign proximity scores to nearby infrastructure points. Previously, scores were assigned more simply, based on each point’s distance to the outer perimeter of the slick geometry: points closer to the boundary received higher scores than those farther away. The result was a relatively even distribution of scores along the edges of the slick (Figure 2a). This approach often led to slicks being falsely attributed to infrastructure near the middle of the slick, where the oil had drifted past but clearly had not originated. The risk of such misattributions was higher in regions with large, dense clusters of infrastructure.
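To make the baseline’s weakness concrete, here is a minimal Python sketch of perimeter-distance scoring. The post does not specify the scoring function, so the exponential fall-off, the `scale` parameter, and all function names are hypothetical, and coordinates are treated as planar units rather than geographic ones.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_perimeter(p: Point, ring: List[Point]) -> float:
    """Minimum distance from p to a closed ring given as vertices (first not repeated)."""
    n = len(ring)
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % n]) for i in range(n))


def perimeter_scores(points: List[Point], ring: List[Point], scale: float = 1.0) -> List[float]:
    """Baseline-style scoring: exponential decay with distance to the nearest edge.

    The exponential form and 'scale' are assumptions for illustration only.
    """
    return [math.exp(-distance_to_perimeter(p, ring) / scale) for p in points]


# Toy elongated slick (planar units) and two platforms, each 0.5 units from an edge:
# one beside the middle of the long side, one just off the slick's tail end.
slick = [(0.0, 0.0), (10.0, 0.0), (10.0, 1.0), (0.0, 1.0)]
platforms = [(5.0, 1.5), (-0.5, 0.5)]
print(perimeter_scores(platforms, slick))  # both platforms receive the same score
```

Because both platforms sit the same distance from an edge, this style of scoring cannot tell the mid-slick platform apart from the one at the tail, which is the failure mode described above.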
The redesigned algorithm takes advantage of observations about the shape of infrastructure oil slicks to apply a more conservative distribution of scores: attributable infrastructure typically appears at the tail ends of the slick, where the oil appears to flow out. The algorithm finds points along the slick’s perimeter that are far enough from its center point to be considered potential points of origin, then distributes scores to nearby infrastructure based on proximity to these points.

Figures 2a and 2b: Visual comparison of the old and new methods for assigning proximity scores to infrastructure near oil slicks. Figure 2a (left) shows the previous method, where scores were distributed evenly along the slick’s perimeter based on proximity, which could lead to false attributions in areas with dense infrastructure clusters. Figure 2b (right) shows the new approach, where scores are concentrated at the tail ends of the polygon. This focuses attention on infrastructure near the slick’s origin and, in turn, produces more accurate pollution source attributions.
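The tail-end approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cerulean’s implementation: the vertex-mean center, the `frac` cutoff for what counts as far enough from the center, and the exponential decay with a `length_scale` are all hypothetical choices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tail_points(ring: List[Point], frac: float = 0.9) -> List[Point]:
    """Perimeter vertices far enough from the slick's center to be candidate origins.

    Assumptions: the center is the plain mean of the vertices (not the area
    centroid), and 'frac' keeps vertices at least frac * (largest vertex distance)
    from that center. Both choices are hypothetical.
    """
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    dists = [math.hypot(x - cx, y - cy) for x, y in ring]
    cutoff = frac * max(dists)
    return [v for v, d in zip(ring, dists) if d >= cutoff]


def tail_scores(points: List[Point], ring: List[Point], decay: float = 1.0,
                length_scale: float = 1.0, frac: float = 0.9) -> List[float]:
    """Score each point by exponential decay with distance to its nearest tail point."""
    tails = tail_points(ring, frac)
    scores = []
    for px, py in points:
        d = min(math.hypot(px - tx, py - ty) for tx, ty in tails)
        scores.append(math.exp(-decay * d / length_scale))
    return scores


# A toy elongated slick with extra vertices midway along the long sides.
slick = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (10.0, 1.0), (5.0, 1.0), (0.0, 1.0)]
platforms = [(5.0, 1.5), (-0.5, 0.5)]  # beside the middle vs. just off the tail
print(tail_points(slick))             # the four corners; the midpoints are too central
print(tail_scores(platforms, slick))  # the platform off the tail scores far higher
```

On this toy slick, the platform beside the middle of the long edge gets almost no score, while the one just off the tail end dominates. A real implementation would also need to handle geographic coordinates and irregular slick outlines.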
Data-Driven Refinement
Another key improvement in the new algorithm’s design is our data-driven approach to selecting and optimizing parameters. The algorithm includes adjustable parameters that influence how scores are distributed. One important parameter, decay, controls how quickly scores fall off as the distance from a point of interest increases, and therefore how widely they spread across potential sources. If the decay is too weak, distant infrastructure can be unnecessarily included in the score distribution. Finding the right decay value promised modest gains in attribution accuracy and would help us choose the most effective version of our algorithm.
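To see why the decay parameter matters, the sketch below assumes scores take the form exp(-decay * distance) and are normalized into shares across candidates. Cerulean’s actual functional form is not stated in this post, so this only shows the qualitative trend, and the distances are made up.

```python
import math
from typing import List


def score_shares(distances: List[float], decay: float) -> List[float]:
    """Exponential-decay weights exp(-decay * d), normalized to sum to 1 (assumed form)."""
    weights = [math.exp(-decay * d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Toy distances (arbitrary units) from a candidate origin point to four platforms.
distances = [0.1, 0.5, 2.0, 3.0]
for decay in (0.5, 1.0, 4.0, 8.0):
    shares = score_shares(distances, decay)
    print(f"decay={decay}: " + ", ".join(f"{s:.2f}" for s in shares))
```

With a small decay, the two distant platforms still capture a noticeable share of the total; as decay grows, the share concentrates on the nearest platform.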
Lacking a strong intuition for the right value, we turned to real data. We collected a set of 61 slick geometries along with nearby infrastructure points from the SAR Fixed Infrastructure dataset. For each slick, we manually recorded which of the nearby points was the verified source. This ground truth data drove our evaluation of the algorithm’s performance at different decay rates.
We assessed the algorithm’s performance with metrics such as the top-1 source rate and the top-3 source rate. The top-1 source rate measures how often the true source is the highest-scoring nearby infrastructure, while the top-3 source rate measures how often the true source appears among the three highest-scoring infrastructure locations. We also monitored the difference in average scores between source and non-source infrastructure: a larger difference indicates greater separability, making it easier to distinguish true sources from non-sources.
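The two ranking metrics and the separability measure are straightforward to compute from labeled cases. In this sketch, each case is a hypothetical mapping from candidate infrastructure IDs to scores, plus the ID of the verified source; the three cases are toy values, not the 61 slicks from our evaluation. Tied scores keep their insertion order, an arbitrary choice.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One labeled case: scores by candidate ID, and the ID of the verified source.
Case = Tuple[Dict[str, float], str]


def top_k_rate(cases: List[Case], k: int) -> float:
    """Fraction of cases whose verified source is among the k highest-scoring candidates."""
    hits = 0
    for scores, true_id in cases:
        ranked = sorted(scores, key=scores.get, reverse=True)  # IDs, best first
        if true_id in ranked[:k]:
            hits += 1
    return hits / len(cases)


def separability(cases: List[Case]) -> float:
    """Mean score of verified sources minus mean score of all other candidates."""
    source = [scores[true_id] for scores, true_id in cases]
    others = [s for scores, true_id in cases for cid, s in scores.items() if cid != true_id]
    return mean(source) - mean(others)


cases: List[Case] = [
    ({"A": 0.9, "B": 0.3, "C": 0.1}, "A"),             # true source ranked first
    ({"A": 0.2, "B": 0.6, "C": 0.5}, "C"),             # ranked second
    ({"A": 0.4, "B": 0.7, "C": 0.1, "D": 0.05}, "D"),  # ranked fourth
]
print(top_k_rate(cases, 1), top_k_rate(cases, 3), separability(cases))
```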

Figure 3: Charts showing how the decay rate affects the algorithm’s performance in source attribution. The left plot shows the difference in average scores between true sources and non-sources, with the highest separability at a decay rate of 4.0. The middle plot tracks the top-1 source rate, showing that the algorithm most often identifies the true source as the highest-scoring infrastructure at this same decay rate. Similarly, the right plot shows that the top-3 source rate also peaks at a decay of 4.0.
Based on this analysis, we selected a decay rate of 4.0 as the optimal choice for the algorithm.
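A decay sweep like the one behind Figure 3 could look like the sketch below. It assumes, hypothetically, that each candidate’s score sums exponential-decay contributions from several tail points and is then normalized within each slick. Under that assumption the decay value can change which candidate ranks first; with a single distance per candidate, the ranking would not depend on decay at all. The two cases are toy data.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# A toy case: each candidate's distances to the slick's tail points, plus the true source ID.
Case = Tuple[Dict[str, List[float]], str]


def case_scores(dists: Dict[str, List[float]], decay: float) -> Dict[str, float]:
    """Hypothetical scoring: sum exp(-decay * d) over tail points, normalized per slick."""
    raw = {cid: sum(math.exp(-decay * d) for d in ds) for cid, ds in dists.items()}
    total = sum(raw.values())
    return {cid: r / total for cid, r in raw.items()}


def evaluate(cases: List[Case], decay: float) -> Tuple[float, float]:
    """Return (top-1 source rate, mean source score minus mean non-source score)."""
    top1_hits = 0
    source_scores: List[float] = []
    other_scores: List[float] = []
    for dists, true_id in cases:
        scores = case_scores(dists, decay)
        if max(scores, key=scores.get) == true_id:
            top1_hits += 1
        source_scores.append(scores[true_id])
        other_scores.extend(s for cid, s in scores.items() if cid != true_id)
    return top1_hits / len(cases), mean(source_scores) - mean(other_scores)


def sweep(cases: List[Case], decays: Sequence[float]):
    """Evaluate each decay; pick the best by top-1 rate, then by separability."""
    results = {d: evaluate(cases, d) for d in decays}
    best = max(decays, key=lambda d: results[d])  # tuples compare element by element
    return best, results


cases: List[Case] = [
    # P1 hugs one tail; P2 sits mid-slick, moderately close to both tails.
    ({"P1": [0.2, 6.0], "P2": [1.0, 1.0]}, "P1"),
    ({"Q1": [0.3, 5.0], "Q2": [0.8, 4.0], "Q3": [2.0, 2.0]}, "Q1"),
]
best, results = sweep(cases, [0.5, 1.0, 2.0, 4.0, 8.0])
for decay, (top1, sep) in results.items():
    print(f"decay={decay}: top-1 rate={top1:.2f}, separability={sep:.3f}")
print("selected decay:", best)
```

In the first toy case, a weak decay lets the mid-slick platform P2 outscore the true source P1 because it collects credit from both tails; a stronger decay reverses that. Ranking by top-1 rate first and separability second is our own assumed tie-breaking order.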
Ground truth data also allowed us to benchmark the best versions of our algorithm against the original, baseline algorithm, giving us insight into the expected performance improvement. In cases where a human expert was able to identify an associated piece of infrastructure, the fine-tuned algorithm assigned the top score to the correct source nearly 90% of the time, compared with around 60% for the previous algorithm, a substantial improvement.
Vessel and Infrastructure Score Collation
With the new algorithm, we are now more confident that infrastructure association performs on par with vessel association. This lets us normalize both types of scores into a single metric, called a collation score. Instead of separate infrastructure and vessel association scores, all nearby sources are now assigned a common collation score, allowing an apples-to-apples comparison to determine whether a vessel or a piece of infrastructure is the most likely source of an oil slick.
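One simple way to put vessel and infrastructure scores on a common footing is a per-type calibration followed by a single ranking. The post does not describe how collation scores are computed, so the `Candidate` type, the linear `(scale, offset)` calibration, and its values below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    source_id: str
    kind: str         # "vessel" or "infrastructure"
    raw_score: float  # output of that kind's own association algorithm


def collate(candidates: List[Candidate],
            calibration: Dict[str, Tuple[float, float]]) -> List[Tuple[Candidate, float]]:
    """Map raw scores onto one shared [0, 1] scale per kind, then rank all sources together."""
    collated = []
    for c in candidates:
        scale, offset = calibration[c.kind]
        score = min(1.0, max(0.0, scale * c.raw_score + offset))  # clamp to [0, 1]
        collated.append((c, score))
    return sorted(collated, key=lambda pair: pair[1], reverse=True)


# Hypothetical calibration values, for illustration only.
calibration = {"vessel": (1.0, 0.0), "infrastructure": (0.8, 0.1)}
candidates = [
    Candidate("V1", "vessel", 0.55),
    Candidate("I1", "infrastructure", 0.70),
    Candidate("V2", "vessel", 0.20),
]
for cand, score in collate(candidates, calibration):
    print(f"{cand.source_id} ({cand.kind}): {score:.2f}")
```

A real calibration would be fit against verified sources; the point here is only that once both kinds share one scale, a single sort answers whether a vessel or a piece of infrastructure is the most likely source.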
Impact on Future Work
We are confident that the newly redesigned infrastructure association algorithm will bring Cerulean one step closer to effective monitoring of infrastructure-related oil pollution in the ocean. Better automatic source associations empower slick investigators to verify the data needed for future analyses more efficiently. Cerulean users can more confidently leverage our data in advocacy initiatives that drive better-informed action on infrastructure-related environmental impacts. In 2025, SkyTruth will apply these foundational tools to inform the public of the true cost of offshore oil and gas extraction and to hold marine polluters accountable.