Right-sizing Our Data Pipeline to Detect Polluters

How does SkyTruth’s new project Cerulean reduce the time and cost of processing enormous volumes of satellite information?

Project Cerulean is SkyTruth’s systematic endeavor to curb the widespread practice of bilge dumping, in which moving vessels empty their oily wastewater directly into the ocean. (We recently highlighted the scope and scale of the problem in this blog series.) Our goal is to stop oil pollution at sea, and this particular project aims to do that by automating what has historically been a laborious manual process for our team. SkyTruthers have spent days scrolling through satellite radar imagery, trying to identify the telltale black streaks of an oil slick stretching for dozens of kilometers across the sea’s surface. We are finally able to do this automatically thanks to recent developments in the field of machine learning (ML). Machine learning “teaches” computers to identify certain features, such as bilge slicks. Doing so efficiently, however, requires a lot of computation, which costs both time and money. In this article, we explore a few tricks that allow us to reduce that load without sacrificing accuracy.

The sheer volume of data being collected by the thousands of satellites currently in orbit can be overwhelming, and over time, those numbers will only continue to increase. However, SkyTruth’s bilge dump detection relies primarily on one particular pair of satellites called Sentinel-1, whose data are made available to the public by the European Space Agency’s Copernicus program. Sentinel-1 satellites beam radar energy down at the surface and gather the signal that bounces back up, compiling that information into bird’s-eye images of Earth. To get a sense of how much data is being collected, Figure 1 shows a composite of all the footprints of these images from a single day of collecting. If you spent 60 seconds looking at each image, it would take you 21 hours to comb through them all. Fortunately, the repetitive nature of the task makes it ripe for automation via machine learning.

Figure 1. One day’s worth of radar imagery collected by Sentinel-1 satellites. Each blue quadrilateral represents the location of a single image. You can see the diagonal swath that the satellites cut across the equator. (Note the scenes near the poles are distorted on this map because it uses the Mercator projection.) Image compiled by Jona Raphael, SkyTruth.

Figure 2 illustrates a typical satellite radar image: just one of those blue polygons above. These images are so big — this one captures 50,000 square kilometers (more than 19,000 square miles) — that it can be tough to spot the thin oil slicks that appear as a slightly blacker patch on the dark sea surface. In this image, if you look closely, you can see a slick just south of the bright white landmass. (It’s a bit clearer in the zoomed-in detail that follows.)

Figure 2. Satellite imagery, from January 1, 2019, capturing a difficult-to-see bilge dump just south of the tip of Papua New Guinea (Copernicus Sentinel data 2019).

Contrary to intuition, you are not seeing any clouds in this picture. Radar consists of electromagnetic radiation with very long wavelengths, so it travels from satellites through the atmosphere largely undisturbed until it hits a surface on the Earth. Once it hits, it is either absorbed by the surface or reflected back into space (much like bats using echolocation). The lightest section of the image in the top right corner is part of a mountainous island. This jagged terrain scatters the radar diffusely in all directions and reflects much of it back to the satellite. The rest of the image is much darker, and shows us the ocean where less of the radar energy bounces back to the satellite receiver. The muddled, medium-gray area along the bottom of the image shows us where strong gusty winds blowing across the ocean surface have made the water choppy and less mirror-like. Figure 3 shows us more clearly the oil slick just offshore. 


Figure 3: Detail from Sentinel-1 radar image shown above. The narrow oil slick identified by a dark gray streak along the bottom of this image is roughly 60 kilometers (40 miles) long, and only 15 kilometers (roughly nine miles) offshore. (Contains modified Copernicus Sentinel data 2019).

Although each image covers a large area, we need to process many images each day to monitor the entire Earth. How many? Here’s an approximation:

  • 510,000,000 square kilometers = the total surface area of the Earth
  • 90,000,000 square kilometers = the total area of images captured in one day (about 1,300 scenes)

This means that we expect the whole Earth to be imaged somewhere between every six and 12 days (because many of the images overlap each other), or roughly 10,000 images per complete pass.
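The arithmetic above is simple enough to sanity-check in a few lines (all values are the approximate figures quoted in the text):

```python
# Back-of-envelope check of the coverage numbers above.
EARTH_SURFACE_KM2 = 510_000_000
DAILY_CAPTURE_KM2 = 90_000_000    # ~1,300 scenes per day

# With zero overlap between scenes, imaging the whole Earth would take:
days_if_no_overlap = EARTH_SURFACE_KM2 / DAILY_CAPTURE_KM2
print(round(days_if_no_overlap, 1))   # → 5.7

# Because many scenes overlap, the real revisit window stretches to
# roughly 6-12 days, i.e. on the order of 10,000 images per full pass.
```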

Every year more satellite constellations are launched. If a new constellation were to capture the whole Earth’s surface in a single day, we would need to spend an order of magnitude more processing time to ingest its imagery. We care about this because each image scanned costs time, money, and computational power. To allocate resources for automation appropriately, it’s critical to understand the scale of the data. For now, we can size our processing pipeline to the current volume, but we must take measures to ensure the system can scale to the increasing number of satellite images we anticipate in coming years.

So does that mean we need to look at 1,300 images every day? Thankfully not. We have a few techniques that we’ll use to make the computations manageable:

  • First off, we’ve found that radar satellite images near the poles are typically captured using a particular configuration called HH polarization — great for mapping sea ice, but not ideal for detecting oil slicks. And the presence of the sea ice itself makes oil slick detection difficult. If we remove those images from the 1,300, we have about 880 that suit our needs (using VV polarization).
  • Next, we won’t be looking for oil slicks on land, so we can further filter out any images that don’t have at least some ocean in them. That reduces our set of images to about 500. (Note: Sentinel-1 coverage of the open ocean is generally poor, as discussed in our previous blog post, but we anticipate future radar constellations will fill that gap.)
  • Those 500 images represent roughly 25,000,000 square kilometers of area, but we can reduce that even further by finding images that straddle the shoreline and eliminating any pixels in them that we know belong to land. That drops the total area by another 40% to 15,000,000 square kilometers.
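The first two filters amount to a simple pass over scene metadata before any pixels are downloaded. Here is a minimal sketch; the field names and the use of plain bounding boxes for footprints are illustrative stand-ins, not the actual Sentinel-1 metadata schema:

```python
# Hypothetical sketch of the daily scene-filtering step. Real footprints
# are polygons; boxes are (min_lon, min_lat, max_lon, max_lat) here.

def boxes_intersect(a, b):
    """Axis-aligned bounding-box overlap test."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def filter_scenes(scenes, ocean_boxes):
    """Keep VV-polarized scenes whose footprint touches some ocean."""
    kept = []
    for scene in scenes:
        if scene["polarization"] != "VV":   # HH scenes: sea-ice mapping
            continue
        if not any(boxes_intersect(scene["footprint"], ob)
                   for ob in ocean_boxes):
            continue                        # land-only scene
        kept.append(scene)
    return kept

scenes = [
    {"polarization": "HH", "footprint": (0, 70, 3, 73)},       # polar, HH
    {"polarization": "VV", "footprint": (10, 40, 13, 43)},     # inland
    {"polarization": "VV", "footprint": (150, -10, 153, -7)},  # coastal
]
ocean = [(140, -30, 180, 10)]
print(len(filter_scenes(scenes, ocean)))  # → 1
```

The third filter (masking land pixels within coastal scenes) happens later, at the pixel level, once an image is actually loaded.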

So at this point, we’ve reduced our load to about 17% of the data that our satellite source is actively sending each day. Figure 4 illustrates the effect that these filters have on our data load:

Figure 4. [Gallery] Filtering images. a) All Sentinel-1 images in one day. b) Filtered by polarization. c) Filtered by intersection with the ocean. Images compiled by Jona Raphael, SkyTruth.

Can we do better? Let’s hope so — remember that each of those 500 images is almost a gigabyte of data. That’s like filling up the storage of a brand-new Apple laptop every day: an expensive proposition. Here are some ways that we make it easier to process that much information:

  • First, we don’t need to store all that data at SkyTruth. Just as you don’t need to store all of Wikipedia to read one article (you load a page, read it, and close the window before opening another), we can load one image at a time and release it from memory before opening the next. That means if we’re clever, we never have to load all 500 images at once.
  • Each image is originally created as 32-bit, but we can easily convert it to 16-bit, thereby halving the data size. You can think of the bit-depth as the number of digits after the decimal place: Instead of storing the value ‘1.2345678’, we would store ‘1.235’, which is almost the same value, but takes a lot less effort to store in active memory.
  • We can further reduce the amount of required memory by reducing the resolution of the image. We find it works just fine to reduce each dimension (height and width) by a factor of eight each, which means the number of pixels actually decreases by a factor of 64. This has the effect of averaging 64 small pixels into one large one.
  • Finally, we don’t need to process the whole image at once. Just as it was easier to spot the oil slick in the smaller, zoomed-in portion of the satellite image above, we can divide each picture into 32 smaller square ‘chips’.
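The last three steps can be sketched with NumPy in a few lines (the array shape and chip size here are illustrative, not the real Sentinel-1 scene dimensions):

```python
import numpy as np

# Stand-in for one radar scene: real scenes are far larger and not square.
img = np.random.rand(2048, 2048).astype(np.float32)

# 1. Halve the bit depth: 32-bit floats -> 16-bit floats.
img16 = img.astype(np.float16)

# 2. Reduce resolution 8x per axis by averaging 8x8 pixel blocks,
#    shrinking the pixel count by a factor of 64.
h, w = img16.shape
small = img16.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))

# 3. Cut the result into smaller square "chips" processed independently.
chip = 64                                   # chip edge length in pixels
chips = [small[r:r + chip, c:c + chip]
         for r in range(0, small.shape[0], chip)
         for c in range(0, small.shape[1], chip)]

print(small.shape, len(chips))   # → (256, 256) 16
```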

Taken together, we can now work on single chips that are only about 250 kilobytes instead of 1-gigabyte images, effectively a 99.975% reduction in memory load. That translates directly into speeding up the two core parts of machine learning: training the model and making predictions. Because training an ML model requires performing complex mathematics on thousands of examples, this speedup can be the difference between training for 10 minutes and training for 20 hours.
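A few lines of arithmetic confirm those figures:

```python
# Stacking the three reductions on a ~1 GB scene should land near the
# 250-kilobyte chips quoted above (sizes are approximate).
scene_bytes = 1_000_000_000          # ~1 GB per full-resolution image
after_16bit = scene_bytes / 2        # 32-bit -> 16-bit
after_downsample = after_16bit / 64  # 8x fewer pixels along each axis
chip_bytes = after_downsample / 32   # each image split into 32 chips

print(round(chip_bytes / 1_000))                 # → 244 (kilobytes per chip)
print(round(1 - chip_bytes / scene_bytes, 5))    # → 0.99976
```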

But, wait! What about the step in which we threw away data by averaging 64 pixels into one large pixel? Is that the same as losing 63 pieces of information and saving only one? The answer hinges on the difference between data and information, and on what a machine is actually capable of learning. Generally speaking, we want resolution to be as low as possible while retaining the critical information in the image: the lower the resolution, the faster our model can train and make predictions. To figure out how much resolution is necessary, a convenient rule of thumb is to ask whether a human expert could still make an accurate assessment at the lower resolution. Let’s take a quick look at an example. Each of the following images in Figure 5 has one-fourth as many pixels as the previous one:

Figure 5: [Gallery]  Reducing resolution by factor of 4: 512×512, 256×256, 128×128, 64×64, 32×32. (Contains modified Copernicus Sentinel data 2019).

If your objective is to find all the pixels that represent oil, you could easily do so in the first two images (zooming in is allowed). The next two are a bit harder, and the final one is impossible. Note that the fourth image is a factor of 64, or about 98.5%, smaller than the first, but the oil is still visible. If we think critically about it, we realize that the total number of pixels in a picture is irrelevant; what actually matters is that the pixels be much smaller than the object you are identifying. In the first image, the oil slick is many pixels wide, so it carries sufficient information for an ML model to learn from. In the last image, however, the pixels are so large that the tiny slick all but disappears in the averaging. If, instead, your objective is to find all the pixels that represent land, then the final image (32×32) is more than sufficient and could probably be reduced even further.
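This rule of thumb is easy to demonstrate with a toy example: a synthetic “sea” of uniform brightness crossed by a narrow dark “slick”. The streak keeps its full contrast while the averaging blocks stay smaller than it is, then washes out as the blocks outgrow it:

```python
import numpy as np

def downsample(img, f):
    """Average f x f pixel blocks into single pixels."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

sea = np.full((64, 64), 0.5)   # uniform 0.5 "sea" backscatter
sea[30:32, :] = 0.0            # a dark "slick" just 2 pixels wide

for f in (2, 8, 32):
    darkest = float(downsample(sea, f).min())
    print(f, round(darkest, 3))
# → 2 0.0      (slick still fully dark)
# → 8 0.375    (fading toward the background)
# → 32 0.469   (nearly indistinguishable from 0.5)
```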

However, reducing resolution also comes with downsides. For instance, the reduction in visual information means there are fewer context clues, and therefore less certainty, about which pixels are oil and which are not. This results in more pixels being labeled with the wrong prediction, and a corresponding reduction in accuracy. Furthermore, even if it were possible to perfectly label all of the pixels that are oil, the pixels are so big that it’s difficult to get a good sense of the exact shape, outline, or path of the oil itself, which lowers the quality of estimates derived from this work (for instance, estimates of the volume of oil spilled).

If only we could use the resolution of the first image somehow… 

Good news — we can! It turns out that the first image above, the 512×512, already had its resolution reduced from the original satellite image by a factor of 64. To give you a sense, Figure 6 shows the original-resolution satellite image side by side with the version we use in training our ML model. The second image has lower resolution, but because the pixels are still much smaller than the features we care about, we are able to avoid most of the downsides described above. The oil spill is still unambiguously identifiable, so we are willing to trade the resolution for gains in training and prediction. This is a subjective tradeoff that each machine learning project must make based on the resources available.

Figure 6:  Original satellite image at full resolution (left), compared to reduced resolution used for ML prediction (right). (Contains modified Copernicus Sentinel data 2019).

So what are the takeaways from this exploration? We first learned that it takes a lot of imagery to monitor the surface of the Earth. However, by thoughtfully excluding imagery that doesn’t capture the information we care about, we can significantly reduce the volume of data we need to process. Then we discussed a few tricks that make it much faster to train and predict on that pared-down dataset. The most important technique is to reduce the resolution as much as possible while keeping the pixels substantially smaller than any features we are attempting to identify. All told, there is still a lot of data to process, but if we can get the total processing time for each satellite image under three minutes, the whole pipeline can run on a single computer. Right now we are down to about 10 minutes per image, and we still have a few tricks up our sleeve. So we’re hopeful that in the coming months we’ll hit our target and have a robust, scalable mechanism for identifying oil slicks automatically.

Until then, we continue to tip our hats to the SkyTruthers who regularly identify and expose these egregious acts, and we look forward to the day that the illegal practice of bilge dumping comes to an end.