On Considering The Larger World Around Us: The SkyTruth Intern Experience

Bilge dumping and more allowed Tatianna Evanisko to think big at SkyTruth.

SkyTruth seemed like a great fit. I had always been interested in data, the inductive route, experiencing things firsthand and then exploring my assumptions. I was compelled by computation as well as the natural world. Being active in environmental protection was important to me, and I had always been drawn to vocations with a larger purpose, ones that allowed me to be visionary and have big dreams. Growing up on the tail end of the Millennial generation, I had experienced an explosion of technology, becoming what some have called a “screenager.” Not only that, but I had grown up in the climate change generation. In recent decades we’d seen an increase in extreme weather events, environmental atrocities, and lost species. But notably, my generation also has been active in movements that strive to address these problems, such as eating less meat, using alternative energy, and living more sustainably. Even at a time of climate conspiracies and fake news, several million people globally participated in the largest climate protest in history in 2019. That’s not to say I believe all that I hear. However, over time and by paying attention to environmental events occurring all over the world, I find the evidence overwhelming: the Earth is changing, and more and more people are bearing witness to it.

When I started as an intern at SkyTruth I was asked what issues I cared about to help me decide what to work on. My reaction was: everything of course! How can you ask such a thing? To a certain extent, my options were already defined: most of the SkyTruth staff were using satellite imagery, and despite the other issues we were working on, at some point we were all looking at the ocean (the eerie, non-terrestrial world), often in search of pollution, such as oil. My work quickly became focused on searching for streaks of oil in the middle of vast oceans. Oil can appear on radar satellite imagery as a uniform dark and linear formation, called a slick. Many of these slicks come from cargo vessels and tankers that dump their untreated oily waste from the bottom of their ship (the bilge) into the ocean, an act called bilge dumping. Our team has been developing a solution that expands the capacity of SkyTruth to automate the detection of these slicks by using machine learning, a type of artificial intelligence. In a matter of months, I helped turn an empty spreadsheet into a collection of over 330 images of oil slicks: training data that we could use to “teach” computers to recognize the slicks in our prototype of a monitoring platform named Cerulean. Wow: intelligent and creative minds at work on a tool that will soon enable anyone, anywhere, to monitor the sea for oil slicks with SkyTruth!

Ocean monitoring thus became a routine event for me, and naturally I started to notice some patterns. I learned the locations of energy infrastructure as well as the largest shipping routes and ports, and this meant I also noticed when the environment changed. For example, we found oil appearing repeatedly in the same areas at sea all over the world. On a weekly basis, we discovered obvious oil slicks where the normally smooth grey (on a radar image) ocean was instead splattered with black streaks. I also tracked some of the vessels I believed were responsible for the oily waste and they shared something in common: many were registered under flags different from their country of ownership. I wanted to know more. Why were ships dumping in the same places? What was it about those areas that was attracting them? Did those vessels have something in common? Who was responsible for making the choice to dump pollution: the crew, the vessel operator, or the vessel company? This was the catalyst for my largest project at SkyTruth, a multi-month pursuit to understand the scale, impact, intentions, and potential solutions of the dumping of this untreated oily waste.

Compilation of training data showing the various ways oil slicks appear on radar satellite images.

Bilge dumping isn’t the first environmental issue that people think of when they think of protecting the Earth. In fact, when I started at SkyTruth I had only ever heard of accidental, large-scale oil spills, such as the 2010 BP spill in the Gulf of Mexico, and was unaware that smaller, more frequent and intentional acts of pollution occur. Additionally, there is little information about bilge dumping online. One of the last large-scale reports, published by the National Academies Press, was released in 2003. My quest to know more had to be thorough. I had to read prolifically and search widely in order to piece together the true scale and impact of this issue.

What did I find? I learned how vast the world’s ocean is (encompassing various oceans, composed of bodies of water such as seas and straits) and how little the ocean is regulated (legal authority depends on nearby countries). Promising international treaties don’t necessarily lead to legislation that allows for enforcement or meaningful measures to prosecute polluters. Vessel operators should know that polluting the oceans is wrong, but have little incentive to protect marine waters, especially when penalties are rare. I learned that some vessel operators choose to pollute the ocean (harming coastal birds, dolphins, and coral reefs, adversely impacting human health, hurting the livelihoods of coastal businesses, and leaving beaches stained and tarred), all just to save money.

But my research didn’t just uncover bad news. A lot of stakeholders are interested in initiatives supporting more sustainable seas: not only citizen activists, non-profits, and coastal communities, but also investors and technology providers. Several indexes score vessels on how well they manage waste and emissions, and some international sustainable shipping partnerships have pledged to support and invest in cleaner ships. Additionally, support systems for whistleblowers allow them to share their stories in confidence, so that authorities can punish the operators of vessels that are polluting at sea with large fines and probations. And groups like SkyTruth are out there fighting for a cleaner world. You can access my findings in my series of blog posts here.

Likely bilge dumping events identified by SkyTruth in 2020

In general, my time at SkyTruth taught me how to use powerful technology to solve complex issues and how to use data to tell stories. I was encouraged to ask as many questions as I answered, to differentiate between what was certain and what was just an assumption, to be fair in my reporting (using words such as “likely” or “suspected” instead of assuming blame) and to seek evidence-based truths. I was included on esoteric programming projects that I couldn’t quite understand and was pushed to grow from those challenges; I learned faster this way. I was given the autonomy to do my own investigative research, and was provided a platform to report on, an overwhelming transition from positions I had previously held. The SkyTruth team had confidence in me, and valued my feedback. I will take my experiences from SkyTruth out into my next venture with the same enthusiasm to do work with an important mission.

When I started writing my series on global bilge dumping I was inspired by a quote SkyTruth’s Writer-Editor Amy Mathews introduced to me: “Don’t just share your data, share your awe,” which she attributed to former National Public Radio correspondent Christopher Joyce. True fulfillment comes from making a difference and being motivated by what matters most to us. Nonprofit organizations like SkyTruth have the ability to engage in both local and exotic pursuits, to consider personal stories, and to tackle the challenges of society for reasons beyond mere profit. They think big, really big, and look into the future; this is awe-inspiring work. They pursue concerns we may not know we have, matters that elude us in our day-to-day lives, but that have true impact. Working for a cause you care about is fulfilling. You never have to doubt the importance of your work. It was humbling to be a changemaker in the weekday hours.

As I wrap up my ten months at SkyTruth, every day I still feel a profound sense of how small I am. SkyTruth, through the constant engagement with global imagery, made me recognize the interconnectedness of the world and amplified the numerous opportunities to advocate for change. I’ve learned that the more informed you are, the more you can make good decisions about your life and future. I’ve learned to seek a deeper understanding of issues beyond what appears on the surface. And I’ve learned to question everything, observe the environment, appreciate it, and protect it. 

 

Photo: Tatianna at work during COVID-19 quarantine. Photo credit: Tatianna Evanisko

Protecting Biodiversity and Indigenous Lands from Space

Illegal mining is devastating parts of the Amazon rainforest. SkyTruth is figuring out how to detect new mining threats and alert conservationists on the ground.

The Amazon rainforest is one of the most biologically diverse places on Earth: a breathtaking riot of life that evolved over eons, encompassing the Amazon River and its vast system of tributaries. Those rivers hold more species of fish than any other river system in the world. The surrounding forests are home to 25% of the world’s terrestrial species. Many are found only in the Amazon region, and some are endangered, while others undoubtedly remain unknown. Besides their intrinsic value as unique species, rainforest flora and fauna represent a barely tapped reservoir of genes, chemicals, and more that could benefit humankind. Already, more than 25% of medicines used today trace their roots back to Amazonian species, including quinine and many cancer drugs. How many more remain hidden?

And then there’s the forest’s role in regulating climate: those 1.4 billion acres of trees covering 40% of South America hold a tremendous amount of carbon. If released, that carbon will accelerate climate change and the disruptions we already are seeing on Earth, including rising temperatures, melting glaciers, stronger storms, longer droughts, and more frequent flooding.

Photo: Jaguar by Nickbar from Pixabay.

Tragically, this carbon is in fact being released. For decades, there has been widespread concern about deforestation in the Amazon as logging, mining, agriculture, and human infrastructure penetrate forest boundaries and slash holes in otherwise intact habitat. Today, ever more remote regions are affected, including lands held by indigenous people who depend on the plants and animals of the forest to survive.  As forest life disappears, so too will ancient cultures that have lived sustainably in the forest for centuries, victims of a global economy and expanding population that demands ever more resources.

Before this year, SkyTruth’s work hadn’t focused on the world’s rainforests. Yet the fact that they are remote, dense, and threatened makes them perfect targets for exploring environmental damage from space, and our new partnership with Wildlife Conservation Society (WCS) has pushed SkyTruth to expand its reach in applying its tools to new parts of the world, including the Amazon.

One growing problem in particular has caught our attention: small-scale, artisanal mining for gold in Peru and Brazil along tributaries of the Amazon such as the Inambari River. These aren’t the huge gold mines of the Northern Hemisphere, but rather individual miners or groups of miners who work along the edges of rivers, dredging their banks and beds and using toxic mercury to separate out small flecks of gold. In the process, miners cut down trees and destroy riverside habitat with their dredges, pits, and sluices. Their mercury poisons the water, fish, birds, and people who rely on these rivers. Although it’s called “small-scale,” the actions of an estimated 40,000 miners add up: as of 2018 such mining had destroyed 170,000 acres of virgin forest in southeast Peru alone. It’s illegal there, and in other protected areas throughout the region, yet it often occurs unchecked. Government agencies in the region, and our partners at WCS and other NGOs, have struggled to identify new mining activity in such remote regions; if they don’t know where mining is occurring, they can’t take action to stop it.

Radar satellite imagery from the European Space Agency’s Sentinel-1 satellite can help. This radar penetrates the rainforest’s frequent cloud cover and reveals activities on the ground underneath. Using this imagery, SkyTruth has begun developing an open mapping platform to identify areas on the ground that have been deforested because of mining, and illustrate trends over time to reveal new mining activity. While radar imagery is able to see through clouds, it lacks the spectral data provided by optical (color-infrared) satellite sensors. To compensate for this, our model includes a processing step that cleans and enhances each image. Then, the images are analyzed using a random forest classifier that we’ve trained to identify land cover types, including mining.

You can see the output of our model in Figure 1 for the Madre de Dios region in southern Peru. Areas in red are classified as likely mines, while areas in yellow correspond to cleared forest, those in green are intact forest, and those in blue are water.  

Figure 1. Recent mining in Madre de Dios, Peru.

So far, we’ve successfully detected recent mining operations in the Madre de Dios region, as well as in the lands of the Munduruku tribe in Brazil (shown in Figure 2). The Munduruku have been struggling for years to demarcate their sovereign lands to protect their indigenous culture and stop continued encroachment from mining.

Figure 2. Mining activity in Munduruku land along the Cabruá and Das Tropas Rivers in Brazil’s Pará state.

This past week, SkyTruth submitted its pitch highlighting this progress as a semi-finalist in the Artisanal Mining Challenge, a competition sponsored by Conservation X Labs to address the adverse impacts of artisanal mining around the world. We made the first cut this spring (from 90 applicants down to 26), and are hopeful that the judges will advance our proposed Project Inambari through this next round of the competition, making us one of 10 finalists. That would put us in position to be chosen as one of the winners, and to receive significant funding to scale up this vitally important initiative. We’ll keep you posted.

Alice Foster’s Internship Triggered New Excitement About Her Career Possibilities

Before her internship, Alice felt burnt out at school. After applying new skills and technologies to environmental projects at SkyTruth, she’s looking forward to her remaining classes and a fulfilling career.

As I wrap up my four-month internship at SkyTruth, I would like to share some highlights and takeaways from my experience. During my internship I explored the field of geospatial technology for the first time, which allowed me to learn new skills and gave me insight into my career goals. I learned about global environmental issues that I hadn’t known existed. And I got to work with a kind, dedicated, creative group of people. I contributed to SkyTruth’s mountaintop mining research and Project Inambari, which will create an early alert system for tropical forest mining. I also spent time identifying oil and gas well pads, collecting images of oil slicks, and creating annotated maps in QGIS, a geographic information system application that can be used to analyze and visualize geospatial data such as satellite imagery or a ship’s track across the ocean.

On just my first day of orientation at SkyTruth, the high level of support and guidance I received from the staff surprised me. My advisers Brendan Jarrell and Christian Thomas spent lots of time introducing me to concepts and technologies (like Google Earth Engine and QGIS) that I would use in my work. One of the first skills I learned was recognizing oil slicks on satellite imagery (most likely from vessels dumping oily bilge water at sea) and creating an annotated map to reveal the slicks to the public. Brendan patiently guided me twice through the steps of making a map. The team congratulated me when I found my first slick, even though I did not think it merited attention. This encouragement made me feel welcomed and excited about my work.

The search for oil slicks allowed me to virtually explore oceans and coastlines across the globe. With time, it revealed to me not only how to use geospatial technology, but also how little geography I knew. I would toggle past a country or island and wonder what it was like there, realizing I did not even know its name. And so I started exploring a geography trivia website in my free time to teach myself the countries of the world. I am now learning capital cities in Europe, which I tend to forget.

After getting practice with Google Earth Engine — a tool for analyzing and mapping satellite imagery and change around the world —  during my first couple of weeks at SkyTruth, I became involved in some mining-related projects. In one project, I adapted code from SkyTruth’s mountaintop mining research to incorporate satellite imagery from the European Space Agency’s Sentinel-2 satellite. This imagery provides us additional data, which could improve our ability to detect surface mining throughout Central Appalachia. Working with the code in Earth Engine allowed me to better understand SkyTruth’s process for identifying mines. First, we produce a greenest pixel composite image from a collection of images. Making a composite in Earth Engine means combining multiple overlapping images to create a single image. Images can be combined in different ways; in this case, the greenest pixel composite selects pixels with the highest Normalized Difference Vegetation Index (NDVI) values compared with corresponding pixels in the image collection. NDVI is an indicator of plant health in a given area. To provide a more concrete example, suppose we want to make a greenest pixel composite from three images, all showing a part of West Virginia at different times of summer. Say we look at one pixel in one of the images, which covers a small square of forest. We then compare this pixel with the pixels covering the same bit of forest in the other two images, and we choose the greenest of the three (or, the pixel with the highest NDVI value). If we repeat this process for every pixel in the image, we get one image with all the greenest pixels selected from the collection. 
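The West Virginia example above can be sketched in a few lines of plain Python. This is only an illustration, not SkyTruth's Earth Engine script: the function names and the tiny image layout (a grid of `(red, nir)` reflectance pairs) are assumptions made for the example, and NDVI is computed with its standard definition, (NIR - Red) / (NIR + Red).

```python
def ndvi(red, nir):
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def greenest_pixel_composite(images):
    """Build a composite from images covering the same area.

    Each image is a grid (list of rows) of (red, nir) pairs. For every
    pixel location, keep the pixel with the highest NDVI across all images.
    """
    rows, cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(rows):
        row = []
        for c in range(cols):
            candidates = [image[r][c] for image in images]
            row.append(max(candidates, key=lambda px: ndvi(px[0], px[1])))
        composite.append(row)
    return composite


# Three hypothetical one-row images of the same two pixels, taken at
# different times of summer.
june = [[(0.10, 0.50), (0.20, 0.30)]]
july = [[(0.05, 0.60), (0.25, 0.30)]]
august = [[(0.10, 0.40), (0.10, 0.40)]]
composite = greenest_pixel_composite([june, july, august])
```

Here the first pixel of the composite comes from the July image and the second from the August image, because those are the greenest observations at each location. In Earth Engine itself, a composite like this is commonly built with the image collection's `qualityMosaic` method, though the post does not say which call SkyTruth's script uses.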

A second script uses the greenest pixel composite to approximate the lowest NDVI value for each county, producing a threshold image. Again, say we have the greenest pixel composite of West Virginia that we just made. Now we look at forested areas within one county and find the pixels that are least green, or have the lowest NDVI values, and then take the average of these NDVI values. This is the threshold for that county; if a pixel is less green than the threshold, it is likely a mine. Our output image contains these values for every county. As a final step, we compare the greenest pixels with the NDVI thresholds to determine likely mine areas. 
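As a rough sketch of that thresholding step, assuming we already have a list of NDVI values for the forested pixels in one county: the function names and the `lowest_fraction` parameter are hypothetical, since the post does not say how many of the least-green pixels are averaged.

```python
def county_threshold(forest_ndvi, lowest_fraction=0.05):
    """Average the least-green forest pixels to get a county threshold.

    lowest_fraction is an assumed knob: the share of forest pixels,
    counted from the lowest NDVI upward, that go into the average.
    """
    values = sorted(forest_ndvi)
    count = max(1, int(len(values) * lowest_fraction))
    lowest = values[:count]
    return sum(lowest) / len(lowest)


def likely_mine_pixels(composite_ndvi, threshold):
    """Flag pixels that are less green than the county threshold."""
    return [value < threshold for value in composite_ndvi]


# Hypothetical NDVI values for forest pixels in one county.
forest = [0.80, 0.75, 0.90, 0.70, 0.85, 0.78, 0.82, 0.88, 0.79, 0.72]
threshold = county_threshold(forest, lowest_fraction=0.2)
flags = likely_mine_pixels([0.85, 0.30, 0.75], threshold)
```

With these made-up numbers, the two least-green forest pixels (0.70 and 0.72) set a threshold of about 0.71, so only the pixel with an NDVI of 0.30 is flagged as a likely mine.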

Figure 1. Mining data overlying a Sentinel-2 greenest composite image. The image covers counties in West Virginia, Virginia, and Kentucky.

SkyTruth’s surface mining expert, Christian Thomas, also had me experiment with two different techniques for masking clouds in Sentinel-2 imagery. Clouds obstruct necessary data in images, so clearing them out improves analyses. The standard approach uses a built-in “cloud mask” band. The other approach is an adapted “FMasking” method. This takes advantage of the  arrangement of sensors on Sentinel-2 satellites, which creates a displacement effect in the imagery that is more pronounced for objects at altitude. The FMask uses this effect to distinguish low altitude clouds from human-made infrastructure on land. Though the two methods had similar results, the FMask seemed slightly more accurate.
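To give a feel for the standard approach, here is a minimal sketch of reading a bitmask-style cloud band. It assumes the built-in band is Sentinel-2's QA60 band as described in the Earth Engine data catalog, where bit 10 flags opaque clouds and bit 11 flags cirrus; the post does not name the band, and the function names are made up for illustration.

```python
# Assumption: QA60-style bit flags (bit 10 = opaque cloud, bit 11 = cirrus).
OPAQUE_CLOUD_BIT = 1 << 10
CIRRUS_BIT = 1 << 11


def is_clear(qa_value):
    """True when neither cloud flag is set in the quality value."""
    return (qa_value & OPAQUE_CLOUD_BIT) == 0 and (qa_value & CIRRUS_BIT) == 0


def mask_clouds(pixels, qa_values):
    """Replace cloudy pixels with None so later analysis skips them."""
    return [px if is_clear(qa) else None for px, qa in zip(pixels, qa_values)]


# Four hypothetical pixels: clear, opaque cloud, cirrus, and both flags set.
masked = mask_clouds([0.4, 0.5, 0.6, 0.7], [0, 1024, 2048, 3072])
```

Only the first pixel survives the mask. The adapted FMask method described above is more involved and is not sketched here.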

Working on technical projects like this, I learned how much I enjoy using imagery and geospatial data. I had found analyzing data interesting in the past, but something about being able to visualize the information on a map was even more appealing. I loved how a satellite image could be reduced to numbers and assessed quantitatively, or understood visually, almost as a piece of art. 

In another project, I had the opportunity to develop my writing skills by contributing to an  application for the Artisanal Mining Grand Challenge, a global competition to provide solutions for small-scale, low-tech, and/or informal mining. Researching artisanal gold mining was illuminating, as I knew almost nothing about the subject beforehand. I learned that illegal gold mining in Venezuela and Peru has often involved brutal violence and exploitation. In recent decades, labor and sex trafficking have plagued remote mining regions like Madre de Dios. Small-scale mining practices are also particularly damaging to the biodiverse Amazon ecosystem. To extract a small amount of gold, miners must dig up massive amounts of sediment, denuding the landscape in the process. The use of mercury in artisanal gold mining is incredibly detrimental to water quality and human health.

I was also able to be involved in the technical side of this project, building a tool to detect mines in the Peruvian Amazon. I created a mask that removes water from satellite images so that water areas could not be mistaken for mine areas or vice versa. Mines are often near water or can look like water in imagery. To make the mask, I used the European Commission’s Joint Research Centre global surface water dataset. This dataset contains information about where and when surface water occurred around the world over the past thirty years. In Google Earth Engine, the data is stored in an image with bands representing different measures of surface water. I used the “occurrence,” “seasonality,” and “recurrence” bands to create the mask. “Occurrence” refers to how often water was present at a location; “seasonality” means the number of months during which water was present; and “recurrence” is the frequency with which water returned from one year to the next. I tried to find a combination of band values that would do the best job getting rid of water without masking mines or forest. For example, using an occurrence value of twenty (that is, masking pixels where water was present twenty percent or more of the time) ended up masking mine areas as well. Christian also suggested using a buffer, which meant that pixels adjacent to a masked pixel also got masked. Since the mask often did not capture all of the pixels in a body of water, the buffer filled in the gaps. Masked pixels dotting a river became a continuous thread. The buffer also helped eliminate river banks, which look similar to mines. We applied the finished water mask to the area of interest in Madre de Dios, Peru.
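The threshold-plus-buffer idea can be sketched in plain Python. This is a simplification under stated assumptions: it uses only an occurrence grid (the real mask also combined seasonality and recurrence), the threshold of 50 is arbitrary, and the function names are invented for the example. As in the finished mask, 1 means "keep" and 0 means "water."

```python
def water_mask(occurrence, threshold):
    """Return a grid with 0 where water occurrence meets the threshold, else 1."""
    return [[0 if value >= threshold else 1 for value in row] for row in occurrence]


def buffer_mask(mask, radius=1):
    """Also mask every pixel within `radius` pixels of a masked pixel."""
    rows, cols = len(mask), len(mask[0])
    buffered = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                for nr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for nc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        buffered[nr][nc] = 0
    return buffered


# Hypothetical occurrence percentages: a river crosses the middle row,
# but one of its pixels (column 1) was rarely observed as water.
occurrence = [
    [0, 0, 0, 0, 0],
    [90, 10, 95, 0, 0],
    [0, 0, 0, 0, 0],
]
raw = water_mask(occurrence, threshold=50)
buffered = buffer_mask(raw, radius=1)
```

In the raw mask the river is broken at column 1. After buffering, the gap is filled and the neighboring bank pixels are masked too, which mirrors how the dotted river became a continuous thread.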

Figure 2: Water mask in the Madre de Dios region of Peru. White pixels have value 1, while black pixels (water) have value 0. When the mask is applied to a satellite image, all pixels in the black areas appear transparent and are not included in analyses. When identifying potential mines in the image, the masked areas are ignored.

Researching issues related to artisanal gold mining left me unsure of how countermeasures can fully promote the welfare of mine workers and others involved in the long term. The problem of illegal gold mining seems entrenched in broader economic and social issues and therefore cannot be addressed simply by identifying and eradicating mines. Nevertheless, understanding the great damage that this type of mining can do to humans and their environment made clear to me the importance of the project. 

Not only did working at SkyTruth teach me a variety of technical and professional skills, it also helped reveal to me what I want to learn about and pursue in the future. In school last fall, I felt burnt out to the point that I just wanted to get through my remaining semesters and be done. Now I feel the excitement about academics I had as a freshman, motivated and informed by my experience at SkyTruth. With my interest in geology and climate issues renewed, I feel like there is barely enough time left to take all the classes I want to. I hope to improve on skills like writing and computer programming so that I can contribute my best work in the future. Being part of an amazing team has motivated me in that way. I also know that I would like to use the geospatial technologies and approaches I learned at SkyTruth moving forward. I feel excited about future career possibilities; before my internship, I felt confused.

I want to give a huge thank you to Bruce and Carolyn Thomas for hosting me in Shepherdstown. I want to thank Christian for introducing me to SkyTruth and for including me in his Dungeons and Dragons game! And I want to thank everyone on the SkyTruth team for their guidance and for being wonderful.

Figure 3: Team Hike, Harpers Ferry, West Virginia. Photo by Amy Emert.

Right-sizing Our Data Pipeline to Detect Polluters

How does SkyTruth’s new project Cerulean reduce the time and cost of processing enormous volumes of satellite information?

Project Cerulean is SkyTruth’s systematic endeavor to curb the widespread practice of bilge dumping, in which moving vessels empty their oily wastewater directly into the ocean. (We recently highlighted the scope and scale of the problem in this blog series.) Our goal is to stop oil pollution at sea, and this particular project aims to do that by automating what has historically been a laborious manual process for our team. SkyTruthers have spent days scrolling through satellite radar imagery looking to identify the telltale black streaks of an oil slick stretching for dozens of kilometers on the sea’s surface. We are finally able to do this automatically thanks to recent developments in the field of machine learning (ML). Machine learning “teaches” computers to identify certain traits, such as bilge slicks. To do so efficiently, however, requires a lot of computation that costs both time and money. In this article, we explore a few tricks that allow us to reduce that load without sacrificing accuracy.

The sheer volume of data being collected by the thousands of satellites currently in orbit can be overwhelming, and over time, those numbers will only continue to increase. However, SkyTruth’s bilge dump detection relies primarily on one particular pair of satellites called Sentinel-1, whose data are made available to the public by the European Space Agency’s Copernicus program. Sentinel-1 satellites beam radar energy down at the surface and gather the signal that bounces back up, compiling that information into bird’s-eye images of Earth. To get a sense of how much data is being collected, Figure 1 shows a composite of all the footprints of these images from a single day of collecting. If you spent 60 seconds looking at each image, it would take you 21 hours to comb through them all. Fortunately, the repetitive nature of the task makes it ripe for automation via machine learning.

Figure 1. One day’s worth of radar imagery collected by Sentinel-1 satellites. Each blue quadrilateral represents the location of a single image. You can see the diagonal swath that the satellites cut across the equator. (Note the scenes near the poles are distorted on this map because it uses the Mercator projection.) Image compiled by Jona Raphael, SkyTruth.

Figure 2 illustrates a typical satellite radar image: just one of those blue polygons above. These images are so big (this one captures 50,000 square kilometers, or more than 19,000 square miles) that it can be tough to spot the thin oil slicks that appear as a slightly blacker patch on the dark sea surface. In this image, if you look closely, you can see a slick just south of the bright white landmass. (It’s a bit clearer in the zoomed-in detail that follows.)

Figure 2. Satellite imagery, from January 1, 2019, capturing a difficult-to-see bilge dump just south of the tip of Papua New Guinea (Copernicus Sentinel data 2019).

Contrary to intuition, you are not seeing any clouds in this picture. Radar consists of electromagnetic radiation with very long wavelengths, so it travels from satellites through the atmosphere largely undisturbed until it hits a surface on the Earth. Once it hits, it is either absorbed by the surface or reflected back into space (much like bats using echolocation). The lightest section of the image in the top right corner is part of a mountainous island. This jagged terrain scatters the radar diffusely in all directions and reflects much of it back to the satellite. The rest of the image is much darker, and shows us the ocean where less of the radar energy bounces back to the satellite receiver. The muddled, medium-gray area along the bottom of the image shows us where strong gusty winds blowing across the ocean surface have made the water choppy and less mirror-like. Figure 3 shows us more clearly the oil slick just offshore. 

 

Figure 3: Detail from Sentinel-1 radar image shown above. The narrow oil slick identified by a dark gray streak along the bottom of this image is roughly 60 kilometers (40 miles) long, and only 15 kilometers (roughly nine miles) offshore. (Contains modified Copernicus Sentinel data 2019).

Although each image covers a large area, we need to process many images each day to monitor the entire Earth. How many? Here’s an approximation:

  • 510,000,000 square kilometers  = The total surface of the earth
  • 90,000,000 square kilometers  = Total area of images captured in one day (1,300 scenes)

Dividing the two figures gives just under six days, but because many of the images overlap each other, we expect the whole Earth to be imaged once every six to 12 days, which amounts to roughly 10,000 images.
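For readers who want to check the arithmetic, here is a small sketch using only the figures listed above. The `overlap_factor` parameter is a made-up stand-in for how much extra imaging the overlap between scenes requires; the post gives no exact value for it.

```python
EARTH_SURFACE_KM2 = 510_000_000  # total surface of the Earth
DAILY_IMAGED_KM2 = 90_000_000    # area covered by one day of scenes
SCENES_PER_DAY = 1_300


def days_to_cover(overlap_factor=1.0):
    """Days needed to image the whole surface once.

    overlap_factor is hypothetical: 1.0 means no overlap between scenes,
    2.0 means every spot ends up imaged twice on average.
    """
    return EARTH_SURFACE_KM2 / DAILY_IMAGED_KM2 * overlap_factor


def scenes_to_cover(overlap_factor=1.0):
    """Number of scenes collected over that many days."""
    return round(days_to_cover(overlap_factor) * SCENES_PER_DAY)


no_overlap = scenes_to_cover(1.0)     # about 7,400 scenes in under six days
heavy_overlap = scenes_to_cover(2.0)  # about 14,700 scenes in about 11 days
```

The estimate of roughly 10,000 images sits between these two bounds.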

Every year more satellite constellations are being launched, so if a new constellation were to capture the whole earth’s surface in a single day, then we would need to spend an order of magnitude more processing time to ingest it. We care about this because each image scanned will cost time, money, and computational power. To enable appropriate allocation of resources for automation, it’s critical to understand the scale of the data. For now, we can size our processing pipeline to the current number, but we must take measures to ensure the system is scalable for the increasing numbers of satellite images we anticipate in coming years.

So does that mean we need to look at 1,300 images every day? Thankfully not. We have a few techniques that we’ll use to make the computations manageable:

  • First off, we’ve found that radar satellite images near the poles are typically captured using a particular configuration called HH polarization — great for mapping sea ice, but not ideal for detecting oil slicks. And the presence of the sea ice itself makes oil slick detection difficult. If we remove those images from the 1,300, we have about 880 that suit our needs (using VV polarization).
  • Next, we won’t be looking for oil slicks on land, so we can further filter out any images that don’t have at least some ocean in them. That reduces our set of images to about 500. (Note: Sentinel-1 coverage of the open ocean is generally poor, as discussed in our previous blog post, but we anticipate future radar constellations will fill that gap.)
  • Those 500 images represent roughly 25,000,000 square kilometers of area, but we can reduce that even further by finding images on the shoreline, and eliminating any pixels in those images that we know belong to land. That drops the total area by roughly another 40%, to 15,000,000 square kilometers.

So at this point, we’ve reduced our load to about 30% of the data that our satellite source is actively sending each day. Figure 4 illustrates the effect that these filters have on our data load:

Figure 4. [Gallery] Filtering images. a) All Sentinel-1 images in one day. b) Filtered by polarization. c) Filtered by intersection with the ocean. Images compiled by Jona Raphael, SkyTruth.
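
The three filters described above can be sketched in a few lines of Python. This is only an illustration, not SkyTruth's pipeline code: the `Scene` record, its fields, and the example scenes are all hypothetical, and a real system would derive the ocean overlap from image footprints and a coastline dataset.

```python
from typing import NamedTuple


class Scene(NamedTuple):
    """Hypothetical metadata record for one radar image."""
    scene_id: str
    polarization: str      # "VV" or "HH"
    ocean_fraction: float  # share of the footprint over ocean, 0.0 to 1.0


def keep_for_slick_detection(scene: Scene) -> bool:
    """Filters 1 and 2: keep VV scenes that contain at least some ocean."""
    return scene.polarization == "VV" and scene.ocean_fraction > 0.0


def ocean_area_km2(scene_area_km2: float, scene: Scene) -> float:
    """Filter 3: count only the ocean part of a scene's footprint."""
    return scene_area_km2 * scene.ocean_fraction


# Made-up scenes, one for each case discussed in the text.
scenes = [
    Scene("polar-ice", "HH", 1.0),    # dropped: HH polarization
    Scene("inland", "VV", 0.0),       # dropped: no ocean in the image
    Scene("coastal", "VV", 0.5),      # kept, but land pixels are masked
    Scene("open-ocean", "VV", 1.0),   # kept in full
]

kept = [s for s in scenes if keep_for_slick_detection(s)]
print([s.scene_id for s in kept])  # ['coastal', 'open-ocean']
```

The point of the ordering is that the cheap metadata checks run first, so the more expensive pixel-level land masking only touches scenes that have already passed.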

Can we do better? Let’s hope so — remember that each of those 500 images is almost a gigabyte of data. That’s like filling up the memory of a brand new Apple laptop every day: an expensive proposition. Here are some ways that we make it easier to process that much information:

  • First, we don’t need to store all that data at SkyTruth. Just like you don’t need to store all of Wikipedia just to read one article, it is more efficient to load one article at a time, read it, and then close the window to release it from memory before opening another one. That means if we’re clever, we’ll never have to load all 500 images at once.
  • Each image is originally created as 32-bit, but we can easily convert it to 16-bit, thereby halving the data size. You can think of the bit-depth as the number of digits after the decimal place: Instead of storing the value ‘1.2345678’, we would store ‘1.235’, which is almost the same value, but takes a lot less effort to store in active memory.
  • We can further reduce the amount of required memory by reducing the resolution of the image. We find it works just fine to reduce each dimension (height and width) by a factor of eight, which means the number of pixels actually decreases by a factor of 64. This has the effect of averaging 64 small pixels into one large one.
  • Finally, we don’t need to process the whole image at once. Just like it was easier for you to spot the oil slick when you were looking at a smaller, zoomed-in portion of the satellite image above, we can divide up each picture into 32 smaller square ‘chips’.
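
To make the last three steps concrete, here is a minimal pure-Python sketch on a toy grid. It rests on several assumptions: pixel values are treated as floating point, the 16-bit conversion uses the standard library's half-precision format, the grid dimensions divide evenly by the factors used, and the toy image is far smaller than a real scene (so it yields 8 chips rather than 32). The function names are illustrative only.

```python
import struct


def to_16_bit(value: float) -> float:
    """Round-trip a value through a 16-bit float (2 bytes instead of 4)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def downsample(grid, factor):
    """Average each factor x factor block of pixels into one pixel."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def split_into_chips(grid, chip_size):
    """Cut a grid into square chips of chip_size x chip_size pixels."""
    chips = []
    for r in range(0, len(grid), chip_size):
        for c in range(0, len(grid[0]), chip_size):
            chips.append([row[c:c + chip_size]
                          for row in grid[r:r + chip_size]])
    return chips


# Toy 32 x 64 "image" with arbitrary values.
image = [[float((r + c) % 7) for c in range(64)] for r in range(32)]

small = downsample(image, 8)        # 4 x 8 pixels: 64 times fewer
chips = split_into_chips(small, 2)  # 8 chips of 2 x 2 pixels each

print(len(small), len(small[0]), len(chips))                     # 4 8 8
print(len(struct.pack("<f", 1.0)), len(struct.pack("<e", 1.0)))  # 4 2
```

In practice an array library would do this far faster; the loops are written out here only so each step is visible.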

Taken together, we can now work on single chips that are only 250 kilobytes, instead of 1 gigabyte images, which is effectively a 99.975% reduction in memory load. That translates directly to speeding up the two core parts of machine learning: training the model and making predictions. Because training an ML model requires performing complex mathematics on thousands of examples, this speedup represents the difference between training for 10 minutes versus 20 hours.
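
The arithmetic behind that figure can be checked quickly. The sketch below assumes a decimal gigabyte (the text says each image is "almost a gigabyte") and simply multiplies the three reduction factors from the list above.

```python
image_bytes = 1_000_000_000    # assumed: one decimal gigabyte per image
bit_depth_factor = 2           # 32-bit to 16-bit
downsample_factor = 8 * 8      # 8 times smaller in height and in width
chips_per_image = 32

total_factor = bit_depth_factor * downsample_factor * chips_per_image
chip_bytes = image_bytes / total_factor
reduction = 1 - 1 / total_factor

print(total_factor)              # 4096
print(round(chip_bytes / 1000))  # 244
print(f"{reduction:.3%}")        # 99.976%
```

The result, about 244 kilobytes per chip, matches the quoted 250 kilobytes once rounded, and the reduction agrees with the quoted 99.975% to within rounding.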

But, wait! What about the step in which we threw away data by averaging 64 pixels into one large pixel? Is that the same as losing 63 pieces of information and saving only one? It is important to address this idea in the context of machine learning and what a machine is capable of learning. It hinges on the difference between information and data. Generally speaking, we want resolution to be as low as possible, while retaining the critical information in the image. That’s because the lower the resolution, the faster our model can be trained and make predictions. To figure out how much resolution is necessary, a convenient rule of thumb is to ask whether a human expert could still accurately make a proper assessment at the lower resolution. Let’s take a quick look at an example. Each of the following images in Figure 5 has one-fourth as many pixels as the previous:

Figure 5: [Gallery] Reducing resolution by a factor of 4: 512×512, 256×256, 128×128, 64×64, 32×32. (Contains modified Copernicus Sentinel data 2019).

If your objective is to find all the pixels that represent oil, you could easily do so in the first two images (zooming in is allowed). The next two are a bit harder, and the final one is impossible. Note that the fourth image has 64 times fewer pixels than the first (a reduction of about 98.4%), but the oil is still visible. If we think critically about it, we realize that the number of pixels in a picture is irrelevant; what actually matters is that the pixels must be much smaller than the object you are identifying. In the first image, the oil slick is many pixels wide and so it has sufficient information for an ML model to learn. However, in the last image the pixels are so large that the tiny slick all but disappears in the averaging. If, instead, your objective is to find all the pixels that represent land, then the final image (32×32) is more than sufficient and could probably be reduced even further.
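
The way a thin feature fades under averaging can be shown with a toy example. The sketch below is an assumption-laden illustration, not real satellite data: it builds a 64×64 grid of bright "sea" (1.0) with a dark "slick" (0.0) two pixels wide, then measures how much contrast survives as the pixels grow.

```python
def downsample(grid, factor):
    """Average each factor x factor block of pixels into one pixel."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def make_scene(size=64, slick_row=32, slick_width=2):
    """Bright sea (1.0) crossed by a dark horizontal slick (0.0)."""
    return [[0.0 if slick_row <= r < slick_row + slick_width else 1.0
             for _ in range(size)]
            for r in range(size)]


def contrast(grid):
    """Difference between the brightest and darkest pixel."""
    values = [v for row in grid for v in row]
    return max(values) - min(values)


scene = make_scene()
contrasts = {f: contrast(downsample(scene, f)) for f in (1, 2, 4, 8, 16, 32)}
for factor, value in contrasts.items():
    print(factor, value)
# 1 1.0
# 2 1.0
# 4 0.5
# 8 0.25
# 16 0.125
# 32 0.0625
```

While the pixels are no larger than the slick (factors 1 and 2), the contrast is untouched. Once each pixel spans more than the slick's width, the dark values are diluted by the surrounding sea and the feature fades toward the background.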

However, reducing resolution also comes with downsides. For instance, the reduction in visual information means there are fewer context clues, and therefore less certainty, about which pixels are oil and which are not. This results in more pixels being labeled with the wrong prediction, and a corresponding reduction in accuracy. Furthermore, even if it were possible to perfectly label all of the pixels that are oil, the pixels are so big that it’s difficult to get a good sense for the exact shape, outline, or path of the oil itself, which results in lower quality estimates derived from this work (for instance, estimating the volume of oil spilled).

If only we could use the resolution of the first image somehow… 

Good news: we can! It turns out that the first image above, the 512×512, already had its resolution reduced from the original satellite image by a factor of 64. To give you a sense, Figure 6 shows the original resolution satellite image side by side with the version we use in training for our ML model. The second image has lower resolution, but because the pixels are still much smaller than the features we care about, we are able to avoid most of the downsides described above. The oil spill is still unambiguously identifiable, so we are willing to trade the resolution for gains in training and prediction. This is a subjective tradeoff that each machine learning project must make based on the resources available.

Figure 6:  Original satellite image at full resolution (left), compared to reduced resolution used for ML prediction (right). (Contains modified Copernicus Sentinel data 2019).

So what are the takeaways from this exploration? We first learned that it takes a lot of imagery to monitor the surface of the Earth. However, by thoughtfully excluding imagery that doesn’t capture the information we care about, it’s possible to reduce the volume of data we need to process significantly. Then, we discussed a few different tricks that make it much faster to train and predict on that pared-down dataset. In particular, the most important technique is to reduce the resolution as much as possible, while maintaining a pixel size that is substantially smaller than any features we are attempting to identify. All told, there is a lot of data that still needs to be processed, but if we can achieve a total processing time for each satellite image that is under three minutes, then the whole pipeline can be run on a single computer. Right now, we are down to about 10 minutes per image, and still have a few tricks up our sleeve. So we’re hopeful that in the coming months we’ll hit our target and have a robust and scalable mechanism for identifying oil slicks automatically.
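
The three-minute target follows from a simple time budget. The sketch below assumes about 500 images arrive per day (the filtered count from earlier), that a machine runs around the clock, and that it processes one image at a time; the function name is illustrative.

```python
import math

images_per_day = 500        # filtered daily image count from the text
minutes_per_day = 24 * 60   # 1440

# Time available per image if a single machine must keep up.
budget_minutes = minutes_per_day / images_per_day
print(round(budget_minutes, 2))  # 2.88


def machines_needed(minutes_per_image: float) -> int:
    """Machines required when each one handles images sequentially."""
    return math.ceil(images_per_day * minutes_per_image / minutes_per_day)


print(machines_needed(10))   # 4
print(machines_needed(2.5))  # 1
```

Under these assumptions, 10 minutes per image would need four machines to keep pace with the daily intake, while anything comfortably below 2.88 minutes fits on one.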

Until then, we continue to tip our hats to the SkyTruthers who regularly identify and expose these egregious acts, and we look forward to the day that the illegal practice of bilge dumping comes to an end.

SkyTruth Board Member Mary Anne Hitt: Activist Extraordinaire

Mary Anne Hitt has led Sierra Club’s Beyond Coal Campaign to extraordinary national success. But she honed her skills in Appalachia, with a little help from SkyTruth.

You might say Mary Anne Hitt has Appalachian activism in her blood. When she was growing up in Gatlinburg, Tennessee (where she attended Dolly Parton’s former high school), her father was Chief Scientist at Great Smoky Mountains National Park. Back then, acid rain was decimating high elevation forests in the East, fueled by pollution from coal-fired power plants. Her father watched as iconic places in the park turned into forests of skeleton trees. He knew the science pointed to nearby power plants run by the Tennessee Valley Authority, and wanted to stop the pollution. But his warnings triggered some resistance from those who didn’t want to rock the boat. “So right from the start,” says Mary Anne, she was “immersed in the beauty and the threats” of protecting Appalachian forests. And she knew the costs of speaking out.

Those costs have never stopped her. Mary Anne graduated from the University of Tennessee, creating her own environmental studies major and forming a student environmental group that continues today. Later, she obtained a graduate degree in advocacy at the University of Montana. Now, she leads the Sierra Club’s Beyond Coal Campaign, a national effort to retire all coal plants in the United States, moving towards 100% renewable energy by 2030, while supporting economic opportunities in communities affected by plant closures.

And she serves on SkyTruth’s board of directors. Her entrée to SkyTruth is also steeped in Appalachian advocacy. In the early aughts, Mary Anne was Executive Director of Appalachian Voices, a nonprofit conservation group dedicated to fighting mountaintop mining, fracked-gas pipelines and other harmful activities in Appalachia, while advancing energy and economic alternatives that allow Appalachian communities to thrive. Appalachian Voices is one of SkyTruth’s conservation partners, a relationship that began under Mary Anne’s leadership.

As Mary Anne tells it, Appalachian Voices was fighting mountaintop mining and construction of a new coal plant in southwest Virginia. While fighting the plant, they discovered that 200 new power plants were planned across the country. In other words, a whole new generation of power plants was on the books to replace aging plants. A coalition of grassroots groups and local citizens, organized with help from the Sierra Club, worked to stop them, fighting permits at every stage, slowing the process down and making financial backers nervous.

Figure 1. Mary Anne Hitt

Appalachian Voices contacted SkyTruth to help them convey the vast extent of mountaintop mining in Appalachia as part of their work. In response, SkyTruth developed the first scientifically credible database on the extent of mountaintop mining in the region. (You can read more about this collaboration and what we found here.) SkyTruth continues to update this database every year, providing scientists and others valuable information that supports research on the ecological and human health effects of mountaintop mining.

SkyTruth’s database helped support the broader advocacy work Appalachian Voices was spearheading to fight coal mining and power plants in the region. Collectively, environmental, legal, and grassroots groups nationwide stopped almost all of the proposed power plants, according to Mary Anne. (Ironically, the one in southwest Virginia actually did get built.) “If these plants had been built it would have been doom for our climate,” Mary Anne says now. “There would have been no room for renewables…Grassroots people working in their communities made it happen. That’s what makes me most proud.”

Mary Anne took her successful experience fighting power plants in Appalachia and brought it to the Sierra Club as Deputy Director of the Beyond Coal Campaign in 2008, later becoming Director. The Sierra Club has built on those early lessons and applied them to shutting down all coal plants in the United States. Today, 312 of 530 plants that existed in 2010 have retired or announced their retirement. And according to Mary Anne, the United States reached a promising benchmark a year ago: last April marked the first time we obtained more energy from renewables than from coal. In fact, in 2019 the US consumed more power from renewable energy than from coal for the first time in 130 years. “Most of our arguments now are economic,” says Mary Anne. “The power from a coal plant is more expensive than renewable energy, so people don’t want it. People will keep demanding renewables.”

In April of this year, Mary Anne took on an even bigger responsibility at Sierra Club: National Director of Campaigns, a new position in which she oversees all the organization’s campaign work. It’s a big job, on top of being a mother to her ten-year-old daughter. So why did she agree to join the SkyTruth Board? “Ever since my daughter was born,” says Mary Anne, “I had a policy of not being on any boards because I have a demanding job and serving on boards was more time away from her. But I really believe that SkyTruth’s work is foundational for the environmental movement. I think the ability to see for yourself what’s going on, especially in this age of misinformation, where people don’t know what to believe… the ability to show people with their own eyes what’s going on, I think is more important than ever.”

She also knows from her years in advocacy that having access to technical resources and expertise is challenging for nonprofits, especially small ones. “To provide this to groups in a way that’s technically sophisticated, but they can use it, is a real service,” she says. And SkyTruth has had significant impact on key issues, she notes, particularly given its small size. “To the extent that I can help, I want to do that. And I love that they are based in West Virginia and Shepherdstown – it’s a cool part of SkyTruth’s story.”

But a professional life of activism involves a lot of conflict, Mary Anne acknowledges. To balance it out, she and her husband Than Hitt, a stream ecologist, sing and play guitar at local fundraisers and other community events. Than is a 10th-generation West Virginian and they live in Shepherdstown, where SkyTruth is based. The local singing is all for fun, she says.

“It’s a way to connect with people you wouldn’t otherwise… And having a creative outlet helps keep me whole.” With activism, “you’re living in your head a lot. Music is in your heart. We all need that.”