Alice Foster’s Internship Triggered New Excitement About Her Career Possibilities

Before her internship, Alice felt burnt out at school. After applying new skills and technologies to environmental projects at SkyTruth, she’s looking forward to her remaining classes and a fulfilling career.

As I wrap up my four-month internship at SkyTruth, I would like to share some highlights and takeaways from my experience. During my internship I explored the field of geospatial technology for the first time, which allowed me to learn new skills and gave me insight into my career goals. I learned about global environmental issues that I hadn’t known existed. And I got to work with a kind, dedicated, creative group of people. I contributed to SkyTruth’s mountaintop mining research and Project Inambari, which will create an early alert system for tropical forest mining. I also spent time identifying oil and gas well pads, collecting images of oil slicks, and creating annotated maps in QGIS, a geographic information system application that can be used to analyze and visualize geospatial data such as satellite imagery or a ship’s track across the ocean.

Even on my first day of orientation at SkyTruth, I was surprised by the high level of support and guidance I received from the staff. My advisers Brendan Jarrell and Christian Thomas spent lots of time introducing me to concepts and technologies (like Google Earth Engine and QGIS) that I would use in my work. One of the first skills I learned was recognizing oil slicks on satellite imagery — most likely from vessels dumping oily bilge water at sea — and creating an annotated map to reveal the slicks to the public. Brendan patiently guided me through the steps of making a map, twice. The team congratulated me when I found my first slick, even though I did not think it merited attention. This encouragement made me feel welcomed and excited about my work.

The search for oil slicks allowed me to virtually explore oceans and coastlines across the globe. With time, it revealed to me not only how to use geospatial technology, but also how little geography I knew. I would toggle past a country or island and wonder what it was like there, realizing I did not even know its name. And so I started exploring a geography trivia website in my free time to teach myself the countries of the world. I am now learning capital cities in Europe, which I tend to forget.

After getting practice with Google Earth Engine — a tool for analyzing and mapping satellite imagery and change around the world — during my first couple of weeks at SkyTruth, I became involved in some mining-related projects. In one project, I adapted code from SkyTruth’s mountaintop mining research to incorporate satellite imagery from the European Space Agency’s Sentinel-2 satellite. This imagery provides us with additional data, which could improve our ability to detect surface mining throughout Central Appalachia. Working with the code in Earth Engine allowed me to better understand SkyTruth’s process for identifying mines. First, we produce a greenest pixel composite image from a collection of images. Making a composite in Earth Engine means combining multiple overlapping images to create a single image. Images can be combined in different ways; in this case, the greenest pixel composite selects pixels with the highest Normalized Difference Vegetation Index (NDVI) values compared with corresponding pixels in the image collection. NDVI is an indicator of plant health in a given area. To provide a more concrete example, suppose we want to make a greenest pixel composite from three images, all showing a part of West Virginia at different times of summer. Say we look at one pixel in one of the images, which covers a small square of forest. We then compare this pixel with the pixels covering the same bit of forest in the other two images, and we choose the greenest of the three (that is, the pixel with the highest NDVI value). If we repeat this process for every pixel in the image, we get one image with all the greenest pixels selected from the collection.
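For readers curious what this looks like in practice, here is a minimal sketch of a greenest pixel composite using the Earth Engine Python API. The asset ID, date range, study rectangle, and band names are illustrative assumptions, not SkyTruth’s production code.

```python
import ee

ee.Initialize()

# Sentinel-2 surface reflectance over a hypothetical Central Appalachia study area.
# The asset ID, date range, and rectangle are placeholders for illustration.
region = ee.Geometry.Rectangle([-82.5, 37.0, -81.0, 38.0])
collection = (
    ee.ImageCollection('COPERNICUS/S2_SR')
    .filterBounds(region)
    .filterDate('2019-06-01', '2019-09-01')
)

def add_ndvi(image):
    # NDVI from the near-infrared (B8) and red (B4) bands.
    ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
    return image.addBands(ndvi)

with_ndvi = collection.map(add_ndvi)

# For every pixel location, keep the band values from whichever image had the
# highest NDVI there -- the "greenest pixel" composite described above.
greenest = with_ndvi.qualityMosaic('NDVI').clip(region)
```

Here, qualityMosaic('NDVI') performs the per-pixel comparison described in the paragraph above: at each location it keeps the pixel from the image where NDVI was highest.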

A second script uses the greenest pixel composite to approximate the lowest NDVI value for each county, producing a threshold image. Again, say we have the greenest pixel composite of West Virginia that we just made. Now we look at forested areas within one county and find the pixels that are least green, or have the lowest NDVI values, and then take the average of these NDVI values. This is the threshold for that county; if a pixel is less green than the threshold, it is likely a mine. Our output image contains these values for every county. As a final step, we compare the greenest pixels with the NDVI thresholds to determine likely mine areas. 
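Continuing that sketch, the per-county threshold step might look something like the following. The county boundaries dataset, the 5th-percentile statistic (standing in for “the least green pixels, averaged”), and the 30-meter scale are all assumptions for illustration; SkyTruth’s actual scripts may differ.

```python
# Continuing the sketch above: estimate a per-county "low NDVI" threshold,
# then flag pixels that are less green than their county's threshold.
counties = ee.FeatureCollection('TIGER/2016/Counties').filterBounds(region)
ndvi = greenest.select('NDVI')

# Summarize NDVI within each county; a low percentile stands in for
# "the least green pixels, averaged."
county_stats = ndvi.reduceRegions(
    collection=counties,
    reducer=ee.Reducer.percentile([5]),
    scale=30,
)

# Turn the per-county numbers back into an image: one threshold value per county.
threshold = county_stats.reduceToImage(
    properties=['p5'],
    reducer=ee.Reducer.first(),
)

# Pixels less green than their county's threshold are candidate mine areas.
candidate_mines = ndvi.lt(threshold)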

Figure 1. Mining data overlying a Sentinel-2 greenest composite image. The image covers counties in West Virginia, Virginia, and Kentucky.

SkyTruth’s surface mining expert, Christian Thomas, also had me experiment with two different techniques for masking clouds in Sentinel-2 imagery. Clouds obstruct necessary data in images, so clearing them out improves analyses. The standard approach uses a built-in “cloud mask” band. The other approach is an adapted “FMasking” method. This takes advantage of the arrangement of sensors on Sentinel-2 satellites, which creates a displacement effect in the imagery that is more pronounced for objects at altitude. The FMask uses this effect to distinguish low-altitude clouds from human-made infrastructure on land. Though the two methods had similar results, the FMask seemed slightly more accurate.
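The built-in band referred to here is, I assume, Sentinel-2’s QA60 cloud bitmask. A minimal sketch of that standard approach, continuing the Earth Engine example above, might look like this (the parallax-based FMask variant is more involved and is not shown):

```python
def mask_s2_clouds(image):
    # QA60 is Sentinel-2's built-in cloud bitmask band:
    # bit 10 flags opaque clouds, bit 11 flags cirrus.
    qa = image.select('QA60')
    cloud_bit = 1 << 10
    cirrus_bit = 1 << 11
    clear = (
        qa.bitwiseAnd(cloud_bit).eq(0)
        .And(qa.bitwiseAnd(cirrus_bit).eq(0))
    )
    # Keep only pixels flagged as clear of both cloud types.
    return image.updateMask(clear)

# Applied to the Sentinel-2 collection from the earlier sketch, before compositing:
cloud_free = collection.map(mask_s2_clouds)
```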

Working on technical projects like this, I learned how much I enjoy using imagery and geospatial data. I had found analyzing data interesting in the past, but something about being able to visualize the information on a map was even more appealing. I loved how a satellite image could be reduced to numbers and assessed quantitatively, or understood visually, almost as a piece of art. 

In another project, I had the opportunity to develop my writing skills by contributing to an application for the Artisanal Mining Grand Challenge, a global competition to provide solutions for small-scale, low-tech, and/or informal mining. Researching artisanal gold mining was illuminating, as I knew almost nothing about the subject beforehand. I learned that illegal gold mining in Venezuela and Peru has often involved brutal violence and exploitation. In recent decades, labor and sex trafficking have plagued remote mining regions like Madre de Dios. Small-scale mining practices are also particularly damaging to the biodiverse Amazon ecosystem. To extract a small amount of gold, miners must dig up massive amounts of sediment, denuding the landscape in the process. The use of mercury in artisanal gold mining is incredibly detrimental to water quality and human health.

I was also able to be involved in the technical side of this project, building a tool to detect mines in the Peruvian Amazon. I created a mask that removes water from satellite images so that water areas could not be mistaken for mine areas or vice versa. Mines are often near water or can look like water in imagery. To make the mask, I used the European Commission’s Joint Research Centre global surface water dataset. This dataset contains information about where and when surface water occurred around the world over the past thirty years. In Google Earth Engine, the data is stored in an image with bands representing different measures of surface water. I used the “occurrence,” “seasonality,” and “recurrence” bands to create the mask. “Occurrence” refers to how often water was present at a location; “seasonality” means the number of months during which water was present; and “recurrence” is the frequency with which water returned from one year to the next. I tried to find a combination of band values that would do the best job of removing water without masking mines or forest. For example, using an occurrence value of twenty (that is, masking pixels where water was present twenty percent or more of the time) ended up masking mine areas as well. Christian also suggested using a buffer, which meant that pixels adjacent to a masked pixel also got masked. Since the mask often did not capture all of the pixels in a body of water, the buffer filled in the gaps. Masked pixels dotting a river became a continuous thread. The buffer also helped eliminate river banks, which look similar to mines. We applied the finished water mask to the area of interest in Madre de Dios, Peru.
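Here is a minimal sketch of this kind of water mask, assuming the JRC Global Surface Water image available in Earth Engine. The asset version, band cutoffs, and buffer radius shown are placeholders, not the values the team settled on.

```python
import ee

ee.Initialize()

# JRC Global Surface Water: a single global image whose bands describe where
# and how often surface water has been observed over roughly three decades.
# (The asset version current at the time of writing may differ.)
gsw = ee.Image('JRC/GSW1_4/GlobalSurfaceWater').unmask(0)  # treat "never water" as 0

occurrence = gsw.select('occurrence')    # percent of observations with water
seasonality = gsw.select('seasonality')  # months per year with water
recurrence = gsw.select('recurrence')    # percent of years in which water returned

# Placeholder cutoffs: the post notes that occurrence >= 20 masked mines too,
# so the real values were tuned by trial and error.
water = (
    occurrence.gte(50)
    .Or(seasonality.gte(6))
    .Or(recurrence.gte(75))
)

# Buffer the mask so pixels adjacent to detected water are masked as well,
# turning dotted river detections into a continuous thread.
water_buffered = water.focal_max(radius=1, kernelType='square', units='pixels')

# Final mask: 1 = keep (land), 0 = water. Applying it with updateMask() makes
# the water pixels transparent so they are ignored in later analyses.
water_mask = water_buffered.Not()
# masked_image = some_sentinel2_image.updateMask(water_mask)
```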

Figure 2: Water mask in the Madre de Dios region of Peru. White pixels have value 1, while black pixels (water) have value 0. When the mask is applied to a satellite image, all pixels in the black areas appear transparent and are not included in analyses. When identifying potential mines in the image, the masked areas are ignored.

Researching issues related to artisanal gold mining left me unsure of how countermeasures can fully promote the welfare of mine workers and others involved in the long term. The problem of illegal gold mining seems entrenched in broader economic and social issues and therefore cannot be addressed simply by identifying and eradicating mines. Nevertheless, understanding the great damage that this type of mining can do to humans and their environment made clear to me the importance of the project. 

Not only did working at SkyTruth teach me a variety of technical and professional skills, it also helped reveal to me what I want to learn about and pursue in the future. In school last fall, I felt burnt out to the point that I just wanted to get through my remaining semesters and be done. Now I feel the excitement about academics I had as a freshman, motivated and informed by my experience at SkyTruth. With my interest in geology and climate issues renewed, I feel like there is barely enough time left to take all the classes I want to. I hope to improve on skills like writing and computer programming so that I can contribute my best work in the future. Being part of an amazing team has motivated me in that way. I also know that I would like to use the geospatial technologies and approaches I learned at SkyTruth moving forward. I feel excited about future career possibilities; before my internship, I felt confused.

I want to give a huge thank you to Bruce and Carolyn Thomas for hosting me in Shepherdstown. I want to thank Christian for introducing me to SkyTruth and for including me in his Dungeons and Dragons game! And I want to thank everyone on the SkyTruth team for their guidance and for being wonderful.

Figure 3: Team Hike, Harpers Ferry, West Virginia. Photo by Amy Emert.

Right-sizing Our Data Pipeline to Detect Polluters

How does SkyTruth’s new project Cerulean reduce the time and cost of processing enormous volumes of satellite information?

Project Cerulean is SkyTruth’s systematic endeavor to curb the widespread practice of bilge dumping, in which moving vessels empty their oily wastewater directly into the ocean. (We recently highlighted the scope and scale of the problem in this blog series.) Our goal is to stop oil pollution at sea, and this particular project aims to do that by automating what has historically been a laborious manual process for our team. SkyTruthers have spent days scrolling through satellite radar imagery to identify the telltale black streaks of oil slicks stretching for dozens of kilometers across the sea’s surface. We are finally able to do this automatically thanks to recent developments in the field of machine learning (ML). Machine learning “teaches” computers to identify certain features, such as bilge slicks. Doing so efficiently, however, requires a lot of computation, which costs both time and money. In this article, we explore a few tricks that allow us to reduce that load without sacrificing accuracy.

The sheer volume of data being collected by the thousands of satellites currently in orbit can be overwhelming, and over time those numbers will only continue to increase. However, SkyTruth’s bilge dump detection relies primarily on one particular pair of satellites called Sentinel-1, whose data are made available to the public by the European Space Agency’s Copernicus program. Sentinel-1 satellites beam radar energy down at the surface and gather the signal that bounces back up, compiling that information into bird’s-eye images of Earth. To get a sense of how much data is being collected, Figure 1 shows a composite of all the footprints of these images from a single day of collecting. If you spent 60 seconds looking at each image, it would take you about 21 hours to comb through them all. Fortunately, the repetitive nature of the task makes it ripe for automation via machine learning.

Figure 1. One day’s worth of radar imagery collected by Sentinel-1 satellites. Each blue quadrilateral represents the location of a single image. You can see the diagonal swath that the satellites cut across the equator. (Note the scenes near the poles are distorted on this map because it uses the Mercator projection.) Image compiled by Jona Raphael, SkyTruth.

Figure 2 illustrates a typical satellite radar image: just one of those blue polygons above. These images are so big — this one captures 50,000 square kilometers (more than 19,000 square miles) — that it can be tough to spot the thin oil slicks that appear as slightly blacker patches on the dark sea surface. In this image, if you look closely, you can see a slick just south of the bright white landmass. (It’s a bit clearer in the zoomed-in detail that follows.)

Figure 2. Satellite image from January 1, 2019, capturing a difficult-to-see bilge dump just south of the tip of Papua New Guinea (Copernicus Sentinel data 2019).

Contrary to intuition, you are not seeing any clouds in this picture. Radar consists of electromagnetic radiation with very long wavelengths, so it travels from satellites through the atmosphere largely undisturbed until it hits a surface on the Earth. Once it hits, it is either absorbed by the surface or reflected back into space (much like bats using echolocation). The lightest section of the image in the top right corner is part of a mountainous island. This jagged terrain scatters the radar diffusely in all directions and reflects much of it back to the satellite. The rest of the image is much darker, and shows us the ocean where less of the radar energy bounces back to the satellite receiver. The muddled, medium-gray area along the bottom of the image shows us where strong gusty winds blowing across the ocean surface have made the water choppy and less mirror-like. Figure 3 shows us more clearly the oil slick just offshore. 

 

Figure 3: Detail from the Sentinel-1 radar image shown above. The narrow oil slick, visible as a dark gray streak along the bottom of this image, is roughly 60 kilometers (about 37 miles) long, and only 15 kilometers (roughly nine miles) offshore. (Contains modified Copernicus Sentinel data 2019).

Although each image covers a large area, we need to process many images each day to monitor the entire Earth. How many? Here’s an approximation:

  • 510,000,000 square kilometers = the total surface area of the Earth
  • 90,000,000 square kilometers = the total area of images captured in one day (about 1,300 scenes)

This means that we expect the whole Earth to be imaged roughly every six to 12 days (about six if the scenes never overlapped, and closer to 12 because many of them do), which works out to roughly 10,000 images per complete pass.
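These back-of-the-envelope numbers can be checked in a few lines of Python; the “half of each day’s area is new coverage” overlap factor below is purely an assumption to show where the range comes from.

```python
EARTH_SURFACE_KM2 = 510_000_000
DAILY_IMAGED_KM2 = 90_000_000     # ~1,300 Sentinel-1 scenes per day
SCENES_PER_DAY = 1_300

# If no scenes overlapped, one full coverage of the Earth would take:
days_no_overlap = EARTH_SURFACE_KM2 / DAILY_IMAGED_KM2   # ~5.7 days

# Scenes overlap heavily in practice; assuming (for illustration) that only
# about half of each day's area is new coverage:
days_with_overlap = 2 * days_no_overlap                  # ~11.3 days

# Either way, a complete pass is on the order of 10,000 scenes:
print(SCENES_PER_DAY * days_no_overlap)                  # ~7,400 scenes
print(SCENES_PER_DAY * days_with_overlap)                # ~14,700 scenes
```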

Every year more satellite constellations are launched, so if a new constellation were to capture the whole Earth’s surface in a single day, we would need to spend an order of magnitude more processing time to ingest it. We care about this because each image scanned costs time, money, and computational power. To allocate resources for automation appropriately, it’s critical to understand the scale of the data. For now, we can size our processing pipeline to the current number, but we must take measures to ensure the system can scale to the increasing numbers of satellite images we anticipate in coming years.

So does that mean we need to look at 1,300 images every day? Thankfully not. We have a few techniques that we’ll use to make the computations manageable (a short code sketch of the first two filters follows this list):

  • First off, we’ve found that radar satellite images near the poles are typically captured using a particular configuration called HH polarization — great for mapping sea ice, but not ideal for detecting oil slicks. And the presence of the sea ice itself makes oil slick detection difficult. If we remove those images from the 1,300, we have about 880 that suit our needs (using VV polarization).
  • Next, we won’t be looking for oil slicks on land, so we can further filter out any images that don’t have at least some ocean in them. That reduces our set of images to about 500. (Note: Sentinel-1 coverage of the open ocean is generally poor, as discussed in our previous blog post, but we anticipate future radar constellations will fill that gap.)
  • Those 500 images represent roughly 25,000,000 square kilometers of area, but we can reduce that even further by finding images that straddle the shoreline and eliminating any pixels in them that we know belong to land. That drops the total area by about another 40 percent, to 15,000,000 square kilometers.
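Here is the promised sketch of the first two filters, using the Earth Engine Python API as a stand-in for however the production pipeline actually queries the Sentinel-1 archive; the date and the ocean rectangle are placeholders.

```python
import ee

ee.Initialize()

# One day's worth of Sentinel-1 ground-range-detected scenes
# (the date matches the example image above; it is otherwise arbitrary).
s1 = ee.ImageCollection('COPERNICUS/S1_GRD').filterDate('2019-01-01', '2019-01-02')

# Filter 1: keep scenes collected with VV polarization, dropping the HH
# scenes near the poles that are better suited to sea-ice mapping.
s1_vv = s1.filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))

# Filter 2: keep only scenes that intersect the ocean. A single rectangle of
# open water near Papua New Guinea stands in here for a real global ocean geometry.
ocean = ee.Geometry.Rectangle([140.0, -12.0, 150.0, -4.0])
s1_ocean = s1_vv.filterBounds(ocean)

print('Scenes after filtering:', s1_ocean.size().getInfo())
```

Scenes that survive these metadata-level filters would then be downloaded and have their land pixels masked out, as described in the last bullet above.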

So at this point, we’ve reduced our load to about 17% of the data that our satellite source is actively sending each day. Figure 4 illustrates the effect that these filters have on our data load:

Figure 4. [Gallery] Filtering images. a) All Sentinel-1 images in one day. b) Filtered by polarization. c) Filtered by intersection with the ocean. Images compiled by Jona Raphael, SkyTruth.

Can we do better? Let’s hope so — remember that each of those 500 images is almost a gigabyte of data. That’s like filling up the storage of a brand new Apple laptop every day: an expensive proposition. Here are some ways that we make it easier to process that much information (a small NumPy sketch of these steps follows the list):

  • First, we don’t need to store all that data at SkyTruth. Just as you don’t need to store all of Wikipedia to read one article, it is more efficient to load one article at a time, read it, and then close the window to release it from memory before opening another. That means if we’re clever, we’ll never have to load all 500 images at once.
  • Each image is originally created as 32-bit, but we can easily convert it to 16-bit, thereby halving the data size. You can think of the bit-depth as the number of digits after the decimal place: Instead of storing the value ‘1.2345678’, we would store ‘1.235’, which is almost the same value, but takes a lot less effort to store in active memory.
  • We can further reduce the amount of required memory by reducing the resolution of the image. We find it works just fine to reduce each dimension (height and width) by a factor of eight, which means the number of pixels actually decreases by a factor of 64. This has the effect of averaging 64 small pixels into one large one.
  • Finally, we don’t need to process the whole image at once. Just as it was easier for you to spot the oil slick when you were looking at a smaller, zoomed-in portion of the satellite image above, we can divide up each picture into 32 smaller square ‘chips’.
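And here is the small NumPy sketch promised above, walking through the three size-reduction steps on a made-up array standing in for one radar scene; the array size, data types, and chip grid are illustrative only.

```python
import numpy as np

# A made-up array standing in for one radar scene (real Sentinel-1 scenes
# are far larger; this just demonstrates the steps).
scene = np.random.rand(4096, 4096).astype(np.float32)

# Step 1: halve the size by dropping from 32-bit to 16-bit values.
scene16 = scene.astype(np.float16)

# Step 2: reduce resolution 8x in each dimension (64x fewer pixels) by
# averaging every 8x8 block of pixels into one.
h, w = scene16.shape
small = scene16.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))

# Step 3: split the downsampled image into square chips so the model only
# ever sees one small piece at a time. (The post describes 32 chips per
# scene; the grid size here is just for illustration.)
chip = 64
chips = [
    small[r:r + chip, c:c + chip]
    for r in range(0, small.shape[0], chip)
    for c in range(0, small.shape[1], chip)
]

print(len(chips), 'chips of shape', chips[0].shape, '-', chips[0].nbytes, 'bytes each')
```

Multiplying the three reductions together (2 from the bit depth, 64 from the resolution, and the number of chips from tiling) is what takes each working unit from roughly a gigabyte down to a few hundred kilobytes.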

Taken together, we can now work on single chips that are only about 250 kilobytes — instead of 1-gigabyte images — effectively a 99.975% reduction in memory load. That translates directly into speeding up the two core parts of machine learning: training the model and making predictions. Because training an ML model requires performing complex mathematics on thousands of examples, this speedup represents the difference between training for 10 minutes versus 20 hours.

But, wait! What about the step in which we threw away data by averaging 64 pixels into one large pixel? Is that the same as losing 63 pieces of information and saving only one? The answer hinges on the difference between data and information, and on what a machine is actually capable of learning. Generally speaking, we want resolution to be as low as possible while retaining the critical information in the image, because the lower the resolution, the faster our model can be trained and make predictions. To figure out how much resolution is necessary, a convenient rule of thumb is to ask whether a human expert could still make an accurate assessment at the lower resolution. Let’s take a quick look at an example. Each of the following images in Figure 5 has one-fourth as many pixels as the previous:

Figure 5: [Gallery] Reducing resolution by a factor of 4 at each step: 512×512, 256×256, 128×128, 64×64, 32×32. (Contains modified Copernicus Sentinel data 2019).

If your objective is to find all the pixels that represent oil, you could easily do so in the first two images (zooming in is allowed). The next two are a bit harder, and the final one is impossible. Note that the fourth image is a factor of 64, or 98.5%, smaller than the first, but the oil is still visible. If we think critically about it, we realize that the number of pixels in a picture is irrelevant — what actually matters is that the pixels must be much smaller than the object you are identifying. In the first image, the oil slick is many pixels wide and so it has sufficient information for an ML model to learn. However, in the last image the pixels are so large that the tiny slick all but disappears in the averaging. If, instead, your objective is to find all the pixels that represent land, then the final image (32×32) is more than sufficient and could probably be reduced even further.

However, reducing resolution also comes with downsides. For instance, the reduction in visual information means there are fewer context clues, and therefore less certainty, about which pixels are oil and which are not. This results in more pixels being labeled with the wrong prediction, and a corresponding reduction in accuracy. Furthermore, even if it were possible to perfectly label all of the pixels that are oil, the pixels are so big that it’s difficult to get a good sense for the exact shape, outline, or path of the oil itself, which results in lower quality estimates derived from this work (for instance, estimating the volume of oil spilled).

If only we could use the resolution of the first image somehow… 

Good news — we can! It turns out that the first image above, the 512×512, already had its resolution reduced from the original satellite image by a factor of 64. To give you a sense, Figure 6 shows the original-resolution satellite image side by side with the version we use in training our ML model. The second image has lower resolution, but because the pixels are still much smaller than the features we care about, we avoid most of the downsides described above. The oil slick is still unambiguously identifiable, so we are willing to trade the resolution for gains in training and prediction speed. This is a subjective tradeoff that each machine learning project must make based on the resources available.

Figure 6: Original satellite image at full resolution (left), compared to the reduced resolution used for ML prediction (right). (Contains modified Copernicus Sentinel data 2019).

So what are the takeaways from this exploration? We first learned that it takes a lot of imagery to monitor the surface of the Earth. However, by thoughtfully excluding imagery that doesn’t capture the information we care about, it’s possible to significantly reduce the volume of data we need to process. Then, we discussed a few tricks that make it much faster to train and predict on that pared-down dataset. In particular, the most important technique is to reduce the resolution as much as possible, while maintaining a pixel size that is substantially smaller than any features we are attempting to identify. All told, there is still a lot of data to be processed, but if we can achieve a total processing time of under three minutes per satellite image, then the whole pipeline can be run on a single computer. Right now, we are down to about 10 minutes per image, and we still have a few tricks up our sleeve. So we’re hopeful that in the coming months we’ll hit our target and have a robust and scalable mechanism for identifying oil slicks automatically.

Until then, we continue to tip our hats to the SkyTruthers who regularly identify and expose these egregious acts, and we look forward to the day that the illegal practice of bilge dumping comes to an end.

SkyTruth Board Member Mary Anne Hitt: Activist Extraordinaire

Mary Anne Hitt has led Sierra Club’s Beyond Coal Campaign to extraordinary national success. But she honed her skills in Appalachia, with a little help from SkyTruth.

You might say Mary Anne Hitt has Appalachian activism in her blood. When she was growing up in Gatlinburg, Tennessee (where she attended Dolly Parton’s former high school), her father was Chief Scientist at Great Smoky Mountains National Park. Back then, acid rain was decimating high-elevation forests in the East, fueled by pollution from coal-fired power plants. Her father watched as iconic places in the park turned into forests of skeleton trees. He knew the science pointed to nearby power plants run by the Tennessee Valley Authority, and wanted to stop the pollution. But his warnings triggered some resistance from those who didn’t want to rock the boat. “So right from the start,” says Mary Anne, she was “immersed in the beauty and the threats” of protecting Appalachian forests. And she knew the costs of speaking out.

Those costs have never stopped her. Mary Anne graduated from the University of Tennessee, creating her own environmental studies major and forming a student environmental group that continues today. Later, she obtained a graduate degree in advocacy at the University of Montana. Now she leads the Sierra Club’s Beyond Coal Campaign, a national effort to retire all coal plants in the United States and move toward 100% renewable energy by 2030, while supporting economic opportunities in communities affected by plant closures.

And she serves on SkyTruth’s board of directors. Her entrée to SkyTruth is also steeped in Appalachian advocacy. In the early aughts, Mary Anne was Executive Director of Appalachian Voices, a nonprofit conservation group dedicated to fighting mountaintop mining, fracked-gas pipelines, and other harmful activities in Appalachia, while advancing energy and economic alternatives that allow Appalachian communities to thrive. Appalachian Voices is one of SkyTruth’s conservation partners, a relationship that began under Mary Anne’s leadership.

As Mary Anne tells it, Appalachian Voices was fighting mountaintop mining and construction of a new coal plant in southwest Virginia. While fighting the plant, they discovered that 200 new power plants were planned across the country. In other words, a whole new generation of power plants was on the books to replace aging plants. A coalition of grassroots groups and local citizens, organized with help from the Sierra Club, worked to stop them, fighting permits at every stage, slowing the process down and making financial backers nervous.

Figure 1. Mary Anne Hitt

Appalachian Voices contacted SkyTruth to help them convey the vast extent of mountaintop mining in Appalachia as part of their work. In response, SkyTruth developed the first scientifically credible database on the extent of mountaintop mining in the region. (You can read more about this collaboration and what we found here.) SkyTruth continues to update this database every year, providing scientists and others valuable information that supports research on the ecological and human health effects of mountaintop mining.

SkyTruth’s database helped support the broader advocacy work Appalachian Voices was spearheading to fight coal mining and power plants in the region. Collectively, environmental, legal, and grassroots groups nationwide stopped almost all of the proposed power plants, according to Mary Anne. (Ironically, the one in southwest Virginia actually did get built.) “If these plants had been built it would have been doom for our climate,” Mary Anne says now. “There would have been no room for renewables…Grassroots people working in their communities made it happen. That’s what makes me most proud.”

Mary Anne took her successful experience fighting power plants in Appalachia and brought it to the Sierra Club as Deputy Director of the Beyond Coal Campaign in 2008, later becoming Director. The Sierra Club has built on those early lessons and applied them to shutting down all coal plants in the United States. Today, 312 of 530 plants that existed in 2010 have retired or announced their retirement. And according to Mary Anne, the United States reached a promising benchmark a year ago: last April marked the first time we obtained more energy from renewables than from coal. In fact, in 2019 the US consumed more power from renewable energy than from coal for the first time in 130 years. “Most of our arguments now are economic,” says Mary Anne. “The power from a coal plant is more expensive than renewable energy, so people don’t want it. People will keep demanding renewables.”

In April of this year, Mary Anne took on an even bigger responsibility at the Sierra Club: National Director of Campaigns, a new position in which she oversees all of the organization’s campaign work. It’s a big job, on top of being a mother to her ten-year-old daughter. So why did she agree to join the SkyTruth Board? “Ever since my daughter was born,” says Mary Anne, “I had a policy of not being on any boards because I have a demanding job and serving on boards was more time away from her. But I really believe that SkyTruth’s work is foundational for the environmental movement. I think the ability to see for yourself what’s going on, especially in this age of misinformation, where people don’t know what to believe… the ability to show people with their own eyes what’s going on, I think is more important than ever.”

She also knows from her years in advocacy that having access to technical resources and expertise is challenging for nonprofits, especially small ones. “To provide this to groups in a way that’s technically sophisticated, but they can use it, is a real service,” she says. And SkyTruth has had significant impact on key issues, she notes, particularly given its small size. “To the extent that I can help, I want to do that. And I love that they are based in West Virginia and Shepherdstown – it’s a cool part of SkyTruth’s story.”

But a professional life of activism involves a lot of conflict, Mary Anne acknowledges. To balance it out, she and her husband Than Hitt, a stream ecologist, sing and play guitar at local fundraisers and other community events. Than is a 10th-generation West Virginian and they live in Shepherdstown, where SkyTruth is based. The local singing is all for fun, she says.

“It’s a way to connect with people you wouldn’t otherwise… And having a creative outlet helps keep me whole.” With activism, “you’re living in your head a lot. Music is in your heart. We all need that.”