New Data Available on the Footprint of Surface Mining in Central Appalachia

The area of Central Appalachia impacted by surface mining has increased — by an amount equal to the size of Liechtenstein — despite a decline in coal production.

SkyTruth is releasing an update to our Central Appalachian surface mining dataset, which maps the extent of surface mining across the region. While new areas continue to be mined, adding to the cumulative impact of mining on Appalachian ecosystems, the amount of land being actively mined has declined slightly.

This data builds on our work published last year in the journal PLOS ONE, in which we produced the first map ever to show the footprint of surface mining in this region. We designed the data to be updated annually, and today we are releasing the data for 2016, 2017, and 2018.

Mountaintop mine near Wise, Virginia. Copyright Alan Gignoux; courtesy Appalachian Voices; 2014.

Coal production from surface mines, as reported to the US Energy Information Administration (EIA), has declined significantly for the Central Appalachian region since its peak in 2008. Likewise, the area of land being actively mined each year has steadily decreased since 2007. But because new land continues to be mined each year, the overall disturbance to Appalachian ecosystems has increased. From 2016 to 2018 the newly mined areas combined equaled 160 square kilometers – an area the size of the nation of Liechtenstein.

One of the key findings of our research published in PLOS ONE was that the amount of land required to extract a single metric ton of coal had tripled, from approximately 10 square meters in 1985 to nearly 30 square meters in 2015. Our update indicates that this trend still holds for the 2016-2018 period: despite the overall decrease in production, in 2016 approximately 40 square meters of land were disturbed per metric ton of coal produced – an all-time high. This suggests that it is getting harder and harder for companies to access the remaining coal.

Active mine area (blue) and reported surface coal mine production in Central Appalachia (red) as provided by the US Energy Information Administration (EIA). The amount of coal produced has declined much more dramatically than the area of active mining.

This graph shows the disturbance trend for surface coal mining in Central Appalachia. Disturbance is calculated by dividing the area of actively mined land by the reported coal production for Central Appalachia as provided by the EIA.
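The disturbance metric described above boils down to a one-line calculation: actively mined area divided by reported production. A minimal sketch, using illustrative, rounded numbers rather than the actual EIA figures:

```python
def disturbance_per_tonne(active_area_km2: float, production_tonnes: float) -> float:
    """Land disturbance per metric ton of coal, in square meters per tonne."""
    area_m2 = active_area_km2 * 1_000_000  # 1 km^2 = 1,000,000 m^2
    return area_m2 / production_tonnes

# Hypothetical, rounded inputs: 600 km^2 of active mining and
# 15 million tonnes of surface-mined coal yield 40 m^2 per tonne.
print(disturbance_per_tonne(600, 15_000_000))  # 40.0
```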

Tracking the expansion of these mines is only half the battle. We are also developing landscape metrics to assess the true impact of mining on Appalachian communities and ecosystems. We are working to generate a spectral fingerprint for each identified mining area using satellite imagery. This fingerprint will outline the characteristics of each site, including the amount of bare ground present and information about vegetation regrowing on the site. In this way we will track changes and measure recovery by comparing the sites over time to a healthy Appalachian forest.

Mining activity Southwest of Charleston, WV. Land that was mined prior to 2016 is visible in yellow, and land converted to new mining activity between 2016 and 2018 is displayed in red.

Recovery matters. Under federal law, mine operators are required to post bonds for site reclamation in order “to ensure that the regulatory authority has sufficient funds to reclaim the site in the case the permittee fails to complete the approved reclamation plan.” In other words, mining companies set aside money in bonds to make sure that funds are available to recover their sites for other uses once mining ends. If state inspectors determine that mine sites are recovered adequately, then mining companies recover their bonds.

But the regulations are opaque and poorly defined; most states set their own requirements for bond release and requirements vary depending on the state, the inspector, and local landscapes. And as demand for coal steadily declines, coal companies are facing increasing financial stress, even bankruptcy. This underlines the importance of effective bonding that actually protects the public from haphazardly abandoned mining operations that may be unsafe, or unusable for other purposes.

We are now working to track the recovery of every surface coal mine in Central Appalachia. By comparing these sites to healthy Appalachian forests we will be able to grade recovery. This will allow us to examine how fully these sites have recovered, determine to what degree there is consistency in what qualifies for bond-release, and to what extent the conditions match a true Appalachian forest.

What About the Oceans? Mapping Offshore Infrastructure

Mapping stationary structures in the ocean helps us track fishing vessels and monitor pollution more effectively.

We’re all accustomed to seeing maps of the terrestrial spaces we occupy. We expect to see cities, roads, and other features clearly labeled, whether in an atlas on our coffee table or in Google Maps on our smartphone. SkyTruthers even expect to access information about where coal mines are located or where forests are experiencing regrowth. We can now see incredibly detailed satellite imagery of our planet. Try looking for your house in Google Earth. Can you see your car in the driveway?

In comparison, our oceans are much more mysterious places. Over seventy percent of our planet is ocean, yet vast areas are described with only a handful of labels: the Pacific Ocean, Coral Sea, Strait of Hormuz, or Chukchi Sea for example. And while we do have imagery of our oceans, its resolution decreases drastically the farther out from shore you look. It can be easy to forget that humans have a permanent and substantial footprint across the waters of our planet. At SkyTruth, we’re working to change that.

Former SkyTruth senior intern Brian Wong and I are working to create a dataset of offshore infrastructure to help SkyTruth and others more effectively monitor our oceans. If we know where oil platforms, aquaculture facilities, wind farms and more are located, we can keep an eye on them more easily. As technological improvements fuel the growth of the ocean economy, allowing industry to extract resources far out at sea, this dataset will become increasingly valuable. It can help researchers examine the effects of humanity’s expanding presence in marine spaces, and allow activists, the media, and other watchdogs to hold industry accountable for activities taking place beyond the horizon.

What We’re Doing

Brian is now an employee at the Marine Geospatial Ecology Lab (MGEL) at Duke University. But nearly two years ago, at a Global Fishing Watch research workshop in Oakland, he and I discussed the feasibility of creating an algorithm that could identify vessel locations using Synthetic Aperture Radar (SAR) imagery. It was something I’d been working on, on and off, for a few weeks, and the approach seemed fairly simple.

Image 1. SkyTruth and Global Fishing Watch team members meet for a brainstorming session at the Global Fishing Watch Research Workshop, September 2017. Photo credit: David Kroodsma, Global Fishing Watch.

Readers who have been following SkyTruth’s work are probably used to seeing SAR images from the European Space Agency’s Sentinel-1 satellites in our posts. They are our go-to tool for monitoring marine pollution events, thanks to SAR’s ability to pierce clouds and provide high contrast between slicks and sea water. SAR imagery provides data about the relative roughness of surfaces: the satellite sends radar pulses to the Earth’s surface and measures what is reflected back. Flat surfaces, like calm water (or oil slicks), reflect less of this energy back to the satellite sensor than vessels or structures do, and appear dark. Vessels and infrastructure appear bright in SAR imagery because they experience a double-bounce effect: because such structures are three-dimensional, the radar pulse bounces off multiple surfaces and typically reflects back to the satellite more than once. If you’re interested in reading more about how to interpret SAR imagery, this tutorial is an excellent starting point.

Image 2. The long, dark line bisecting this image is a likely bilge dump from a vessel captured by Sentinel-1 on July 2, 2019. The bright point at its end is the suspected source. Read more here.

Image 3. The bright area located in the center of this Sentinel-1 image is Neft Daşları, a massive collection of offshore oil platforms and related infrastructure in the Caspian Sea.

Given the high contrast between water and the bright areas that correspond to land, vessels, and structures (see the vessel at the end of the slick in Image 2 and Neft Daşları in Image 3), we thought that if we could mask out the land, picking out the bright spots should be relatively straightforward. But in order to determine which points were vessels, we first needed to identify the location of all the world’s stationary offshore infrastructure, since it is virtually impossible to differentiate structures from vessels when looking at a single SAR image. Our simple task was turning out to be not so simple.

While the United States has publicly available data detailing the locations of offshore oil platforms (see Image 4), this is not the case for other countries around the world. Even when data is available, it is often hosted across multiple webpages, hidden behind paywalls, or provided in formats which are not broadly accessible or usable. To our knowledge, no one has ever published a comprehensive, global dataset of offshore infrastructure that is publicly available (or affordable).

Image 4. Two versions of a single Sentinel-1 image collected over the Gulf of Mexico, in which both oil platforms and vessels are visible. On the left, an unlabelled version which illustrates how similar infrastructure and vessels appear. On the right, oil platforms have been identified using the BOEM Platform dataset.

As we began to explore the potential of SAR imagery for automated vessel and infrastructure detection, we quickly realized that methods already existed to create the data we wanted. The Constant False Alarm Rate (CFAR) algorithm has been used to detect vessels in SAR imagery since at least 1988, but thanks to Google Earth Engine we are able to scale up the analysis and run it across every Sentinel-1 scene collected to date (something that simply would not have been possible even 10 years ago). Applying the algorithm required, among other things, masking out the land and setting the brightness threshold that indicates the presence of a structure or vessel. Because both structures and vessels reflect strongly, we then had to separate the stationary structures from the vessels. We did this by compiling a composite of all images from 2017: infrastructure remains stationary throughout the year while vessels move, so persistent bright spots in the composite clearly identify the infrastructure.
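The composite idea can be sketched with a toy example. The array values, threshold, and scene count below are invented for illustration (the real pipeline applies CFAR to actual Sentinel-1 scenes in Earth Engine), but the logic is the same: pixels that stay bright across the year are infrastructure, while pixels that are bright only briefly are moving vessels.

```python
import numpy as np

# Toy stack of 12 "monthly" SAR backscatter scenes over the same area
# (dB-like background values; everything here is made up for illustration).
rng = np.random.default_rng(0)
scenes = rng.normal(loc=-20, scale=2, size=(12, 50, 50))

scenes[:, 10, 10] += 30   # a platform: bright in every scene
scenes[3, 25, 40] += 30   # a vessel: bright in one scene only

BRIGHT_DB = -5            # assumed detection threshold (stands in for CFAR)
bright = scenes > BRIGHT_DB

# Fraction of scenes in which each pixel is bright.  Persistent detections
# are stationary infrastructure; occasional detections are vessels.
persistence = bright.mean(axis=0)
infrastructure = persistence > 0.9
transient = (persistence > 0) & ~infrastructure

print(np.argwhere(infrastructure))  # [[10 10]]
```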

Image 5. An early version of our workflow for processing radar imagery to identify vessel locations. While the project shifted to focus on infrastructure detection first, many of the processing steps remained.

Where We Are Now

Our next step in creating the infrastructure dataset was testing the approach in areas where infrastructure locations were known. We tested the algorithm’s ability to detect oil platforms in the Gulf of Mexico, where the US Bureau of Ocean Energy Management (BOEM) maintains a dataset. We also tested the algorithm’s ability to identify wind turbines. We used a wind farm boundary dataset provided by the United Kingdom Hydrographic Office to validate our dataset, as well as information about offshore wind farms in Chinese waters verified in media reports, with their latitude and longitude available on Wikipedia.

Image 6. Wind farms in the Irish Sea, west of Liverpool.

Our results in these test areas have been very promising, with an overall accuracy of 96.1%. The methodology and data have been published in the journal Remote Sensing of Environment. Moving beyond these areas, we are continuing to work with our colleagues at MGEL to develop a full global dataset. What started as a project to identify vessels for GFW has turned into an entirely different, yet complementary, project identifying offshore infrastructure around the world.

Image 7. This animated map shows the output of our offshore infrastructure detection algorithm results (red) compared to the publicly available BOEM Platform dataset (yellow).

In addition to helping our partners at Global Fishing Watch identify fishing vessels, mapping the world’s offshore infrastructure will help SkyTruth more effectively target our daily oil pollution monitoring work on areas throughout the ocean that are at high risk for pollution events from oil and gas drilling and shipping (such as bilge dumping). This is also the first step towards one of SkyTruth’s major multi-year goals: automating the detection of marine oil pollution, so we can create and publish a global map of offshore pollution events, updated on a routine basis.

Be sure to keep an eye out for more updates, as we will be publishing the full datasets once we complete the publication cycles.

New Writer–Editor Amy Mathews Joins SkyTruth Team

Telling SkyTruth’s stories

SkyTruth is both an intensely local and vibrantly global organization. We are based in Shepherdstown, West Virginia, and many of our highly talented staff are long-time residents (some were even born and raised here). That makes our work on Appalachian issues such as mountaintop mining and fracking personal: it’s happening in our backyard, typically with little oversight from government agencies. But confronting global environmental challenges sometimes means reaching beyond local borders and finding the right people to take on a task that no one else has tackled before. And so SkyTruth’s family of staff and consultants includes programmers and others from around the world, plus top-notch research partners at universities and other institutions.

The SkyTruth team in Shepherdstown, West Virginia. Photo by Hali Taylor.

As SkyTruth’s new Writer–Editor, I plan to bring you their stories in coming months, to add to the remarkable findings and tools the staff regularly shares through this blog. We’ll learn more about the people whose passion propels our cutting-edge work. And we’ll learn more about all of you – the SkyTruthers who use these tools and information to make a difference in the world. We’ll share your impact stories: That is, how you’ve made a difference in your neighborhood, state, nation or the world at large.

To start, I’ll share a little bit about myself. As a kid, stories hooked me on conservation. I used to watch every National Geographic special I could find and never missed an episode of Wild Kingdom (remember that?). My fascination with all things wild led me to major in wildlife biology at Cornell University. But I quickly realized that I wasn’t a scientist at heart; I was more interested in saving creatures than studying them. I spent spring semester of my junior year in Washington, D.C. and shifted my focus to environmental policy. That decision led to dual graduate degrees in environmental science and public affairs at Indiana University and a long career in environmental policy analysis, program evaluation, and advocacy in Washington.

Urban life and policy gridlock eventually pushed me to Shepherdstown, where nature was closer at hand. I became involved in Shepherdstown’s American Conservation Film Festival, which reignited my passion for storytelling and the inspiration it can trigger. And so, after years of working and consulting for the federal government, conservation groups and charitable foundations, I returned to my conservation roots. I completed my M.A. in nonfiction writing at Johns Hopkins University in May 2013 and left my policy work behind.

Radio-collared Mexican wolf. Photo courtesy of U.S. Fish & Wildlife Service.

Since then, my writing has appeared in publications such as The Washington Post, Pacific Standard, Scientific American, High Country News, Wonderful West Virginia and other outlets. In fact, my 2018 story on the endangered Mexican wolf for Earth Touch News recently won a Genesis Award from the Humane Society of the United States for Outstanding Online News. I was thrilled to be able to observe a family of wolves as part of my reporting for that story, and I always welcome new opportunities to go out in the field and learn about the important work conservationists are doing.

During my time as a freelance journalist, I also led workshops for the nonprofit science communication organization COMPASS, teaching environmental (and other) scientists how to communicate their work more effectively to journalists, policymakers, and others.

There’s one more thing I’d like to share: Although my official role at SkyTruth as Writer–Editor is new, I’ve known SkyTruth since its very beginning. I still remember the day SkyTruth founder John Amos and I sat down at our dining room table and he told me his vision for a new nonprofit. His goal was to level the playing field by giving those trying to protect the planet the same satellite tools used by industries exploiting the planet. John is my husband, and SkyTruth’s journey has been exciting, frightening, gratifying, and sometimes frustrating, with many highs and the occasional low. But it has never been boring.

I’m looking forward to sharing SkyTruth stories with all of you, making sure they move beyond the dining room table to your homes and offices, inspiring you, your colleagues, your friends and families to make the most of what SkyTruth has to offer. Feel free to reach out to me at info@skytruth.org if you’d like to share how you’ve used SkyTruth tools and materials. Just include my name in the subject line and the words “impact story.” Let’s talk!

Note: Portions of this text first appeared on the website amymathewsamos.com.

Using machine learning to map the footprint of fracking in central Appalachia

Fossil fuel production has left a lasting imprint on the landscapes and communities of central and northern Appalachia.  Mountaintop mining operations, pipeline rights-of-way, oil and gas well pads, and hydraulic fracturing wastewater retention ponds dot the landscapes of West Virginia and Pennsylvania.  And although advocacy groups have made progress pressuring regulated industries and state agencies for greater transparency, many communities in central and northern Appalachia are unaware of, or unclear about, the extent of the human health risks they face from exposure to these facilities.

A key challenge is the discrepancy that often exists between what is on paper and what is on the landscape.  It takes time, money, and staff (three rarities for state agencies always under pressure to do more with less) to map energy infrastructure, and to keep those records updated and accessible for the public.  But with advancements in deep learning, and with the increasing amount of satellite imagery available from governments and commercial providers, it might be possible to track the expansion of energy infrastructure—as well as the public health risks that accompany it—in near real-time.

Figure 1.  Oil and gas well pad locations, 2005 – 2015.

Mapping the footprint of oil and gas drilling, especially unconventional drilling or “fracking,” is a critical piece of SkyTruth’s work.  Since 2013, we’ve conducted collaborative image analysis projects called “FrackFinder” to fill the gaps in publicly available information about the location of fracking operations in the Marcellus and Utica Shale.  In the past, we relied on several hundred volunteers to identify and map oil and gas well pads throughout Ohio, Pennsylvania, and West Virginia.  But we’ve been working on a new approach: automating the detection of oil and gas well pads with machine learning.  Rather than train several hundred volunteers to identify well pads in satellite imagery, we developed a machine learning model that could be deployed across thousands of computers simultaneously.  Machine learning is at the heart of many of today’s technology companies. It’s the technology that enables Netflix to recommend new shows that you might like, or that allows digital assistants like Google, Siri, or Alexa to understand requests like, “Hey Google, text Mom I’ll be there in 20 minutes.”

Examples are at the core of machine learning.  Rather than try to “hard code” all of the characteristics that define a modern well pad (they are generally square, generally gravel, and generally littered with industrial equipment), we teach computers what they look like by using examples.  Lots of examples. Like, thousands or even millions of them, if we can find them. It’s just like with humans: the more examples of something that you see, the easier it is to recognize that thing later. So, where did we get a few thousand images of well pads in Pennsylvania?  

We started with SkyTruth’s Pennsylvania oil and gas well pad dataset. The dataset contains well pad locations identified in National Agriculture Imagery Program (NAIP) aerial imagery from 2005, 2008, 2010, 2013, and 2015 (Figure 1).  We uploaded this dataset to Google Earth Engine, and used it to create a collection of 10,000 aerial images in two classes: “well pad” and “non-well pad.” We created the training images by buffering each well pad by 100 meters, clipping the NAIP imagery to the bounding box, and exporting each image.
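The buffering step can be sketched in plain Python. This is an illustrative approximation only: the actual workflow used Earth Engine’s geometry operations, and the coordinates below are hypothetical.

```python
import math

def bounding_box(lat: float, lon: float, buffer_m: float = 100.0):
    """Approximate geographic bounding box around a point, buffered in meters.

    Small-distance approximation: one degree of latitude is ~111,320 m,
    and a degree of longitude shrinks by cos(latitude).
    """
    dlat = buffer_m / 111_320.0
    dlon = buffer_m / (111_320.0 * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)  # (W, S, E, N)

# A hypothetical well pad location in central Pennsylvania:
print(bounding_box(41.2, -77.0))
```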

 

The images above show three training examples from our “well pad” class. The images below show three training examples taken from our “non-well pad” class.

 

We divided the dataset into three subsets: a training set with 4,000 images of each class, a validation set with 500 images of each class, and a test set with 500 images of each class.  We combined this work in Google Earth Engine with Google’s powerful TensorFlow deep learning library.  We used our 8,000 training images (4,000 from each class, remember) and TensorFlow’s high-level Keras API to train our machine learning model.  So what, exactly, does that mean? Well, basically, it means that we showed the model thousands and thousands of examples of what well pads are (i.e., images from our “well pad” class) and what well pads aren’t (i.e., images from our “non-well pad” class).  We trained the model for twenty epochs, meaning that we showed the model the entire training set (8,000 images, remember) twenty times.  So, basically, the model saw 160,000 examples, and over time, it “learned” what well pads look like.
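The split described above can be sketched for one class as follows. The file names and random seed here are placeholders (the real image chips came from the NAIP exports), but the 4,000/500/500 proportions match what we used:

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle one class's 5,000 image paths into train/val/test (4,000/500/500)."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # deterministic shuffle for reproducibility
    return paths[:4000], paths[4000:4500], paths[4500:5000]

# 5,000 placeholder filenames standing in for one class of image chips:
well_pads = [f"well_pad_{i:04d}.png" for i in range(5000)]
train, val, test = split_dataset(well_pads)
print(len(train), len(val), len(test))  # 4000 500 500

# Two classes x 4,000 training chips, shown for 20 epochs:
print(2 * 4000 * 20)  # 160000 examples seen during training
```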

Our best model run returned an accuracy of 84%, precision and recall measures of 87% and 81%, respectively, and a false positive rate and false negative rate of 0.116 and 0.193, respectively.  We’ve been pleased with our initial model runs, but there is plenty of room for improvement. We started with the VGG16 model architecture that comes prepackaged with Keras (Simonyan and Zisserman 2014, Chollet 2018).  The VGG16 model architecture is no longer state-of-the-art, but it is easy to understand, and it was a great place to begin.  
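All of these metrics come straight from confusion-matrix counts. The counts below are hypothetical, chosen only to roughly reproduce the figures we report on a balanced 1,000-image test set (500 chips per class):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),            # how many detections are real
        "recall": tp / (tp + fn),               # how many real pads are found
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical counts (not our actual confusion matrix):
m = classification_metrics(tp=404, fp=58, tn=442, fn=96)
print(m)  # accuracy ~0.85, precision ~0.87, recall ~0.81, FPR 0.116
```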

After training, we ran the model on a few NAIP images to compare its performance against well pads collected by SkyTruth volunteers for our 2015 Pennsylvania FrackFinder project.  Figures 4 and 6 depict the model’s performance on two NAIP images near Williamsport, PA. White bounding boxes indicate landscape features that the model predicted to be well pads.  Figures 5 and 7 depict those same images with well pads (shown in red) delineated by SkyTruth volunteers.

Figure 4.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 5.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.
Figure 6.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 7.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.

One of the first things that stood out to us was that our model is overly sensitive to strong linear features.  In nearly every training example, there is a clearly defined access road that connects to the well pad. As a result, the model regularly classified large patches of cleared land or isolated developments (e.g., warehouses) at the end of a linear feature as well pads.  Another major weakness is that our model is overly sensitive to active well pads.  Active well pads tend to be large, gravel squares with clearly defined edges. Although these well pads may be the biggest concern, there are many “reclaimed” and abandoned well pads that lack such clearly defined edges.  Regrettably, our model is overfit to highly visible active well pads, and it performs poorly on lower-visibility drilling sites that have lost their square shape or that have been revegetated by grasses.

Nevertheless, we think this is a good start.  Despite a number of false detections, our model was able to detect all of the well pads previously identified by volunteers in images 5 and 7 above.  In several instances, false detections consisted of energy infrastructure that, although not active well pads, remain of high interest to environmental and public health advocates as well as state regulators: abandoned well pads, wastewater impoundments, and recent land clearings.  NAIP imagery is only collected every two or three years, depending on funding. So, tracking the expansion of oil and gas drilling activities in near real-time will require access to a high resolution, near real-time imagery stream (like Planet, for instance).  For now, we’re experimenting with more current model architectures and with reconfiguring the model for semantic segmentation — extracting polygons that delineate the boundaries of well pads which can be analyzed in mapping software by researchers and our partners working on the ground.
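As a sketch of that last step: once a model produces a per-pixel prediction mask, adjacent positive pixels can be grouped into discrete detections, each with a bounding box (or, with more work, a polygon). This toy implementation is illustrative only, not our production code:

```python
def mask_to_boxes(mask):
    """Group adjacent positive pixels (4-connectivity) and return one
    (row_min, col_min, row_max, col_max) bounding box per component."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Flood-fill one connected component, tracking its extent.
                stack, box = [(r, c)], [r, c, r, c]
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    box = [min(box[0], y), min(box[1], x),
                           max(box[2], y), max(box[3], x)]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append(tuple(box))
    return boxes

# Toy 5x6 prediction mask with two detected "well pads":
mask = [[0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0]]
print(mask_to_boxes(mask))  # [(0, 1, 1, 2), (2, 4, 3, 5)]
```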

Keep checking back for updates.  We’ll be posting the training data that we created, along with our initial models, as soon as we can.