Intern Jonathan Kvilhaug’s monitoring of mountaintop removal mining included discovering data gaps on the ground.
When I joined SkyTruth in January 2022, my spring internship coincided with my last semester of primarily virtual graduate school. During my program, I was interested in pursuing internship experiences that would supplement my study of Geography while also providing an opportunity to advance an organization’s mission. While I was already familiar with SkyTruth’s impressive portfolio of work, I later came to be most impressed by SkyTruth’s sense of community and support for one another. Without having left Washington, D.C., I felt welcomed into SkyTruth’s community of kind and diligent practitioners of conservation.
Early on, I had expressed interest in having my internship with SkyTruth serve as a culminating capstone experience for my program alongside other regular tasks associated with my internship. I was pleased to have several enthusiastic meetings to help me better navigate SkyTruth and determine where our interests would align. Ultimately, I became interested in a project to build on SkyTruth’s existing work monitoring for mountaintop removal mining (MTM) and to expand its surveillance of coal mining activities to include Pennsylvania and incorporate smaller surface mining operations.
Expansion of MTM Monitoring
SkyTruth’s monitoring of surface mining operations, among its longest-running and most robust areas of work, began as a partnership with Appalachian Voices to survey MTM in Central Appalachia. There, mining permits and other official supplemental datasets did not provide a full account of the intensity of mining activity and its ecological damage, especially when mining activities were observed outside of permitted areas. Among its findings, in 2005 SkyTruth revealed an estimated 445,792 acres of new mining activities in the region over the prior 30 years, and in 2018 it mapped a total of 1.5 million acres of mined areas within the region. As a result of these findings, SkyTruth created its Mountaintop Removal Mining Mapper, a dashboard that maintains the footprint of historic mining activities revealed by the 2018 investigation.
For my capstone project, the methodology deployed in Central Appalachia was adapted for Pennsylvania to develop a footprint of its own historic mining activities from 1985 – 2020 and assess the success of the model’s classification in a different geographic extent. The development of an annual mining footprint for Pennsylvania provides an opportunity to incorporate the state into SkyTruth’s existing MTM Monitoring dashboard as well as to provide valuable information for enforcement of the Surface Mining Control and Reclamation Act of 1977.
The first step in expanding monitoring outside of Central Appalachia was to determine a study area. We began by identifying every U.S. county that had reported coal mining activities between 1985 and 2020. This information comes from the Energy Information Administration’s (EIA) annual coal mining production dataset. Once the subset of counties that reported coal mining was compiled, we had a list of states that would be incorporated into SkyTruth’s MTM workflow.
After we had our counties, we then removed places from our analysis where coal mining activities could not occur, including towns, roadways, and bodies of water. This helps us avoid misclassifying pavement, barren stream channels and riverbanks, and land-clearing for residential development as mining activity. For counties with coal mining production, areas of undeveloped land are the target for observing meaningful changes in land cover that are more likely to correspond with surface mining activities.
Distinct from underground coal mining operations, surface coal mining and MTM require the clearing of overlying forests. This results in a distinguishable pattern of land cover change that assists in the detection of mining. We created a mask of non-mining areas in order to target surface coal mining and MTM in Pennsylvania. In my investigation, I primarily used the Normalized Difference Vegetation Index (NDVI) as a metric for detecting mining activities. NDVI is a ratio computed from the red and near-infrared reflectance recorded in satellite imagery, and it indicates the amount of vegetation (“greenness”) in an area. Changes in NDVI values across the remaining undeveloped areas of Pennsylvania indicate land cover change, and areas with very low NDVI values (no vegetation) are flagged as suspected mining activity.
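To make the metric concrete, here is a minimal sketch of the standard NDVI formula, (NIR - Red) / (NIR + Red), for a single pixel. The reflectance values are illustrative assumptions, not measurements from this project, and returning 0 for a pixel with no signal is a choice made only for this sketch.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        # Assumption for this sketch: a pixel with no signal gets NDVI 0.
        return 0.0
    return (nir - red) / total


# Illustrative reflectances (hypothetical): healthy forest reflects strongly
# in the near-infrared and weakly in red; bare earth reflects similarly in both.
forest = ndvi(nir=0.50, red=0.05)
bare_earth = ndvi(nir=0.25, red=0.20)

print(f"forest NDVI:     {forest:.2f}")
print(f"bare earth NDVI: {bare_earth:.2f}")
```

A cleared mine site therefore shows up as a sharp drop in NDVI relative to the forest around it.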
In order to capture land cover change across the state of Pennsylvania over time, the analysis of NDVI needed to be performed for each year from 1985 to 2020. For each year, all of the Landsat images available covering a given piece of Pennsylvania were analyzed to create one single image – an annual greenest-pixel composite – by keeping only the greenest pixels (having maximum NDVI value) from the annual stack of images. This helps minimize the natural variability that occurs from year to year due to changes in seasons and precipitation. After each of these annual greenest-pixel composites was created, they were stored in Google Earth Engine where comparisons and analyses were performed.
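The compositing idea can be sketched in plain Python on tiny NDVI grids. This is an illustration under stated assumptions, not the project's code: the grids and the use of `None` for cloudy or missing pixels are hypothetical, and the sketch keeps only the NDVI value, whereas a real composite would keep the whole pixel from the greenest image. In Google Earth Engine, composites of this kind are commonly built with `ImageCollection.qualityMosaic`, though the post does not say which call was used.

```python
from typing import List, Optional

# One NDVI grid per satellite pass; None marks a cloudy or missing pixel.
Image = List[List[Optional[float]]]


def greenest_pixel_composite(stack: List[Image]) -> Image:
    """Keep, for every pixel, the maximum NDVI seen across the year's images."""
    rows, cols = len(stack[0]), len(stack[0][0])
    composite: Image = []
    for r in range(rows):
        row: List[Optional[float]] = []
        for c in range(cols):
            values = [img[r][c] for img in stack if img[r][c] is not None]
            row.append(max(values) if values else None)
        composite.append(row)
    return composite


# Three hypothetical 2x2 images from the same year.
spring = [[0.2, 0.8], [None, 0.10]]
summer = [[0.6, 0.7], [0.3, 0.05]]
autumn = [[0.4, None], [0.5, 0.12]]

composite = greenest_pixel_composite([spring, summer, autumn])
print(composite)
```

The bottom-right pixel stays low in every image, which is the signature of persistent bare ground rather than a seasonal dip.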
From these annual composites, we determined a county-specific mining threshold by calculating the value that bounds the lowest three percent of NDVI values in each Pennsylvania county. This county-level threshold defines areas of low NDVI (bare earth) across Pennsylvania. Because the mask of towns, roads, and bodies of water has already been applied, the thresholded areas represent unvegetated land that is not explained by urban/suburban development, roads, or water.
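A rough sketch of the thresholding step follows. The nearest-rank percentile method, the county names, and the synthetic NDVI values are all assumptions made for illustration; the post does not specify how the three percent cutoff was computed.

```python
import math
from typing import Dict, List


def low_ndvi_threshold(values: List[float], percent: float = 3.0) -> float:
    """NDVI value bounding the lowest `percent` of pixels (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def flag_bare_earth(values: List[float], threshold: float) -> List[bool]:
    """True where a pixel's NDVI is at or below the county threshold."""
    return [v <= threshold for v in values]


# Hypothetical counties with synthetic (already masked) NDVI pixel values.
county_ndvi: Dict[str, List[float]] = {
    "County A": [i / 100 for i in range(1, 101)],
    "County B": [0.5 + i / 200 for i in range(100)],
}

thresholds = {name: low_ndvi_threshold(v) for name, v in county_ndvi.items()}
flags = {name: flag_bare_earth(v, thresholds[name]) for name, v in county_ndvi.items()}

for name in county_ndvi:
    print(name, thresholds[name], sum(flags[name]), "pixels flagged")
```

Computing the cutoff per county rather than statewide means a heavily forested county and a sparsely vegetated one each get a threshold suited to their own baseline greenness.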
Following the thresholding of annual greenest-pixel extents, footprints of surface mining activity can be developed in Google Earth Engine. Before this step can be completed for Pennsylvania, an accuracy assessment is performed to verify and validate the success of SkyTruth’s detection.
An accuracy assessment, the comparison of manually identified mining areas with SkyTruth’s unsupervised classification, is a quality assurance measure that is required to determine whether the classification methodology we applied in Central Appalachia is applicable in other coal-mining regions. The results of the accuracy assessment can then serve as a guide for understanding if different regional geographies will require alternative approaches for mine identification.
To manually classify areas of mining within Pennsylvania, randomly generated sample areas of 176 km² each were defined within the state. Imagery from seven-year intervals (1985, 1992, 1999, 2006, 2013, 2020) was downloaded for manual inspection and classification by our analysts, conforming with the method used in the accuracy assessment for the Central Appalachian study. Within each sample area 5,000 points were randomly selected for each year to avoid sampling bias.
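The sampling design can be sketched as follows. This is only an illustration: the square shape of the sample area, the local kilometre coordinates, and the use of the year as a random seed are assumptions, since the post states only the area, the years, and the point count.

```python
import math
import random
from typing import Dict, List, Tuple

YEARS = list(range(1985, 2021, 7))  # 1985, 1992, 1999, 2006, 2013, 2020
AREA_KM2 = 176
POINTS_PER_YEAR = 5000

# Assumption: treat the sample area as a square in local kilometre coordinates.
SIDE_KM = math.sqrt(AREA_KM2)


def sample_points(n: int, side_km: float, seed: int) -> List[Tuple[float, float]]:
    """Draw n uniformly random (x, y) points inside a square sample area."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [(rng.uniform(0, side_km), rng.uniform(0, side_km)) for _ in range(n)]


points_by_year: Dict[int, List[Tuple[float, float]]] = {
    year: sample_points(POINTS_PER_YEAR, SIDE_KM, seed=year) for year in YEARS
}

print(YEARS)
print({year: len(points) for year, points in points_by_year.items()})
```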
Once manually classified by the analysts, these control points (mining areas, non-mining areas, and null areas) are compared to the mining footprints generated by the SkyTruth method. We examined the relative share of points that agreed with the unsupervised classification (i.e., mining observations occurring within mining footprints; non-mining points falling outside of mining footprints) and the error types (false-positive identification of mining, and false-negative identification of mining) to assess the accuracy of the unsupervised classification results.
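The comparison described above amounts to a small confusion matrix. The sketch below is a hedged illustration: the point counts are invented for the example and are not results from this project, and dropping null points before scoring is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# (analyst_label, inside_footprint); analyst_label is None for a null point.
Point = Tuple[Optional[bool], bool]


def assess(points: Iterable[Point]) -> Dict[str, float]:
    """Tally agreement and error types between analysts and the classifier."""
    counts: Counter = Counter()
    for manual, predicted in points:
        if manual is None:
            continue  # assumption: null points are excluded from scoring
        if manual and predicted:
            counts["true_positive"] += 1
        elif not manual and not predicted:
            counts["true_negative"] += 1
        elif predicted:
            counts["false_positive"] += 1  # footprint claims mining, analyst disagrees
        else:
            counts["false_negative"] += 1  # analyst saw mining, footprint missed it
    total = sum(counts.values())
    agree = counts["true_positive"] + counts["true_negative"]
    return {
        "agreement": agree / total if total else 0.0,
        "false_positive": counts["false_positive"],
        "false_negative": counts["false_negative"],
        "scored_points": total,
    }


# Invented control points, for illustration only.
points = (
    [(True, True)] * 40
    + [(False, False)] * 50
    + [(False, True)] * 6
    + [(True, False)] * 4
    + [(None, False)] * 10
)
result = assess(points)
print(result)
```

Reporting the two error types separately matters here, because missing real mining and over-reporting mining have different consequences for enforcement.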
At the outset of my capstone project, a primary objective was to use the EIA’s coal production dataset for the years 1985–2020 to define a comprehensive list of US counties with reported coal mining activities to scope out future state-based study areas. During the initial audit of the dataset, however, we discovered clerical problems that misrepresented areas of coal mining. Major issues were found within the EIA data, including systematic spelling errors, missing data, inconsistent formatting, and, most importantly, no way to uniformly and consistently attribute records and figures to a precise location.
As a result of this discovery, data cleaning became a major part of this capstone project and resulted in additional deliverables for SkyTruth in tandem with the footprint for Pennsylvania. Additionally, further work to investigate the increasing amount of land required to yield a ton of coal can utilize cleaned figures for production data at the level of US counties.
Data Cleaning & EIA Data Products
The primary issue with the EIA’s dataset was that there were systematic errors in the recording of the attribute County Name, on which the dataset chiefly relies to organize its records. In certain cases, such as counties with shared names across coal-producing states (Jefferson, Clay, Fayette, etc.), figures were incorrectly reported as occurring in a different state (~25% of the total dataset). In other instances, counties such as Braxton County, WV and Claiborne County, TN were reported as Raxton [sic] and Clairborne [sic]. These spelling errors resulted in missing mining activities in the corresponding county (~14% of the total dataset). From 1985–1997 there were a handful of records of a mine site with an ‘Unknown’ county within various states (~7% of the total dataset).
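One way to handle the problems described above is to key every record on the state and county together and to apply a small table of known spelling corrections. The sketch below is an assumption about how such cleaning could look, not the workflow actually used: the correction table contains only the two misspellings named above, and the sample records are hypothetical.

```python
from typing import List, Optional, Tuple

# Known misspellings, keyed by (state, county) so a fix in one state
# never touches a same-named county in another.
CORRECTIONS = {
    ("WV", "RAXTON"): "BRAXTON",
    ("TN", "CLAIRBORNE"): "CLAIBORNE",
}


def clean_county(state: str, county: str) -> Tuple[str, Optional[str]]:
    """Normalize case and whitespace, fix known typos, flag unknown counties."""
    state_key = state.strip().upper()
    county_key = " ".join(county.split()).upper()
    county_key = CORRECTIONS.get((state_key, county_key), county_key)
    if county_key == "UNKNOWN":
        return state_key, None  # leave for manual review rather than guess
    return state_key, county_key.title()


# Hypothetical raw records as (state, county name).
raw: List[Tuple[str, str]] = [
    ("WV", "Raxton"),
    ("tn", " Clairborne "),
    ("PA", "Jefferson"),
    ("AL", "JEFFERSON"),
    ("KY", "Unknown"),
]
cleaned = [clean_county(state, county) for state, county in raw]
print(cleaned)
```

Keeping the state in the key is what keeps Jefferson County, PA and Jefferson County, AL as separate entries after cleaning.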
Extensive data cleaning was required to create an accurate and complete record of US counties with reported coal mining production. The revised dataset contains 160,000 records, attributed to 303 U.S. counties across 35 years of production data.
Of added importance to this project was the discovery that the EIA’s raw dataset, with its systematic errors and unrepresentative data, has been used in both the EIA’s official map of coal mining activities and a feature dataset produced by Esri.
This discovery felt deeply related to SkyTruth’s mission to have an accurate understanding of the spatial extent of coal mining activities. Despite certain data limitations remaining, SkyTruth’s newly compiled extent of coal production by county serves as the best comprehensive record of coal production figures for each county and state.
Through this project, it is apparent to me that SkyTruth can be a leader in monitoring extractive activities both in the United States and globally. My capstone project demonstrates mission continuity with ongoing projects in South America as well as in SkyTruth’s further goals to deploy machine learning in earth observation and land cover change analysis.
Future Deliverables & Final Thoughts
Over the course of this project, my initial objective was narrowly focused on the detection of surface coal mining activities within Pennsylvania. However, through working with supplementary datasets and other information, I found myself spending a great deal of time considering the limitations of those products and even more time auditing and cleaning them to provide a useful product to leave with SkyTruth. I am looking forward to seeing a footprint of coal surface mining activities for Pennsylvania as well as the future adoption and utilization of the datasets that I developed.
As a student and early career professional I enjoyed the opportunity at SkyTruth to shape the direction of my experience and to determine how to advance the route of investigations and other projects. In efforts ranging from my work with surface mining in Pennsylvania, to monitoring marine oil pollution, and supporting conservation work for Cerulean, I have found my time at SkyTruth to be incredibly special and dear to me. Even though I was virtual for the entirety of my experience, the entire team went out of their way to attend my capstone presentation virtually and offer their support as I described my work with SkyTruth.
I would like to offer special thanks to both Brendan Jarrell and Christian Thomas for their direct oversight and mentorship, and to the entire team for their guidance and support!