Extensive aerial LiDAR data enables canopy height mapping across the Central Appalachian mining region
Aerial LiDAR allows us to remotely assess the 3D structure of vegetation at a high level of detail and accuracy. This depth of analysis is especially valuable for places like Central Appalachia, the epicenter of surface coal mining practices that have devastated native forests. While LiDAR data coverage was previously limited to small areas, a recent increase in publicly available data makes it possible for us to map canopy height at 10-meter resolution across 73 counties in Central Appalachia. This new data will contribute to SkyTruth’s ongoing work to expose the profound impacts of mountaintop mining on Appalachian ecosystems.
Canopy height (right) adds critical depth to satellite imagery (left) when assessing vegetation recovery in mined areas. Image developed by SkyTruth.
The gold standard of forest structure data
A canopy height model (CHM) is a continuous raster, or pixel grid, representing the height of vegetation above the ground. While spectral characteristics from satellite imagery can offer a sense of where vegetation is present and how healthy it may be, canopy height data adds valuable insight into the vertical structure of a forest, making it a useful proxy for above-ground biomass. These metrics provide vital information about forest performance, from carbon sequestration to post-disturbance succession and ecosystem productivity.
Various models exist to estimate canopy height, but aerial LiDAR (Light Detection and Ranging) is considered the gold standard because it is a direct measurement of forest structure.
By scanning the Earth’s surface with lasers beamed from an airplane, a LiDAR system generates a detailed 3D model of the ground and every protruding feature, whether natural or human-built. The resulting data output is a point cloud: a collection of points representing the precise locations of objects encountered by the laser, each with x, y and z coordinates. These points are typically classified into “ground” and “not ground,” allowing us to distinguish between discrete objects and the underlying bare earth surface.
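To make the data structure concrete, here is a minimal Python sketch of a classified point cloud. The `Point` type, the `split_ground` helper, and the sample coordinates are hypothetical and invented for illustration. The one borrowed convention is that the ASPRS LAS format, widely used for LiDAR point clouds, assigns classification code 2 to ground returns.

```python
from typing import NamedTuple

GROUND_CLASS = 2  # ASPRS LAS classification code for ground returns


class Point(NamedTuple):
    """One LiDAR return (hypothetical type, for illustration only)."""
    x: float  # easting
    y: float  # northing
    z: float  # elevation
    classification: int


def split_ground(points):
    """Separate ground returns from everything else."""
    ground = [p for p in points if p.classification == GROUND_CLASS]
    not_ground = [p for p in points if p.classification != GROUND_CLASS]
    return ground, not_ground


# Made-up sample: two ground returns and two returns from above the ground.
cloud = [
    Point(0.5, 0.5, 300.0, 2),
    Point(1.5, 0.5, 300.4, 2),
    Point(0.6, 0.4, 318.2, 5),  # class 5 = high vegetation in LAS
    Point(1.4, 0.6, 312.7, 1),  # class 1 = unclassified in LAS
]
ground, not_ground = split_ground(cloud)
```

Real point clouds hold millions of returns per tile, so a production workflow would read them with a dedicated LiDAR library rather than build Python lists by hand.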
LiDAR data is expensive and time-consuming to collect. As a result, coverage was sparse just a few years ago. That changed in 2016 when the U.S. Geological Survey initiated the 3D Elevation Program (3DEP), a massive effort to acquire high-quality LiDAR with nationwide coverage. By the end of 2023, the program had completed data collection for 95% of the nation.
We now have wall-to-wall point cloud data for the entirety of SkyTruth’s mountaintop mining study region in Central Appalachia. This data is publicly available in the form of classified point clouds and digital elevation models (DEMs), and it holds enormous potential for processing into CHMs and other useful data products.
Big data, big challenges
Conceptually, the process of creating a CHM from a point cloud is relatively straightforward. Rasterizing only the ground-classified points produces a digital terrain model (DTM) representing the bare, topographic surface of the Earth. Rasterizing the uppermost layer of points creates a digital surface model (DSM), an elevation model that captures the top of the vegetation and of any other feature standing above the ground. From there, a simple subtraction of the DTM from the DSM leaves us with just the height of the vegetation above the ground surface, or the CHM.
In short: digital surface model (DSM) − digital terrain model (DTM) = canopy height model (CHM).
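The subtraction can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the pipeline used for the regional model: points are `(x, y, z, classification)` tuples, ground returns carry LAS class code 2, the DTM is the mean ground elevation in each cell, the DSM is the highest return in each cell, and cells without ground returns are dropped rather than interpolated. The function names and sample values are hypothetical.

```python
import math
from collections import defaultdict

GROUND_CLASS = 2  # ASPRS LAS classification code for ground returns


def rasterize(points, cell_size, reducer):
    """Bin (x, y, z) points into square cells and reduce each cell's z values."""
    cells = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(z)
    return {key: reducer(zs) for key, zs in cells.items()}


def canopy_height_model(points, cell_size=10.0):
    """Return {cell: height} from (x, y, z, classification) tuples."""
    ground = [(x, y, z) for x, y, z, c in points if c == GROUND_CLASS]
    everything = [(x, y, z) for x, y, z, _ in points]

    # DTM: bare-earth elevation, here the mean of ground returns per cell.
    dtm = rasterize(ground, cell_size, lambda zs: sum(zs) / len(zs))
    # DSM: the highest return of any class in each cell.
    dsm = rasterize(everything, cell_size, max)

    # CHM = DSM - DTM, computed only where a cell has ground returns.
    # Small negative differences are clamped to zero.
    return {cell: max(0.0, dsm[cell] - dtm[cell]) for cell in dtm}


# Made-up points: one forested 10 m cell and one bare cell.
points = [
    (2.0, 3.0, 300.0, 2),
    (7.0, 8.0, 301.0, 2),
    (5.0, 5.0, 318.5, 5),
    (12.0, 4.0, 305.0, 2),
]
chm = canopy_height_model(points)
# chm == {(0, 0): 18.0, (1, 0): 0.0}
```

Taking the per-cell maximum for the DSM is only one possible choice. A high percentile is more robust to stray returns from birds or sensor noise.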
While this calculation is simple at the scale of a single 1-km² “tile” of data, challenges arose when we scaled the process up to the vast quantity of data for all of Central Appalachia – a study region spanning 73 counties and over 83,000 km². Processing many gigabytes of high-resolution spatial data in a timely manner required more advanced cloud computing methods enabled by Google Cloud Platform.
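At one tile per square kilometer, a region of this size implies tens of thousands of tiles, so the per-tile work has to run in parallel and tolerate individual failures. The sketch below shows that pattern on a single machine with Python's standard `concurrent.futures` module. It is a local stand-in only and says nothing about how the Google Cloud services were actually configured. The `run_tiles` and `fake_process_tile` functions, the tile IDs, and the output file names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tiles(tile_ids, process_tile, max_workers=8):
    """Apply process_tile to every tile ID, collecting results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_tile, tid): tid for tid in tile_ids}
        for future in as_completed(futures):
            tid = futures[future]
            try:
                results[tid] = future.result()
            except Exception as exc:  # one bad tile should not stop the run
                failures[tid] = exc
    return results, failures


def fake_process_tile(tile_id):
    """Stand-in for the real per-tile work (read, rasterize, subtract)."""
    if tile_id == "tile_0003":
        raise ValueError("corrupt point cloud")
    return f"{tile_id}_chm.tif"


tile_ids = [f"tile_{i:04d}" for i in range(5)]
results, failures = run_tiles(tile_ids, fake_process_tile)
```

Threads suit work dominated by downloads and file reads. Work dominated by computation is better spread across separate processes or machines, which is where cloud services come in.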
Furthermore, standardizing this data across a patchwork of state LiDAR acquisition projects, each with different coordinate systems, units, and data collection methods, was no small task. Some of these inconsistencies remain irreconcilable. For instance, various state agencies contracted with different LiDAR companies, who used different sensors, flew their airplanes at different altitudes, pulsed their scanning lasers at different frequencies and wavelengths, and produced point clouds of different densities. All of these factors can influence the way canopy height is measured.
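Unit differences are among the inconsistencies that can be reconciled mechanically. As an illustration, the sketch below converts heights to meters before datasets are combined. It assumes, hypothetically, that one project reports heights in meters and another in U.S. survey feet. The project names and values are invented, and reprojecting between horizontal coordinate systems is a separate step that is not shown.

```python
# Exact conversion factors to meters.
METERS_PER_UNIT = {
    "m": 1.0,
    "ft": 0.3048,          # international foot
    "us-ft": 1200 / 3937,  # U.S. survey foot
}


def to_meters(values, unit):
    """Convert a sequence of elevations or heights to meters."""
    try:
        factor = METERS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown vertical unit: {unit!r}") from None
    return [v * factor for v in values]


# Hypothetical per-project metadata. Real units would come from each
# acquisition's documentation or file headers.
projects = {
    "project_a": {"unit": "m", "heights": [18.0, 22.5]},
    "project_b": {"unit": "us-ft", "heights": [59.0, 73.8]},
}
harmonized = {
    name: to_meters(p["heights"], p["unit"]) for name, p in projects.items()
}
```

Raising an error for an unrecognized unit is deliberate: silently treating feet as meters would inflate canopy heights by a factor of about three.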
About the model
We produced the region-wide CHM using free and open-source geospatial tools integrated into a Python scripting environment, with the exception of the paid Google Cloud services we used to accelerate the processing of large numbers of files. Our goal in pursuing this open-source approach was to make the methodology as transparent, accessible, and reproducible as possible. Reproducibility matters for ethical spatial data science because it allows other researchers to use, modify, and continually improve the data to create meaningful impact. It also has a practical payoff: the ability to easily replicate the CHM will be critical when updated LiDAR data is acquired – and that time may come sooner than we think.
The 3DEP’s current goal is to update nationwide LiDAR coverage in 8-year cycles, meaning that the next round of data collection will occur over the next few years. Many states, including Kentucky, are already achieving update frequencies higher than the 3DEP goal. In addition, a 2022 survey commissioned by the USGS found that most topography experts require higher update frequencies and quality levels for their work than the 3DEP has planned, so the program’s goals may change accordingly.
The fact that a LiDAR-derived CHM represents direct measurements rather than modeled predictions is both its strongest asset and its primary limitation. It is far more accurate than any estimate derived from proxies, but it represents a single point in time and thus cannot, on its own, be used to analyze temporal change. An updated region-wide CHM would serve as a powerful point of comparison against the original model to assess the status of vegetation loss and recovery in Central Appalachia’s mined areas.
What’s next?
In the meantime, there are plenty of applications for the current CHM. For instance, we can compare canopy height between healthy reference forests, abandoned mine lands, and mined areas with active reforestation programs to compile empirical data about the success of forest recovery efforts. Providing this data freely to the public and to our non-profit partners will lead to more effective monitoring and enforcement of mined land restoration, which is federally mandated by the Surface Mining Control and Reclamation Act of 1977 but all too often neglected.
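A comparison like this amounts to summarizing canopy height within each land category. The sketch below shows one way that could look, assuming the CHM and a land-class map are both keyed by the same grid cells. The `height_by_class` function, the class labels, and every height value are hypothetical and chosen only to illustrate the calculation.

```python
from collections import defaultdict
from statistics import mean, median


def height_by_class(chm, land_class):
    """Summarize canopy height per land class.

    chm and land_class are dicts keyed by the same cell IDs.
    """
    grouped = defaultdict(list)
    for cell, height in chm.items():
        label = land_class.get(cell)
        if label is not None:  # skip cells with no class assigned
            grouped[label].append(height)
    return {
        label: {"n": len(h), "mean": mean(h), "median": median(h)}
        for label, h in grouped.items()
    }


# Made-up heights in meters and made-up labels, for illustration only.
chm = {
    (0, 0): 24.0, (0, 1): 26.0,
    (1, 0): 3.0, (1, 1): 5.0,
    (2, 0): 11.0, (2, 1): 9.0,
}
land_class = {
    (0, 0): "reference_forest", (0, 1): "reference_forest",
    (1, 0): "abandoned_mine", (1, 1): "abandoned_mine",
    (2, 0): "active_reforestation",
}
summary = height_by_class(chm, land_class)
```

Reporting the cell count alongside each statistic helps readers judge how much weight a given comparison can bear.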
We can also use canopy height data as an input to train or assess the accuracy of other models. The next project in the works here at SkyTruth is a machine learning model to predict canopy height from Sentinel-2 satellite imagery, allowing us to extrapolate LiDAR data from a single collection date into the past and future. By leveraging the vast wealth of information offered by modern LiDAR datasets, SkyTruth can continue paving the way for more informed reclamation strategies that promote environmental justice for impacted communities and ecosystems across Central Appalachia.