Polygonize Raster and Compute Zonal-Statistics in Python
The output of a clustering algorithm is a raster. But when you want to compute statistics of the clustered raster, it needs to be polygonized.
A simple way to perform this action is using the gdal command line gdal_polygonize.py
script. This script requires the output file format, input raster file and output name of the vector file. You can additionally mask pixel values which you don’t want to convert to polygons. For this example, we would consider a single band image.
python gdal_polygonize.py raster_file -f "ESRI Shapefile" vector_file.shp layername atrributefieldname
--nomask
allows to include nodata values in the shapefile
atrributefieldname
should always be preceded with layername
else it would result in an error.
The output would result in a vector layer. The number of output polygons is equal to the number of non-NA values. Each neighbouring cell (pixel) which is connected in the raster having the same value is combined to form a single polygon.
For instance, consider this 4 x 4 raster. When converted to vector, it resulted in 6 polygons. Note that disconnected similar values form an independent polygon. Each polygon will have an attribute as its pixel value from the raster, in the data type of the image. These would end up being a pair of (polygon, value) for each feature found in the image.
Another way to polygonize raster programmatically is to use the rasterio
library. Since rasterio utilizes GDAL under the hood, it also performs similar action and results in a pair of geometry and raster value. We create a tuple of dictionaries to store each feature output.
# code to polygonize using rasterio
from rasterio import features
# read the raster and polygonize
with rasterio.open(cluster_image_path) as src:
= src.read(1, out_dtype='uint16')
image #Make a mask!
= image != 0
mask # `results` contains a tuple. With each element in the tuple representing a dictionary containing the feature (polygon) and its associated raster value
= ( {'properties': {'cluster_id': int(v)}, 'geometry': s}
results for (s, v) in (features.shapes(image, mask=mask, transform=src.transform)))
Once we have the raster polygonized, we can use rasterstats
library to calculate zonal statistics. We use this library since there is no in-built functionality for rasterio to calculate it.
This library has a function zonal_stats
which takes in a vector layer and a raster to calculate the zonal statistics. Read more here
The parameters to the function are:
- vectors: path to an vector source or geo-like python objects
- raster: ndarray or path to a GDAL raster source
and various other options which can be found here
To create a vector layer from the tuple results
, we use geopandas. There are other libraries (such as fiona) which can also create vector geometry from shapely objects.
For raster, we pass the .tif
file directly to zonal_stats
. The final code looks like the following
from rasterstats import zonal_stats
= gpd.GeoDataFrame.from_features(results).set_crs(crs=src.crs)
in_shp
# stats parameter takes in various statistics that needs to be computed
= zonal_stats(in_shp,image,stats='min, max, mean, median',
statistics=True, nodata = -999) geojson_out
The output is a geojson generator when geojson_out
is True. we can convert the geojson to dataframe and export as csv for further processing.
This way, with the help of geopandas, rasterstats and rasterio, we polygonize the raster and calculate zonal statistics.