Download MODIS data using CMR API in Python
If you have ever used USGS Earth Explorer to download or explore data, you will have noticed that the manual process is cumbersome and does not scale. That is why we need a programmatic way to download satellite data.
In this post we will see how to download MODIS data using Python. We use a Python package called modis-tools to perform our task. This package internally uses the NASA CMR (Common Metadata Repository) API, which lets us search and query catalogs of various satellite datasets, including MODIS.
We focus on the MODIS dataset in this post, but with little modification the same approach could be extended to various other datasets.
Before you move ahead, make sure you have an Earthdata account. We need the username and password to download the data. Register here if you have not done so already.
To download the data we ask ourselves the following questions:
- Which dataset specifically do I need? — Define Dataset Name
- What area do I need the data for? — Define our Region of Interest
- What time period of data do I require? — Define Start and End Date
Here, I wish to download MODIS Surface Reflectance 8-Day L3 Global 250 m SIN Grid data for Nigeria from 29 December 2019 to 31 December 2019.
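The three answers can be written down in one small structure before any API call is made. The sketch below is only an illustration: the DownloadRequest class and its field names are hypothetical and are not part of modis-tools. The values are the ones used in this post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DownloadRequest:
    """Hypothetical container for the three answers (not part of modis-tools)."""

    short_name: str      # which dataset
    bounding_box: list   # which area, as [x_min, y_min, x_max, y_max]
    start_date: date     # first day of the period
    end_date: date       # last day of the period

    def __post_init__(self):
        # Reject a period that ends before it starts.
        if self.start_date > self.end_date:
            raise ValueError("start_date must not be after end_date")

    def n_days(self) -> int:
        """Number of calendar days covered, both ends included."""
        return (self.end_date - self.start_date).days + 1


request = DownloadRequest(
    short_name="MOD09Q1",
    bounding_box=[2.1448863675, 3.002583177, 4.289420717, 4.275061098],
    start_date=date(2019, 12, 29),
    end_date=date(2019, 12, 31),
)
print(request.n_days())
```

Keeping the three values together makes it easier to reuse the same script for another dataset, region or period.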
Let us install and use the Python package modis-tools to download the data to our local machine by performing the following steps:
- Create a virtual environment.
- Install the modis-tools package.
- Write the code.
Create virtual environment
Create a new environment named .modis-tools using Python’s venv module.
aman@AMAN-JAIN:~$ python3 -m venv .modis-tools
Activate the environment.
aman@AMAN-JAIN:~$ source .modis-tools/bin/activate
Note: The above command is for Linux. For Windows use .modis-tools\Scripts\activate.
Install the modis-tools package
(.modis-tools) aman@AMAN-JAIN:~$ pip install modis-tools
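A quick way to confirm the install landed in the active environment is to ask Python whether it can find the module. This is a minimal sketch using only the standard library; is_installed is a hypothetical helper name. Note that the import name uses an underscore (modis_tools), unlike the name used with pip.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in the active environment."""
    return importlib.util.find_spec(module_name) is not None


# Prints True only when run inside the environment where modis-tools was installed.
print("modis_tools installed:", is_installed("modis_tools"))
```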
Insert the below code
Paste the code into a Python file named download_modis.py.
# download_modis.py
from modis_tools.auth import ModisSession
from modis_tools.granule_handler import GranuleHandler
from modis_tools.resources import CollectionApi, GranuleApi

username = ""  # your Earthdata username
password = ""  # your Earthdata password

# 1) Connect to Earthdata
session = ModisSession(username=username, password=password)

# 2) Query the MODIS catalog for collections
#    MOD09Q1 is the Surface Reflectance 8-Day L3 Global 250 m SIN Grid product
collection_client = CollectionApi(session=session)
collections = collection_client.query(short_name="MOD09Q1", version="061")

# Query the selected collection (first match) for granules
granule_client = GranuleApi.from_collection(collections[0], session=session)

# 3) Filter the selected granules via spatial and temporal parameters
nigeria_bbox = [2.1448863675, 3.002583177, 4.289420717, 4.275061098]  # format [x_min, y_min, x_max, y_max]
nigeria_granules = granule_client.query(
    start_date="2019-12-29", end_date="2019-12-31", bounding_box=nigeria_bbox
)

# 4) Download the granules
GranuleHandler.download_from_granules(nigeria_granules, session, threads=-1)
In the above code, change the values (credentials, dataset, bounding box, start_date and end_date) according to your requirements.
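Since the script needs your Earthdata username and password, one option is to keep them out of the file and read them from environment variables. This is a minimal sketch, not something modis-tools requires: the helper read_credentials and the variable names EARTHDATA_USERNAME and EARTHDATA_PASSWORD are assumptions made for this example.

```python
import os


def read_credentials(env=os.environ):
    """Read Earthdata credentials from environment variables.

    EARTHDATA_USERNAME and EARTHDATA_PASSWORD are hypothetical names chosen
    for this example; modis-tools does not define them.
    """
    try:
        return env["EARTHDATA_USERNAME"], env["EARTHDATA_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None


# Usage in download_modis.py (requires both variables to be set):
# username, password = read_credentials()
# session = ModisSession(username=username, password=password)
```

This keeps the password out of version control if the script is ever shared.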
To explain the above code:
First we create a session, which connects to Earthdata and registers a session. In the next three lines, we search for the MODIS Surface Reflectance 8-Day L3 Global 250 m SIN Grid dataset using its short_name and version.
Now we filter spatially and temporally for the data we want to download. In this example, we filter for the Nigeria region with a bounding box (bounding_box) and for 29 to 31 December 2019 (start_date and end_date).
Lastly, we download the data (granules) using multithreading; threads=-1 asks for all available threads.
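If your region is known as a set of corner coordinates rather than a ready-made box, the [x_min, y_min, x_max, y_max] list can be derived from them. The helper below is a hypothetical utility, not a modis-tools function, and the sample points are simply the corners of the box used in this post.

```python
def bbox_from_points(points):
    """Return [x_min, y_min, x_max, y_max] for (longitude, latitude) pairs."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [min(xs), min(ys), max(xs), max(ys)]


# Corners listed as (longitude, latitude), in any order.
corners = [
    (2.1448863675, 3.002583177),
    (4.289420717, 3.002583177),
    (4.289420717, 4.275061098),
    (2.1448863675, 4.275061098),
]
print(bbox_from_points(corners))
```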
How to get the short_name and version for the dataset?
The collection endpoint of the CMR API contains a directory of all dataset catalogs hosted by various organizations, each with its short name and version number. MODIS data is hosted and maintained by LPDAAC_ECS. Under the /collections/directory endpoint, look for LPDAAC_ECS and search for the MODIS dataset you want to download. Each dataset has a short name and version associated with it. In our case we found the short name MOD09Q1 with version 061.
Now it is time to run the code to see our data being downloaded.
In your terminal, run:
(.modis-tools) aman@AMAN-JAIN:~$ python download_modis.py
Downloading: 100%|██████████████████████████████████████████████████████| 3/3 [00:10<00:00, 3.67s/file]
A progress bar shows the download progress, and the files are saved to your local disk. If you wish to download the data to a specific directory, use the path parameter of the download_from_granules classmethod.
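As a sketch of the path option, the target folder can be created first and then handed to download_from_granules. The helper prepare_download_dir and the folder name modis_data are hypothetical; whether the package creates a missing folder by itself is not assumed here, so the helper creates it up front.

```python
from pathlib import Path


def prepare_download_dir(directory):
    """Create the target directory if needed and return it as a Path."""
    target = Path(directory).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    return target


# "modis_data" is a hypothetical folder name used for illustration.
# In download_modis.py the last call would then look like:
# GranuleHandler.download_from_granules(
#     nigeria_granules, session, path=str(prepare_download_dir("modis_data"))
# )
```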
Endnote
This short post on downloading MODIS data originated when I wanted to set up and deploy a pipeline. I did find other packages, but they were quite old and did not use current specifications. Since the solution presented here uses the CMR API, which has very good documentation, I preferred it over other tools.
You can find the video version of this blog here
For the curious (Advanced)
The base url for the CMR API is —
Internally, CMR API first finds the collection for our dataset —
After that the package queries the granules endpoint to find individual granules matching our query parameters —
Note that most parameters are autogenerated by the Python package depending on the short_name and version you provide (downloadable, scroll, page_size, sort_key, concept_id). The other parameters are user defined (temporal, bounding_box).
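To make the two requests more concrete, here is a rough sketch of how such query URLs can be assembled with the standard library. It is not the package's own code: the base URL is assumed to be the public CMR search endpoint, the function names are hypothetical, the concept ID in the usage lines is a placeholder, and only some of the parameters listed above are included.

```python
from urllib.parse import urlencode

# Assumed base URL of the public CMR search API.
CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search"


def collections_url(short_name, version):
    """URL that looks up a collection by short name and version."""
    query = urlencode({"short_name": short_name, "version": version})
    return f"{CMR_SEARCH}/collections.json?{query}"


def granules_url(concept_id, start, end, bounding_box, page_size=2000):
    """URL that searches the granules of one collection."""
    params = {
        "concept_id": concept_id,
        # CMR takes "start,end" for temporal and "west,south,east,north" for boxes.
        "temporal": f"{start},{end}",
        "bounding_box": ",".join(str(v) for v in bounding_box),
        "page_size": page_size,
    }
    return f"{CMR_SEARCH}/granules.json?{urlencode(params)}"


print(collections_url("MOD09Q1", "061"))
# "C0000000000-LPDAAC_ECS" is a placeholder, not a real concept ID.
print(granules_url(
    "C0000000000-LPDAAC_ECS",
    "2019-12-29T00:00:00Z",
    "2019-12-31T23:59:59Z",
    [2.1448863675, 3.002583177, 4.289420717, 4.275061098],
))
```

The concept ID returned by the first request is what ties the second request to a single collection.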
Many more parameters can be passed; a complete list is in the documentation. One useful parameter to try is cloud_cover. All you need to do is pass this parameter name with a value to the query method in the above code.
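As a sketch of that last step, the extra filter can be added to the keyword arguments that go to the query method. The value format is an assumption: the CMR documentation describes cloud_cover as a "min,max" range, and how the package forwards it may differ between versions. The helper cloud_cover_range is hypothetical and the 0 to 20 range is an arbitrary example.

```python
def cloud_cover_range(minimum, maximum):
    """Format a cloud cover range as the "min,max" string described by CMR."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return f"{minimum},{maximum}"


query_kwargs = {
    "start_date": "2019-12-29",
    "end_date": "2019-12-31",
    "bounding_box": [2.1448863675, 3.002583177, 4.289420717, 4.275061098],
    "cloud_cover": cloud_cover_range(0, 20),  # arbitrary example range
}

# In download_modis.py:
# nigeria_granules = granule_client.query(**query_kwargs)
```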