diff --git a/docs/source/technical_tutorials/search/catalog.rst b/docs/source/technical_tutorials/search/catalog.rst index 0ca58868..0bd0befd 100644 --- a/docs/source/technical_tutorials/search/catalog.rst +++ b/docs/source/technical_tutorials/search/catalog.rst @@ -1,105 +1,53 @@ -MAAP's Dual Catalog -======================================= +Catalog background and migration notes +====================================== -MAAP users are advised to use two catalogs: +MAAP users often work across more than one catalog. In practice, the collection you want may live in MAAP STAC, NASA's Operational CMR, ESA MAAP STAC, or another upstream STAC API. -1. Use NASA's Operational CMR to discover NASA-produced and curated data: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html. -2. Use MAAP STAC for data not found in NASA CMR, and data produced by MAAP users: https://stac.maap-project.org/api.html. +For the recommended starting point, see `Collection Discovery with Federated Search `_. This page is mainly background for users who want extra catalog context or who are migrating older workflows. -.. warning:: - The https://cmr.maap-project.org catalog was deprecated on **May 1, 2023**. Users should request collections they need from this catalog to be made discoverable in the MAAP STAC or NASA's Operational CMR if they're not already there. +Catalogs of interest +-------------------- -More information on each catalog and migrating from MAAP's CMR is detailed in the bottom of this page. - -======================================= MAAP STAC -======================================= - -MAAP STAC (https://stac.maap-project.org) is dedicated to datasets not accessible via NASA's CMR, such as GEDI Cal/Val datasets, ESA datasets, and user-shared data products. - -STAC discovery ---------------------------------------- - -Users can discover data in MAAP STAC using pystac-client or https://stac-browser.maap-project.org.
- -API documentation is available here: https://stac.maap-project.org/api.html (will return MAAP STAC results). - -The general STAC API spec is here: https://api.stacspec.org/v1.0.0-rc.1/core/. - -An example of using pystac-client is included above and in `Searching STAC Documentation `_. - -Data Access via STAC ---------------------------------------- - -Data assets (files) published to STAC have not moved from the S3 bucket ``s3://nasa-maap-data-store``. ESA data is accessible via public HTTP access. NASA data in S3 is accessible publicly or via role-based bucket policy access. - -Users are encouraged to use common AWS S3 libraries for NASA data access, such as Python's boto3. - -Each item should have a "data" asset which includes a URL to the data. +~~~~~~~~~ -For example, https://stac.maap-project.org/collections/BIOSAR1/items/biosar1_roi_lidar58 includes: +MAAP STAC (https://stac.maap-project.org) contains collections that MAAP publishes through its STAC API, including MAAP-hosted and partner datasets. -.. code-block:: json - - "assets": { - "shx": { - "href": "https://bmap-catalogue-data.oss.eu-west-0.prod-cloud-ocb.orange-business.com/Campaign_data/biosar1/biosar1_roi_lidar58.shx", - "type": "application/octet-stream", - "roles": [ - "data" - ] - }, - } - -======================================= NASA's Operational CMR -======================================= - -CMR Discovery ---------------------------------------- - -Users can discover data NASA's Operational CMR via its publicly accessible API: https://cmr.earthdata.nasa.gov and user interface: https://search.earthdata.nasa.gov. - -CMR Search documentation can be found in `Searching Collections `_ and `Searching Granules `_ and https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html. 
+~~~~~~~~~~~~~~~~~~~~~~ -CMR Access ---------------------------------------- +NASA's Operational CMR is available through its APIs at https://cmr.earthdata.nasa.gov and through Earthdata Search at https://search.earthdata.nasa.gov. -For all NASA MAAP users, access to NASA'S Operational data is provided via a federated access token. Anything that is in NASA's Operational CMR should be accessed via maap-py so that the federated access token can be used. Users can also access data from LPDAAC (and possibly other DAACs in the future) without maap-py since the workspace should have access via a role-based bucket policy on the LPDAAC cloud bucket. - -Anyone can access data through Earthdata Login as well. +For MAAP users, access to NASA operational data is often easiest through ``maap-py`` because it can use the federated access token. Anyone can also access data through Earthdata Login. Find more documentation about how to access data in CMR in the `Access <../accessing.html>`_ section of this documentation. -======================================= Migrating from MAAP's CMR -======================================= +------------------------- -If you're migrating code from using https://cmr.maap-project.org, we're here to help. The documentation below should support migrating to https://cmr.earthdata.nasa.gov and https://stac.maap-project.org. If not, please contact the data team for assistance. +The https://cmr.maap-project.org catalog was deprecated on **May 1, 2023**. -Migration Steps: ----------------- +If you're migrating code from MAAP's CMR, the general approach is: 1. Identify where your code is using https://cmr.maap-project.org and which datasets are being discovered and accessed. -2. Once you've identified the datasets, use https://search.earthdata.nasa.gov or https://stac-browser.maap-project.org to find out if the dataset is available through NASA's Operational CMR or MAAP's STAC catalog. 
If you don't see your datasets in one of those places, reach out to the data team so they can prioritize that dataset for publication to MAAP STAC. -3. If the dataset is in NASA's Operational CMR and you're using MAAP's Python library ``maap-py`` to discover and access data, add the parameter ``cmr_host="cmr.earthdata.nasa.gov"`` to your ``maap.searchCollection`` and ``maap.searchGranule`` function calls. Update the ``concept_id`` to match the one from NASA's Operational CMR if you're using it to identify a specific collection or granule. -4. If the dataset is in MAAP STAC, use ``pystac_client`` (https://pystac-client.readthedocs.io/en/stable/) or an HTTP library to call the STAC HTTP API endpoints directly. +2. Use Federated Search, Earthdata Search, or MAAP STAC tools to determine where those datasets now live. +3. Update your discovery and item-search code to use the appropriate source catalog. -Examples: ----------------- +Examples +-------- -Example of switching a granule search to NASA's Operational CMR: -++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +Example of switching a granule search to NASA's Operational CMR +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The code below discovers granules from the ``ABoVE LVIS L2 Geolocated Surface Elevation Product``: .. code-block:: python - COLLECTION_ID = 'C1200125288-NASA_MAAP' + COLLECTION_ID = 'C1200125288-NASA_MAAP' results = maap.searchGranule(concept_id=COLLECTION_ID) pprint(f'Got {len(results)} results') -This dataset exists in NASA's Operational CMR. Using https://search.earthdata.nasa.gov, I discovered the collection's ``concept_id`` by searching for "ABoVE LVIS L2 Geolocated Surface Elevation Product" and copying the ``concept_id`` from the URL of the result to modify the code below: +This dataset exists in NASA's Operational CMR. Using https://search.earthdata.nasa.gov, you can discover the collection's ``concept_id`` and update the code like this: .. 
code-block:: python @@ -110,29 +58,25 @@ This dataset exists in NASA's Operational CMR. Using https://search.earthdata.na ) pprint(f'Got {len(results)} results') -Example of switching a granule search to MAAP STAC: -+++++++++++++++++++++++++++++++++++++++++++++++++++ +Example of switching a granule search to MAAP STAC +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This code discovers granules from the ``Landsat 8 Operational Land Imager (OLI) Surface Reflectance Analysis Ready Data (ARD) V1, Peru and Equatorial Western Africa, April 2013-January 2020``. .. code-block:: python - COLLECTION_ID = 'C1200110769-NASA_MAAP' + COLLECTION_ID = 'C1200110769-NASA_MAAP' results = maap.searchGranule(concept_id=COLLECTION_ID) pprint(f'Got {len(results)} results') - -You can use https://stac-browser.maap-project.org to find the STAC collection ID for that dataset, which is ``Landsat8_SurfaceReflectance``. +If Federated Search or STAC Browser shows that the dataset now lives in MAAP STAC with collection ID ``Landsat8_SurfaceReflectance``, switch to a STAC client workflow: .. code-block:: python from pystac_client import Client + URL = 'https://stac.maap-project.org/' cat = Client.open(URL) - for collection in cat.get_all_collections(): - print(collection) - collection = cat.get_collection('Landsat8_SurfaceReflectance') items = collection.get_items() - diff --git a/docs/source/technical_tutorials/search/collections.ipynb b/docs/source/technical_tutorials/search/collections.ipynb index fb072447..136ae470 100644 --- a/docs/source/technical_tutorials/search/collections.ipynb +++ b/docs/source/technical_tutorials/search/collections.ipynb @@ -11,7 +11,18 @@ "Date: November 2, 2020\n", "\n", "Description: These examples walk through the MAAP API functionality of searching for collections within NASA's Common Metadata Repository (CMR) based on specific parameters. Collections are groupings of files that share the same product specification. 
Searching for collections can be useful for finding individual files, known as granules, which are used for processing." - ] + ], + "id": "32aec7da" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Recommended starting point: For general collection discovery, start with Federated Search. Use this notebook when you already know you need a NASA CMR-specific workflow.\n", + "
\n" + ], + "id": "18629fad" }, { "cell_type": "markdown", @@ -21,7 +32,8 @@ "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors." - ] + ], + "id": "e259eaec" }, { "cell_type": "markdown", @@ -29,7 +41,8 @@ "source": [ "## Additional Resources\n", "- [NASA's CMR API Documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)" - ] + ], + "id": "c5b875aa" }, { "cell_type": "markdown", @@ -38,7 +51,8 @@ "## Importing and Installing Packages\n", "\n", "We begin by importing the `maap` package and creating a new MAAP class." - ] + ], + "id": "70bec6b1" }, { "cell_type": "code", @@ -54,21 +68,24 @@ "\n", "# invoke the MAAP search client\n", "maap = MAAP()" - ] + ], + "id": "94b3115f" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## About searchCollection" - ] + ], + "id": "284aca09" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `maap.searchCollection` function to return a list of desired collections. Before using this function, let's use the `help` function to view the specific arguments and keywords for `maap.searchCollection`." - ] + ], + "id": "c7b4e7a8" }, { "cell_type": "code", @@ -93,14 +110,16 @@ "source": [ "# view help for the searchCollection function\n", "help(maap.searchCollection)" - ] + ], + "id": "cd9c3fbb" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The help text is showing that `maap.searchCollection` accepts a limit and search parameters. The limit parameter limits the number of resulting collections returned by `maap.searchCollection`. 
Note that `limit=100` means that the *default limit* for results from the MAAP API is 100. `maap.searchCollection` accepts any additional search parameters that are included in the CMR. For a list of accepted parameters, please refer to the [CMR Search Collections API reference](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#collection-search-by-parameters)." - ] + ], + "id": "593d934c" }, { "cell_type": "markdown", @@ -113,7 +132,8 @@ "3. Searching by spatial filter\n", "4. Using the results from one search as inputs into another\n", "5. Searching by additional attributes" - ] + ], + "id": "6de2a4e5" }, { "cell_type": "markdown", @@ -122,7 +142,8 @@ "## Finding all Collections\n", "\n", "Here we will demonstrate how to create a list containing all of the collections contained within the CMR. To do this, we will use the `maap.searchCollection` function without any additional search parameters. " - ] + ], + "id": "61bfe693" }, { "cell_type": "code", @@ -143,7 +164,8 @@ "\n", "# print the number of collections\n", "pprint(f'Got {len(results)} results')" - ] + ], + "id": "96e87347" }, { "cell_type": "markdown", @@ -152,7 +174,8 @@ "We get 100 results because of the default page limit. The result from the MAAP API is a list of collections where each element in the list is the metadata for that particular collection. To change the limit, type `limit=` and then a value within the parentheses after `maap.searchCollection()`.\n", "\n", "Let's look at the metadata for the first collection in our list of results (`results[0]`) using `pprint`. For formatting purposes, we can use the `depth` parameter to control the number of levels of metadata detail to display. By default, there is no constraint on the depth. By setting a `depth` parameter (in this case `depth=2`), we can ensure that the next contained level is replaced by an ellipsis." 
- ] + ], + "id": "407cd023" }, { "cell_type": "code", @@ -249,14 +272,16 @@ "# (1) displays the concept ID, format, and revision ID\n", "# adjust the depth to a larger value (6) if you would like to view all of the metadata\n", "pprint(results[0], depth=2)" - ] + ], + "id": "7d18ec85" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Collection` key has all of the collection information including attributes, the archive center, spatial, and temporal information. The `concept-id` is a unique identifier for this collection. It can be used to further refine search results from the CMR, such as when searching for granule information." - ] + ], + "id": "85dc082b" }, { "cell_type": "markdown", @@ -267,7 +292,8 @@ "Here we use a temporal filter to narrow down our results using the `temporal` keyword in our search. The temporal keyword takes datetime information in a [specific format](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-temporal). The date format used is `YYYY-MM-DDThh:mm:ssZ` and temporal search criteria may be either a single date or a date range. If one date is provided then it can be inferred as the start or end date. To define a start date and return all collections after the date, put a comma after the date (`YYYY-MM-DDThh:mm:ssZ,`). To define a end date and return all granules prior to the data, put a comma before the date (`,YYYY-MM-DDThh:mm:ssZ`). Lastly, to get a date range, provide the start date and end date separated by a comma (`YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ`).\n", "\n", "In this example we will search for one month of data." 
- ] + ], + "id": "33cc737e" }, { "cell_type": "code", @@ -290,7 +316,8 @@ " temporal = datetimeRange\n", ")\n", "pprint(f'Got {len(results)} results')" - ] + ], + "id": "84a85b38" }, { "cell_type": "code", @@ -311,14 +338,16 @@ "\n", "pprint(\n", " f'Collection {collectionName} was acquired starting at {collectionDate}', width=100)\n" - ] + ], + "id": "18020c6b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "It appears the first result correctly matches with the beginning and ending temporal search parameters. Keep in mind that the results are limited to 100 so the final collection returned may not match the end date that was searched for." - ] + ], + "id": "506a1029" }, { "cell_type": "markdown", @@ -327,7 +356,8 @@ "## Searching by Spatial Filter\n", "\n", "Here we will illustrate how to search for collections by a spatial filter. There are a couple of [spatial filters available to search by](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-spatial) in the CMR including point, line, polygon, and bounding box. In this example, we will explore filtering with a bounding box which is a sequence of four latitude and longitude values in the order of `[W,S,E,N]`. " - ] + ], + "id": "8a7f4673" }, { "cell_type": "code", @@ -350,7 +380,8 @@ " bounding_box = collectionDomain\n", ")\n", "pprint(f'Got {len(results)} results')" - ] + ], + "id": "3a835f53" }, { "cell_type": "code", @@ -376,14 +407,16 @@ "\n", "pprint(f'Collection {collectionName} was acquired within the following geometry: ', width=100)\n", "pprint(collectionGeometry)\n" - ] + ], + "id": "094557ad" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see from the first collection that the spatial coordinates of the collection intersect our search box." 
- ] + ], + "id": "fa10c4a5" } ], "metadata": { diff --git a/docs/source/technical_tutorials/search/federated-collection-discovery/collection_discovery.ipynb b/docs/source/technical_tutorials/search/federated-collection-discovery/collection_discovery.ipynb index 17e5c697..a4825a3d 100644 --- a/docs/source/technical_tutorials/search/federated-collection-discovery/collection_discovery.ipynb +++ b/docs/source/technical_tutorials/search/federated-collection-discovery/collection_discovery.ipynb @@ -5,13 +5,18 @@ "id": "337f5e01-9946-465a-8657-e4153fbac541", "metadata": {}, "source": [ - "# Collection Discovery: searching for collections across multiple APIs using the Federated Collection Discovery tools\n", + "# Collection Discovery with Federated Search\n", "\n", "Author: Henry Rodman (Development Seed)\n", "\n", "Date: December 18, 2025\n", "\n", - "Description: These examples show how to use the Federated Collection Discovery STAC API to search for collections across multiple STAC APIs. There is also an interactive search application for using the API which you can use at [https://discover.maap-project.org](https://discover.maap-project.org).\n", + "Description: These examples show how to use the Federated Collection Discovery STAC API to discover collections across multiple STAC APIs. This is the recommended starting point for MAAP data discovery. There is also an interactive search application at [https://discover.maap-project.org](https://discover.maap-project.org).\n", + "\n", + "## Recommended workflow\n", + "1. Use Federated Search to discover collections across catalogs.\n", + "2. Inspect the matching collection's source STAC API and collection ID.\n", + "3. 
Continue with item-level search in that source STAC API.\n", "\n", "## Background\n", "It can be challenging to find the data that you need for an analysis when any of the following are true:\n", @@ -19,7 +24,7 @@ "- you don't know which exact API the data can be accessed from\n", "- you don't know which collections you even need\n", "\n", - "Fear not! The Federated STAC Collection Discovery application (and the underlying API) can help you find the data you need by running your search for collections across multiple catalogs simultaneously.\n", + "The Federated STAC Collection Discovery application (and the underlying API) helps you find the right collection first, before you move on to item-level search in the catalog where that collection actually lives.\n", "\n", "![Federated Collection Discovery application](./federated-collection-discovery-app.png)\n", "\n", @@ -59,23 +64,7 @@ "| `quick +brown -fox` | Indicate included and excluded terms using `+`/`-` | This will search for items that INCLUDES `\"brown\"` EXCLUDES `\"fox\"` OR CONTAIN `\"quick\"` |\n", "\n", "### spatial search\n", - "You can apply a spatial filter to your search by typing the bounding box coordinates (xmin, ymin, xmax, ymax) or by drawing one on the provided map interface. The bounding box filter will return collections where the spatial extent intersects the provided bounding box.\n", - "\n", - "\n", - "\n", - "### temporal search\n", - "You can apply a temporal filter to your search by entering a start/end date range in the provided input boxes.\n", - "\n", - "### Inspect the search results\n", - "The matching collections from your search query will be printed in a table. 
If you click on a row, the collection details will pop up showing the description, the spatial and temporal extents, the source catalog, and data provider entries.\n", - "\n", - "\n", - "\n", - "If the collection seems like a good match for your needs, you can scroll down to the 'STAC Item Code Hints' to get the Python and R code you need to do an item-level search in a notebook or script.\n", - "\n", - "\n", - "\n", - "Users can also follow links to the original STAC collection JSON in its home API and browse for items there." + "You can apply a spatial filter to your search by typing the bounding box coordinates (xmin, ymin, xmax, ymax).\n" ] }, { @@ -1228,6 +1217,20 @@ "- [Federated Collection Discovery API docs](https://discover-api.maap-project.org/api.html)\n", "- [STAC FastAPI Collection Discovery User Guide](https://developmentseed.org/stac-fastapi-collection-discovery/v0.2.3/using-the-api/)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next step: search items in the source STAC API\n", + "\n", + "Federated Search is for **collection discovery**. 
Once you find a relevant collection, use the result metadata to identify the source STAC API and then run your **item-level search** there.\n", + "\n", + "- If the collection comes from **MAAP STAC**, continue with [Searching a STAC API for Items](../searching_the_stac_catalog.ipynb) using `https://stac.maap-project.org/`.\n", + "- If the collection comes from **NASA CMR STAC**, continue with [Searching a STAC API for Items](../searching_the_stac_catalog.ipynb) using the CMR STAC API for that provider, or use the NASA-specific [Searching for Granules in NASA's Operational CMR using maap-py](../granules.ipynb) tutorial when you already know you need CMR or Earthdata tooling.\n", + "- If the collection comes from **ESA MAAP STAC**, continue with [Searching a STAC API for Items](../searching_the_stac_catalog.ipynb) using `https://catalog.maap.eo.esa.int/catalogue/`.\n" + ], + "id": "e8c7b724" } ], "metadata": { diff --git a/docs/source/technical_tutorials/search/granules.ipynb b/docs/source/technical_tutorials/search/granules.ipynb index 7e35e311..f7a4aec0 100644 --- a/docs/source/technical_tutorials/search/granules.ipynb +++ b/docs/source/technical_tutorials/search/granules.ipynb @@ -11,7 +11,18 @@ "Date: February 27, 2020 (updated in 2022)\n", "\n", "Description: These examples will walk through the MAAP API functionality of searching granules within a collection in NASA's Common Metadata Repository (CMR) based on specific parameters. Granules are individual files from a sensor where a group of granules make a collection within CMR. The granules are the raw data that will be used for processing." - ] + ], + "id": "483b302d" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Recommended starting point: For general collection discovery, start with Federated Search. Use this notebook when you already know the collection lives in NASA CMR and you want a CMR-specific granule workflow.\n", + "
\n" + ], + "id": "eb9cb3c4" }, { "cell_type": "markdown", @@ -21,7 +32,8 @@ "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors." - ] + ], + "id": "f4e339ae" }, { "cell_type": "markdown", @@ -29,21 +41,24 @@ "source": [ "## Additional Resources\n", "- [NASA's CMR API Documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)" - ] + ], + "id": "d4a46878" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing and Installing Packages" - ] + ], + "id": "7b5aba75" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We begin by importing the `maap` and `pprint` packages. Then invoke the `MAAP` constructor, setting the `maap_host` argument to `'api.maap-project.org'`." - ] + ], + "id": "e1feb082" }, { "cell_type": "code", @@ -59,21 +74,24 @@ "\n", "# invoke the MAAP constructor using the maap_host argument\n", "maap = MAAP(maap_host='api.maap-project.org')" - ] + ], + "id": "34aaf157" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## About searchGranule" - ] + ], + "id": "40e89b60" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we view the specific arguments and keywords for the `maap.searchGranule` function." - ] + ], + "id": "353b835b" }, { "cell_type": "code", @@ -98,7 +116,8 @@ ], "source": [ "help(maap.searchGranule)" - ] + ], + "id": "1e1fa166" }, { "cell_type": "markdown", @@ -107,7 +126,8 @@ "As we can see from the result, `maap.searchGranule` accepts a limit keyword which limits the number of results from CMR. 
`maap.searchGranule()` also accepts any additional search parameters that are included in CMR. For a list of accepted parameters, please refer to the [CMR Search Granules API reference](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#granule-search-by-parameters).\n", "\n", "It is important to note that _the default limit on results from the MAAP API is 20_. To increase the number of results we will specify a variable and use it in later queries." - ] + ], + "id": "62934956" }, { "cell_type": "code", @@ -117,7 +137,8 @@ "source": [ "# get at max 500 results from CMR\n", "MAX_RESULTS = 500" - ] + ], + "id": "9498ad84" }, { "cell_type": "markdown", @@ -132,7 +153,8 @@ "5. Searching by additional attributes\n", "\n", "For the next couple of examples, we will focus on the [ICESat-2/ATLAS Land and Vegetation Height dataset](https://nsidc.org/data/atl08)." - ] + ], + "id": "c3b3d13a" }, { "cell_type": "markdown", @@ -141,7 +163,8 @@ "## Searching by Collection Short Name, Version\n", "\n", "Here we will search by a short name and version which should uniquely identify a collection CMR. HOWEVER, some datasets exist both in the cloud and on-prem, so in the following example we actually get **2** results.\n" - ] + ], + "id": "846f4a5c" }, { "cell_type": "code", @@ -166,7 +189,8 @@ " cmr_host='cmr.earthdata.nasa.gov'\n", ")\n", "len(atl08_collections)" - ] + ], + "id": "e29b5509" }, { "cell_type": "markdown", @@ -175,7 +199,8 @@ "If you inspect the results, you will see the second result has distribution information which points to an S3 bucket location. You can see this information with the follow code: `atl08_collections[1]['Collection']['DirectDistributionInformation']`.\n", "\n", "A simpler solution to finding just the cloud-hosted dataset is to add the `cloud_hosted=\"true\"` parameter to our search." 
- ] + ], + "id": "cddb63fc" }, { "cell_type": "code", @@ -201,14 +226,16 @@ " cloud_hosted=\"true\"\n", ")\n", "len(atl08_collections)" - ] + ], + "id": "f3e8b8d3" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can look up the collection concept id to find only granules in the cloud-hosted ATL08 v005 dataset." - ] + ], + "id": "7c19a20f" }, { "cell_type": "code", @@ -231,7 +258,8 @@ " cmr_host='cmr.earthdata.nasa.gov',\n", " limit=MAX_RESULTS)\n", "pprint(f'Got {len(results)} results')" - ] + ], + "id": "1170e6c3" }, { "cell_type": "markdown", @@ -240,7 +268,8 @@ "We were able to get 500 results! There are most likely more than 500 granules in search results, but remember we limited the results to 500 granules. The result from the MAAP API is a list of granules where each element in the list is the metadata for that particular granule.\n", "\n", "Now let's look at the metadata for the first result." - ] + ], + "id": "0fe27d9f" }, { "cell_type": "code", @@ -275,7 +304,8 @@ "# (1) displays the collection concept ID, concept ID, format, and revision ID\n", "# adjust the depth to a larger value (6) if you would like to view all of the metadata\n", "pprint(results[0], depth=2)" - ] + ], + "id": "2d00aebf" }, { "cell_type": "markdown", @@ -284,7 +314,8 @@ "There is a lot of information in the metadata so let's break it down...\n", "\n", "The `Granule` key has all of the granule information including attributes, browse imagery URLs, spatial, and temporal information. The `collection-concept-id` should match what you searched by and be the same for each granule. Lastly the granule specific `concept-id` is a unique identifier for this granule. This information can be used to further refine search results from CMR, specifically the granule information." 
- ] + ], + "id": "9203a1f2" }, { "cell_type": "markdown", @@ -297,7 +328,8 @@ "The temporal keyword takes datetime information in a [specific format](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#g-temporal). The date format used is `YYYY-MM-DDThh:mm:ssZ` and temporal search criteria may be either a single date or a date range. If one date is provided then it can be inferred as start or end date. To define a start date and return all granules after the date, put a comma after the date (`YYYY-MM-DDThh:mm:ssZ,`). To define an end date and return all granules prior to the data, put a comma before the date (`,YYYY-MM-DDThh:mm:ssZ`). Lastly, to get a date range, provide the start date and end date separated by a comma (`YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ`).\n", "\n", "In this example we will search for one month of data." - ] + ], + "id": "f80e1d91" }, { "cell_type": "code", @@ -322,7 +354,8 @@ " cmr_host=\"cmr.earthdata.nasa.gov\"\n", ")\n", "pprint(f'Got {len(results)} results')" - ] + ], + "id": "611d7987" }, { "cell_type": "code", @@ -342,14 +375,16 @@ "granuleDate = results[0]['Granule']['Temporal']['RangeDateTime']['BeginningDateTime'] # get the granule start time\n", "\n", "pprint(f'Granule {granuleFilename} was acquired starting at {granuleDate}',width=100)" - ] + ], + "id": "ca5048f6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the first result correctly matches with the beginning temporal search parameter. Keep in mind that the results are limited to 500 so the final granule returned may not match the end date that was searched for." - ] + ], + "id": "20323b85" }, { "cell_type": "markdown", @@ -358,7 +393,8 @@ "## Searching by Spatial Filter\n", "\n", "Here we will illustrate how to search for granules by a spatial filter. 
There are a couple of [spatial filters available to search by](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#g-spatial) in CMR including point, line, polygon, and bounding box. The most simple to use is the bounding box which is a sequence of four latitude and longitude values in the order of `[W,S,E,N]`. In this example, we are going to search for data over Gabon using the `bounding_box` keyword.\n" - ] + ], + "id": "664c33c8" }, { "cell_type": "code", @@ -384,7 +420,8 @@ " cmr_host=\"cmr.earthdata.nasa.gov\"\n", ")\n", "pprint(f'Got {len(results)} results')" - ] + ], + "id": "7c64e158" }, { "cell_type": "code", @@ -409,21 +446,24 @@ "\n", "pprint(f'Granule {granule_filename} was acquired within the following geometry: ', width=100)\n", "pprint(geometry)" - ] + ], + "id": "703379ab" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see from the first granule that the spatial coordinates of the granule intersect our search box." - ] + ], + "id": "2faca6f7" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The MAAP API provides rich functionality to interact with the CMR instance within the MAAP platform. Users can search datasets programmatically by many parameters and even combine parameters such as spatial and temporal filters to refine results." - ] + ], + "id": "f7d1a24c" }, { "cell_type": "markdown", @@ -432,7 +472,8 @@ "## Generating ID List from Search Results\n", "\n", "Each element in the `results` list contains the metadata for the granules returned by the search. Within this metadata is the key concept-id, which is the unique identifier for each granule. To create a list of granule IDs, we create a new list and add the concept-id from each element of results into the that list." 
- ] + ], + "id": "c684bcd7" }, { "cell_type": "code", @@ -466,7 +507,8 @@ "\n", "# View some of the results\n", "granuleID_list[:10]" - ] + ], + "id": "24cd71e2" } ], "metadata": { diff --git a/docs/source/technical_tutorials/search/searching_NISAR_BIOMASS_overlapping_data.ipynb b/docs/source/technical_tutorials/search/searching_NISAR_BIOMASS_overlapping_data.ipynb index 2c2b8c61..86b6dae5 100644 --- a/docs/source/technical_tutorials/search/searching_NISAR_BIOMASS_overlapping_data.ipynb +++ b/docs/source/technical_tutorials/search/searching_NISAR_BIOMASS_overlapping_data.ipynb @@ -1,1069 +1,1079 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "374aa2b1", - "metadata": {}, - "source": [ - "# Assessing Overlap in NISAR and ESA BIOMASS Datasets\n", - "\n", - "Date: February 4, 2026\n", - "\n", - "Authors: Harshini Girish (UAH), Rajat Shinde (UAH), Alex Mandel (Development Seed), Samantha Niemoeller (JPL)\n", - "\n", - "Description: This notebook queries NISAR L2 GCOV granules (via `earthaccess`) and ESA BIOMASS satellite items (via the ESA MAAP STAC API, e.g., `BiomassLevel1b`) for a chosen AOI and time settings. It converts returned items to footprint polygons and plots them on a single interactive Folium map as two toggleable layers. An optional overlap layer highlights where NISAR and BIOMASS footprints intersect (bbox-only or true geometry). 
The result quickly shows where data coincides spatially to support fusion workflows.\n" - ] - }, - { - "cell_type": "markdown", - "id": "93b44b7d-f5a5-4048-80e5-737db0998a43", - "metadata": {}, - "source": [ - "## Run This Notebook\n", - "\n", - "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", - "\n", - "Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Additionally, it is recommended to use the `Pangeo` workspace within the ADE, since certain packages relevant to this tutorial are already installed." - ] - }, - { - "cell_type": "markdown", - "id": "399aa805-c518-4cde-812d-8729c5e888d9", - "metadata": {}, - "source": [ - "## Additional Resources\n", - "- [NISAR](https://nisar.jpl.nasa.gov/)\n", - "- [BIOMASS](https://docs.maap-project.org/en/develop/science/ESA_CCI/ESA_CCI_V5_Token_Access.html)\n" - ] - }, - { - "cell_type": "markdown", - "id": "a328ae66-198a-4906-b2f5-ffc387ee44b1", - "metadata": {}, - "source": [ - "## Import and Install Packages" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "86047982", - "metadata": {}, - "outputs": [], - "source": [ - "import pystac_client\n", - "import geopandas as gpd\n", - "import pandas as pd\n", - "import folium\n", - "from folium import GeoJson\n" - ] - }, - { - "cell_type": "markdown", - "id": "80529155", - "metadata": {}, - "source": [ - "## Inputs\n", - "\n", - "This section defines the parameters used throughout the notebook to search both catalogs and compute overlaps.\n", - "\n", - "- **BBOX** defines the area of interest as **(min_lon, min_lat, max_lon, max_lon)** and can be used to spatially filter both datasets. 
\n", - "- **NISAR_DT** sets the datetime range for the NISAR STAC search (tighten this first to avoid timeouts). \n", - "- **BIOMASS_DT** sets the datetime range for the BIOMASS STAC search. \n", - "- **NISAR_STAC_URL** is the STAC endpoint used to query NISAR items (CMR-STAC / ASF). \n", - "- **NISAR_COLLECTION** is the NISAR collection ID used in the STAC search (e.g., `NISAR_L2_GSLC_BETA_V1_1`). \n", - "- **BIOMASS_STAC_URL** is the STAC endpoint used to query BIOMASS items (ESA MAAP STAC). \n", - "- **BIOMASS_COLLECTION** is the BIOMASS collection name used in the STAC search (e.g., `BiomassLevel1b`).\n" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "8ed53932-c0b5-4223-ba86-aed95cbd65d3", - "metadata": {}, - "outputs": [], - "source": [ - "# Common query parameters (edit to your AOI/time window)\n", - "BBOX = [-180, -90, 180, 90] # [minx, miny, maxx, maxy]\n", - "NISAR_DT = \"2025-10-01/2025-12-31\" # tighten first to avoid timeouts\n", - "BIOMASS_DT = \"2024-01-01/..\" # adjust if needed\n", - "\n", - "NISAR_STAC_URL = \"https://cmr.earthdata.nasa.gov/stac/ASF\"\n", - "NISAR_COLLECTION = \"NISAR_L2_GSLC_BETA_V1_1\"\n", - "\n", - "BIOMASS_STAC_URL = \"https://catalog.maap.eo.esa.int/catalogue/\"\n", - "BIOMASS_COLLECTION = \"BiomassLevel1b\"\n" - ] - }, - { - "cell_type": "markdown", - "id": "e2dc36db-f686-41e7-8cbf-c152fcbf84fc", - "metadata": {}, - "source": [ - "## Query STAC and Convert Items to a GeoDataFrame\n" - ] - }, - { - "cell_type": "markdown", - "id": "5b6c52d4", - "metadata": {}, - "source": [ - "### 1) NISAR data" - ] - }, - { - "cell_type": "markdown", - "id": "9a8b1506-d6a1-422b-984c-92e172e298b7", - "metadata": {}, - "source": [ - "This section connects to the CMR/ASF STAC API and runs a STAC search for NISAR items using the same spatial/temporal filters used in the notebook (bbox and datetime). 
The returned STAC Items are converted into `gdf_nisar`, a GeoDataFrame whose `geometry` column contains the true NISAR footprint polygons and whose ID/title field is kept for labeling and later joins—no data files are downloaded, only metadata footprints are used." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "b2471531-9cc0-4e91-a065-ee2c2a0ba6fd", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "NISAR items returned: 6\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
nisar_idgeometry
0NISAR_L2_PR_GSLC_003_005_D_077_4005_DHDH_A_202...POLYGON ((77.30164 24.17615, 76.70841 22.04128...
1NISAR_L2_PR_GSLC_003_064_D_130_7700_SHNA_A_202...POLYGON ((-2.61271 -81.76059, -6.78497 -82.596...
2NISAR_L2_PR_GSLC_004_064_D_130_7700_SHNA_A_202...POLYGON ((-2.70717 -81.78256, -6.90256 -82.617...
3NISAR_L2_PR_GSLC_004_076_A_022_2005_QPDH_A_202...POLYGON ((-88.24687 39.87043, -89.11556 41.970...
4NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202...POLYGON ((42.96535 12.02728, 42.46821 14.11353...
\n", - "
" - ], - "text/plain": [ - " nisar_id \\\n", - "0 NISAR_L2_PR_GSLC_003_005_D_077_4005_DHDH_A_202... \n", - "1 NISAR_L2_PR_GSLC_003_064_D_130_7700_SHNA_A_202... \n", - "2 NISAR_L2_PR_GSLC_004_064_D_130_7700_SHNA_A_202... \n", - "3 NISAR_L2_PR_GSLC_004_076_A_022_2005_QPDH_A_202... \n", - "4 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", - "\n", - " geometry \n", - "0 POLYGON ((77.30164 24.17615, 76.70841 22.04128... \n", - "1 POLYGON ((-2.61271 -81.76059, -6.78497 -82.596... \n", - "2 POLYGON ((-2.70717 -81.78256, -6.90256 -82.617... \n", - "3 POLYGON ((-88.24687 39.87043, -89.11556 41.970... \n", - "4 POLYGON ((42.96535 12.02728, 42.46821 14.11353... " - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "nisar_catalog = pystac_client.Client.open(NISAR_STAC_URL)\n", - "\n", - "nisar_search = nisar_catalog.search(\n", - " collections=[NISAR_COLLECTION],\n", - " bbox=BBOX,\n", - " datetime=NISAR_DT,\n", - " max_items=500,\n", - ")\n", - "nisar_items = list(nisar_search.items())\n", - "print(\"NISAR items returned:\", len(nisar_items))\n", - "\n", - "# Convert STAC Items → GeoDataFrame\n", - "nisar_features = []\n", - "for it in nisar_items:\n", - " \n", - " nisar_features.append({\n", - " \"type\": \"Feature\",\n", - " \"geometry\": it.geometry,\n", - " \"properties\": {\n", - " \"nisar_id\": it.id,\n", - " **(it.properties or {}),\n", - " },\n", - " })\n", - "\n", - "gdf_nisar = gpd.GeoDataFrame.from_features(nisar_features, crs=\"EPSG:4326\")\n", - "gdf_nisar = gdf_nisar[[\"nisar_id\", \"geometry\"]]\n", - "gdf_nisar.head()\n" - ] - }, - { - "cell_type": "markdown", - "id": "e94fa9b5", - "metadata": {}, - "source": [ - "### 2) ESA BIOMASS \n", - "\n", - "This part connects to the ESA MAAP STAC endpoint `(https://catalog.maap.eo.esa.int/catalogue/)` and searches the `BiomassLevel1b` collection using the notebook’s time range and optional `bbox` filter. 
The returned BIOMASS STAC Items are converted into gdf_biomass with polygon geometries preserved and a stable id/title column added for display and matching—again, this is footprint metadata only, not raster access." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "8c979f77", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "BIOMASS items returned: 2000\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
biomass_idstart_datetimeend_datetimedatetimegeometry
0BIO_S2_DGM__1S_20251212T011542_20251212T011603...2025-12-12T01:15:42.413Z2025-12-12T01:16:03.140Z2025-12-12T01:15:42.413ZPOLYGON ((-133.8954 -76.78645, -131.65558 -77....
1BIO_S2_DGM__1S_20251212T011407_20251212T011427...2025-12-12T01:14:07.171Z2025-12-12T01:14:27.898Z2025-12-12T01:14:07.171ZPOLYGON ((-125.59866 -71.44219, -123.89382 -71...
2BIO_S2_DGM__1S_20251212T015732_20251212T015753...2025-12-12T01:57:32.641Z2025-12-12T01:57:53.290Z2025-12-12T01:57:32.641ZPOLYGON ((46.75876 49.9374, 45.91077 49.75396,...
3BIO_S2_DGM__1S_20251212T015538_20251212T015559...2025-12-12T01:55:38.817Z2025-12-12T01:55:59.469Z2025-12-12T01:55:38.817ZPOLYGON ((49.76923 43.18403, 49.01818 43.01984...
4BIO_S2_DGM__1S_20251212T015500_20251212T015521...2025-12-12T01:55:00.871Z2025-12-12T01:55:21.523Z2025-12-12T01:55:00.871ZPOLYGON ((50.63847 40.91793, 49.91332 40.7588,...
\n", - "
" - ], - "text/plain": [ - " biomass_id \\\n", - "0 BIO_S2_DGM__1S_20251212T011542_20251212T011603... \n", - "1 BIO_S2_DGM__1S_20251212T011407_20251212T011427... \n", - "2 BIO_S2_DGM__1S_20251212T015732_20251212T015753... \n", - "3 BIO_S2_DGM__1S_20251212T015538_20251212T015559... \n", - "4 BIO_S2_DGM__1S_20251212T015500_20251212T015521... \n", - "\n", - " start_datetime end_datetime \\\n", - "0 2025-12-12T01:15:42.413Z 2025-12-12T01:16:03.140Z \n", - "1 2025-12-12T01:14:07.171Z 2025-12-12T01:14:27.898Z \n", - "2 2025-12-12T01:57:32.641Z 2025-12-12T01:57:53.290Z \n", - "3 2025-12-12T01:55:38.817Z 2025-12-12T01:55:59.469Z \n", - "4 2025-12-12T01:55:00.871Z 2025-12-12T01:55:21.523Z \n", - "\n", - " datetime geometry \n", - "0 2025-12-12T01:15:42.413Z POLYGON ((-133.8954 -76.78645, -131.65558 -77.... \n", - "1 2025-12-12T01:14:07.171Z POLYGON ((-125.59866 -71.44219, -123.89382 -71... \n", - "2 2025-12-12T01:57:32.641Z POLYGON ((46.75876 49.9374, 45.91077 49.75396,... \n", - "3 2025-12-12T01:55:38.817Z POLYGON ((49.76923 43.18403, 49.01818 43.01984... \n", - "4 2025-12-12T01:55:00.871Z POLYGON ((50.63847 40.91793, 49.91332 40.7588,... 
" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "biomass_catalog = pystac_client.Client.open(BIOMASS_STAC_URL)\n", - "\n", - "biomass_search = biomass_catalog.search(\n", - " collections=[BIOMASS_COLLECTION],\n", - " bbox=BBOX,\n", - " datetime=BIOMASS_DT,\n", - " max_items=2000,\n", - " method=\"GET\",\n", - ")\n", - "\n", - "biomass_items = list(biomass_search.items())\n", - "print(\"BIOMASS items returned:\", len(biomass_items))\n", - "\n", - "biomass_features = []\n", - "for it in biomass_items:\n", - " if it.geometry is None:\n", - " raise ValueError(f\"Item {it.id} has no geometry (footprint).\")\n", - " props = it.properties or {}\n", - " biomass_features.append({\n", - " \"type\": \"Feature\",\n", - " \"geometry\": it.geometry,\n", - " \"properties\": {\n", - " \"biomass_id\": it.id,\n", - " \"start_datetime\": props.get(\"start_datetime\"),\n", - " \"end_datetime\": props.get(\"end_datetime\"),\n", - " \"datetime\": props.get(\"datetime\"),\n", - " },\n", - " })\n", - "\n", - "gdf_biomass = gpd.GeoDataFrame.from_features(biomass_features, crs=\"EPSG:4326\")\n", - "gdf_biomass = gdf_biomass[[\"biomass_id\", \"start_datetime\", \"end_datetime\", \"datetime\", \"geometry\"]]\n", - "gdf_biomass.head()\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "729cd4a4", - "metadata": {}, - "source": [ - "## Interactive map: NISAR and BIOMASS footprint layers\n", - "\n", - "Here the notebook creates an interactive Folium map and overlays the two GeoDataFrames using distinct styles (e.g., NISAR in blue and BIOMASS in orange) so you can visually compare coverage. Tooltips show the granule/item identifiers when you hover, and a layer control lets you toggle the layers, which makes it easy to confirm the footprints are in the right places before doing any intersection." 
- ] - }, - { - "cell_type": "code", - "execution_count": 38, - "id": "9852af86-0bcc-4913-ab36-323c18069f77", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
Make this Notebook Trusted to load map: File -> Trust Notebook
" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "## Ensure both are WGS84 for folium\n", - "gdf_nisar = gdf_nisar.to_crs(\"EPSG:4326\")\n", - "gdf_biomass = gdf_biomass.to_crs(\"EPSG:4326\")\n", - "\n", - "# Center/zoom using combined bounds\n", - "combined = gpd.GeoSeries(list(gdf_nisar.geometry) + list(gdf_biomass.geometry), crs=\"EPSG:4326\")\n", - "minx, miny, maxx, maxy = combined.total_bounds\n", - "center = [(miny + maxy) / 2, (minx + maxx) / 2]\n", - "\n", - "m = folium.Map(location=center, tiles=\"OpenStreetMap\", zoom_start=3)\n", - "\n", - "def style_nisar(_):\n", - " return {\"color\": \"#1f77b4\", \"weight\": 2, \"fillOpacity\": 0.15} # blue\n", - "\n", - "def style_biomass(_):\n", - " return {\"color\": \"#ff7f0e\", \"weight\": 1, \"fillOpacity\": 0.10} # orange\n", - "\n", - "GeoJson(\n", - " gdf_nisar.__geo_interface__,\n", - " name=f\"NISAR ({len(gdf_nisar)})\",\n", - " tooltip=folium.GeoJsonTooltip(fields=[\"nisar_id\"]),\n", - " style_function=style_nisar,\n", - ").add_to(m)\n", - "\n", - "GeoJson(\n", - " gdf_biomass.__geo_interface__,\n", - " name=f\"BIOMASS ({len(gdf_biomass)})\",\n", - " tooltip=folium.GeoJsonTooltip(fields=[\"biomass_id\"]),\n", - " style_function=style_biomass,\n", - ").add_to(m)\n", - "\n", - "folium.LayerControl(collapsed=False).add_to(m)\n", - "m.fit_bounds([[miny, minx], [maxy, maxx]])\n", - "m\n" - ] - }, - { - "cell_type": "markdown", - "id": "2312467f-cdb5-472e-9429-0ddf09af29bf", - "metadata": {}, - "source": [ - "## Overlap of BIOMASS tiles intersecting with NISAR granules" - ] - }, - { - "cell_type": "markdown", - "id": "a5b33a76-2f12-4a22-8904-57ccd1e71983", - "metadata": {}, - "source": [ - "This cell uses GeoPandas spatial join to identify which BIOMASS footprint polygons intersect which NISAR footprint polygons. 
It runs gpd.sjoin() with `how=\"inner\"` and `predicate=\"intersects\"` on `gdf_nisar` and `gdf_biomass` (after resetting indices for a clean join), producing pairs where each row represents one intersecting NISAR–BIOMASS match. It then prints the total number of intersection pairs found, and builds a compact summary table called matches by keeping only the `nisar_id` and `biomass_id` columns, removing any duplicate pairs, resetting the index, and showing the first 25 results so you can quickly see which specific granules/tiles overlap." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "4d798201-833b-4a56-9109-b1d453534d13", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Overlap pairs: 8\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
nisar_idbiomass_id
0NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202...BIO_S1_DGM__1S_20251121T153442_20251121T153503...
1NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202...BIO_S2_DGM__1S_20251214T025234_20251214T025254...
2NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202...BIO_S2_DGM__1S_20251217T025236_20251217T025256...
3NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202...BIO_S2_DGM__1S_20251217T025255_20251217T025310...
4NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202...BIO_S1_DGM__1S_20251121T153442_20251121T153503...
5NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202...BIO_S2_DGM__1S_20251214T025234_20251214T025254...
6NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202...BIO_S2_DGM__1S_20251217T025236_20251217T025256...
7NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202...BIO_S2_DGM__1S_20251217T025255_20251217T025310...
\n", - "
" - ], - "text/plain": [ - " nisar_id \\\n", - "0 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", - "1 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", - "2 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", - "3 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", - "4 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", - "5 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", - "6 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", - "7 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", - "\n", - " biomass_id \n", - "0 BIO_S1_DGM__1S_20251121T153442_20251121T153503... \n", - "1 BIO_S2_DGM__1S_20251214T025234_20251214T025254... \n", - "2 BIO_S2_DGM__1S_20251217T025236_20251217T025256... \n", - "3 BIO_S2_DGM__1S_20251217T025255_20251217T025310... \n", - "4 BIO_S1_DGM__1S_20251121T153442_20251121T153503... \n", - "5 BIO_S2_DGM__1S_20251214T025234_20251214T025254... \n", - "6 BIO_S2_DGM__1S_20251217T025236_20251217T025256... \n", - "7 BIO_S2_DGM__1S_20251217T025255_20251217T025310... " - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Spatial join to find overlapping pairs\n", - "pairs = gpd.sjoin(\n", - " gdf_nisar.reset_index(drop=True),\n", - " gdf_biomass.reset_index(drop=True),\n", - " how=\"inner\",\n", - " predicate=\"intersects\",\n", - ")\n", - "\n", - "print(\"Overlap pairs:\", len(pairs))\n", - "\n", - "# Compact table of matches (deduped)\n", - "matches = pairs[[\"nisar_id\", \"biomass_id\"]].drop_duplicates().reset_index(drop=True)\n", - "matches.head(25)\n" - ] - }, - { - "cell_type": "markdown", - "id": "c383d669-995a-461c-a465-0cc695895c78", - "metadata": {}, - "source": [ - "This cell creates the actual overlap polygons and then visualizes only those overlaps on a clean map. 
It first pulls the matching BIOMASS geometries for each join result using pairs`[\"index_right\"]`, wraps them as a GeoSeries aligned to pairs.index, and computes the geometric intersection with the NISAR geometry in each row `(pairs.geometry.intersection(right_geom))`, producing overlap_geom. It then builds a new GeoDataFrame called overlap that keeps just the linked identifiers (nisar_id, biomass_id) plus the intersection geometry, and filters out any empty intersections. For visualization, it creates a fresh Folium map (m_overlap) and adds a single GeoJson layer styled in green with a tooltip showing the two IDs on hover; finally, it automatically zooms the map to the extent of the overlap polygons using `overlap.total_bounds` and `fit_bounds`, so the view centers directly on where overlap occurs without showing the original NISAR/BIOMASS layers." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "5a41cefc-4143-488f-8a3d-d73149ed1991", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
Make this Notebook Trusted to load map: File -> Trust Notebook
" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "right_geom = gdf_biomass.loc[pairs[\"index_right\"], \"geometry\"].values\n", - "right_geom = gpd.GeoSeries(right_geom, index=pairs.index, crs=\"EPSG:4326\")\n", - "\n", - "overlap_geom = pairs.geometry.intersection(right_geom)\n", - "\n", - "overlap = gpd.GeoDataFrame(\n", - " pairs[[\"nisar_id\", \"biomass_id\"]].copy(),\n", - " geometry=overlap_geom,\n", - " crs=\"EPSG:4326\",\n", - ")\n", - "overlap = overlap[~overlap.geometry.is_empty]\n", - "\n", - "# Map: overlap only\n", - "m_overlap = folium.Map(tiles=\"OpenStreetMap\")\n", - "\n", - "def style_overlap(_):\n", - " return {\"color\": \"#2ca02c\", \"weight\": 2, \"fillColor\": \"#2ca02c\", \"fillOpacity\": 0.35}\n", - "\n", - "GeoJson(\n", - " overlap.__geo_interface__,\n", - " name=f\"Overlap ({len(overlap)})\",\n", - " tooltip=folium.GeoJsonTooltip(fields=[\"nisar_id\", \"biomass_id\"]),\n", - " style_function=style_overlap,\n", - ").add_to(m_overlap)\n", - "\n", - "if len(overlap) > 0:\n", - " minx, miny, maxx, maxy = overlap.total_bounds\n", - " pad = 0.05 \n", - " m_overlap.fit_bounds([[miny - pad, minx - pad], [maxy + pad, maxx + pad]])\n", - "\n", - "m_overlap" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.7" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} +{ + "cells": [ + { + "cell_type": "markdown", + "id": "374aa2b1", + "metadata": {}, + "source": [ + "# Assessing Overlap in NISAR and ESA BIOMASS Datasets\n", + "\n", + "Date: February 4, 2026\n", + "\n", + "Authors: Harshini Girish (UAH), Rajat Shinde (UAH), Alex Mandel 
(Development Seed), Samantha Niemoeller (JPL)\n", + "\n", + "Description: This notebook queries NISAR L2 GSLC items (via the CMR-STAC API, using `pystac-client`) and ESA BIOMASS satellite items (via the ESA MAAP STAC API, e.g., `BiomassLevel1b`) for a chosen AOI and time range. It converts the returned items to footprint polygons and plots them on a single interactive Folium map as two toggleable layers. A separate overlap map highlights where the NISAR and BIOMASS footprints intersect, computed from the true footprint geometries. The result quickly shows where data coincides spatially to support fusion workflows.\n" + ] + }, + { + "cell_type": "markdown", + "id": "93b44b7d-f5a5-4048-80e5-737db0998a43", + "metadata": {}, + "source": [ + "## Run This Notebook\n", + "\n", + "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", + "\n", + "Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Additionally, it is recommended to use the `Pangeo` workspace within the ADE, since certain packages relevant to this tutorial are already installed." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Assumption: This notebook assumes you already know the collection IDs and source STAC APIs for the datasets you want to compare. If you need to find those inputs first, start with Federated Search.\n", + "
\n" + ], + "id": "dbb04e2d" + }, + { + "cell_type": "markdown", + "id": "399aa805-c518-4cde-812d-8729c5e888d9", + "metadata": {}, + "source": [ + "## Additional Resources\n", + "- [NISAR](https://nisar.jpl.nasa.gov/)\n", + "- [BIOMASS](https://docs.maap-project.org/en/develop/science/ESA_CCI/ESA_CCI_V5_Token_Access.html)\n" + ] + }, + { + "cell_type": "markdown", + "id": "a328ae66-198a-4906-b2f5-ffc387ee44b1", + "metadata": {}, + "source": [ + "## Import and Install Packages" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "86047982", + "metadata": {}, + "outputs": [], + "source": [ + "import pystac_client\n", + "import geopandas as gpd\n", + "import pandas as pd\n", + "import folium\n", + "from folium import GeoJson\n" + ] + }, + { + "cell_type": "markdown", + "id": "80529155", + "metadata": {}, + "source": [ + "## Inputs\n", + "\n", + "This section defines the parameters used throughout the notebook to search both catalogs and compute overlaps.\n", + "\n", + "- **BBOX** defines the area of interest as **(min_lon, min_lat, max_lon, max_lat)** and can be used to spatially filter both datasets. \n", + "- **NISAR_DT** sets the datetime range for the NISAR STAC search (tighten this first to avoid timeouts). \n", + "- **BIOMASS_DT** sets the datetime range for the BIOMASS STAC search. \n", + "- **NISAR_STAC_URL** is the STAC endpoint used to query NISAR items (CMR-STAC / ASF). \n", + "- **NISAR_COLLECTION** is the NISAR collection ID used in the STAC search (e.g., `NISAR_L2_GSLC_BETA_V1_1`). \n", + "- **BIOMASS_STAC_URL** is the STAC endpoint used to query BIOMASS items (ESA MAAP STAC). 
\n", + "- **BIOMASS_COLLECTION** is the BIOMASS collection name used in the STAC search (e.g., `BiomassLevel1b`).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "8ed53932-c0b5-4223-ba86-aed95cbd65d3", + "metadata": {}, + "outputs": [], + "source": [ + "# Common query parameters (edit to your AOI/time window)\n", + "BBOX = [-180, -90, 180, 90] # [minx, miny, maxx, maxy]\n", + "NISAR_DT = \"2025-10-01/2025-12-31\" # tighten first to avoid timeouts\n", + "BIOMASS_DT = \"2024-01-01/..\" # adjust if needed\n", + "\n", + "NISAR_STAC_URL = \"https://cmr.earthdata.nasa.gov/stac/ASF\"\n", + "NISAR_COLLECTION = \"NISAR_L2_GSLC_BETA_V1_1\"\n", + "\n", + "BIOMASS_STAC_URL = \"https://catalog.maap.eo.esa.int/catalogue/\"\n", + "BIOMASS_COLLECTION = \"BiomassLevel1b\"\n" + ] + }, + { + "cell_type": "markdown", + "id": "e2dc36db-f686-41e7-8cbf-c152fcbf84fc", + "metadata": {}, + "source": [ + "## Query STAC and Convert Items to a GeoDataFrame\n" + ] + }, + { + "cell_type": "markdown", + "id": "5b6c52d4", + "metadata": {}, + "source": [ + "### 1) NISAR data" + ] + }, + { + "cell_type": "markdown", + "id": "9a8b1506-d6a1-422b-984c-92e172e298b7", + "metadata": {}, + "source": [ + "This section connects to the CMR/ASF STAC API and runs a STAC search for NISAR items using the same spatial/temporal filters used in the notebook (bbox and datetime). The returned STAC Items are converted into `gdf_nisar`, a GeoDataFrame whose `geometry` column contains the true NISAR footprint polygons and whose ID/title field is kept for labeling and later joins—no data files are downloaded, only metadata footprints are used." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b2471531-9cc0-4e91-a065-ee2c2a0ba6fd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "NISAR items returned: 6\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nisar_idgeometry
0NISAR_L2_PR_GSLC_003_005_D_077_4005_DHDH_A_202...POLYGON ((77.30164 24.17615, 76.70841 22.04128...
1NISAR_L2_PR_GSLC_003_064_D_130_7700_SHNA_A_202...POLYGON ((-2.61271 -81.76059, -6.78497 -82.596...
2NISAR_L2_PR_GSLC_004_064_D_130_7700_SHNA_A_202...POLYGON ((-2.70717 -81.78256, -6.90256 -82.617...
3NISAR_L2_PR_GSLC_004_076_A_022_2005_QPDH_A_202...POLYGON ((-88.24687 39.87043, -89.11556 41.970...
4NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202...POLYGON ((42.96535 12.02728, 42.46821 14.11353...
\n", + "
" + ], + "text/plain": [ + " nisar_id \\\n", + "0 NISAR_L2_PR_GSLC_003_005_D_077_4005_DHDH_A_202... \n", + "1 NISAR_L2_PR_GSLC_003_064_D_130_7700_SHNA_A_202... \n", + "2 NISAR_L2_PR_GSLC_004_064_D_130_7700_SHNA_A_202... \n", + "3 NISAR_L2_PR_GSLC_004_076_A_022_2005_QPDH_A_202... \n", + "4 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", + "\n", + " geometry \n", + "0 POLYGON ((77.30164 24.17615, 76.70841 22.04128... \n", + "1 POLYGON ((-2.61271 -81.76059, -6.78497 -82.596... \n", + "2 POLYGON ((-2.70717 -81.78256, -6.90256 -82.617... \n", + "3 POLYGON ((-88.24687 39.87043, -89.11556 41.970... \n", + "4 POLYGON ((42.96535 12.02728, 42.46821 14.11353... " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nisar_catalog = pystac_client.Client.open(NISAR_STAC_URL)\n", + "\n", + "nisar_search = nisar_catalog.search(\n", + " collections=[NISAR_COLLECTION],\n", + " bbox=BBOX,\n", + " datetime=NISAR_DT,\n", + " max_items=500,\n", + ")\n", + "nisar_items = list(nisar_search.items())\n", + "print(\"NISAR items returned:\", len(nisar_items))\n", + "\n", + "# Convert STAC Items → GeoDataFrame\n", + "nisar_features = []\n", + "for it in nisar_items:\n", + " \n", + " nisar_features.append({\n", + " \"type\": \"Feature\",\n", + " \"geometry\": it.geometry,\n", + " \"properties\": {\n", + " \"nisar_id\": it.id,\n", + " **(it.properties or {}),\n", + " },\n", + " })\n", + "\n", + "gdf_nisar = gpd.GeoDataFrame.from_features(nisar_features, crs=\"EPSG:4326\")\n", + "gdf_nisar = gdf_nisar[[\"nisar_id\", \"geometry\"]]\n", + "gdf_nisar.head()\n" + ] + }, + { + "cell_type": "markdown", + "id": "e94fa9b5", + "metadata": {}, + "source": [ + "### 2) ESA BIOMASS \n", + "\n", + "This part connects to the ESA MAAP STAC endpoint `(https://catalog.maap.eo.esa.int/catalogue/)` and searches the `BiomassLevel1b` collection using the notebook’s time range and optional `bbox` filter. 
The returned BIOMASS STAC Items are converted into gdf_biomass with polygon geometries preserved and a stable id/title column added for display and matching—again, this is footprint metadata only, not raster access." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "8c979f77", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "BIOMASS items returned: 2000\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " biomass_id \\\n", + "0 BIO_S2_DGM__1S_20251212T011542_20251212T011603... \n", + "1 BIO_S2_DGM__1S_20251212T011407_20251212T011427... \n", + "2 BIO_S2_DGM__1S_20251212T015732_20251212T015753... \n", + "3 BIO_S2_DGM__1S_20251212T015538_20251212T015559... \n", + "4 BIO_S2_DGM__1S_20251212T015500_20251212T015521... \n", + "\n", + " start_datetime end_datetime \\\n", + "0 2025-12-12T01:15:42.413Z 2025-12-12T01:16:03.140Z \n", + "1 2025-12-12T01:14:07.171Z 2025-12-12T01:14:27.898Z \n", + "2 2025-12-12T01:57:32.641Z 2025-12-12T01:57:53.290Z \n", + "3 2025-12-12T01:55:38.817Z 2025-12-12T01:55:59.469Z \n", + "4 2025-12-12T01:55:00.871Z 2025-12-12T01:55:21.523Z \n", + "\n", + " datetime geometry \n", + "0 2025-12-12T01:15:42.413Z POLYGON ((-133.8954 -76.78645, -131.65558 -77.... \n", + "1 2025-12-12T01:14:07.171Z POLYGON ((-125.59866 -71.44219, -123.89382 -71... \n", + "2 2025-12-12T01:57:32.641Z POLYGON ((46.75876 49.9374, 45.91077 49.75396,... \n", + "3 2025-12-12T01:55:38.817Z POLYGON ((49.76923 43.18403, 49.01818 43.01984... \n", + "4 2025-12-12T01:55:00.871Z POLYGON ((50.63847 40.91793, 49.91332 40.7588,... 
" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "biomass_catalog = pystac_client.Client.open(BIOMASS_STAC_URL)\n", + "\n", + "biomass_search = biomass_catalog.search(\n", + " collections=[BIOMASS_COLLECTION],\n", + " bbox=BBOX,\n", + " datetime=BIOMASS_DT,\n", + " max_items=2000,\n", + " method=\"GET\",\n", + ")\n", + "\n", + "biomass_items = list(biomass_search.items())\n", + "print(\"BIOMASS items returned:\", len(biomass_items))\n", + "\n", + "biomass_features = []\n", + "for it in biomass_items:\n", + " if it.geometry is None:\n", + " raise ValueError(f\"Item {it.id} has no geometry (footprint).\")\n", + " props = it.properties or {}\n", + " biomass_features.append({\n", + " \"type\": \"Feature\",\n", + " \"geometry\": it.geometry,\n", + " \"properties\": {\n", + " \"biomass_id\": it.id,\n", + " \"start_datetime\": props.get(\"start_datetime\"),\n", + " \"end_datetime\": props.get(\"end_datetime\"),\n", + " \"datetime\": props.get(\"datetime\"),\n", + " },\n", + " })\n", + "\n", + "gdf_biomass = gpd.GeoDataFrame.from_features(biomass_features, crs=\"EPSG:4326\")\n", + "gdf_biomass = gdf_biomass[[\"biomass_id\", \"start_datetime\", \"end_datetime\", \"datetime\", \"geometry\"]]\n", + "gdf_biomass.head()\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "729cd4a4", + "metadata": {}, + "source": [ + "## Interactive map: NISAR and BIOMASS footprint layers\n", + "\n", + "Here the notebook creates an interactive Folium map and overlays the two GeoDataFrames using distinct styles (e.g., NISAR in blue and BIOMASS in orange) so you can visually compare coverage. Tooltips show the granule/item identifiers when you hover, and a layer control lets you toggle the layers, which makes it easy to confirm the footprints are in the right places before doing any intersection." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "9852af86-0bcc-4913-ab36-323c18069f77", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "## Ensure both are WGS84 for folium\n", + "gdf_nisar = gdf_nisar.to_crs(\"EPSG:4326\")\n", + "gdf_biomass = gdf_biomass.to_crs(\"EPSG:4326\")\n", + "\n", + "# Center/zoom using combined bounds\n", + "combined = gpd.GeoSeries(list(gdf_nisar.geometry) + list(gdf_biomass.geometry), crs=\"EPSG:4326\")\n", + "minx, miny, maxx, maxy = combined.total_bounds\n", + "center = [(miny + maxy) / 2, (minx + maxx) / 2]\n", + "\n", + "m = folium.Map(location=center, tiles=\"OpenStreetMap\", zoom_start=3)\n", + "\n", + "def style_nisar(_):\n", + " return {\"color\": \"#1f77b4\", \"weight\": 2, \"fillOpacity\": 0.15} # blue\n", + "\n", + "def style_biomass(_):\n", + " return {\"color\": \"#ff7f0e\", \"weight\": 1, \"fillOpacity\": 0.10} # orange\n", + "\n", + "GeoJson(\n", + " gdf_nisar.__geo_interface__,\n", + " name=f\"NISAR ({len(gdf_nisar)})\",\n", + " tooltip=folium.GeoJsonTooltip(fields=[\"nisar_id\"]),\n", + " style_function=style_nisar,\n", + ").add_to(m)\n", + "\n", + "GeoJson(\n", + " gdf_biomass.__geo_interface__,\n", + " name=f\"BIOMASS ({len(gdf_biomass)})\",\n", + " tooltip=folium.GeoJsonTooltip(fields=[\"biomass_id\"]),\n", + " style_function=style_biomass,\n", + ").add_to(m)\n", + "\n", + "folium.LayerControl(collapsed=False).add_to(m)\n", + "m.fit_bounds([[miny, minx], [maxy, maxx]])\n", + "m\n" + ] + }, + { + "cell_type": "markdown", + "id": "2312467f-cdb5-472e-9429-0ddf09af29bf", + "metadata": {}, + "source": [ + "## Overlap of BIOMASS tiles intersecting with NISAR granules" + ] + }, + { + "cell_type": "markdown", + "id": "a5b33a76-2f12-4a22-8904-57ccd1e71983", + "metadata": {}, + "source": [ + "This cell uses GeoPandas spatial join to identify which BIOMASS footprint polygons intersect which NISAR footprint polygons. 
It runs `gpd.sjoin()` with `how=\"inner\"` and `predicate=\"intersects\"` on `gdf_nisar` and `gdf_biomass` (after resetting indices for a clean join), producing `pairs`, where each row represents one intersecting NISAR–BIOMASS match. It then prints the total number of intersection pairs found and builds a compact summary table called `matches` by keeping only the `nisar_id` and `biomass_id` columns, removing any duplicate pairs, and resetting the index. The first 25 rows are displayed so you can quickly see which specific granules/tiles overlap." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4d798201-833b-4a56-9109-b1d453534d13", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Overlap pairs: 8\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " nisar_id \\\n", + "0 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", + "1 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", + "2 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", + "3 NISAR_L2_PR_GSLC_005_172_A_008_2005_DHDH_A_202... \n", + "4 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", + "5 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", + "6 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", + "7 NISAR_L2_PR_GSLC_006_172_A_008_2005_DHDH_A_202... \n", + "\n", + " biomass_id \n", + "0 BIO_S1_DGM__1S_20251121T153442_20251121T153503... \n", + "1 BIO_S2_DGM__1S_20251214T025234_20251214T025254... \n", + "2 BIO_S2_DGM__1S_20251217T025236_20251217T025256... \n", + "3 BIO_S2_DGM__1S_20251217T025255_20251217T025310... \n", + "4 BIO_S1_DGM__1S_20251121T153442_20251121T153503... \n", + "5 BIO_S2_DGM__1S_20251214T025234_20251214T025254... \n", + "6 BIO_S2_DGM__1S_20251217T025236_20251217T025256... \n", + "7 BIO_S2_DGM__1S_20251217T025255_20251217T025310... " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Spatial join to find overlapping pairs\n", + "pairs = gpd.sjoin(\n", + " gdf_nisar.reset_index(drop=True),\n", + " gdf_biomass.reset_index(drop=True),\n", + " how=\"inner\",\n", + " predicate=\"intersects\",\n", + ")\n", + "\n", + "print(\"Overlap pairs:\", len(pairs))\n", + "\n", + "# Compact table of matches (deduped)\n", + "matches = pairs[[\"nisar_id\", \"biomass_id\"]].drop_duplicates().reset_index(drop=True)\n", + "matches.head(25)\n" + ] + }, + { + "cell_type": "markdown", + "id": "c383d669-995a-461c-a465-0cc695895c78", + "metadata": {}, + "source": [ + "This cell creates the actual overlap polygons and then visualizes only those overlaps on a clean map. 
It first pulls the matching BIOMASS geometries for each join result using `pairs[\"index_right\"]`, wraps them as a GeoSeries aligned to `pairs.index`, and computes the geometric intersection with the NISAR geometry in each row (`pairs.geometry.intersection(right_geom)`), producing `overlap_geom`. It then builds a new GeoDataFrame called `overlap` that keeps just the linked identifiers (`nisar_id`, `biomass_id`) plus the intersection geometry, and filters out any empty intersections. For visualization, it creates a fresh Folium map (`m_overlap`) and adds a single GeoJson layer styled in green with a tooltip showing the two IDs on hover. Finally, it zooms the map to the extent of the overlap polygons using `overlap.total_bounds` and `fit_bounds`, so the view centers directly on where overlap occurs without showing the original NISAR/BIOMASS layers." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "5a41cefc-4143-488f-8a3d-d73149ed1991", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "right_geom = gdf_biomass.loc[pairs[\"index_right\"], \"geometry\"].values\n", + "right_geom = gpd.GeoSeries(right_geom, index=pairs.index, crs=\"EPSG:4326\")\n", + "\n", + "overlap_geom = pairs.geometry.intersection(right_geom)\n", + "\n", + "overlap = gpd.GeoDataFrame(\n", + " pairs[[\"nisar_id\", \"biomass_id\"]].copy(),\n", + " geometry=overlap_geom,\n", + " crs=\"EPSG:4326\",\n", + ")\n", + "overlap = overlap[~overlap.geometry.is_empty]\n", + "\n", + "# Map: overlap only\n", + "m_overlap = folium.Map(tiles=\"OpenStreetMap\")\n", + "\n", + "def style_overlap(_):\n", + " return {\"color\": \"#2ca02c\", \"weight\": 2, \"fillColor\": \"#2ca02c\", \"fillOpacity\": 0.35}\n", + "\n", + "GeoJson(\n", + " overlap.__geo_interface__,\n", + " name=f\"Overlap ({len(overlap)})\",\n", + " tooltip=folium.GeoJsonTooltip(fields=[\"nisar_id\", \"biomass_id\"]),\n", + " style_function=style_overlap,\n", + ").add_to(m_overlap)\n", + "\n", + "if len(overlap) > 0:\n", + " minx, miny, maxx, maxy = overlap.total_bounds\n", + " pad = 0.05 \n", + " m_overlap.fit_bounds([[miny - pad, minx - pad], [maxy + pad, maxx + pad]])\n", + "\n", + "m_overlap" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/source/technical_tutorials/search/searching_edsc_gui.ipynb b/docs/source/technical_tutorials/search/searching_edsc_gui.ipynb index 2183c72e..47b077f6 100644 --- a/docs/source/technical_tutorials/search/searching_edsc_gui.ipynb +++ 
b/docs/source/technical_tutorials/search/searching_edsc_gui.ipynb @@ -5,7 +5,8 @@ "metadata": {}, "source": [ "# Using the NASA Earthdata Search Client Graphical User Interface" - ] + ], + "id": "19e6c4a9" }, { "cell_type": "markdown", @@ -16,7 +17,18 @@ "Date: July 27, 2020\n", "\n", "Description: A guide detailing how to use NASA's Earthdata Search client graphical user interface (GUI)." - ] + ], + "id": "01ac479c" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Recommended starting point: For general collection discovery, start with Federated Search. Use Earthdata Search when you already know you need NASA CMR and want the Earthdata GUI workflow.\n", + "
\n" + ], + "id": "21d8d9aa" }, { "cell_type": "markdown", @@ -25,7 +37,8 @@ "## Additional Resources\n", "- [Earthdata Search](https://search.earthdata.nasa.gov/search)\n", "- [Find Earthdata](https://www.earthdata.nasa.gov/learn/find-data)" - ] + ], + "id": "6c44b88d" }, { "cell_type": "markdown", @@ -37,14 +50,16 @@ "\n", "![Earthdata Search Client GUI](../../_static/edsc_screenshot.png)\n", "(*Image of the NASA EDSC GUI*)" - ] + ], + "id": "f6005955" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the Earthdata Search Client" - ] + ], + "id": "09c5c67a" }, { "cell_type": "markdown", @@ -55,14 +70,16 @@ "We can also use the tools with the background map to refine our search. We can search spatially using **Search by spatial polygon** ![Search by spatial polygon](../../_static/search_by_spatial_polygon.png), **Search by spatial rectangle** ![Search by spatial rectangle](../../_static/search_by_spatial_rectangle.png), **Search by spatial circle** ![Search by spatial circle](../../_static/search_by_spatial_circle.png), and **Search by spatial coordinate** ![Search by spatial coordinate](../../_static/search_by_spatial_coordinate.png). Layers may be edited using the **Edit layers** ![Edit layers](../../_static/edit_layers.png) button and deleted using the **Delete layers** ![Delete layers](../../_static/delete_layers.png) buttons. There are also options for **North Polar Stereographic** ![North Polar Stereographic](../../_static/north_polar_stereographic.png), **Geographic (Equirectangular)** ![Geographic (Equirectangular)](../../_static/geographic.png), and **South Polar Stereographic** ![South Polar Stereographic](../../_static/south_polar_stereographic.png) projections. There are options to **Zoom in** ![Zoom in](../../_static/zoom_in.png), **Zoom home** ![Zoom home](../../_static/zoom_home.png), and **Zoom out** ![Zoom out](../../_static/zoom_out.png). 
Finally, we can change the basemap by selecting the **Map layers** ![Map layers](../../_static/map_layers.png) button. \n", "\n", "The results of the search are displayed in the *Matching Collections* section. Collection names and summaries for each result are shown here. The **View collection details** ![View collection details](../../_static/view_collection_details.png) button may be used to view related URLs and additional information about the selected collection. Also, collections may be added to a project using the **Add collection to the current project** ![Add collection to the current project](../../_static/add_collection_to_the_current_project.png) button. Clicking anywhere else on a result allows you to see the granules within the collection available for download." - ] + ], + "id": "7461942c" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], - "source": [] + "source": [], + "id": "800d7550" } ], "metadata": { diff --git a/docs/source/technical_tutorials/search/searching_the_stac_catalog.ipynb b/docs/source/technical_tutorials/search/searching_the_stac_catalog.ipynb index 42937bd4..5673e86c 100644 --- a/docs/source/technical_tutorials/search/searching_the_stac_catalog.ipynb +++ b/docs/source/technical_tutorials/search/searching_the_stac_catalog.ipynb @@ -5,15 +5,17 @@ "id": "a66708a3", "metadata": {}, "source": [ - "# Searching the STAC Catalog\n", + "# Searching a STAC API for Items\n", "\n", "Authors: Aimee Barciauskas (Development Seed)\n", "\n", "Date: December 13, 2022\n", "\n", - "Description: This tutorial provides a basic introduction to searching the [MAAP STAC catalog](https://stac.maap-project.org/) using `pystac-client`.\n", + "Description: This tutorial shows the generic item-level STAC search pattern using `pystac-client`. 
Use it after you have identified a collection ID and source STAC API, for example from [Federated Search](federated-collection-discovery/collection_discovery.ipynb).\n", "\n", - "Another method of searching the STAC catalog is via the [STAC browser](https://stac-browser.maap-project.org).\n", + "This notebook uses [MAAP STAC](https://stac.maap-project.org/) as an example, but the same pattern works for other STAC APIs such as NASA CMR STAC or ESA MAAP STAC.\n", + "\n", + "Another way to inspect a STAC API is via a STAC browser when one is available. For MAAP STAC, see the [STAC browser](https://stac-browser.maap-project.org).\n", "\n", "\"Drawing\"\n" ] @@ -23,11 +25,15 @@ "id": "521aa28d", "metadata": {}, "source": [ - "## About the STAC Catalog\n", + "## About this workflow\n", + "\n", + "This notebook covers **item-level search** in a STAC API. The recommended MAAP workflow is:\n", "\n", - "The MAAP STAC catalog provides discovery of a subset of MAAP datasets. These collections are hosted specifically through the MAAP STAC catalog and are typically not available on NASA's CMR. The data files have not been moved at all in the process of publishing datasets to STAC.\n", + "1. Use [Federated Search](federated-collection-discovery/collection_discovery.ipynb) to discover relevant collections across catalogs.\n", + "2. Copy the collection ID and source STAC API from the result you want.\n", + "3. Use those values here to search for items.\n", "\n", - "Data will continue to be added to the STAC catalog with priority given to datasets which are known to be in-use by MAAP UWG members through S3 metrics, direct collaboration with data team members, and by request." 
+ "MAAP STAC is used below as an example endpoint, but you should replace the endpoint and collection ID with the ones that match your collection.\n" ] }, { @@ -75,8 +81,9 @@ "id": "51a18e54", "metadata": {}, "source": [ - "## STAC Client\n", - "We first connect to an API by retrieving the root catalog, or landing page, of the API with the Client.open function." + "## Connect to a STAC API\n", + "\n", + "We first connect to the source STAC API by retrieving its landing page with `Client.open`. Replace `URL` with the source STAC API returned by Federated Search, and replace `COLLECTION_ID` with the collection you want to query.\n" ] }, { @@ -2101,10 +2108,12 @@ } ], "source": [ - "# STAC API root URL\n", - "URL = 'https://stac.maap-project.org/'\n", + "# Replace these with the source STAC API and collection ID from Federated Search\n", + "URL = \"https://stac.maap-project.org/\"\n", + "COLLECTION_ID = \"ESACCI_Biomass_L4_AGB_V4_100m\"\n", + "\n", "cat = Client.open(URL)\n", - "cat" + "cat\n" ] }, { @@ -2112,9 +2121,9 @@ "id": "f34b7a57", "metadata": {}, "source": [ - "### Searching Collections\n", + "### Inspecting collections\n", "\n", - "As with a static catalog the get_collections function will iterate through the Collections in the Catalog. Notice that because this is an API it can get all the Collections through a single call, rather than having to fetch each one individually." + "If you want to inspect the collections available from the current endpoint, `get_collections()` will iterate through them. 
In many workflows you can skip this because Federated Search already gave you the collection ID you need.\n" ] }, { @@ -2617,8 +2626,8 @@ } ], "source": [ - "collection = cat.get_collection(stac_collections[0].id)\n", - "collection" + "collection = cat.get_collection(COLLECTION_ID)\n", + "collection\n" ] }, { @@ -2626,11 +2635,11 @@ "id": "3f4e956f", "metadata": {}, "source": [ - "### Searching STAC Items\n", + "### Searching STAC items\n", "\n", - "Query the `/search` endpoint of the STAC catalog to find items in our collection. This method will return an ItemSearch instance which we can then turn into a list.\n", + "Query the `/search` endpoint of the STAC API to find items in your chosen collection. This method returns an `ItemSearch` instance which we can then turn into a list.\n", "\n", - "Read more about additional parameters to the `search()` method at [pystac-client.readthedocs.io](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search)." + "Read more about additional parameters to the `search()` method at [pystac-client.readthedocs.io](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).\n" ] }, { @@ -2669,7 +2678,7 @@ "id": "98234222", "metadata": {}, "source": [ - "We can get a specific item by supplying one of the IDs from an item in our previous collection search. We are then able to get the HREF of the first asset in our item.\n" + "We can also retrieve a specific item by supplying one of the IDs returned by the collection search. 
Then we can inspect one of its asset HREFs.\n" ] }, { @@ -2699,7 +2708,7 @@ "id": "0a223d73", "metadata": {}, "source": [ - "Here's a simplified example:" + "Here's a simplified example using explicit placeholders that you can swap for a different source STAC API, collection ID, and item ID.\n" ] }, { @@ -2709,17 +2718,17 @@ "metadata": {}, "outputs": [], "source": [ - "# Retrieve a specific collection\n", - "collection = cat.get_collection(\"ESACCI_Biomass_L4_AGB_V4_100m\")\n", + "# Replace these values with the source STAC API, collection ID, and item ID you want to inspect\n", + "URL = \"https://stac.maap-project.org/\"\n", + "COLLECTION_ID = \"ESACCI_Biomass_L4_AGB_V4_100m\"\n", + "ITEM_ID = \"S50W080_ESACCI-BIOMASS-L4-AGB-MERGED-100m-2020-fv4.0\"\n", + "ASSET_KEY = \"estimates\"\n", "\n", - "# Search for items in the collection\n", - "collection_items = list(cat.search(collections=[\"ESACCI_Biomass_L4_AGB_V4_100m\"], max_items=10).items())\n", - "\n", - "# Retrieve a specific item\n", - "item = collection.get_item(\"S50W080_ESACCI-BIOMASS-L4-AGB-MERGED-100m-2020-fv4.0\")\n", - "\n", - "# List the item's asset href\n", - "item.assets[\"estimates\"].href" + "cat = Client.open(URL)\n", + "collection = cat.get_collection(COLLECTION_ID)\n", + "collection_items = list(cat.search(collections=[COLLECTION_ID], max_items=10).items())\n", + "item = collection.get_item(ITEM_ID)\n", + "item.assets[ASSET_KEY].href\n" ] } ], diff --git a/docs/source/technical_tutorials/searching.rst b/docs/source/technical_tutorials/searching.rst index f3d27fc6..d9390dd2 100644 --- a/docs/source/technical_tutorials/searching.rst +++ b/docs/source/technical_tutorials/searching.rst @@ -2,25 +2,40 @@ Search ======================================= -MAAP users are advised to use two catalogs: +MAAP provides its own STAC API, but MAAP users often work with collections that live in NASA CMR, ESA MAAP STAC, and other upstream catalogs. -1. 
Use NASA's Operational CMR to discover NASA-produced and curated data: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html. -2. Use MAAP STAC for data not found in NASA CMR, and data produced by MAAP users: https://stac.maap-project.org/api.html. +The recommended MAAP data discovery workflow is: + +1. Use `Federated Search `_ to discover relevant collections across catalogs. +2. Identify the collection's source STAC API and collection ID from the result. +3. Continue with item-level search in that source STAC API. + +This keeps the starting point focused on finding the right collection before choosing a source-specific workflow. + +Commonly used catalogs +-------------------------- + +- `MAAP STAC `_ for MAAP-published collections. +- `NASA CMR STAC `_ and Earthdata Search for many NASA collections. +- `ESA MAAP STAC `_ for ESA-hosted collections. +- Other upstream STAC APIs that may be configured in Federated Search. .. warning:: - The https://cmr.maap-project.org catalog was deprecated on **May 1, 2023**. Users should request collections they need from this catalog to be made discoverable in the MAAP STAC or NASA's Operational CMR if they're not already there. + The https://cmr.maap-project.org catalog was deprecated on **May 1, 2023**. Users should request collections they need from this catalog to be made discoverable in MAAP STAC or NASA's Operational CMR if they're not already there. + +If you are migrating older code or want more catalog background, see `Catalog background and migration notes `_. -More information on each catalog and migrating from MAAP's CMR here: `MAAP's Dual Catalog `_. +Specialized tutorials for NASA CMR, Earthdata Search, and R remain available below when you already know which source you need. .. 
toctree:: :maxdepth: 2 :caption: Search Topics: + search/federated-collection-discovery/collection_discovery.ipynb + search/searching_the_stac_catalog.ipynb search/catalog.rst - search/searching_edsc_gui.ipynb search/collections.ipynb search/granules.ipynb - search/searching_the_stac_catalog.ipynb - search/federated-collection-discovery/collection_discovery.ipynb + search/searching_edsc_gui.ipynb working_with_r/find_data_in_r.rst search/searching_NISAR_BIOMASS_overlapping_data.ipynb diff --git a/docs/source/technical_tutorials/working_with_r/access_cmr_r.ipynb b/docs/source/technical_tutorials/working_with_r/access_cmr_r.ipynb index d0f1d50a..132e3fbc 100644 --- a/docs/source/technical_tutorials/working_with_r/access_cmr_r.ipynb +++ b/docs/source/technical_tutorials/working_with_r/access_cmr_r.ipynb @@ -16,6 +16,16 @@ "- Use `paws` to download data from a NASA DAAC locally." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Scope: This notebook is a NASA CMR-specific follow-on workflow. Use Federated Search first if you still need to discover the collection or confirm that NASA CMR is the right source.\n", + "
\n" + ], + "id": "014b20fd" + }, { "cell_type": "markdown", "id": "df352544-7428-421d-827a-510141080010", @@ -24,14 +34,16 @@ "## Additional Resources\n", "- [Working with R in MAAP](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r.html) \n", " - Current R Documentation within the MAAP Docs.\n", - "- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) \n", - " - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.\n", + "- [Collection Discovery with Federated Search](https://docs.maap-project.org/en/latest/technical_tutorials/search/federated-collection-discovery/collection_discovery.html) \n", + " - The recommended starting point for discovering collections across catalogs.\n", + "- [Catalog workflow overview (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html) \n", + " - Background on when to continue with NASA CMR versus another STAC API.\n", "- [`ncdf4` Reference Manual](https://cran.r-project.org/web/packages/ncdf4/ncdf4.pdf)\n", " - Documentation for reading and writing netCDF files using the `ncdf4` package.\n", "- [GDAL Raster Drivers](https://gdal.org/en/latest/drivers/raster/index.html)\n", " - A list of drivers for raster data.\n", "- [`paws` Reference Manual](https://cran.r-project.org/web/packages/paws/paws.pdf)\n", - " - Documentation for using the `paws` package." + " - Documentation for using the `paws` package.\n" ] }, { @@ -103,7 +115,9 @@ "\n", "In the example below, we'll demonstrate searching and accessing data from ORNL DAAC. 
We'll search for a GEDI L4B dataset, extract the associated links to access the data, and then open a file.\n", "\n", - "For more information on searching for NASA CMR collections and granules in R, see [\"Searching for Data in NASA's CMR in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). " + "This notebook assumes you already know that NASA CMR is the right source for your collection. If you still need to discover the collection or confirm which catalog it lives in, start with [Federated Search](https://docs.maap-project.org/en/latest/technical_tutorials/search/federated-collection-discovery/collection_discovery.html).\n", + "\n", + "For more information on searching for NASA CMR collections and granules in R, see [\"Searching for Data in NASA's CMR in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). \n" ] }, { diff --git a/docs/source/technical_tutorials/working_with_r/cmr_search_in_r.ipynb b/docs/source/technical_tutorials/working_with_r/cmr_search_in_r.ipynb index 8b11a7d3..b3f4e5ce 100644 --- a/docs/source/technical_tutorials/working_with_r/cmr_search_in_r.ipynb +++ b/docs/source/technical_tutorials/working_with_r/cmr_search_in_r.ipynb @@ -22,34 +22,46 @@ }, { "cell_type": "markdown", - "id": "a5231a3b-9c51-46b1-a95c-402be1ed6f2c", + "metadata": {}, + "source": [ + "
\n", + "Recommended starting point: For general collection discovery, start with Federated Search. Use this notebook when you already know you need a NASA CMR-specific workflow in R.\n", + "
\n" + ], + "id": "866b2763" + }, + { + "cell_type": "markdown", "metadata": {}, "source": [ "## Run This Notebook\n", "\n", "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", - "Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within the \"Rocker Geospatial\" workspace." - ] + "Disclaimer: it is highly recommended to run this tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within the \"Rocker Geospatial\" workspace.\n" + ], + "id": "d111b27c" }, { "cell_type": "markdown", - "id": "679ac1ca-673b-447c-b043-74b2942f4883", "metadata": {}, "source": [ "## Additional Resources\n", "\n", "- [R Interface to Python](https://rstudio.github.io/reticulate/)\n", " - How to get started with the package `reticulate`, which is used in this notebook. 
This package allows us to use python-based libraries in R.\n", + "- [Collection Discovery with Federated Search](https://docs.maap-project.org/en/latest/technical_tutorials/search/federated-collection-discovery/collection_discovery.html)\n", + " - The recommended starting point for cross-catalog collection discovery before moving into a CMR-specific workflow.\n", "- [Searching for Granules in NASA's Operational CMR using maap-py](https://docs.maap-project.org/en/latest/technical_tutorials/search/granules.html)\n", " - The Python version of this notebook, also published in the MAAP Docs.\n", "- [How do I find data using R?](https://nasa-openscapes.github.io/earthdata-cloud-cookbook/how-tos/find-data/find-r.html)\n", " - A resource from NASA Openscapes, showing users how to search for NASA data in R and get authentication using the package `earthdatalogin`. Additionally, it shows users how to find data stored in NASA STACs (spatio-temporal asset catalogs).\n", "- [Common Metadata Repository (CMR) API Documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)\n", " - A resource that shows users how to search for collections and granules by parameter with the NASA CMR API.\n", - "- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr)\n", - " - A section in the MAAP Docs that provides general information and resources to search and access NASA's CMR." - ] + "- [Catalog workflow overview (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html)\n", + " - Background on the recommended workflow: discover collections first, then continue with a source-specific search path.\n" + ], + "id": "c1994dd4" }, { "cell_type": "markdown", @@ -157,42 +169,21 @@ "One collection was returned to us. To grab the concept ID of the collection, we'll use the code in the following cell." 
] }, - { - "cell_type": "code", - "execution_count": 4, - "id": "fd33b15f-6abf-49e7-99de-0b46720d7dd1", - "metadata": { - "vscode": { - "languageId": "r" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[1] \"C2613553260-NSIDC_CPRD\"\n" - ] - } - ], - "source": [ - "collection_id = atl08_collections[[1]]['concept-id']\n", - "print(collection_id)" - ] - }, { "cell_type": "markdown", - "id": "cff7f8b0-000e-40c1-a91c-425bb5918cd5", "metadata": {}, "source": [ "### Federated Search\n", "\n", - "If you do not know the short name or ID for a collection, you can also search for collections by keyword. While `maap-py` can be used for a keyword search, `maap-py` only searches within NASA Earthdata. This demonstration will use [Federated Search](https://discover.maap-project.org/), which searches NASA Earthdata, MAAP, ESA, and VEDA.\n", + "If you do not know the short name or ID for a collection, start with [Federated Search](https://discover.maap-project.org/). Federated Search looks across multiple catalogs and helps you identify the collection ID and source STAC API before you move into a catalog-specific workflow.\n", "\n", - "For more information on Federated Search in Python, please see [\"BETA - Collection Discovery: searching for collections across multiple APIs using the Federated Collection Discovery API\"](https://docs.maap-project.org/en/latest/technical_tutorials/search/collection_discovery.html#Federated-Collection-Discovery-API).\n", + "This R example uses Federated Search to find a relevant collection and then continues with a NASA CMR-specific granule workflow in `maap-py`.\n", "\n", - "First, we'll set our API URL." 
- ] + "For the full Python walkthrough, see [Collection Discovery with Federated Search](https://docs.maap-project.org/en/latest/technical_tutorials/search/federated-collection-discovery/collection_discovery.html).\n", + "\n", + "First, we'll set our API URL.\n" + ], + "id": "2864d46e" }, { "cell_type": "code", diff --git a/docs/source/technical_tutorials/working_with_r/maap_stac_r.ipynb b/docs/source/technical_tutorials/working_with_r/maap_stac_r.ipynb index 19128a8a..c6e4f344 100644 --- a/docs/source/technical_tutorials/working_with_r/maap_stac_r.ipynb +++ b/docs/source/technical_tutorials/working_with_r/maap_stac_r.ipynb @@ -5,15 +5,25 @@ "id": "fa96e18d-c80f-4b67-9595-eeab44cf079f", "metadata": {}, "source": [ - "# Accessing the MAAP CMR STAC with R\n", + "# Item-level STAC Search in R with `rstac`\n", "\n", "Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed)\n", "\n", "Date: December 13, 2024\n", "\n", - "Description: In this notebook, we'll use `rstac` to search for collections and associated items within the [MAAP STAC Catalog](https://stac.maap-project.org/)." + "Description: In this notebook, we'll use `rstac` to search for items in a STAC API from R. Use it after you have identified a collection ID and source STAC API, for example from Federated Search. MAAP STAC is used as the example endpoint below.\n" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Recommended workflow: Start with Federated Search to discover collections, then use this notebook for item-level search in the source STAC API where the collection lives.\n", + "
\n" + ], + "id": "a85a5efe" + }, { "cell_type": "markdown", "id": "bed79c96-e264-4c4c-b77e-5ca4693fcac1", @@ -33,12 +43,14 @@ "source": [ "## Additional Resources\n", "\n", + "- [Collection Discovery with Federated Search](https://docs.maap-project.org/en/latest/technical_tutorials/search/federated-collection-discovery/collection_discovery.html)\n", + " - The recommended starting point for finding a collection ID and source STAC API.\n", "- [How do I find data using R?](https://nasa-openscapes.github.io/earthdata-cloud-cookbook/how-tos/find-data/find-r.html)\n", " - A resource from NASA Openscapes, showing users how to search for NASA data in R and get authentication using the package `earthdatalogin`. Additionally, it shows users how to find data stored in NASA STACs (SpatioTemporal Asset Catalogs).\n", "- [rstac: Client Library for SpatioTemporal Asset Catalog](https://cran.r-project.org/web/packages/rstac/index.html)\n", " - A page with materials for the `rstac` library.\n", - "- [Searching the STAC Catalog (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/searching_the_stac_catalog.html)\n", - " - A notebook in the MAAP Docs that shows users how to search the MAAP STAC using Python." + "- [Searching a STAC API for Items (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/searching_the_stac_catalog.html)\n", + " - The Python version of this generic STAC item-search workflow.\n" ] }, { @@ -72,8 +84,8 @@ "id": "6ec6e591-a0b2-4b1e-b449-e59825f21065", "metadata": {}, "source": [ - "## Initializing the MAAP STAC Endpoint\n", - "Before beginning, we'll form a connection to the MAAP STAC endpoint to set up and inspect the STAC endpoint for querying geospatial data." + "## Initializing the source STAC endpoint\n", + "Before beginning, connect to the source STAC API for the collection you want to query. 
Replace the example endpoint below with the source STAC API returned by Federated Search when needed.\n" ] }, { @@ -99,12 +111,13 @@ } ], "source": [ - "# Define the MAAP STAC endpoint\n", + "# Replace this with the source STAC API returned by Federated Search\n", "stac_endpoint <- stac(\"https://stac.maap-project.org/\")\n", "\n", "# Display the STAC endpoint metadata\n", - "cat(\"STAC Endpoint Metadata:\\n\")\n", - "print(stac_endpoint)" + "cat(\"STAC Endpoint Metadata:\n", + "\")\n", + "print(stac_endpoint)\n" ] }, { @@ -244,9 +257,8 @@ "id": "9f3249c0-c48f-492a-810f-30b19ddbd825", "metadata": {}, "source": [ - "## Assigning and Selecting a STAC Collection ID\n", - "This code selects a collection ID from the list of collections retrieved from the STAC catalog. It selects a single collection ID from the fetched collections.\n", - "\n" + "## Assigning a collection ID\n", + "This code sets a collection ID for item-level search. Replace the example value with the collection ID returned by Federated Search or another collection discovery workflow.\n" ] }, { @@ -268,14 +280,9 @@ } ], "source": [ - "# Assign collection ID\n", - "if (length(collections$collections) > 0) {\n", - " # Choose a specific collection (21st in this example)\n", - " collection_id <- collections$collections[[21]]$id\n", - " cat(\"Selected Collection ID:\", collection_id, \"\\n\")\n", - "} else {\n", - " stop(\"No collections found.\")\n", - "}" + "collection_id <- \"ESACCI_Biomass_L4_AGB_V4_100m\"\n", + "cat(\"Selected Collection ID:\", collection_id, \"\n", + "\")\n" ] }, { @@ -457,15 +464,15 @@ "id": "02046460-ebd0-4589-bf76-4b842a05c99c", "metadata": {}, "source": [ - "## Performing a Focused Search Using the MAAP STAC API\n", + "## Performing a focused item search using the STAC API\n", "\n", - "This code performs a search query and retrieves items from the MAAP STAC. 
The search is configured with the following parameters:\n", + "This code performs an item-level search with the chosen STAC endpoint and collection ID. The search is configured with the following parameters:\n", "\n", "Collection: Specifies the dataset to search within.\n", "\n", "Temporal Range: Filters items within a specific date range.\n", "\n", - "Bounding Box: Spatially filters items to a defined area of interest." + "Bounding Box: Spatially filters items to a defined area of interest.\n" ] }, {