110 changes: 27 additions & 83 deletions docs/source/technical_tutorials/search/catalog.rst
@@ -1,105 +1,53 @@
MAAP's Dual Catalog
=======================================
Catalog background and migration notes
======================================

MAAP users are advised to use two catalogs:
MAAP users often work across more than one catalog. In practice, the collection you want may live in MAAP STAC, NASA's Operational CMR, ESA MAAP STAC, or another upstream STAC API.

1. Use NASA's Operational CMR to discover NASA-produced and curated data: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html.
2. Use MAAP STAC for data not found in NASA CMR, and data produced by MAAP users: https://stac.maap-project.org/api.html.
For the recommended starting point, see `Collection Discovery with Federated Search <federated-collection-discovery/collection_discovery.html>`_. This page is mainly background for users who want extra catalog context or who are migrating older workflows.

.. warning::
The https://cmr.maap-project.org catalog was deprecated on **May 1, 2023**. Users should request collections they need from this catalog to be made discoverable in the MAAP STAC or NASA's Operational CMR if they're not already there.
Catalogs of interest
--------------------

More information on each catalog, and on migrating from MAAP's CMR, is provided at the bottom of this page.

=======================================
MAAP STAC
=======================================

MAAP STAC (https://stac.maap-project.org) is dedicated to datasets not accessible via NASA's CMR, such as GEDI Cal/Val datasets, ESA datasets, and user-shared data products.

STAC discovery
---------------------------------------

Users can discover data in MAAP STAC using pystac-client or https://stac-browser.maap-project.org.

API documentation is available here: https://stac.maap-project.org/api.html (will return MAAP STAC results).

The general STAC API spec is here: https://api.stacspec.org/v1.0.0-rc.1/core/.

An example of using pystac-client is included above and in `Searching STAC Documentation <searching_the_stac_catalog.ipynb>`_.

Data Access via STAC
---------------------------------------

Data assets (files) published to STAC have not moved from the S3 bucket ``s3://nasa-maap-data-store``. ESA data is accessible via public HTTP access. NASA data in S3 is accessible publicly or via role-based bucket policy access.

Users are encouraged to use common AWS S3 libraries for NASA data access, such as Python's boto3.

Each item should have a "data" asset which includes a URL to the data.
~~~~~~~~~

For example, https://stac.maap-project.org/collections/BIOSAR1/items/biosar1_roi_lidar58 includes:
MAAP STAC (https://stac.maap-project.org) contains collections that MAAP publishes through its STAC API, including MAAP-hosted and partner datasets.

.. code-block:: json

"assets": {
"shx": {
"href": "https://bmap-catalogue-data.oss.eu-west-0.prod-cloud-ocb.orange-business.com/Campaign_data/biosar1/biosar1_roi_lidar58.shx",
"type": "application/octet-stream",
"roles": [
"data"
]
},
}
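As a rough illustration of the "data" asset convention described above, the sketch below collects the URL of every asset whose ``roles`` include ``"data"`` from an item dictionary. The helper name ``data_asset_hrefs`` is hypothetical, and ``sample_item`` is a trimmed copy of the BIOSAR1 excerpt rather than a full API response; a real item carries additional fields.

```python
from typing import Any, Dict, List


def data_asset_hrefs(item: Dict[str, Any]) -> List[str]:
    """Return the href of every asset whose roles include "data"."""
    hrefs = []
    for asset in item.get("assets", {}).values():
        # Assets without a "roles" list are skipped rather than treated as data.
        if "data" in asset.get("roles", []):
            hrefs.append(asset["href"])
    return hrefs


# Trimmed copy of the item excerpt shown above (illustrative only).
sample_item = {
    "assets": {
        "shx": {
            "href": "https://bmap-catalogue-data.oss.eu-west-0.prod-cloud-ocb.orange-business.com/Campaign_data/biosar1/biosar1_roi_lidar58.shx",
            "type": "application/octet-stream",
            "roles": ["data"],
        }
    }
}

for href in data_asset_hrefs(sample_item):
    print(href)
```

The same function should also work on the dictionary form of an item retrieved with a STAC client, assuming the item follows the same asset layout.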

=======================================
NASA's Operational CMR
=======================================

CMR Discovery
---------------------------------------

Users can discover data in NASA's Operational CMR via its publicly accessible API (https://cmr.earthdata.nasa.gov) and its user interface (https://search.earthdata.nasa.gov).

CMR Search documentation can be found in `Searching Collections <collections.ipynb>`_ and `Searching Granules <granules.ipynb>`_ and https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html.
~~~~~~~~~~~~~~~~~~~~~~

CMR Access
---------------------------------------
NASA's Operational CMR is available through its APIs at https://cmr.earthdata.nasa.gov and through Earthdata Search at https://search.earthdata.nasa.gov.

For all NASA MAAP users, access to NASA's Operational data is provided via a federated access token. Anything in NASA's Operational CMR should be accessed via maap-py so that the federated access token can be used. Users can also access data from LPDAAC (and possibly other DAACs in the future) without maap-py, since the workspace should have access via a role-based bucket policy on the LPDAAC cloud bucket.

Anyone can access data through Earthdata Login as well.
For MAAP users, access to NASA operational data is often easiest through ``maap-py`` because it can use the federated access token. Anyone can also access data through Earthdata Login.

Find more documentation about how to access data in CMR in the `Access <../accessing.html>`_ section of this documentation.

=======================================
Migrating from MAAP's CMR
=======================================
-------------------------

If you're migrating code from using https://cmr.maap-project.org, we're here to help. The documentation below should support migrating to https://cmr.earthdata.nasa.gov and https://stac.maap-project.org. If not, please contact the data team for assistance.
The https://cmr.maap-project.org catalog was deprecated on **May 1, 2023**.

Migration Steps:
----------------
If you're migrating code from MAAP's CMR, the general approach is:

1. Identify where your code is using https://cmr.maap-project.org and which datasets are being discovered and accessed.
2. Once you've identified the datasets, use https://search.earthdata.nasa.gov or https://stac-browser.maap-project.org to find out if the dataset is available through NASA's Operational CMR or MAAP's STAC catalog. If you don't see your datasets in one of those places, reach out to the data team so they can prioritize that dataset for publication to MAAP STAC.
3. If the dataset is in NASA's Operational CMR and you're using MAAP's Python library ``maap-py`` to discover and access data, add the parameter ``cmr_host="cmr.earthdata.nasa.gov"`` to your ``maap.searchCollection`` and ``maap.searchGranule`` function calls. Update the ``concept_id`` to match the one from NASA's Operational CMR if you're using it to identify a specific collection or granule.
4. If the dataset is in MAAP STAC, use ``pystac_client`` (https://pystac-client.readthedocs.io/en/stable/) or an HTTP library to call the STAC HTTP API endpoints directly.
2. Use Federated Search, Earthdata Search, or MAAP STAC tools to determine where those datasets now live.
3. Update your discovery and item-search code to use the appropriate source catalog.
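The first migration step, finding where code still refers to the deprecated catalog, can be sketched as a small file scan. This is only an illustration, not part of MAAP's tooling: the function name ``find_deprecated_host`` and the default file suffixes are assumptions, and a plain text search (for example with ``grep``) would do the same job.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Host of the deprecated MAAP CMR catalog named in this page.
DEPRECATED_HOST = "cmr.maap-project.org"


def find_deprecated_host(
    root: str, suffixes: Tuple[str, ...] = (".py", ".ipynb")
) -> Dict[str, List[int]]:
    """Map each matching file path to the 1-based line numbers mentioning the host."""
    matches: Dict[str, List[int]] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        line_numbers = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if DEPRECATED_HOST in line
        ]
        if line_numbers:
            matches[str(path)] = line_numbers
    return matches


# Example call (the project path is a placeholder):
# for filename, line_numbers in find_deprecated_host("path/to/your/project").items():
#     print(f"{filename}: lines {line_numbers}")
```

Each reported line is a candidate for the remaining steps: work out which dataset it refers to, then point the call at the catalog where that dataset now lives.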

Examples:
----------------
Examples
--------

Example of switching a granule search to NASA's Operational CMR:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Example of switching a granule search to NASA's Operational CMR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The code below discovers granules from the ``ABoVE LVIS L2 Geolocated Surface Elevation Product``:

.. code-block:: python

COLLECTION_ID = 'C1200125288-NASA_MAAP'
COLLECTION_ID = 'C1200125288-NASA_MAAP'
results = maap.searchGranule(concept_id=COLLECTION_ID)
pprint(f'Got {len(results)} results')

This dataset exists in NASA's Operational CMR. Using https://search.earthdata.nasa.gov, I discovered the collection's ``concept_id`` by searching for "ABoVE LVIS L2 Geolocated Surface Elevation Product" and copying the ``concept_id`` from the URL of the result to modify the code below:
This dataset exists in NASA's Operational CMR. Using https://search.earthdata.nasa.gov, you can discover the collection's ``concept_id`` and update the code like this:

.. code-block:: python

@@ -110,29 +58,25 @@ This dataset exists in NASA's Operational CMR. Using https://search.earthdata.na
)
pprint(f'Got {len(results)} results')

Example of switching a granule search to MAAP STAC:
+++++++++++++++++++++++++++++++++++++++++++++++++++
Example of switching a granule search to MAAP STAC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This code discovers granules from the ``Landsat 8 Operational Land Imager (OLI) Surface Reflectance Analysis Ready Data (ARD) V1, Peru and Equatorial Western Africa, April 2013-January 2020``.

.. code-block:: python

COLLECTION_ID = 'C1200110769-NASA_MAAP'
COLLECTION_ID = 'C1200110769-NASA_MAAP'

results = maap.searchGranule(concept_id=COLLECTION_ID)
pprint(f'Got {len(results)} results')


You can use https://stac-browser.maap-project.org to find the STAC collection ID for that dataset, which is ``Landsat8_SurfaceReflectance``.
If Federated Search or STAC Browser shows that the dataset now lives in MAAP STAC with collection ID ``Landsat8_SurfaceReflectance``, switch to a STAC client workflow:

.. code-block:: python

from pystac_client import Client

URL = 'https://stac.maap-project.org/'
cat = Client.open(URL)
for collection in cat.get_all_collections():
print(collection)

collection = cat.get_collection('Landsat8_SurfaceReflectance')
items = collection.get_items()

79 changes: 56 additions & 23 deletions docs/source/technical_tutorials/search/collections.ipynb
@@ -11,7 +11,18 @@
"Date: November 2, 2020\n",
"\n",
"Description: These examples walk through the MAAP API functionality of searching for collections within NASA's Common Metadata Repository (CMR) based on specific parameters. Collections are groupings of files that share the same product specification. Searching for collections can be useful for finding individual files, known as granules, which are used for processing."
]
],
"id": "32aec7da"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<b>Recommended starting point:</b> For general collection discovery, start with <a href=\"federated-collection-discovery/collection_discovery.html\">Federated Search</a>. Use this notebook when you already know you need a NASA CMR-specific workflow.\n",
"</div>\n"
],
"id": "18629fad"
},
{
"cell_type": "markdown",
@@ -21,15 +32,17 @@
"To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n",
"\n",
"Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors."
]
],
"id": "e259eaec"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Resources\n",
"- [NASA's CMR API Documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)"
]
],
"id": "c5b875aa"
},
{
"cell_type": "markdown",
@@ -38,7 +51,8 @@
"## Importing and Installing Packages\n",
"\n",
"We begin by importing the `maap` package and creating a new MAAP class."
]
],
"id": "70bec6b1"
},
{
"cell_type": "code",
@@ -54,21 +68,24 @@
"\n",
"# invoke the MAAP search client\n",
"maap = MAAP()"
]
],
"id": "94b3115f"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## About searchCollection"
]
],
"id": "284aca09"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the `maap.searchCollection` function to return a list of desired collections. Before using this function, let's use the `help` function to view the specific arguments and keywords for `maap.searchCollection`."
]
],
"id": "c7b4e7a8"
},
{
"cell_type": "code",
@@ -93,14 +110,16 @@
"source": [
"# view help for the searchCollection function\n",
"help(maap.searchCollection)"
]
],
"id": "cd9c3fbb"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The help text is showing that `maap.searchCollection` accepts a limit and search parameters. The limit parameter limits the number of resulting collections returned by `maap.searchCollection`. Note that `limit=100` means that the *default limit* for results from the MAAP API is 100. `maap.searchCollection` accepts any additional search parameters that are included in the CMR. For a list of accepted parameters, please refer to the [CMR Search Collections API reference](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#collection-search-by-parameters)."
]
],
"id": "593d934c"
},
{
"cell_type": "markdown",
@@ -113,7 +132,8 @@
"3. Searching by spatial filter\n",
"4. Using the results from one search as inputs into another\n",
"5. Searching by additional attributes"
]
],
"id": "6de2a4e5"
},
{
"cell_type": "markdown",
@@ -122,7 +142,8 @@
"## Finding all Collections\n",
"\n",
"Here we will demonstrate how to create a list containing all of the collections contained within the CMR. To do this, we will use the `maap.searchCollection` function without any additional search parameters. "
]
],
"id": "61bfe693"
},
{
"cell_type": "code",
@@ -143,7 +164,8 @@
"\n",
"# print the number of collections\n",
"pprint(f'Got {len(results)} results')"
]
],
"id": "96e87347"
},
{
"cell_type": "markdown",
@@ -152,7 +174,8 @@
"We get 100 results because of the default page limit. The result from the MAAP API is a list of collections where each element in the list is the metadata for that particular collection. To change the limit, type `limit=` and then a value within the parentheses after `maap.searchCollection()`.\n",
"\n",
"Let's look at the metadata for the first collection in our list of results (`results[0]`) using `pprint`. For formatting purposes, we can use the `depth` parameter to control the number of levels of metadata detail to display. By default, there is no constraint on the depth. By setting a `depth` parameter (in this case `depth=2`), we can ensure that the next contained level is replaced by an ellipsis."
]
],
"id": "407cd023"
},
{
"cell_type": "code",
@@ -249,14 +272,16 @@
"# (1) displays the concept ID, format, and revision ID\n",
"# adjust the depth to a larger value (6) if you would like to view all of the metadata\n",
"pprint(results[0], depth=2)"
]
],
"id": "7d18ec85"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Collection` key has all of the collection information including attributes, the archive center, spatial, and temporal information. The `concept-id` is a unique identifier for this collection. It can be used to further refine search results from the CMR, such as when searching for granule information."
]
],
"id": "85dc082b"
},
{
"cell_type": "markdown",
@@ -267,7 +292,8 @@
"Here we use a temporal filter to narrow down our results using the `temporal` keyword in our search. The temporal keyword takes datetime information in a [specific format](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-temporal). The date format used is `YYYY-MM-DDThh:mm:ssZ` and temporal search criteria may be either a single date or a date range. If one date is provided then it can be inferred as the start or end date. To define a start date and return all collections after the date, put a comma after the date (`YYYY-MM-DDThh:mm:ssZ,`). To define a end date and return all granules prior to the data, put a comma before the date (`,YYYY-MM-DDThh:mm:ssZ`). Lastly, to get a date range, provide the start date and end date separated by a comma (`YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ`).\n",
"\n",
"In this example we will search for one month of data."
]
],
"id": "33cc737e"
},
{
"cell_type": "code",
@@ -290,7 +316,8 @@
" temporal = datetimeRange\n",
")\n",
"pprint(f'Got {len(results)} results')"
]
],
"id": "84a85b38"
},
{
"cell_type": "code",
@@ -311,14 +338,16 @@
"\n",
"pprint(\n",
" f'Collection {collectionName} was acquired starting at {collectionDate}', width=100)\n"
]
],
"id": "18020c6b"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It appears the first result correctly matches with the beginning and ending temporal search parameters. Keep in mind that the results are limited to 100 so the final collection returned may not match the end date that was searched for."
]
],
"id": "506a1029"
},
{
"cell_type": "markdown",
@@ -327,7 +356,8 @@
"## Searching by Spatial Filter\n",
"\n",
"Here we will illustrate how to search for collections by a spatial filter. There are a couple of [spatial filters available to search by](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-spatial) in the CMR including point, line, polygon, and bounding box. In this example, we will explore filtering with a bounding box which is a sequence of four latitude and longitude values in the order of `[W,S,E,N]`. "
]
],
"id": "8a7f4673"
},
{
"cell_type": "code",
@@ -350,7 +380,8 @@
" bounding_box = collectionDomain\n",
")\n",
"pprint(f'Got {len(results)} results')"
]
],
"id": "3a835f53"
},
{
"cell_type": "code",
@@ -376,14 +407,16 @@
"\n",
"pprint(f'Collection {collectionName} was acquired within the following geometry: ', width=100)\n",
"pprint(collectionGeometry)\n"
]
],
"id": "094557ad"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the first collection that the spatial coordinates of the collection intersect our search box."
]
],
"id": "fa10c4a5"
}
],
"metadata": {