Added new provider for NASA PODAAC products : Not able to download them #755

Open
annesophie-cls opened this issue Jul 3, 2023 · 2 comments · Fixed by #773 · May be fixed by #874
Labels
enhancement New feature or request provider New provider request

Comments

@annesophie-cls

annesophie-cls commented Jul 3, 2023

Hi,

I added a new provider for the STAC NASA PODAAC catalog: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/
But I cannot download products: I get a 401 Unauthorized error.
However, I can download the product from the downloadLink in my web browser without any problem.

Please have a look at the notebook screenshot, which tries to download this product:
https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/search?ids=ascat_20230620_092700_metopb_55801_eps_o_250_3301_ovw.l2
But only the .png data is downloaded, not the .nc data.

(screenshot: notebook_eodag_nasa)

And this is the provider configuration:

earthdata_podaac:
  priority: 0
  search:
    type: StacSearch
    results_entry: features
    api_endpoint: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/search
    need_auth: false
    pagination:
      max_items_per_page: 500
    discover_metadata:
      auto_discovery: true
      metadata_pattern: '^[a-zA-Z0-9_:-]+$'
      search_param: '{{{{"query":{{{{"{metadata}":{{{{"eq":"{{{metadata}}}" }}}} }}}} }}}}'
      metadata_path: '$.properties.*'
    discover_product_types:
        fetch_url: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/collections
        result_type: json
        results_entry: 'collections[*]'
        generic_product_type_id: '$.id'
        generic_product_type_parsable_properties:
          productType: '$.id'
        generic_product_type_parsable_metadata:
          abstract: '$.description'
          license: '$.license'
          title: '$.id'
          missionStartDate: '$.extent.temporal.interval[0][0]'
    metadata_mapping:
      productType:
        - '{{"collections":["{productType}"]}}'
        - '$.collection'
      title: '$.id'
      id:
        - '{{"ids":["{id}"]}}'
        - '$.id'
      collection: '$.collection'
      bbox: '$.bbox'
      geometry:
        - '{{"intersects":{geometry#to_geojson}}}'
        - '($.geometry.`str()`.`sub(/^None$/, POLYGON((180 -90, 180 90, -180 90, -180 -90, 180 -90)))`)|($.geometry[*])'
      completionTimeFromAscendingNode:
        - '{{"datetime":"{startTimeFromAscendingNode#to_iso_utc_datetime(seconds)}/{completionTimeFromAscendingNode#to_iso_utc_datetime(seconds)}"}}'
        - '$.properties.end_datetime'
      downloadLink: '$.assets.data.href'
      assets: '$.assets'
  products:
    GENERIC_PRODUCT_TYPE:
      productType: '{productType}'
  download:
    type: HTTPDownload
    base_uri: 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/'
    extract: true
    outputs_prefix: /work/scratch/****/eodagworkspace/
  auth:
    credentials:
       username: ****
       password: ****

**Environment:**
 - Python version: 3.8.10
 - EODAG version: 2.10.0

Did I write something wrong in the provider configuration?

Thank you very much
@annesophie-cls annesophie-cls added the bug Something isn't working label Jul 3, 2023
@sbrunato
Collaborator

sbrunato commented Jul 4, 2023

Hello @ansotoo, the authentication plugin is missing from your configuration.
However, no existing eodag auth plugin seems to work with this provider. A new plugin inspired by https://urs.earthdata.nasa.gov/documentation/for_users/data_access/python has to be implemented (contributions via pull requests are welcome!). Redirects should keep headers, using a mechanism like the one provided in the Earthdata documentation:

import requests


# Override requests.Session.rebuild_auth to maintain headers when redirected
class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            # Drop the header only when the redirect changes host and
            # neither side of the redirect is the NASA auth host.
            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return
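
To make the redirect rule above concrete, here is a minimal offline sketch that calls `rebuild_auth` by hand on prepared requests, without touching the network. The `header_survives_redirect` helper, the placeholder credentials and the `example.com` URL are assumptions made for illustration only; they are not part of eodag or of the Earthdata documentation.

```python
import requests


class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = "urs.earthdata.nasa.gov"

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url
        if "Authorization" in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)
            # Strip the header only for a host change that does not
            # involve the NASA auth host on either side.
            if (
                original_parsed.hostname != redirect_parsed.hostname
                and redirect_parsed.hostname != self.AUTH_HOST
                and original_parsed.hostname != self.AUTH_HOST
            ):
                del headers["Authorization"]


def header_survives_redirect(from_url, to_url):
    """Hypothetical helper: simulate one redirect hop and report whether
    the Authorization header is still present afterwards."""
    session = SessionWithHeaderRedirection("user", "secret")  # placeholders
    response = requests.Response()
    response.request = requests.Request("GET", from_url).prepare()
    redirected = requests.Request(
        "GET", to_url, headers={"Authorization": "Basic placeholder"}
    ).prepare()
    session.rebuild_auth(redirected, response)
    return "Authorization" in redirected.headers


archive = "https://archive.podaac.earthdata.nasa.gov/some/granule.nc"
login = "https://urs.earthdata.nasa.gov/oauth/authorize"

print(header_survives_redirect(archive, login))    # hop to the auth host
print(header_survives_redirect(login, archive))    # hop back from the auth host
print(header_survives_redirect(archive, "https://example.com/elsewhere"))
```

The first two hops keep the header because one side is the auth host; the third hop goes to an unrelated host, so the header is removed.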

@annesophie-cls
Author

Hi @sbrunato,

I wrote this plugin but it doesn't work. Could you help me with that?

from eodag.plugins.authentication.base import Authentication

import requests
from requests import Session

class SessionWithHeaderRedirection(Session):

    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)


    # Overrides from the library to keep headers when redirected to or from the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return

    
class NasaAuthPlugin(Authentication):

    def authenticate(self):
        """Authenticate"""
        self.validate_config_credentials()
        session = SessionWithHeaderRedirection(
            self.config.credentials["username"],
            self.config.credentials["password"],
        )
        return session.auth
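
One possible reason the plugin above has no effect: `authenticate()` builds a `SessionWithHeaderRedirection` but returns only `session.auth`, which is a plain `(username, password)` tuple. The session object, and with it the `rebuild_auth` override, is discarded before any request is made. The sketch below is not an eodag plugin and says nothing about how eodag's download plugins issue requests; it only illustrates, with a hypothetical `download_with_session` helper, that the download has to go through the custom session for the override to apply.

```python
import requests


def download_with_session(
    session: requests.Session,
    url: str,
    target_path: str,
    chunk_size: int = 1024 * 1024,
) -> str:
    """Hypothetical helper: stream `url` to `target_path` through `session`,
    so that any redirect handling defined on the session is actually used."""
    response = session.get(url, stream=True)
    try:
        # Raise on 4xx/5xx instead of silently writing an error page to disk.
        response.raise_for_status()
        with open(target_path, "wb") as fd:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
    finally:
        response.close()
    return target_path
```

Assuming the class from this thread is in scope, a call might look like `download_with_session(SessionWithHeaderRedirection(username, password), url, "granule.nc")`.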
       

@sbrunato sbrunato added enhancement New feature or request provider New provider request and removed bug Something isn't working labels Oct 11, 2023
@sbrunato sbrunato linked a pull request Oct 11, 2023 that will close this issue