Deduplicate OntoPortal search results across portals #6

@Imene-Amirat

Description

When database=ontoportal is requested, the api-gateway queries all configured OntoPortal instances (earthportal, agroportal, ecoportal, biodivportal) in parallel. The same concept frequently exists on multiple portals (e.g., a NERC vocabulary concept is imported into earthportal, ecoportal, and biodivportal), which produces many duplicate entries in the final response: same IRI, same ontology acronym, but different source_name.

Current behavior:

  • A query returning N raw results can contain hundreds of duplicates for the same concept
  • The client sees the same concept repeated once per portal that hosts it, with a different source_name / source_url each time

Expected behavior:

  • Duplicate concepts (matching on iri + ontology_acronym) should be merged into a single entry
  • The merged entry should retain the data of the first portal seen
  • A new field found_in should list every portal that hosts the concept, along with the UI link for each, so the client can still navigate to any portal if needed
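
The expected behavior above could be sketched roughly as follows. This is only an illustration in Python, not the api-gateway's actual code: the function name `deduplicate_results` is hypothetical, each result is assumed to be a flat dict with `iri`, `ontology_acronym`, `source_name`, and `source_url` keys, `source_url` is assumed to be the UI link, and the shape of `found_in` (a list of `source_name` / `source_url` pairs) is one possible choice.

```python
from typing import Any


def deduplicate_results(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge entries sharing (iri, ontology_acronym) into a single entry.

    Assumed input shape (hypothetical): each result is a dict with at least
    "iri", "ontology_acronym", "source_name" and "source_url".
    """
    # Python dicts keep insertion order, so merged entries come out in the
    # order their first occurrence appeared in the raw results.
    merged: dict[tuple[str, str], dict[str, Any]] = {}

    for item in results:
        key = (item["iri"], item["ontology_acronym"])
        portal = {
            "source_name": item["source_name"],
            "source_url": item["source_url"],  # assumed to be the UI link
        }

        entry = merged.get(key)
        if entry is None:
            # First portal seen: keep its data as the merged entry.
            entry = {**item, "found_in": []}
            merged[key] = entry

        # Record every hosting portal once, including the first one.
        if portal not in entry["found_in"]:
            entry["found_in"].append(portal)

    return list(merged.values())
```

Because the portals are queried in parallel, "first portal seen" depends on the order in which the raw results are concatenated. Combining them in the configured portal order before calling such a function would keep the merged entry deterministic.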
