
Search Journey

The search journey enables users to discover and explore datasets within the NFDI4Immuno repository while maintaining privacy and data protection standards.

The system employs both a priori and post factum safety checks:

  1. A priori: The Search Service validates the query filters for safety before they reach the database.
  2. Post factum: After the query runs, the system checks the "cohort size." If the results are too specific (potentially identifying an individual), the system returns a fallback response rather than the raw results.

sequenceDiagram
    participant U as Front-end/CLI
    participant G as API Gateway
    participant S as Search Service
    participant DB as Search Database

    U->>G: POST /search (query + filters)
    G->>S: Forward search request

    S->>S: a priori sensitivity check

    S->>DB: Query with safe filters
    DB-->>S: Raw results

    S->>S: post factum check (e.g. cohort-size)

    alt Cohort OK
        S-->>S: Rank results
        S-->>G: Summary statistics + dataset ID (JSON)
    else Cohort too small
        S-->>G: Fallback response (insufficient cohort size)
    end

    G-->>U: Forward response

Key features:

  • A Priori Validation: Validates query filters for safety before they reach the database to prevent unsafe data requests.
  • Post Factum Filtering: Evaluates the "cohort size" of results to ensure individuals cannot be identified.
  • Fallback Mechanism: Automatically returns an "insufficient cohort size" response if the result set is too specific, protecting sensitive data.
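
The two checks and the fallback can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the allow-list `ALLOWED_FILTERS`, the threshold `MIN_COHORT_SIZE`, the `Dataset` record, the response fields, and all function names are hypothetical. The sketch also assumes that "cohort size" means the total number of subjects behind the matched datasets.

```python
from dataclasses import dataclass

# Hypothetical values: the real allow-list and threshold are not specified here.
ALLOWED_FILTERS = {"cell_type", "species", "condition"}
MIN_COHORT_SIZE = 10


@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    cohort_size: int  # assumed: number of subjects in the dataset
    metadata: dict


def a_priori_check(filters: dict) -> bool:
    """Accept a query only if every filter field is on the allow-list."""
    return all(field in ALLOWED_FILTERS for field in filters)


def run_query(datasets: list, filters: dict) -> list:
    """Stand-in for the Search Database: exact match on every filter."""
    return [d for d in datasets
            if all(d.metadata.get(k) == v for k, v in filters.items())]


def search(datasets: list, filters: dict) -> dict:
    # A priori: unsafe filters never reach the database.
    if not a_priori_check(filters):
        return {"status": "rejected", "reason": "unsafe filters"}

    hits = run_query(datasets, filters)

    # Post factum: the cohort behind the result set must be large enough.
    cohort = sum(d.cohort_size for d in hits)
    if cohort < MIN_COHORT_SIZE:
        return {"status": "fallback", "reason": "insufficient cohort size"}

    return {
        "status": "ok",
        "cohort_size": cohort,
        "dataset_ids": [d.dataset_id for d in hits],
    }
```

In this sketch an empty result set also yields the fallback response, so a caller cannot tell "no match" apart from "match too small". Whether the real service behaves this way is not stated in this document.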

Query Processing

Query processing wraps the database query in the two safety filters and supports discovery through specialized indexing, faceted filtering, and relevance ranking.

sequenceDiagram
    participant S as Search Service
    participant F1 as A priori Filter
    participant DB as Search Database
    participant F2 as Post factum Filter

    S->>F1: Validate filter safety
    alt Filters safe
        F1-->>S: Pass
        S->>DB: Run query
        DB-->>S: Raw results

        S->>F2: Validate results safety (e.g. cohort-size rules)
        alt Cohort OK
            F2-->>S: Pass
            S->>S: Rank results
            S->>S: Return results
        else Cohort too small
            F2-->>S: Fail
            S->>S: Return fallback response
        end
    else Filters unsafe
        F1-->>S: Fail - Reject filters
    end

Key features:

  • Faceted Search: Enables users to explore and filter datasets based on specific metadata attributes.
  • Domain-Specific Indexes: Indexes tuned for immunological concepts such as CDR3 sequences, cell types, and experimental conditions.
  • Relevance-Based Ranking: Orders search results so that the most pertinent datasets appear first.
  • Distributed Processing: Uses a scalable infrastructure to maintain performance during high-volume query execution.
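
Faceted search and relevance ranking can be illustrated with a small Python sketch. It is only an approximation for explanatory purposes: the record fields (`cell_type`, `species`, `description`), the function names, and the term-overlap scoring are assumptions, and the actual ranking method used by the Search Service is not described in this document.

```python
from collections import Counter


def facet_counts(records: list, facet: str) -> Counter:
    """Count how many records carry each value of one metadata attribute."""
    return Counter(r[facet] for r in records if facet in r)


def apply_facets(records: list, selected: dict) -> list:
    """Keep records whose metadata matches every selected facet value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in selected.items())]


def rank(records: list, query_terms: list) -> list:
    """Order records by how many query terms occur in their description.

    Hypothetical scoring: a plain term-overlap count, highest first.
    Ties keep their original order because sorted() is stable.
    """
    terms = {t.lower() for t in query_terms}

    def score(record: dict) -> int:
        words = set(record.get("description", "").lower().split())
        return len(terms & words)

    return sorted(records, key=score, reverse=True)
```

A front-end would typically call `facet_counts` to show the available values with their counts, `apply_facets` once the user selects some, and `rank` on whatever remains.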