Search Journey
The search journey enables users to discover and explore datasets within the NFDI4Immuno repository while maintaining privacy and data protection standards.
The system employs both a priori and post factum safety checks:
- A priori: The Search Service validates the query filters for safety before they reach the database.
- Post factum: After the query runs, the system checks the "cohort size." If the results are too specific (potentially identifying an individual), the system returns a fallback response rather than the raw results.
sequenceDiagram
participant U as Front-end/CLI
participant G as API Gateway
participant S as Search Service
participant DB as Search Database
U->>G: POST /search (query + filters)
G->>S: Forward search request
S->>S: a priori sensitivity check
S->>DB: Query with safe filters
DB-->>S: Raw results
S->>S: post factum check (e.g. cohort-size)
alt Cohort OK
S->>S: Rank results
S-->>G: Summary statistics + dataset ID (JSON)
else Cohort too small
S-->>G: Fallback response (insufficient cohort size)
end
G-->>U: Forward response
Key features:
- A Priori Validation: Checks query filters before any query reaches the database, so unsafe requests are stopped early.
- Post Factum Filtering: Evaluates the "cohort size" of the results to reduce the risk that an individual can be identified.
- Fallback Mechanism: Returns an "insufficient cohort size" response instead of the raw results whenever the cohort is too small.
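The two checks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Search Service's actual implementation: the filter whitelist `ALLOWED_FILTERS`, the threshold `MIN_COHORT_SIZE`, the `Record` fields, and the response shapes are all hypothetical, since the source specifies none of them.

```python
from dataclasses import dataclass

# Hypothetical values: the source names neither the allowed filters nor the threshold.
ALLOWED_FILTERS = {"cell_type", "species", "condition"}
MIN_COHORT_SIZE = 5


@dataclass
class Record:
    dataset_id: str
    subject_id: str
    cell_type: str
    species: str
    condition: str


def a_priori_check(filters: dict[str, str]) -> bool:
    """A priori: accept only filters on whitelisted fields, before any query runs."""
    return all(field in ALLOWED_FILTERS for field in filters)


def run_query(records: list[Record], filters: dict[str, str]) -> list[Record]:
    """Stand-in for the Search Database: exact match on every filter."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in filters.items())]


def search(records: list[Record], filters: dict[str, str]) -> dict:
    if not a_priori_check(filters):
        return {"status": "rejected", "reason": "unsafe filters"}

    hits = run_query(records, filters)

    # Post factum: count distinct subjects, not rows, to get the cohort size.
    cohort = {r.subject_id for r in hits}
    if len(cohort) < MIN_COHORT_SIZE:
        return {"status": "fallback", "reason": "insufficient cohort size"}

    # Only aggregate information leaves the service, never the raw records.
    return {
        "status": "ok",
        "cohort_size": len(cohort),
        "dataset_ids": sorted({r.dataset_id for r in hits}),
    }
```

In this sketch, filtering on a non-whitelisted field such as `subject_id` is rejected before the query runs, while a whitelisted filter that matches fewer than five subjects yields the fallback response rather than the matching records.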
Query Processing
Query processing combines domain-specific indexes with faceted filtering. The diagram below shows how the Search Service applies the a priori and post factum filters around query execution.
sequenceDiagram
participant S as Search Service
participant F1 as A priori Filter
participant DB as Search Database
participant F2 as Post factum Filter
S->>F1: Validate filter safety
alt Filters safe
F1-->>S: Pass
S->>DB: Run query
DB-->>S: Raw results
S->>F2: Validate results safety (e.g. cohort-size rules)
alt Cohort OK
F2-->>S: Pass
S->>S: Rank results
S->>S: Return results
else Cohort too small
F2-->>S: Fail
S->>S: Return fallback response
end
else Filters unsafe
F1-->>S: Fail - Reject filters
end
Key features:
- Faceted Search: Enables users to explore and filter datasets based on specific metadata attributes.
- Domain-Specific Indexes: Indexes tuned for immunological concepts such as CDR3 sequences, cell types, and experimental conditions.
- Relevance-Based Ranking: Orders search results so that the most relevant datasets appear first.
- Distributed Processing: Spreads query execution across scalable infrastructure to sustain performance under high query volume.
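To make faceted search and relevance-based ranking concrete, here is a small in-memory sketch. It assumes a hypothetical metadata layout (`cell_type`, `condition`, `description`) and uses plain term-frequency scoring as a stand-in; the source does not describe the ranking function or index structures the repository actually uses.

```python
from collections import Counter

# Hypothetical dataset metadata, for illustration only.
DATASETS = [
    {"id": "DS1", "cell_type": "B cell", "condition": "healthy",
     "description": "BCR repertoire of healthy donors"},
    {"id": "DS2", "cell_type": "T cell", "condition": "influenza",
     "description": "TCR repertoire after influenza vaccination"},
    {"id": "DS3", "cell_type": "T cell", "condition": "healthy",
     "description": "TCR repertoire of healthy donors with TCR beta chain sequencing"},
]


def facet_counts(datasets: list[dict], facet_fields: list[str]) -> dict[str, Counter]:
    """Count how many datasets fall under each value of each facet."""
    return {field: Counter(d[field] for d in datasets) for field in facet_fields}


def apply_facets(datasets: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep only datasets that match every selected facet value."""
    return [d for d in datasets
            if all(d.get(field) == value for field, value in selected.items())]


def rank(datasets: list[dict], query: str) -> list[dict]:
    """Order datasets by how often the query terms occur in the description."""
    terms = query.lower().split()

    def score(dataset: dict) -> int:
        words = dataset["description"].lower().split()
        return sum(words.count(term) for term in terms)

    # sorted() is stable, so datasets with equal scores keep their input order.
    return sorted(datasets, key=score, reverse=True)
```

A typical flow would call `facet_counts` to populate the filter sidebar, `apply_facets` once the user selects values, and `rank` on the remaining datasets. A production system would more likely delegate all three steps to a search engine's index than compute them in application code.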