Portal Infrastructure

Overview

The Portal Infrastructure serves as the core of the NFDI4Immuno repository, providing API endpoints, business logic, and data management services.

Figure: Portal infrastructure architecture

Three-Layer Service Architecture

The Portal Infrastructure employs a strict three-layer separation of concerns to keep the system scalable and maintainable:

API Layer: Functions as the API gateway and is the only entry point through which external clients can reach the services.
Service Layer: Contains the core logic and services that power the API Gateway endpoints.
Portal Database Layer: Stores the operational data required for the services to perform day-to-day business functions.
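
The separation described above can be illustrated with a minimal Python sketch. It is not the NFDI4Immuno implementation: the class names (ApiGateway, DatasetService, PortalDatabase), the GET /datasets/<id> route, and the in-memory dictionary standing in for the database are all assumptions made for illustration. The point is only that each layer talks to the layer directly beneath it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PortalDatabase:
    """Portal Database Layer stand-in: operational data in an in-memory dict."""
    records: dict[str, dict] = field(default_factory=dict)

    def get(self, dataset_id: str) -> dict | None:
        return self.records.get(dataset_id)


@dataclass
class DatasetService:
    """Service Layer stand-in: core logic; the only caller of the database."""
    db: PortalDatabase

    def describe(self, dataset_id: str) -> dict:
        record = self.db.get(dataset_id)
        if record is None:
            raise KeyError(dataset_id)
        return {"id": dataset_id, **record}


@dataclass
class ApiGateway:
    """API Layer stand-in: the single entry point for external requests."""
    service: DatasetService

    def handle(self, method: str, path: str) -> tuple[int, dict]:
        # Hypothetical route: GET /datasets/<id>
        parts = path.strip("/").split("/")
        if method == "GET" and len(parts) == 2 and parts[0] == "datasets":
            try:
                return 200, self.service.describe(parts[1])
            except KeyError:
                return 404, {"error": "dataset not found"}
        return 404, {"error": "no such route"}


if __name__ == "__main__":
    gateway = ApiGateway(DatasetService(PortalDatabase({"ds1": {"title": "Example"}})))
    print(gateway.handle("GET", "/datasets/ds1"))
```

External callers only ever hold a reference to the gateway object, so the service and database classes can change without affecting clients.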

API Layer

The API layer acts as the security and routing perimeter for the infrastructure. All external requests, whether from the front-end or a CLI tool, must pass through this gateway.

Figure: API layer architecture

The API Gateway provides:

Single Entry Point: All external requests go through the gateway
Request Routing: Directs requests to appropriate services
Authentication: Validates access tokens and enforces security
Rate Limiting: Protects services from overload
Request/Response Transformation: Adapts data formats as needed
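
To make the gateway responsibilities more concrete, the following Python sketch combines token validation, rate limiting, and routing in one request handler. It rests on several assumptions: the token table, the /search route, and the handle_request function are hypothetical, and the token-bucket algorithm is just one common way to rate-limit; the source does not say which mechanism the portal uses.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity` requests,
    then refills at `refill_per_second` tokens per second."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill proportionally to elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Hypothetical token store and routing table, for illustration only.
VALID_TOKENS = {"demo-token": "alice"}
ROUTES = {"/search": lambda user: {"service": "search", "user": user}}


def handle_request(path: str, token: str, bucket: TokenBucket):
    """Authenticate, rate-limit, then route. Returns (status, body)."""
    user = VALID_TOKENS.get(token)
    if user is None:
        return 401, {"error": "invalid or missing access token"}
    if not bucket.allow():
        return 429, {"error": "rate limit exceeded"}
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"error": "no such route"}
    return 200, handler(user)
```

A real gateway would typically keep one bucket per client or per token rather than a single shared one; the sketch uses one bucket to stay short. The injectable clock exists only so the limiter can be exercised without waiting.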

The API Gateway is RESTful

The API Gateway exposes a REST API, that is, an API that follows the design principles of the Representational State Transfer (REST) architectural style.

Service Layer

The Service Layer contains the functional building blocks of the repository. These services handle everything from initial data ingestion to metadata transformation. The figure below shows the major services in this layer.

Figure: Major services in the service layer architecture

Each service has a distinct responsibility, as described in the table below.

Service	Primary Responsibilities
Pre-submission Validation	Transpiles JSON to Datapack JSON; validates metadata, ontology terms, and values checked against lookup services (e.g., ROR); generates detailed validation reports.
Submission Pipeline	Stages data/metadata; performs manifest and checksum checks; validates file formats; registers PIDs; and executes final storage.
Search Service	Handles database queries and formats search results for the user.
Indexing Pipeline	Updates the search index from storage; applies metadata filters; integrates domain-specific annotations and indexing (e.g., CDR3, cell type).
Data Serving Service	Validates access tokens; queries access rights; packages reference data; and forwards data or redirects to external resources.
Metadata Serving Service	Transforms metadata into standard formats like HealthDCAT-AP, FHIR, or DataCite.
S3 Connector	Accepts validated data and manages the transfer to physical S3 storage.
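
As an example of one step from the table, the manifest and checksum check in the Submission Pipeline could look roughly like the Python sketch below. The manifest format (a mapping from file name to SHA-256 hex digest), the flat staging directory, and the function names are assumptions; the source does not specify the hash algorithm or the manifest layout.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(staging_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare staged files against a hypothetical manifest
    ({file name: expected SHA-256 hex digest}). Returns a list of problems;
    an empty list means the staged submission matches the manifest."""
    problems = []
    for name, expected in manifest.items():
        path = staging_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing from staging area")
        elif sha256_of(path) != expected.lower():
            problems.append(f"{name}: checksum mismatch")
    # Files that were staged but never declared are reported as well.
    for path in sorted(staging_dir.iterdir()):
        if path.is_file() and path.name not in manifest:
            problems.append(f"{path.name}: not listed in manifest")
    return problems
```

Returning a list of problems instead of raising on the first error fits the idea of a validation report: the submitter sees every mismatch in one pass.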

Portal Database Layer

The database layer provides specialized storage for the different types of operational data used by the services.

Figure: Database layer architecture

Temp Storage: Holds data during the submission and validation phase before it is moved to permanent storage.
Search/Metadata Storage: Optimized storage for quick discovery and retrieval of dataset records.
User Roles and Data Access: Manages the permissions and access control lists (ACLs) for datasets.
Audit Trail Store: An immutable, append-only database dedicated to recording all system logs and access history for compliance. Existing entries can be read for audits but never modified or deleted.
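
One way to picture an immutable audit store is a log in which every entry carries the hash of its predecessor, so that any later modification becomes detectable. The Python sketch below illustrates that idea under stated assumptions: hash chaining is a technique chosen here for illustration and is not confirmed by the source, and the AuditTrail class and its field names (actor, action, dataset_id) are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log: entries can be added and read, not updated or deleted."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, actor: str, action: str, dataset_id: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {"actor": actor, "action": action,
                "dataset_id": dataset_id, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return dict(entry)

    def entries(self) -> list:
        # Return copies so callers cannot alter stored entries through them.
        return [dict(e) for e in self._entries]

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A production store would also record timestamps and enforce immutability at the database level; the sketch leaves both out to stay deterministic and short.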