Submission Journey
Overview
The submission journey is a multi-stage process that moves data from a user's local environment into permanent, indexed storage in the NFDI4Immuno repository. It covers the complete process of uploading, validating, and storing research data.
The journey begins with a POST request to the API Gateway. Data is held in temporary storage until all validation gates are passed.
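A client can prepare such a submission before issuing the POST request. The sketch below is a minimal illustration, assuming a request body with a `metadata` object and per-file SHA-256 checksums; the actual `POST /dataset` schema may differ.

```python
import hashlib
import json

def build_submission(metadata: dict, files: dict) -> dict:
    """Assemble a hypothetical POST /dataset body: metadata plus one
    checksum entry per payload file (schema is an assumption)."""
    return {
        "metadata": metadata,
        "files": [
            {
                "name": name,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
            for name, data in files.items()
        ],
    }

body = build_submission(
    {"title": "Flow cytometry run"},
    {"sample.fcs": b"abc"},
)
print(json.dumps(body, indent=2))
```

The server recomputes these checksums during file validation, so the client-side digests serve as the reference values for the integrity check.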
```mermaid
sequenceDiagram
    participant U as Front-end/CLI
    participant G as API Gateway
    participant P as Submission Pipeline
    participant MV as Metadata Validation
    participant PV as File Validation
    participant H as Housekeeping
    participant S as Permanent Storage
    U->>G: POST /dataset
    G->>P: Forward submission
    P->>P: Add to Temporary Storage
    P->>MV: Validate metadata
    P->>PV: Validate files (payload)
    P->>H: Housekeeping
    P->>S: Move dataset from temporary to permanent
```
Key features:
- Temporary Storage: Safe staging area during validation
- Atomic Operations: All-or-nothing submission process
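The staging-then-promote pattern behind these two features can be sketched as follows. This is an illustration of the all-or-nothing idea using local directories, not the pipeline's actual storage backend:

```python
import os
import shutil
import tempfile

def submit(name: str, data: bytes, validators, permanent_dir: str) -> bool:
    """All-or-nothing submission: write to a temporary staging area,
    run every validation gate, and promote to permanent storage only
    if all gates pass. On failure the staging area is discarded."""
    staging = tempfile.mkdtemp(prefix="submission-")
    try:
        path = os.path.join(staging, name)
        with open(path, "wb") as fh:
            fh.write(data)
        # Validation gates: each validator returns True (pass) or False (fail).
        if not all(validate(path) for validate in validators):
            return False  # rejected; nothing reaches permanent storage
        os.makedirs(permanent_dir, exist_ok=True)
        shutil.move(path, os.path.join(permanent_dir, name))
        return True
    finally:
        # Clean up the staging area whether the submission succeeded or not.
        shutil.rmtree(staging, ignore_errors=True)
```

Because the dataset only moves after every gate has passed, a failure at any stage leaves permanent storage untouched.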
Validation Gates
To ensure data quality and findability, every submission must pass two primary validation engines:
- Metadata Validation: Checks the JSON against metadata schemas, ensures logical consistency, and validates terms against specific ontologies or CURIEs.
- File (Payload) Validation: Verifies MIME types, inspects file content against format-specific rules, and validates the integrity of the data via checksum comparison.
Metadata Validation
```mermaid
sequenceDiagram
    participant P as Submission Pipeline
    participant MV as Metadata Validation Engine
    participant S as Schema Validator
    participant Sem as Semantic Validator
    participant Ont as Ontology/CURIE Validator
    P->>MV: Start metadata validation
    MV->>S: Validate against metadata schema
    S-->>MV: Pass/Fail
    MV->>Sem: Check logical consistency<br>(relationships, constraints)
    Sem-->>MV: Pass/Fail
    MV->>Sem: Check semantic consistency<br>(plausible values)
    Sem-->>MV: Pass/Fail
    MV->>Ont: Validate terms against ontologies
    Ont-->>MV: Pass/Fail
    MV-->>P: Validation results
```
Key features:
- Schema Validation: Ensures metadata conforms to required structure
- Semantic Validation: Checks logical consistency of metadata values
- Ontology Validation: Verifies terms against controlled vocabularies
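A much-simplified sketch of the first and third checks is shown below. The required-field table and the CURIE pattern are illustrative assumptions; the real engine validates against full metadata schemas and live ontology services.

```python
import re

# CURIEs look like "PREFIX:LOCAL_ID", e.g. "CL:0000084" (a Cell Ontology term).
# This pattern is a rough illustration, not the full W3C CURIE grammar.
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]*:\S+$")

def validate_metadata(meta: dict, required: dict) -> list:
    """Return a list of validation errors; an empty list means pass."""
    errors = []
    # Schema-style check: required fields must exist with the expected type.
    for field, expected_type in required.items():
        if field not in meta:
            errors.append(f"missing field: {field}")
        elif not isinstance(meta[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    # Ontology-style check: every term must at least be CURIE-shaped.
    for term in meta.get("ontology_terms", []):
        if not CURIE_RE.match(term):
            errors.append(f"not a CURIE: {term}")
    return errors
```

In practice the engine would also resolve each CURIE against the referenced ontology rather than only checking its shape.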
File Validation
```mermaid
sequenceDiagram
    participant P as Submission Pipeline
    participant PV as File Validation Engine
    participant FT as File Type Checker
    participant FC as File Content Checker
    participant CS as Checksum Validator
    P->>PV: Start payload validation
    PV->>FT: Check file extensions and MIME types
    FT-->>PV: Pass/Fail
    PV->>FC: Inspect file content<br>(format-specific rules)
    FC-->>PV: Pass/Fail
    PV->>CS: Compare provided vs. computed checksums
    CS-->>PV: Integrity OK/Fail
    PV-->>P: Validation results
```
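The file-type step can be approximated with the standard library, as sketched below. The allow-list of accepted MIME types is an assumption for illustration; the actual repository defines its own accepted formats.

```python
import mimetypes

# Hypothetical allow-list; the repository's accepted formats may differ.
ACCEPTED_MIME_TYPES = {"text/csv", "application/json", "application/zip"}

def check_file_type(filename: str) -> bool:
    """Guess the MIME type from the file name and check it against
    the allow-list. Unknown extensions fail the gate."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type in ACCEPTED_MIME_TYPES
```

A production checker would also sniff the file's magic bytes rather than trust the extension alone, which is what the subsequent content inspection covers.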
Integrity Check
If the computed checksum does not match the user-provided checksum during the file validation phase, the submission is rejected immediately to prevent data corruption.
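This comparison is a straightforward digest check, sketched here assuming SHA-256 is the checksum algorithm in use:

```python
import hashlib
import hmac

def verify_checksum(data: bytes, provided_sha256: str) -> bool:
    """Recompute the SHA-256 digest of the payload and compare it to the
    user-provided value. A mismatch means the data was corrupted (or
    mislabelled) in transit, so the submission is rejected."""
    computed = hashlib.sha256(data).hexdigest()
    # compare_digest is a constant-time equality check; functionally ==.
    return hmac.compare_digest(computed, provided_sha256.lower())
```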
Key features:
- File Validation: Confirms file integrity and format compliance
- Checksum Verification: Ensures data hasn't been corrupted
Housekeeping
Once validated, the system performs "Housekeeping" to prepare the dataset for the permanent repository. This includes:
- Registering a persistent identifier (PID).
- Calculating an annotation score to represent metadata richness.
- Generating a final UUID for the physical storage location.
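These three steps can be sketched as a single enrichment pass. The PID prefix and the annotation-score formula (share of filled optional fields) are placeholder assumptions; the real service registers PIDs externally and may score metadata differently.

```python
import hashlib
import json
import uuid

# Hypothetical scoring basis: which optional fields count toward richness.
OPTIONAL_FIELDS = ("description", "keywords", "license", "contact")

def housekeep(metadata: dict) -> dict:
    """Enrich validated metadata with a PID, an annotation score, a
    storage UUID, and a checksum over the final enriched record."""
    enriched = dict(metadata)
    enriched["pid"] = f"pid:{uuid.uuid4()}"  # placeholder, not a real PID scheme
    filled = sum(1 for field in OPTIONAL_FIELDS if metadata.get(field))
    enriched["annotation_score"] = filled / len(OPTIONAL_FIELDS)
    enriched["storage_uuid"] = str(uuid.uuid4())  # physical storage location
    # Checksum the final rich metadata in a canonical (sorted-key) form.
    canonical = json.dumps(enriched, sort_keys=True).encode()
    enriched["metadata_sha256"] = hashlib.sha256(canonical).hexdigest()
    return enriched
```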
```mermaid
sequenceDiagram
    participant P as Submission Pipeline
    participant H as Housekeeping
    participant PID as PID Service
    participant AS as Annotation Score
    participant C as Checksum
    P->>H: Start housekeeping
    H->>PID: Register PID
    H->>AS: Calculate annotation score
    H->>H: Add UUID (storage location)
    H->>C: Checksum final rich metadata
    H-->>P: Housekeeping done
```
Key features:
- PID Registration: Assigns persistent identifiers to datasets
- Annotation Scoring: Evaluates dataset quality and completeness
- Metadata Enrichment: Adds contextual information