Submission Journey
Overview
The submission journey is a multi-stage process that moves data from a user's local environment into permanent, indexed storage in the NFDI4Immuno Data Hub. Its stages cover uploading, validating, and storing research data.
The journey begins with a POST request to the API Gateway. Data is held in temporary storage until all validation gates are passed.
sequenceDiagram
participant U as Front-end/CLI
participant G as API Gateway
participant P as Submission Pipeline
participant MV as Metadata Validation
participant PV as File Validation
participant H as Housekeeping
participant S as Permanent Storage
U->>G: POST /dataset
G->>P: Forward submission
P->>P: Add to Temporary Storage
P->>MV: Validate metadata
P->>PV: Validate files (payload)
P->>H: Housekeeping
P->>S: Move dataset from temporary to permanent
Key features:
- Temporary Storage: Safe staging area during validation
- Atomic Operations: All-or-nothing submission process
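The staging and all-or-nothing behaviour described above can be sketched as follows. This is a minimal illustration, not the Data Hub's implementation: the function name `submit`, the directory layout, and the idea of passing validation gates as callables are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional

# A gate is any callable that inspects the staged dataset and returns True on success.
Gate = Callable[[Path], bool]


def submit(src: Path, permanent_root: Path, gates: Iterable[Gate]) -> Optional[Path]:
    """Stage a dataset, run every gate, and promote it only if all gates pass.

    Hypothetical sketch: names and directory layout are illustrative assumptions.
    """
    staging = Path(tempfile.mkdtemp(prefix="staging-"))
    staged = staging / src.name
    try:
        # 1. Copy the upload into temporary storage.
        shutil.copytree(src, staged)
        # 2. Run the gates; all() stops at the first failure.
        if not all(gate(staged) for gate in gates):
            return None  # rejected: nothing reaches permanent storage
        # 3. Promote the staged dataset to permanent storage.
        permanent_root.mkdir(parents=True, exist_ok=True)
        target = permanent_root / src.name
        shutil.move(str(staged), str(target))
        return target
    finally:
        # Temporary storage is always cleaned up, whether the submission passed or not.
        shutil.rmtree(staging, ignore_errors=True)
```

Because the dataset is only moved after every gate has returned `True`, a failed submission leaves permanent storage untouched.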
Validation Gates
To ensure data quality and findability, every submission must pass through two validation engines:
- Metadata Validation: Checks the JSON against metadata schemas, ensures logical consistency, and validates terms against specific ontologies or CURIEs.
- File (Payload) Validation: Verifies file extensions and MIME types, inspects file content against format-specific rules, and validates the integrity of the data via checksum comparison.
Metadata Validation
sequenceDiagram
participant P as Submission Pipeline
participant MV as Metadata Validation Engine
participant S as Schema Validator
participant Sem as Semantic Validator
participant Ont as Ontology/CURIE Validator
P->>MV: Start metadata validation
MV->>S: Validate against metadata schema
S-->>MV: Pass/Fail
MV->>Sem: Check logical consistency<br>(relationships, constraints)
Sem-->>MV: Pass/Fail
MV->>Sem: Check semantic consistency<br>(plausible values)
Sem-->>MV: Pass/Fail
MV->>Ont: Validate terms against ontologies
Ont-->>MV: Pass/Fail
MV-->>P: Validation results
Key features:
- Schema Validation: Ensures metadata conforms to required structure
- Semantic Validation: Checks logical consistency of metadata values
- Ontology Validation: Verifies terms against controlled vocabularies
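The three metadata checks can be illustrated with a small, standard-library-only sketch. Everything specific here is an assumption made for the example: the field names (`title`, `collected_from`, `collected_to`, `species`), the allow-list of CURIE prefixes, and the choice to stop at the first failing stage. A real deployment would more likely use a JSON Schema validator and an ontology lookup service.

```python
import re
from datetime import date

# Hypothetical required fields and their expected Python types.
REQUIRED = {"title": str, "collected_from": str, "collected_to": str, "species": str}
# Hypothetical allow-list of ontology prefixes accepted in CURIEs.
ALLOWED_PREFIXES = {"NCBITaxon", "CL", "UBERON"}
# A CURIE has the shape "prefix:local_id".
CURIE_RE = re.compile(r"^([A-Za-z][\w.-]*):(\S+)$")


def check_schema(md: dict) -> list:
    """Structural check: required fields exist and have the expected type."""
    errors = [f"missing field: {k}" for k in REQUIRED if k not in md]
    errors += [
        f"wrong type for {k}"
        for k, t in REQUIRED.items()
        if k in md and not isinstance(md[k], t)
    ]
    return errors


def check_semantics(md: dict) -> list:
    """Logical consistency: the collection period must not end before it starts."""
    try:
        start = date.fromisoformat(md["collected_from"])
        end = date.fromisoformat(md["collected_to"])
    except ValueError:
        return ["collection dates must be ISO dates (YYYY-MM-DD)"]
    return [] if start <= end else ["collected_from is after collected_to"]


def check_terms(md: dict) -> list:
    """Term check: species must be a CURIE with an allowed prefix."""
    match = CURIE_RE.match(md["species"])
    if not match:
        return ["species is not a CURIE"]
    if match.group(1) not in ALLOWED_PREFIXES:
        return [f"unknown prefix: {match.group(1)}"]
    return []


def validate_metadata(md: dict) -> list:
    """Run the stages in order and return the errors of the first failing stage."""
    for stage in (check_schema, check_semantics, check_terms):
        errors = stage(md)
        if errors:
            return errors
    return []
```

Running the stages in order means the later checks can rely on the structure that the schema stage has already confirmed.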
File Validation
sequenceDiagram
participant P as Submission Pipeline
participant PV as File Validation Engine
participant FT as File Type Checker
participant FC as File Content Checker
participant CS as Checksum Validator
P->>PV: Start payload validation
PV->>FT: Check file extensions and MIME types
FT-->>PV: Pass/Fail
PV->>FC: Inspect file content<br>(format-specific rules)
FC-->>PV: Pass/Fail
PV->>CS: Compare provided vs. computed checksums
CS-->>PV: Integrity OK/Fail
PV-->>P: Validation results
Integrity Check
If the computed checksum does not match the user-provided checksum during the file validation phase, the submission is rejected immediately, so corrupted data never reaches permanent storage.
Key features:
- File Validation: Confirms file integrity and format compliance
- Checksum Verification: Ensures data hasn't been corrupted
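The checksum comparison can be sketched as below. The hash algorithm is not specified above, so SHA-256 is an assumption, and the function names `compute_sha256` and `checksum_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def compute_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large payloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, provided: str) -> bool:
    """Compare the computed digest with the user-provided one.

    The provided value is normalised (whitespace stripped, lower-cased)
    because hex digests are often pasted in upper case.
    """
    return compute_sha256(path) == provided.strip().lower()
```

A submission whose file fails `checksum_matches` would be rejected before it is moved out of temporary storage.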
Housekeeping
Once validated, the system performs "Housekeeping" to prepare the dataset for permanent storage. This includes:
- Registering a persistent identifier (PID).
- Calculating an annotation score to represent metadata richness.
- Generating a final UUID for the physical storage location.
- Computing a checksum of the final, enriched metadata.
sequenceDiagram
participant P as Submission Pipeline
participant H as Housekeeping
participant PID as PID Service
participant AS as Annotation score
participant C as Checksum
P->>H: Start housekeeping
H->>PID: Register PID
H->>AS: Calculate annotation score
H->>H: Add UUID (storage location)
H->>C: Checksum final rich metadata
H-->>P: Housekeeping done
Key features:
- PID Registration: Assigns persistent identifiers to datasets
- Annotation Scoring: Scores the richness and completeness of the dataset's metadata
- Metadata Enrichment: Adds the PID, annotation score, and storage UUID to the metadata before it is checksummed
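The housekeeping steps can be sketched as follows. The details are assumptions made for illustration: the annotation score is modelled as the fraction of a hypothetical set of optional fields that are filled in, the PID service is represented by a caller-supplied `register_pid` function, and the metadata checksum is a SHA-256 digest over a canonical JSON serialisation.

```python
import hashlib
import json
import uuid

# Hypothetical optional fields that count towards metadata richness.
OPTIONAL_FIELDS = ("description", "keywords", "license", "funding")


def annotation_score(md: dict) -> float:
    """Fraction of optional fields that carry a non-empty value (assumed scoring rule)."""
    filled = sum(1 for field in OPTIONAL_FIELDS if md.get(field))
    return filled / len(OPTIONAL_FIELDS)


def metadata_checksum(md: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the digest."""
    canonical = json.dumps(md, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def housekeeping(md: dict, register_pid) -> dict:
    """Return an enriched copy of the metadata; the input dict is left unchanged.

    register_pid stands in for the PID service: it takes the metadata
    and returns the registered identifier as a string.
    """
    enriched = dict(md)
    enriched["pid"] = register_pid(md)
    enriched["annotation_score"] = annotation_score(md)
    enriched["storage_uuid"] = str(uuid.uuid4())
    # The checksum covers everything added so far, but not itself.
    enriched["metadata_checksum"] = metadata_checksum(enriched)
    return enriched
```

For example, `housekeeping({"title": "t", "description": "d", "keywords": ["a"]}, lambda m: "pid:demo-1")` yields an annotation score of 0.5, because two of the four assumed optional fields are present.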