Submission Journey

Overview

The submission journey is a multi-stage process that moves data from a user's local environment into permanent, indexed storage in the NFDI4Immuno Data Hub. It covers uploading, validating, and storing research data.

The journey begins with a POST request to the API Gateway. Data is held in temporary storage until all validation gates are passed.
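
As an illustration of the first step, the sketch below builds (but does not send) a POST request to the `/dataset` route shown in the diagram. The gateway base URL, the JSON body shape, and the function name `build_submission_request` are assumptions made for this example; the real gateway may expect a different payload format or authentication headers.

```python
import json
import urllib.request


def build_submission_request(base_url: str, metadata: dict) -> urllib.request.Request:
    """Build a POST request for the /dataset route (body shape is assumed)."""
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/dataset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical gateway address; replace with the real one.
request = build_submission_request(
    "https://gateway.example.org/api", {"title": "Example dataset"}
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```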

sequenceDiagram
    participant U as Front-end/CLI
    participant G as API Gateway
    participant P as Submission Pipeline
    participant MV as Metadata Validation
    participant PV as File Validation
    participant H as Housekeeping
    participant S as Permanent Storage

    U->>G: POST /dataset
    G->>P: Forward submission
    P->>P: Add to Temporary Storage
    P->>MV: Validate metadata
    P->>PV: Validate files (payload)
    P->>H: Housekeeping
    P->>S: Move dataset from temporary to permanent

Key features:

Temporary Storage: Safe staging area during validation
Atomic Operations: All-or-nothing submission process
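
The staging and all-or-nothing behaviour can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function `submit`, the `Gate` callable type, and the use of local directories for temporary and permanent storage are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional

# A gate receives the staged dataset directory and returns True on success.
Gate = Callable[[Path], bool]


def submit(source: Path, permanent_root: Path, gates: Iterable[Gate]) -> Optional[Path]:
    """Stage a dataset, run every gate, and promote it only if all gates pass."""
    staging_root = Path(tempfile.mkdtemp(prefix="staging-"))
    staged = staging_root / source.name
    try:
        shutil.copytree(source, staged)
        if not all(gate(staged) for gate in gates):
            return None  # rejected: nothing reaches permanent storage
        permanent_root.mkdir(parents=True, exist_ok=True)
        target = permanent_root / source.name
        shutil.move(str(staged), str(target))
        return target
    finally:
        # The staging area is always cleaned up, whether the submission passed or not.
        shutil.rmtree(staging_root, ignore_errors=True)
```

Note that `shutil.move` is only a single rename when source and target are on the same filesystem; across filesystems it copies, so a production system would need an additional mechanism to keep the final step all-or-nothing.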

Validation Gates

To ensure data quality and findability, every submission must pass two primary validation engines:

Metadata Validation: Checks the JSON against metadata schemas, ensures logical consistency, and validates terms against specific ontologies or CURIEs.
File (Payload) Validation: Verifies file extensions and MIME types, inspects file content against format-specific rules, and validates the integrity of the data via checksum comparison.

Metadata Validation

sequenceDiagram
    participant P as Submission Pipeline
    participant MV as Metadata Validation Engine
    participant S as Schema Validator
    participant Sem as Semantic Validator
    participant Ont as Ontology/CURIE Validator

    P->>MV: Start metadata validation
    MV->>S: Validate against metadata schema
    S-->>MV: Pass/Fail

    MV->>Sem: Check logical consistency<br>(relationships, constraints)
    Sem-->>MV: Pass/Fail

    MV->>Sem: Check semantic consistency<br>(plausible values)
    Sem-->>MV: Pass/Fail

    MV->>Ont: Validate terms against ontologies
    Ont-->>MV: Pass/Fail

    MV-->>P: Validation results

Key features:

Schema Validation: Ensures metadata conforms to required structure
Semantic Validation: Checks logical consistency and plausibility of metadata values
Ontology Validation: Verifies terms against controlled vocabularies
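
The three checks could be chained roughly as in the sketch below. Everything specific in it is hypothetical: the field names (`title`, `collection_date`, `species`), the allow-list of ontology prefixes, and the choice to collect all errors instead of stopping at the first failure. The term check only inspects CURIE syntax and prefix; a real ontology validator would also look the term up in the ontology.

```python
import re
from datetime import date

# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "collection_date": str, "species": str}
# Hypothetical allow-list of ontology prefixes.
ALLOWED_PREFIXES = {"NCBITaxon", "OBI"}
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][\w.-]*):(?P<local>\S+)$")


def check_schema(metadata: dict) -> list[str]:
    """Structural check: required fields exist and have the expected type."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in metadata:
            errors.append(f"missing field: {field}")
        elif not isinstance(metadata[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def check_semantics(metadata: dict) -> list[str]:
    """Plausibility check: the collection date parses and is not in the future."""
    try:
        collected = date.fromisoformat(metadata["collection_date"])
    except (KeyError, TypeError, ValueError):
        return ["collection_date is not an ISO date"]
    if collected > date.today():
        return ["collection_date lies in the future"]
    return []


def check_terms(metadata: dict) -> list[str]:
    """Term check: the species value is a CURIE with an allowed prefix."""
    match = CURIE_PATTERN.match(str(metadata.get("species", "")))
    if match is None:
        return ["species is not a CURIE"]
    if match.group("prefix") not in ALLOWED_PREFIXES:
        return [f"unknown ontology prefix: {match.group('prefix')}"]
    return []


def validate_metadata(metadata: dict) -> list[str]:
    """Run all three checks in order; an empty list means the gate is passed."""
    return check_schema(metadata) + check_semantics(metadata) + check_terms(metadata)
```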

File Validation

sequenceDiagram
    participant P as Submission Pipeline
    participant PV as File Validation Engine
    participant FT as File Type Checker
    participant FC as File Content Checker
    participant CS as Checksum Validator

    P->>PV: Start payload validation
    PV->>FT: Check file extensions and MIME types
    FT-->>PV: Pass/Fail

    PV->>FC: Inspect file content<br>(format-specific rules)
    FC-->>PV: Pass/Fail

    PV->>CS: Compare provided vs. computed checksums
    CS-->>PV: Integrity OK/Fail

    PV-->>P: Validation results
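
The type and content checks might look like the following sketch. The table of supported extensions, the assumption that the submitter declares a MIME type per file, and the two content rules are illustrative only; the actual format-specific rules are not described here.

```python
import csv
import io
import json
from pathlib import Path


def looks_like_json(data: bytes) -> bool:
    """Content rule: the file parses as UTF-8 encoded JSON."""
    try:
        json.loads(data.decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


def looks_like_csv(data: bytes) -> bool:
    """Content rule: at least one row, and every row has the same column count."""
    try:
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))
    except (UnicodeDecodeError, csv.Error):
        return False
    return bool(rows) and len({len(row) for row in rows}) == 1


# Hypothetical rule table: extension -> (expected MIME type, content rule).
FORMAT_RULES = {
    ".json": ("application/json", looks_like_json),
    ".csv": ("text/csv", looks_like_csv),
}


def validate_file(path: Path, declared_mime: str) -> list[str]:
    """Return a list of problems; an empty list means the file passed."""
    suffix = path.suffix.lower()
    rule = FORMAT_RULES.get(suffix)
    if rule is None:
        return [f"unsupported extension: {suffix}"]
    expected_mime, content_rule = rule
    errors = []
    if declared_mime != expected_mime:
        errors.append(f"declared MIME type {declared_mime} does not match {suffix}")
    if not content_rule(path.read_bytes()):
        errors.append(f"content failed the {suffix} rule")
    return errors
```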

Integrity Check

If the checksum computed during file validation does not match the checksum provided by the user, the submission is rejected immediately so that corrupted data never reaches permanent storage.
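
A checksum comparison of this kind can be sketched as below. SHA-256 and hexadecimal encoding are assumptions for the example; the algorithm actually expected by the Data Hub is not stated here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def compute_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large payloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, provided: str) -> bool:
    """Compare the computed digest with the user-provided one, ignoring case and whitespace."""
    return compute_sha256(path) == provided.strip().lower()
```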

Key features:

File Validation: Confirms file integrity and format compliance
Checksum Verification: Ensures data hasn't been corrupted

Housekeeping

Once a submission is validated, the system performs "Housekeeping" to prepare the dataset for permanent storage. This includes:

Registering a persistent identifier (PID).
Calculating an annotation score to represent metadata richness.
Generating a final UUID for the physical storage location.
Computing a checksum of the final, enriched metadata.

sequenceDiagram
    participant P as Submission Pipeline
    participant H as Housekeeping
    participant PID as PID Service
    participant AS as Annotation score
    participant C as Checksum

    P->>H: Start housekeeping

    H->>PID: Register PID
    H->>AS: Calculate annotation score
    H->>H: Add UUID (storage location)

    H->>C: Checksum final rich metadata

    H-->>P: Housekeeping done

Key features:

PID Registration: Assigns persistent identifiers to datasets
Annotation Scoring: Scores the richness and completeness of the metadata
Metadata Enrichment: Adds the PID, annotation score, and storage UUID to the metadata
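
The housekeeping steps can be sketched as a single function. All names here are hypothetical: `register_pid` is a stand-in for the call to the PID service, the annotation score is assumed to be the fraction of a made-up list of optional fields that are filled in, and SHA-256 over canonical JSON is an assumed way to checksum the final metadata.

```python
import hashlib
import json
import uuid

# Hypothetical optional fields that count toward metadata richness.
OPTIONAL_FIELDS = ("description", "keywords", "license", "funding")


def annotation_score(metadata: dict) -> float:
    """Assumed scoring rule: share of optional fields with a non-empty value."""
    filled = sum(1 for field in OPTIONAL_FIELDS if metadata.get(field))
    return filled / len(OPTIONAL_FIELDS)


def register_pid(metadata: dict) -> str:
    """Stand-in for the PID service; returns a placeholder identifier."""
    return f"pid:placeholder/{uuid.uuid4()}"


def housekeeping(metadata: dict, register=register_pid) -> dict:
    """Return a copy of the metadata enriched with PID, score, UUID, and checksum."""
    enriched = dict(metadata)
    enriched["pid"] = register(metadata)
    enriched["annotation_score"] = annotation_score(metadata)
    enriched["storage_uuid"] = str(uuid.uuid4())
    # Serialize with sorted keys so the checksum does not depend on key order.
    canonical = json.dumps(enriched, sort_keys=True, separators=(",", ":"))
    enriched["metadata_checksum"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return enriched
```

Passing the PID registration in as a parameter keeps the sketch testable without a network call; the checksum is computed last so that it covers every field added during housekeeping.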