Analysis Workflow
Family and Trio Analysis
Clinical genetics diagnoses of rare Mendelian disorders benefit enormously from family context. A variant classified as VUS in solo proband WES may be upgraded to Likely Pathogenic once parental genotypes confirm de novo origin. Compound heterozygous pairs in recessive genes are invisible without phasing. Segregation of a variant with disease status across family members provides additional evidence per ClinGen SVI 2021 guidelines. The Family Analysis Service automates this family-aware analysis on pre-classified WGS or WES data.
No variant re-calling. No VEP re-annotation. The service operates purely on the pre-classified DuckDB files produced upstream by Variant Analysis, preserving ACMG classification context and adding family-aware evidence on top.
Clinical Positioning
Three concrete clinical scenarios illustrate why family analysis changes the diagnostic answer in ways solo proband analysis cannot.
Three Scenarios
VUS in proband, de novo confirmed by parents. A missense variant in a constrained gene is initially classified as VUS in the solo proband. Once parental genotypes confirm that the variant arose de novo, the strength of evidence increases substantially: de novo origin in a constrained gene is one of the strongest single pieces of clinical evidence available.
Two heterozygous variants in a recessive gene, phased in trans. A pair of variants in a recessive disease gene, one inherited from each parent, forms biallelic loss of function. This is the diagnostic answer for many recessive conditions and is invisible without parental phasing.
Variant segregates with disease. A variant present in affected family members and absent in unaffected family members provides per-ClinGen-SVI segregation evidence. The strength depends on the number of meioses observed: a trio caps at supporting, and an extended pedigree is required for moderate or stronger.
Each of these is a different inheritance algorithm with different data requirements. The Family Service runs all three on the same trio data and presents the combined evidence to the geneticist alongside the upstream ACMG classification.
Supported Family Compositions
Trio is canonical. Duo and proband-with-sibling are supported with explicit reduced-confidence flagging on the affected algorithms.
Trio
Proband, mother, father. The canonical composition. Enables full de novo detection, compound heterozygous phasing with parental origin, and segregation scoring across two meioses.
Capabilities: All three inheritance algorithms run.
Duo
Proband and one parent. Supports inheritance analysis with reduced confidence on de novo calls (one parent unobserved) and partial phasing.
Capabilities: De novo and segregation run with explicit reduced-confidence flagging. Compound heterozygous phasing is limited.
Proband and Sibling
For families where parents are unavailable, a sibling can provide segregation evidence when affected status is known on both members.
Capabilities: Segregation scoring runs. De novo detection and compound heterozygous phasing are not feasible without parental data.
Pipeline Architecture
Four stages execute sequentially on a typical WGS trio in approximately 30-90 seconds. The pipeline is atomic per phase: if a phase fails, subsequent phases are not executed and the failure point is recorded.
Stage 1
Trio JOIN Construction
The service ATTACHes each member's pre-classified DuckDB file read-only and constructs a unified trio table by joining on (chromosome, position, reference allele, alternate allele). Per-member columns are role-suffixed (proband, mother, father). Inheritance annotation columns are pre-allocated and populated by subsequent stages.
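The join can be illustrated in plain Python. This is a sketch only, not the service's DuckDB SQL: the record layout, the `gt` field, the `gt_<role>` column names, and the `build_trio_table` helper are all assumptions made for the example, and the join is treated as a full outer join, which the description above does not confirm.

```python
from typing import Dict, List, Optional, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)
ROLES = ("proband", "mother", "father")


def build_trio_table(
    members: Dict[str, List[dict]],
) -> Dict[VariantKey, Dict[str, Optional[str]]]:
    """Join per-member variant rows on (chrom, pos, ref, alt).

    Each output row carries one role-suffixed genotype column per member;
    a member with no row at that site keeps None (the NULL case).
    """
    table: Dict[VariantKey, Dict[str, Optional[str]]] = {}
    for role in ROLES:
        for row in members.get(role, []):
            key = (row["chrom"], row["pos"], row["ref"], row["alt"])
            # Pre-allocate every role column so absent members stay None.
            out = table.setdefault(key, {f"gt_{r}": None for r in ROLES})
            out[f"gt_{role}"] = row["gt"]
    return table


members = {
    "proband": [{"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "gt": "0/1"}],
    "mother": [{"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "gt": "0/1"}],
    "father": [],
}
trio = build_trio_table(members)
```

In this toy input the father has no row at the site, so his column stays NULL; how such NULLs are interpreted is the job of the de novo phase.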
Stage 2
Sample Quality Control
When a joint VCF is available, PLINK 1.9 is invoked for identity-by-descent (IBD) analysis. Output is parsed for sample-swap, consanguinity, or duplicate-sample alerts. The resulting trio_qc.json is persisted alongside the trio data and surfaced in the case summary.
Stage 3
Inheritance Analysis Pipeline
Three sequential phases run on the trio table: de novo detection, compound heterozygous phasing, and segregation scoring. Each phase is atomic: if a phase fails, subsequent phases are not executed and the failing stage is recorded in failed_at_stage for operator review.
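The stop-on-failure behaviour can be sketched as a small runner. The `run_phases` helper, the result dictionary layout, and the toy phase functions are hypothetical; only the `failed_at_stage` name comes from the description above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Phase = Tuple[str, Callable[[dict], None]]


def run_phases(trio_table: dict, phases: List[Phase]) -> Dict[str, Optional[object]]:
    """Run phases in order; stop at the first failure and record where."""
    completed: List[str] = []
    for name, phase in phases:
        try:
            phase(trio_table)
        except Exception as exc:  # any phase error halts the pipeline
            return {"completed": completed, "failed_at_stage": name, "error": str(exc)}
        completed.append(name)
    return {"completed": completed, "failed_at_stage": None, "error": None}


def de_novo(table: dict) -> None:
    table["de_novo_done"] = True


def compound_het(table: dict) -> None:
    # Simulated failure for the example.
    raise ValueError("missing parental genotype column")


def segregation(table: dict) -> None:
    table["segregation_done"] = True


table: dict = {}
result = run_phases(
    table,
    [("de_novo", de_novo), ("compound_het", compound_het), ("segregation", segregation)],
)
```

Here the simulated compound heterozygous failure leaves the de novo results in place, skips segregation, and reports `compound_het` as the failing stage.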
Stage 4
Evidence Aggregation
Per-variant inheritance annotations are summarised into evidence_summary and qc_summary JSON, stored on the trio analysis record for fast retrieval. Per-variant detail remains in the trio DuckDB for on-demand drill-down by the geneticist.
Sample Quality Control
Before any inheritance analysis, the service verifies that the family relationships in the metadata match the genetic relationships in the data. Sample-swap, accidental duplication, and unreported consanguinity all corrupt downstream inheritance calls if undetected.
PLINK Identity-by-Descent Analysis
When a joint VCF is available from the upstream lab, the service runs PLINK 1.9 IBD analysis on biallelic SNPs above a minor allele frequency threshold. The pairwise IBD matrix is parsed for three classes of alert.
Sample Swap
When the IBD between proband and a declared parent does not match the expected parent-offspring relationship.
Consanguinity
When the IBD between the two declared parents is elevated above the expected unrelated-individuals baseline.
Duplicate Sample
When two declared family members are genetically identical, typically an upload error rather than a real biological scenario.
Alerts surface in the case summary alongside the inheritance evidence. The geneticist sees QC results before reading variant calls.
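The three alert classes can be sketched as a classifier over pairwise PI_HAT values, the proportion-IBD estimate that PLINK reports for each sample pair. The cut-off values and the `ibd_alerts` helper below are illustrative assumptions, not the service's actual thresholds or code.

```python
from typing import Dict, List, Tuple

# Illustrative cut-offs only; the service's real thresholds are not documented here.
DUPLICATE_MIN = 0.9
PARENT_CHILD_RANGE = (0.4, 0.6)
UNRELATED_MAX = 0.1


def ibd_alerts(pi_hat: Dict[Tuple[str, str], float]) -> List[str]:
    """Classify pairwise PI_HAT values for a trio into QC alerts."""
    alerts: List[str] = []
    for (a, b), value in sorted(pi_hat.items()):
        if value >= DUPLICATE_MIN:
            alerts.append(f"duplicate_sample:{a}-{b}")
        elif {a, b} == {"mother", "father"}:
            # Declared parents are expected to be unrelated.
            if value > UNRELATED_MAX:
                alerts.append(f"consanguinity:{a}-{b}")
        elif "proband" in (a, b):
            # Parent-offspring pairs are expected to share about half their genome.
            low, high = PARENT_CHILD_RANGE
            if not low <= value <= high:
                alerts.append(f"sample_swap:{a}-{b}")
    return alerts


clean = ibd_alerts({("proband", "mother"): 0.5, ("proband", "father"): 0.5,
                    ("mother", "father"): 0.0})
swapped = ibd_alerts({("proband", "mother"): 0.5, ("proband", "father"): 0.02,
                      ("mother", "father"): 0.0})
related = ibd_alerts({("proband", "mother"): 0.5, ("proband", "father"): 0.5,
                      ("mother", "father"): 0.25})
```

A production check would likely also use the Z0/Z1/Z2 sharing estimates rather than PI_HAT alone; that refinement is omitted here for brevity.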
De Novo Detection
Identifies variants present in the proband but absent in both parents, with explicit confidence tiers reflecting the strength of supporting evidence.
Five-Tier Confidence Classification
Variants are classified into five confidence tiers reflecting the strength of evidence for de novo origin. Higher tiers require both clean parental genotypes and adequate read support; lower tiers reflect ambiguity in parental coverage or genotype quality.
Chromosome-Presence Gate
When a parent genotype is NULL, the system distinguishes confident inferred homozygous-reference (the chromosome was sequenced and called as reference) from suspicious absence (the chromosome may not have been adequately covered). This is a key clinical safety mechanism preventing false de novo calls from coverage gaps.
Gender-Aware Chromosome Exceptions
Biologically expected NULLs are recognised as such: chromosome Y in a female parent, mitochondrial DNA in a father. These do not trigger confidence downgrades. The sex of each family member is part of the input.
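The interplay of the chromosome-presence gate and the sex-aware exceptions can be sketched as follows. Everything here is an assumption for illustration: the state and label names, the `chrY`/`chrM` naming, and in particular the idea that "chromosome was sequenced" is approximated by the parent having at least one other call on that chromosome. The labels are coarse and do not reproduce the service's five tiers.

```python
from typing import Optional, Set

HOM_REF = {"0/0", "0|0"}
MISSING = {None, "./.", ".|."}


def parent_state(gt: Optional[str], chrom: str, sex: str, chroms_with_calls: Set[str]) -> str:
    """Interpret one parent's genotype at a site where the proband carries a variant."""
    if gt not in MISSING:
        return "hom_ref" if gt in HOM_REF else "carrier"
    # NULL genotype: first rule out the biologically expected cases.
    if (chrom == "chrY" and sex == "female") or (chrom == "chrM" and sex == "male"):
        return "expected_null"
    # Chromosome-presence gate: other calls on this chromosome suggest it was covered.
    if chrom in chroms_with_calls:
        return "inferred_hom_ref"
    return "suspicious_absence"


def de_novo_label(mother_state: str, father_state: str) -> str:
    """Combine both parental states into a coarse, hypothetical label."""
    states = {mother_state, father_state}
    if "carrier" in states:
        return "inherited"
    if "suspicious_absence" in states:
        return "de_novo_low_confidence"
    return "de_novo_candidate"


mother = parent_state(None, "chr7", "female", {"chr1", "chr7"})
father = parent_state(None, "chr7", "male", {"chr1"})
label = de_novo_label(mother, father)
```

In this example the mother's NULL is read as inferred homozygous-reference, while the father has no calls on chr7 at all, so the candidate is downgraded rather than reported as a confident de novo.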
Clean ACMG Classification Preserved
The de novo annotation is added on top of the upstream ACMG classification. The original Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign assignment from variant analysis is preserved verbatim. Geneticists see both: the classification, and the new family-aware evidence.
Compound Heterozygous Phasing
Identifies pairs of heterozygous coding or splicing variants in the same gene that may form biallelic loss of function. Phasing requires parental origin information and distinguishes in-trans (biallelic) from in-cis (single allele) configurations.
Phasing from Parental Origin
When parental genotypes are available, the service determines whether a pair of heterozygous variants in the same gene came from different parents (in trans, biallelic) or from the same parent (in cis, single-allele). Only in-trans pairs constitute genuine biallelic loss-of-function.
Multi-Partner Detection
A variant may participate in multiple candidate compound heterozygous pairs. The service flags this so that the geneticist sees the full landscape of biallelic candidates in the gene rather than only the first pair found.
Coding and Splicing Restriction
Compound heterozygous candidacy is restricted to coding and splicing consequences. Synonymous, intronic, and regulatory variants are excluded because they do not constitute biallelic loss of function regardless of zygosity.
Symmetric Annotation
When variants A and B form a compound heterozygous pair, both rows in the trio table are annotated. The geneticist can land on either variant and immediately see the partner.
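Phasing from parental origin, symmetric annotation, and multi-partner flagging can be sketched together. The field names (`id`, `mother_carrier`, `father_carrier`), the `unresolved` phase label, and both helper functions are hypothetical; the sketch assumes the variants passed in are already restricted to heterozygous coding or splicing variants in one gene.

```python
from itertools import combinations
from typing import Dict, List, Optional


def origin(variant: dict) -> Optional[str]:
    """Parental origin of a heterozygous proband variant, if unambiguous."""
    in_mother = variant["mother_carrier"]
    in_father = variant["father_carrier"]
    if in_mother and not in_father:
        return "maternal"
    if in_father and not in_mother:
        return "paternal"
    return None  # both or neither parent carries it: origin unresolved


def phase_gene(variants: List[dict]) -> Dict[str, List[dict]]:
    """Annotate every variant in one gene with its in-trans / in-cis partners."""
    annotations: Dict[str, List[dict]] = {v["id"]: [] for v in variants}
    for a, b in combinations(variants, 2):
        oa, ob = origin(a), origin(b)
        if oa is None or ob is None:
            phase = "unresolved"
        else:
            phase = "trans" if oa != ob else "cis"
        # Symmetric annotation: each variant records the other as partner.
        annotations[a["id"]].append({"partner": b["id"], "phase": phase})
        annotations[b["id"]].append({"partner": a["id"], "phase": phase})
    return annotations


variants = [
    {"id": "v1", "mother_carrier": True, "father_carrier": False},
    {"id": "v2", "mother_carrier": False, "father_carrier": True},
    {"id": "v3", "mother_carrier": False, "father_carrier": True},
]
annotations = phase_gene(variants)
# A variant with more than one in-trans partner gets the multi-partner flag.
multi_partner = {
    vid: sum(p["phase"] == "trans" for p in partners) > 1
    for vid, partners in annotations.items()
}
```

Here v1 is maternal and pairs in trans with both paternal variants, so it is flagged as multi-partner, while v2 and v3 are in cis with each other.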
Segregation Scoring
Per-variant likelihood-ratio LOD scores using the Jarvik framework as specified by ClinGen SVI 2021. Mapped to ClinGen-aligned evidence bands with explicit acknowledgement of trio data limits.
ClinGen SVI 2021 Framework
Per-variant LOD scores are computed using the Jarvik likelihood-ratio framework as specified by the ClinGen Sequence Variant Interpretation Working Group. The methodology is published, peer-reviewed, and explicitly designed for clinical use.
Five-Band Mapping
LOD scores are mapped to five ClinGen-aligned evidence bands: indeterminate, supporting, moderate, strong, and very_strong. A sixth value, not_applicable, is written when no score is computed. The clinical conversation works in these bands; raw LOD values remain available for audit.
Honest Trio Ceiling
A trio carries only two meioses of segregation evidence. The mathematical ceiling for trio-only LOD is approximately 0.3 (supporting band at best). Most real trios land in supporting or indeterminate. Strong and very_strong bands require extended pedigree data, which is the planned next step.
Hypothesis-Aware
When the inheritance hypothesis is unknown, the segregation phase explicitly writes not_applicable rather than emitting a spurious LOD = 0 result. This prevents misinterpretation of "no evidence calculated" as "evidence against".
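The arithmetic behind the trio ceiling and the not_applicable rule can be sketched under a strong simplification: a fully penetrant dominant model in which each informative meiosis contributes a likelihood ratio of 2, so the LOD is m · log10(2) for m informative meioses. This is an assumption for illustration, not the service's scoring code; the function names and hypothesis strings are hypothetical, and other inheritance models are deliberately left unimplemented.

```python
import math
from typing import Optional


def segregation_lod(informative_meioses: int, hypothesis: str) -> Optional[float]:
    """Simplified co-segregation LOD for a fully penetrant dominant model."""
    if hypothesis == "unknown":
        return None  # caller records not_applicable, never LOD = 0
    if hypothesis != "autosomal_dominant":
        raise NotImplementedError("only the dominant simplification is sketched")
    if informative_meioses < 0:
        raise ValueError("informative_meioses must be non-negative")
    return informative_meioses * math.log10(2)


def describe(lod: Optional[float]) -> str:
    """Render a score, keeping 'not computed' distinct from a zero score."""
    return "not_applicable" if lod is None else f"LOD={lod:.3f}"


one_meiosis = describe(segregation_lod(1, "autosomal_dominant"))
no_hypothesis = describe(segregation_lod(3, "unknown"))
```

Under this simplification a single informative meiosis yields log10(2) ≈ 0.301, which lines up with the approximately 0.3 ceiling if only one of a trio's two meioses is informative for a given variant; that reading is an interpretation rather than something stated above.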
Feasibility Planning
Before the pipeline runs, the service evaluates which inheritance phases are feasible given the family composition and inheritance hypothesis. Feasibility flags and a free-form rationale are persisted alongside the trio analysis record, providing a clinical audit trail.
| Flag | Meaning |
|---|---|
| de_novo_feasible | True when both parents are sequenced and an inheritance hypothesis consistent with de novo (autosomal dominant or sporadic) is plausible. False otherwise (for example, a duo without the affected parent). |
| compound_het_feasible | True when at least one parent is sequenced and the inheritance hypothesis is autosomal recessive. Phasing requires parental origin information. |
| segregation_feasible | True when at least two family members with affected status are present. Trio with affected proband and one affected parent qualifies; singleton does not. |
| plan_rationale | Free-form structured explanation of why each phase was planned as feasible or not. Provides a clinical audit trail: the geneticist can verify which inheritance evidence was looked for and which was not, and why. |
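
The flag definitions in the table can be sketched as a planning function. The member record layout, the hypothesis strings, and the rationale format are assumptions; `segregation_feasible` is read here as "at least two members whose affected status is known", which is one interpretation of the table wording.

```python
from typing import Dict, List


def plan_feasibility(members: List[dict], hypothesis: str) -> Dict[str, object]:
    """Derive the feasibility flags and a short rationale for one family."""
    parents = [m for m in members if m["role"] in ("mother", "father")]
    known_status = [m for m in members if m["affected"] != "unknown"]

    de_novo = len(parents) == 2 and hypothesis in ("autosomal_dominant", "sporadic")
    compound_het = len(parents) >= 1 and hypothesis == "autosomal_recessive"
    segregation = len(known_status) >= 2

    rationale = (
        f"parents sequenced={len(parents)}; hypothesis={hypothesis}; "
        f"members with known affected status={len(known_status)}"
    )
    return {
        "de_novo_feasible": de_novo,
        "compound_het_feasible": compound_het,
        "segregation_feasible": segregation,
        "plan_rationale": rationale,
    }


trio_plan = plan_feasibility(
    [
        {"role": "proband", "affected": "affected"},
        {"role": "mother", "affected": "unaffected"},
        {"role": "father", "affected": "unaffected"},
    ],
    "autosomal_dominant",
)
duo_plan = plan_feasibility(
    [
        {"role": "proband", "affected": "affected"},
        {"role": "mother", "affected": "unknown"},
    ],
    "autosomal_recessive",
)
```

For the duo, only compound heterozygous phasing is planned: one parent is sequenced under a recessive hypothesis, but de novo detection lacks the second parent and segregation lacks a second member with known status.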
Inputs and Outputs
What the service consumes from the upstream pipeline and from the geneticist, and what it produces for review and downstream use.
Inputs from the Pipeline
Pre-classified classified_variants.duckdb files from variant analysis (one per family member)
ACMG/AMP classification, criteria, and supporting evidence already applied per variant
Optional joint-called VCF when available from the upstream lab
No variant re-calling and no VEP re-annotation: the service operates purely on pre-classified data
Inputs from the Geneticist
Family composition: trio (proband + mother + father), duo (proband + one parent), or proband + sibling
Per-member metadata: session ID, sex, affected status (affected / unaffected / unknown)
Inheritance hypothesis when available (autosomal dominant, autosomal recessive, X-linked, mitochondrial, sporadic, unknown)
Optional joint VCF path for sample-swap and consanguinity QC
Outputs for the Geneticist
De novo annotation per variant: confidence tier and supporting evidence
Compound heterozygous pairs: phase determination, partner variants, multi-partner flags
Segregation evidence: per-variant LOD score and ClinGen-aligned band
Sample QC summary: IBD analysis results, alerts for sample-swap or consanguinity
Feasibility and rationale: which phases were planned as feasible and why
Evidence summary JSON denormalised on the trio analysis record for fast retrieval
Outputs for Downstream Services
Persistent trio_variants DuckDB consumed by the AI Service for the family analysis report
Per-case data available to cohort-level analytics for population work
Standards and Boundaries
The service operates against published standards and within explicit clinical boundaries.
ACMG/AMP
Variant classification follows ACMG/AMP 2015 with subsequent ClinGen specifications and is performed upstream by the Variant Analysis Service. The Family Service consumes that classification and adds family-aware evidence on top; it does not reclassify.
Reference: Richards et al., Genetics in Medicine, 2015, PMID: 25741868
ClinGen SVI 2021 Segregation
Segregation LOD scoring follows the ClinGen Sequence Variant Interpretation Working Group framework for clinical use. The Jarvik likelihood-ratio methodology is published and validated.
Reference: ClinGen Sequence Variant Interpretation Working Group, 2021
Jarvik Likelihood Framework
Per-variant segregation LOD computation. Industry-standard methodology for likelihood-based segregation evidence in clinical genetics.
Reference: Jarvik and Browning, AJHG, 2016, PMID: 27374771
PLINK 1.9
Sample identity-by-descent analysis for sample-swap, consanguinity, and duplicate-sample detection. Industry-standard genetic relationship inference tool.
Reference: Chang et al., GigaScience, 2015, PMID: 25722852
Pre-Classified Data Only
The service does not re-call variants. No DeepTrio, GLnexus, or VEP invocation locally. Joint calling, when available, happens at the upstream sequencing facility and is consumed as input. This preserves ACMG classification context and avoids duplicating computation already performed by the variant analysis service.
Reporting Boundary
The service produces inheritance-annotated variant data with explicit confidence tiers, evidence bands, and feasibility rationale. It does not generate clinical interpretations, does not make diagnostic calls, and does not replace clinical review. All output is for review by a qualified clinical geneticist before any clinical action.
Data Residency
The service runs within the Helena platform on EU-based infrastructure compliant with GDPR Article 9 and 1+MG technical requirements. No family genomic data leaves the platform during analysis.
What Sets It Apart
Eight design choices that make Family and Trio Analysis distinct from generic pedigree analysis tools.
Maximum-fidelity inheritance analysis
Operates on pre-classified data from upstream variant analysis. Preserves ACMG classification context. No re-calling, no re-annotation, no information loss.
Three inheritance algorithms in one pipeline
De novo detection, compound heterozygous phasing, and segregation scoring run sequentially on the same trio data. The geneticist sees the full inheritance picture in one place.
Honest about trio limits
A trio caps at approximately 0.3 LOD (supporting band at best). We document this explicitly rather than overstating evidence strength. Strong and very_strong bands require an extended pedigree.
Chromosome-presence gate
Distinguishes confident inferred homozygous-reference from suspicious absence when parent genotype is NULL. A clinical safety mechanism that prevents false de novo calls from coverage gaps.
Gender-aware exceptions
chrY NULL for a female parent and chrM NULL for a father are biologically expected, not coverage gaps. The system recognises this and does not penalise confidence.
Feasibility flags with rationale
Every analysis records which phases were planned as feasible and why. The geneticist can verify what evidence was looked for and what was not, which provides a clinical audit trail.
Atomic phase semantics
If a phase fails, subsequent phases are not executed. The exact failure stage is recorded. Partial state is preserved for operator inspection rather than overwritten.
Sample QC built in
PLINK IBD analysis runs as part of the standard pipeline when a joint VCF is available. Sample-swap, consanguinity, and duplicate-sample alerts surface before clinical interpretation.
See Family Analysis in Practice
Request a demo to see Helena run a real trio through the full pipeline, with de novo detection, compound heterozygous phasing, and segregation scoring all surfaced alongside the upstream ACMG classification.