Virus metagenomics sequencing report for sample barcode01

Pipeline version 1.0.0

Research use only

Read Statistics

Stage | Reads | Mean length
Unfiltered | 5000 | 564
centrifuge-filtered | 1921 | 539
human_dna | 116 | 456
human_rna | 0 | 0
reagent | 0 | 0
Filtered | 2963 | 584

Read counts per group after host/contaminant filtering

  • Stage: Host/contaminant filtering step
  • Reads: Number of reads in this group
  • Mean length: Mean read length within this read group
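
The table's read counts can be cross-checked with a short sketch. It assumes, hypothetically, that the removed groups are disjoint and that the "Filtered" row is what remains after subtracting them from "Unfiltered". The report does not state this, but the numbers are consistent with it. The function name remaining_reads is illustrative and not part of the pipeline.

```python
# Read counts copied from the Read Statistics table.
unfiltered = 5000
removed = {
    "centrifuge-filtered": 1921,
    "human_dna": 116,
    "human_rna": 0,
    "reagent": 0,
}

def remaining_reads(total, removed_groups):
    """Reads left after subtracting every removed group from the total.

    Assumption: each read belongs to at most one removed group.
    """
    return total - sum(removed_groups.values())

filtered = remaining_reads(unfiltered, removed)
print(filtered)  # 2963, matching the "Filtered" row
```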

The distributions of the reads after trimming. They still contain host reads.

The distributions of the reads after cleaning. Host reads and technical contaminants are removed.

Read classifications by centrifuge.

Consensus

Reference | Family | Organism | Segment | Length | Coverage | Description
GU830839.1 | Arenaviridae | Mammarenavirus lassaense | S | 3358 | 95.32 | Lassa virus strain BA366 glycoprotein precursor (GPC) and nucleoprotein (NP) genes, complete cds
GU979513.1 | Arenaviridae | Mammarenavirus lassaense | L | 7207 | 96.06 | Lassa virus strain BA366 Z protein (Z) and polymerase (L) genes, complete cds

Reference-based genome assembly statistics

  • Reference: ID of the reference sequence
  • Family: Virus family
  • Organism: Name of the species
  • Segment: Identifier of the segment; "Unsegmented" for single-segment virus genomes
  • Length: Length of the reference genome
  • Coverage: Percentage of the reference genome that was successfully called
  • Description: Database description of the reference
Reference | Family | Organism | Segment | Length | Coverage | Positions called | Ambiguous positions | Mapped reads | Average read coverage
GU830839.1 | Arenaviridae | Mammarenavirus lassaense | S | 3358 | 95.32 | 3201 | 157 | 997 | 176.0
GU979513.1 | Arenaviridae | Mammarenavirus lassaense | L | 7207 | 96.06 | 6923 | 284 | 999 | 83.1

Reference-based genome assembly statistics

  • Reference: ID of the reference sequence
  • Family: Virus family
  • Organism: Name of the species
  • Segment: Identifier of the segment; "Unsegmented" for single-segment virus genomes
  • Length: Length of the reference genome
  • Coverage: Percentage of the reference genome that was successfully called
  • Positions called: Number of bases called
  • Ambiguous positions: Number of ambiguous positions set to "N"
  • Mapped reads: Number of reads aligned to the reference genome
  • Average read coverage: Average number of reads per reference genome position
Reference | Family | Organism | Segment | Length | Coverage | Description
LC710218.1 | Fiersviridae | Emesvirus zinderi | Unsegmented | 3605 | 93.31 | Escherichia phage MS2 GI_B RNA, complete genome

Reference-based genome assembly statistics

  • Reference: ID of the reference sequence
  • Family: Virus family
  • Organism: Name of the species
  • Segment: Identifier of the segment; "Unsegmented" for single-segment virus genomes
  • Length: Length of the reference genome
  • Coverage: Percentage of the reference genome that was successfully called
  • Description: Database description of the reference
Reference | Family | Organism | Segment | Length | Coverage | Positions called | Ambiguous positions | Mapped reads | Average read coverage
LC710218.1 | Fiersviridae | Emesvirus zinderi | Unsegmented | 3605 | 93.31 | 3364 | 241 | 798 | 130.5

Reference-based genome assembly statistics

  • Reference: ID of the reference sequence
  • Family: Virus family
  • Organism: Name of the species
  • Segment: Identifier of the segment; "Unsegmented" for single-segment virus genomes
  • Length: Length of the reference genome
  • Coverage: Percentage of the reference genome that was successfully called
  • Positions called: Number of bases called
  • Ambiguous positions: Number of ambiguous positions set to "N"
  • Mapped reads: Number of reads aligned to the reference genome
  • Average read coverage: Average number of reads per reference genome position
Reference | Family | Organism | Length | Coverage | Description
GU830839.1 | Arenaviridae | Mammarenavirus lassaense | 3358 | 95.32 | Lassa virus strain BA366 glycoprotein precursor (GPC) and nucleoprotein (NP) genes, complete cds
GU979513.1 | Arenaviridae | Mammarenavirus lassaense | 7207 | 96.06 | Lassa virus strain BA366 Z protein (Z) and polymerase (L) genes, complete cds
LC710218.1 | Fiersviridae | Emesvirus zinderi | 3605 | 93.31 | Escherichia phage MS2 GI_B RNA, complete genome

Reference-based genome assembly statistics

  • Reference: ID of the reference sequence
  • Family: Virus family
  • Organism: Name of the species
  • Length: Length of the reference genome
  • Coverage: Percentage of the reference genome that was successfully called
  • Description: Database description of the reference
Reference | Family | Organism | Length | Coverage | Positions called | Ambiguous positions | Mapped reads | Average read coverage
GU830839.1 | Arenaviridae | Mammarenavirus lassaense | 3358 | 95.32 | 3201 | 157 | 997 | 176.0
GU979513.1 | Arenaviridae | Mammarenavirus lassaense | 7207 | 96.06 | 6923 | 284 | 999 | 83.1
LC710218.1 | Fiersviridae | Emesvirus zinderi | 3605 | 93.31 | 3364 | 241 | 798 | 130.5

Reference-based genome assembly statistics

  • Reference: ID of the reference sequence
  • Family: Virus family
  • Organism: Name of the species
  • Length: Length of the reference genome
  • Coverage: Percentage of the reference genome that was successfully called
  • Positions called: Number of bases called
  • Ambiguous positions: Number of ambiguous positions set to "N"
  • Mapped reads: Number of reads aligned to the reference genome
  • Average read coverage: Average number of reads per reference genome position
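
The relationship between the Length, Positions called, Ambiguous positions, and Coverage columns can be illustrated with a small sketch. It assumes, as the reported values suggest but the report does not state, that Coverage is positions called divided by reference length, and that every reference position is either called or set to "N". The helper name coverage_percent is hypothetical.

```python
# (reference, length, positions_called, ambiguous_positions) from the consensus tables
rows = [
    ("GU830839.1", 3358, 3201, 157),
    ("GU979513.1", 7207, 6923, 284),
    ("LC710218.1", 3605, 3364, 241),
]

def coverage_percent(positions_called, length):
    """Percentage of reference positions that received a base call."""
    return round(100 * positions_called / length, 2)

for ref, length, called, ambiguous in rows:
    # Assumption: called and ambiguous positions together span the reference.
    assert called + ambiguous == length
    print(ref, coverage_percent(called, length))
```

For these three references the computed values agree with the Coverage column (95.32, 96.06, and 93.31).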

Contigs

Filter | Contig | Length | Number of reads | Blast Hit | Organism | Hit length | Contig alignment coverage | Reference alignment coverage | Sequence Identity | Classification | Taxonomic Rank
no-target | tig00000001 | 6994 | 502 | GU979513 | Mammarenavirus lassaense | 7207 | 1.0 | 0.97 | 1.0 | Mammarenavirus lassaense | species
no-target | tig00000002 | 3420 | 398 | LC710218 | Emesvirus zinderi | 3605 | 1.0 | 0.95 | 1.0 | Escherichia phage MS2 | leaf
no-target | tig00000003 | 3209 | 509 | GU830839 | Mammarenavirus lassaense | 3358 | 1.0 | 0.96 | 1.0 | Mopeia Lassa virus reassortant 29 | species

Contigs and targets found

  • Filter: Filter used on reads before assembly
  • Contig: Contig identifier
  • Length: Length of the contig in base pairs
  • Number of reads: Number of (corrected) reads used by canu to build this contig
  • Blast Hit: Virus reference genome found with blast search
  • Organism: Organism of the blast hit
  • Hit length: Length of the blast hit reference genome in base pairs
  • Contig alignment coverage: Share of the contig aligned by blast
  • Reference alignment coverage: Share of the reference aligned by blast
  • Sequence Identity: Sequence similarity of the aligned parts
  • Classification: Classification according to centrifuge
  • Taxonomic Rank: Taxonomic rank of the classification
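
As a rough plausibility check on the two alignment coverage columns, the sketch below estimates the reference alignment coverage from the contig and hit lengths. It rests on a loud assumption: because the contig alignment coverage is 1.0, the aligned length is taken to be the full contig length, with no gaps. How the pipeline actually derives these values from the blast output is not stated in the report, and the function name reference_coverage is hypothetical.

```python
# (contig, contig_length, hit_length) from the Contigs table
contigs = [
    ("tig00000001", 6994, 7207),
    ("tig00000002", 3420, 3605),
    ("tig00000003", 3209, 3358),
]

def reference_coverage(aligned_length, hit_length):
    """Share of the reference covered by an alignment of the given length."""
    return round(aligned_length / hit_length, 2)

# Assumption: the whole contig aligns, so aligned length == contig length.
estimates = {
    name: reference_coverage(contig_len, hit_len)
    for name, contig_len, hit_len in contigs
}
print(estimates)
```

Under that assumption the estimates (0.97, 0.95, 0.96) match the reported Reference alignment coverage values.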
Stage | Sequence type | Reads/Contigs | Mean length
no-target | raw | 2868 | 599.1
no-target | corrected | 1534 | 771.0
no-target | contigs | 3 | 4541.0
reassembly | input | 2868 | 599.1

Read and contig numbers in the different assembly runs

  • Stage: Assembly run (e.g. for a given filter, no-filter or re-assemblies)
  • Sequence type: Type of the sequence
  • Reads/Contigs: Number of reads or contigs
  • Mean length: Mean length of reads or contigs in base pairs
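
The mean contig length in the "contigs" row can be reproduced from the Length column of the Contigs table. This is a minimal sketch that assumes a plain arithmetic mean over the three contigs. The variable names are illustrative.

```python
# Contig lengths in base pairs, taken from the Contigs table.
contig_lengths = {
    "tig00000001": 6994,
    "tig00000002": 3420,
    "tig00000003": 3209,
}

# Arithmetic mean over all contigs of the no-target assembly run.
mean_length = sum(contig_lengths.values()) / len(contig_lengths)
print(len(contig_lengths), mean_length)  # 3 4541.0
```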

Versions

Database | Version | Description
Filters | 1.0 | Human (GRCh38), mouse (8_GRCm38), mastomys and contaminant filter set
Classification | 1.0 | Refseq reference genomes plus genbank virus sequences
Virus | 2.0 | NCBI virus genomes from 26.10.2024 with covid sequences from RVDB version 29

Local database versions used in this run.