Virus metagenomics sequencing report for sample barcode01

Read Statistics

Stage	Reads	Mean length
Unfiltered	5000	564
centrifuge-filtered	1921	539
human_dna	116	456
human_rna	0	0
reagent	0	0
Filtered	2963	584

Read groups produced by host/contaminant filtering

Stage: Host/contaminant filtering step
Reads: Number of reads in this group
Mean length: Mean read length within this read group
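
The counts in the table above are consistent with the Filtered set being what remains once every removed group is subtracted from the unfiltered reads. A minimal Python sketch of that check, with the values copied from the table. That the groups are disjoint and that Filtered is the remainder is an assumption inferred from the numbers, not something the report states:

```python
# Read counts copied from the Read Statistics table.
unfiltered = 5000
removed = {
    "centrifuge-filtered": 1921,
    "human_dna": 116,
    "human_rna": 0,
    "reagent": 0,
}
filtered = 2963

# Assumption: the removed groups are disjoint, so their counts simply add up.
total_removed = sum(removed.values())
remaining = unfiltered - total_removed
removed_share = 100 * total_removed / unfiltered

print(remaining == filtered)
print(f"{removed_share:.2f}% of reads removed")
```

If the first printed value were False, the groups would overlap or a filtering stage would be missing from the table.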

The distributions of the reads after trimming. They still contain host reads.

The distributions of the reads after cleaning. Host reads and technical contaminants are removed.

Read classifications by Centrifuge.

Consensus

Reference	Family	Organism	Segment	Length	Coverage	Description
GU830839.1	Arenaviridae	Mammarenavirus lassaense	S	3358	95.32	Lassa virus strain BA366 glycoprotein precursor (GPC) and nucleoprotein (NP) genes, complete cds
GU979513.1	Arenaviridae	Mammarenavirus lassaense	L	7207	96.06	Lassa virus strain BA366 Z protein (Z) and polymerase (L) genes, complete cds

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Segment: Identifier of the segment. "Unsegmented" for single-segment virus genomes
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Description: Database description of the reference

Reference	Family	Organism	Segment	Length	Coverage	Positions called	Ambiguous positions	Mapped reads	Average read coverage
GU830839.1	Arenaviridae	Mammarenavirus lassaense	S	3358	95.32	3201	157	997	176.0
GU979513.1	Arenaviridae	Mammarenavirus lassaense	L	7207	96.06	6923	284	999	83.1

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Segment: Identifier of the segment. "Unsegmented" for single-segment virus genomes
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Positions called: Number of bases called
Ambiguous positions: Number of ambiguous positions set to "N"
Mapped reads: Number of reads aligned to the reference genome
Average read coverage: Average number of reads per reference genome position
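
The columns defined above are related: called and ambiguous positions add up to the reference length, and Coverage matches the called positions as a percentage of that length. A short Python sketch of this consistency check, using the two Lassa virus rows from the table above. The function name coverage_percent is hypothetical, and the formula is inferred from the reported values rather than taken from the pipeline:

```python
# Rows copied from the assembly statistics table:
# (reference, length, positions called, ambiguous positions, reported coverage)
rows = [
    ("GU830839.1", 3358, 3201, 157, 95.32),
    ("GU979513.1", 7207, 6923, 284, 96.06),
]


def coverage_percent(positions_called, length):
    """Percent of reference positions with a called base (assumed definition)."""
    return 100 * positions_called / length


for ref, length, called, ambiguous, reported in rows:
    # Called and ambiguous ("N") positions together span the whole reference.
    assert called + ambiguous == length
    computed = coverage_percent(called, length)
    # Compare against the table value, allowing for rounding to two decimals.
    print(ref, f"{computed:.2f}", abs(computed - reported) < 0.01)
```

Note that Coverage is a different quantity from Average read coverage, which counts reads per position.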

Reference	Family	Organism	Segment	Length	Coverage	Description
LC710218.1	Fiersviridae	Emesvirus zinderi	Unsegmented	3605	93.31	Escherichia phage MS2 GI_B RNA, complete genome

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Segment: Identifier of the segment. "Unsegmented" for single-segment virus genomes
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Description: Database description of the reference

Reference	Family	Organism	Segment	Length	Coverage	Positions called	Ambiguous positions	Mapped reads	Average read coverage
LC710218.1	Fiersviridae	Emesvirus zinderi	Unsegmented	3605	93.31	3364	241	798	130.5

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Segment: Identifier of the segment. "Unsegmented" for single-segment virus genomes
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Positions called: Number of bases called
Ambiguous positions: Number of ambiguous positions set to "N"
Mapped reads: Number of reads aligned to the reference genome
Average read coverage: Average number of reads per reference genome position

Reference	Family	Organism	Length	Coverage	Description

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Description: Database description of the reference

Reference	Family	Organism	Length	Coverage	Positions called	Ambiguous positions	Mapped reads	Average read coverage

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Positions called: Number of bases called
Ambiguous positions: Number of ambiguous positions set to "N"
Mapped reads: Number of reads aligned to the reference genome
Average read coverage: Average number of reads per reference genome position

Reference	Family	Organism	Length	Coverage	Description
GU830839.1	Arenaviridae	Mammarenavirus lassaense	3358	95.32	Lassa virus strain BA366 glycoprotein precursor (GPC) and nucleoprotein (NP) genes, complete cds
GU979513.1	Arenaviridae	Mammarenavirus lassaense	7207	96.06	Lassa virus strain BA366 Z protein (Z) and polymerase (L) genes, complete cds
LC710218.1	Fiersviridae	Emesvirus zinderi	3605	93.31	Escherichia phage MS2 GI_B RNA, complete genome

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Description: Database description of the reference

Reference	Family	Organism	Length	Coverage	Positions called	Ambiguous positions	Mapped reads	Average read coverage
GU830839.1	Arenaviridae	Mammarenavirus lassaense	3358	95.32	3201	157	997	176.0
GU979513.1	Arenaviridae	Mammarenavirus lassaense	7207	96.06	6923	284	999	83.1
LC710218.1	Fiersviridae	Emesvirus zinderi	3605	93.31	3364	241	798	130.5

Reference-based genome assembly statistics

Reference: ID of the reference sequence
Family: Virus family
Organism: Name of the species
Length: Length of the reference genome
Coverage: Percent of the reference genome that was successfully called
Positions called: Number of bases called
Ambiguous positions: Number of ambiguous positions set to "N"
Mapped reads: Number of reads aligned to the reference genome
Average read coverage: Average number of reads per reference genome position

Contigs

Filter	Contig	Length	Number of reads	Blast Hit	Organism	Hit length	Contig alignment coverage	Reference alignment coverage	Sequence Identity	Classification	Taxonomic Rank
no-target	tig00000001	6994	502	GU979513	Mammarenavirus lassaense	7207	1.0	0.97	1.0	Mammarenavirus lassaense	species
no-target	tig00000002	3420	398	LC710218	Emesvirus zinderi	3605	1.0	0.95	1.0	Escherichia phage MS2	leaf
no-target	tig00000003	3209	509	GU830839	Mammarenavirus lassaense	3358	1.0	0.96	1.0	Mopeia Lassa virus reassortant 29	species

Contigs and targets found

Filter: Filter used on reads before assembly
Contig: Contig identifier
Length: Length of the contig in base pairs
Number of reads: Number of (corrected) reads used by Canu to build this contig
Blast Hit: Virus reference genome found by BLAST search
Organism: Organism of the BLAST hit
Hit length: Length of the BLAST hit reference genome in base pairs
Contig alignment coverage: Share of the contig aligned by BLAST
Reference alignment coverage: Share of the reference aligned by BLAST
Sequence Identity: Sequence identity of the aligned parts
Classification: Classification according to Centrifuge
Taxonomic Rank: Taxonomic rank of the classification

Stage	Sequence type	Reads/Contigs	Mean length
no-target	raw	2868	599.1
no-target	corrected	1534	771.0
no-target	contigs	3	4541.0
reassembly	input	2868	599.1

Read and contig numbers in the different assembly runs

Stage: Assembly run (e.g. for a given filter, no-filter or re-assemblies)
Sequence type: Type of the sequence
Reads/Contigs: Number of reads or contigs
Mean length: Mean length of reads or contigs in base pairs
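
The Mean length reported for the contigs row can be reproduced from the Contigs table. A minimal Python sketch, with the lengths copied from that table. That the mean is a plain arithmetic mean is an assumption, although the numbers agree with it:

```python
# Contig lengths in base pairs, copied from the Contigs table.
contig_lengths = {
    "tig00000001": 6994,
    "tig00000002": 3420,
    "tig00000003": 3209,
}

# Arithmetic mean over all contigs of the no-target assembly run.
mean_length = sum(contig_lengths.values()) / len(contig_lengths)
print(f"{len(contig_lengths)} contigs, mean length {mean_length:.1f}")
```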

Versions

Database	Version	Description
Filters	1.0	Human (GRCh38), mouse (8_GRCm38), mastomys and contaminant filter set
Classification	1.0	Refseq reference genomes plus genbank virus sequences
Virus	2.0	NCBI virus genomes from 26.10.2024 with covid sequences from RVDB version 29

Local database versions used in this run.