# Web application
Alfred's quality control JSON files can be interactively browsed with the companion web application. All charts support panning and zooming and can be downloaded as PNG images. The summary QC table can be downloaded as a CSV file.
To generate a quality control file in JSON format, run Alfred's command-line tool as follows:

```bash
alfred qc -r <ref.fa> -f json -o qc.json.gz <align.bam>
```
The output file `qc.json.gz` can then be uploaded to the companion web application.
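Because the output is gzip-compressed JSON, it can also be inspected locally. The following is a minimal Python sketch that uses only the standard library. The helper names `parse_qc_bytes` and `load_qc_json` are hypothetical, and the code makes no assumption about the JSON schema Alfred writes.

```python
import gzip
import json


# Hypothetical helpers for illustration; not part of Alfred.
def parse_qc_bytes(data):
    """Decompress gzip-compressed bytes and parse the JSON payload."""
    # json.loads accepts UTF-8 encoded bytes directly (Python 3.6+)
    return json.loads(gzip.decompress(data))


def load_qc_json(path):
    """Read and parse a gzip-compressed JSON file such as qc.json.gz."""
    with open(path, "rb") as handle:
        return parse_qc_bytes(handle.read())
```

For example, `qc = load_qc_json("qc.json.gz")` followed by `print(type(qc))` (or `sorted(qc)` if the top level is a dictionary) shows what the file contains without relying on any particular field layout.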
The table below gives an overview of all available charts and the most important alignment statistics provided by Alfred.
| Alignment Metric | DNA-Seq (WGS) | DNA-Seq (Capture) | RNA-Seq | ChIP-Seq/ATAC-Seq | Chart Type |
|---|---|---|---|---|---|
| Sequencing Error Rates | ✔ | ✔ | ✔ | ✔ | Table |
| Base Content Distribution | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart |
| Read Length Distribution | ✔ | ✔ | ✔ | ✔ | Line Chart |
| Base Quality Distribution | ✔ | ✔ | ✔ | ✔ | Line Chart |
| Coverage Histogram | ✔ | ✔ | ✔ | ✔ | Line Chart |
| Insert Size Distribution | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart |
| InDel Size Distribution | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart |
| InDel Context | ✔ | ✔ | ✔ | ✔ | Bar Chart |
| GC Content | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart |
| On-Target Rate | | ✔ | | | Line Chart |
| Target Coverage Distribution | | ✔ | | | Line Chart |
| DNA pitch / Nucleosome pattern | | | | ✔ | Grouped Line Chart |
# Base content distribution
The base content distribution reveals base-calling biases along the read. For an ideal library, the lines for A, C, G, and T should run in parallel, and for a whole-genome assay the relative amounts of each base should reflect the GC content of that genome. Some libraries are expected to show a biased base distribution at the start of the read, for example many RNA-Seq libraries (because of random hexamer priming) and restriction-based assays.
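A per-position base composition like the one plotted can be sketched as follows. This is an illustration, not Alfred's code: `base_content` is a hypothetical helper, and it assumes reads are plain strings in sequencing orientation (reads aligned to the reverse strand are stored reverse-complemented in BAM files and would need to be flipped back first).

```python
from collections import Counter


# Hypothetical helper for illustration; assumes reads in sequencing orientation.
def base_content(reads):
    """Per-position fraction of A, C, G, T and N across a set of reads."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    profile = []
    for pos in range(max_len):
        # Only reads that are long enough contribute to this position
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in "ACGTN"})
    return profile


for pos, fractions in enumerate(base_content(["ACGT", "AGGT", "ACCA"])):
    print(pos, fractions)
```

Plotting each base's fraction against the position gives one line per base; in an unbiased library those lines stay roughly flat and parallel.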
# Read length distribution
Illumina sequencers produce reads of a fixed length, whereas long-read technologies usually have a median read length >1,000 bp and a long tail of reads longer than 30,000 bp. This plot is also useful for assessing adapter trimming or the removal of low-quality bases at the start or end of a read.
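A minimal sketch of summarizing a read-length distribution, assuming the read lengths are already available as integers. The function name and the `bin_size` parameter are hypothetical; binning is only a convenience so that long-read data spanning tens of kilobases gives a readable histogram.

```python
from collections import Counter
from statistics import median


# Hypothetical helper for illustration; bin_size is an assumed parameter.
def read_length_summary(lengths, bin_size=1):
    """Binned read-length histogram plus median and maximum length."""
    # Each length is assigned to the lower edge of its bin
    histogram = Counter((length // bin_size) * bin_size for length in lengths)
    return {
        "histogram": dict(sorted(histogram.items())),
        "median": median(lengths),
        "max": max(lengths),
    }


# Hypothetical long-read lengths in bp
summary = read_length_summary([850, 1200, 1500, 4200, 31000], bin_size=1000)
print(summary["median"], summary["max"])  # 1500 31000
print(summary["histogram"])  # {0: 1, 1000: 2, 4000: 1, 31000: 1}
```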
# Mean base quality distribution
This plot shows the mean base quality along the read. A typical Illumina profile shows base qualities >Q30 up to base 30, followed by a gradual decline in base quality towards the end of the read.
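The mean quality per position can be computed from ASCII-encoded quality strings. This sketch assumes Phred+33 encoding (as used in FASTQ and SAM files) and that missing qualities (`*` in SAM) have already been filtered out; `mean_quality_by_position` is a hypothetical name.

```python
# Hypothetical helper for illustration; assumes Phred+33 quality strings.
def mean_quality_by_position(qualities, offset=33):
    """Mean Phred base quality at each read position."""
    sums, counts = [], []
    for qual in qualities:
        for pos, char in enumerate(qual):
            if pos == len(sums):
                # First read reaching this position: open a new accumulator
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


print(mean_quality_by_position(["II5", "I?"]))  # [40.0, 35.0, 20.0]
```

Reads of different lengths are handled by averaging each position only over the reads that reach it.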
# Mapping quality distribution
This plot shows the mapping quality distribution for all mapped reads. The reported quality scores are aligner-dependent.
# Coverage histogram
The coverage histogram shows how many bases of the sequenced genome have a given coverage. Note that for targeted (capture) assays this plot is expected to show a large portion of the genome at zero coverage. For targeted assays, we therefore recommend checking the on-target rate and the targets-above-coverage-level plots.
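A coverage histogram can be sketched from per-base depths, assuming these are already available as a sequence of integers (for example from a pileup). The function name and the optional `max_cov` cap, which collapses the long high-coverage tail into a single bin, are hypothetical.

```python
from collections import Counter


# Hypothetical helper for illustration; max_cov is an assumed parameter.
def coverage_histogram(depths, max_cov=None):
    """Count how many genome positions have each coverage value."""
    if max_cov is not None:
        # Collapse coverages above max_cov into the last bin
        depths = (min(d, max_cov) for d in depths)
    return dict(sorted(Counter(depths).items()))


# Hypothetical per-base depths of a tiny region
print(coverage_histogram([0, 0, 3, 5, 5, 120], max_cov=100))
# {0: 2, 3: 1, 5: 2, 100: 1}
```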
# On-target rate and targets above a given coverage level
For targeted assays, the two major concerns are capture efficiency (on-target rate) and how many of the targets are ready for downstream analysis (targets above a predefined coverage threshold). A standard whole-exome sequencing assay yields at least 70% of reads on-target (with targets extended by ±200 bp) and at least 70% of targets with >20x coverage.
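Both metrics can be sketched in a few lines. This assumes reads and targets are `(chromosome, start, end)` tuples in 0-based half-open coordinates and that the mean coverage of each target has already been computed. Counting a read as on-target when it overlaps any extended target, and treating the threshold as "at or above", are assumptions of this sketch; the function names are hypothetical.

```python
# Hypothetical helpers for illustration; coordinates are 0-based, half-open.
def on_target_rate(reads, targets, extension=200):
    """Fraction of reads overlapping any target extended by `extension` bp."""
    padded = [(c, max(0, s - extension), e + extension) for c, s, e in targets]
    on_target = 0
    for chrom, start, end in reads:
        # Linear scan for clarity; large inputs would need an interval index
        if any(c == chrom and start < te and ts < end for c, ts, te in padded):
            on_target += 1
    return on_target / len(reads) if reads else 0.0


def fraction_targets_above(target_mean_coverage, threshold=20):
    """Fraction of targets whose mean coverage is at least `threshold`."""
    values = list(target_mean_coverage)
    return sum(v >= threshold for v in values) / len(values) if values else 0.0


targets = [("chr1", 1000, 1100)]
reads = [("chr1", 1250, 1400), ("chr1", 5000, 5100),
         ("chr2", 1000, 1100), ("chr1", 900, 1000)]
print(on_target_rate(reads, targets))  # 0.5
print(fraction_targets_above([35.2, 18.0, 52.9, 20.0]))  # 0.75
```

With the ±200 bp extension, the read at 1250 to 1400 counts as on-target even though it does not touch the unextended target; with `extension=0` none of these reads would.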
# Insert size histogram
The insert size plot shows the outer insert size distribution for all read pairs, stratified by read-pair orientation. Several nomenclatures exist for the different paired-end layouts: the default Illumina paired-end layout is R+ (forward-reverse, FR), and the default Illumina mate-pair layout is R- (reverse-forward, RF). For specific sequencing assays, the insert size distribution can serve as a key quality control metric. For instance, ATAC-Seq libraries should show the characteristic nucleosome pattern and DNA pitch.
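The sketch below illustrates how pairs could be stratified by orientation and measured by outer insert size. It uses a simplified FR/RF/FF/RR labelling read from left to right and does not reproduce Alfred's own orientation classes. Each mate is assumed to be a `(start, end, is_reverse)` tuple on the same chromosome; the function names are hypothetical.

```python
from collections import Counter, defaultdict


# Hypothetical helpers for illustration; simplified orientation labels.
def classify_pair(mate1, mate2):
    """Orientation and outer insert size of a read pair on one chromosome.

    Each mate is (start, end, is_reverse) in 0-based half-open coordinates.
    FR corresponds to R+ (paired-end), RF to R- (mate-pair); FF/RR are
    same-strand pairs.
    """
    left, right = sorted([mate1, mate2], key=lambda mate: (mate[0], mate[1]))
    orientation = ("R" if left[2] else "F") + ("R" if right[2] else "F")
    # Outer insert size: leftmost start to rightmost end of the two mates
    outer_size = max(left[1], right[1]) - left[0]
    return orientation, outer_size


def insert_size_histograms(pairs):
    """Insert-size histogram per orientation for an iterable of mate pairs."""
    histograms = defaultdict(Counter)
    for mate1, mate2 in pairs:
        orientation, size = classify_pair(mate1, mate2)
        histograms[orientation][size] += 1
    return histograms


print(classify_pair((100, 250, False), (400, 550, True)))  # ('FR', 450)
```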
# InDel size distribution
Histogram of InDel sizes collected from all mapped reads. This plot aggregates the lengths of all CIGAR I (insertion) and D (deletion) operations.
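Collecting InDel sizes amounts to parsing each CIGAR string and keeping the lengths of its I and D operations. A minimal sketch; the regular expression covers the operation codes defined in the SAM specification, and `indel_sizes` is a hypothetical helper name.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


# Hypothetical helper for illustration.
def indel_sizes(cigars):
    """Histogram of insertion (I) and deletion (D) lengths from CIGAR strings."""
    hist = {"I": Counter(), "D": Counter()}
    for cigar in cigars:
        # "*" (unavailable CIGAR) yields no matches and is skipped implicitly
        for length, op in CIGAR_OP.findall(cigar):
            if op in hist:
                hist[op][int(length)] += 1
    return hist


h = indel_sizes(["50M2I48M", "30M1D70M", "10M2I5M3D85M"])
print(dict(h["I"]), dict(h["D"]))  # {2: 2} {1: 1, 3: 1}
```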
# InDel Homopolymer Context
For all InDels (CIGAR I and D operations), the homopolymer plot shows whether the preceding 3 bases are all A, all C, all G, or all T. If at least 2 different nucleotides are observed, the reported homopolymer context is "None". For Illumina reads, almost 50% of all reported InDels occur in a homopolymer context, with higher counts for A and T than for G and C.
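The classification rule can be sketched as follows. The source does not say whether read or reference bases are inspected; this sketch assumes the 3 reference bases immediately preceding the InDel's 0-based reference position, and it also reports "None" when fewer than 3 bases precede the InDel or when the bases are N. `homopolymer_context` is a hypothetical name.

```python
# Hypothetical helper for illustration; assumes reference bases are inspected.
def homopolymer_context(reference, pos, k=3):
    """Homopolymer context of an InDel at 0-based reference position `pos`.

    Returns "A", "C", "G" or "T" if the k preceding bases are all that base,
    otherwise the string "None" (matching the label used in the plot).
    """
    if pos < k:
        return "None"
    preceding = reference[pos - k:pos].upper()
    if len(set(preceding)) == 1 and preceding[0] in "ACGT":
        return preceding[0]
    return "None"


print(homopolymer_context("GGAAAT", 5))  # A
print(homopolymer_context("GGAAAT", 4))  # None
```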
# GC content
To estimate a GC bias curve even for low-coverage single-cell data, Alfred computes the local GC content of each mapped read and then compares the estimated sample GC content to the expected, genome-wide GC content. For targeted assays, Alfred additionally computes the GC content of all target regions.
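A GC bias curve of this kind can be sketched by comparing two normalized GC histograms. This assumes that the observed distribution comes from the sequences of mapped reads and that the expected distribution comes from reference windows of the same length; the exact window Alfred uses for "local" GC content is not specified here. All function names are hypothetical.

```python
from collections import Counter


# Hypothetical helpers for illustration.
def gc_fraction(seq):
    """GC fraction over A/C/G/T bases; N and other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(seqs, bins=100):
    """Normalized histogram of per-sequence GC fractions."""
    counts = Counter()
    for seq in seqs:
        gc = gc_fraction(seq)
        if gc is not None:
            # GC = 1.0 goes into the last bin instead of an extra one
            counts[min(int(gc * bins), bins - 1)] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())} if total else {}


def gc_bias(observed, expected):
    """Observed/expected ratio per GC bin present in `expected`."""
    return {b: observed.get(b, 0.0) / e for b, e in expected.items() if e > 0}
```

For instance, `gc_bias(gc_histogram(read_sequences), gc_histogram(reference_windows))` would yield ratios near 1 for an unbiased library, where `read_sequences` and `reference_windows` are placeholder inputs.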
# GC-Content and Mapping Statistics by Chromosome
This table lists the size, the number of Ns, the GC-content, and the number of mapped reads for each chromosome as well as the observed-to-expected ratio of mapped reads.
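The observed-to-expected ratio can be illustrated under one plausible model: the expected number of reads on a chromosome is proportional to its non-N length. That model is an assumption of this sketch, not a description of Alfred's computation, and `obs_exp_ratios` is a hypothetical name.

```python
# Hypothetical helper for illustration; assumes expected reads ~ non-N length.
def obs_exp_ratios(chrom_stats):
    """Observed/expected mapped-read ratio per chromosome.

    chrom_stats maps name -> (size, n_count, mapped_reads).
    """
    total_reads = sum(m for _, _, m in chrom_stats.values())
    total_len = sum(size - n for size, n, _ in chrom_stats.values())
    if total_len == 0:
        return {}
    ratios = {}
    for name, (size, n, mapped) in chrom_stats.items():
        expected = total_reads * (size - n) / total_len
        ratios[name] = mapped / expected if expected > 0 else float("nan")
    return ratios


print(obs_exp_ratios({"chr1": (1000, 0, 500), "chr2": (1000, 500, 100)}))
# {'chr1': 1.25, 'chr2': 0.5}
```

A ratio well below 1 on a single chromosome can hint at a loss of that chromosome or at mappability problems, while values close to 1 are expected for a balanced genome.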
# Summary statistics
The summary tab aggregates quality control data in a simple table that can be downloaded in CSV format.
This table is ideal to compare QC metrics across samples and/or sequencing assays. Among many other statistics,
the table lists, for instance, the number of duplicate reads, the number of unmapped reads, the number of
secondary and supplementary alignments, base-pair exact error rates stratified by mismatch,
insertion and deletion errors, and the median coverage and insert size of the sequenced sample.
The table provides more detailed statistics for specialized assays. For 10X Genomics data,
for example, it lists the number of MI-tagged reads, the total number of UMIs,
the fraction of haplotype-tagged reads, and the N50 phased block length.
For ATAC-Seq data, users can provide a BED file of promoter regions, in which case the reported
enrichment over these regions corresponds to TSS enrichment, whereas for WES data the enrichment
quantifies the capture efficiency if the BED file contains all target regions.
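The N50 phased block length follows the usual N50 definition: sort the blocks from longest to shortest and report the length at which the running sum first reaches half of the total. A minimal sketch, assuming a list of phased block lengths in bp is already available (extracting the blocks is not shown); `n50` is a hypothetical helper name.

```python
# Hypothetical helper for illustration; input is a list of block lengths.
def n50(lengths):
    """Length of the shortest block among the longest blocks that together
    cover at least half of the total length (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 10]))  # 6
```

Here the total is 30, and the two longest blocks (10 + 6 = 16) are the first to reach half of it, so the N50 is 6.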
# Example Data Sets
The web application hosts example data sets for a number of sequencing assays and sequencing technologies.
| Sequencing Assay | Sequencing Technology |
|---|---|
| DNA-Seq (WGS) | Illumina, PacBio, ONT |