Is there a Windows version?
Some of the libraries protal uses are incompatible with Windows, so we will not release a Windows version any time soon.
Is there a desktop version with a user interface?
protal can only be operated via the command line. Check out our tutorial on how to install protal and use it on a simple dataset.
Can I use a custom database in protal?
In the current version, protal does not allow for custom databases. However, GTDB r214 spans more than 80k bacterial species and is the most complete curated database available.
Can I use protal on long reads?
The current version of protal only supports short reads.
What environments can I analyse with protal?
protal works well for all environments. protal uses the GTDB database, which contains microbial reference sequences from a variety of environments, including soil, seawater, and host-associated environments such as the human gut.
Can I track strains across samples?
protal is able to track strains across samples by using the strain trees and pairwise sample distances for each species. Follow our tutorial on how to do so.
What if my sample contains conspecific strains?
protal detects conspecific strains and discards them before building the tree. However, by computing the minimum and maximum distance to other samples, the user can get an idea of whether a specific sample shares a strain with other samples.
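As a rough illustration of the minimum/maximum check described above, here is a minimal sketch. It assumes the pairwise distances for one species have already been loaded into a dictionary keyed by sample pairs; that layout, the sample names, and the function name `distance_range` are hypothetical and do not describe protal's actual output format.

```python
def distance_range(distances, sample):
    """Return (min, max) distance from `sample` to all other samples.

    `distances` maps (sample_a, sample_b) tuples to a distance value.
    This input layout is an assumption made for illustration only.
    """
    others = [
        d for (a, b), d in distances.items()
        if a != b and sample in (a, b)
    ]
    if not others:
        return None  # the sample has no recorded distances
    return min(others), max(others)


# Hypothetical distances for a single species across three samples.
example = {
    ("S1", "S2"): 0.001,
    ("S1", "S3"): 0.020,
    ("S2", "S3"): 0.021,
}
low, high = distance_range(example, "S1")
```

A very small minimum next to a much larger maximum hints that the sample shares a strain with at least one other sample; what counts as "very small" depends on the species and is left to the user here.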
Is protal using alignments or k-mers?
protal follows an alignment-based approach using a proprietary aligner specifically developed to excel in the context of metagenomics. K-mers are used in the seeding phase to select candidate references for alignment. The alignment is then performed using the WFA2 library (include link).
Which aligner does protal use?
protal uses a proprietary alignment tool that has a custom approach for seed finding and candidate selection for alignment. For the pairwise alignment between reads and candidate references protal uses the WFA2 library. This approach allows protal to be fast and accurate as we were able to tailor the alignment part to the needs of metagenomics. This is currently not possible with other contemporary aligners such as bowtie or bwa.
What reference does protal use?
protal uses GTDB's universal marker genes as reference. protal is thus one of the few tools that is able to fully detect the diversity present in GTDB.
I trust bwa-mem and bowtie. Why is protal not using a contemporary aligner?
Contemporary profilers are slow and struggle with closely related references in the database (sometimes close to 100% identity). We carefully benchmarked protal's alignment algorithm against other contemporary aligners and protal displayed the best alignment accuracy when using MAPQ as a filter criterion. More results are available in the supplemental material of the preprint for protal (link to preprint).
What is special about protal's alignment approach?
protal uses a custom data structure (a hash map) that is designed to select the best candidates among closely related sequences. The core of each seed is a fixed 15-mer that must match exactly; protal also captures the 8 flanking bases on either side. Using these so-called flex-mers, protal is able to select the closest matching 31-mer in the reference that shares the same exact-matching 15-mer in the middle. This approach allows protal to remain sensitive without passing complexity on to the subsequent candidate selection and alignment steps, which are usually the most computationally expensive parts.
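The flex-mer idea can be sketched in a few lines of Python. This is only an illustration of the lookup described above, not protal's implementation: the use of Hamming distance to rank flanks, the plain dictionary as the hash map, and all function names are assumptions. Only the 15-base core and the 8-base flanks come from the text.

```python
from collections import defaultdict

CORE = 15                 # exact-matching core length (from the text)
FLANK = 8                 # flanking bases on either side (from the text)
SPAN = CORE + 2 * FLANK   # 31 bases in total


def split_flexmer(window):
    """Split a 31-mer into its 15-mer core and its 16 flanking bases."""
    assert len(window) == SPAN
    core = window[FLANK:FLANK + CORE]
    flanks = window[:FLANK] + window[FLANK + CORE:]
    return core, flanks


def build_index(references):
    """Map each core to the (flanks, ref_id, position) entries where it occurs."""
    index = defaultdict(list)
    for ref_id, seq in references.items():
        for pos in range(len(seq) - SPAN + 1):
            core, flanks = split_flexmer(seq[pos:pos + SPAN])
            index[core].append((flanks, ref_id, pos))
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_candidates(index, window):
    """Return (mismatches, [(ref_id, pos), ...]) for the closest flanks.

    Returns None when the core has no exact match in the index.
    Ranking by Hamming distance is an assumption of this sketch.
    """
    core, flanks = split_flexmer(window)
    hits = index.get(core)
    if not hits:
        return None
    scored = [(hamming(flanks, f), ref_id, pos) for f, ref_id, pos in hits]
    best = min(score for score, _, _ in scored)
    return best, [(r, p) for score, r, p in scored if score == best]
```

Because the core must match exactly, the lookup is a single hash-map access; the flanks then narrow several near-identical references down to the closest ones before any alignment is attempted.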
A lot of profilers suffer from false positive predictions. How does protal avoid detecting taxa that are not present?
For each species, protal collects a set of parameters across all marker genes that are hit by at least one read. These parameters contain sufficient information to discriminate between false positive and true positive taxa. To avoid overly simplifying the complex relationship between these parameters by picking fixed (arbitrary) thresholds for each, we trained a random forest on these data to achieve consistently high rates of TPs and low rates of FPs.