Hi, I have been constructing an Alphaproteobacteria tree recently and using the alignment_pruner.pl to trim the 20% most heterogeneity sites.
According to the script,
Use the chi2 statistic to choose columns to prune. This will first order
all of the columns by the chi2 statistic by comparing the chi2
statistic for the alignment with and without each column. Then the option
specifies how to remove columns:
half Remove half of the sites (starting with the most biased).
f# Remove sites until # fraction of sites remains, half can be
specified as f0.5
n# Remove sites until only # number of sequences show significant bias.
min Remove sites until a minimum of sequences show significant bias.
plot Will print statistics to the screen suitable for plotting. It will
contain 4 columns: idx, number of biased sequences, chi2 delta for
this column and the names of the biased sequences.
And therefore, I use:
alignment_pruner_broCode.pl --file input.faa --chi2_prune f0.8 > pruned20_Alpa117.fasta
However, as compared to my original input file, instead of remain 80% of the sites, it only left with 20%.
Original file: 18823
f0.8: 3765
I am wondering, am I using the wrong command for trimming the most compositional heterogeneity sites, or the description of the text in f# is not correct. To trim 20% of the sites, it should be f0.2.
Hi, I have been constructing an Alphaproteobacteria tree recently and using the alignment_pruner.pl to trim the 20% most heterogeneity sites.
According to the script,
And therefore, I use:
alignment_pruner_broCode.pl --file input.faa --chi2_prune f0.8 > pruned20_Alpa117.fastaHowever, as compared to my original input file, instead of remain 80% of the sites, it only left with 20%.
Original file: 18823
f0.8: 3765
I am wondering, am I using the wrong command for trimming the most compositional heterogeneity sites, or the description of the text in f# is not correct. To trim 20% of the sites, it should be f0.2.