code-notes/software_commands at master · novigit/code-notes · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
OVERVIEW OF SOFTWARE COMMANDS

# ALIGNMENT_PRUNER.PL #
for FILE in cluster*; do alignment_pruner.pl --file $FILE --chi2_prune Half --inverse > $FILE\-high_bias; done
for FILE in cluster_0*; do alignment_pruner.pl --file $FILE --chi2_prune Half > $FILE\-low_bias; done

# BAMTOOLS #

# split a BAM file into multiple BAM files,
# one contig per BAM file
bamtools split -in <.bam> -reference

# BLASTALL #
blastall -p <PROGRAM> -i <INPUT> -d <PATH TO DATABASE> -a <NUMBEROFCORES> -e <EVALUETRHESHOLD> -F <FILTER> -o <OUTPUT>
BLASTDATABASE

formatdb -i predicted_proteome.fasta -p T

# BLAST (NCBI BLAST+) #
makeblastdb \
-in [sequence file] \
-input_type [type: i.e. fasta] \
-dbtype [nucl|prot] \
-title [same as 'in'] \
-out [same as 'in']

To make a custom blast database in which you can easily retrieve blast hit sequences with blastdbcmd, do

makeblastdb \
	    -in [in.fasta] \
	    -dbtype nucl|prot \
	    -parse_seqids
NOTE: MAKE SURE THAT FASTA HEADERS ARE FORMATTED SOMETHING LIKE THIS: >lcl|[accession no.] theawesome sequence
Also make sure that you only have one pipe in your sequence name, the one after >lcl. It will complain otherwise
makeblastdb -hash_index	     # create index of sequence hash values
# Dayana:
# I believe it is the creation of a hash table that contains the mapping of sequence ids to fix identifiers that are used for computing speed and efficiency


If it complains about delta sequences, check for non-amino acid characters in the sequences with
grep -v ">" *.fasta | sed "s/\(.\)/\1\n/g" | sort | uniq -c
or check for headers that do not start at the beginning of the line.

blastn \
-query [path to query] \
-db [database title] \
-out [blast out file] \
-evalue [i.e. 1e-5] \
-perc_identity [integer] \
-max_target_seqs [integer] \
-num_threads [integer] \
-outfmt [integer] (6 = tabular) \
-max_hsps [integer]

note: -max_target_seqs 1 does not necessarily yield the top hit! It just means that it will stop searching after finding 1 hit.


note: blastn has preset modes or 'tasks'. An interesting one is 'blastn-short', which is invoked with the flag -task blastn-short. It has a shorter word-length (7 instead of 11), a smaller reward for a nucleotide match (1 instead of 2), compared to standard blastn. This is a good mode when blasting primers against genomes or loci. Setting the E-value to 10 could also be useful to detect primers with mismatches.

psiblast \
-in_msa [file] \
-db [path/to/db] \
-out [path/to/out] \
-evalue [i.e. 1e-10] \
-num_threads [integer] \
-outfmt '6 std stitle staxids'

Tabular format:
queryid	subjectid %id alnLength mismatchCount gapOpenCount queryStart queryEnd subjectStart subjectEnd eVal bitScore

Getting the sequences of your blast hits:
blastdbcmd \
-db <path/to/db> \
-dbtype <nucl|prot> \
-entry <comma delim string of sequence identifiers OR 'all' to get them all>
-entry_batch <file, one entry per line>
-out <path/to/out> \
-outfmt %f
-target_only

-target_only will force blastdbcmd to only retrieve the primary hits in case of MULTISPECIES entries. Currently bugs because it fails on entries from PIR (Protein Information Resource, a db in NR). When it fails it quits prematurely rather than skipping the entry. Thus, your file will NOT contain all sequences, but only those up until the PIR hit.

Good fields to include on top of std in the blast tabular output:
slen 	    subject length
qlen	    query length
qcovs	    alignment length / query length * 100 (how much of query is aligned to hit? )
	    Takes into account all HSPs
qcovhsp	    Same as qcovs but then for each single hsp (not sure though; testing seems to verify this)

# BOWTIE-2 #
bowtie2-build \
	      -f <FASTA> \
	      <BUILD-PREFIX>

bowtie2 \
	-p <THREADS> \
	--phred64 \
	-t \				Print time
	-x <BUILD-PREFIX> \
	-1 <FW READS> \
	-2 <RV READS> \
	-U <UNPAIRED READS> \
	-S <OUTPUT SAM>

# BWA #
bwa index <FASTA FILE>
bwa aln <FASTA FILE> <FW READ FILE> (FASTQ or FASTA) > <FW OUTPUT.sai>
bwa aln <FASTA FILE> <RV READ FILE> (FASTQ or FASTA) > <RV OUTPUT.sai>
bwa sampe <FASTA FILE> <FW INPUT.sai> <RV INPUT.sai> <FW READ FILE> <RV READ FILE> | gzip > <OUTPUT.sam.gz>

# EDIRECT AND E-UTILITIES FROM NCBI #

# download all protein fastas from a particular genome accession
# sometimes it will fail because ncbi will deny your download request, for example if you access the database too frequently
esearch -query PTLE00000000 -db nuccore | elink -target protein -name nuccore_protein_wgs | efetch -format fasta
# downloading all proteins from a particular nucleotide accession
esearch -query NC_035978.1 -db nuccore | elink -target protein | efetch -format fasta > Calypogenia_arguta.mitoencoded.protein.fasta
# download a full genbank file of a particular nucleotide accession
esearch -query EF494740 -db nuccore | efetch -format gb
esearch -query EF494740 -db nuccore | efetch -format gff3
# downloading a genbank file of a particular protein
esearch -query ABC21121.1 -db protein | efetch -format gb
epost -input mystery_accessions.list -db protein | efetch -format gb
epost -input mystery_accessions.list -db protein | efetch -format fasta


# ESOM-1.1 #

#	Get a reference genome as a control
#	Make sure that all fasta files are in the same directory and have the same extension (i.e. '.fasta')
#	Do `esomWrapper.pl -path . -ext <extension> -scripts /local/one/tools/esomScripts/` -dir <outdir> -min <minContigLength> -max <chunkSize>
#	Check <outdir>/esom.log for suggested <rows> and <columns> and to see if everything went ok
#	Do `esomana &`
#	Do 'File->Load *.lrn, File->Load *.names and File->Load *.cls' and choose the corresponding files in <outdir>
#	Do 'Robust ZT' transformation
#	Do 'Tools->Training' and choose the <outdir>/*.lrn, and overwrite it
#	Change Training Algorithm to "K-batch", rows and columns to suggested in the log file and start value for radius to 50
#	Hit START. This can take a while
#	After its done, hit CLOSE
#	Change color scheme to "TwoMatch" and "Bone". Play around with CLIP to make borders stronger

ADDING CLASSES
If you have some contigs and you want to know where they end up in your esom map, you can use the script
"changeClasses.pl" to make a new class file (.cls) that will color your contigs of interest in the existing ESOM map:
changeClasses.pl -cls <original .cls file> -names <original .names file> -list <list of contig names> -tag <name of the class you want your contigs to have> -out <new .cls file>
then load the new .cls file onto your esom map and your contigs of interest will be labeled!

If you have multiple sets of contigs to paint:
for bin in {0..224};
do
	cls=$(echo $bin-1 | bc);
	echo $bin $cls;
	perl /local/one/tools/esomScripts/changeClasses.pl \
            -cls test.bin${cls}.cls \
            -names min10k_max10k/Tetra_esom_10000.names \
            -list <(grep ">" ../../../concoct-0.4/map/bins/bin_${bin}.fasta | sed -r "s/[- ]/_/g" | sed -r "s/>//") \
	    -tag bin_${bin} \
            -out test.bin${bin}.cls
	rm test.bin${cls}.cls
done
This loop iteratively writes over the previous class file that added one bin until it has written classes for all bins.
In the '-list' option it is crucial that you change the headers so that they match with the ones in the .names file

ESOM output files
.bm -> bestmatches

General format:
% <rows_of_map> <columns_of_map>
% <total_number_of_points>
<pointid>	<row>	<column>

So basically the coordinates of each point
.
lrn -> learn file

General format:
% <total_number_of_points>
% <number_of_columns_in_data_file> (136 4-mers; not 256 (because of reverse complement and something else) )
% <type of column> (9 for unique key; key = number of data point, and should be unique in our case)
<key1> <freq1> <freq2> <freqn>
<key2> <freq1> <freq2> <freqn>

So basically the tetranucleotide frequencies of each point

_robust.lrn -> learn file normalized with RobustZT transformation.

RobustZT = robust Z-transform
Z-transform: take each value, substract the mean, and divide by std dev.
The result will be a value that describes how far that value is from the mean in factors of std deviations.
Robust Z-transform: Estimate mean and std dev without outliers.


General format:
% <total_number_of_points>
% <number_of_columns_in_data_file>
% <type of column>
<key1> <robustZT freq1> <robustZT freq2>
<key2> <robustZT freq1> <robustZT freq2>
..

.wts -> weights

General format:
# comments
# comments
% <rows_of_map> <columns_of_map>
% <number_of_columns_in_data_file>
%

This file contains a number of lines equal to columns times rows.
Thus, each line corresponds to a single point in the map.
I think each point represents a profile of tetramer frequencies.

This file thus represents a grid where each point is a tetramer profile.
You then load your contig(splits) on this grid and figure out which point your contig fits best to.
In other words, the "best match".

# FAMCL #
Discards alignments in which the query and hit lenghts are more than 60% different
Discards alignments in which the smaller fragment is less than 80% covered by the alignment

# FASTTREE-2.1 #
Protein
fastTreeMP -wag [fasta-alignment] > out.nwk

# GENEMARKS #
perl gmhmmp_heuristic.pl -a -outfile <OUTPUT.faa> -s <INPUT.fasta>
perl genemark2fasta.pl <INPUT.faa> > <OUTPUT.fa>

# GRAPHMAP #
graphmap align \
	 --ref {input.assembly} \
	 --reads {input.reads} \
	 --out {output.sam} \
	 --threads {threads}

ONT / PacBio read mapper
Graphmap2 is currently bugged with DNA reads, it is more useful for RNAseq reads right now
Each read is only mapped once
Many reads that are mapped to repeat areas are "stretched", they contain large stretches of deletions
Most telomeric reads seemed to map to the first available contig with that telomore

# HMMER #
HMMFETCH - select a subset of HMM profiles from a file containing many HMM profiles

First index the file to make searching significantly faster
hmmfetch --index Pfam-A.hmm
This will create a Pfam-A.hmm.ssi index file

Then fetch a desired profile
hmmfetch Pfam-A.hmm PF00001.20

Or a bunch of profiles listed in a file
hmmfetch -f Pfam-A.hmm profiles.list > selected_Pfams.hmm


# IQTREE #
iqtree-omp -s <.phylip> -m TESTONLY -mset LG -madd C20,C30,C40,C50,C60 -mredo -nt 30
Tests the LG model and the mixture models C20 to C60 for a given alignment.
It seems that mixture models can only be tested with the -madd option.
-m TESTONLY invokes the model tests, but it will only do model tests, no tree inference
-m TEST     invokes the model tests, but it will also do tree inference with the best model according to BIC

# KHMER #
First make a virtual environment:

cd /local/jmartijn/PROGS/virtualenv-1.6.1
python -m virtualenv env
. env/bin/activate

Then make a hashtable (set highest possible memory usage and k-size in the load-into-counting script):

load-into-counting.py <OUTPUT HT FILE> <PATH TO FW FASTA FILE> <PATH TO RV FASTA FILE>

Then discard reads (set in discard-high-abund.py the desired covarage (the k-coverage at k=32)

discard-high-abund.py <PATH TO HT FILE> <PATH TO FW FASTA FILE> > <OUTPUT NORM FW FASTA>
discard-high-abund.py <PATH TO HT FILE> <PATH TO RV FASTA FILE> > <OUTPUT NORM RV FASTA>

# MAFFT #

mafft \
--thread [integer] \
--adjustdirection \	If nucleotide alignment, automatically detects reverse complemented sequences and corrects
--reorder \	  	Orders the sequences in the fasta or phylip file so that the most similar ones are next to each other
[file.fasta] > [file.aln]

# NEWICK UTILITIES #

nw_display <tree>
nw_ed tree.nwk 'i & b <= 50' o > tree-out.nwk		Collapses all internal (i) nodes with bootstrap support (b)
      	       	      	       	 			lower than 50
nw_labels -I tree.nwk					Prints the labels of the tree. -I suppresses inner labels
nw_labels -L tree.nwk					Prints the labels of the tree. -L suppresses leafs
nw_reroot <tree> <label1> <label2>			Reroot the tree on the branch that is the LCA of <label1> and <label2>
nw_clade <tree> <label1> <label2>			Print the subtree that is the LCA of <label1> and <label2>

# NGMLR #
Reads are mapped multiple times


# PHYLOBAYES #
Example
for i in 1 2 3 4; do
    mpirun -np 7 nice -n 19 pb_mpi -d ../panorth_concat.phy -cat -poisson CAT-Poisson$i &
    ## To stop
    #echo 0 > $RUN$i.run
    ## Restart
    #mpirun -np 10 nice -n 19 pb_mpi $RUN$i &
done

To check convergence, press Enter and do the following:
wc -l *.trace to check the amount of trees it has visited
bpcomp -x [burnin cutoff] [integer(checks every integer's tree to calculate maxdiff and meandiff)] [CHAIN1] [CHAIN2] [CHAIN3] [etc]
Maxdiff should be < 0.1

bpcomp -x [burnin cutoff] [integer(checks every integer's tree to calculate maxdiff and meandiff)] [end point] [CHAIN1] [CHAIN2] [CHAIN3] [etc]

calculates convergence not from burnin to end, but from burnin to specified point.
For example, from 1000 to 11000 (a window of 10000 generations)

To check convergence visually, make use of Lionels script:
Rscript /home/lionel/bin/comparative/plot_pb_stats.R folder=[path] burnin=[integer] output=[out.jpg] [CHAIN1] [CHAIN2] [CHAIN3] [etc]

With bpcomp it is also writing out the consensus tree up to that point.

To soft-stop a chain, replace the [chainname].run file, which contains '1' if it is running, with 0:
echo 0 > [chainname].run

-o [basename]	[basename] will replace "bpcomp" in the output files.
PhyloBayes will then finish the current cycle before exiting. It can be restarted later.

To open phylobayes trees with figtree from command line. Convert the primary tree output first:
sed -e "s/)1:/)1.0:/g" bpcomp.con.tre > pb.tree

# PRODIGAL #
Predicts not only complete ORFs, but also incomplete ORFs at the edges of the contigs (thus without start codon for example at the start of a contig or without a stop codon at the end of a contig)
faMCL has the ability to detect these when clustering and will try to not incorporate the incomplete ORFs.

prodigal \
    -i ~/one/data/assemblies/clean/step3-CrossContamination/L4_hiseq_contigs-step3.fasta \
    -o gene_pd.gff \
    -f gff

# QUAST #
python <PATHTOQUAST>/quast.py -o <OUTPUT DIR> <CONTIG OR SCAFFOLD FILE>
Optional options
-R REFERENCE GENOME
--contig-thresholds <INT,INT,...> CONTIG LENGTH THRESHOLDS
--gene-finding OUTPUTS AMOUNT PREDICTED GENES
--est-ref-size PROVIDE ESTIMATED GENOME SIZE TO COMPUTE NGx VALUES
--scaffolds IF PROVIDED FASTA ARE SCAFFOLDS

# RAXML #
Rapid bootstrapping, ML search and bootstrapping in one command:
-f a -x 12345 -N 100 -s [alignment.phylip] -m GTRCAT -n [outname (suffix) ]
-f a -x 12345 -N 100 -s [alignment.phylip] -m GTRGAMMAI -n [outname (suffix) ]

Example:
raxmlHPC-PTHREADS-SSE3 -f a -x 12345 -p 12345 -N 100 -m PROTCATLG -s panorth_concat.phy -n panorth_concat -T 30

Standard bootstrapping
First do ML search, search for the best tree:
raxmlHPC-PTHREADS-SSE3 \
-f d \
-p 12345 \
-N 100 \
-m GTRGAMMA \
-s ../../trimal-1.4/16S_Pooled_76-trim.phylip.reduced \
-n TEST \
-T 30

Then do the 100 bootstraps
raxmlHPC-PTHREADS-SSE3 \
-f d \
-b 12345 \
-p 12345 \
-N 100 \
-m GTRGAMMA \
-s ../../trimal-1.4/16S_Pooled_76-trim.phylip.reduced \
-n TEST \
-T 30

And map the bootstraps onto the best tree
raxmlHPC-PTHREADS-SSE3 \
-f b \
-t ../MLsearch/RAxML_bestTree.TEST \
-z ../standardBS_noBL/RAxML_bootstrap.16S_Pooled_76-trim \
-p 12345 \
-m GTRGAMMA \
-n TEST \
-T 30

Flags:
-f	This option allows you to select the type of algorithm / function you want RAxML to execute.
a 	a rapid  Bootstrap analysis and search for the best-scoring ML tree in one single program run
-x	Specify an integer number (random seed) and turn on rapid bootstrapping. This will invoke the novel rapid bootstrapping algorithm.
-p	Specify a random number seed for the parsimony inferences. This allows you and others to reproduce your results (reproducible/verifiable experiments) and will help me debug the program. This option HAS NO EFFECT in the parallel MPI version
-N or -#Specifies the number of alternative runs on distinct starting trees. In combination with -f a and -x a rapid bootstrap search and thereafter a thorough ML search on the original alignment is done.
-m      Selects the model of nucleotide substitution or amino acid substitution to be used.

Nucleotide Models: 	GTR, with in addition either GAMMA model for rate heterogeneity (with 4 discrete rate categories) and or I for estimation for proportion of invariable sites. Or in combination with the CAT model. CAT estimates substitution rates per individual site or classifies sites a number of rate categories.

Amino Acid Models:	A substation matrix like BLOSUM62, WAG or DAYHOFF etc in combination with CAT, CAT-GAMMA, CAT-GAMMA-I etc.
-s    	Name of the alignment file. Must be in relaxed PHYLIP format. Relaxed means that sequences file can be interleaved or sequential and taxon name length does not matter.
-n 	Name of the run. Several out files which contain this file name are generated.
-T	Specifies the amount of threads you want to run

-w	Outputdirectory (absolutepath)

Prohibited characters in sequence names:
:,();[]

# make a coverage plot in R
Making a coverage plot with r using the mpileup.out made with samtools
cut -f 2,4 mpileup.out > mpileup.cov

# in R do
data <- read.table (file="mpileup.cov", sep="\t", header=F)
colnames(data) <- c("position", "coverage")
depth <- mean(data[,"coverage"])
window <- 501
rangefrom <- 0
rangeto <- length(data[,"position"])
data.smoothed<-runmed(data[,"coverage"],k=window)
png(file="coverage.png",width=1900,height=1000)
plot(x=data[rangefrom:rangeto,"position"],y=data.smoothed[rangefrom:rangeto],pch=".", cex=1,xlab="bp position",ylab="depth",type="l")
dev.off()

# SEAVIEW #
Select taxa of interest
Ctrl-C Ctrl-V then makes a new alignment of the selected taxa

Select taxa of interest
Hit the middle mouse button at position of interest to move 'em to that position

Also possible to call a consensus with seaview, at different thresholds

# SEQKIT

# reports per fasta entry the name, the sequence, the length and %GC
seqkit fx2tab -lg <fasta>
# reports per fasta entry the name, length and %GC, but not the sequence
seqkit fx2tab -nlg <fasta>
# color FASTA files
seqkit seq --color <fasta> | less -R

# extract a single entry
seqkit grep -p <regexp> <fasta>

# look for a subsequence
seqkit locate -p AFLEADRTGQA <fasta>
seqkit locate -rp AFLEADRTGQA.*VAPARS <fasta>  # use regexp

# report sequence for 'ergo_tig00000012' but only for region 16000-17000
seqkit subseq --chr ergo_tig00000012 --region 16000:17000 consensus.fasta

# print the sequence of a fasta file without newlines (i.e. no wrapping)
seqkit seq --seq <fasta>

# translate a sequence over all six frames
seqkit translate -f 6 <fasta>
cat <fasta> | seqkit translate -f 6

# SINA #
-i [in.fasta] \
-o [out.aln] \
--ptdb [taxonomicgroup.arb] (reference alignment) \
--outtype fasta

# SPADES #
spades.py -o <OUTPUT DIR> -1 <FW READS> -2 <RV READS> -t <THREADS> -K <K1,K2,K3,etc>
Optional options
--sc FOR MDA (SINGLE-CELL) DATA
-s <UNPAIRED READS>
--only-error-correction RUNS THE READ CORRECTION STEP ONLY
--only-assembler RUNS ASSEMBLY STEP ONLY
--careful INDUCES MISMATCH CORRECTOR, THAT RUNS POST-ASSEMBLY

# TABLET #
If you want to load a <file>.sort.bam alignment file, make sure that the index .bai file is named such as <file>.sort.bam.bai
Otherwise it complains

# TRIMAL #
trimal \
-in [input alignment] \
-out [output alignment] \
-gt [any value between 0 and 1] \
-[output format]

# TRIMMOMATIC #
Paired End Mode:
java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
Single End Mode:
java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticSE <input> <output> <step 1> ...

# UCLUST #
uclust \
--sort [in.fasta]
--output [out-sort.fasta]

uclust \
--input [in-sort.fasta]
--uc [out-sort.uc]
--id [0.97 or whatever threshold you want]

uclust \
--uc2fasta [in-sort.uc]
--input [in-sort.fasta]
--output [out-pruned.fasta]
--types S

In the cluster file .uc, there each sequence is classified as either...
S  The seed sequence (the first sequence in a cluster)
H  The hit sequence
C  The name of the cluster

The second column is the cluster number, starting at 0.
The third column indicates the length of the sequence in case of S or H
The third column indicates the size of the cluster in case of C

# VELVET #
shuffleSequences_fastq.pl <INPUT1.fastq> <INPUT2.fastq>
velveth <DIRECTORYNAME> <LOWESTKMER> <HIGHESTKMER> <STEPSIZE> -fastq (or -fasta) -shortPaired1 (or -short) <SHUFFLEDINPUT1.fastq> -fastq -shortPaired2 <SHUFFLEDINPUT2.fastq>
velvetg auto_<KMERVALUE> -exp_cov <EXPECTEDCOVVALUE> -ins_length1 <BPLENGTH1> -ins_length2 <BPLENGTH2>

# VELVET-SC #
velveth <DIRECTORYNAME> <K-SIZE> -short -fasta <SHUFFLEDINPUT 1> <SHUFFLEDINPUT 2>
velvetg <DIRECTORYNAME> -ins_length <INSERT SIZE> -cov_cutoff <FINAL COVERAGE CUTOFF> -read_trkg yes
If you do not turn on read tracking it will not report the amount of reads used
In the run2.c file you can change the starting coverage cutoff, incrementing value and the maximum contig size (maximum contig size: any contig smaller than this and with coverage lower than coverage cutoff will be removed, longer contigs wont be removed (i think))