#### ANALYSIS OF PRIMER ID SEQUENCING DATA ####

## step 1:

COMMAND: python src/p1_trim_and_filter.py configfile

This checks each read for 5' and 3' primer matches (soft matching via Smith-Waterman alignment) and splits the reads according to their barcodes. The resulting filtered_read.fasta files are deposited in labeled directories.
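
To illustrate what soft primer matching means, here is a minimal sketch of a Smith-Waterman local alignment score in plain Python. It is not taken from p1_trim_and_filter.py: the function names, the scoring values (match, mismatch, gap) and the acceptance threshold min_fraction are all assumptions chosen for illustration.

```python
# Hypothetical scoring scheme; the real script may use different values.
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman_score(a, b):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def has_primer(read, primer, min_fraction=0.8):
    """Soft match: accept if the score reaches a fraction of a perfect match."""
    perfect = MATCH * len(primer)
    return smith_waterman_score(read, primer) >= min_fraction * perfect
```

With these assumed values, a read carrying the primer with a single mismatch is still accepted, while a read with no resemblance to the primer is rejected.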

## step 2:

COMMAND: python src/p2_sort.py run_directory read_type

This splits the reads according to their pIDs into a fairly large number of temporary directories. All pIDs within one directory will be aligned by a single cluster job. The parameter read_type specifies whether this is done on the filtered or the corrected reads. In either case, the script looks for a file named read_type+"_reads.fasta".
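
The sorting idea can be sketched as follows. This is an illustration only: how p2_sort.py extracts the pID from a read and how many pIDs it puts into one temporary directory are not stated here, so the (pid, sequence) input format, the function names and the parameter pids_per_batch are assumptions.

```python
from collections import defaultdict


def reads_file_name(read_type):
    """File name the script looks for, as described above."""
    return read_type + "_reads.fasta"


def group_by_pid(reads):
    """Map each pID to its read sequences; reads is an iterable of (pid, sequence)."""
    groups = defaultdict(list)
    for pid, seq in reads:
        groups[pid].append(seq)
    return dict(groups)


def assign_batches(groups, pids_per_batch=2):
    """Split the sorted pIDs into consecutive batches, one per temporary directory."""
    pids = sorted(groups)
    return [pids[i:i + pids_per_batch] for i in range(0, len(pids), pids_per_batch)]
```

Each batch returned by assign_batches would correspond to one temporary directory, and therefore to one cluster job in step 3.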

## step 3:

COMMAND: python src/p3_cluster_align.py run_directory

This starts one cluster job for each of the temporary directories in each of the barcode directories inside the run_directory.
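
The "one job per temporary directory per barcode directory" layout can be pictured with the sketch below. It only enumerates the directories and does not submit anything, since the cluster submission command is not given here. The function name is hypothetical, and the sketch assumes that every subdirectory of a barcode directory is a temporary directory, which may not hold for the real layout.

```python
import os


def list_alignment_jobs(run_directory):
    """Return one (barcode_dir, temp_dir) pair per job, in sorted order."""
    jobs = []
    for barcode in sorted(os.listdir(run_directory)):
        barcode_path = os.path.join(run_directory, barcode)
        if not os.path.isdir(barcode_path):
            continue  # skip plain files in the run directory
        for temp in sorted(os.listdir(barcode_path)):
            temp_path = os.path.join(barcode_path, temp)
            # Assumption: every subdirectory here is a temporary directory.
            if os.path.isdir(temp_path):
                jobs.append((barcode_path, temp_path))
    return jobs
```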

## step 4:

COMMAND: python src/p4_consensus.py run_directory read_type

This script goes over all barcodes in the run directory, gathers the aligned read files in the temporary directories of the desired read type, and builds consensus sequences. It also writes all aligned reads into a single file.
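
How the consensus is computed is not described here. As one plausible reading, the sketch below takes a per-column majority vote over the aligned reads of one pID and drops columns where the gap character wins. The function name, the gap symbol "-" and the majority rule itself are assumptions, not the script's documented behavior.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over aligned reads of equal length."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    calls = []
    for column in zip(*aligned_reads):
        # Most frequent character in this alignment column.
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            calls.append(base)
    return "".join(calls)
```

A single deviating read is outvoted by the others, which is the point of sequencing several reads per pID.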

## step 5:

COMMAND: python src/p5_decontamination.py bar_code_directory ref_seqs read_type true_seq_id

This script takes the aligned reads from one barcode and checks whether the individual reads or the consensus sequence align reasonably well to the reference sequence identified by true_seq_id. A read that does not is checked against all other reference sequences. All reads that do not align well to their own reference sequence are written to an extra file.
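
The decision logic described above can be sketched like this. To keep the example self-contained, the similarity ratio of Python's difflib.SequenceMatcher stands in for a real alignment score; the actual script presumably uses a proper alignment. The function names, the status labels and the threshold of 0.9 for "reasonably well" are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for an alignment score: 1.0 for identical strings, lower otherwise."""
    return SequenceMatcher(None, a, b).ratio()


def classify_read(read, ref_seqs, true_seq_id, threshold=0.9):
    """Return (status, reference_id) for one read.

    ref_seqs maps reference ids to sequences (hypothetical input format).
    """
    if similarity(read, ref_seqs[true_seq_id]) >= threshold:
        return ("ok", true_seq_id)
    # The read does not match its own reference: check all the others.
    others = {}
    for ref_id, ref in ref_seqs.items():
        if ref_id != true_seq_id:
            others[ref_id] = similarity(read, ref)
    if others:
        best_id = max(others, key=others.get)
        if others[best_id] >= threshold:
            return ("contaminant", best_id)
    return ("unassigned", None)
```

Reads classified as anything other than "ok" would be the ones written to the extra file.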

Alternatively, to submit batch jobs to the cluster, pass the run directory instead of a single barcode directory:

COMMAND: python src/p5_decontamination.py run_directory ref_seqs read_type true_seq_id

## step 6:

COMMAND: python src/p6_detect_mutants_indels.py barcode_dir read_type

This checks whether the pIDs of low-abundance reads are less than a certain edit distance from a high-abundance pID. A low-abundance pID is designated a neighbor if, in addition, the reads align well. The script produces read files with likely_pIDs and original pIDs.
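
The pID comparison can be illustrated with a plain Levenshtein edit distance. This is a sketch under assumptions: the abundance cutoff min_abundant, the distance cutoff max_distance and the function names are hypothetical, and the additional check that the reads themselves align well is left out.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def likely_pids(pid_counts, min_abundant=3, max_distance=2):
    """Map each low-abundance pID to its closest high-abundance pID, if close enough.

    pid_counts maps pID -> number of reads. A pID counts as close when its
    edit distance is strictly less than max_distance (assumed cutoff).
    """
    abundant = [p for p, n in pid_counts.items() if n >= min_abundant]
    mapping = {}
    for pid, n in pid_counts.items():
        if n >= min_abundant:
            continue
        candidates = [(edit_distance(pid, p), p) for p in abundant]
        candidates = [c for c in candidates if c[0] < max_distance]
        if candidates:
            # Smallest distance wins; ties are broken alphabetically.
            mapping[pid] = min(candidates)[1]
    return mapping
```

A rare pID that differs from a frequent one by a single base is mapped to the frequent one, whereas a rare pID with no close abundant neighbor is left alone.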

After this step, the sorting, alignment, and consensus steps (2-4) need to be redone with read_type set to corrected instead of filtered.
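
For reference, the second pass can be written down as the list of commands from steps 2-4 with the read type swapped. The helper below is hypothetical and only builds the command lines; it does not run them, because step 3 launches cluster jobs and step 4 presumably has to wait until those jobs have finished.

```python
def rerun_commands(run_directory, read_type="corrected"):
    """Return the argument lists for steps 2-4, in the order they should run."""
    return [
        ["python", "src/p2_sort.py", run_directory, read_type],
        ["python", "src/p3_cluster_align.py", run_directory],
        ["python", "src/p4_consensus.py", run_directory, read_type],
    ]
```

Joining each list with spaces gives the same COMMAND lines as in steps 2-4 above, with corrected in place of filtered.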