----------------
Processing Steps
----------------


Data Generation
---------------

The data was taken from directories 1 to 3 under

/media/oka_raid/backup/data/solexa_analysis/ATH/Transcriptome/Col-0/run_40/

We combined the map.vm and map_2nd.vm files from all three directories using
'createFullMap.sh'.

The combined data was saved in /fml/ag-raetsch/home/fabio/tmp/transcriptome_data.
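
The combination step can be pictured as a plain concatenation of each map file
across the three run directories. The following is only a sketch under that
assumption: the actual contents of 'createFullMap.sh' are not shown here, and
the function name `combine_maps` and its parameters are hypothetical.

```python
from pathlib import Path
import shutil


def combine_maps(run_dir, out_dir, subdirs=("1", "2", "3"),
                 names=("map.vm", "map_2nd.vm")):
    """Concatenate each named map file across the run subdirectories.

    Assumption: the map files are line-oriented and each one ends with a
    newline, so plain byte-wise concatenation keeps the entries intact.
    """
    run_dir, out_dir = Path(run_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    combined = {}
    for name in names:
        target = out_dir / name
        with open(target, "wb") as out:
            for sub in subdirs:
                # Append the file from this subdirectory to the combined file.
                with open(run_dir / sub / name, "rb") as src:
                    shutil.copyfileobj(src, out)
        combined[name] = target
    return combined
```

Called with the run_40 directory as `run_dir` and the transcriptome_data
directory as `out_dir`, this would produce one combined map.vm and one combined
map_2nd.vm.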


QPalma Heuristic
----------------

Next we split map.vm into parts so that the QPalma heuristic can be run in
parallel. The heuristic estimates which entries in map.vm should be aligned by
QPalma.

We combine the results using 'combine_spliced_map_parts.sh'.
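
The split-and-combine pattern described above might look roughly like the
following. This is a sketch only: the part-file naming scheme
(`<name>.part_<i>`) and the helpers `split_map` and `combine_parts` are
hypothetical, and 'combine_spliced_map_parts.sh' may work differently.

```python
from pathlib import Path


def split_map(map_path, num_parts, out_dir):
    """Split a line-oriented map file into num_parts files of near-equal size.

    Note: this reads the whole file into memory, which is fine for a sketch
    but may not suit very large map files.
    """
    map_path, out_dir = Path(map_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(map_path) as fh:
        lines = fh.readlines()
    chunk = -(-len(lines) // num_parts)  # ceiling division
    parts = []
    for i in range(num_parts):
        # Hypothetical naming scheme for the part files.
        part = out_dir / f"{map_path.name}.part_{i}"
        part.write_text("".join(lines[i * chunk:(i + 1) * chunk]))
        parts.append(part)
    return parts


def combine_parts(part_paths, out_path):
    """Concatenate per-part result files, in order, into one output file."""
    out_path = Path(out_path)
    with open(out_path, "w") as out:
        for part in part_paths:
            out.write(Path(part).read_text())
    return out_path
```

Each part file can then be handed to a separate heuristic job, and the per-part
results are concatenated in the original order.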


QPalma Dataset Generation
-------------------------

Once we have the map_2nd.vm and map.vm.spliced files, we create a QPalma
dataset using 'createNewDataset.py'.


Coverage Number Estimation
--------------------------

For this purpose we need the raw FASTA files of the original reads. The
directory 'raw_reads' should contain a file 'reads_0.fa' holding all
original reads in FASTA format.
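
Before starting the estimation, it can help to confirm that the expected read
file is in place. The sketch below checks for 'raw_reads/reads_0.fa' and counts
its records. The helper names are hypothetical, and the code assumes standard
FASTA, where every record begins with a '>' header line.

```python
from pathlib import Path


def count_fasta_reads(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    count = 0
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                count += 1
    return count


def check_raw_reads(base_dir="."):
    """Return the number of reads in <base_dir>/raw_reads/reads_0.fa.

    Raises FileNotFoundError if the file is missing.
    """
    reads = Path(base_dir) / "raw_reads" / "reads_0.fa"
    if not reads.is_file():
        raise FileNotFoundError(f"expected FASTA file not found: {reads}")
    return count_fasta_reads(reads)
```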


New Vmatch Criteria
-------------------

Old settings (1 mismatch):

/media/oka_raid/backup/data/solexa_analysis/ATH/Transcriptome/Col-0/run_44/4/length_38/spliced

New settings (2 mismatches, stepwise trimming by 3 from 35 to 20):

/media/oka_raid/backup/data/solexa_analysis/ATH/Transcriptome/Col-0/run_44/4/length_38/spliced_3
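
Reading the trimming note as "shorten the read by 3 at each step, starting at
length 35 and stopping at 20" (an interpretation, not something the settings
above spell out), the lengths tried would be as computed below. The function
name `trimmed_lengths` is hypothetical.

```python
def trimmed_lengths(start=35, stop=20, step=3):
    """List the read lengths from start down to stop (inclusive) in steps of step."""
    return list(range(start, stop - 1, -step))


print(trimmed_lengths())  # [35, 32, 29, 26, 23, 20]
```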