The database was downloaded on 22 May 2015 and saved as elm_original_20150522.csv in the folder 'database'. The version history is available at http://elm.eu.org/infos/news.html. This file contains 210 motifs.
This generated a list of 208 motifs. The file we will be working with is "database/elm_input.txt".
Software for processing the ELM database
Analysis description:
We will take the following approach to the analysis:
Trim start and end positions with more than 10 characters.
We identified motifs that:
1. Have the same biological role
2. Are described as "minor variants"
The file 'database/marked_motifs.txt' lists the motifs from elm_input_modif.txt that we will not take into account.
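The exclusion step described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name filter_marked is hypothetical, and it assumes that marked_motifs.txt holds one motif identifier per line and that the identifier is the first tab-separated field of each line in the input file. Neither format is documented here.

```python
def filter_marked(input_lines, marked_lines, sep="\t"):
    """Return the input lines whose first field is not a marked motif.

    Assumption: one identifier per line in marked_lines, and the
    identifier is the first `sep`-separated field of each input line.
    """
    # Build the set of identifiers to exclude, ignoring blank lines.
    marked = {line.strip() for line in marked_lines if line.strip()}
    kept = []
    for line in input_lines:
        identifier = line.strip().split(sep)[0]
        # Skip blank lines and any motif listed as marked.
        if identifier and identifier not in marked:
            kept.append(line)
    return kept


# Placeholder identifiers, for illustration only.
example_input = ["MOTIF_A\tx", "MOTIF_B\ty", "MOTIF_C\tz"]
example_marked = ["MOTIF_B\n"]
remaining = filter_marked(example_input, example_marked)
```

With real files, the two lists could be obtained by opening 'database/elm_input_modif.txt' and 'database/marked_motifs.txt' and calling readlines() on each.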
Software:
Elm_processing:
Compile the code:
cd elm_processing
make
After compilation the binary will be placed in the folder elm_processing/bin.
Generate files for Python scripts:
The corresponding input files for the Python scripts are already generated. If needed, they can be regenerated using the following commands from the directory elm_processing/bin
Each Python script has an input directory and an output directory. To execute a script, it is enough to be inside the script's folder and run 'python main.py', which generates the output and places it in the output directory. The rule is that a script with "empiric" in its name should have "structure_empiric_frequency.txt" in its input folder, while a script with "theoretic" in its name should have "structure_theoretic_probabilities.txt" in its input folder.
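The naming rule can be checked before running a script with a small helper such as the one below. This is a minimal sketch and not part of the repository: the function names are hypothetical, and the input directory is assumed to be called "input", which the text above does not confirm.

```python
import os

# Mapping taken from the rule above: keyword in script name -> required file.
EXPECTED_INPUTS = {
    "empiric": "structure_empiric_frequency.txt",
    "theoretic": "structure_theoretic_probabilities.txt",
}


def expected_input_file(script_name):
    """Return the input file name a script should have, or None if no rule applies."""
    name = script_name.lower()
    for keyword, filename in EXPECTED_INPUTS.items():
        if keyword in name:
            return filename
    return None


def has_expected_input(script_dir, input_dirname="input"):
    """Check that the expected file exists in the script's input folder.

    Assumption: the input folder is named `input_dirname` inside script_dir.
    """
    script_name = os.path.basename(os.path.normpath(script_dir))
    filename = expected_input_file(script_name)
    if filename is None:
        return False
    return os.path.isfile(os.path.join(script_dir, input_dirname, filename))
```

For example, has_expected_input("path/to/some_empiric_script") returns True only if that folder contains input/structure_empiric_frequency.txt. The path here is a placeholder.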