In computational biology, we sometimes refer to important subsequences as motifs. Suppose we know a particular motif and want to locate it within a larger sequence.
For example, the SARS-CoV-2 genome is about 30,000 nucleotides long, and the gene encoding the spike protein is nearly 4,000 nucleotides in length. The 27-nucleotide sequence "GGCGGCTTCAATTTCAGCCAGATTCTG" codes for a small protein domain called the fusion peptide.
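To make the idea of locating a motif concrete, here is a minimal sketch that reports every position where a motif occurs in a longer string. The function name find_motif and the toy sequence are made up for illustration; the toy sequence is not real genome data, and the notebook for this activity may approach the search differently.

```python
def find_motif(sequence, motif):
    """Return the 0-based start position of every occurrence of motif.

    Overlapping occurrences are included because the search resumes
    one character after each hit.
    """
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# The 27-nucleotide motif quoted in the text above.
motif = "GGCGGCTTCAATTTCAGCCAGATTCTG"

# Hypothetical toy sequence (NOT real genome data): two copies of the
# motif surrounded by a few arbitrary filler nucleotides.
toy_sequence = "ATGCC" + motif + "TTAGA" + motif + "CC"

hits = find_motif(toy_sequence, motif)
print(hits)
```

Positions are counted from 0, as is usual in Python, so a hit at position 5 means the motif begins at the sixth nucleotide of the sequence.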
First, let's access the notebooks for this module and this activity by copy-pasting the following code into a Jupyter notebook environment:
import os
os.system("wget http://compbiocamp.cgrb.oregonstate.edu/notebooks/Motif_Search.ipynb")
os.system("wget http://compbiocamp.cgrb.oregonstate.edu/notebooks/Genome_Assembly.ipynb")
os.system("wget http://compbiocamp.cgrb.oregonstate.edu/notebooks/Plotting.ipynb")
os.system("wget http://compbiocamp.cgrb.oregonstate.edu/notebooks/Sequence_Comparison.ipynb")
Now please click the notebook for this activity: Motif_Search.ipynb. The notebook will provide more instructions.