Python Strings and RNA


Hopefully you still have your Jupyter Hub open in a separate tab; if not, click the link below:

Open Jupyter Hub

Now we will study RNA sequences using Python. Let's define an RNA sequence. Type the following, or copy and paste it, into an input cell (marked "In") of your Jupyter Notebook. The exact sequence isn't important yet, but it should use only the letters A, C, G, and U.

RNA = 'GCUAGCUAGUCGA'
      

We can print the RNA sequence using the print() function:

print(RNA)
      

The RNA sequence is a string, which can be thought of as a series of characters. In the case of RNA, we are using only the characters A, C, G, and U.
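One way to check that a string really is a series of characters is with two standard Python built-ins that this activity doesn't otherwise use: len() counts the characters, and list() splits the string into its individual characters. A short sketch, reusing the same RNA value as above:

```python
RNA = 'GCUAGCUAGUCGA'

# len() returns how many characters the string holds
print(len(RNA))    # → 13

# list() splits the string into separate one-character strings
print(list(RNA))   # → ['G', 'C', 'U', 'A', 'G', 'C', 'U', 'A', 'G', 'U', 'C', 'G', 'A']
```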

We've seen what happens when you add two number variables. What do you think will happen when you add two string variables? Type the following and click Run:

x = 'AAAUUU'
y = 'GGGCCC'
x+y
      

In this example, we get the concatenation of the two strings x and y: the characters of y are joined onto the end of x, giving 'AAAUUUGGGCCC'.
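A small sketch of two further properties of concatenation: the order of the strings matters, and the original strings are left unchanged. The variable name joined is our own choice for illustration.

```python
x = 'AAAUUU'
y = 'GGGCCC'

# + joins strings end to end and produces a new string
joined = x + y
print(joined)   # → AAAUUUGGGCCC

# Order matters: y + x puts y's characters first
print(y + x)    # → GGGCCCAAAUUU

# x and y themselves are unchanged
print(x)        # → AAAUUU
```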

For-loops

One way to see the individual characters is to print them one at a time. A for-loop lets us visit each character in turn. Here, let's print each character on its own line using a new variable, "c".

for c in RNA:
    print(c)
      

The for-loop has a few required components. First comes the keyword "for", followed by a loop variable, in this case "c". Each time through the loop, c holds the next character of the RNA sequence, from first to last. Next is the keyword "in", which signifies that c takes its values from the sequence "RNA". The line ends with a colon ":", and the lines to repeat (the loop body) are indented beneath it.
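To see that the loop really visits every character in order, here is a sketch that rebuilds the string inside the loop using concatenation. The names copy and backwards are illustrative choices, not part of the activity. Adding each character to the end reproduces the original; adding it to the front reverses it.

```python
RNA = 'GCUAGCUAGUCGA'

copy = ''        # start from the empty string
backwards = ''

# for <variable> in <sequence>:   ('for', 'in' and ':' are required)
for c in RNA:
    # The indented body runs once per character; c is the current character
    copy = copy + c              # append to the end: keeps the order
    backwards = c + backwards    # put in front: reverses the order

print(copy)        # → GCUAGCUAGUCGA
print(backwards)   # → AGCUGAUCGAUCG
```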

The loop variable "c" can have any valid variable name, as long as you use the same name consistently inside the loop. For example, try changing it to "x":

RNA = 'GCUAGCUAGUCGA'
for x in RNA:
    print(x)
      

Try another variable name in place of "x". Although you can choose any name for the loop variable, the other parts of this code cannot be changed: the keywords "for" and "in", and the colon ":".
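A sketch comparing two loops that differ only in the loop variable's name; nucleotide, first, and second are names chosen for this illustration. Both loops produce exactly the same result. The comments at the end show the kind of change that is not allowed.

```python
RNA = 'GCUAGCUAGUCGA'

# Loop 1: loop variable named c
first = ''
for c in RNA:
    first = first + c + ' '

# Loop 2: same loop, with a more descriptive variable name
second = ''
for nucleotide in RNA:
    second = second + nucleotide + ' '

print(first == second)   # → True

# Changing a keyword or dropping the colon is a SyntaxError, e.g.:
#   foreach c in RNA:
#   for c in RNA
```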

Counting characters in the RNA

Printing each character on its own line isn't very useful for most purposes, but it does show that the loop visits every character in order. Now let's do something useful, like counting the "A" characters. First, define a new variable, "A_count":

A_count = 0
      

While the RNA variable is a character string, this new variable "A_count" is a whole number, or integer. We can update "A_count" by adding 1 to it, which lets us keep a running tally of the A's in the RNA sequence:
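The difference between a string and an integer matters because the same + operator does different things to each. A short sketch using Python's built-in type() function, which this activity doesn't otherwise introduce:

```python
RNA = 'GCUAGCUAGUCGA'
A_count = 0

# type() reports what kind of value a variable holds
print(type(RNA))       # → <class 'str'>
print(type(A_count))   # → <class 'int'>

# The same + operator behaves differently for the two types
print('5' + '1')       # → 51  (strings are concatenated)
print(5 + 1)           # → 6   (integers are added)
```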

A_count = A_count + 1
      

This sets the new value of A_count to one plus its previous value, so if A_count is 5, it becomes 6. Next, we need a way to check whether the character "c" is A, as opposed to one of the other RNA nucleotides C, G, or U:
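The update line can look odd at first, since A_count appears on both sides. Python evaluates the right-hand side first, using the old value, and then stores the result back into the variable. A worked sketch of the "5 becomes 6" example, followed by a tally built from repeated updates:

```python
# Right side first: 5 + 1 is computed, then stored back into A_count
A_count = 5
A_count = A_count + 1
print(A_count)   # → 6

# Repeating the update keeps a running tally
A_count = 0
A_count = A_count + 1   # 1
A_count = A_count + 1   # 2
A_count = A_count + 1   # 3
print(A_count)   # → 3
```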

RNA = 'CGCAUAAGCG'

A_count = 0
for c in RNA:
    if c == 'A':
        A_count = A_count + 1

print(A_count)
      

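In this example, we also use an if-statement. An if-statement checks a condition, an expression that is either True or False, and runs the indented lines beneath it only when the condition is True. Recall how we tested whether x+y==7 earlier. Here we test whether the character c is equal to 'A'. Note the double equals sign ==, which compares two values, unlike the single = used to assign a value to a variable.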
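The same pattern extends to all four nucleotides. A sketch, assuming one counter per nucleotide; the names C_count, G_count, and U_count are our own, following A_count. As a cross-check, Python strings also have a built-in count() method that gives the same answer for a single letter.

```python
RNA = 'CGCAUAAGCG'

# One counter per nucleotide, all starting at zero
A_count = 0
C_count = 0
G_count = 0
U_count = 0

for c in RNA:
    # Each if-statement checks the current character against one nucleotide
    if c == 'A':
        A_count = A_count + 1
    if c == 'C':
        C_count = C_count + 1
    if c == 'G':
        G_count = G_count + 1
    if c == 'U':
        U_count = U_count + 1

print(A_count, C_count, G_count, U_count)   # → 3 3 3 1

# Cross-check with the built-in string method
print(RNA.count('A'))   # → 3
```

Because every character is exactly one of A, C, G, or U, the four counts should add up to the length of the sequence (10 here).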

