Answer:
The minimum length of an sgRNA target sequence needed to avoid off-target cleavage by the CRISPR/Cas system in the fruit fly genome is 14 bases.
Explanation:
We are trying to use the CRISPR/Cas system to cleave the genome of the fruit fly (which is 1.4x10^8 bp long), and we want the cleavage to be unique. That means we need a target sequence long enough that we can assume it appears only once in the genome.
First, note that at every position we can find one of four different nucleotides (A, C, T, G). So, the probability of finding a specific sequence of length "n" at a given position is (1/4)^n (we assume that all four nucleotides are equally likely and that the nucleotide at position "i" is independent of the nucleotide at any other position "j").
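The independence assumption can be checked on a small scale with a simulation. The sketch below is only an illustration under that same assumption: it builds a hypothetical random 100,000-base sequence (not real fly DNA), counts one arbitrary 4-base target, and compares the count with (1/4)^n times the number of starting positions. The function names `kmer_probability` and `count_occurrences` are hypothetical, chosen for this example.

```python
import random


def kmer_probability(n: int) -> float:
    """Probability that one specific n-base sequence occupies a given position,
    assuming each base is independently A, C, G or T with probability 1/4."""
    return (1 / 4) ** n


def count_occurrences(seq: str, target: str) -> int:
    """Count (possibly overlapping) occurrences of target in seq."""
    k = len(target)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == target)


# Hypothetical random sequence generated under the independence assumption
rng = random.Random(0)
length = 100_000
seq = "".join(rng.choice("ACGT") for _ in range(length))

target = "ACGT"  # arbitrary 4-base target used only for illustration
observed = count_occurrences(seq, target)
expected = kmer_probability(len(target)) * (length - len(target) + 1)
print(f"observed={observed}, expected={expected:.1f}")
```

The observed count should land close to the expected value of about 390.6; it will not match exactly, because a random sequence fluctuates around its expectation.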
Also, to know how many times a sequence is expected to appear in a genome (its expected number of occurrences), we multiply the probability of that sequence occurring at a given position by the length of the genome, which approximates the number of possible positions. For our specific example, the expected number of occurrences of a sequence of length "n" is:
Expected occurrences = [(1/4)^n]*1.4*10^8
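A minimal sketch of this formula, assuming (as the answer does) a genome length of 1.4x10^8 bp and counting positions as roughly equal to the genome length. The names `GENOME_LENGTH` and `expected_occurrences` are illustrative, not from any library.

```python
GENOME_LENGTH = 1.4e8  # fruit fly genome size assumed in this answer (bp)


def expected_occurrences(n: int, genome_length: float = GENOME_LENGTH) -> float:
    """Expected number of chance occurrences of one specific n-base sequence,
    approximating the number of positions by the genome length."""
    return (1 / 4) ** n * genome_length


# Tabulate the expected count for a few candidate lengths
for n in range(12, 17):
    print(n, round(expected_occurrences(n), 3))
```

The expected count falls from about 2.09 at n = 13 to about 0.52 at n = 14, so it crosses 1 between those two lengths.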
In this case, we want the expected number of occurrences to be 1, and we want to find the corresponding length of the target sequence (n).
Given the information above, we know that:
[(1/4)^n]*1.4*10^8 = 1
[(1/4)^n] = 1/(1.4*10^8) ≈ 7.14*10^-9
Then, to calculate n, we can take the logarithm of both sides and use its properties:
log[(1/4)^n] = log[7.14*10^-9]
n*log[(1/4)] = log[7.14*10^-9]
n = log[7.14*10^-9]/log[(1/4)] => n ≈ 13.53
As the sequence must contain a whole number of bases, <u>we can conclude that a target sequence of at least 14 bases should be enough to avoid off-target cleavage by the CRISPR/Cas system in the fruit fly genome.</u>
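The same calculation can be reproduced numerically. This is a sketch under the answer's assumptions (genome length 1.4x10^8 bp, equally likely and independent bases). It solves (1/4)^n * G = 1 with logarithms and then rounds up to a whole number of bases. The name `GENOME_LENGTH` is illustrative.

```python
import math

GENOME_LENGTH = 1.4e8  # bp, as assumed in the answer

# Solve (1/4)^n * G = 1  =>  n = log(1/G) / log(1/4), which equals log(G) / log(4)
n_exact = math.log(1 / GENOME_LENGTH) / math.log(1 / 4)

# The target must contain a whole number of bases, so round up
n_min = math.ceil(n_exact)

print(f"n = {n_exact:.2f}, minimum length = {n_min}")
```

Rounding up rather than to the nearest integer is the conservative choice: at n = 14 the expected number of chance occurrences is already below 1.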