Answer:
You did not mention the programming language for the implementation, so I have written the code in Java.
import java.util.Scanner; // to get input from the user

public class Genome {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in); // creates the Scanner object
        System.out.print("Enter a genome string: "); // prompts the user for a genome string
        String genome = input.nextLine(); // reads the genome string entered by the user
        boolean gene_found = false; // becomes true once at least one gene is printed
        int startGene = 0; // index just past the most recent ATG; 0 means no ATG seen yet
        // slide a 3-character window over the genome, one position at a time,
        // stopping when the window starts at the third-to-last character
        for (int i = 0; i < genome.length() - 2; i++) {
            String triplet = genome.substring(i, i + 3); // the 3 characters starting at i
            if (triplet.equals("ATG")) {
                startGene = i + 3; // a gene may start right after ATG
            } else if ((triplet.equals("TAG") || triplet.equals("TAA") || triplet.equals("TGA"))
                    && startGene != 0) {
                // an ending triplet (TAG, TAA or TGA) was found after an ATG;
                // gene stores the substring of genome from startGene up to, not including, i
                String gene = genome.substring(startGene, i);
                if (gene.length() % 3 == 0) { // a gene's length must be a multiple of 3
                    gene_found = true;
                    System.out.println(gene); // prints the gene that was found
                    startGene = 0; // reset so the search continues with the next ATG
                }
            }
        }
        if (!gene_found) { // no gene was printed during the loop
            System.out.println("no gene is found");
        }
    }
}
Explanation:
The program first asks the user to enter a genome string.
The loop starts at the first character of the entered string and keeps running while i is less than the string length minus 2, so the last triplet examined begins at the third-to-last character.
In each iteration, the triplet variable stores the 3 characters starting at position i, obtained with substring(i, i + 3). Because i increases by 1 each time, consecutive triplets overlap by two characters; the string is not split into separate blocks of 3.
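To make the one-character stepping of the loop concrete, here is a minimal sketch that only lists the triplets the loop would examine. The class name TripletWindow and the method triplets are hypothetical helpers for illustration and are not part of the program above; the input "ATGTAA" is just a sample string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class TripletWindow {
    // Returns every 3-character window of s, advancing one character at a time.
    static List<String> triplets(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < s.length() - 2; i++) {
            result.add(s.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(triplets("ATGTAA")); // prints [ATG, TGT, GTA, TAA]
    }
}
```

A 6-character string yields 4 overlapping triplets, and a string shorter than 3 characters yields none because the loop body never runs.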
The if condition uses equals() to check whether the triplet is ATG. If it is, a gene may start right after it, so the index of the character following ATG (i + 3) is stored in startGene.
The else-if condition checks for the end of a gene, because a gene ends just before one of the triplets TAG, TAA or TGA. The check only applies when startGene is not 0, that is, after an ATG has already been seen.
The gene variable holds the substring of the genome from index startGene up to, but not including, index i. In other words, it holds the characters between the ATG and the ending triplet.
Still inside the loop, the length of the string stored in gene is checked with the mod operator. If the length mod 3 is equal to 0, a gene has been found: it is displayed, gene_found is set to true, and startGene is reset to 0 so that the search continues with the next ATG. If no gene has been found by the time the loop ends, the "no gene is found" message is displayed on the output screen.
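If you want to check the logic without typing input each time, one option is to move the same scan into a method that returns the genes as a list. This is only a sketch under that assumption: the class name GeneFinder, the method findGenes and the two sample genome strings are made up for illustration, and the scan itself is unchanged from the program above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical refactoring of the program above; names are illustrative only.
class GeneFinder {
    // Performs the same scan as the original main(), but collects genes instead of printing them.
    static List<String> findGenes(String genome) {
        List<String> genes = new ArrayList<>();
        int startGene = 0; // 0 means no ATG has been seen yet, as in the original
        for (int i = 0; i < genome.length() - 2; i++) {
            String triplet = genome.substring(i, i + 3);
            if (triplet.equals("ATG")) {
                startGene = i + 3; // a gene may start right after ATG
            } else if (startGene != 0
                    && (triplet.equals("TAG") || triplet.equals("TAA") || triplet.equals("TGA"))) {
                String gene = genome.substring(startGene, i);
                if (gene.length() % 3 == 0) { // length must be a multiple of 3
                    genes.add(gene);
                    startGene = 0; // continue with the next ATG
                }
            }
        }
        return genes;
    }

    public static void main(String[] args) {
        System.out.println(findGenes("TTATGTTTTAAGGATGGGGCGTTAGTT")); // prints [TTT, GGGCGT]
        System.out.println(findGenes("TGTGTGTATAT"));                 // prints [] because there is no ATG
    }
}
```

In the first sample, ATG at index 2 is followed by TAA at index 8, giving TTT, and ATG at index 13 is followed by TAG at index 22, giving GGGCGT. An empty list corresponds to the "no gene is found" message of the original program.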