GSP treats genetic sequences as signals that can be analyzed with established signal processing methods. The approach rests on the observation that genetic sequences exhibit regularities and patterns similar to those found in conventional signals. The primary steps are numerical representation of the sequence, transformation to the frequency domain, and biological interpretation of the resulting spectrum:
In a study on the Human Genome Project, the researchers first had to represent the DNA sequence in a form compatible with signal processing techniques. They assigned each nucleotide (A, C, G, T) a distinct complex number, using the mapping A = 1, C = -1, G = i, T = -i, where i is the imaginary unit. With this numerical representation, the genetic sequence could be processed as a discrete, complex-valued signal.
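As a minimal sketch of this encoding step, the Python snippet below applies the mapping quoted above to a short string. The function name `encode_sequence` and the decision to reject symbols outside A, C, G, T are assumptions made for illustration; the text does not say how the study handled ambiguous bases.

```python
# Mapping taken from the text: A = 1, C = -1, G = i, T = -i.
NUCLEOTIDE_TO_COMPLEX = {"A": 1 + 0j, "C": -1 + 0j, "G": 1j, "T": -1j}


def encode_sequence(sequence: str) -> list[complex]:
    """Convert a DNA string into a list of complex samples, one per nucleotide."""
    encoded = []
    for position, base in enumerate(sequence.upper()):
        if base not in NUCLEOTIDE_TO_COMPLEX:
            # Assumption: unknown symbols (for example N) are rejected, not guessed.
            raise ValueError(f"unsupported symbol {base!r} at position {position}")
        encoded.append(NUCLEOTIDE_TO_COMPLEX[base])
    return encoded


# Hypothetical four-base input, used only to show the call.
signal = encode_sequence("ACGT")
print(signal)
```

Each nucleotide lands on a different point of the unit circle, so all four symbols have the same magnitude and differ only in phase.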
In another study on the Human Genome Project, the researchers applied the Fourier transform to take the genetic sequence from the position domain (the counterpart of the time domain, since the signal is indexed by nucleotide position rather than by time) to the frequency domain. The transformed sequence showed peaks at certain frequencies, indicating the presence of repeating patterns in the genetic sequence.
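The transform step can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: it uses a direct O(N²) discrete Fourier transform from the standard library so the example stays self-contained (an FFT routine would be the usual choice for real genomic data), and the names `MAPPING` and `power_spectrum` and the toy input are hypothetical.

```python
import cmath

# Mapping taken from the text: A = 1, C = -1, G = i, T = -i.
MAPPING = {"A": 1 + 0j, "C": -1 + 0j, "G": 1j, "T": -1j}


def power_spectrum(sequence: str) -> list[float]:
    """Return |X[k]|^2 for k = 0..N-1, computed with a direct DFT."""
    samples = [MAPPING[base] for base in sequence.upper()]
    n = len(samples)
    spectrum = []
    for k in range(n):
        # X[k] = sum over positions t of x[t] * exp(-2*pi*i*k*t / N)
        x_k = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Hypothetical toy input: the unit "ACG" repeated 8 times (N = 24).
power = power_spectrum("ACG" * 8)

# List the non-zero-frequency bins that carry noticeable energy.
peaks = [k for k, p in enumerate(power) if k > 0 and p > 1e-6]
print(peaks)  # bins 8 and 16 for this toy input
```

Because the encoded signal is complex-valued, its spectrum is not symmetric: bins k and N - k can have different heights even though both describe the same repeat period.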
The researchers then had to interpret the frequency-domain peaks in a biological context. Because a peak at a given frequency reflects a pattern that recurs at the corresponding period, they attributed the peaks to sequences of nucleotides repeated back to back in the genome, known as tandem repeats. Tandem repeats are known to play an important role in genetic disease and in genome evolution, so the signal processing analysis could be tied back to biologically significant findings.
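To make the link between a spectral peak and a repeat length concrete, the sketch below converts the strongest non-zero-frequency bin of a length-N power spectrum into a period of N/k nucleotides. The function name `dominant_period` and the toy spectrum values are hypothetical, and a period estimate of this kind only suggests a candidate repeat length; it does not by itself establish that a tandem repeat is present.

```python
def dominant_period(power: list[float]) -> float:
    """Return the repeat period, in nucleotides, implied by the strongest non-DC bin."""
    n = len(power)
    if n < 2:
        raise ValueError("need at least two spectral bins")
    # Skip k = 0, which only reflects the average composition of the signal.
    peak = max(range(1, n), key=lambda k: power[k])
    # For a complex-valued signal, bins k and n - k describe the same period,
    # so fold the index before converting it to a period.
    folded = min(peak, n - peak)
    return n / folded


# Hypothetical spectrum of length 24 with energy only at bins 8 and 16
# (made-up values, chosen to illustrate the conversion).
toy_power = [0.0] * 24
toy_power[8] = 1.0
toy_power[16] = 2.0
print(dominant_period(toy_power))  # 3.0, i.e. a three-nucleotide repeat unit
```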