Software and algorithm development for low complexity protein sequence identification and characterization from genomic databases.
Dylan Murray
Department of Chemistry, UC Davis
Project's details
Low complexity sequences are present in 30% of the proteins encoded by the human genome. These pseudo-degenerate sequences are biased toward a subset of the twenty naturally occurring amino acid building blocks in proteins. Within the members of this class of proteins, the individual biases vary significantly. Low complexity sequence proteins have become a major focus of modern biological research due to their ability to promote self-assembly processes in living organisms. It is currently not known what characteristics of these protein sequences give rise to this fascinating behavior.
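To make the idea of compositional bias concrete, here is a minimal sketch in Python. It is not part of the project; the function names and the toy sequence are hypothetical, and the "fraction covered by the k most common residue types" is just one simple way to quantify bias among many.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Return the fraction of each residue type in seq, most common first."""
    seq = seq.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: c / len(seq) for aa, c in counts.most_common()}


def top_k_fraction(seq: str, k: int = 3) -> float:
    """Fraction of seq made up by its k most common residue types.

    Values near 1.0 for small k indicate a sequence dominated by a few
    amino acids (an illustrative measure, not the project's definition).
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    counts = Counter(seq)
    return sum(c for _, c in counts.most_common(k)) / len(seq)


# Hypothetical toy sequence: dominated by glycine and serine.
toy = "GGGGSSSSYY"
bias = top_k_fraction(toy, k=2)
```

Comparing `composition` across proteins would show how the individual biases differ from one low complexity sequence to the next.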
Motivation
Scientists in advanced research laboratories around the world are studying the self-assembly behavior of these proteins for biomedical and agricultural purposes. Experimental efforts are throughput-limited and will benefit from Big Data-driven experimental design. The project aims to accelerate experimental discovery in areas such as human disease and biotechnology by facilitating the mining of genomic data from humans, animals, plants, and bacteria.

Project Description
The ultimate goal of the project is to develop and implement an open-source software tool that will collect, from genomic databases, low complexity protein sequences that share common features specified by user-adjustable parameters. The software development team will work closely with a team of experimental scientists on the specifics of software design. The implementation of the design will occur in three stages, with regular feedback from and interaction with the experimental team.
■ Stage One: Optimize a protocol to use an existing algorithm to pick out low complexity sequences from databases of known genomes.
■ Stage Two: Design algorithms to detect characteristic features in low complexity sequences.
■ Stage Three: Implement a software interface for use by scientists around the world.
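As a rough illustration of what Stage One might involve, the sketch below flags low complexity regions with a sliding-window Shannon entropy filter. The project description does not name the existing algorithm it will use, so this is a stand-in technique chosen for brevity, not the project's method. The function names, the window size, and the entropy threshold are all assumptions, exposed as arguments in the spirit of the user-adjustable parameters mentioned above.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12,
                           max_entropy: float = 2.2) -> list[tuple[int, int]]:
    """Return merged half-open (start, end) intervals of low entropy.

    A window counts as low complexity when its entropy is at or below
    max_entropy; overlapping or adjacent windows are merged. Both
    parameters are illustrative defaults, not tuned values.
    """
    seq = seq.upper()
    regions: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Hypothetical toy sequence: a serine-rich stretch between diverse flanks.
toy = "ACDEFGHIKL" + "S" * 15 + "MNPQRTVWYA"
hits = low_complexity_regions(toy, window=10, max_entropy=1.0)
```

Each returned interval could then be passed to Stage Two style feature detectors; sequences shorter than the window simply yield no regions.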
A software package for use on standalone workstations or through a web interface.
30-60 min weekly or more
Open source project
Team members | N/A |