Yingyao Zhou, Ph.D.
Director of Informatics
The Informatics Section keeps GNF scientists and their collaborators abreast of the latest technological advances by providing a solid computational platform built on current hardware and software. The ambitious drug discovery portfolio and widely employed high throughput technologies at GNF provide numerous opportunities for original research and development in both cheminformatics and bioinformatics.
The cheminformatics team has developed a comprehensive Lead Discovery Database (LDDB) by working closely with all drug discovery groups: compound management, high throughput screening, analytical chemistry, medicinal chemistry, pharmacology, and program management. LDDB currently contains a suite of tools for archiving and analyzing more than 2 million compounds, more than 100 million pieces of high throughput screening data, and SAR and pharmacology data for more than 200,000 compounds in lead optimization. The success of LDDB rests on a combination of strong programming skills, sophisticated chemical toolboxes, and statistical and data mining algorithms. Together with the Engineering Department, the cheminformatics team designs robotic automation systems that will significantly expand the searchable biological and chemical space for drug discovery.
The Informatics Section is committed to developing intelligent computational algorithms to facilitate the discovery of new knowledge. We have developed an ontology-based pattern identification (OPI) algorithm, a redundant siRNA activity (RSA) algorithm, and a match-only integral distribution (MOID) algorithm. OPI and MOID have been successfully applied to predict functions for malaria genes based on life-cycle gene expression data. We have also applied OPI to improve the high throughput screening hit selection process and to discover interesting chemical scaffolds based on their collective inhibition patterns across a large panel of biological assays. By applying RSA, we have minimized the impact of off-target effects on large-scale RNA interference screens.
The Informatics Section is interested in any emerging computational technologies that may contribute to the continuing success of GNF.
Selected publications
- Konig R, Chiang CY, Tu BP, Yan SF, DeJesus PD, Romero A, Bergauer T, Orth A, Krueger U, Zhou Y, et al. A probability-based approach for the analysis of large-scale RNAi screens. Nat Methods 2007;4(10):847-9.
- Yan SF, King FJ, He Y, Caldwell JS, Zhou Y. Learning from the data: mining of large high-throughput screening databases. J Chem Inf Model 2006;46(6):2381-95.
- Zhou Y, Young JA, Santrosyan A, Chen K, Yan SF, Winzeler EA. In silico gene function prediction using ontology-based pattern identification. Bioinformatics 2005;21(7):1237-45.
- Yan SF, Asatryan H, Li J, Zhou Y. Novel statistical approach for primary high-throughput screening hit selection. J Chem Inf Model 2005;45(6):1784-90.
- Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch JK, Haynes JD, De La Vega P, Holder AA, Batalov S, Carucci DJ, et al. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 2003;301(5639):1503-8.









