Serge Batalov, Ph.D.
Senior Research Investigator II
As computational biology advances into the twenty-first century, one of the largest issues faced by biologists and chemists is the management of all the data that has been collected. The amount of genomic data accessible today, for instance, is much greater than just a few years ago, and more is being generated all the time as new experimental results become available. At GNF, our scientists know that it is not enough to simply manage this data. Researchers, such as those in our Sequence Informatics Section, are conducting studies and bringing the latest technologies to bear on the even greater problem—how to use the data.
Sequence informatics at GNF is organized around four central activities: gene mining for novel targets, gene annotation and clustering of essential and non-redundant sets of genes and their transcripts (for oligonucleotide array design, functional cDNA and RNAi screening, mouse genetics), RNA interference technology development, and development of a mouse phenotyping and mapping platform.
Since the first genome assemblies became available in 2001, our gene mining activities have centered on identifying novel genes within several gene families, including GPCRs, kinases, ion channels, secreted molecules, genes involved in the apoptosis pathway, genes involved in the ubiquitylation system, and others. We use a number of genome-based mining techniques, including hidden Markov model (HMM) and domain-profile based methods, classical search methods such as BLAST, and transmembrane domain prediction methods.
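As a rough illustration of the transmembrane prediction idea mentioned above, the sketch below scans a protein with a sliding-window hydropathy average using the published Kyte-Doolittle scale. It is not the method used at GNF; the function names, the 19-residue window, and the 1.6 threshold are assumptions chosen for this example.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Return the mean hydropathy of every full-length window."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(protein, window=19, threshold=1.6):
    """Return 0-based starts of windows whose mean exceeds the threshold.

    The window and threshold are illustrative assumptions, not tuned values.
    """
    means = hydropathy_windows(protein, window)
    return [i for i, mean in enumerate(means) if mean > threshold]
```

A hit only marks a hydrophobic stretch long enough to span a membrane; in practice such hits would be combined with profile and homology evidence before calling a gene a GPCR or ion channel.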
Gene family mining and clustering also serve as the glue that holds several other genomics projects at GNF together. For example, to design the GNF1B and GNF1M oligonucleotide arrays for the interrogation of all protein-coding human and mouse genes, non-redundant transcript sets were built using comprehensive cDNA sequence information from GenBank, RIKEN, MGC, Ensembl, and Celera. The emerging P. falciparum genome, which is 80 to 90 percent A–T rich, presented a different challenge for the design of a combined cDNA/genomic DNA array. The resulting scrMalaria array proved to be a successful vehicle for studying the malaria life cycle and for the functional characterization of genes based on highly correlated levels and temporal patterns of expression.
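To make the A–T richness problem concrete, the following minimal sketch computes the A+T fraction of a sequence and flags windows above a cutoff. The function names, window size, step, and the 0.8 cutoff are hypothetical choices for illustration and do not describe how the scrMalaria probes were actually selected.

```python
def at_fraction(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


def at_rich_windows(seq, window=50, step=25, cutoff=0.8):
    """Return 0-based starts of windows whose A+T fraction reaches the cutoff."""
    starts = range(0, len(seq) - window + 1, step)
    return [s for s in starts if at_fraction(seq[s:s + window]) >= cutoff]
```

Windows flagged this way tend to have low sequence complexity, which is why probe design for such a genome needs more care than for human or mouse transcripts.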
To support a large-scale ENU mutagenesis program and fine genomic mapping of novel phenotypes, we have established a dense set of uniformly distributed SNP markers over the mouse genome. To that end, we identified, validated, and genotyped in 48 strains a set of more than 11,000 SNPs (approximately 2 to 3 for every megabase of the mouse genome). Based on this dataset, we have demonstrated limited haplotype diversity, also known as "cold" and "hot" spots in chromosomal recombination. These data are visualized and annotated in the public domain at http://snp.gnf.org/.
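One simple way to check that a marker set is dense and uniformly distributed is to count markers per megabase bin and list the bins that fall short. The sketch below assumes 0-based base-pair positions on a single chromosome; the function names and the minimum of two markers per bin are illustrative assumptions, not the criteria used for the GNF panel.

```python
from collections import Counter


def markers_per_megabase(positions_bp, chrom_length_bp, bin_size=1_000_000):
    """Count markers in consecutive bins along one chromosome."""
    n_bins = -(-chrom_length_bp // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions_bp)
    return [counts.get(i, 0) for i in range(n_bins)]


def sparse_bins(per_bin, minimum=2):
    """Return indices of bins holding fewer than `minimum` markers."""
    return [i for i, n in enumerate(per_bin) if n < minimum]
```

For example, `markers_per_megabase([100, 500_000, 1_200_000, 3_999_999], 4_000_000)` gives one count per megabase, and `sparse_bins` then points to the regions where additional SNPs would be needed for fine mapping.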
RNA interference technologies have been a major area of focus for computational biology at GNF. In the summer of 2001, scientists initiated a pilot project that designed five small interfering RNAs (siRNAs) against ten genes. After evaluating their effectiveness in knocking down message levels by real-time PCR, researchers designed a modest algorithm that took advantage of publicly available information as well as the initial rules learned from this small-scale test. Now, after several more rounds of similar screens in more than a dozen signal transduction pathways, including NF-κB, AP-1, and TRAIL-induced cell death, researchers have designed a third-generation algorithm examining multiple siRNAs against druggable targets in the human genome: every kinase, every GPCR, every modifying enzyme, and many more. Similar shRNA collections were also built for human druggable targets and for their orthologous mouse genes.
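The general shape of a rule-based siRNA design step can be sketched as a filter over candidate target sites. The example below is not the GNF algorithm, whose rules are not given here; it assumes two generic, commonly discussed filters (a GC-content range and no long single-base runs), and the 19-nt length, the 30 to 52 percent GC range, and all function names are assumptions for illustration.

```python
def gc_fraction(site):
    """Fraction of G and C bases in a non-empty site."""
    return sum(b in "GC" for b in site) / len(site)


def has_long_run(site, max_run=3):
    """True if any base repeats more than `max_run` times in a row."""
    run = 1
    for prev, cur in zip(site, site[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def candidate_sirna_targets(mrna, length=19, gc_min=0.30, gc_max=0.52):
    """Return (start, site) pairs for target sites that pass both filters.

    Thresholds are illustrative assumptions, not validated design rules.
    """
    mrna = mrna.upper().replace("U", "T")
    hits = []
    for start in range(len(mrna) - length + 1):
        site = mrna[start:start + length]
        if set(site) - set("ACGT"):
            continue  # skip sites containing ambiguous bases
        if not gc_min <= gc_fraction(site) <= gc_max:
            continue
        if has_long_run(site):
            continue
        hits.append((start, site))
    return hits
```

In a real pipeline, candidates passing such filters would still need a specificity check against the transcriptome (for example with BLAST) and experimental validation of knockdown.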
Finally, we deliver sequence analysis tools to the researcher's fingertips via the institute's portal.
Selected Publications
- Mukherji M, Bell R, Supekova L, Wang Y, Orth AP, Batalov S, Miraglia L, Huesken D, Lange J, Martin C, et al. Genome-wide functional analysis of human cell-cycle regulators. Proc Natl Acad Sci U S A 2006;103(40):14819-24.
- Willingham AT, Orth AP, Batalov S, Peters EC, Wen BG, Aza-Blanc P, Hogenesch JB, Schultz PG. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 2005;309(5740):1570-3.
- Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, et al. Antisense transcription in the mammalian transcriptome. Science 2005;309(5740):1564-6.
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al. The transcriptional landscape of the mammalian genome. Science 2005;309(5740):1559-63.









