IEEE Computer Society
Bioinformatics Conference

Stanford University, Palo Alto, CA
August 14 - 16, 2002

Tutorial 1A
Perl and Bioperl: Tools for Automated Analysis of Biological Sequence Data

Expected goals, objectives and motivation of the tutorial
The rapid growth of sequence data in public and proprietary databases has created unprecedented opportunities to mine human and other genome databases for novel genes, proteins and biological pathways. In addition, with web-based user interfaces, these resources have become accessible to researchers with limited computer skills. However, these interactive interfaces become unwieldy for complex comparative analyses involving hundreds or thousands of sequences and multiple databases.

Perl has emerged as the language of choice for the automated access and manipulation of bioinformatics data. However, while writing Perl programs is relatively easy, fully exploiting Perl's bioinformatics capabilities requires a level of programming experience and sophistication not common among biologists. To address this problem, the Bioperl Package - a set of object-oriented modules that implements common bioinformatics tasks - has been developed.

This tutorial describes Perl and Bioperl and their application to practical problems in molecular biology sequence analysis. The tutorial includes an overview of the principal features of Perl and Bioperl relevant to biology, followed by examples of how they can be applied to common bioinformatics tasks. Attention will also be paid to identifying bioinformatics problems for which Perl and Bioperl are not appropriate tools. By the end of the tutorial, participants will have a sense of what capabilities Perl and Bioperl can provide them in molecular biology research, as well as pointers to resources for acquiring the skills and knowledge they will need in order to take advantage of them.

Intended audience:
The tutorial is designed to be at an intermediate level. Experience with some computer language (e.g., C or Java) is assumed. Some experience with Perl is helpful. However, participants who are comfortable in another computer language should find the tutorial valuable even if they have little or no experience with Perl. Advanced knowledge of Perl, such as being able to write object-oriented Perl, is not needed. No specific knowledge of biology is required to follow the tutorial. However, familiarity with the typical informatics tasks required by molecular biology sequence analysis (such as retrieving sequences from databases, converting sequence data files among the various standard formats, and parsing results of programs such as BLAST or HMMer) will make it easier for the student to appreciate the advantages of using the tools being presented.


Detailed outline of the presentation.

Part I - Overview of Perl and Bioperl and their Application in Bioinformatics
What bioinformatics tasks are suited to Perl? What are the main capabilities of the Bioperl package?
Example 1 - Searching for promoter sequences
Example 2 - Aligning and "Blasting" protein sequences
When shouldn't you use Perl and Bioperl?

Part II - Selected Topics in Perl for Bioinformatics
Regular expressions and pattern matching
Obtaining Perl objects - CPAN and Bioperl
Using objects in Perl
Miscellaneous useful features of Perl
Debugging techniques in Perl

Part III - Getting started with Bioperl
Installing Bioperl
Overview of Bioperl's objects
Sequence objects: (Seq, PrimarySeq, LocatableSeq, LiveSeq, LargeSeq, SeqI)
Interface objects and implementation objects

Part IV - Using Bioperl
Accessing sequence data from local and remote databases
Transforming formats of database/file records
Manipulating sequences
Searching for "similar" sequences - running and parsing Blast and HMMer
Running sequence alignment programs - Smith-Waterman, Clustalw, TCoffee
Manipulating / displaying alignments
Parsing results of programs that search for genes and other genomic DNA structures
Developing machine readable sequence annotations
Perl, Bioperl and XML
New and upcoming developments - Bioperl-db, Bioperl-gui, Structure objects

Part V - Learning more about Perl and Bioperl for Bioinformatics
Finding more information about specific Bioperl objects
Using sample code from the bptutorial script
Finding where a method is defined using the bptutorial script
Using the ptkdb debugger
Using the 't' and 'examples' directories
Recommended Books on Perl
Other Tools and Resources

Brief description of the instructor(s) indicating the relevant qualifications and teaching experience
Peter Schattner, Ph.D., was trained as a physicist but currently works as a computational biologist. He is a member of the computational biology group at the University of California, Santa Cruz, where his principal research interests are in computer-based methods for identifying non-protein-coding RNA genes. He is also an active developer for the Bioperl project, having contributed several of its modules and written the Bioperl tutorial. In his "spare time" he enjoys teaching the use of Perl and Bioperl in bioinformatics in venues ranging from the O'Reilly Bioinformatics Conference (http://conferences.oreillynet.com/cs/bio2002/view/e_sess/1948) to the Bioinformatics Program at California State University, Hayward, CA (http://www.extension.csuhayward.edu/html/inf.htm). His O'Reilly Bioperl tutorial sold out (with 120 registrants) six weeks prior to the conference.




Tutorial 1B
Cell Signaling and Neural Networks

Instructor
Raxit J. Jariwalla, Ph.D. is a principal research investigator of viral, immune and metabolic diseases at the California Institute for Medical Research in San Jose. He did postdoctoral work at the Johns Hopkins University and was formerly head of a program in virology and immunodeficiency research at the Linus Pauling Institute of Science and Medicine in Palo Alto. In addition to conducting biomedical research, Dr. Jariwalla teaches courses in the Bioinformatics and Biotechnology certificate programs at the UCSC extension in Cupertino.

Expected Goals and Objectives
Biology is rapidly transitioning from the study of single cells to a focus on cellular systems. In order to comprehend systems biology, it is important to understand the mechanics of inter-cellular processes. This tutorial is designed to provide an understanding of how cells communicate with one another in multi-cellular organisms. It will focus on the molecules involved in transmission of messages elicited by extracellular signals ("signal transduction") and how intracellular pathways culminate in the activation or inhibition of gene expression.

At the conclusion of the course, students should be able to:
• Explain the general principles of cell-to-cell communication
• Describe major classes of cell-surface receptors involved in signal transduction
• Discuss specific pathways that mediate the action of hormones, growth factors and neurotransmitters
• Explain how signaling pathways operate by comparing them to computer-based neural networks

Audience
This course is designed for computer and life scientists who want to obtain knowledge and insights into the mechanics of cell communication and pathways of signal transduction. A science background with some college-level courses in biology is desirable.

Course Description
A major hallmark of multi-cellular organisms is cell-to-cell communication. During multi-cellular development, cells become programmed to respond to specific signals elicited by other cells that alter the behavior of the recipient cell. Inter-cellular communication is mediated by diffusible secreted signals or by cell-associated molecules that act via contact-dependent signaling. In addition to such extra-cellular signals, cell communication depends on receptor proteins in recipient cells that enable them to recognize complementary signals and respond to them in a specific fashion. Several types of cellular receptors have been identified. The most common are membrane-associated surface proteins that are activated upon signal binding. Activation converts the extra-cellular binding event into intracellular signals, which are transmitted along a defined pathway of signaling molecules to target proteins that change the behavior of the recipient cell.

This tutorial will discuss general principles of cell communication and focus on three major classes of cell-surface receptors that affect important signaling pathways: enzyme-linked receptors (protein kinases) that cause phosphorylation of intracellular proteins in recipient cells; ion-channel-linked receptors that open or close channels for ion transport in response to activation by a neurotransmitter; and G-protein-linked receptors that activate or inactivate membrane-associated enzymes or ion channels via trimeric GTP-binding proteins. These pathways will be discussed in the context of the role of these receptors in responding to diverse signal molecules, including growth factors, hormones and neurotransmitters, and of the consequences of aberrant signaling for specific human diseases. In addition, the tutorial will discuss how computer-based networks ("neural networks") can be used to learn about the behavior of intracellular signaling pathways.

Outline
The topics to be covered in this course will include:
• General principles of cell communication
• Brief history of major developments
• Extracellular signals and cell-surface receptors
• Signaling modules, adaptors and scaffolding proteins
• Enzyme-linked receptor pathways
• Ion-channel-linked receptor pathways
• G-protein-linked receptor pathways
• Computer-based neural networks
• Aberrant signaling and human disease

Recommended Text
Molecular Biology of the Cell, B. Alberts et al., 4th edition, Garland Science Publishing




Tutorial 2A
Comparative Genomics for Biological Discovery


The explosion of genomic sequence available in public databases has created an unprecedented opportunity for computational analysis of the human genome. A number of promising comparative approaches have been developed for gene finding, regulatory element discovery and other purposes, and it is clear that these tools will play a fundamental role in analyzing the enormous amount of new data currently being generated. Combining computationally intensive comparative approaches with the requirement for whole-genome analysis represents both a challenge and an opportunity for computational scientists. We focus on a few of these challenges, using as examples the problems of alignment, gene finding and regulatory element discovery.

We assume that participants have little or no training in molecular biology. An introduction to the central dogma of molecular biology and other basic concepts will be given. Areas to be discussed:

• Evolution and comparative genomics
• Sequence alignment and its visualization
• Comparative gene finding
• Regulatory regions discovery
• Comparing whole genomes
• Web-based tools

Inna Dubchak, Ph.D. is a staff scientist and bioinformatics group leader at Lawrence Berkeley National Laboratory. She was trained as a physicist and is currently involved in development of various bioinformatics programs and biological databases. Her latest studies in comparative genomics include building user-friendly tools widely accepted in the biological community, and alignment and comparative analysis of complete vertebrate genomes.



Tutorial 2B
Biology for Computer Scientists

This tutorial introduces basic concepts in molecular biology with a special emphasis on gene regulation in metazoans (multicellular animals). The tutorial will be presented as two lectures, each of which will be structured around critical questions being pursued by genome biologists and bioinformaticists. The first lecture will address the question: Is the egg computable? That is, if we know the DNA content of a fertilized egg, to what extent can we predict the developmental outcome? The second lecture will focus on the computability of molecular evolution: If we know the DNA content of several organisms, can we deduce how complexity evolved or infer how it may continue to evolve?

We assume that participants have little or no training in molecular biology. Concepts to be covered include (but are not limited to):

(1) The central dogma: flow of information from DNA to RNA to protein
(2) Gene architecture (regulatory and coding DNA, intron/exon structure)
(3) Transcription factors and their DNA binding motifs
(4) Embryonic patterning through differential gene expression
(5) Morphogens: binary vs. analog signaling
(6) The Hox paradox
(7) Molecular clocks
(8) Homology at the level of genes, proteins, and beyond
(9) Functional genomics
(10) Comparative genomics

Techniques to be covered include (but are not limited to):

DNA microarrays, footprinting, oligonucleotide selection, in situ hybridization, transgenic animal assays, and current web-based tools for sequence analysis.

About the Lecturers

Michele Markstein is at UC Berkeley, completing her Ph.D. in Developmental Genetics from the University of Chicago. Her thesis work focuses on the application of functional genomics to decipher cis-regulatory codes in early development. She is a co-founder of Open Genomics, a group that provides free web-based tools for genome biologists: http://www.opengenomics.org

David N. Keys received his doctorate from the Department of Genetics at the University of Wisconsin at Madison. He recently completed a postdoc at the University of California at Berkeley where he was a campus fellow with the Miller Institute for Research in Basic Science. He is currently a group leader in the Functional Genomics Division at the Joint Genome Institute in Walnut Creek, CA. His research involves large scale experimental screens for DNA with cis-regulatory activity.



Tutorial 2C
Logical Analysis


The Logical Analysis of Data (LAD) is a methodology for extracting information from data, based on combinatorics, optimization, and Boolean algebra. Data points are usually (but not always) assumed to be of two kinds, say positive and negative, and each data point is thought of as a point in n-dimensional real space. LAD's basic tool is the concept of a "pattern", which is interpreted as a subset of the Euclidean space containing data points of only one kind (either positive or negative). LAD extracts from the dataset thousands, or tens of thousands, of maximal (not maximum) patterns, after which it filters this system, ending up with a small subset of patterns (usually not more than a dozen or two), called a "model", which can describe the dataset completely (i.e., each of the given points in the dataset satisfies some of the positive patterns or some of the negative patterns in the model, but not both).
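
The definitions of "pattern" and "model" can be illustrated with a minimal Python sketch. It is not the LAD implementation: it assumes binary data and represents a pattern as a conjunction of required variable values, and the names covers, is_pure_pattern and model_describes are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]
# Assumption: a pattern is a conjunction of literals, stored as
# {variable index: required value}. A point satisfies the pattern
# when it matches every required value.
Pattern = Dict[int, int]


def covers(pattern: Pattern, point: Point) -> bool:
    """Return True if the point satisfies every literal of the pattern."""
    return all(point[i] == v for i, v in pattern.items())


def is_pure_pattern(pattern: Pattern, own: List[Point], other: List[Point]) -> bool:
    """A pattern covers at least one point of its own kind and none of the other kind."""
    return any(covers(pattern, p) for p in own) and not any(
        covers(pattern, p) for p in other
    )


def model_describes(
    pos_patterns: List[Pattern],
    neg_patterns: List[Pattern],
    points: List[Point],
) -> bool:
    """Each point must satisfy some positive or some negative pattern, but not both."""
    for p in points:
        hit_pos = any(covers(pat, p) for pat in pos_patterns)
        hit_neg = any(covers(pat, p) for pat in neg_patterns)
        if hit_pos == hit_neg:  # covered by neither kind, or by both
            return False
    return True


# Toy dataset (invented for illustration): the first variable separates the classes.
positives = [(1, 1, 0), (1, 0, 1)]
negatives = [(0, 1, 0), (0, 0, 1)]

print(is_pure_pattern({0: 1}, positives, negatives))  # pattern "x0 = 1"
print(is_pure_pattern({1: 1}, positives, negatives))  # "x1 = 1" covers both kinds
print(model_describes([{0: 1}], [{0: 0}], positives + negatives))
```

In this toy case a model with one positive and one negative pattern suffices; on real data LAD first generates many candidate patterns and then filters them down to a small model.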

LAD was first proposed in 1986; the first paper on the topic appeared in 1988 in the "Annals of Operations Research", and a first report on its implementation, along with some applications, appeared in "IEEE Transactions on Knowledge and Data Engineering", Volume 12, Number 2, March/April 2000, p. 292-306.

LAD's results can and have been used for classification (including diagnosis and prognosis), for development of decision support systems, for discovering new classes, etc. Examples of problems where LAD has been applied recently include ovarian cancer diagnosis using proteomic datasets, risk stratification among cardiac patients, design of polymer scaffolds for bone regrowth, etc.

The tutorial will start with a general presentation of the methodology of LAD applied to problems with binary data, after which we shall present ways of transforming problems with numerical data into reasonably sized binary problems. Algorithms will be described for finding irredundant sets of variables, for detecting patterns and models, and for constructing a discriminant. Applications to classification, variable analysis, and decision support systems will be outlined. Several biomedical case studies will be presented in detail. The tutorial will include a software demonstration presented by Mr. Sorin Alexe, and will leave ample time for discussions with the participants.
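
As a rough illustration of turning numerical data into binary data, the Python sketch below uses cut points, one common approach. The description above does not say which transformation the tutorial presents, so the method shown here is an assumption, and candidate_cut_points and binarize are hypothetical names. A cut point is placed midway between two consecutive distinct values of a variable whenever the observations at those values do not all belong to the same class, and each cut point becomes one binary variable.

```python
from typing import Dict, List, Sequence, Set, Tuple


def candidate_cut_points(values: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Midpoints between consecutive distinct values that involve more than one class."""
    classes_at: Dict[float, Set[int]] = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    ordered = sorted(classes_at)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        # Skip the gap only if both values belong to one and the same class.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def binarize(values: Sequence[float], cuts: Sequence[float]) -> List[Tuple[int, ...]]:
    """Encode each value as one 0/1 indicator per cut point (1 if value >= cut)."""
    return [tuple(int(v >= c) for c in cuts) for v in values]


# Toy measurements (invented for illustration) with class labels 0 and 1.
measurements = [1.0, 2.0, 3.0, 4.0]
classes = [0, 0, 1, 1]

cuts = candidate_cut_points(measurements, classes)
print(cuts)
print(binarize(measurements, cuts))
```

Keeping only cut points that separate observations of different classes is what keeps the resulting binary problem small: four numerical values here yield a single binary variable.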

Every effort will be made to make the tutorial easily accessible for both computational and biomedical researchers. The tutorial will not require special knowledge in mathematics or computer science, and will be as self-contained as possible.

About the Lecturers:

Peter L. Hammer is a mathematician working at Rutgers University as professor of operations research, mathematics, computer science, management science and information systems, and as director of RUTCOR (Rutgers Center for Operations Research). He received his doctorate in mathematics at the University of Bucharest in 1966. He has received honorary doctorates from the Swiss Federal Institute of Technology, the University of Rome and the University of Liège, and he is the recipient of the Euler Medal of the Institute of Combinatorics and its Applications. He has worked in the area of Boolean functions since the beginning of his career, having coauthored with Sergiu Rudeanu the monograph Boolean Methods in Operations Research (Springer Heidelberg, Berlin, New York, 1968; French translation, Dunod, Paris, 1970), and having published close to 200 papers on different aspects of Boolean function theory. His current research revolves around the use of partially defined Boolean functions in data analysis, with special emphasis on biomedical applications. Peter L. Hammer is the founder and Editor-in-Chief of Discrete Mathematics and Discrete Applied Mathematics (Elsevier), Annals of Operations Research (Kluwer), and SIAM Series of Monographs on Discrete Mathematics and Applications (SIAM).

Mr. Sorin Alexe is the developer of the DATASCOPE software for the Logical Analysis of Data (LAD). His basic background is in mathematics and computer science. After having spent several years in academic and industrial research, he is currently working on a Ph.D. in operations research. His publications range from topics in computational graph theory to data analysis and related areas. Lately, he is heavily involved in the logical analysis of medical data, having introduced a new LAD-based risk index, which has been successfully applied in a collaborative study with cardiologists at the Cleveland Clinic for the stratification of risk among cardiac patients.

Copyright© 2002 IEEE Computer Society Bioinformatics