Morning Session
AM1 | Demonstration Projects in Clinical Informatics
AM2 | Introduction to Factor Graphs and the Sum-Product Algorithm: Applications to Genome Tiling Microarrays and Gene Interaction Networks
AM3 | Novel Visualization and Quantitative Analysis Methods in BioImaging
AM4 | RNA-interference: the Short and Long of It
AM5 | Computational Methods in MS-based Proteomics
Afternoon Session
PM6 | Introduction to the Semantic Web for Bioinformatics
PM7 | Structure Based Methods for Identifying Protein Function
PM8 | Pattern Discovery in Sequences and Structures
PM9 | Organizing and Understanding the Biological Data Deluge through Phylogenetics
PM10 | Combinatorial and Statistical Approaches to Analyzing Biological Networks
AM1
Demonstration Projects in Clinical Informatics
Carol Cain, PhD
Agency for Healthcare Research and Quality
Expected goals and objectives:
Tutorial participants will gain a broad overview of current issues in applied clinical informatics, ranging from focused
efforts such as innovative decision support systems to large-scale collaborative efforts to build regional health
information exchanges. In addition, we will discuss some of the environmental drivers of such efforts, including new
reimbursement models, legislation, regulation, and organizational change.
Intended audience:
This introductory tutorial is intended for a general audience of informatics researchers who would like to become
familiar with the landscape of clinical informatics, particularly from a federal perspective. We will begin with
the current environment and pressures that healthcare organizations face, such as the Medicare Modernization Act of 2003.
The tutorial will then describe clinical informatics projects across America. These include clinical decision support
systems, bringing appropriate information to the point of care, patient-centered interventions, maintaining information,
and the needs of non-traditional settings such as rural clinics. We conclude with issues which arise when linking large
regional networks into a data exchange, including challenges of data matching, privacy, and standardization.
Although these projects vary widely in scope and content, the tutorial will be structured around research themes which
arise when conducting applied research projects in highly complex environments. We will discuss challenges of study design,
randomization, and evaluation. What are the appropriate metrics of success, and how should they be measured? How can
the quality of data be verified? How do organizational factors influence subjects’ interactions with technology? How
would future bioinformatics activities fit in? And finally, what are the open questions in clinical informatics where
research activity is needed?
Carol Cain is a PhD graduate of Stanford’s biomedical informatics program. Her research interests include
computational simulation of medical workflow, decision-theoretic cost-effectiveness analysis, and the impact of new
technology on organizations. She is currently a health IT portfolio manager at the Agency for Healthcare Research and
Quality (AHRQ), overseeing projects that demonstrate the value of health IT. She has served as a graduate TA, an ESL
teacher, and frequently gives presentations as a representative of AHRQ.
AM2
Introduction to Factor Graphs and the Sum-Product Algorithm: Applications to Genome Tiling
Microarrays and Gene Interaction Networks
Brendan J. Frey, PhD
Associate Professor, University of Toronto
While dynamic programming has proved to be a powerful tool in the analysis
of low-complexity sequence data, many of the most compelling problems in
molecular biology involve large numbers of long-range interactions.
Recently, a generalization of dynamic programming, called the sum-product
algorithm, has been used to solve long-standing, fundamental
information processing problems, including Shannon-limit coding on the
Gaussian channel and random satisfiability. The sum-product algorithm
works by passing messages on edges in a "factor graph," which represents
the potential interactions in the system of interest. Other graph-based models, including
Bayesian networks and Markov random fields, can be represented and learned
more efficiently using factor graphs.
In this tutorial, I
will review factor graphs and the sum-product algorithm, and describe in
detail how this method has been used to achieve leading-edge results on
genome tiling microarray analysis and inference of biomolecular
interaction networks. These two problems exemplify a common, difficult
challenge in molecular biology: Revealing hidden variables that explain
observed data.
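As a rough illustration of the message passing described above, the following Python sketch runs the sum-product algorithm on a small chain-structured factor graph. The three binary variables, the factor tables, and the function names are all assumptions made for this example; none of them comes from the tutorial itself. Because a chain has no cycles, the marginals computed from the messages agree with those obtained by brute-force enumeration.

```python
from itertools import product

# Hypothetical toy model: three binary variables in a chain x0 - x1 - x2.
# unary[i][s] is a single-variable factor on x_i = s.
unary = [
    [0.6, 0.4],
    [0.5, 0.5],
    [0.2, 0.8],
]
# pairwise[i][a][b] is the factor linking x_i = a and x_{i+1} = b.
pairwise = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.7, 0.3], [0.3, 0.7]],
]


def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def sum_product_chain(unary, pairwise):
    """Compute single-variable marginals by passing messages along the chain."""
    n, k = len(unary), len(unary[0])
    # fwd[i][b]: message reaching x_i from everything to its left.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(
                fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b]
                for a in range(k)
            )
    # bwd[i][a]: message reaching x_i from everything to its right.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(
                pairwise[i][a][b] * unary[i + 1][b] * bwd[i + 1][b]
                for b in range(k)
            )
    # A marginal is the product of the local factor and both incoming messages.
    return [
        normalize([fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)])
        for i in range(n)
    ]


def brute_force_marginals(unary, pairwise):
    """Reference answer: sum the product of all factors over every joint state."""
    n, k = len(unary), len(unary[0])
    marginals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            marginals[i][s] += weight
    return [normalize(m) for m in marginals]


if __name__ == "__main__":
    for i, m in enumerate(sum_product_chain(unary, pairwise)):
        print(f"P(x{i}) = {m}")
```

The message-passing version touches each edge twice, whereas the brute-force check grows exponentially with the number of variables; that gap is what makes the approach attractive for larger graphs. On graphs with cycles the same updates can still be iterated, but the result is then an approximation rather than an exact marginal.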
AM3
Novel Visualization and Quantitative Analysis Methods in BioImaging
David Knowles, PhD
Scientist, Life Sciences Division, Lawrence Berkeley National Laboratory
Recent developments in and increased availability of fluorescence microscopy are providing
biologists with powerful new tools for studying cellular and macromolecular events under a new light. These
new technologies beckon for novel developments in visualization and quantitative image analysis to aid in the
extraction of the information hidden in the enormous amounts of high resolution, three-dimensional data generated.
The goal of this tutorial is to present novel visualization and image analysis techniques currently being developed
for two ongoing multidisciplinary projects in the Life Sciences Division of the Lawrence Berkeley National Laboratory.
The tutorial will provide an essential refresher on the underlying physical optics which link the physical and frequency
domains and set the theoretical limit to image fidelity. It will cover recent developments in fluorescence microscopy
including new fluorescence probes (QDots, nanoparticles & GFPs), and confocal techniques including 2-photon excitation,
emission spectral analysis (Zeiss Meta Device), spinning disk techniques (Yokogawa’s CSU-10), line scanning acquisition
(Zeiss LSM 5 Live), and Wilson’s grating imager (Zeiss ApoTome). The tutorial will then focus on a range of novel image
analysis techniques. Techniques which provide automated segmentation of cells and nuclei from 3D images of tissue structures
and entire Drosophila embryos will be presented. Visualization techniques which are essential for qualitative understanding of
the 3D data sets and the quantitative evaluation and development of segmentation techniques and feature extraction techniques
will be presented. Model-based feature extraction methods which allow the quantification of gene expression and the spatial
distribution of nuclear components will be presented. Shape-context registration techniques which allow multiple embryo images
to be registered and overlaid will be presented.
The tutorial will conclude by presenting some of the latest biological findings resulting from the application of these
techniques. Recent accomplishments that have shed light on how the nuclear organization within breast epithelia changes
during the nonmalignant-to-malignant transformation, and on how gene expression analysis at cellular resolution is untangling
the early transcriptional network of Drosophila, will be presented.
Dr. Knowles’ Teaching Experience:
- Lecturer, from 1987 to 1989, in the Pathology Department at the University of British Columbia, of a third year
level course in microscopy (Pathology 305, session 1). The course involved 2 hours of lectures and one laboratory
period per week. Dr. Knowles was responsible for the course syllabus, the introduction of a laboratory section to the
course, writing the exams and evaluation of the students.
- Sessional Lecturer, 1992, in the Physics Department at the University of British Columbia, of a first year level course
(Physics 110). The course involved 2 hours of lectures and 1 hour of tutorial per day, and 2 laboratory periods and 1 exam
per week. Dr. Knowles was responsible for writing the lectures and exams, incorporating numerous demonstrations during the
lectures, instructing and overseeing the laboratory sessions, and for all student marking and assessment.
- Teaching Assistant, from 1987 to 1992, in the Physics Department at the University of British Columbia in teaching
laboratories and tutorials for a variety of courses including, 1st year physics (Physics 110 and 115), 3rd year optics
laboratory (Physics 307) and 4th year continuum mechanics (Physics 406).
- Physics Instructor, from October 1993 to December 1993, on a temporary appointment at the University College of the Cariboo
in Kamloops, B.C. Dr. Knowles taught three degree-level courses in physics (4th year Atomic physics, 3rd year Electricity and Magnetism
and 3rd year Optics laboratory) as well as a first year laboratory. He was responsible for writing all the lectures,
incorporating demonstrations during lectures, instruction during laboratory sessions and all student marking and assessment.
- Student Statements from the 1992 Physics 110 Teacher Evaluation, University of British Columbia:
"David is very energetic and sophisticated and manages to synthesize Physics 110 into an exciting course. His demonstrations,
standing on tables, bouncing off the chalk boards and moving to create effect were particularly useful and stimulating."
"I was extremely happy with the way this course was taught. Finally, physics makes sense and it was fun and interesting."
- Student Statements from the 1993 Physics Teacher Evaluation, University College of the Cariboo:
"Dave is very energetic, precise, respectable and honest. He explains what is necessary to be done to succeed and offers
all assistance to his students."
"David relates very well to people - talks to people at their level. He also shows the relations between various branches of physics."
"David is an all round excellent instructor."
AM4
RNA-interference: the Short and Long of It
Michele Markstein, PhD
Postdoctoral Fellow, Department of Genetics/Howard Hughes Medical Institute,
Harvard Medical School
RNA interference (RNAi) is becoming one of the most widely used methods in both academia and the
biotech industry to dissect the roles of every gene in the human, mouse, fly, and worm genomes. In short, RNAi works
by preventing the flow of genetic information from messenger RNA (mRNA) to protein. The RNAi process is initiated
by double-stranded RNAs (dsRNAs), which are recognized by the cell, complexed with proteins and cleaved into short
RNAs on the order of 20 basepairs (bp). The 20 bp stretches of dsRNA are then unwound into single-stranded RNAs
that target specific cellular mRNAs by the pattern recognition process of complementary basepairing. dsRNAs have
been designed against each gene in several animal genomes and used to study what happens when the expression of
each corresponding gene is knocked down by the process of RNAi. Such whole genome RNAi approaches have been applied
to many biological problems including studies of cancer, neurodegeneration, cell death, signaling networks, immunology,
and the RNAi process itself.
While whole genome RNAi studies are very promising and powerful, there is an underlying computational problem in
designing dsRNAs with specificity against single genes. As explained above, the recognition process involves basepairing
between stretches of RNA on the order of about 20 basepairs. However, as with many biological processes, the pattern
recognition process of RNAi is relaxed and allows for mismatches. Thus, many designed dsRNAs turn out to have several
off-targets, making it impossible to interpret the outcomes of experiments with these dsRNAs. Although the pattern recognition
rules of RNAi are not fully understood, there may be some clues in a class of small RNAs called microRNAs, which the cell
produces to negatively regulate its own signaling networks. microRNAs are single-stranded RNAs that fold back on themselves
to create dsRNAs which then become incorporated into the RNAi pathway to terminate the expression of specific genes. By
matching microRNAs with their targets, it is becoming possible to discern some of the finer rules of pattern recognition.
This tutorial will provide an overview of RNA biology and will focus on active research in: (1) predicting microRNAs and their
targets, (2) applications of whole-genome RNAi in academia and industry, and (3) methods for designing highly specific dsRNA libraries.
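To make the specificity problem concrete, the following Python sketch scans an mRNA sequence for sites that pair with a short guide RNA while tolerating a fixed number of mismatches. It is only a simplified model: the sequences, the function names, and the plain mismatch-count rule are assumptions made for this example, and, as noted above, the real pattern recognition rules of RNAi are not fully understood.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs perfectly with `rna` (antiparallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_sites(guide, mrna, max_mismatches):
    """List (start, mismatches) for every mRNA window the guide could pair with.

    A window counts as a site when it differs from the guide's perfect
    target in at most `max_mismatches` positions. This counting rule is a
    deliberate simplification, not an established model of RNAi.
    """
    target = reverse_complement(guide)
    hits = []
    for start in range(len(mrna) - len(target) + 1):
        window = mrna[start:start + len(target)]
        mismatches = sum(1 for a, b in zip(window, target) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical 20-nucleotide guide and two made-up transcripts.
    guide = "AUGGCUAGCUAGGCUAACGU"
    perfect_site = reverse_complement(guide)

    # The off-target transcript carries the same site with two altered bases.
    altered = list(perfect_site)
    altered[3] = "A" if altered[3] != "A" else "C"
    altered[10] = "G" if altered[10] != "G" else "C"

    transcripts = {
        "intended": "GG" + perfect_site + "AA",
        "off_target": "CC" + "".join(altered) + "UU",
    }
    for name, mrna in transcripts.items():
        print(name, "strict:", find_sites(guide, mrna, 0),
              "relaxed:", find_sites(guide, mrna, 2))
```

Under the strict setting only the intended transcript is reported, while the relaxed setting also reports the altered site. A library design tool built on this idea would discard any candidate dsRNA whose derived guides hit more than one gene under the relaxed setting.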
Intended Audience:
This is an introductory RNA tutorial. Participants should be familiar with basic biological principles, such as how
information flows from DNA to RNA to protein, and how DNA and RNA molecules recognize each other by complementary
basepairing. This background can be reviewed at the NCBI science primer website:
http://www.ncbi.nlm.nih.gov/About/primer/genetics_genome.html.
Michele Markstein is a postdoctoral fellow in the laboratory of Dr. Norbert Perrimon at Harvard Medical School.
Michele is using the whole-genome RNAi facility
(http://flyrnai.org/) pioneered by the Perrimon lab to study a class of DNA
regulatory elements called insulators, with the ultimate goal of understanding how genomes become partitioned into specific
units of gene activity. The RNAi facility allows for high-throughput RNAi screening in Drosophila tissue culture cells. While
this proves to be a highly efficient and effective means for sorting out gene functions, it is limited by the somewhat artificial
biology of cultured cells. It would therefore be useful to also apply RNAi to cells within their natural context, in whole living
organisms. Toward this end, Michele is developing a new technology in whole living fruit flies, to systematically knock down the
expression of every gene by RNAi.
AM5
Computational Methods in MS-based Proteomics
Bobbie-Jo Webb-Robertson, PhD
Senior Research Scientist, Pacific Northwest National Laboratory
In recent years, advances in high-throughput (HTP) technologies and
platforms for data-intensive computing have yielded more and more
completed genomes. While the genome remains largely unchanged, the
proteins in any particular cell change dramatically as genes are turned
off and on in response to its environment. As proteins provide the
structural and functional framework for cellular life, understanding the
dynamic nature of their expression and interaction is a necessity in
order to attain a comprehensive representation of biological systems.
Traditionally, proteomics used two-dimensional polyacrylamide gel
electrophoresis to generate protein maps of expression and/or
quantitation. However, this approach only focuses on a small number of
proteins and is low-throughput. Currently, technologies employing mass
spectrometry (MS) have revolutionized proteomics by offering a platform
on which to make HTP measurements that increase sensitivity and
specificity at a global scale. This approach theoretically allows the
full proteome (all proteins expressed by the genome at a given time) to
be measured concurrently. Thus, this technology is fueling the current
revolution in proteomics that is advancing the scope of biological
research from a simple biochemical analysis of a protein to the
characterization of the expression, function, and interaction of
proteins on a global scale. However, this new HTP era of proteomics
requires computational methods that can make inferences from both the
raw and processed experimental data sources.
PM6
Introduction to the Semantic Web for Bioinformatics
Kenneth Baclawski, PhD
Associate Professor of Computer Science, College of Computer and Information Science, Northeastern University
Biologists heavily use the web, but the web is geared much more toward
human interaction than automated processing. While the web gives
biologists access to information, it does not allow them to easily
integrate different data sources or to incorporate additional analysis
tools. The Semantic Web addresses these problems by annotating web
resources and by providing reasoning and retrieval facilities from
heterogeneous sources.
This tutorial introduces the basic languages of the Semantic Web from
the point of view of the life sciences, especially bioinformatics.
The objective is to cover the major web ontology languages, what they
mean and how they are used. The emphasis will be on pragmatic
application issues. The goal is for participants to have an
understanding of the Semantic Web sufficient for them to be able to
make decisions about whether and how to use the Semantic Web.
Ken Baclawski is an Associate Professor of Computer Science at Northeastern University. He is also affiliated with the
Division of Preventive Medicine of Brigham and Women’s Hospital at the Harvard Medical School. His primary research area
is formal ontologies, and he has been actively working in the area of biomedical ontologies since 1992. Prof. Baclawski
has been active in the development of the Semantic Web since it was first proposed, being part of the team that developed
the DAML+OIL language, later renamed the Web Ontology Language (OWL).
Prof. Baclawski and Prof. Tianhua Niu of the Harvard Medical School have written a book on the subject of the proposed tutorial,
titled Ontologies for Bioinformatics. This book has been accepted for publication by the MIT Press as part of their series
on Computational Molecular Biology. The book is scheduled to appear in June, 2005.
PM7
Structure Based Methods for Identifying Protein Function
Mike Liang, PhD candidate
Biomedical Informatics Training Program, Stanford University
D. Rey Banatao, PhD
NSF Postdoctoral Fellow, Department of Chemistry and Biochemistry, University of California, Los Angeles
Atomic resolution structures of biomolecules (proteins and nucleic acids) provide great
insight into the chemistry that allows proteins
to function and interact. Structural genomics initiatives are aimed
at the high-throughput determination of 3D protein structures.
Increasingly available 3D data can provide significant insight in
determining the molecular function of proteins as well as
identifying important functional sites that could be useful for drug
targeting. This large volume of 3D structures requires new
computational methods to provide rapid analysis and functional
annotation of the data.
This tutorial will provide the following background:
- Brief review of basic principles in 3D structure of proteins
- Brief overview of 3D structure and function data sources
- Brief presentation on 3D molecular visualization tools for both web-based and off-line analysis
The majority of the tutorial will focus on methods for inferring protein
functional sites from 3D structure including those based on:
- distance
- orientation
- surface geometry
- physicochemical properties
Participants will leave with a solid understanding of the basic concepts of
3D protein structures, the available data sources for structure-function
analysis, the tools available for analysis, and the basic principles behind the tools.
Mike Liang is a Ph.D. candidate in the Biomedical Informatics Program at
Stanford University. His research interests lie primarily in annotating
likely functional sites in protein structures. His current research is
on automatic identification of conserved physicochemical properties
around functional sites in 3D structures of proteins
(http://feature.stanford.edu/). Liang received his B.S. in Computer
Science with a minor in Chemistry from University of California, San Diego.
Dr. Rey Banatao is an NSF Postdoctoral Fellow in the Yeates Lab in
the Department of Chemistry and Biochemistry and the California
Nanosystems Institute at the University of California Los Angeles.
His research interests are in protein design using computational and
experimental methods with possible applications in biomaterials and
nanotechnology. Dr. Banatao received his B.A. in Biochemistry and Molecular Biology from U.C. Berkeley and his Ph.D. in Biological and
Medical Informatics from U.C. San Francisco.
PM8
Pattern Discovery in Sequences and Structures
Giri Narasimhan, PhD
Bioinformatics Research Group (BioRG), Florida International University
Many fundamental problems in bioinformatics can be cast as a problem of pattern discovery. Pattern
discovery can be supervised or unsupervised. Here we will survey existing techniques for pattern discovery and discuss
several bioinformatics applications. The techniques to be discussed will include:
- Basic string algorithms
- Profiles and profile HMMs
- Gibbs Sampling
- Combinatorial approaches
- Data mining approaches
Applications include:
- Motif discovery in proteins
- Detecting regulatory elements in DNA sequences
Since supervised pattern discovery requires the design of a training set, we will discuss implications
of the choice of training sets, both positive and negative. Finally, we will discuss approaches to do pattern discovery
in protein structures and the concept of sequence-structure patterns.
Giri Narasimhan heads the Bioinformatics Research Group (BioRG) and is a Professor in the School of Computer Science at
Florida International University, Miami. He received his B-Tech in Electrical Engineering from the Indian Institute of
Technology in Bombay, India, and his PhD in 1989 from the University of Wisconsin – Madison. From 1989 to 2001 he held a
faculty position at the University of Memphis, Tennessee. He has written over 60 research articles. His research interests
include geometric and graph algorithms, and problems in machine learning, biotechnology and bioinformatics. For more
information, visit the URL:
http://biorg.cs.fiu.edu
PM9
Organizing and Understanding the Biological Data Deluge through Phylogenetics
Indra Neil Sarkar, PhD
Bioinformatics Associate, Division of Invertebrate Zoology, American Museum of Natural History
Advancements in computational and sequencing techniques have led to the availability of massive
amounts of biological information pertaining to organisms that span the entire tree of life. However, the organization,
representation, and annotation of these volumes of data (available from disparate resources in a wide range of forms)
pose a significant challenge for the research community. This tutorial will discuss the various phylogenetic methods
that are used for the organization and annotation of biological information. Additionally, there will be discussion
about how to use phylogenetic techniques to organize a range of disparate data types ranging from genotypic to phenotypic
information. There will be an overview of many of the available data types that can be used for phylogenetic inferencing.
Finally, applied phylogenetic approaches, such as correlative hypothesis generation and heuristic character-based approaches
for phylogenetic classification, will be discussed. The significant challenges in the design and use of phylogenetic methods
will be an underlying theme, posing rich theoretical research questions throughout the tutorial.
PM10
Combinatorial and Statistical Approaches to Analyzing Biological Networks
Eric Xing, PhD
Assistant Professor, School of Computer Science, Carnegie Mellon University
Roded Sharan, PhD
Senior Lecturer, School of Computer Science, Tel-Aviv University
High-throughput technologies enable the systematic assaying
of transcript and protein abundance, physical, regulatory and genetic interactions among proteins, and the biochemical, morphological
and epigenetic states of the cell. These measurements promise
detailed mechanistic pictures of complex cellular processes, challenging conventional biostatistical and computational methods
for comprehending, manipulating, and querying such a vast body of data
from diverse sources. The multi-aspect, genome-wide data of biological signals underlying regulatory and signaling circuitry
can be naturally modeled by a graph, or a network. Rich information regarding the dependencies, interactions, function and conservation of
bio-molecules can be extracted from such data based on combinatorial and graph theoretic analyses. Furthermore,
recent developments of graphical models---a formalism that exploits the conjoined talents of graph theory and probability theory---provide a
powerful language to define expressive distributions of the data, and a systematic computational framework for probabilistic inference.
In this tutorial we will review the emerging field of network biology and survey recent graph-theoretic and
statistical machine learning approaches to dissecting protein networks and microarray data, including graph detection
algorithms, inference and learning algorithms for Bayesian networks and Markov random fields, and techniques for data integration.
We will demonstrate the application of these methods to analyzing protein-protein interaction networks and transcriptional regulatory networks.
Intended audience:
Researchers in computational biology, systems biology, sequence
analysis, machine learning, combinatorial optimization and Bayesian
statistics. A graduate level knowledge of computer algorithms, and
probability/statistical theory would be helpful but not
required for most of the material to be covered.