This year, CRA Board Chair Susan Davidson received the IEEE TCDE Impact Award for “expanding the reach of data engineering within scientific disciplines.” In this blog post, Shar asks her how her career led to this award.
Shar: How did your interest in bioinformatics come about?
Susan: With a father who was an applied mathematician and a mother who was a plant scientist, both of whom were professors at Cornell University, I am truly a product of my environment! I studied mathematics at Cornell as an undergraduate, and after receiving my Ph.D. in computer science at Princeton University, I became a faculty member at the University of Pennsylvania. There I met Chris Overton, who had a Ph.D. in developmental biology and came to Penn to obtain a Master’s in computer science because he believed that the future of biology was computational. He was quite a visionary for the time!
After Chris was hired to head up the informatics component of the Center for Human Chromosome 22 in the 1990s, we frequently discussed the challenges he faced. This became a rich vein of research problems that the Database Group at Penn has worked on for over two decades.
Shar: What are some of the research problems you have worked on in bioinformatics?
Susan: Two of my favorite problems have been data integration and data provenance.
Data integration systems in the 1990s focused on relational databases. However, most data generated within the Human Genome Project were stored in specialized file formats with programmatic interfaces. This led experts to state in a report of the 1993 Invitational DOE Workshop on Genome Informatics that “Until a fully relationalized sequence database is available, none of the queries in this appendix can be answered.” We were able to answer the “unanswerable queries” within about a month using our data integration system, Kleisli, which used a complex-object model of data, a language based on a comprehension syntax, and optimizations that went beyond relational systems. Our team also included experts who knew where the appropriate data sources were and how to use them to answer the queries, a problem that has not been as well addressed by computational techniques.
The challenge of data provenance arose in the context of data integration. Not all data sources were equally trusted, but no one wanted to express this opinion by failing to include a relevant data set. The solution was to make provenance available so that users could form their own conclusions. Within the Database Group at Penn we have studied the problem of data provenance within the context of databases (fine-grained provenance, where reasoning is at the level of algebraic operators) and workflows (coarse-grained provenance, where reasoning is at the level of what is seen between “black box” processing steps). Since then, the importance of provenance has been widely recognized, especially as it relates to reproducibility and debugging in scientific applications.
Shar: What do you like best about working in this interdisciplinary area?
Susan: I really enjoy working with scientists in other fields, since they have very different vocabularies, cultures, needs and perspectives on problems. Navigating these differences can sometimes be challenging, especially trading off short-term computational needs versus long-term research ideas, but these collaborations have almost always led to really interesting research challenges. And it is especially gratifying to see these ideas being used in practice!
About the Awardee
Susan B. Davidson received the B.A. degree in mathematics from Cornell University in 1978, and the M.A. and Ph.D. degrees in electrical engineering and computer science from Princeton University in 1980 and 1982, respectively. Davidson is the Weiss Professor of Computer and Information Science (CIS) at the University of Pennsylvania, where she has been since 1982, and currently serves as chair of the board of the Computing Research Association.
Davidson’s research interests include database and web-based systems, scientific data management, provenance, crowdsourcing, and data citation. She was the founding co-director of the Penn Center for Bioinformatics from 1997 to 2003, and the founding co-director of the Greater Philadelphia Bioinformatics Alliance. She served as Deputy Dean of the School of Engineering and Applied Science from 2005 to 2007 and Chair of CIS from 2008 to 2013. She is an ACM Fellow, Corresponding Fellow of the Royal Society of Edinburgh, and received a Fulbright Scholarship and Hitachi Chair in 2004. Her awards include the 2017 IEEE TCDE Impact Award and the 2015 Trustees’ Council of Penn Women/Provost Award for her work on advancing women in engineering.