By Rachel Pottinger
Increasingly, jobs rely on the ability to use computers to interpret, understand, and trust data. For example, my students and I have worked with ornithologists who cannot understand the representations of their bird sightings, civil engineers who cannot easily use their own building data, finance experts who cannot trace money between companies and their subsidiaries, and an XML document company whose clients cannot understand data that appears outside of their reports. In each case, the data users have been hampered because their data is exceedingly difficult to understand and trust, even though the users are experts in their fields. One reason for this difficulty is that the organization of the data is often designed for computers, not for people (i.e., for storage, not accessibility). Another reason is that data often comes from different sources, leaving users with the challenge of integrating data that they neither understand nor trust.
Lack of understanding and trust can cause problems both large and small. For example, the 2008 financial crisis was partially caused by the inability to track money across multiple sources (e.g., mutual funds are in a different database than savings accounts). Because it was impossible to understand and trust the data that was spread across various sources, regulators did not realize some of the problematic flows of money until after the crash. On a smaller scale, engineers who cannot understand their data well enough to find what they need to evaluate design choices cannot easily explore building design alternatives, which stifles innovation and leads to less-efficient buildings.
The goal of my research is to help users understand, find, and trust the data that they need. To that end, my students and I are currently exploring a number of topics, including:
- Making sense of data that is stored in relational databases or XML is difficult. For example, if civil engineers are trying to extract information about where two pieces of a building intersect, they may need to find 10 different elements in a schema that contains thousands of options. Our project seeks to allow users to understand their schemas well enough to query them.
- In many analyses, a user runs an aggregation query for which she already knows what the correct answer should be for one case. Determining why the answer she gets differs from the expected one provided by the “Oracle” is a frustrating and error-prone process.
A hallmark of my research is its focus on interdisciplinary work and real data. This focus ensures that my research is relevant both to data management and to areas outside of computer science. In addition to publishing in top data management venues, I have worked with data from many different sources, which has resulted in publications in civil engineering and bioinformatics venues and a financial data workshop.
This interest in reaching out from core computer science to other disciplines has been a focus in all of the most meaningful areas of my career, including conducting interdisciplinary research, striving to increase diversity, creating a computational thinking course for first-year non-computer science majors, and defining undergraduate strategy in my role as an associate department head.
This perspective is why I am excited about being on the CRA board: computer science is at a crossroads. Our enrollments are increasing exponentially, but many people define computer science very narrowly. If we are not careful, we will lose the ability to shape how computer science research evolves outside of core computer science.
About the Author
Rachel Pottinger is the associate head of the undergraduate program and an associate professor in computer science at the University of British Columbia. She is also a board member of the Computing Research Association. She received her Ph.D. in computer science from the University of Washington in 2004. Her main research interest is data management, particularly semantic data integration, metadata management, management of data that is currently not well supported by databases, and ways to make data easier to understand and explore. She won the Anita Borg Institute’s Denice Denton Emerging Leader Award in 2007.