Wide-Area Data Analytics
Modern datasets are often distributed across many locations. In some cases, datasets are naturally distributed because they are collected from multiple locations, such as sensors spread throughout a geographic region. In other cases, datasets are distributed across different data centers to improve scalability or reliability, or to reduce cost. These distributed locations can be a mix of public clouds, private data centers, and edge computing sites. How should we analyze data collected or stored at multiple far-flung locations? The simplest solution would be to backhaul all data to a single location for analysis, but this approach may introduce excessive overhead, delay, or both. Yet analyzing data in a fully distributed fashion can also be expensive, especially when the analysis task needs to combine data from different locations or when the distributed sites have limited computation, storage, or energy. Deciding where and how to analyze the data becomes even more challenging when the available resources (such as network bandwidth) vary over time, and when the system must strike a trade-off between the overhead of answering a query and the accuracy of the results.
Several parts of the computer science research community are exploring how to perform wide-area data analysis, including researchers and practitioners in the database, networking, distributed systems, and storage fields. These communities often focus on different aspects of the problem, consider different applications and use cases, and design and evaluate their solutions differently. To bridge this gap between communities, the CCC Systems and Architecture task force will hold a visioning workshop on wide-area data analytics. The workshop will bring together researchers and practitioners from these fields to articulate a unified set of research challenges and create opportunities for interdisciplinary collaboration.
Additionally, the workshop will help build a stronger foundation for a broader view of “systems” as a common core research area in computer science, rather than as a set of separate research communities. Increasingly, practical systems problems span several areas of systems research rather than falling squarely within one. We believe that research and education in these fields will benefit broadly from efforts to work across traditional boundaries.
Example topics for the workshop include:
- Use cases for wide-area data analytics, including public and private clouds, analysis across multiple data centers and cloud providers, video analytics, Internet of Things, distributed network monitoring, social networking applications, and more.
- Query languages and stream processing models that balance expressiveness, accuracy, and the ability to support distributed query execution.
- Machine-learning techniques for predicting the future availability of system resources (e.g., storage, compute, and network) and the requirements for answering queries.
- Techniques for in-network collection and analysis of data, including middleboxes and programmable network devices.
- Consistency models for distributed analysis and replication of data.
- Techniques for distributed data analytics in the wide area, within a data center, within a single cluster, and across multiple levels of distribution.
- Real-time, streaming analytics applications and platforms.
- Protecting privacy by analyzing data in place.
The Computing Community Consortium (CCC) will cover travel expenses for all participants who request it. Participants are asked to make their own travel arrangements to get to the workshop, including purchasing airline tickets. Following the workshop, CCC will circulate a reimbursement form that participants will need to complete and submit, along with copies of receipts for amounts exceeding $75.
In general, standard Federal travel policies apply: CCC will reimburse non-refundable economy airfare on U.S. flag carriers, and alcohol will not be covered.
For more information, please see the Guidelines for Participant Reimbursements from CCC.
Additional questions about the reimbursement policy should be directed to Khari Douglas, (kdouglas [at] cra.org).