The following Great Innovative Idea is from Da Yan, an assistant professor in the Department of Computer and Information Sciences (CIS) at the University of Alabama at Birmingham (UAB). Yan presented his poster, Big Data Frameworks: Bridging High Performance of HPC Community with Programming Friendliness of Data Science Community, at the CCC Symposium on Computing Research, October 23-24, 2017.
Our data-intensive graph analytics systems have already been used by peer researchers in work published at first-tier conferences such as SIGMOD and ICDE. We have also applied them to finding cybercriminals in online social networks and forums. We run link-analysis algorithms such as TrustRank, HITS, and SALSA to compute graph-based scores, and then integrate these scores for collective classification. This work is a collaboration with the UAB Center for Information Assurance and Joint Forensics Research (CIA|JFR). Another application is large-scale genome assembly, where contigs (long DNA sequences) are extracted from the de Bruijn graph using the list ranking algorithm on Pregel+.
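The piece does not show how list ranking works, so here is a minimal sketch of the textbook pointer-jumping approach. It simulates synchronous, Pregel-style supersteps in plain Python. It is not Pregel+'s actual code. The function name `list_rank`, the unit weights, and the toy 3-mers are hypothetical. In each superstep, every vertex adds its predecessor's value to its own and then jumps its pointer to its predecessor's predecessor. A chain of n vertices is therefore ranked in O(log n) supersteps. Sorting the vertices of a non-branching de Bruijn path by rank then lets their k-mers be merged into a contig.

```python
from typing import Dict, Optional


def list_rank(pred: Dict[str, Optional[str]], weight: Dict[str, int]) -> Dict[str, int]:
    """Pointer-jumping list ranking, simulated as synchronous supersteps.

    pred[v] is v's predecessor in the chain (None for the head).
    Returns, for each vertex, the sum of weights from the head up to and including v.
    (Hypothetical helper for illustration; not the Pregel+ API.)
    """
    val = dict(weight)
    p = dict(pred)
    while any(q is not None for q in p.values()):
        new_val: Dict[str, int] = {}
        new_p: Dict[str, Optional[str]] = {}
        for v in val:
            q = p[v]
            if q is None:
                # Vertex already knows its final rank; keep it unchanged.
                new_val[v] = val[v]
                new_p[v] = None
            else:
                # Read predecessor state from the *previous* superstep,
                # as a vertex-centric system would via messages.
                new_val[v] = val[v] + val[q]
                new_p[v] = p[q]  # pointer jumping: skip to pred's pred
        val, p = new_val, new_p
    return val


# Hypothetical toy chain of 3-mers from a non-branching de Bruijn path.
kmers = {"x": "ACG", "y": "CGT", "z": "GTA"}
pred = {"x": None, "y": "x", "z": "y"}
rank = list_rank(pred, {v: 1 for v in kmers})

# Order k-mers by rank, then merge consecutive k-mers overlapping by k-1 characters.
order = sorted(kmers, key=rank.get)
contig = kmers[order[0]] + "".join(kmers[v][-1] for v in order[1:])
print(rank, contig)  # {'x': 1, 'y': 2, 'z': 3} ACGTA
```

In a real vertex-centric system, the dictionary lookups `val[q]` and `p[q]` would become request/response messages between vertices. The double-buffered `new_val`/`new_p` dictionaries mimic the barrier between supersteps.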
Our new system, G-thinker, opens more opportunities for graph-based research. My group plans to use G-thinker to find communities involving cybercriminals in online social networks and forums, and to visualize them using force-directed layout algorithms. This would help computer forensics experts at UAB CIA|JFR identify new cybercriminals. G-thinker will also enable a wide range of research involving compute-intensive graph analytics that was previously infeasible.
My group works on scalable systems and algorithms for Big Data analytics, and on applying them in real and often interdisciplinary projects. In addition to graph data, we also process geospatial data and matrix/tensor data. Existing systems such as Apache SystemML perform matrix computations on top of data-intensive frameworks like Hadoop MapReduce and Spark; we instead plan to design a dedicated platform for compute-intensive matrix/tensor computations. Target applications of our systems and algorithms include, but are not limited to, digital forensics and genomic and medical data analysis.