ASTERIX: A Highly Scalable Parallel Platform for Semi-structured Data Management and Analysis
Hyracks-to-Hadoop Compatibility Layer
Given that many data analysts are adopting the Hadoop platform, we believe that ASTERIX must provide an easy migration path for existing Hadoop projects in order to attract new users and support clusters running a mix of old and new use cases. In that spirit, we have built a Hadoop compatibility layer on top of Hyracks so that existing Hadoop programs can be executed using Hyracks. If you are a Hadoop user, please check out this aspect of the Hyracks project if you would like to speed up your job execution in a low-cost and seamless fashion.
ASTERIX Query Processing Engine
The growing popularity of Hive and Pig for parallel data analysis shows the importance of high-level data languages: they can greatly reduce development time and make data analysts' lives much easier. We are developing the ASTERIX query processor on top of the Hyracks runtime. This includes the AQL (ASTERIX Query Language) compiler, algebra, and optimizer. AQL queries are compiled to cost-efficient Hyracks jobs. If you want to analyze large-scale semi-structured data in parallel, plan to try AQL when it becomes available.
HiveQL Relational Query Processor Plug-in
Given the data-model-agnostic ASTERIX algebra layer, we are able to easily layer a relational query processor such as Hive on top of the Hyracks runtime. In this project, Hive runtime plans are translated to ASTERIX algebra plans, but all functions, expression evaluators, metadata, intermediate data formats, and input/output formats in Hive are reused. If you are a Hive user, please check out this project as a way to get better performance without any changes to your HiveQL queries.
This brand-new project aims to build an event warehouse that combines traditional information, such as map data, business listings, scheduled events, population data, and traffic data, with additional dynamic information such as online news stories, blogs, geo-coded or geo-tagged tweets, status updates, wall posts, and geo-coded or geo-tagged photos. It is being developed by the UCI multimedia research group and uses ASTERIX with Hyracks as the runtime execution engine.
Acknowledgement: This project is supported by an eBay matching grant, one Facebook Fellowship Award, NSF Awards No. IIS-0910989 and IIS-0910859, a UC Discovery grant, three Yahoo! Key Scientific Challenge Awards, and generous industrial gifts from Google, HTC, Microsoft, and Oracle Labs.
For any questions regarding this project, please send email to asterix AT ics.uci.edu.