
3ba39375769a242f52566505927e5037.ppt
- Number of slides: 13
IBM Research: A Brief Overview of Hadoop Eco-System (© 2007 IBM Corporation)

IBM Research | India Research Lab
Hive
- SQL-like language to query data stored on HDFS
- Example: "SELECT c.ID, c.Name, c.AGE, o.Amount FROM Customers c JOIN Orders o ON (c.ID = o.CUSTOMER)"
- Data model
  - Tables with typed columns (int, float, string, date, boolean)
  - Supports array / map / struct for JSON-like data
- Meta-store
  - Namespace containing the set of tables, the list of columns and their types, and SerDe info
- CLI
- Other languages: Jaql, Pig

HBase
- Hadoop performs only batch processing, and data is accessed only sequentially
- One has to search the entire dataset even for the simplest of jobs
- HBase provides random read/write access to data in HDFS
- Data model
  - A table is a collection of rows
  - A row is a collection of column families
  - A column family is a collection of columns
  - A column is a collection of key-value pairs

HBase
- Reading: Get and Scan. A reader always sees the last written values
- Rows are ordered
- HBase is not an SQL database: it is not relational and has no joins or secondary indices
- Horizontally scalable
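
The data model and the Get/Scan reads listed above can be pictured as nested sorted maps. What follows is a minimal sketch in plain Java, not the HBase client API: the class `TinyTable` and its methods `put`, `get`, and `scan` are hypothetical names chosen for illustration, and cell versions/timestamps are left out.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of an HBase-style table (illustration only,
// not the HBase client API): row key -> column family -> column -> value.
public class TinyTable {
    // TreeMap keeps row keys sorted, mirroring "rows are ordered".
    private final NavigableMap<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Random write: store a value under (row, family, column).
    // A later put to the same cell overwrites it, so reads see the last written value.
    public void put(String row, String family, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    // Random read ("Get"): look up one cell directly, without scanning the table.
    public String get(String row, String family, String column) {
        Map<String, Map<String, String>> families = rows.get(row);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }

    // Range read ("Scan"): rows with startRow <= key < stopRow, in key order.
    public NavigableMap<String, Map<String, Map<String, String>>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        TinyTable t = new TinyTable();
        t.put("row2", "info", "name", "Bob");
        t.put("row1", "info", "name", "Alice");
        t.put("row1", "info", "name", "Alicia"); // overwrites the previous value
        System.out.println(t.get("row1", "info", "name")); // prints Alicia
        System.out.println(t.scan("row1", "row3").keySet()); // prints [row1, row2]
    }
}
```

The sorted map is what makes both access patterns cheap in this sketch: a Get is a keyed lookup, and a Scan is a contiguous slice of the key range.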

Oozie
- Workflow management, plus coordination of those workflows
- A workflow consists of action nodes (MR, Pig, Hive) and control nodes, and is specified through an XML file

Cascading and Scalding

Word-Count in Java
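
A minimal sketch of word count in plain Java, assuming no Hadoop dependencies: it imitates the map, shuffle, and reduce phases in memory. The class name `WordCountSketch` and its method names are hypothetical; a real Hadoop job would express the same logic through the Hadoop MapReduce API rather than these helpers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch: imitates the phases of a MapReduce
// word count in memory. It does not use the Hadoop API.
public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in a line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key (the word).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    public static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run map over every line, then shuffle, then reduce per word.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, hive=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello hive")));
    }
}
```

On a cluster, the map and reduce steps would run in parallel on different nodes and the framework would perform the shuffle; here all three run sequentially in one process.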

Apache Mahout

Cascading
- A simple, high-level Java API for MR that is easy to understand and work with

Scalding
- The power of Scala on top of Cascading
- No boilerplate code

Sqoop
- Apache Sqoop is designed for efficiently transferring bulk data between Apache Hadoop and relational databases (RDBMS)
- Imports data from external structured datastores into HDFS or related systems such as HBase

Mahout