Apache Sqoop
Apache Sqoop is a tool designed for efficiently transferring bulk data (both import and export) between Hadoop and relational database systems such as Oracle, MySQL, and PostgreSQL.
Put simply, it is a channel for importing and exporting structured data between different databases and HDFS or related Hadoop ecosystem components such as Hive and HBase.
Sqoop provides many salient features, including:
- Full Load: import an entire table, or all tables in a database, with a single command.
- Incremental Load: import only the rows added or updated since the last import.
- Parallel import/export: transfer data using multiple mappers running in parallel.
- Import results of SQL query: import the result set of an arbitrary SQL query instead of a whole table.
- Compression: compress imported data using gzip or any other Hadoop compression codec.
- Connectors for all major RDBMS databases: including MySQL, PostgreSQL, Oracle, and SQL Server.
- Kerberos Security Integration: authenticate against Kerberos-secured clusters.
- Load data directly into Hive/HBase: write imported data straight into Hive tables or HBase.
- Support for Accumulo: import tables into Accumulo as well.
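Several of these features can be seen together in a single import command. The following is an illustrative sketch only; the connection string, credentials, table name, and paths are hypothetical placeholders, and the command assumes a running Sqoop installation with access to the database and HDFS:

```shell
# Hypothetical example: import the "employees" table from MySQL into HDFS,
# using 4 parallel mappers, compression, and incremental append on "id".
sqoop import \
  --connect jdbc:mysql://dbhost:3306/corp \
  --username dbuser -P \
  --table employees \
  --split-by id \
  --num-mappers 4 \
  --compress \
  --incremental append \
  --check-column id \
  --last-value 0 \
  --target-dir /user/hadoop/employees
```

Here `--split-by` tells Sqoop which column to use when dividing the work among the four mappers, and `--incremental append` with `--check-column`/`--last-value` restricts the import to rows whose `id` is greater than the last imported value.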
Sqoop Architecture: How Sqoop Works Internally in Hadoop

Sqoop provides a command-line interface to end users and can also be accessed through Java APIs. A command submitted by the user is parsed by Sqoop, which then launches a map-only Hadoop job to import or export the data. A reduce phase is required only when aggregations are needed, and Sqoop simply imports and exports data; it does not perform any aggregation.
Sqoop parses the arguments supplied on the command line and prepares the map job. The map job launches multiple mappers, depending on the number defined by the user on the command line. For a Sqoop import, each mapper task is assigned a part of the data to be imported, based on the split key defined on the command line. Sqoop distributes the input data equally among the mappers to achieve high performance. Each mapper then creates a connection to the database using JDBC, fetches the part of the data assigned to it, and writes it into HDFS, Hive, or HBase according to the options provided on the command line.
Syntax for Sqoop
Sqoop Import
The following commands are used to import a table from an RDBMS into HDFS. Each row of the table is treated as a separate record in HDFS. Records can be stored as text files, or in binary representation as Avro or SequenceFiles.
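As an illustrative sketch of choosing a storage format (the connection string, table name, and target directory are hypothetical placeholders):

```shell
# Hypothetical example: import a table as Avro data files instead of the
# default text files; --as-sequencefile would store SequenceFiles instead.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/corp \
  --username dbuser -P \
  --table employees \
  --as-avrodatafile \
  --target-dir /user/hadoop/employees_avro
```

Omitting the format flag (or passing `--as-textfile`) stores each record as a line of delimited text, which is the default.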
More content to be added ....
For more information on Sqoop, refer to SqoopDoc.