What is Sqoop?
Sqoop is an open source Hadoop ecosystem that asynchronously imports/export data between Hadoop and relational databases;
Sqoop provides parallel operation and fault tolerance. It means which import and export the data parallelly, so it provides fault tolerance.
Tell me few import control commands:
These commands are most frequently used to import RDBMS data.
How Sqoop can handle large objects?
Blog and Clob columns are common large objects. If the object is less than 16 MB, it stored inline with the rest of the data. If large objects, temporary stored in _lob subdirectory. Those lobs processes in a streaming fashion. Those data materialized in memory for processing. If you set LOB limit to 0, those lobs objects placed in external storage.
What type of databases Sqoop can support?
MySQL, Oracle, PostgreSQL, HSQLDB, IBM Netezza and Teradata. Every database connects through jdbc driver.
sqoop import --connect jdbc:mysql://localhost/database --username ur_user_name --password ur_pass_word sqoop import --connect jdbc:teradata://localhost/DATABASE=database_name --driver "com.teradata.jdbc.TeraDriver" --username ur_user_name --password ur_pass_word
What are the common privileges steps in Sqoop to access MySQL?
As a root user to grant all privileges to access the mysql Database.
Mysql -u root -p //Enter a password mysql> GRANT ALL PRIVILEGES ON *.* TO '%'@'localhost'; mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'localhost'; // here you can mention db_name.* or db_name.table_name between ON and TO.
Sqoop Interview Question and Answers
What is the importance of eval tool?
It allows users to run sample SQL queries against Database and preview the results on the console. It can help to know what data can import? The desired data imported or not?
Stx: sqoop eval (generic-args) (eval-args)
sqoop eval --connect jdbc:mysql://localhost/database -- query "select name, cell from employee limit 10"
sqoop eval --connect jdbc:oracle://localhost/database -e "insert into database values ('Venu', '9898989898')"
Can we import the data with “Where” condition?
Yes, Sqoop has a special option to export/import a particular column data.
sqoop import --connect jdbc:mysql://localhost/CompanyDatabase --table Customer --username root --password mysecret --where "DateOfJoining > '2005-1-1' "
How to export the data from a particular column field data?
There is a separate argument called –columns that allow to export/import from the table.
sqoop import --connect jdbc:mysql://localhost/database --table employee --columns emp_id, name, cell --username root --password password;
What is the difference between Sqoop and distcp?
Distcp can transfer any type of data from one cluster to another cluster, but Sqoop can transfer any data between RDBMS and Hadoop ecosystems. Both distcp and sqoop following same approaches to pull/transfer data.
What is the difference between Flume and Sqoop?
The Flume is a distributed, reliable Hadoop ecosystem which collect, aggregate and move large amount of log data. It can collect data from different resources and asynchronously pull into the HDFS.
It doesn’t consider schema and structure or unstructured data, it can pull any type of data.
Sqoop just acts as interpreter exchange/transfer the data between RDBMS and Hadoop ecosystems. It can import or export only RDBMS data, Schema is mandatory to process.
What are the common delimiters and escape characters in Sqoop?
,) for fields, a newline (
\n) for records. Common delimited fields followed by — and values given below.
--enclosed-by <char> --escaped-by <char> --fields-terminated-by <char> --lines-terminated-by <char> --optionally-enclosed-by <char>
Escape characters are:
Can Sqoop import tables into hive?
Yes, it’s possible, many hive commands also available to import into the Hive.
--hive-import --hive-overwrite --hive-table <table-name> --hive-drop-import-delims --create-hive-table
Can Sqoop can import data into Hbase?
Yes, Few commands also help to import the data into Hbase directly.
--column-family <family> --hbase-create-table --hbase-row-key <col> --hbase-table <table-name>
What is the Meta-store tool?
This tool can host metastore, which is configured in sqoop-site.xml. Multiple users can access and execute these saved jobs, but you should configure in sqoop-site.xml
<property> <name>sqoop.metastore.client.enable.autoconnect</name> <value>false</value> </property>
Syntax: sqoop metastore (generic-args) (metastore-args)
The Sqoop meta-store
What is Sqoop Merge tool?
Merge tool can combine two datasets, New new datasets can overwrite old documents. Merge tool can flatten two datasets into one.
Syntax: sqoop merge (generic-args) (merge-args)
sqoop merge --new-data newer --onto older --target-dir merged --jar-file datatypes.jar --class-name Foo --merge-key id
What is codegen?
The Codegen is a tool that encapsulates and interrupt the jobs, finally generate Java class.
Syntax: $ sqoop codegen (generic-args) (codegen-args)
Apart from import and export, Sqoop can do anything?
Yes, many things it can do.
Codegen: Generate code to interact with RDBMS database records.
Eval: Evaluate a SQL statement and display the results.
Merge: Merge tool can flatten multiple datsets into one dataset.
Can you export from a particular row or column?
Sure, Sqoop provides few options such options can allow to import or export based on where class you can get the data from the table.
--columns <col1,col2..> --where <condition> --query <SQL query>
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \ --where "start_date > '2010-01-01'"
sqoop eval --connect jdbc:mysql://db.example.com/corp \ --query "SELECT * FROM employees LIMIT 10" sqoop import --connect jdbc:mysql://localhost/database -username root --password your_password --columns "name,employee_id,jobtitle"
How to create and drop Hive table in Sqoop?
It’s possible to create tables, but it’s not possible to drop Hive table.
sqoop create-hive-table --connect jdbc:mysql://localhost/database --table table_name
Assume you use Sqoop to import the data into a temporary Hive table using no special options to set custom Hive table field delimiters. In this case, what will Sqoop use as field delimiters in the Hive table data file?
The Sqoop default delimiter is 0x2c (comma), but by default Sqoop uses Hive’s default delimiters when doing a Hive table export, which is 0x01 (^A).
How to import new data in a particular table every day?
It’a one of the main problems for Hadoop developers. Let example, you had downloaded 1TB data yesterday, today you got another 1gb data, if you import the data, again sqoop import 1TB+1GB data. So to get only use this command. Let example, you have already downloaded 1TB data which stored in the hive $Lastimport file. Now you can run it.
sqoop import --incremental lastmodified --check-column lastmodified --last-value "$LASTIMPORT --connect jdbc:mysql://localhost:3306/database_name --table table_name --username user_name --password pass_word
- You are using Sqoop to import data from a MySQL server on a machine named dbserver, which you will subsequently query using Impala. The database is named db, the table is named sales, and the username and password are fred and fredpass. Which query imports the data into a table which can then be used with the Impala