
Hive interview questions and answers

1. What is Hive?

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

Hive was originally developed at Facebook. It later became a Hadoop subproject and is now a top-level Apache project with many contributors. Users can write queries in Hive's high-level language instead of writing Java MapReduce programs. One of Hive's main advantages is its SQL-like language, which makes it much easier to use.

A Hive query is automatically compiled into MapReduce jobs that are executed on Hadoop. In addition, HiveQL lets users plug custom map-reduce scripts into queries.
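A minimal sketch of both points, assuming a Hive CLI session and the tbl_employee table used in the example on this page. EXPLAIN prints the plan Hive compiles a query into (with the classic MapReduce engine, its map and reduce stages) without running it. The TRANSFORM clause streams rows through an external script; the script emp_mapper.py and its output column salary_band are hypothetical names chosen only for illustration.

```sql
-- Print the execution plan (stages) Hive generates, without running the query.
EXPLAIN
SELECT employee_name FROM tbl_employee WHERE salary > 100;

-- Ship a custom script to the cluster nodes.
-- emp_mapper.py is hypothetical: it is assumed to read tab-separated
-- (employee_name, salary) rows on stdin and print tab-separated
-- (employee_name, salary_band) rows on stdout.
ADD FILE emp_mapper.py;

-- Stream every row through the script and name the columns it emits.
SELECT TRANSFORM (employee_name, salary)
  USING 'python emp_mapper.py'
  AS (employee_name STRING, salary_band STRING)
FROM tbl_employee;
```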

Hive example:
select the names of employees whose salary is more than 100 dollars from a Hive table called tbl_employee.

SELECT employee_name FROM tbl_employee WHERE salary > 100;

Because HiveQL is so similar to SQL, users who already know SQL can adopt Hive quickly.

2. What are the types of tables in Hive?

Hive has two types of tables:
1. Managed tables.
2. External tables.

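The main practical difference between them is what the DROP TABLE command does: dropping a managed table deletes both its metadata and its data, while dropping an external table deletes only the metadata and leaves the data files in place. Otherwise, both types of tables are queried the same way.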
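A minimal sketch of the difference. The table names emp_managed and emp_external, the comma-delimited text layout, and the HDFS path /data/employees are hypothetical and chosen only for illustration.

```sql
-- Managed table: Hive owns both the metadata and the data.
-- By default its files live under the warehouse directory
-- (hive.metastore.warehouse.dir).
CREATE TABLE emp_managed (
  employee_name STRING,
  salary        DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- External table: Hive stores only the metadata; the files stay at LOCATION.
-- '/data/employees' is a hypothetical HDFS directory.
CREATE EXTERNAL TABLE emp_external (
  employee_name STRING,
  salary        DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/employees';

-- Removes the metadata AND the data files
-- (moved to the HDFS trash if trash is enabled).
DROP TABLE emp_managed;

-- Removes only the metadata; the files under /data/employees remain.
DROP TABLE emp_external;
```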

3. Does Hive support record-level insert, delete, or update?

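Classic Hive does not provide record-level update, insert, or delete; hence it does not provide transactions either. (Hive 0.14 and later add ACID support, including UPDATE and DELETE, for tables declared as transactional.)

However, users can emulate these DML operations by rewriting a table with INSERT OVERWRITE ... SELECT, using CASE expressions and Hive's built-in functions to compute the new values. As a result, an update that is a single statement in an RDBMS may need many lines of code in Hive.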
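A minimal sketch of this rewrite approach, assuming tbl_employee has exactly the two columns employee_name and salary (in that order) used elsewhere on this page. The 10% raise and the "salary = 0" delete condition are hypothetical rules chosen only for illustration.

```sql
-- Emulates: UPDATE tbl_employee SET salary = salary * 1.10 WHERE salary < 100;
-- The whole table is rewritten; CASE decides the new value row by row.
INSERT OVERWRITE TABLE tbl_employee
SELECT
  employee_name,
  CASE WHEN salary < 100 THEN salary * 1.10 ELSE salary END AS salary
FROM tbl_employee;

-- Emulates: DELETE FROM tbl_employee WHERE salary = 0;
-- Only the rows that should survive are written back
-- (rows with a NULL salary are kept, as a real DELETE would keep them).
INSERT OVERWRITE TABLE tbl_employee
SELECT employee_name, salary
FROM tbl_employee
WHERE salary IS NULL OR salary <> 0;
```

Because every "update" rewrites the full table (or partition), this pattern is practical for batch corrections, not for frequent single-row changes.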

4. What kind of data warehouse application is suitable for Hive?

Hive is not a full database. The design constraints of Hadoop and HDFS limit what Hive can do.

Hive is most suited for data warehouse applications where:

1) relatively static data is analyzed,
2) fast response times are not required, and
3) the data does not change rapidly.

Hive lacks crucial features required for OLTP (Online Transaction Processing); it is closer to an OLAP (Online Analytical Processing) tool.
Hive is therefore best suited to data warehouse applications in which a large data set is maintained and mined for insights, reports, and so on.

5. How can the columns of a Hive table be written to a file?

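The output of HiveQL's DESCRIBE command can be piped through awk in the shell to keep only the first field (the column name) and write it to a file. Here -S runs Hive in silent mode and -e executes the quoted query:

hive -S -e "DESCRIBE table_name;" | awk '{print $1}' > ~/output.txt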
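As an alternative sketch, Hive also provides a SHOW COLUMNS statement that lists only the column names, so no awk step is needed. table_name is the same placeholder used above.

```sql
-- Lists the column names of the given table, one per line.
SHOW COLUMNS IN table_name;
```

It can be run from the shell the same way, for example hive -S -e "SHOW COLUMNS IN table_name;" > ~/output.txt.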