Pig and Hive - A comparative study
Win or Draw? PIG Vs HIVE !!!
This article deals with the features of PIG and HIVE.
Developers often have to choose the right technology to satisfy their business requirements. In Hadoop, PIG and HIVE are frequently mentioned together, and they give almost the same results. But which technology fits a particular business requirement?
Below is a feature-by-feature comparison of PIG and HIVE.
PIG and HIVE:
Type of flow:
PIG is a procedural data-flow language. A procedural program executes step by step, in the order the programmer defines, so you can control the optimization of every step.
HIVE looks like SQL, which makes it a declarative language. You specify what should be done rather than how it should be done. Manual optimization is harder in HIVE because the execution plan is chosen by HIVE's own optimizer.
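The contrast between the two styles can be sketched in plain Python. This is only an analogy, not Pig Latin or HiveQL: the first function spells out each step the way a data-flow script would, while the second hands a single SQL statement to SQLite (standing in for a SQL-like engine) and lets the engine choose the plan. The sample rows, the `purchases` table, and the function names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) rows.
ROWS = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 1), ("bob", 20)]


def procedural_totals(rows, min_amount):
    """Data-flow style: every step is written out and can be tuned by hand."""
    # Step 1: filter out small amounts.
    filtered = [(customer, amount) for customer, amount in rows if amount >= min_amount]
    # Step 2: group by customer and sum the amounts.
    totals = {}
    for customer, amount in filtered:
        totals[customer] = totals.get(customer, 0) + amount
    # Step 3: order by customer name.
    return sorted(totals.items())


def declarative_totals(rows, min_amount):
    """SQL style: state the result wanted; the engine decides the steps."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE purchases (customer TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM purchases "
            "WHERE amount >= ? GROUP BY customer ORDER BY customer",
            (min_amount,),
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Both functions return the same rows for the same input; the difference is who decides the order and shape of the intermediate steps.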
Ease of use:
PIG requires some additional time to learn since the syntax is new and different.
HIVE is easy to pick up because its query language closely resembles SQL. Developers who already know SQL can start using HIVE commands quickly.
Nature of usage:
PIG is recommended for programmers and software developers, mainly because of its efficiency in computing. When your query becomes complex, with many joins and filters, PIG is strongly recommended.
HIVE is used mostly in the analytics area, where it serves as the data warehouse solution for Hadoop. When generating reports, people prefer to code in HIVE rather than PIG. If your query has only a few joins and filters, you can go ahead with HIVE. On the other hand, if the query has a lot of joins, HIVE's performance may degrade.
Type of Data:
PIG handles both structured and unstructured data efficiently.
Hive handles structured data very efficiently.
PIG represents data in terms of variables (relation aliases). Whenever you want to store an intermediate result, it is easy to assign it to a variable and refer to it later.
HIVE represents data in terms of tables. Storing an intermediate result in HIVE is harder, because you have to create a table and insert the values from another table. Thus, when a complex query comes into the picture, HIVE code may exceed hundreds of lines.
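The difference in how intermediate results are kept can be sketched as follows. This is a Python analogy with SQLite standing in for a table-based engine, not actual PIG or HIVE code; the table names `orders` and `big_orders` and the threshold of 100 are hypothetical.

```python
import sqlite3

ORDERS = [(1, 120), (2, 30), (3, 75), (4, 200)]  # hypothetical (id, total) rows

# Variable style: the intermediate result is just a name that later steps reuse.
big_orders = [row for row in ORDERS if row[1] >= 100]
variable_count = len(big_orders)

# Table style: the intermediate result has to be materialized as a table first.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
# Create a staging table, then fill it from the source table.
conn.execute("CREATE TABLE big_orders (id INTEGER, total INTEGER)")
conn.execute("INSERT INTO big_orders SELECT id, total FROM orders WHERE total >= 100")
table_count = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
conn.close()
```

Each extra intermediate step costs one assignment in the variable style, but a create-and-insert pair in the table style, which is how longer queries grow in length.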
PIG code can be debugged in local mode.
Debugging HIVE code locally is complex and time-consuming.
Writing user defined functions (UDF) is easy with PIG.
Writing UDFs in HIVE is more complex.
PIG requires a bit more maintenance than HIVE.
Maintenance in HIVE is very easy.
In PIG, the values of variables are not retained between runs. You have to rerun the PIG code every time you need the value of a variable.
In HIVE, external tables remain available even after you quit the session, because they still point to the underlying HDFS files.
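The idea that a table backed by stored files outlives the session can be illustrated with a small Python sketch. It is only an analogy: a file-backed SQLite database plays the role of storage that survives the session, and nothing here uses HIVE or HDFS. The file name `warehouse.db`, the `readings` table, and the function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def write_then_reopen(values):
    """Store values in a file-backed table, close the session, then reopen it."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "warehouse.db")  # hypothetical file name

        # First session: create the table and load the data.
        first = sqlite3.connect(path)
        first.execute("CREATE TABLE readings (value INTEGER)")
        first.executemany("INSERT INTO readings VALUES (?)", [(v,) for v in values])
        first.commit()
        first.close()

        # Second session: the table is still there because it lives in the file.
        second = sqlite3.connect(path)
        rows = second.execute("SELECT value FROM readings ORDER BY value").fetchall()
        second.close()
        return [value for (value,) in rows]
```

A plain Python variable, by contrast, is gone once the process exits and has to be recomputed, which mirrors the behaviour described for PIG variables.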
PIG development may require more time than HIVE, though this depends largely on how familiar the developer is with PIG code.
Because HIVE is a SQL-like language, development time is usually much shorter.
Moving from an RDBMS to PIG is somewhat complex since PIG's syntax is entirely different from SQL.
Most of the SQL statements that you already run in your RDBMS can be used in HIVE almost directly. Only a few things need to be modified compared to RDBMS SQL.
Handling BIG data:
PIG efficiently handles larger volumes of data.
HIVE sometimes runs into memory overflow or poor performance. However, several configuration parameters can be adjusted to address the issue.
Giants with these Giants:
PIG - used by Yahoo!, Twitter, LinkedIn
HIVE - used by Facebook.