Apache Hive
is a data warehousing solution for Hadoop which provides data summarization, query, and ad-hoc analysis. It is used to process structured and semi-structured data in Hadoop.
Installation with derby database :
PREREQUISITES:
- Java 7 /8 installed
- Dedicated user for hadoop (not mandatory)
- SSH configured
1.Download the tarball file apache-hive-2.1.1-bin.tar.gz
2.Extract the file to /usr/local/ path
3.Downlaod the db derby file db-derby-10.9.1.0-bin.tar.gz
4.Extract the file to /usr/local/
5.Add the homepaths in .bashrc
file and run the .bashrc
file
export HIVE_HOME=/usr/local/hive export PATH=$PATH:$HIVE_HOME/bin export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:. export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:. export DERBY_HOME=/usr/local/derby export PATH=$PATH:$DERBY_HOME/bin export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
1. Configure hive with Hadoop edit the hive-env.sh file, which is placed in the $HIVE_HOME/conf directory.
cd $HIVE_HOME/conf cp hive-env.sh.template hive-env.sh
Edit the hive-env.sh file by appending the following line:
export HADOOP_HOME=/usr/local/hadoop
2. Get into conf directory under apache-hive-2.1.1-bin folder and rename hive-default.xml.template to hive-site.xml
Replace following values in hive-site.xml
</property> <property> <name>hive.exec.local.scratchdir</name> <value>${system:java.io.tmpdir}/${system:user.name}</value> <description>Local scratch space for Hive jobs</description> </property> <property> <name>hive.downloaded.resources.dir</name> <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value> <description>Temporary local directory for added resources in the remote file system.</description> </property>
With these values
</property> <property> <name>hive.exec.local.scratchdir</name> <value>/tmp/${user.name}</value> <description>Local scratch space for Hive jobs</description> </property> <property> <name>hive.downloaded.resources.dir</name> <value>/tmp/${user.name}_resources</value> <description>Temporary local directory for added resources in the remote file system.</description> </property>
Hive installation is completed successfully.
Now you require an external database server to configure Metastore. We use Apache Derby database.
Configuring Hive with derby database
1.Create a directory to store Metastore
Create a directory named data in $DERBY_HOME directory to store Metastore data.
mkdir $DERBY_HOME/data
2.Configuring Metastore of Hive
Configuring Metastore means specifying to Hive where the database is stored. You can do this by editing the hive-site.xml file, which is in the $HIVE_HOME/conf directory. First of all, copy the template file using the following command:
cd $HIVE_HOME/conf cp hive-default.xml.template hive-site.xml
Edit hive-site.xml and append the following lines between the <configuration> and </configuration> tags:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:derby://localhost:1527/metastore_db;create=true </value> <description>JDBC connect string for a JDBC metastore </description> </property>
Here we are creating metastore_db derby database
3.Create a file named jpox.properties and add the following lines into it:
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl org.jpox.autoCreateSchema = false org.jpox.validateTables = false org.jpox.validateColumns = false org.jpox.validateConstraints = false org.jpox.storeManagerType = rdbms org.jpox.autoCreateSchema = true org.jpox.autoStartMechanismMode = checked org.jpox.transactionIsolation = read_committed javax.jdo.option.DetachAllOnCommit = true javax.jdo.option.NontransactionalRead = true javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver javax.jdo.option.ConnectionURL = jdbc:derby://localhost:1527/metastore_db;create = true javax.jdo.option.ConnectionUserName = APP javax.jdo.option.ConnectionPassword = mine
4.Verifying Hive Installation
Before running Hive, you need to create the /tmp folder and a separate Hive folder in HDFS. Here, we use the /user/hive/warehouse folder. You need to set write permission for these newly created folders as shown below:
chmod g+w Now set them in HDFS before verifying Hive. Use the following commands: $ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp $ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
5.The following commands are used to verify Hive installation:
$ cd $HIVE_HOME $ hive --service metastore & hadoop@SP24HTP2 bin]$ which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin) Starting Hive Metastore Server SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
you can check the process is running or not
[hadoop@SP24HTP2 bin]$ ps -ef | grep hive hadoop 27914 23849 51 17:07 pts/1 00:00:22 /opt/jdk1.8.0_121/bin/java -Xmx256m -Dhadoop.log.dir=/usr/local/hadoop-2.8.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.8.0 -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/local/hive/conf/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/local/hive/lib/hive-metastore-2.1.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore hadoop 28013 23849 0 17:08 pts/1 00:00:00 grep hive
6.To enter to hive shell
$ bin/hive
On successful installation of Hive, you get to see the following response:
$ hive which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin) SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> show databases; OK default Time taken: 1.915 seconds, Fetched: 1 row(s) hive> use default; OK Time taken: 0.022 seconds hive> show tables; OK
Using Beeline to connect with hiveserver2
Prerequisites:
Either run any one of them hive or hiveserver2, we can connect to anyone of them at a single .
Kill the metastore hive if it is running .To connect to hiveserver2
[hadoop@SP24HTP2 bin]$ ./hiveserver2 & [2] 28051 [hadoop@SP24HTP2 bin]$ which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin) SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Check the process
[hadoop@SP24HTP2 bin]$ ps -ef | grep hive hadoop 28051 23849 60 17:08 pts/1 00:00:15 /opt/jdk1.8.0_121/bin/java -Xmx256m -Dhadoop.log.dir=/usr/local/hadoop-2.8.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.8.0 -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/local/hive/conf/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/local/hive/lib/hive-service-2.1.1.jar org.apache.hive.service.server.HiveServer2 hadoop 28151 23849 0 17:09 pts/1 00:00:00 grep hive
connect to beeline
hadoop@SP24HTP2 bin]$ beeline which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin:/home/hadoop/bin:/opt/jdk1.8.0_121/bin:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:.:/usr/local/hadoop-2.8.0/sbin:/usr/local/hadoop-2.8.0/bin:/usr/local/hive/bin:/usr/local/derby/bin) SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Beeline version 2.1.1 by Apache Hive
After successful connection connect to hive2 with following command
beeline> !connect jdbc:hive2:// Connecting to jdbc:hive2:// Enter username for jdbc:hive2://: Enter password for jdbc:hive2://: 17/08/10 17:18:26 [main]: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/08/10 17:18:34 [main]: WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. Connected to: Apache Hive (version 2.1.1) Driver: Hive JDBC (version 2.1.1) 17/08/10 17:18:34 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false. Transaction isolation: TRANSACTION_REPEATABLE_READ
Check the databases and tables
0: jdbc:hive2://> show databases; OK +-----------------+--+ | database_name | +-----------------+--+ | default | +-----------------+--+ 1 row selected (2.964 seconds) 0: jdbc:hive2://> use default; OK No rows affected (0.052 seconds) 0: jdbc:hive2://> show tables; OK +-----------------+--+ | tab_name | +-----------------+--+ +-----------------+--+ No rows selected (0.131 seconds)
How to connect to hive through JDBC
Add proxy in coresite.xml in hadoop as following and restart hadoop and hive
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
-
- We at DBACLASS are trying our best, to publish articles on different database techonologies especially on database admin jobs.
- For any queries or suggestion ,Please post in our forum forum.dbaclass.com.
Keep visiting us.