Install Hive on Ubuntu 20.04 with MySQL Integration (2021)
Apache Hive is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and JDBC driver are provided to connect users to Hive. Hive internally converts SQL queries into MapReduce jobs, so developers don't have to learn Java programming; they can query the data in HDFS using SQL.
Every Hive installation requires a metastore where the metadata of all Hive objects is stored. The metastore must be a JDBC-compliant relational database; by default Hive uses the embedded Apache Derby database, which is suitable only when an application has a small, single-user need for an RDBMS. In this post, we will see how to install Hive with MySQL as its metastore.
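As a quick illustration of that last point, the query below is plain SQL; when run against a hypothetical employees table stored in HDFS, Hive translates it into MapReduce jobs behind the scenes, with no Java code involved:
SELECT department, COUNT(*) FROM employees GROUP BY department;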
MySQL Installation
Step 1: Installing MySQL server
Update your Ubuntu package list and install the MySQL server by executing the below commands
sudo apt-get update
sudo apt install mysql-server
Step 2: Configuring MySQL
To set up your MySQL server, execute the below command
sudo mysql_secure_installation
This will take you through a series of prompts where you can make some changes to your MySQL installation’s security options. The first prompt will ask whether you’d like to set up the Validate Password Plugin, which can be used to test the password strength of new MySQL users before deeming them valid.
From there, you can press Y and then ENTER to accept the defaults for all the subsequent questions. This will remove some anonymous users and the test database, disable remote root logins, and load these new rules so that MySQL immediately respects the changes you have made.
Step 3: Log in to the MySQL server as the root user
To log in to MySQL as root, execute the below command
sudo mysql
Next, change the root user's authentication method from auth_socket to mysql_native_password so that you can log in with a password. You can do that by running the following commands:
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password you setup';
FLUSH PRIVILEGES;
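To confirm the change took effect, you can optionally check which authentication plugin the root account now uses (run from the same MySQL prompt):
SELECT user, plugin FROM mysql.user WHERE user = 'root';
The plugin column should now show mysql_native_password.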
Step 4: Test your MySQL root password
From now on, to log in to MySQL as root, use the command below; type your password and press "Enter" at the prompt
mysql -u root -p
Step 5: Create a hiveuser for MySQL metastore
Create a hiveuser and set a password that will be used while connecting to the Hive metastore
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
Step 6: GRANT access to the newly created hiveuser
GRANT ALL ON *.* to 'hiveuser'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;
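You can optionally verify the privileges assigned to the new user before leaving the MySQL prompt:
SHOW GRANTS FOR 'hiveuser'@'localhost';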
Hive Installation
Step 1: Download Hive
Download the Hive binary tarball from the Apache Hive official downloads page
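For example, you can fetch the Hive 3.1.2 binary tarball directly into your home directory from the Apache archive (the exact URL will differ if you choose another version):
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz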
Step 2: Untar and move to a new folder
Place the downloaded Hive tarball in the home directory, untar it, and rename the extracted folder using the below commands
tar -xvzf apache-hive-3.1.2-bin.tar.gz
mv apache-hive-3.1.2-bin hive
Step 3: Remove old Guava*.jar file
We need to remove the old guava*.jar file from the hive/lib folder and replace it with the newer one from hadoop/share/hadoop/common/lib (under your Hadoop installation directory)
cd hive/lib
rm guava*.jar
cd
cp hadoop/share/hadoop/common/lib/guava*.jar hive/lib
If we don't replace the old guava*.jar file, Hive will fail to start with a Guava-related java.lang.NoSuchMethodError, a known incompatibility between Hive 3.1.2 and Hadoop 3.x.
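A quick way to confirm the replacement worked is to list the Guava jar now present in hive/lib (run from your home directory; the exact version number depends on your Hadoop distribution):
ls hive/lib/guava*.jar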
Step 4: Setup Hive Home in ~/.bashrc
Add the HIVE_HOME path to your ~/.bashrc file using the below commands
nano ~/.bashrc
# Set HIVE_HOME
export HIVE_HOME="<your hive directory>"
export PATH=$PATH:$HIVE_HOME/bin
Once you add the HIVE_HOME lines, save the changes by pressing CTRL+X, then type "Y" and press "Enter" at the prompt. Then reload the file so the changes take effect:
source ~/.bashrc
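You can confirm the variable is set and the Hive binaries are on your PATH with:
echo $HIVE_HOME
which hive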
Step 5: Configure Hive
Edit the HADOOP_HOME in hive-env.sh in the hive/conf directory by executing the below commands
cd hive/conf
cp hive-env.sh.template hive-env.sh
nano hive-env.sh
export HADOOP_HOME="<your hadoop directory>"
Once you have made the changes, save them by pressing CTRL+X, then type "Y" and press "Enter" at the prompt
Step 6: Create hive-site.xml
Next, create a hive-site.xml file with the below configuration in the same directory (hive/conf)
nano hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;allowPublicKeyRetrieval=true&amp;useSSL=false</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password for connecting to mysql server</description>
</property>
</configuration>
Save the changes by pressing CTRL+X, then type "Y" and press "Enter" at the prompt
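Before moving on, it can be helpful to confirm that the credentials placed in hive-site.xml actually work by connecting to MySQL as hiveuser (enter hivepassword when prompted):
mysql -u hiveuser -p -e "SHOW DATABASES;"
If this connection succeeds, Hive will be able to reach the metastore with the same credentials.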
Step 7: Download the MySQL Connector/J JAR file
Download the latest mysql-connector-java JAR file from the MySQL Connector/J downloads page
Unzip the download and copy the mysql-connector-java-*.jar file to the hive/lib folder
cd
cp Downloads/mysql-connector-java-*.jar hive/lib
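You can confirm the driver is in place with the command below (the version number in the file name will vary):
ls hive/lib/mysql-connector-java-*.jar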
Step 8: Create required Hive directories in HDFS
Start all your Hadoop daemons by running the below command.
start-all.sh
Before you can create a table in Hive, you must use the below HDFS commands to create /tmp and /user/hive/warehouse (the default hive.metastore.warehouse.dir) and make them group-writable. Make sure all the Hadoop daemons are running before executing these commands.
hdfs dfs -mkdir /tmp
hdfs dfs -chmod 777 /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
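You can verify that the directories were created with the expected permissions by listing them:
hdfs dfs -ls /
hdfs dfs -ls /user/hive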
Step 9: Initialize the Hive metastore
Initialize the Hive metastore schema in MySQL using the below command (needed only the first time) from the $HIVE_HOME/bin folder
cd $HIVE_HOME/bin
schematool -initSchema -dbType mysql
Upon successful initialization, schematool prints a "schemaTool completed" message.
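You can also verify the result from the command line, either by asking schematool for the schema details or by listing the metastore tables that were created in MySQL:
schematool -dbType mysql -info
mysql -u hiveuser -p -e "USE metastore; SHOW TABLES;"
The second command should list Hive's internal metastore tables such as DBS and TBLS.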
Step 10: Verify your Hive Installation
Verify your Hive installation by executing the below commands
hive
show databases;
It should successfully list the available databases (on a fresh installation, at least the default database).
You can try to create a test database using the below sample command
create database labuser;
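To go one step further, you can create a table inside the new database and query it; the statements below are a minimal sketch using a made-up table name:
use labuser;
create table test_tbl (id int, name string);
insert into test_tbl values (1, 'sample');
select * from test_tbl;
The insert statement launches a MapReduce job behind the scenes, which confirms Hive is working against both HDFS and the MySQL metastore.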
That's it! We have successfully installed Hive with MySQL as the metastore on Ubuntu 20.04. If you face any issues, let me know in the comment section. Thank you!!