
Install Hive on Ubuntu 20.04 with MySQL Integration (2021)

 
Apache Hive™ is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and a JDBC driver are provided to connect users to Hive. Hive internally converts SQL queries into MapReduce jobs, so developers don't have to write Java programs; they can query the data in HDFS using SQL.
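
For example, once the installation below is complete, a query in Hive looks like ordinary SQL; the table and column names here are purely illustrative:

            -- hypothetical table, used only to illustrate HiveQL syntax
            SELECT department, COUNT(*) AS employee_count
            FROM employees
            GROUP BY department;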

Every Hive installation requires a metastore where the metadata of all Hive objects is stored. The metastore must be a JDBC-compliant relational database; by default Hive uses the embedded Apache Derby database, which is suitable only where an application has lightweight, single-user RDBMS needs. In this post, we will see how to install Hive with MySQL as its metastore.



MySQL Installation

Step 1: Installing MySQL server

Update your Ubuntu package list by executing the below command
       
            
            sudo apt-get update
            
Then install MySQL using below command
       
            
            sudo apt install mysql-server
            

Step 2: Configuring MySQL

To set up your MySQL server, execute the below command

       
            
            sudo mysql_secure_installation
            
This will take you through a series of prompts where you can make some changes to your MySQL installation’s security options. The first prompt will ask whether you’d like to set up the Validate Password Plugin, which can be used to test the password strength of new MySQL users before deeming them valid. From there, you can press Y and then ENTER to accept the defaults for all the subsequent questions. This will remove some anonymous users and the test database, disable remote root logins, and load these new rules so that MySQL immediately respects the changes you have made.

Step 3: Log in to the MySQL server as the root user

To log in to MySQL as root, execute the below command

       
            
            sudo mysql
            
Next, change the root user's authentication method from auth_socket to mysql_native_password. You can do that by running the following commands (replace 'password you setup' with a strong password of your choice):
       
            
            ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password you setup';
            
            
            FLUSH PRIVILEGES;
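
If you want to confirm the change took effect, you can optionally check the root account's authentication plugin while still at the mysql> prompt; the plugin column should now read mysql_native_password:

            SELECT user, plugin FROM mysql.user WHERE user = 'root';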
            

Step 4: Test your MySQL root password

From now on, to log in to MySQL as root, use the following command. Type your password and press Enter at the prompt
       
            
            mysql -u root -p
            

Step 5: Create a hiveuser for the MySQL metastore

Create a hiveuser and set a password to be used when connecting to the Hive metastore

       
            
            CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';

            

Step 6: GRANT access to the newly created hiveuser

       
 
		GRANT ALL ON *.* to 'hiveuser'@'localhost' WITH GRANT OPTION;


		FLUSH PRIVILEGES;
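
Optionally, confirm the grant and test the new account before moving on. Still at the mysql> prompt:

            SHOW GRANTS FOR 'hiveuser'@'localhost';

Then exit MySQL and log back in as hiveuser from your shell, entering hivepassword when prompted:

            mysql -u hiveuser -p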
        

Hive Installation

Step 1: Download Hive 

Download the Hive binary tarball (apache-hive-3.1.2-bin.tar.gz) from the Apache Hive official downloads page.
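
For example, you can fetch the 3.1.2 binary tarball directly with wget; the mirror path below is an assumption based on the Apache archive layout, so adjust it if the downloads page points elsewhere:

            wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz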
 

Step 2: Untar and move to a new folder

Place the downloaded Hive tarball in your home directory, then untar it and rename the folder using the below commands
 
     
 
		tar -xvzf apache-hive-3.1.2-bin.tar.gz


		mv apache-hive-3.1.2-bin hive
        

Step 3: Remove old Guava*.jar file

We need to remove the old guava*.jar file from the hive/lib folder and replace it with the newer one from hadoop/share/hadoop/common/lib

       
 
		cd hive/lib
        
        
		rm guava*.jar
        
             
		cd
        
        
                cp hadoop/share/hadoop/common/lib/guava*.jar hive/lib
                
If we don't replace the old guava*.jar file, Hive will fail to start with a Guava version-conflict error.
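
To double-check that the copy worked, you can list the Guava JARs on both sides; the version in hive/lib should now match the one shipped with your Hadoop installation:

            ls hive/lib/guava*.jar hadoop/share/hadoop/common/lib/guava*.jar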

Step 4: Setup Hive Home in ~/.bashrc

Add the HIVE_HOME path to the ~/.bashrc file using the below commands
 
 
		
        nano ~/.bashrc
        

	# Set HIVE_HOME 
		
        
        export HIVE_HOME="<your hive directory>"
		
        
        export PATH=$PATH:$HIVE_HOME/bin
        source ~/.bashrc
        

Once you have added HIVE_HOME, save the changes by pressing CTRL+X, then type "Y" and press Enter at the prompt
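
As a quick sanity check, you can verify that the variable is set and that the hive script is now on your PATH; this should print your Hive directory and the path to the hive launcher:

            echo $HIVE_HOME

            which hive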

Step 5: Configure Hive

Edit the HADOOP_HOME in hive-env.sh in the hive/conf directory by executing the below commands
 
       
            
         cd hive/conf
          
           
	 cp hive-env.sh.template hive-env.sh
         

	 nano hive-env.sh
            

	 export HADOOP_HOME="<your hadoop directory>"
            

Once you have made the changes, save the file by pressing CTRL+X, then type "Y" and press Enter at the prompt

Step 6: Create hive-site.xml

We need to create a hive-site.xml file with the below configuration in the same directory
 
nano hive-site.xml
 
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;allowPublicKeyRetrieval=true&amp;useSSL=false</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password for connecting to mysql server</description>
  </property>
</configuration>

Save the changes by pressing CTRL+X, then type "Y" and press Enter at the prompt
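
Before moving on, you can sanity-check that the credentials placed in hive-site.xml actually work against MySQL; this is optional and simply lists the databases visible to hiveuser (the metastore database itself will be created automatically on first use because of createDatabaseIfNotExist=true):

            mysql -u hiveuser -p -e "SHOW DATABASES;"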

Step 7: Download the MySQL Connector/J JAR file

Download the latest mysql-connector-java JAR file from the MySQL Connector/J download page
 
 
Unzip the downloaded archive and copy the mysql-connector-java-*.jar file to the hive/lib folder
 
cd
cp Downloads/mysql-connector-java-*.jar hive/lib
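
You can confirm the driver actually landed in Hive's classpath by listing it; the exact version number in the filename will depend on the release you downloaded:

            ls hive/lib/mysql-connector-java-*.jar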
        

Step 8: Create required Hive directories in HDFS 

Start all your Hadoop daemons by running the below command.
 
		start-all.sh
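
Before continuing, you can confirm the daemons are up with jps; you should see processes such as NameNode, DataNode, ResourceManager and NodeManager listed:

            jps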
        

You must use the below HDFS commands to create /tmp and /user/hive/warehouse (i.e., hive.metastore.warehouse.dir) and set them to chmod g+w before you can create a table in Hive. Make sure all the Hadoop daemons are running before executing these commands.

 

       
            
            hdfs dfs -mkdir /tmp

			
            hdfs dfs -chmod 777 /tmp
            
            
            hdfs dfs -mkdir -p /user/hive/warehouse
            
            
            hdfs dfs -chmod g+w /tmp
            
            
            hdfs dfs -chmod g+w /user/hive/warehouse
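
To verify the directories and their permissions, you can list them; the warehouse directory should show group write permission:

            hdfs dfs -ls /

            hdfs dfs -ls /user/hive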

            

Step 9: Initialize the Hive metastore

Initialize the Hive metastore schema in MySQL using the below command (only for the first time) from the $HIVE_HOME/bin folder
       
            
            cd $HIVE_HOME/bin
            
            
            schematool -initSchema -dbType mysql


            

Upon successful initialization, the schematool output will report that the metastore schema was created.
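
You can also verify the result from the MySQL side; after initialization, the metastore database should contain Hive's schema tables (names such as DBS, TBLS and VERSION, depending on the Hive release):

            mysql -u hiveuser -p -e "USE metastore; SHOW TABLES;"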

 

Step 10: Verify your Hive Installation

Verify your Hive installation by executing the below commands

       
            
            hive 
            
            
            show databases; 


            
It should list the available databases successfully (at minimum, the default database).
 
 
 You can try to create a test database using the below sample command
       
            
            create database labuser;
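
To tie things back to the warehouse directory created earlier, you can create a small table in the new database and check that it shows up in HDFS; the table name here is arbitrary. At the hive> prompt:

            use labuser;

            create table test_table (id int, name string);

Then, back in your shell:

            hdfs dfs -ls /user/hive/warehouse/labuser.db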

            

That's it, we have successfully installed Hive with MySQL as the metastore on Ubuntu 20.04. If you face any issues, let me know in the comments section. Thank you!!