Hive – Hadoop
Hive (not HIV Disease, this something different 🙂 ) HIVE is a framework that sits on top of core Hadoop. It acts as a data warehousing system on top of HDFS and provides easy query mechanisms to the underlying HDFS data.
Programming MapReduce jobs could be tedious and will have their own development, testing, and maintenance investments. Hive queries, called Hive Query Language (HQL) are broken down into MapReduce jobs under the hood and remain a complete abstraction to the user and provide a query-based access mechanism for Hadoop data.
Traditional relational database; enabling business users to use familiar tools such as Excel and SQL Server Reporting Services to consume data from Hadoop in a similar way as they would from a database system as SQL Server remotely through a ODBC connection. Hive ODBC driver to consume the data.
1. Download and install the Hive ODBC Driver
2. Configure the driver to connect to Hive running on your Hadoop cluster on Linux/Windows
3. Use SQL Server Integration Services (SSIS) to import data from Hive to SQL Server
One of the main advantages of Hive is that it provides a querying experience that is similar to that of a relational database, which is a familiar technique for many business users. Essentially, it allows all ODBC compliant clients to consume HDFS data through familiar ODBC Data Sources (DSN), thus exposing Hadoop to a wide and diverse range of client applications.
Microsoft ODBC driver for Hive that bridges the gap between Hadoop and SQL Server along with its rich business intelligence and visualization tools. Downloaded here driver from the driver comes in two flavors, 64 bit and 32 bit, so please make sure that you download and install the appropriate driver depending upon the bit configuration of your client application.
Once the driver is installed successfully, perform the following steps to ensure that you can make a successful connection to Hive:
1. First, create a System DSN. In ODBC Data Sources Administrator, go to the System DSN tab and click on the Add Button
2. HIVE driver in the next screen of the Create New Data Source wizard
3. The final ODBC Hive Setup Hadoop cluster details as well as the credentials used
to connect to the cluster.
4. Once created, a new System DSN
5. The quickest way to test this out is to test from an .udl (Universal Data Link) file. To do this, create an empty .txt file and change the file extension to .udl. Then, double-click and open the .udl file, by default it should open with Microsoft OLE DB Core components.
Provide the User name and Password for your Hadoop cluster and click on the Test Connection button. A successful test connection will confirm that the Hive ODBC driver is correctly set up.
Microsoft SQL Server is a complete suite of tools that include an RDBMS system, a multidimensional OLAP and tabular database engines, as well as other services, for example a broker service, a scheduling service (SQL Agent), and many more. One of these BI components is an extract, transform, and load (ETL) tool called SQL Server Integration Services (SSIS). SSIS have the ability to merge structured and un-structured data by importing Hive data into SQL Server and apply powerful analytics on the integrated data. Next will be SSIS as an ETL before jul tiden 🙂
Let me know if you have any further question and your comments will be learning point. Back to top
Microsoft Certified Solutions Associate (MCSA)