Category Archives: Cloud

Performance and Tuning: Page Life Expectancy

SQL Server 2012 – Performance and Tuning: Page Life Expectancy

The Buffer Manager object provides counters to monitor how SQL Server uses:
* Memory to store data pages.
* Counters to monitor the physical I/O as SQL Server reads and writes database pages.
* Buffer pool extension to extend the buffer cache by using fast non-volatile storage such as solid-state drives (SSD).

A Performance Profile: how one would approach performance tuning each.  The performance profile of OLTP differs significantly from a Relational Data Warehouse or Reporting application.  It is helpful to understand these differences and the objectives for high performance.

OLTP applications are characterized by high volumes of small identical transactions.  These can include SELECT, INSERT, UPDATE and DELETE operations.  The implications are significant in terms of database design, resource utilization and system performance.

You can find the value of the PLE by running the following query.

SELECT [object_name],



FROM sys.dm_os_performance_counters

WHERE [object_name] LIKE ‘%Manager%’

AND [counter_name] = ‘Page life expectancy’

There are performance problems if any of the following are true, here we go

Resource issue Rule Description Value Source Problem Description
Database Design Rule 1 High Frequency queries having # table joins >4 Sys.dm_exec_sql_text,


High Frequency queries with lots of joins may be too normalized for high OLTP scalability
Rule 2 Frequently updated tables having # indexes >3 Sys.indexes, sys.dm_db_operational_index_stats Excessive index maintenance for OLTP
Rule 3 Big IOs

Table Scans

Range Scans

>1 Perfmon object

SQL Server Access Methods


Missing index, flushes cache
Rule 4 Unused Indexes index not in* * Sys.dm_db_index_usage_stats Index maintenance for unused indexes
CPU Rule 1 Signal Waits > 25% Sys.dm_os_wait_stats Time in runnable queue is

pure CPU wait.

Rule 2 Plan re-use < 90% Perfmon object

SQL Server Statistics

OLTP identical transactions should ideally have >95% plan re-use
Rule 3 Parallelism: Cxpacket waits >5% Sys.dm_os_wait_stats Parallelism reduces OLTP throughput
Memory Rule 1 Avg page life expectancy < 300 (seconds) Perfmon object

SQL Server Buffer Manager

SQL Server Buffer Nodes

Cache flush, due to big read

Possible missing index

Rule 2 Avg page life expectancy Drops by 50% Perfmon object

SQL Server Buffer Manager

Cache flush, due to big read

Possible missing index

Rule 3 Memory Grants Pending >1 Perfmon object

SQL Server Memory Manager

Current number of processes waiting for a workspace memory grant
IO Rule 1 Avg Disk seconds / read > 20 ms Perfmon object

Physical Disk

Reads should take 4-8ms with NO IO pressure
Rule 2 Avg Disk seconds / write > 20 ms Perfmon object

Physical Disk

Writes (sequential) can be as fast as 1ms for transaction log.
Rule 3 Big IOs

Table Scans

Range Scans

>1 Perfmon object

SQL Server Access Methods

Missing index, flushes cache
Rule 4 If Top 2 values for wait stats are any of the following:






Top 2 Sys.dm_os_wait_stats If top 2 wait_stats values include IO, there is an IO bottleneck
Blocking Rule 1 Block percentage > 2% Sys.dm_db_index_operational_stats Frequency of blocks
Rule 2 Block process report 30 sec Sp_configure, profiler Report of statements
Rule 3 Avg Row Lock Waits > 100ms Sys.dm_db_index_operational_stats Duration of blocks
Rule 4 If Top 2 values for wait stats are any of the following:

1. LCK_x

Top 2 Sys.dm_os_wait_stats If top 2 wait_stats values include locking, there is a blocking bottleneck

OLTP identical transactions mean plan re-use is desirable.  CPU utilization can thusly be reduced with plan re-use and join reduction.  IO performance can be improved with good indexing, join reduction, and high page life expectancy.  Sorts can be limited with index usage.  Blocking can be reduced with index design and short transactions.


— Microsoft Certified Solutions Associate (MCSA)

MS SQL database backup file

MS SQL database backup file

SQL Server 2016 Community Technology Preview 2 (CTP2) through current version).

To make your database secured and protective; it helps you to restore your inaccessible files of the main database when any type of corruption or damage occurs. But, what if backup also corrupt, while attempting to restore database from backup file. There could be multiple reasons behind such disaster situation, here in this write-up you will get to know about the reason and solutions behind such case, where SQL Server user faces corruption.

— SQL Server database backup and not able to restore their databases.

Transact-SQL Syntax Conventions

SQL backup files are basically a replica of your original SQL database, which can be located in different locations on the system. There could be multiple reasons of inaccessible backup file. Here are some most common causes of damaged SQL BAK files:

— Virus attack
— Abrupt system shutdown
— Use of a wrong driver
— Bad sectors in your system’s hard disk
— Sudden removal of a selected tables, records, and procedures
— unconventional functioning of Hard disk
— Improper shutdown of application
— Wrong database synchronization
— System crash
— corrupt database system rules and tables

The most common error message during restoration of database is: ‘Backup or restore operation terminating abnormally.’ A Backup restoration error occurs when a filemark in the backup device could not be read. There could be multiple causes of when a user encounters a filemark error. The most common reasons are:

— A media failure may arise on the same device where the backup is stored
— A write failure may occur while creating the backup file
— Loss of connectivity may arise while creating a network backup
— A failure in the Input/Output path occurs in the disk just after successful write to the disk

Manual Solution:-
After backup restore error the first thing you could do is to check whether all the sets of backup have issues or just some sets have issues. It might be possible that only some sets of backup have issues due to which you are getting restore error. In order to retrieve other backup sets from the device, you need to specify the file number. In case, there are multiple backup sets available on a single device, then to determine the usable backup, you can run the following query:


If you got the usable set from the disk, copy it to another drive for usage and try to restore the damaged files with the help of SQL restore commands. Here are some of the SQL commands that you can use to restore corruption in your SQL database backup, the following essential backup concepts:

Backup Types

Transaction Log Truncation

Formatting Backup Media

Working with Backup Devices and Media Sets

Restoring SQL Server Backups

To recover a database use the following command. This will put your database in the “restoring” state

RESTORE DATABASE <DB Name> FROM DISK='<Backup Location>’ WITH FILE = <FileNumber>

— Write the backup set number instead of ‘FileNumber’ that you want to restore.

The following command will take the database, which is in ‘restoring’ state and make it available for end users.

RESTORE LOG <DB Name> FROM DISK = ‘<Backup Location>’

“Before you can create the first log backup, you must create a full backup ”

The above mentioned commands are used to restore corrupt backup file of SQL database. However, these corrupt backup recovery solutions provided by Microsoft are not applicable for deep corruption cases. In order to restore your highly damaged or corrupt SQL backup database you can always choose a third party SQL backup recovery software. These professional utilities are designed to restore data from a corrupt (.BAK) SQL backup file.

BACKUP DATABASE AdventureWorks2012
TO DISK=’X:\SQLServerBackups\AdventureWorks1.bak’,
MEDIANAME = ‘AdventureWorksStripedSet0’,
MEDIADESCRIPTION = ‘Striped media set for AdventureWorks2012 database;

Third party applications have functions to restore SQL backup file due to all above mentioned reasons. Before buying any professional backup recovery tool, you need to choose the most reliable one. For that you should use the online demo versions of the backup recovery applications to test their efficiency.

For more information, see: Full File Backups (SQL Server) and Back Up Files and Filegroups (SQL Server).

For more information and examples, see SQL Server Backup and Restore with Microsoft Azure Blob Storage Service. For a tutorial, see Tutorial: SQL Server Backup and Restore to Windows Azure Blob Storage Service.
Security: Beginning with SQL Server 2012, the PASSWORD and MEDIAPASSWORD options are discontinued for creating backups. It is still possible to restore backups created with passwords.

Permissions: BACKUP DATABASE and BACKUP LOG permissions default to members of the sysadmin fixed server role and the db_owner and db_backupoperator fixed database roles.

Ownership and permission problems on the backup device’s physical file can interfere with a backup operation. SQL Server must be able to read and write to the device; the account under which the SQL Server service runs must have write permissions. However, sp_addumpdevice, which adds an entry for a backup device in the system tables, does not check file access permissions. Such problems on the backup device’s physical file may not appear until the physical resource is accessed when the backup or restore is attempted.

Let me know if you have any further question and your comments will be learning point.

— Microsoft Certified Solutions Associate (MCSA)


Troubleshooting SQL Server problems

MS SQL Server is a separate technology, it is often the case that when problems occur, uncertainty arises regarding the root cause, and problem scope is rarely well outlined. Although many of the details here are not specific to SQL Server problems, they are good practices for troubleshooting many types of complex IT issues. On correct understanding the problem and its cause therefore, accurate root cause diagnosis is very important. Sometimes it is obvious what caused the problem, but not always.

Here we go regardless of the problem, you will need to do some forensic analysis to determine the cause of the problem.

1. Define the problem/ establish a clear problem statement.
2. Determine the problem’s impact/ The IT chef & stake holders often don’t want to know technical details.
3. Engage the correct resources/ both internal or external.
4. Identify potential causes/ meet all necessary parties physically or virtually.
5. Plan and coordinate tasks across teams.
6. Select a communication plan and review/ document the plan and agree who will keep management, end users, and the technical team updated.
7. Identify root cause.
8. Determine solution.
9. Test and implement/ even if the problem does not exist in the test or pre-production environment, implement the fix there first.
10. Review

SQL Server is typically affected by the following bottlenecks:
🙂 Memory
🙂 File I/O
🙂 Locking, blocking, or deadlocking

Start with SQL Server Performance Monitor, make sure that the disk counters are on. To do so, run diskperf from a command prompt (cp). If the disk counters are not on, run diskperf -y and then restart the computer. SQL Server Performance Monitor log, collect the following information:

🙂 Paging file
🙂 Process
🙂 Processor
🙂 All SQL Server counters
🙂 Memory
🙂 Threads
🙂 Logical disk
🙂 Physical disk
🙂 System

In most cases, you should be able to find enough information using these tools to provide you with enough clues to determine the cause of a particular problem. A good DBA troubleshooting approach, adopting a positive attitude with moderate determination and persistence to identify the root cause and resolve issues sure helps.

A positive attitude leads to better quality results, faster resolution, and it will reduce the stress level for you and co-workers during the troubleshooting process.

SQL DBA behaviors and attitudes are characteristic of the most effective database professionals when troubleshooting complex problems. Remain calm, problems are never random — Problems with computers happen for a reason. Avoid prejudice OR looking for fixes, think ahead, it’s common to see spend a lot of time troubleshooting numerous issues that have nothing to do with the main source of the problem. Determine the problem’s real impact on the company business.

🙂 How severely is the system affected?
🙂 How many users cannot work?
🙂 Is money being lost? quantify the amount.
🙂 What is the visibility of the issue?
🙂 Are external customers affected?
🙂 Could any compliance obligations be breeched?
🙂 How serious are the consequences if the problem persists?

IT and business agreements a service-level agreement (SLA), recovery point objective (RPO) and recovery time objective (RTO). Environment overview (network diagram, application architecture), Windows System and Application Event Logs and SQL Server Error Logs, give time to understand the problem and application environment often leads to a higher quality and faster problem resolution. Gather all relevant logs to a single location:

🙂 Windows and System Event logs
🙂 SQL Server Error Logs
🙂 Dump fi les
🙂 Application logs

Many problems begin life as an application behavior or performance issue, and there may be other software components or interactions that could affect the database platform. Simply DBA bring in and nothing to with data or database, anyway what to look for when troubleshooting each major problem.

🙂 Connectivity issues -Netowrk (NIC)
🙂 Performance issues — For a performance problem you need to determine if the problem is on the client, the middle tier, the server on which SQL Server runs, or the network.
🙂 Hardware bottlenecks — Identify resource contention around disk(Review the PerfMon disk counters for Average Disk Sec/Read and Average Disk Sec/Write to verify that the time to make a read or write is ideally below 20 to 30 milliseconds for OLTP systems, higher for decision support systems), CPU (maximum degree of parallelism), network, or memory(AWE feature was discontinued in SQL Server 2012).
🙂 SQL Server issues — As well as hardware contention, SQL Server has fi nite internal resources, such as locks, latches, worker threads, and shared resources such as tempdb.
🙂 Compilation issues — If possible, identify one user query that is slow, the most common causes are insufficient resources. This could be caused by a sub-optimal query plan as a result of missing or outdated statistics, or ineffi cient indexes. SQL Server performance issues with simple tools Data Collector Set, performance trace using the most basic tool:


* Click on Start | Run and type Perfmon.exe

* Go to “Data Collector sets”, expand it and then right click on “User Defined”.

* Choose to create a new “Data Collector Set”.

* Give a name to the set, e.g. “SQLPerf”. Also choose to “Create Manually (Advanced)”. Click on Next.

* Then choose to “create data logs” and check only the “Performance counter” option. Click on Next.

DBA like to add all the counters and all instances from the below Perfmon objects, so that we can capture all the details. Operating System Objects:

* Memory

* Network Interface

* Objects

* Paging File

* Physical Disk

* Process

* Processor

* Redirector

* Server

* Server Work Queues

* System

* Thread

SQL Server Objects:

* SQLServer: Access Methods

* SQLServer: Buffer manager

* SQLServer: Locks

* SQLServer: SQL Statistics

I usually set the interval of the Perfmon Log to 10 to 20 seconds. After you stop the trace, it should be saved have a file extension of .blg. After all this hardwork what comes out, the latter case usually indicates that the performance problem is somewhere outside of SQL Server. Other think, dba want to eascape from problemo and complain other dude’s (:

Let me know if you have any further question and your comments will be learning point.

Microsoft Certified Solutions Associate (MCSA)

Cloud based AlwaysOn AG

In a SQL Server 2014, it’s possible to create an AlwaysOn Availability Group in Azure. You have two options 1. you can either create an availability that resides entirely in the cloud, or 2nd option you can create a hybrid environment. AlwaysOn Availability Group in Microsoft Azure.

Microsoft Azure, you need the following:

** A virtual network setup in Azure

** A storage account setup

** A Microsoft Azure domain controller


** One or more Microsoft Azure Virtual Machines with SQL Server 2014 installed and configured for AlwaysOn Availability Group.

It’s the Infrastructure as a Service (IaaS), Microsoft takes you through creating the necessary Microsoft Azure infrastructure.

Here is Hybrid AG:

Microsoft Azure-based virtual machines to create a hybrid Availability Group. To do this, you need:

** A virtual network that contains multiple subnets, including a frontend and a backend subnet

** A domain controller and the SQL Server 2014 machines that you need to run behind a VPN device in your on-premise network

** A site-to-site VPN connection between the Microsoft Azure VPN device & your on premise VPN device

** A SQL Server VM deployed to the backend subnet in Microsoft Azure and joined to your on premise AD domain

** A WSFC multisite cluster with the Node Majority quorum model

** An Availability Group with two synchronous commit replicas on on-premise and one asynchronous-commit replica on Windows Azure

To add a Microsoft Azure Virtual Machine to the existing on-premise AlwaysOn Availability Group, when you have the infrastructure listed in place, you can – rightclick ->
on the Availability Group and choose Add Replica to start the wizard. The Add Replica page will open and you need to select the Add Azure Replica option.  Show as this













You can now run through the rest of the wizard to add the Microsoft Azure Virtual Machine replica to your AG.
Let me know if you have any further question and your comments will be learning point.

Microsoft Certified Solutions Associate (MCSA)

Cloud Database

Cloud Database


The choice of Microsoft Windows Azure SQL Database is because it is very similar to Microsoft SQL Server. Microsoft SQL Server is widely available. Users can create and test the databases locally and then migrate the database objects to the Microsoft cloud platform, Windows Azure. SQL Server is included in the Windows Azure Software Development Kit (SDK), which can be downloaded for free.


This photo from website:

The creation of a database in Windows Azure SQL Database is relatively easy. Once the database is created on the cloud platform, users can get a quick start on the hands-on practice included in this book. On the other hand, Windows Azure SQL Database is sophisticated enough to handle most of the enterprise-level database tasks.

Windows Azure SQL Database includes the components listed that can be used to accomplish various tasks in database system development, these components are often included in other sophisticated DBMS systems. The management tools are commonly used by DBA/DBD to create and manage database objects, extract information from existing databases, implement data access schemes, develop database applications, perform data analyses, and exchange data over the Internet. The DBMS provides a unified working environment to allow database professionals to accomplish all the above tasks.

The data stored in a database are systematically arranged so that they can be retrieved with some kind of query language. Database professionals can use the tools provided by a DBMS to manage and analyze the data stored in a database. Each database consists of objects such as tables, views, and stored procedures.

Views: Similar to a table, a view is also a database object. It collects data from tables. Sometimes, a database application may need data from multiple tables. Instead of searching these data during an operation, the data are often preselected and stored in a database object called a view

* Stored Procedures: A stored procedure contains programming code to perform a sequence of operations or call other procedures. It is precompiled and stored in SQL Server for better performance and security.

* Functions: A function also contains programming code to perform a sequence of operations. It can be called by its name by other programs and returns a single value to the calling program. Windows Azure SQL Database provides many built-in functions to assist the development of databases.

* Database Triggers: A database trigger is a specific kind of stored procedure. It will be executed automatically when a specific database event occurs.

A database system, as the heart of an information system, can be very complicated.

The process of database development normally includes several major steps: requirement analysis for the to-be-built database system, design of the database system based on the requirements, development of the database system according to the design, deployment of the database system, development of database applications, database management and maintenance, and development of data analysis services.

The first task is to investigate the requirements for the database to be developed. Once the first task is accomplished, it is to time to sort out the collected information, interpret the collected information in technical terms, and figure out the relationships among the things to keep track of.

The choice of a DBMS will depend on the requirements of the future database. The chosen DBMS should be able to create a database system that meets all the database management, data analysis, and data storage requirements. By using the DBMS, the database designers can transfer their data model to a relational database. The business data will be stored in the well-defined database and be organized according to the specifications made in the data model.

Once the database is constructed, before the newly developed database can be used in the business process, it needs to be tested to see if the database is able to meet all the requirements. The database administrators (DBAs) and users should be involved in the testing process. Based on the testing results, the database will be modified again by the database designers to meet the users’ needs.

To keep the database running and providing services continuously, the database should be managed by a group of experienced DBAs. The DBAs perform tasks such as creating user accounts, performing data transformation, database backup, restoration, and doing performance tuning. It is the DBAs’ responsibility to enforce security measures. For a large corporation, the DBAs may also perform tasks such as database partitioning and managing network utilities for distributed databases. A cloud platform can significantly reduce the workload of the DBAs since the tasks such as database backup can be taken care of by a cloud provider.

Cloud Computing:

cloud_computing-historyCloud computing has the potential to significantly reduce the cost for developing and managing a database as well. The cloud computing technology can be a great solution for those small businesses that lack resources to support their own IT infrastructure. As the business grows, the company can subscribe more services and computing resources.


This photo from website:

The daily IT infrastructure maintenance will be handled by the cloud provider. This means that the cost for IT service personnel can be significantly reduced. E-commerce can fully take advantage of cloud computing since the services and computing resources provided by the cloud provider can be accessed anywhere and anytime.

There are three types of cloud computing platforms: Private Cloud, Public Cloud Hybrid Cloud. here based on the business requirements, the company can create its own cloud services.








This photo from website:

In a public cloud, the services and computing resources are provided by a third-party cloud provider. Note: The data center and IT infrastructure used to support cloud computing are developed and managed by the third-party cloud provider. Some of the well-known ones are Amazon Elastic Compute Cloud, IBM Blue Cloud, Microsoft Windows Azure, last not least Fujitsu Cloud Solutions.

The FUJITSU Cloud Integration Platform is charged on a pay as you use basis and available as a self managed cloud service or delivered by Fujitsu as a Managed Service.



There are three major types of cloud services: SaaS, IaaS, and PaaS. Recently, the personal cloud has also become available in cloud computing.







This photo from website:

Before the installation of the Windows Azure SDK, make sure that your computer meets the following requirements:

1. To install the Windows Azure SDK, your computer needs to have one of the following or later versions of the software:

a. Windows Server 2008 or Windows 7
b. SQL Server 2012
c. Microsoft Visual Studio 2010 with SP1
d. Microsoft Visio or other database design software

If Windows Server 2008 is used as the operating system, you also need to install .NET Framework 3.5. The early versions of the above software may also work. However, more configurations may be required.

2. You need to find out if your computer runs on a 32-bit or 64-bit operating system. Later, you need to download the Windows Azure SDK that matches your operating system.

3. To develop web applications, Internet Information Services (IIS) 7.0 should be enabled. Suppose that you have the 64-bit Windows 7 Professional operating system running on your computer. To verify if IIS is enabled on your machine, click the Start button, Control Panel, Programs and Features, and Turn Windows features on or off. You should be able to see if the node Internet Information Services is checked. If not, you need to check the node to enable IIS.

4. To make sure that ASP.NET as well as other features is enabled, click Start –> Control Panel –> Programs and Features, and Turn Windows features on or off –>. Expand the IIS node.

Edit the nodes Web Management Tools, Application Development Features, and Common HTTP Features as shown below

5. If you have a previous version of the Windows Azure SDK, make sure to remove all its components in the Control Panel before installing the SDK.

On or Off program


6. To accept the pop-ups from the Windows Azure website, you need to configure Internet Explorer (IE) as the following:

a. In the IE window, click the Tools menu, select Pop-up Blocker, and then click Pop-up Blocker Settings.
b. In the address bar of the website, click Add, and then click Close.


After the above preparation, you are now ready to download and install the Windows Azure SDK package with the following steps:


1. The first step is to download the Windows Azure SDK. You can download the Windows Azure SDK from the visit website.

2. On this website, you can find different versions of SDKs. For full installation, under the label .NET, select a version of SDK to install. If Microsoft Visual Studio 2010 is installed on your machine, click VS 2010 as shown –> Then, click Save.

3. After the package is downloaded, navigate to the Downloads directory where the file is downloaded. Then, double click the downloaded file to start the installation process. The installation may take some time.

4. Now that the SDK is installed. You should have a new item related to the Windows Azure SDK


5. After the installation process is completed, to test the SDK, click Start –> All Programs, and expand the Microsoft Visual Studio 2010 node. Right click Microsoft Visual Studio 2010 –> select Run as administrator. After Microsoft Visual Studio is opened, click File –> New Project, and click –> Cloud under the Visual C# node. In the Templates pane, you should be able to see Windows Azure Project. Click Cancel to exit.

6. You can now start SQL Server Management Studio (SSMS). To do so,

click Start –> All Programs –> SQL Server 2012 –> SQL Server 2012 Management Studio.

After the Connect to Server dialog is opened, in the Server name dropdown list, select the server on your local server as (On your computer, the name of the SQL server will be different.) Select Windows Authentication as the method of authentication. After the server name is selected, click the Connect button.

7. You should be able to log on to SSMS

8. By default, the database named master is installed for database management. Expand the Databases node and double click master. You should be able to see the database objects such as Tables, Views, and Programmability. In addition, Microsoft SQL Server also includes Service Broker, Storage, Security, Users, Roles, Schemas, and Symmetric Keys.

cloud temp









Connect to SQL Server

Service Broker: Service Broker provides messaging services. It can be used to let applications send and receive messages among multiple SQL Server instances.

Storage: Storage contains Full Text Catalogs for storing Full Text indexes, Partition Schemes for mapping partitions to a set of filegroups, and Partition Functions for mapping tables or indexes into partitions.

sql server












Security: With Security, you can configure database security measures for databases. The main security components are Users, Roles, Schemas, and Symmetric Keys.

Users: The Users object defines users in a database. It is used to create user accounts. A user account is created to identify each individual user. A user can access his/her account with his/her user name and password. After logging on to a database, the user can access database objects with properly assigned ownership and permissions. You can grant and modify permissions on objects in a database for users with this component.

Roles: The Roles object allows you to group users with the same set of permissions. Roles can be used to simplify administrative tasks. For example, when modifying permissions, instead of changing the permissions for each individual user, you can modify the permissions for the role.

Schemas: The Schemas component is a container that includes a set of database objects. A schema can have its own set of permissions. A database user can own several schemas. In contrast, you can assign several users to the same schema. If a database user is not assigned a schema, the default schema for the user is dbo (database owner)

Service Broker, Storage, and Security












Symmetric Keys: This component is used to configure symmetric keys for encryption.

9. Click File and Exit. At this point, your installation is completed. 🙂










This photo from website:

Let me know if you have any further question and your comments will be learning point. Back to top

Microsoft Certified Solutions Associate (MCSA)

Hive – Hadoop

Hive – Hadoop

Hive (not HIV Disease, this something different 🙂 ) HIVE is a framework that sits on top of core Hadoop. It acts as a data warehousing system on top of HDFS and provides easy query mechanisms to the underlying HDFS data.

Programming MapReduce jobs could be tedious and will have their own development, testing, and maintenance investments. Hive queries,  called Hive Query Language (HQL) are broken down into MapReduce jobs under the hood and remain a complete abstraction to the user and provide a query-based access mechanism for Hadoop data.

Traditional relational database; enabling business users to use familiar tools such as Excel and SQL Server Reporting Services to consume data from Hadoop in a similar way as they would from a database system as SQL Server remotely through a ODBC connection. Hive ODBC driver to consume the data.

1. Download and install the Hive ODBC Driver
2. Configure the driver to connect to Hive running on your Hadoop cluster on Linux/Windows
3. Use SQL Server Integration Services (SSIS) to import data from Hive to SQL Server

One of the main advantages of Hive is that it provides a querying experience that is similar to that of a relational database, which is a familiar technique for many business users. Essentially, it allows all ODBC compliant clients to consume HDFS data through familiar ODBC Data Sources (DSN), thus exposing Hadoop to a wide and diverse range of client applications.

Microsoft ODBC driver for Hive that bridges the gap between Hadoop and SQL Server along with its rich business intelligence and visualization tools. Downloaded here driver from the driver comes in two flavors, 64 bit and 32 bit, so please make sure that you download and install the appropriate driver depending upon the bit configuration of your client application.















Once the driver is installed successfully, perform the following steps to ensure that you can make a successful connection to Hive:

1. First, create a System DSN. In ODBC Data Sources Administrator, go to the System DSN tab and click on the Add Button

odbc administrator







2. HIVE driver in the next screen of the Create New Data Source wizard













3. The final ODBC Hive Setup Hadoop cluster details as well as the credentials used
to connect to the cluster.

















4. Once created, a new System DSN







5. The quickest way to test this out is to test from an .udl (Universal Data Link) file. To do this, create an empty .txt file and change the file extension to .udl. Then, double-click and open the .udl file, by default it should open with Microsoft OLE DB Core components.
















Provide the User name and Password for your Hadoop cluster and click on the Test Connection button. A successful test connection will confirm that the Hive ODBC driver is correctly set up.

















Microsoft SQL Server is a complete suite of tools that include an RDBMS system, a multidimensional OLAP and tabular database engines, as well as other services, for example a broker service, a scheduling service (SQL Agent), and many more. One of these BI components is an extract, transform, and load (ETL) tool called SQL Server Integration Services (SSIS). SSIS have the ability to merge structured and un-structured data by importing Hive data into SQL Server and apply powerful analytics on the integrated data. Next will be SSIS as an ETL before jul tiden 🙂

Let me know if you have any further question and your comments will be learning point.                                                                                                                  Back to top


Microsoft Certified Solutions Associate (MCSA)


Big Data – Hadoop

Big Data – Hadoop

Hadoop is an open source software framework that supports data-intensive distributed applications available through the Apache Open Source community. It consists of a distributed file system HDFS, the Hadoop Distributed File System and an approach to distributed processing of analysis called MapReduce. It is written in Java and based on the Linux/Unix platform.

The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. It provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. The entire Apache Hadoop platform is commonly considered to consist of the Hadoop kernel, MapReduce and Hadoop Distributed File System (HDFS), and number of related projects including Apache Hive, Apache HBase, Apache Pig, Zookeeper etc.

The real magic of Hadoop is its ability to move the processing or computing logic to the data where it resides as opposed to traditional systems, which focus on a scaled-up single server, move the data to that central processing unit and process the data there.

This model does not work on the volume, velocity, and variety of data that present day industry is looking to mine for business intelligence. Hence, Hadoop with its powerful fault tolerant and reliable file system and highly optimized distributed computing model, is one of the leaders in the Big Data world.

Hadoop is it’s storage system and it’s distributed computing model

Hadoop Distributed File System is a program level abstraction on top of the host OS file system. It is responsible for storing data on the cluster. Data is split into blocks and distributed across multiple nodes in the cluster.

MapReduce is a programming model for processing large datasets using distributed computing on clusters of computers. MapReduce consists of two phases: dividing the data across a large number of separate processing units (called Map), and then combining the results produced by these individual processes into a unified result set called Reduce.

This is also called the Head Node/Master Node of the cluster.

Secondary NameNode
This is an optional node that you can have in your cluster to back up the NameNode if it goes down. If a secondary NameNode is configured, it keeps a periodic snapshot of the NameNode configuration to serve as a backup when needed.

These are the systems across the cluster which store the actual HDFS data blocks. The data blocks are replicated on multiple nodes to provide fault tolerant and high availability solutions.

This is a service running on the NameNode, which manages MapReduce jobs and distributes individual tasks.

This is a service running on the DataNodes, which instantiates and monitors individual Map and Reduce tasks that are submitted.

Hive is a supporting project for the main Apache Hadoop project and is an abstraction on top of MapReduce, which allows users to query the data without developing MapReduce applications. It provides the user with a SQL-like query language called Hive Query Language (HQL) to fetch data from Hive store.

Pig is an alternative abstraction on MapReduce, which uses dataflow scripting language called PigLatin. This is favored by programmers who already have scripting skills.

Flume is another open source implementation on top of Hadoop, which provides a data-ingestion mechanism for data into HDFS as data is generated.

Sqoop provides a way to import and export data to and from relational database tables (for example, SQL Server) and HDFS.

Oozie allows creation of workflow of MapReduce jobs. This is familiar with developers who have worked on Workflow and communication foundation based solutions.

HBase is Hadoop database, a NoSQL database. It is another abstraction on top of Hadoop, which provides a near real-time query mechanisms to HDFS data.

HBase is massively scalable and delivers fast random writes as well as random and streaming reads. It also provides row-level atomicity guarantees, but no native cross-row transactional support. From a data model perspective, column-orientation gives extreme flexibility in storing data and wide rows allow the creation of billions of indexed values within a single table. HBase is ideal for workloads that are write-intensive, need to maintain a large amount of data, large indices, and maintain the flexibility to scale out quickly. HBase is now being used by many other workloads internally at Facebook and many other company using this amazing big data technolog.

Mahout is a machine-learning library that contains algorithms for clustering and classification.

Good to know Apache Cassandra Terms:
CQL – Cassandra Query Language
RP – Random Partitioner
OPP – Order Preserving Partitioner
BOP – Byte Ordered Partitioner
RF – Replication Factor
CF – Column Family
JSON – Java Script Object Notation
BSON – Binary JSON
TTL – Time To Live
HDFS – Hadoop Distributed File System
CFS – Cassandra File System
UUID – Universal Unique IDentifier
DSE – Datastax Enterprise
AMI – Amazon Machine Image
OOM – Out Of Memory
SSTables – Sorted String Table
SEDA – Staged Event-Driven Architecture
CRUD – Create Read Update Delete

Big Data Architecture :
Big Data architecture is premised on a skill set for developing reliable, scalable, completely automated data pipelines. That skill set requires profound knowledge of every layer in the stack, beginning with cluster design and spanning everything from Hadoop tuning to setting up the top chain responsible for processing the data.

The main detail here is that data pipelines take raw data and convert it into insight (or value). Along the way, the Big Data engineer has to make decisions about what happens to the data, how it is stored in the cluster, how access is granted internally, what tools to use to process the data, and eventually the manner of providing access to the outside world. The latter could be BI or other analytic tools, the former (for the processing) are likely tools such as Impala or Apache Spark. The people who design and/or implement such architecture I refer to as Big Data engineers.

Facebook is starting to use Apache Hadoop technologies to serve realtime workloads. why facebook decided to use Hadoop technologies, the workloads that facebook have on realtime Hadoop, the enhancements that facebook did to Hadoop for supporting our workloads and the processes and methodologies facebook have adopted to deploy these workloads successfully.

Here is a link to the complete paper for those who are interested in understanding the details of why decided to use Hadoop_dwarehouse_2010  for a complete reference, please visit the Apache website

NoSQL Databases

Amazon SimpleDB
Azure Table Storage
Berkeley DB
CDB (Constant Databse)
Clusterpoint Server
Couchbase Server
EMC Documentum xDB
Event Store
Execom IOG
Faircom C-Tree
HSS Database
IBM Lotus/Domino
Infinite Graph
VMware vFabric GemFire
Intersystems Cache
ISIS Family
MarkLogic Server
Ninja Database Pro
OpenLink Virtuoso
Oracle NoSQL Database
RDM Embedded
Sones GraphDB
TIBCO Active Spaces
Tokyo Cabinet / Tyrant
U2 (UniVerse, UniData)


Let me know if you have any further question and your comments will be learning point.                                                                                                                  Back to top


Microsoft Certified Solutions Associate (MCSA)

%d bloggers like this: