Big Data Support

How to use parameter substitution with Pig Latin and PowerShell

August 12, 2014, 9:37 am

When running Pig in a production environment, you'll likely have one or more Pig Latin scripts that run on a recurring basis (daily, weekly, monthly, etc.) that need to locate their input data based on...

View Article

How to use HBase Java API with HDInsight HBase cluster, part 1

November 4, 2014, 9:56 pm

Recently we worked with a customer who was trying to use the HBase Java API to interact with an HDInsight HBase cluster. Having worked with the customer and tried to follow our existing documentation...

View Article

Some Commonly Used YARN Memory Settings

November 11, 2014, 4:27 am

We were recently working on an out of memory issue that was occurring with certain workloads on HDInsight clusters. I thought it might be a good time to write on this topic based on all the current...

View Article

Loading data in HBase Tables on HDInsight using built-in ImportTsv utility

December 12, 2014, 10:02 am

Apache HBase can give random access to very large tables: billions of rows by millions of columns. But how do you load that kind of data into HBase tables in the first place? HBase...

View Article

Problems When Using a Shared Default Storage Container with Multiple...

February 12, 2015, 8:06 am

We have seen several cases come in to Microsoft Support that ended up being caused by having multiple HDInsight clusters using the same Azure Blob Storage container for default storage. While we don't...

View Article

Azure PowerShell 0.8.14 Released, fixes problems with pipelining HDInsight...

February 16, 2015, 7:16 am

We recently pushed out the 0.8.14 release of Azure PowerShell. This release includes some updates to the following cmdlets to ensure that values passed in via the PowerShell pipeline, or via the...

View Article

Sqoop Job Performance Tuning in HDInsight (Hadoop)

February 17, 2015, 5:36 pm

Overview: Apache Sqoop is designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. HDInsight is a Hadoop cluster deployed in Microsoft...

View Article

Understanding HDInsight Custom Node VM Sizes

May 11, 2015, 11:00 am

With the 02/18/2015 update to HDInsight and Azure PowerShell 0.8.14, we introduced a lot more options for configuring custom Head Node VM size as well as Data Node VM size and ZooKeeper VM size. Some...

View Article

Why are the Hadoop services disabled on my HDInsight cluster

May 31, 2015, 12:26 am

I came across this question while working with a few customers recently and thought I would share a few tips with others who may find it helpful. There are times when we may need to check the status of...

View Article

How to install Splunk on HDINSIGHT with a custom action script

June 1, 2015, 2:48 pm

Recently I worked with a customer who wanted to use Splunk Enterprise and Splunk Forwarder to monitor and manage their HDInsight Storm cluster. You can learn more about Splunk at...

View Article

How to access Hive using JDBC on HDInsight

June 9, 2015, 3:59 pm

While following up on a customer question recently on this topic, I realized that we have seen the same question coming up from other users a few times and thought I would share a simple example here...

View Article

Spark on Azure HDInsight is available

July 14, 2015, 1:57 pm

Spark on Azure HDInsight (public preview) is now available! The following components are included as part of a Spark cluster on Azure HDInsight. Spark 1.3.1: comes with Spark Core, Spark SQL, Spark...

View Article

Azure Data Factory JSON Changes in July 2015

July 21, 2015, 8:39 pm

Azure Data Factory factories are designed with a series of fairly simple JSON documents and uploaded to Azure using the web interface, PowerShell, .NET, or Visual Studio. If you were using the...

View Article

Spark or Hadoop

July 27, 2015, 7:45 am

Spark is the most active Apache project and gets a lot of press in the big data world. So how do you know if Spark is right for your project, and what is the difference between Spark and Hadoop...

View Article

Using cross/outer apply in Azure Stream Analytics

August 5, 2015, 4:54 am

Recently I worked on a problem where JSON data events contained an array of values. The goal was to read and process the entire JSON data event, including the array and the nested values...

View Article

Why is my Spark application running out of disk space?

August 12, 2015, 10:56 am

In your Zeppelin notebook you have Scala code that loads Snappy-compressed Parquet data from two folders. You use Spark SQL to register one table named shutdown and another named census....

View Article

How to Access HDInsight Linux Web UIs using SSH Dynamic Tunneling

August 12, 2015, 1:02 pm

Scenario: One of the most important features of Azure HDInsight Linux (currently in preview) is the one available on the portal called Ambari Web. If you open up the Azure Portal and select your HDI...

View Article

Troubleshooting Hive query performance in HDInsight Hadoop cluster

August 13, 2015, 4:03 pm

One of the common support requests we get from customers using Apache Hive is: my Hive query is running slowly and I would like the job/query to complete much faster, or, in more quantifiable terms, my...

View Article

Some things to consider for your Spark on HDInsight workload

August 19, 2015, 9:02 am

When it comes time to provision your Spark cluster on HDInsight, we all want our workloads to execute fast. The Spark community has made some strong claims for better performance compared to MapReduce...

View Article

Troubleshooting Oozie or other Hadoop errors with DEBUG logging

August 21, 2015, 5:35 pm

In troubleshooting Hadoop issues, we often need to review the logging of a specific Hadoop component. By default, the logging level is set to INFO or WARN for many Hadoop components like Oozie, Hive...

View Article