Collecting logs from Apache Storm cluster in HDInsight
When an Apache Storm topology runs in a multi-node Storm cluster, different components of the topology log to different files that are saved on different nodes in the cluster, depending on where...
How to configure Hortonworks HDP to access Azure Windows Storage
Recently I was asked how to configure a Hortonworks HDP 2.3 cluster to access Azure Windows Storage. In this post we will go through the steps to accomplish this. The first step is to create an Azure...
Dealing with RequestRateTooLarge errors in Azure DocumentDB and testing...
In Azure DocumentDB support, one of the most common errors reported by our customers is RequestRateTooLargeException, or HTTP status code 429. For example, from an application using...
Understanding Spark’s SparkConf, SparkContext, SQLContext and HiveContext
The first step of any Spark driver application is to create a SparkContext. The SparkContext allows your Spark driver application to access the cluster through a resource manager. The resource manager...
A KMeans example for Spark MLlib on HDInsight
Today we will take a look at Spark's MLlib module, its built-in machine learning library (see the Spark MLlib Guide). KMeans is a popular clustering method. Clustering methods are used when there is no...
Using Azure SDK for Python
Python is a great scripting tool with a large user base. In a recent support case I needed a way to constantly generate files with some random data in Windows Azure Storage (wasb) in order to process...
Multi-Stream support in SCP.NET Storm Topology
Streams are at the core of Apache Storm. In most cases topologies are based on a single input stream; however, there are situations when one may need to start the topology with two or more input streams....
How to allow Spark to access Microsoft SQL Server
Today we will look at configuring Spark to access Microsoft SQL Server through JDBC. On HDInsight the Microsoft SQL Server JDBC jar is already installed. On Linux the path is...
Incremental data load from Azure Table Storage to Azure SQL using Azure Data...
Azure Data Factory is a cloud-based data integration service. The service not only helps to move data between cloud services but also helps to move data to and from on-premises systems. For example, moving data...
Encoding the Hive query file in Azure HDInsight
Today at Microsoft we were using Azure Data Factory to run Hive Activities in Azure HDInsight on a schedule. Things were working fine for a while, but then we got an error that was hard to understand....
Encoding 101 - Exporting from SQL Server into flat files, to create a Hive...
Today in Microsoft Big Data Support we faced the issue of how to correctly move Unicode data from SQL Server into Hive via flat text files. The main issue faced was encoding special Unicode characters...
How to call an Azure Machine Learning Web Service from NodeJS
Azure Machine Learning allows data scientists and developers to embed predictive analytics into applications. To learn more about Azure Machine Learning, visit the Azure Machine Learning documentation. A...
HDInsight Hive Metastore fails when the database name has dashes or hyphens
Working in Azure HDInsight support today, we saw a failure when trying to run a Hive query on a freshly created HDInsight cluster. It's brand new and fails on the first try, so what could be wrong? Our...