
Capturing Home Assistant data from Azure Event Hub into the free Kusto cluster

  • Writer: Sergey Goncharenko
  • Mar 19, 2023
  • 2 min read

In case you didn't know, you can get a free cluster in Azure Data Explorer (Kusto). It has some limitations, but data ingestion is still possible with a few hacks. First, we need to capture events from the Event Hub. Here are two options I came across and was able to set up:


Azure Streaming Analytics Job

This is very easy with the built-in no-code editor in the Azure Portal. You just need to connect to the Event Hub (the Basic tier works, because one consumer group is enough for this setup) and send all of the data to Azure Data Lake Storage Gen2.


Event Hub Standard Capture

You just need to enable the built-in Event Hub Capture feature to write events in the Avro format to a storage account.


Data Ingestion into a free Kusto Cluster

Then we need to ingest that data into the free Kusto cluster. For that you can use built-in Kusto control commands. But first we need to get the list of ".parquet" files which the Streaming Analytics job writes to the ADLS Gen2 storage account.


Then I use the following construction to create a list of blobs which can be used with the Kusto ingestion command.


The first step filters out non-parquet files from the list, the second creates an array with Kusto connection strings, and the third is a string function that builds a single comma-separated list which can be used directly with the Kusto action.
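For reference, each entry in that array is a Kusto storage connection string, which looks roughly like this (the account, container and path here are placeholders, and the part after the semicolon can be a storage account key or a SAS token):

https://<storageaccount>.blob.core.windows.net/<container>/<path>/part-0001.parquet;<account key or SAS token>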


That last step is a neat replace construction which, using only string functions, flattens the array into the list - this is way cheaper than doing a For-Each loop in the Logic App:


replace(replace(replace(string(body('Select')),'"},{"stringName":"',','),'[{"stringName":"',''),'"}]','')
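To illustrate with made-up blob names: if the Select action outputs

[{"stringName":"https://mystore.blob.core.windows.net/capture/a.parquet;<key>"},{"stringName":"https://mystore.blob.core.windows.net/capture/b.parquet;<key>"}]

then the three nested replace calls strip away the JSON scaffolding and leave just

https://mystore.blob.core.windows.net/capture/a.parquet;<key>,https://mystore.blob.core.windows.net/capture/b.parquet;<key>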

Now we can simply use the .ingest Kusto control command. I use async to save on the Logic App running time and thus the cost of the operation.
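As a rough sketch, the command the Logic App sends looks something like this - the table name HomeAssistantRaw is a placeholder, and the comma-separated list built above goes between the parentheses:

.ingest async into table HomeAssistantRaw (
    'https://<storageaccount>.blob.core.windows.net/<container>/a.parquet;<key>',
    'https://<storageaccount>.blob.core.windows.net/<container>/b.parquet;<key>'
) with (format='parquet')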


You can find the cluster URL on the Azure Data Explorer page for your free cluster.

Then I just use a schedule trigger to execute this data import every hour.

You could start experiencing issues with the free cluster throttling you down; in that case I can recommend splitting the List Blobs command into pages and iterating over each page separately.


Another issue you will see is that this hack creates duplicates. To avoid that, I started to recreate the reporting table on each Logic App run.
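A minimal sketch of what that recreation can look like - the table name and schema are placeholders, since your columns will depend on how you project the Home Assistant events:

.drop table HomeAssistant ifexists

.create table HomeAssistant (Timestamp: datetime, EntityId: string, State: string)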


For Avro ingestion the solution is a little bit different.

We filter out non-Avro blobs.

And use the Avro format for ingestion.
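The ingest command itself has the same shape as the parquet one, only the format changes (table name and blob path are again placeholders):

.ingest async into table HomeAssistantAvroRaw (
    'https://<storageaccount>.blob.core.windows.net/<container>/<capture-path>.avro;<key>'
) with (format='avro')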

Then I used the same trick as with parquet to avoid duplicates (renaming the table, creating a new empty one and removing the old one before the ingestion). But before all of that I use some internal Kusto conversion to turn the Unicode bits in the Avro payload into a string, and I ingest only the data that is currently missing from the reporting table; for the Logic Apps Avro ingestion I used a separate staging table. Here is the Kusto magic to ingest the HA data from table into table without duplicates.
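A sketch of what such a query can look like, with assumed table and column names: RawAvro is the staging table that receives the captured events, Body is the captured payload stored as an array of Unicode code points (hence the unicode_codepoints_to_string conversion), and entity_id, state and last_changed are the usual Home Assistant event fields. The leftanti join keeps only the rows that are not yet in the reporting table:

.set-or-append HomeAssistant <|
RawAvro
| extend Payload = parse_json(unicode_codepoints_to_string(Body))
| project Timestamp = todatetime(Payload.last_changed),
          EntityId = tostring(Payload.entity_id),
          State = tostring(Payload.state)
| join kind=leftanti HomeAssistant on Timestamp, EntityId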

Here is the result - a chart of Home Assistant historical data in Azure Data Explorer, built with some Kusto magic.



I found the Avro hack to be on the cheaper side of things, although not by much, because you have to use the Standard Event Hub tier - the built-in capture to Avro requires Standard, while the Streaming Analytics job does not.
