
Cognitive Services in Power Query

You might have heard the announcement last week that Microsoft is deprecating the Azure Cognitive Services and Machine Learning functions in Power Query. For many, it was news that this functionality existed inside of Power Query at all! Others mocked its existence. For me, it was a favorite diamond in the rough within the Power Query editor.

As a reminder, not everyone has access to the same robust resources. While I agree that this might not be the most efficient option, it meets a need. It is like choosing between a hammer and a pneumatic nail gun. If I am building a house, the powered option is far more efficient. But for a small project, the old-fashioned hammer makes more sense.

So, with the deprecation of these features in Power Query, what are we to do? What if you have no access to Azure to leverage cognitive services? What if you want to try other models instead? With a little bit of hard work, we can put together a solution that might resolve this for you and even open doors for others!

Why This Change with Cognitive Services Makes Sense

For some, there is justified anger. I get it – it can be frustrating. However, I do not disagree with this move. If you leverage cognitive services for a single refresh, it is not a big deal. We get the analysis we need and can move on. But the problem arises when people refresh the data over and over again.

A common use case for this feature is analyzing survey results. If you left an open text field at the end of a survey for comments, you could perform some analysis on the responses. As results roll in, people refresh the dataset. However, each refresh prompts cognitive services to re-assess entries that were already processed in previous refreshes. This adds overhead to the analysis and extra demand on Azure cognitive services.

As a result, I believe this is the right choice. Instead of re-assessing an entire column on every refresh, it is better to structure your refresh to only assess new records within the dataset. That said, there are still use cases where this feature made life a lot easier. Hence the purpose of this article.
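To illustrate the "only assess new records" idea, here is a minimal pandas sketch. The `analyze` function is a hypothetical stand-in for an expensive model call, and the `Id` column and cached table are assumptions for the example – the point is simply to score only rows you have not seen before and reuse the cached results for the rest:

```python
import pandas as pd

def analyze(text):
    # Placeholder for an expensive model call (e.g. sentiment scoring)
    return len(text)

# Results cached from earlier refreshes (already scored)
cached = pd.DataFrame({"Id": [1, 2], "Comments": ["Great", "Okay"], "score": [5, 4]})

# The full dataset as it arrives on the current refresh
dataset = pd.DataFrame({"Id": [1, 2, 3], "Comments": ["Great", "Okay", "New comment"]})

# Only send rows we have not seen before to the model
new_rows = dataset[~dataset["Id"].isin(cached["Id"])].copy()
new_rows["score"] = new_rows["Comments"].map(analyze)

# Combine cached and freshly scored rows for output
results = pd.concat([cached, new_rows], ignore_index=True)
```

Here only one of the three rows triggers the model, which is the behavior the deprecation nudges us toward.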

Enter Python, Hugging Face, and KeyBERT

Another feature people do not know about is the ability to run a Python script within Power Query. And with the rise of artificial intelligence, there are plenty of models out there for you to use! In doing some research, I landed on Hugging Face as an easy solution. If you choose to use another provider, that is up to you – the code we use can serve as a blueprint to make it work.

To prepare for this, you will need Python 3.9 or newer installed on your device. From there, you will need to install a few packages to enable the functionality we need:

  • Transformers (Hugging Face)
  • Matplotlib
  • Torch (for running analysis on your local machine)
  • KeyBERT

To install these packages, open a terminal window as an administrator and run the following commands:

pip install transformers
python -m pip install -U matplotlib
pip install torch
pip install keybert
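Before jumping into Power Query, it can be worth confirming the installs worked. This short sketch just attempts to import each package and reports which ones are missing:

```python
import importlib

# The packages installed above
packages = ("transformers", "matplotlib", "torch", "keybert")

status = {}
for pkg in packages:
    try:
        importlib.import_module(pkg)
        status[pkg] = "OK"
    except ImportError:
        status[pkg] = "MISSING"

for pkg, state in status.items():
    print(f"{pkg}: {state}")
```

If anything shows as missing, re-run the corresponding pip command from an administrator terminal.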

With these packages installed, we can replace the key text analysis tools from Azure cognitive services. Once we have some data inside of Power Query, we can transform the query by running a Python script from the Transform tab:

Screenshot of Power Query with arrows directing you to the Transform tab and then the Run Python Script button to replace Azure Cognitive Services in Power Query.
Navigating to the Run Python Script button

From there, we can put whatever code we want in the window to replicate the Azure cognitive services experience.

Use Case #1 – Sentiment Analysis

A common need is to analyze the sentiment of a block of text. This is not easy, but Hugging Face makes it easier for us to accomplish this goal. In our scenario, we have a dataset with a “Comments” column that needs to be analyzed. When we click the Run Python Script button, we will use the following code:

# 'dataset' holds the input data for this script

import pandas as pd
from transformers import pipeline

# 'dataset' is automatically provided by Power BI
df = dataset.copy()

# Clean and prepare text input
text = df["Comments"].astype(str).tolist() # Update column name to match your dataset

# Load the model
sentiment_pipeline = pipeline("sentiment-analysis")

# Analyze sentiment for each line of text
results = sentiment_pipeline(text)

# Add the label and score columns back to the input table
dataset[['label', 'score']] = pd.DataFrame(results)
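If you are wondering how that last line works: the pipeline returns one dictionary per input string, and `pd.DataFrame` lines those up into columns. Here is a minimal sketch with made-up label and score values standing in for real model output:

```python
import pandas as pd

# The Hugging Face pipeline returns one dict per input string.
# These label/score values are made up for illustration.
results = [
    {"label": "POSITIVE", "score": 0.9987},
    {"label": "NEGATIVE", "score": 0.9321},
]

df = pd.DataFrame({"Comments": ["Loved it", "Way too slow"]})

# pd.DataFrame(results) aligns 'label' and 'score' columns by row order
df[["label", "score"]] = pd.DataFrame(results)
print(df)
```

This alignment by row order is why the comments and their scores stay matched up.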

To update this code for your use, you just need to replace the column name found within the data frame under the Clean and prepare text input section. You can also adjust the model you wish to use under the Load the model section. Once you click OK, you can drill down on the table on the dataset line:

Drilling down on the sentiment analysis results from Hugging Face
Drilling down on Sentiment Analysis results

Once that is completed, you will see the sentiment label and score from the entries:

Screenshot of the sentiment label and score from Hugging Face in place of Azure cognitive services.
Sentiment label and score is now available

It is that easy! But the next challenge is getting keywords out of these comments.

Use Case #2 – Keyword Extraction

The next challenge is pulling keywords from a block of text. This is where KeyBERT comes in and assists. Just like we did for sentiment analysis, we will click the Run Python Script button and use this block of code:

# 'dataset' holds the input data for this script

import pandas as pd
from keybert import KeyBERT

# 'dataset' is automatically provided by Power BI
df = dataset.copy()

# Clean and prepare text input
texts = df["Comments"].astype(str).tolist() # Update column name to match your dataset

# Load the model
kw_model = KeyBERT(model='all-MiniLM-L6-v2')

# Extract keywords for each text
keyword_results = []
for text in texts:
    keywords = kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), stop_words='english')
    keyword_phrases = [kw[0] for kw in keywords]  # Extract just the phrases
    keyword_results.append(", ".join(keyword_phrases))  # Join into a single string

# Add keywords back to the DataFrame
df["keywords"] = keyword_results

# Output for Power BI
results = df

As before, you can specify the column in your dataset that will be analyzed under the Clean and prepare text input section. Once you click OK, the script will run and you can drill down on the results output:

Screenshot of the results from the Python script using KeyBERT.
Drill down on the results to see the keywords found

Now we can see the keywords that were extracted from the text:

Screenshot of the keyword results from KeyBERT in Power Query.
Keyword results from KeyBERT

From here, you can split out the words and use them as you deem appropriate!
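Since the script joins the keywords into a single comma-separated string per row, splitting them back out is a one-liner in pandas. A minimal sketch with hypothetical keyword output:

```python
import pandas as pd

# Hypothetical keyword output, one comma-joined string per comment
df = pd.DataFrame({
    "Comments": ["The delivery was slow", "Great product quality"],
    "keywords": ["delivery slow, slow", "product quality, great product"],
})

# Split each keyword string and expand to one keyword per row
exploded = (
    df.assign(keyword=df["keywords"].str.split(", "))
      .explode("keyword")
      .drop(columns="keywords")
)
print(exploded)
```

You could do the same split directly in Power Query with Split Column by Delimiter; the pandas version just keeps everything in one script.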

Cognitive Services versus Python

There are advantages and challenges with both solutions. But results are results, so let’s break it down together so you can make the right decisions.

The Good

First and foremost, I get the desired outcomes from both solutions. While it takes more work than with cognitive services, the use of Python gets the job done. In fact, in some ways it might be better. With Python, you get the label in addition to the score, whereas cognitive services only gives you the score. That is quite nice!

Second, this is a huge win for people who do not have access to a premium Power BI experience. To use cognitive services, you must have a Fabric capacity, a Power BI Premium capacity (if in a sovereign tenant), or a Premium Per User license. If you are an independent user of the free Power BI Desktop, this is an easy way to get this functionality at no additional cost. It also means the query can be refreshed in the Power BI service in a Pro workspace. That is pretty cool!

The Bad

While it is not the most difficult process, you will need to install Python on your device. You might not be allowed to do so, or might not feel comfortable doing it. You also might run into errors and need to work through them (pro tip – Copilot Chat is great for helping with this). But I promise it is not that bad.

Next, you are running the models on your device. If it is not beefy enough to handle a large volume of data, it could really bog you down. However, you can work around this by limiting the rows of your table with a parameter and then publishing to the service when you are ready to refresh the whole query.
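In Power Query itself the cap would live in a query parameter, but the same effect is easy to sketch inside the Python script while you develop. The `ROW_LIMIT` value here is just an illustrative choice:

```python
import pandas as pd

# Hypothetical cap on rows while developing on a modest machine
ROW_LIMIT = 100

# Stand-in for the 'dataset' table Power BI passes to the script
dataset = pd.DataFrame({"Comments": [f"comment {i}" for i in range(500)]})

# Score only the first ROW_LIMIT rows locally; lift the cap before publishing
sample = dataset.head(ROW_LIMIT)
print(len(sample))
```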

What About Performance?

If it is hard to run on my machine, what does true performance look like? Glad you asked! I ran both use cases with cognitive services and Python within the Power BI service. This allowed for an equal footing and provided a little control to the analysis.

Tests were run on the same query of 500 entries in a CSV file. This was to emulate the survey results use case with a tool like Microsoft Forms. I ran ten refreshes for each to verify performance.

Use Case            | Cognitive Services                | Python
Sentiment Analysis  | Avg: 1:28, Min: 1:19, Max: 2:21   | Avg: 1:20, Min: 0:49, Max: 2:25
Keyword Extraction  | Avg: 0:37, Min: 0:21, Max: 1:42   | Avg: 1:28, Min: 1:19, Max: 2:21

Initially, as I was running this on my device, I was worried about overall performance, but it turned out that sentiment analysis was pretty similar overall. For the most part, my refreshes were close on timing, and Python even slightly outperformed cognitive services.

However, keyword extraction proved not as performant, with refreshes taking roughly three times as long to complete. With a small dataset, this is not the end of the world. However, if I were to increase the number of records to 5,000, there could be an issue.

I think with some exploration you could find models that extract keywords faster, but keep in mind that this solution should not be applied to massive amounts of data, as it would not be the most efficient option. Handling this process upstream, as data is added to the source, would be a far better experience from a processing standpoint.

Conclusion

Like I said earlier, this is not a solution for every situation. But it does fill a need you might have. It is worth checking out if you are currently leveraging cognitive services or need a fix for your existing solution. If you have never tried using Python in Power Query, it might be an easy way for you to jump in! Let me know how it works for you in the comments below!

Artificial Intelligence is Available Inside of Power Query?!

Unless you have lived under a rock for the past year, you know that artificial intelligence is a popular topic. Whether it is Copilot, ChatGPT, Bard, or OpenAI, artificial intelligence is here to stay. It is time to embrace artificial intelligence and learn how to harness it for our benefit.

But surely, the use of artificial intelligence inside of Power Query is brand new, right? Wrong! I remember seeing it available five years ago, and it is a powerful tool to integrate with your models! I will not be able to go into all the details of artificial intelligence in this article, but I can help you understand how to integrate it within your existing queries.

What Is Artificial Intelligence?

In the past few years, Artificial Intelligence (AI) has become a force in society. It permeates the world we live in and has become an integral part of how things are done. We see examples of it in our lives through things like chat bots, self-driving cars, and even social media. While some may find it intimidating, the reality of AI is that it has the power to make our lives easier.

Artificial Intelligence is used to describe technologies that perform tasks that once required human intelligence to complete. When properly deployed, AI models can recognize patterns, solve problems, make decisions, and understand natural language. These models do not organically exist. They are created through extensive training activities to ensure the results you receive are valid. The work we perform with Power Query may result in the creation of a model while others will leverage one.

While generative artificial intelligence (GenAI) is all the rage right now, we will not be using it for our queries. Instead, we will be integrating models that have been trained to provide a response. You might be the creator of these models while others are creating them for you. The result is the opportunity to bring back a predictive response based upon your query.

How to Get Started?

First, while Power Query is available in both Microsoft Excel and Power BI, this functionality is only found in Power BI. While you can leverage similar AI models in Microsoft Excel, it is done outside of Power Query. This is an important item to note if you prefer using Power Query inside of Excel.

Second, you must have some form of a premium workspace to enable this functionality. This can be in the form of a Premium Per User license, Premium capacity, or Azure Power BI Embedded environment. If you already have one of these environments at your disposal, you are set. If not, check out my article on licensing strategies around Power BI. I recommend spinning up an Azure Power BI Embedded SKU or activating a Premium Per User trial if you want to try these features at a minimal cost.

What if you do not have a Power BI environment available to you? Unfortunately, you will not be able to take advantage of this functionality inside of Power Query. I recommend you work with your organization to spin up a trial to explore this functionality.

Pre-Built Models in Power Query

Power Query allows us to leverage pre-built models. Leveraging Azure Cognitive Services, we can perform text and image analysis on our queries. We can do this from the add column ribbon inside of Power Query:

Screenshot of the add column ribbon inside of Power Query with an arrow pointing to the text analytics function.
Activating the Text Analytics function

If you are doing this for the first time, you will be prompted to allow the analysis of your data. Once you have done that, you can select which model you wish to use:

Screenshot of the Text Analytics screen with options to select the detect language, extract key phrases, and score sentiment models.  Next, you can select the column to analyze and language associated with the text.
Selecting a Text Analytics model for your analysis

As you can see, you can select from multiple pre-built models to perform your analysis. With text-based analysis, you should consider both the text and the language being used. This helps ensure the proper analysis is performed. By default, it will use your tenant’s default language. If your column uses the same language consistently in the query, you can hard code it in this screen. However, if you have multiple languages, you might need to use the detect language model to bolster your analysis.

You can also identify tags for images you ingest with Power Query. This is not easily performed and is only practical for small images, so you will likely not use the vision model often. However, you can try it out for yourself:

Screenshot of the tag images model in the vision module from Azure Cognitive Services.
Configuring the tag images model

These models can provide quick analysis and meaningful results with minimal effort. I find that combining these results with other dimensions allows me to sort through results and find the most meaningful responses in surveys.

Custom Artificial Intelligence Models

If you have created a custom model using Azure Machine Learning, you can leverage it within Power Query. Leveraging a public dataset around diabetes, I created a custom model with Azure Machine Learning. With an endpoint created, I can access it within Power Query with a few clicks:

Screenshot of the home ribbon in Power Query with an arrow pointing to the Azure Machine Learning function.
Activating the Azure Machine Learning function

Once I have selected Azure Machine Learning, I can select which endpoint I wish to use. I have already deployed my model and selected it below:

Screenshot of an Azure Machine Learning endpoint being configured for Power Query.
Configuring my predictive model in Power Query

As you can see, there are multiple inputs I must identify for my endpoint to return a predictive response. I can map the different columns in my query to each field and return a result. We will need to identify values for each field for this to work.

It is important to consider that Azure Machine Learning uses a consumption pricing model. Every time you refresh your query, it requests new predictions. Keep this in mind when building out your queries, as you could rack up extra costs without realizing it!

Any Other Options?

If you are a user of R or Python, you can leverage custom algorithms and models inside of Power Query as well. You will need to install software on your device to make this work, which can be prohibitive if you do not have admin access on your machine. I go into detail on these tools in Chapter 11 of my book but decided not to discuss them in this post. Primarily, I wanted this to be about out-of-the-box, easy implementation.

Conclusion

It is amazing how artificial intelligence has infiltrated Power Query. Many have no clue it is there or what is needed to activate it. With a little effort, you can really transform your queries with a few clicks. I recommend you try it out to see how you can harness the value of artificial intelligence in your queries!

So did you know about this feature in Power Query? Have you used it before? Are you going to try it out? Tell me about it in the comments below!
