You might have heard the announcement last week that Microsoft is deprecating the Azure Cognitive Services and Machine Learning functions in Power Query. For many, it was news that this functionality existed inside of Power Query! Others mocked its existence. For me, it was a favorite diamond in the rough within the Power Query editor.
As a reminder, not everyone has access to the same robust resources. While I would agree that this might not be the most efficient option, it meets a need. It is like choosing between a hammer and a pneumatic nail gun. If I am building a house, the powered option is far more efficient. But if I am doing a little project, the old-fashioned hammer makes more sense.
So, with the deprecation of these features in Power Query, what are we to do? What if you have no access to Azure to leverage cognitive services? What if you want to try other models instead? With a little bit of hard work, we can put together a solution that might resolve this for you and even open doors for others!
Why This Change with Cognitive Services Makes Sense
For some, there is justified anger. I get it – it can be frustrating. However, I do not disagree with this move. If you leverage cognitive services for a single refresh, it is not a big deal. We get the analysis we need and can move on. But the problem arises when people refresh the data over and over again.
A common use case for this feature would be to analyze survey results. If you left an open text field at the end of a survey for comments, we could perform some analysis. As results roll in, people refresh the dataset. However, each refresh prompts cognitive services to re-assess entries that were processed in previous refreshes. This adds overhead to the analysis and adds demand on Azure cognitive services.
As a result, this is the right choice. Instead of re-assessing an entire column on every refresh, the better pattern is to structure your refresh to only assess new records within the dataset. That being said, there are still use cases where this feature made life a lot easier – hence the purpose of this article.
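That "only assess new records" pattern can be sketched in pandas. This is a sketch under assumptions: the hypothetical `Id` key column, the cached results table, and the `score_fn` stand-in are my own names, not part of the deprecated feature.

```python
import pandas as pd

def analyze_new_rows(survey: pd.DataFrame, cache: pd.DataFrame, score_fn) -> pd.DataFrame:
    """Score only rows whose Id is not already in the cached results.

    `survey` and `cache` are assumed to share an 'Id' key column;
    `score_fn` stands in for any text-analysis call.
    """
    new_rows = survey[~survey["Id"].isin(cache["Id"])].copy()
    new_rows["score"] = new_rows["Comments"].map(score_fn)
    # Append the freshly scored rows to the cache and return the full set
    return pd.concat([cache, new_rows], ignore_index=True)

survey = pd.DataFrame({"Id": [1, 2, 3], "Comments": ["great", "bad", "fine"]})
cache = pd.DataFrame({"Id": [1], "Comments": ["great"], "score": [0.9]})
result = analyze_new_rows(survey, cache, lambda t: 0.5)  # only Ids 2 and 3 are scored
```

The previously scored row keeps its cached value; only the two new rows trigger the (stubbed) analysis call.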
Enter Python, Hugging Face, and KeyBERT
Another feature people do not know about is the ability to run a Python script within Power Query. And with the rise of artificial intelligence, there are plenty of models out there for you to use! In doing some research, I landed on Hugging Face as an easy solution. If you choose to use another solution, that is up to you – the code we use here can serve as a blueprint to make it work.
To prepare for this, you will need Python 3.9 or newer installed on your device. From there, you will need to install a few packages to enable the functionality we need:
- Transformers (Hugging Face)
- Matplotlib
- Torch (for running analysis on your local machine)
- KeyBERT
To install these packages, open a terminal window as an administrator and run the following commands:
pip install transformers
pip install matplotlib
pip install torch
pip install keybert
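If you want to confirm the installs succeeded before touching Power Query, a quick check like this (a generic sketch, nothing Power BI-specific) will flag any package that did not land:

```python
import importlib.util

# The import names happen to match the pip package names here
packages = ["transformers", "matplotlib", "torch", "keybert"]
missing = [name for name in packages if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages found")
```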
With these packages installed, we can replace the key text analysis tools from Azure cognitive services. Once we have some data inside of Power Query, we can transform the query by running a Python script, found under the Transform ribbon:

From there, we can put whatever code in the window we want to replicate the Azure cognitive services experience.
Use Case #1 – Sentiment Analysis
A common need is to analyze the sentiment of a block of text. This is not easy, but Hugging Face makes it easier for us to accomplish this goal. In our scenario, we have a dataset that has a “Comments” column that needs to be analyzed. When we click the button for Python, we will use the following code:
# 'dataset' is automatically provided by Power BI and holds the input data
import pandas as pd
from transformers import pipeline

df = dataset.copy()

# Clean and prepare text input
texts = df["Comments"].astype(str).tolist()  # Update column name to match your dataset

# Load the model
sentiment_pipeline = pipeline("sentiment-analysis")

# Analyze sentiment for each line of text
results = sentiment_pipeline(texts)
dataset[['label', 'score']] = pd.DataFrame(results)
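If you are curious what that last line is doing: the pipeline returns a list of dictionaries, one per comment, and pd.DataFrame lines them up into label and score columns. A mocked-up example (the sample values are illustrative, not real model output):

```python
import pandas as pd

# Mocked pipeline output: one dict per input comment
results = [
    {"label": "POSITIVE", "score": 0.9987},
    {"label": "NEGATIVE", "score": 0.9541},
]

dataset = pd.DataFrame({"Comments": ["Love this product", "Terrible support"]})

# Column names in the mocked results match the target columns, so pandas
# attaches them alongside the original Comments column
dataset[["label", "score"]] = pd.DataFrame(results)
print(dataset)
```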
To update this code for your use, you just need to replace the column name found within the data frame under the clean and prepare text input section. You can also adjust the model you wish to use under the load the model section. Once you click OK, you can drill down on the table on the dataset line:

Once that is completed, you will see the sentiment label and score from the entries:

It is that easy! But the next challenge is getting keywords out of these comments.
Use Case #2 – Keyword Extraction
The next challenge is pulling keywords from a block of text. This is where KeyBERT comes in and assists. Just like we did for sentiment analysis, we will click the Run Python Script button and use this block of code:
# 'dataset' is automatically provided by Power BI and holds the input data
import pandas as pd
from keybert import KeyBERT

df = dataset.copy()

# Clean and prepare text input
texts = df["Comments"].astype(str).tolist()  # Update column name to match your dataset

# Load the model
kw_model = KeyBERT(model='all-MiniLM-L6-v2')

# Extract keywords for each text
keyword_results = []
for text in texts:
    keywords = kw_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), stop_words='english')
    keyword_phrases = [kw[0] for kw in keywords]  # Extract just the phrases
    keyword_results.append(", ".join(keyword_phrases))  # Join into a single string

# Add keywords back to the DataFrame
df["keywords"] = keyword_results

# Output for Power BI
results = df
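To see what the loop is working with: extract_keywords returns a list of (phrase, score) tuples, and the two lines inside the loop strip the scores and join the phrases into one cell-friendly string. With mocked output (illustrative values, not from a real run):

```python
# Mocked KeyBERT output: (phrase, relevance score) tuples
keywords = [("battery life", 0.71), ("fast shipping", 0.64), ("price", 0.52)]

keyword_phrases = [kw[0] for kw in keywords]   # keep just the phrases
joined = ", ".join(keyword_phrases)            # one comma-separated string per row
print(joined)
```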
As before, you can specify the column in your dataset that will be analyzed under clean and prepare text input. Once you click OK, the script will run and you can drill down on the results output:

Now we can see the keywords that were extracted from the text:

From here, you can split out the words and use them as you deem appropriate!
Cognitive Services versus Python
There are advantages and challenges with both solutions. But results are results, so let’s break it down together so you can make the right decision.
The Good
First and foremost, I get the desired outcomes from both solutions. While it takes more work than with cognitive services, the use of Python gets the job done. In fact, in some ways it might be better. With Python, you get the label in addition to the score, whereas cognitive services only gives you the score. That is quite nice!
Second, this is a huge win for people who do not have access to a premium Power BI experience. To use cognitive services, you must have a Fabric capacity, a Power BI Premium capacity (if in a sovereign tenant), or a Premium Per User license. If you are an independent professional who uses Power BI Desktop for free, this is an easy way to get this functionality at no additional cost. It also means the query can be refreshed in the Power BI service in a Pro workspace. That is pretty cool!
The Bad
While it is not the most difficult process, you will need to install Python on your device. You might not be allowed to do so, or you might not feel comfortable doing it. You also might run into errors and need to work through them (pro tip – Copilot Chat is great for helping you with this). But I promise it is not that bad as long as you are able to install it.
Next, you are running the models on your device. If they are not beefy enough to handle a large volume of data, it could really bog you down. However, you can work through this by limiting the rows of your table with a parameter and then publish to the service when you are ready to refresh the whole query.
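Capping the rows before the model runs can be sketched with a hypothetical row_limit value (the name is my own; in Power Query you would drive it from a query parameter instead of hard-coding it):

```python
import pandas as pd

# Stand-in for the 'dataset' table Power Query hands the script
dataset = pd.DataFrame({"Comments": [f"comment {i}" for i in range(500)]})

row_limit = 100  # hypothetical parameter; raise or remove it in the service
df = dataset.head(row_limit).copy()
print(len(df))
```

Everything downstream (the pipeline call, the KeyBERT loop) then only sees the capped table.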
What About Performance?
If it is hard to run on my machine, what does true performance look like? Glad you asked! I ran both use cases with cognitive services and Python within the Power BI service. This allowed for an equal footing and provided a little control for the analysis.
Tests were run on the same query of 500 entries stored in a CSV file. This was to emulate the survey results use case with a tool like Microsoft Forms. I ran ten refreshes for each to verify performance.
| Use Case | Cognitive Services | Python |
| --- | --- | --- |
| Sentiment Analysis | Avg: 1:28, Min: 1:19, Max: 2:21 | Avg: 1:20, Min: 0:49, Max: 2:25 |
| Keyword Extraction | Avg: 0:37, Min: 0:21, Max: 1:42 | Avg: 1:28, Min: 1:19, Max: 2:21 |
Initially, as I was running this on my device, I was worried about overall performance, but it turned out that for sentiment analysis the two were pretty similar. For the most part, my refreshes were close on timing, and Python even outperformed cognitive services by a slight bit.
However, keyword extraction proved to be less performant, with refreshes taking roughly two and a half times as long on average to complete. With a small dataset, I do not think this is the end of the world. However, if I were to increase the number of records to 5,000, there could be an issue.
I think with some exploration you could find models that extract keywords faster, but keep in mind that this solution should not be applied to massive amounts of data, as it is not the most efficient option. Handling this process upstream, as data is added to the source, would be a far better experience from a processing standpoint.
Conclusion
Like I said earlier, this is not a solution for every situation. But it does fill a need and you might have it. It is worth checking out if you are currently leveraging cognitive services or need a fix for your existing solution. If you have never tried using Python in Power Query, it might be an easy way for you to jump in! Let me know how it works for you in the comments below!