Making sense of challenging data topics one step at a time.


How do I connect to REST APIs with Power BI? (Part 5)

In preparing for the Global Power BI and Fabric Summit this year, I got a little nostalgic. I went back to my presentation and early blog posts on connecting to REST APIs. Two years, several blog posts, and a book on Power Query later, I felt it was time to add a fifth part to this series. I realized that I had left out one additional step that optimizes my query a little more and squeezes the most out of my API call limits.

I will be picking up where we left off with part 4 of how to connect to data with REST APIs. However, if you are new to my blog, you can start from part 1 and get caught up. If you recall, I am limited to 100 API calls a day with my scenario. If I am watching more than 100 stocks, I might need to consider moving to the next subscription level. However, I have an option to squeeze even more out and potentially return 1,000 stock symbols. This is a bonus as I do not need to increase my subscription!

What Did Dom Miss in Part 4?

To note what I missed, I need to go back to part 2 on REST APIs. You must Read the Fine Manual! In reading the documentation, I realized that I could batch multiple stock symbols at once. Instead of making individual calls for each symbol, I can group ten of them in a single call.

Screenshot of the Yahoo Finance REST APIs documentation
Guess I need to take my own advice?

With this knowledge in hand, I set out to take my query one step further and get more out of my subscription. It is not as straightforward as it may seem, but with a little manipulation we can make it work.

Prepare my List

We already have a spreadsheet of stock symbols we have brought in. We start by removing all of the columns except the stock symbol. Just right-click on the symbol column and select Remove Other Columns. Once that is complete, insert an index column that starts from 1. Your query should look like this:

Screenshot of the symbol column with the index column next to it.
Make sure your index starts from 1!
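If you prefer to see what these steps generate in M, here is a minimal sketch (the stand-in Source table and step names are illustrative, not the exact code from my query):

let
  // Stand-in for the spreadsheet of stock symbols we already loaded
  Source = #table(
    type table [Symbol = text, Company = text],
    {{"MSFT", "Microsoft"}, {"AAPL", "Apple"}, {"AMZN", "Amazon"}}
  ),
  // Keep only the stock symbol column
  #"Removed Other Columns" = Table.SelectColumns(Source, {"Symbol"}),
  // Add an index column that starts at 1 and increments by 1
  #"Added Index" = Table.AddIndexColumn(#"Removed Other Columns", "Index", 1, 1, Int64.Type)
in
  #"Added Index"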

Next, we need to add a batch column by identifying every tenth row and assigning it a batch number. We will create a custom column for this. Using the modulo function, we can spot every tenth row: if the index is divisible by ten, the remainder is zero, and we divide the index by 10 to get the batch number. Otherwise, we return a null value. The formula looks like this:

if 
  Number.Mod([Index], 10) = 0 
  then Number.IntegerDivide([Index], 10) 
  else null

Next, we just need to use the fill up feature to make sure every row is assigned to a batch. The last few rows will still be null, but this is not an issue as they will simply form their own batch. The result looks like this:

Screenshot of our stock symbol list with the associated batch number.
Our stock symbols associated with a batch
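For reference, the custom column and fill up steps roughly translate to the following M, continuing the illustrative sketch above (step names are mine):

  // Flag every tenth row with its batch number and leave the rest null
  #"Added Batch" = Table.AddColumn(
    #"Added Index",
    "Batch",
    each if Number.Mod([Index], 10) = 0
      then Number.IntegerDivide([Index], 10)
      else null
  ),
  // Fill each batch number up into the nine rows above it
  #"Filled Up" = Table.FillUp(#"Added Batch", {"Batch"})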

With our batch in place, we can remove the index column. Next, we need to group our symbols together.

Use Group By to Bring Symbols Together

Going back to our documentation, each symbol must be separated by a comma. We will use the group by function to bring ten rows at a time together into a single column. We will select our Batch column and choose the group by function on the transform ribbon. Using the sum aggregation, we are going to summarize our symbol column. Naturally, this will result in an error, but we will fix that next.

Screenshot of the group by function in Power Query
Setting up the group by function

Next, we will need to update the M code in the advanced editor. We need to replace the List.Sum() function with the Text.Combine() function. Our code for this step will look like this:

Table.Group(#"Removed Columns", {"Batch"}, 
  {{"Symbols", each Text.Combine([Symbol], ","), 
  type nullable text}})

The result is an updated query with our stock symbols combined:

Screenshot of the combined stock symbols
Stock symbols combined in a single cell by batch

Now that the hard part is over, we just need to apply our function.

Functions and REST APIs

We already created a function in part 3 of my REST APIs series. We just need to apply it to our new query. Go to the Add Column ribbon and select Invoke Custom Function. Using the Symbols column for the parameter, we can query all of the symbols in the cell.

Invoking our custom function that uses REST APIs
Invoking the Get-StockInfo function
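In M, invoking the function as a new column looks roughly like this (assuming the function from part 3 is named Get-StockInfo and the grouped step is called #"Grouped Rows"):

  // Call the REST API once per batch of combined symbols
  #"Invoked Custom Function" = Table.AddColumn(
    #"Grouped Rows",
    "StockInfo",
    each #"Get-StockInfo"([Symbols])
  )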

All that is left to do is expand the results so they can be leveraged in the model:

Screenshot of our results from the REST APIs
Stock data from our endpoint

And just like that, we turned twenty-five API calls into three. Not too shabby!

Conclusion

It is extra work, but it is worthwhile if you can save a few dollars and a few minutes on your refresh. It may seem excessive, but it might make the difference. Besides, I could not resist a post where I tell people to Read the Fine Manual as I know my friend Mike will get a good laugh out of it again!

Have you ever used the group by function in Power BI? Have you ever extended a query like this? Did I miss another way to optimize this query? If so, tell me in the comments below!

Refresh from Files Without a Power BI Gateway?!

I cannot begin to tell you how many people I meet who think they need a Power BI Gateway to refresh their dataset using a file as a source. Yes, if you have selected a file data source, you will need to use a Power BI Gateway. However, there is a way around this challenge. It takes a little extra work to prepare your dataset, but you can make it work.

Before everyone starts moaning about using files as a data source, sometimes this is unavoidable. I worked with a third-party business travel platform that did not provide an endpoint for me to connect directly to the data. As a result, I needed to export data on a weekly basis to feed my dashboard. Since then, that platform has resolved this issue.

I also have a current scenario where I do not have access to the APIs for an ERP solution. Because of roles and permissions, I am not allowed to access this data. A creative solution is to export the data through a scheduled job that sends a CSV file to my email. From there, I can use Power Automate to position the data so it can be accessed by the Power BI Service without using a gateway. Is it ugly? No doubt. Is the process somewhat obnoxious? You betcha! But does it work? Absolutely!

What is the Power BI Gateway?

The Power BI Gateway might be new to you. If you are not an administrator of your Power BI environment, you probably do not know about it. The concept is simple – the gateway brokers access to your data sources that reside on premises within your network. Instead of opening up firewall ports or creating crazy firewall rules, you can deploy a gateway to broker the connection.

There are two types of gateways that can be used. The enterprise gateway provides an enterprise-wide connection for your data sources with the controls to manage access to them. Generally, you install the gateway on a server that has access to your data sources. Once it is in place, you configure your data connections within the Power BI Service by adding credentials and delegating access. The biggest benefit besides the controls is that you can provide redundancy for your gateway.

The other type of gateway is the personal gateway. This is deployed by an individual user. The data source must be accessible from the device running the personal gateway, and controls over the data source are essentially inherited by the user. This is a quick way to get a gateway in place, but it poses a risk because it is tied to an individual user with no oversight. In addition, your device must remain on for the refresh to be successful, which does not always work out.

Regardless of the type, the Power BI Gateway allows you to connect to any data source that resides on premises. While files work with gateways, they are a bit of a burden to manage. Unlike other sources such as SQL Servers, they have a tendency to move around or adjust. If we can manage this without the gateway, it will be easier for end users to manage without having to involve an administrator in the process.

Bypassing the Power BI Gateway

In short, the easiest way to bypass using a Power BI Gateway for your files is to use SharePoint Online. Since it is part of Microsoft 365 and already in the cloud, you do not need a gateway to connect to your file. Start by making sure your file is in SharePoint and open it in the desktop app from the site. From there, go to the file ribbon, select info, and click on copy path.

Once you have a file path, create a new source in Power Query and select the web connector. Paste your URL into the field, but remove the ?web=1 at the end of the path. Click OK and use your Organizational Account to access your file. From there, you will be prompted to go through and access the data based on the file structure. I talk about the web connector with REST APIs, but the concept is the same if you have never used this connector before.

Screenshot of the Web Connector configured to connect to a file inside of SharePoint Online without the Power BI Gateway.
Configuring the Web Connector for your file in SharePoint

Now, this process works well for Excel Workbooks and CSV files as they can be opened with Microsoft Excel. But what about JSON, XML, or Parquet files? You can get around this challenge by simply adding any Microsoft Office file into the directory to get the file path and replace the file name with the correct one. This will save you the effort of building out the URL.

But I Already Have Done So Much!

So often, I have people tell me that they have spent hours transforming their query already and fear they need to start from scratch. But believe it or not, it is easy to resolve. Let’s start with this block of code in your advanced editor:

let
  Source = Excel.Workbook(
    File.Contents("c:\data\taxi_zone_lookup.xlsx"),
    null,
    true
  ),
  ExcelData = Source{[Item = "TaxiZoneTable", Kind = "Table"]}[Data]
in
  ExcelData

We can see from the code block above that we are accessing the file from a folder in the root of our hard drive. Once we have it inside of SharePoint, we can replace the File.Contents call pointing at “c:\data\taxi_zone_lookup.xlsx” with the web connector:

let
  Source = Excel.Workbook(
    Web.Contents("https://<tenant>.sharepoint.com/sites/NYCTaxi/Shared%20Documents/Taxi%20Data/taxi_zone_lookup.xlsx"),
    null,
    true
  ),
  ExcelData = Source{[Item = "TaxiZoneTable", Kind = "Table"]}[Data]
in
  ExcelData

As you can see, it was a simple swap in the code. With the change in place, we did not need to make any additional changes to the downstream code. While this code block is small, a query with a long list of transformations would benefit even more, since none of those steps need to be reworked.

Anything Else to Consider?

A big benefit to moving to the cloud location is that you can easily share your Power BI Desktop file for development work without having to update the file path. You will need to share the file with users who will be working directly with the source, but that is easy to manage.

In addition, you can easily add files to your data source without having to get an administrator involved. With either type of Power BI Gateway, you will likely need an administrator's help: a platform administrator to configure the connection for an enterprise gateway, or someone with administrator rights to install the personal gateway software on your machine.

Conclusion

I wish more people talked about managing file refreshes this way. This is a great option, yet Microsoft tends to keep quiet about it. In fact, you can leverage this method to provide a refreshable report from a form you created. Patrick from Guy in a Cube created a video on how to achieve this. Hard to believe how a few small changes could make your life so much easier!

So have you ever created a refreshable interface with Power BI like this? Do you have a use case for this? Tell me about it in the comments below!

Artificial Intelligence is Available Inside of Power Query?!

Unless you have lived under a rock in the past year, you know that artificial intelligence is a popular topic. Whether it is Copilot, ChatGPT, Bard, or Open AI, artificial intelligence is here to stay. It is time to embrace artificial intelligence and learn how to use it for our benefit.

But surely, the use of artificial intelligence inside of Power Query is brand new, right? Wrong! I remember seeing it available five years ago and it is a powerful tool to integrate with your models! I am not going to be able to go into all of the details of artificial intelligence in this article, but rather help you understand how to integrate it within your existing queries.

What Is Artificial Intelligence?

In the past few years, Artificial Intelligence (AI) has become a force in society. It permeates the world we live in and has become an integral part of how things are done. We see examples of it in our lives through things like chat bots, self-driving cars, and even social media. While some may find it intimidating, the reality of AI is that it has the power to make our lives easier.

Artificial Intelligence is used to describe technologies that perform tasks that once required human intelligence to complete. When properly deployed, AI models can recognize patterns, solve problems, make decisions, and understand natural language. These models do not organically exist. They are created through extensive training activities to ensure the results you receive are valid. The work we perform with Power Query may result in the creation of a model while others will leverage one.

While generative artificial intelligence (GenAI) is all the rage right now, we will not be using it for our queries. Instead, we will be integrating models that have been trained to provide a response. You might be the creator of these models while others are creating them for you. The result is the opportunity to bring back a predictive response based upon your query.

How to Get Started?

While Power Query is available in both Microsoft Excel and Power BI, this functionality is only found in Power BI. While you can leverage similar AI models in Microsoft Excel, it is done outside of Power Query. This is an important item to note if you prefer using Power Query inside of Excel.

Second, you must have some form of a premium workspace to enable this functionality. This can be in the form of a Premium Per User, Premium Capacity, or Azure Power BI Embedded environment. If you already have one of these environments at your disposal, you are set. If not, check out my article on licensing strategies around Power BI. I recommend spinning up an Azure Power BI Embedded SKU or activating a Premium Per User trial if you want to try these features at a minimal cost.

What if you do not have a Power BI environment available to you? Unfortunately, you will not be able to take advantage of this functionality inside of Power Query. I recommend you work with your organization to spin up a trial to explore this functionality.

Pre-Built Models in Power Query

Power Query allows us to leverage pre-built models. Leveraging Azure Cognitive Services, we can perform text and image analysis on our queries. We can do this from the add column ribbon inside of Power Query:

Screenshot of the add column ribbon inside of Power Query with an arrow pointing to the text analytics function.
Activating the Text Analytics function

If you are doing this for the first time, you will be prompted to allow the analysis of your data. Once you have done that, you can select which model you wish to use:

Screenshot of the Text Analytics screen with options to select the detect language, extract key phrases, and score sentiment models.  Next, you can select the column to analyze and language associated with the text.
Selecting a Text Analytics model for your analysis

As you can see, you can select multiple pre-built models to perform your analysis. With text-based analysis, you should consider both the text and the language being used. This helps ensure the proper analysis is being performed. By default, it will use your tenant’s default language. If your column uses the same language consistently in the query, you can hard code it in this screen. However, if you have multiple languages, you might need to use the detect language model to bolster your analysis.

You can also identify tags for images you ingest with Power Query. It is not easily performed and is really only practical for small images, so you will likely not use the vision model often. However, you can try it out for yourself:

Screenshot of the tag images model in the vision module from Azure Cognitive Services.
Configuring the tag images model

These models can provide quick analysis and meaningful results with minimal effort. I find that combining these results with other dimensions allows me to sort through results and find the most meaningful responses in surveys.

Custom Artificial Intelligence Models

If you have created a custom model using Azure Machine Learning, you can leverage it within Power Query. Leveraging a public dataset around diabetes, I created a custom model with Azure Machine Learning. With an endpoint created, I can access it within Power Query with a few clicks:

Screenshot of the home ribbon in Power Query with an arrow pointing to the Azure Machine Learning function.
Activating the Azure Machine Learning function

Once I have selected Azure Machine Learning, I can select which endpoint I wish to use. I have already deployed my model and selected it below:

Screenshot of an Azure Machine Learning endpoint being configured for Power Query.
Configuring my predictive model in Power Query

As you can see, there are multiple inputs my endpoint requires before it can return a predictive response. I map the columns in my query to those inputs, and a value must be supplied for each field for this to work.

It is important to consider that Azure Machine Learning uses a consumption pricing model. Every time you refresh your query, it is seeking new predictions. Keep this in mind when building out your queries as you could rack up some extra costs without realizing it!

Any Other Options?

If you are a user of R or Python, you can leverage custom algorithms and models inside of Power Query as well. You need to deploy software on your device to make this work, which can be prohibitive if you do not have admin access on your machine. I do go into detail on these tools in Chapter 11 of my book, but decided not to discuss them in this post. Primarily, I wanted this to be about out of the box and easy implementation.

Conclusion

It is amazing how artificial intelligence has infiltrated Power Query. Many have no clue it is there or what is needed to activate it. With a little effort, you can really transform your queries with a few clicks. I recommend you try it out to see how you can harness the value of artificial intelligence in your queries!

So did you know about this feature in Power Query? Have you used it before? Are you going to try it out? Tell me about it in the comments below!

Self Service Analytics Does Not Equal Easy

This is a difficult post to write. From a personal perspective, I really struggle with this topic. In my daily work, I assist organizations with enabling a self service data strategy. Working together, we spend hours finding the right path that matches the needs and goals of the organization with its realities. I am a huge advocate for self service analytics and believe in it. Because of this, I want to be ridiculously helpful with everyone I encounter.

Sadly, I run into people on a regular basis that want “a little help” to perform some analysis with Power BI or Excel. The problem with this is that the ask is quite complicated. The reality of my situation is that I can make the ask happen quickly. I can spend an hour or two on a report and it will look beautiful. Having worked with Power BI for almost ten years, I have a pile of experience to make it happen efficiently. But the requestor wants me to help them build it themselves. This takes three to five times longer because I am guiding someone through the process. And I am all for doing this because it helps enable another person to use Power BI or Excel.

While I am all for helping people, I find that their interest in learning about these tools is often non-existent. They have a goal in mind that needs to be completed and they think Power BI or Excel is an easy way out. It is a self service analytics tool, so it should be easy! And yet they find the opposite. Without taking the time to understand the basics, they end up lost.

What makes this hard for myself and others is that it cheapens the skills we possess as data professionals. Data is not easy. Personally, I find this approach to be a little insulting. So how do we fix this problem?

Setting Expectations for Self Service Analytics

I started responding to every request I receive with clear expectations around self service analytics. I explain the challenges and skills they need to possess before they embark on such a project. Sometimes this scares them off, but it is a reasonable way to find alignment.

I often share the basics they will need to learn to be successful. I start with understanding data modeling. We start by talking about things like building a star schema, using a date table, and managing relationships. Sometimes a flat table from Excel or SharePoint is fine, but other times they want more depth and dimension to their reports. Without this in place, they will never achieve their goal.

Next, we discuss how to build out measures. Like every beginner, including myself at one time, they rely on implicit measures. These measures often provide basic aggregations and get the job done. However, they often ask for something more complex. They need to perform time based analysis or use filtering. They are going to need to learn how to write DAX expressions for explicit measures.

Lastly, we will discuss what report design looks like. Often, I have a theme file or template for the organization available, so this helps with branding. But it is important to address best practices with data visualizations. This avoids confusion and provides a better consumer experience.

Assign Some Homework

With expectations set, it is time to assign some homework. Self service analytics requires some investment. But before helping anyone, I assign some basic training. The concept is simple – if someone does not know the basics, they are never going to understand the advanced topics.

Microsoft Learn is my go to for training others around the basics of Power BI. Because it is free, the only real investment is time. It covers a lot of the basics of the tool and allows me to focus on the more advanced topics. I, just like you, need to protect my time. The more I can push to other resources, the easier it is for me to stay focused on my work.

There are two benefits from using this method. First, I can ask for a badge validating that they completed the training. Second, it helps gauge the interest of someone who is asking for help. The course linked above takes about three hours to complete. If they cannot invest three hours in themselves, then why should you invest three hours in them? The answer to that question is up to you, but it sets expectations for the relationship you will build through this engagement.

Create Working Sessions

At this point, if they have completed their homework, I recommend creating working sessions to set boundaries. I start with setting up 45 minute sessions with a clear agenda. Ideally, a weekly working session is best. Naturally, you might need to schedule more frequent sessions based upon the business requirements. It is important that you agree upon the cadence to avoid conflict in the future.

For each working session, start with a recap of the previous session. Review the previous topic and any progress made since you last met. Then, work through the assigned topic for the current week. Make sure it is recorded so it can be referenced afterwards. Lastly, decide what the topic of the next working session will be. This helps keep future sessions focused and your timeline compact.

I recommend that you continue assigning homework. For example, if you have an upcoming working session that will focus on a particular DAX statement, provide resources ahead of time. If you have a measure that needs to use the SWITCH() function, send them to an article like this one to help them understand the concept ahead of time. You can also assign work to be completed in their report before the next session to streamline your working sessions.

As a rule of thumb, I work to keep sessions scheduled. An ad-hoc working session often ends up being a mess. You also have other deliverables and expectations from your leadership that could get in the way. If someone needs time with me between sessions, I generally ask them to schedule it. I feel like the “Can we hop on a quick call? I have a quick question!” conversation always comes at the most inopportune time. Setting these boundaries goes a long way to ensuring you can maintain a balance in your workday.

Provide Some Learning

It never hurts to provide additional learning and training. There are several books and sites out there to help. If you need help with Power Query, you can check out my book as an option to assist. If it is with DAX, I recommend The Definitive Guide to DAX by Marco Russo and Alberto Ferrari. Lastly, you can suggest Now You See It by Stephen Few if they need help with data visualization.

At some point, they will need to embrace self service analytics on their own. By providing materials, they can focus on their learning and be able to sustain their own solutions. These resources will provide assistance both during your working sessions and in the future.

If self service analytics is a goal, investment in learning is key. I see customers who continue to struggle with this because they do not invest in training and learning. When you are working with an individual, at some point they will need to invest more in themselves to achieve that goal. If you do not help them head down that path, you will be stuck helping them forever.

Conclusion

I love helping people discover the world of self service analytics! I hope you do too! Just make sure you help others in a sustainable manner. These types of relationships are good to have, but you need to manage them well. Be ridiculously helpful – just take care of yourself as well!

Have you encountered this situation? Do you struggle with people who think self service analytics is easy? If so, tell me about it in the comments below!

Custom Headers with Calculation Groups

In working with a customer a few weeks back, I ran into another odd scenario. This was a new ask for me, but made sense in a post COVID-19 world. Normally when I build out time intelligence with Calculation Groups, I only perform comparisons between the previous year and the current year. However, this customer wanted to go back a few years to see the post COVID-19 impact on their metrics. Naturally, Calculation Groups make quick work of a request such as this. However, there was no way for them to create their own custom headers, and they found the generic ones confusing:

Screenshot of generic headers assigned from calculation groups.
Matrix with generic column headers

The use of generic headers led to confusion when reading the visualization. On top of that, they had a fiscal year that started on July 1st. This made it more complicated to understand which fiscal year they were currently in. In short, they wanted to provide dynamic column headers in the matrix visual that aligned with the associated fiscal year. If you were looking at the 2023-2024 fiscal year, the labels need to translate accordingly:

Calculation Group Label    Custom Column Label
CYm3                       2020-2021
CYm2                       2021-2022
CYm1                       2022-2023
CY                         2023-2024
YoY                        Change from Last Year
YoY_Pct                    % Change from Last Year
CYm3 Year End              2020-2021 YE
CYm3 Variance YE           2020-2021 Var YE
CYm2 Year End              2021-2022 YE
CYm2 Variance YE           2021-2022 Var YE
CYm1 Year End              2022-2023 YE
CYm1 Variance YE           2022-2023 Var YE
Generic versus custom labels

Well, this was a challenge, and after all of my research I found a possible solution. I am not 100% sure it is the best one, and the performance is not the greatest, but it worked for my situation and got the job done.

To get started, you do need to understand Calculation Groups and how to use Tabular Editor. If you have not used them before, the guys over at SQLBI have a great series on how to use Calculation Groups. And if you have not already done so, head over to TabularEditor.com to download the latest version to create your Calculation Groups. For our scenario today, we are going to assume you already have your Time Intelligence measures set up. Our focus is on applying the logic for the headers.

Creating the Column Headers

There are a few challenges with the creation of these column headers. One, they must map to a calculation group item. Two, they must follow the naming convention outlined above. And three, they must be properly sequenced.

We are building our logic inside of Power Query under the assumption we do not have access to our data source to enact this solution. This allows us to see how versatile Power Query can be for our projects. However, you are welcome to perform this logic upstream in a SQL server if you have one available.

If you are not comfortable with Power Query, I suggest you check out my book which will give you the knowledge and experience to create this table.

Reference your date table

The first step in the process is to right-click on our date table and reference it in another query. From there, we will right-click on the Fiscal Year column and remove all other columns. Once completed, we will remove the duplicates in the fiscal year column so our results look like this:

Screenshot of our custom column headers query starting with the fiscal year data extracted from bi_date.
Referencing bi_date for our custom column headers

This is the easiest way to ensure that your custom column headers will align with your current date table. This will eliminate any issues with it being out of sync in the future.

Add mappings for Calculation Groups

Our next step is to add our mappings so our custom columns will leverage the right calculation group measures. Referencing the table above, we will add a new column for each calculation group item. We will also fill in the column with the sequence value in the order we wish to display these items.

For each measure, go to the Add Column ribbon and select Custom Column. Change the column name from “Custom” to your label from the Calculation Group. Then add the sequence number in the formula box below:

Screenshot of creating a custom column for each measure from Calculation Groups
Create a custom column for each measure

When finished, your list of labels should look something like this:

Screenshot of all measures from Calculation Groups added into the query as columns.
All measures loaded as columns

The last thing we need to do is unpivot our columns so they become rows in the dataset. Right-click on the Fiscal Year column and select “Unpivot Other Columns”. This will turn the other columns into rows. Rename the new columns to Measure and MeasureSequence respectively:

Screenshot of measures from the Calculation Groups pivoted into rows
Measures pivoted into rows for use

Now that this has been done, it is time to build out our logic for the labels.

Assigning labels with a conditional column

At this point, you have two choices. Because of the complex nature of our labels, you will eventually need to write them out in the advanced editor. The question is whether you would prefer to get the conditional column logic in place first. If you are not comfortable with the M logic, you can use the graphical interface to rough it in. Otherwise, you can write the if/then/else logic directly in the advanced editor.

I am not going to get into the weeds with this code as there were a number of functions used. However, I will provide a full printout of the code for this query when we are finished. In the end, our goal is to have our labels show up accordingly:

Screenshot of custom column header labels for our calculation groups
Creating custom column header labels

This process will take some time, but the trick is to use the measure sequence to make building the column headers easier. It takes some practice, but you will get there quickly.

Address sequence of columns

This was the trickiest part. We will implement this solution using conditional columns, but there is a catch. We will be sorting our custom header column with this new sequence column, so every header must return its own distinct sequence value. So how do we make that work?

The trick is grouping. CY, CYm1, CYm2, and CYm3 will be part of group one. YoY will be in group two, and YoY_Pct will be group three. Then the year end and year end variance columns will be part of group four.

Now, you might already be concerned about a collision. And you are correct, there is one at this moment. But we are going to do one more thing to fix it. For groups one and four, we are going to add the fiscal year as a decimal value. This allows us to use consistent headers and maintain sequencing. I promise you that I have spent hours figuring this out and I do not know of a better way to achieve this outcome. If you think there is a better way to do it, give it a try. But I know for a fact that this approach works:

Screenshot of the header sequence column for our custom headers.
Sequence column added for our custom headers

Now that we have this in place, we can look at the final code.

Final Code – Custom Column Headers

As you can tell, this was not an easy process. It took a lot of custom coding to make it work. However, the results will speak for themselves shortly. Regardless, here is the code we used to achieve our outcome:

let
  Source = bi_date, 
  #"Removed Other Columns" = Table.SelectColumns(
    Source, 
    {"FiscalYear"}
  ), 
  #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"), 
  #"Added CY - 3" = Table.AddColumn(#"Removed Duplicates", "CYm3", each 0, Int64.Type), 
  #"Added CY - 2" = Table.AddColumn(#"Added CY - 3", "CYm2", each 1, Int64.Type), 
  #"Added CY - 1" = Table.AddColumn(#"Added CY - 2", "CYm1", each 2, Int64.Type), 
  #"Added CY" = Table.AddColumn(#"Added CY - 1", "CY", each 3, Int64.Type), 
  #"Added YoY" = Table.AddColumn(#"Added CY", "YoY", each 4, Int64.Type), 
  #"Added YoY%" = Table.AddColumn(#"Added YoY", "YoY_Pct", each 5, Int64.Type), 
  #"Added CY - 3 YE" = Table.AddColumn(#"Added YoY%", "CYm3 Year End", each 6, Int64.Type), 
  #"Added CY - 3 Var YE" = Table.AddColumn(
    #"Added CY - 3 YE", 
    "CYm3 Variance YE", 
    each 7, 
    Int64.Type
  ), 
  #"Added CY - 2 YE" = Table.AddColumn(
    #"Added CY - 3 Var YE", 
    "CYm2 Year End", 
    each 8, 
    Int64.Type
  ), 
  #"Added CY - 2 Var YE" = Table.AddColumn(
    #"Added CY - 2 YE", 
    "CYm2 Variance YE", 
    each 9, 
    Int64.Type
  ), 
  #"Added CY - 1 YE" = Table.AddColumn(
    #"Added CY - 2 Var YE", 
    "CYm1 Year End", 
    each 10, 
    Int64.Type
  ), 
  #"Added CY - 1 Var YE" = Table.AddColumn(
    #"Added CY - 1 YE", 
    "CYm1 Variance YE", 
    each 11, 
    Int64.Type
  ),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(
    #"Added CY - 1 Var YE", 
    {"FiscalYear"}, 
    "Measure", 
    "MeasureSequence"
  ), 
  #"Added ColumnHeaders" = Table.AddColumn(
    #"Unpivoted Columns", 
    "ColumnHeaders", 
    each 
      if [MeasureSequence] = 0 then
        Number.ToText([FiscalYear] - 3)
          & "-"
          & Number.ToText([FiscalYear] - 2)
      else if [MeasureSequence] = 1 then
        Number.ToText([FiscalYear] - 2)
          & "-"
          & Number.ToText([FiscalYear] - 1)
      else if [MeasureSequence] = 2 then
        Number.ToText([FiscalYear] - 1)
          & "-"
          & Number.ToText([FiscalYear])
      else if [MeasureSequence] = 3 then
        Number.ToText([FiscalYear])
          & "-"
          & Number.ToText([FiscalYear] + 1)
      else if [MeasureSequence] = 4 then
        "Change from Last Year"
      else if [MeasureSequence] = 5 then
        "% Change from Last Year"
      else if [MeasureSequence] = 6 then
        Number.ToText([FiscalYear] - 3)
          & "-"
          & Number.ToText([FiscalYear] - 2)
          & " YE"
      else if [MeasureSequence] = 7 then
        Number.ToText([FiscalYear] - 3)
          & "-"
          & Number.ToText([FiscalYear] - 2)
          & " Var YE"
      else if [MeasureSequence] = 8 then
        Number.ToText([FiscalYear] - 2)
          & "-"
          & Number.ToText([FiscalYear] - 1)
          & " YE"
      else if [MeasureSequence] = 9 then
        Number.ToText([FiscalYear] - 2)
          & "-"
          & Number.ToText([FiscalYear] - 1)
          & " Var YE"
      else if [MeasureSequence] = 10 then
        Number.ToText([FiscalYear] - 1)
          & "-"
          & Number.ToText([FiscalYear])
          & " YE"
      else if [MeasureSequence] = 11 then
        Number.ToText([FiscalYear] - 1)
          & "-"
          & Number.ToText([FiscalYear])
          & " Var YE"
      else
        "Error", 
    type text
  ), 
  #"Added HeaderSequence" = Table.AddColumn(
    #"Added ColumnHeaders", 
    "HeaderSequence", 
    each 
      if [MeasureSequence] = 0 then
        (([FiscalYear] - 3) / 10000) + 1
      else if [MeasureSequence] = 1 then
        (([FiscalYear] - 2) / 10000) + 1
      else if [MeasureSequence] = 2 then
        (([FiscalYear] - 1) / 10000) + 1
      else if [MeasureSequence] = 3 then
        (([FiscalYear]) / 10000) + 1
      else if [MeasureSequence] = 4 then
        2
      else if [MeasureSequence] = 5 then
        3
      else if [MeasureSequence] = 6 then
        (([FiscalYear] - 3) / 10000) + 4
      else if [MeasureSequence] = 7 then
        (([FiscalYear] - 3) / 10000) + 4.00001
      else if [MeasureSequence] = 8 then
        (([FiscalYear] - 2) / 10000) + 4
      else if [MeasureSequence] = 9 then
        (([FiscalYear] - 2) / 10000) + 4.00001
      else if [MeasureSequence] = 10 then
        (([FiscalYear] - 1) / 10000) + 4
      else if [MeasureSequence] = 11 then
        (([FiscalYear] - 1) / 10000) + 4.00001
      else
        0, 
    type number
  )
in
    #"Added HeaderSequence"

It probably looks worse than you imagined. However, keep in mind that some of this is there to ensure we have a solid experience with our custom headers.

At this point, we will close and apply our query changes and load them into our data model.

Preparing the data model

At this point, we have created our custom headers. Next, we must map them for use with our Calculation Groups and fit them into our model. We will need to build a relationship to our custom headers query from our date table and create a custom measure to leverage our calculation group.

Connect date and custom headers table

Our first task is to connect our date table with our custom column headers table. This will filter down our custom column to only show the headers we need for a selected fiscal year. We will simply connect the fiscal year columns on both tables to build the relationship.

Now, you might be concerned about the many-to-many relationship being established. While often discouraged, there are times when they are required. This is one of those moments. We will go ahead and click OK to create the relationship:

Screenshot of the many to many relationship being created between the date and custom column headers tables.
Connecting the date and custom column headers tables

With this relationship in place, we can now focus on creating a measure to return our desired results!

Custom measure for Calculation Groups

The last step before we create our visualization is to create a measure that leverages our calculation groups but is driven from the custom column headers table. A fun fact you might not realize is that you can use your calculation groups as a filter in a CALCULATE() function. We are going to use the measure names we loaded into the custom column headers table to perform that exact filtering!

The formula is simple. We are going to use CALCULATE(), call our Revenue measure, and select our time intelligence measure to apply. The result is a measure that looks like this:

Total Revenue = 
   CALCULATE (
       [Revenue],
       TimeIntelligence[Name] = MAX ( CustomColumnHeaders[Measure] )
   )

With our measures set, we just need to create our matrix.

Add custom headers to our matrix

To complete the process, we need to simply update our matrix. For the column headers, replace the time intelligence field with the custom headers from our custom column headers table. We then need to replace our measure with our new custom measure to ensure the data comes back correctly. The result is the following:

Screenshot of our custom column headers associated with our calculation groups.
Custom column headers for our matrix

This is quite the process to set up, but it can be rewarding if it works out. It will stretch your skills in a few areas within Power BI. However, the results speak for themselves.

Anything else to consider?

Keep in mind that if you go this route, updating the logic with your calculation groups will grow increasingly complex. If you need to add a new measure, this will add an additional burden to the update process. Each new item will need to be updated manually in Power Query resulting in additional effort.

My recommendation is that you take the time upfront to establish all of the requirements. This will help you avoid rework in the future.

Conclusion

First of all, you read all of this and understood it – congratulations! This is a very complex topic and not easy to digest. You might need to re-read this article a few times to fully comprehend what happened, but I promise it will work out. What is important is that you have a grasp of the key elements.

So have you ever ran into a requirement like this? Do you have a more efficient way to achieve this outcome? Do you plan on using something like this in the future? Tell me about it in the comments below!

Time Zone Management in Power Query

I ran into a unique challenge this past week when it came to managing time zones in Power Query. I have a dataset that comes in using Coordinated Universal Time (UTC). Like a majority of the world, my key stakeholders do not reside in the UTC time zone. On top of that, they wanted to use multiple time zones, so this could become quite complicated.

You might be reading this article and asking why this would be such a complex issue. Well, there are a couple of factors that you need to consider. Naturally you need to consider the time zone you need to use. In addition, you must consider daylight savings time. In the United States, every state except Arizona and Hawaii observes daylight savings time. As a result, I have to adjust my model accordingly. And while 99% of my work is based in the United States, many other countries across the world observe Daylight Savings Time.

To add to the complication, one of the required time zones was Indian Standard Time (IST). If you are not familiar with the IST time zone, the offset from UTC includes an additional 30 minutes on top of the hours. In short, if it is 1:00 AM UTC, then it is 6:30 AM IST (UTC +5:30).

Since I know I am not the only person with this challenge, I thought it would be helpful to share my experience and how you might be able to work around this issue.

Convert UTC to IST Time Zone

Let’s start with an easy conversion. When setting the IST time zone, you do not have to worry about daylight savings time. This means it is an easy conversion!

To get started, we are going to create a custom column in Power Query and convert our date/time in the UTC time zone. However, the dataset does not assign the time zone to the timestamp. We will assign it using the DateTime.AddZone() function. The function accepts three parameters:

  1. Date/Time value (in our case, the Timestamp column)
  2. Hour offset
  3. Minute offset

=DateTime.AddZone([Timestamp], 0, 0)

This formula will replicate your current timestamp and add “+00:00” at the end to show no offset from UTC. Now that we have a time zone assigned, we need to convert to IST. We will wrap our last function with the DateTimeZone.SwitchZone() function to adjust the offset. The parameters are the same as the function above.

=DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), 5, 30)

Now the column has been updated with “+05:30” at the end, but the time has been incremented by the same amount. It is that easy!

Now, you might not want the time zone offset showing in the report canvas. We can easily remove that from the view by wrapping the last formula with DateTimeZone.RemoveZone()

=DateTimeZone.RemoveZone(DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), 5, 30))

Now there is just one pesky problem left – our new column does not come back typed as date/time. This is an easy fix! Just crack open your advanced editor and add the column type on the same step!

#"Custom Column" = Table.AddColumn(
  #"Changed Type",
  "TimestampIST",
  each DateTimeZone.RemoveZone(DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), 5, 30)),
  type datetime
)

Yes, you can update the column type with the next step. But I am a fan of keeping things compact and clean. Since you are already doing the transformation, it is an easy add to save some compute time!

If your time zone is not IST, you can look up the offset here. You just need to update the code above with the correct offset.

What About Daylight Savings Time?

The biggest challenge is to know what rules define the time shift. In the United States, our formula is simple. Daylight Savings Time starts on the second Sunday in March and ends on the first Sunday in November. So how do we put that knowledge to work?

One option is to add a field on your date dimension to show if daylight savings time applies. That works, but it requires maintenance long term. Another option is to create a table of dates, but once again it is a lot of work to keep current.

But there is a way to scale this logic using a few functions in Power Query.

Find the Start and End of Daylight Savings Time

Like before, let’s start off easy by finding the end of Daylight Savings Time. The first thing we need to do is establish the first day of November. We will use the #date(year, month, day) function to set the date. But remember, our goal is to make it scalable, so we need to make the year dynamic:

=#date(Date.Year([Timestamp]), 11, 1)

Now that we have the first day of the month established, we will use it to find the first Sunday of the month. We will use the Date.EndOfWeek() function to establish the last day. We will add a parameter with the value of “1” to shift the end of the week from Saturday to Sunday.

=Date.EndOfWeek(#date(Date.Year([Timestamp]), 11, 1), 1)

Now to find the second Sunday of March, we will use the same formula as a base. We will change the 11 to 3 for March. But there is one difference: we are wrapping this formula in Date.AddDays(). We will use that function to add 7 days to our date, which gives us the second Sunday in March.

=Date.AddDays(Date.EndOfWeek(#date(Date.Year([Timestamp]), 3, 1), 1), 7)

Like I said above, your time zone might have a different set of rules. However, with some research, I found that the rules are generally similar. You will need to add or remove weeks to get to your desired date, or maybe use the end of the month and work backwards. If the change happens on a Saturday, you can ignore the extra parameter. And if you are in one of the countries that observes the change on a Friday, just use Date.AddDays() and work forward or back the appropriate number of days to make it work!

Apply Dates to Time Zone Offset

Now that we have the start and end of Daylight Savings Time, we just need a basic conditional statement to provide the offset. Because our start and end dates are pure dates with no time associated, we will need to do our comparison using the DateTime.Date() function to convert our timestamp.

=if DateTime.Date([Timestamp]) >= Date.AddDays(Date.EndOfWeek(#date(Date.Year([Timestamp]), 3, 1), 1), 7)
  and DateTime.Date([Timestamp]) < Date.EndOfWeek(#date(Date.Year([Timestamp]), 11, 1), 1)

This will check our timestamp and make sure it is between our two dates. Now we just need to use the same logic from above to show the offset:

=DateTimeZone.RemoveZone(
  if DateTime.Date([Timestamp]) >= Date.AddDays(Date.EndOfWeek(#date(Date.Year([Timestamp]), 3, 1), 1), 7)
    and DateTime.Date([Timestamp]) < Date.EndOfWeek(#date(Date.Year([Timestamp]), 11, 1), 1)
  then DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), -4, 0)
  else DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), -5, 0)
)

Since I am using Eastern Time in the US, there is an offset of 5 hours from UTC, but only 4 during Daylight Savings Time. With this all in place, we now have a function that will scale with Daylight Savings Time over the years. Feels like a lot of work to get here, but it works well!

Get Fancy with a Time Zone Function

Now, you might be thinking that this could get painful if you need to provide multiple time zones over and over again. For those of us in the United States, we are using the same logic for several time zones. So instead of doing everything manually, you can create a function to make your life easier!

Using the same logic, the function will accept two parameters: the timestamp and the offset. We then can use a few lines of code and make a function that we can reuse over and over:

(TimestampValue as datetime, TimeOffset as number) =>
let
    TimestampDate = DateTime.Date(TimestampValue),
    TimestampYear = Date.Year(TimestampValue),
    Source = DateTimeZone.RemoveZone(
      if TimestampDate >= Date.AddDays(Date.EndOfWeek(#date(TimestampYear, 3, 1), 1), 7)
        and TimestampDate < Date.EndOfWeek(#date(TimestampYear, 11, 1), 1)
      then DateTimeZone.SwitchZone(DateTime.AddZone(TimestampValue, 0, 0), TimeOffset + 1, 0)
      else DateTimeZone.SwitchZone(DateTime.AddZone(TimestampValue, 0, 0), TimeOffset, 0)
    )
in
    Source

With a few substitutions, I can simplify the logic and make it easier to understand. Now I can just invoke the function by using the timestamp and adding the offset. Not too shabby!
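Invoking it as a custom column would look something like this (I am assuming the function above was saved as fnLocalTime and that we want US Eastern Time, so the standard offset is -5):

  // Convert the UTC timestamp to US Eastern Time, honoring Daylight Savings Time
  #"Added TimestampEastern" = Table.AddColumn(
    #"Changed Type",
    "TimestampEastern",
    each fnLocalTime([Timestamp], -5),
    type datetime
  )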

Anything Else to Consider?

This solution is just using dates and ignores the official time when the change takes place. In my use case, the precision was not that important when we are looking at 4 hours out of 8,760 hours in a year. If you need that level of precision, you will need to add time to the logic in the formulas.

I will say, this is a gaping hole in Power BI. With Power Automate, you are able to perform precision conversions based upon time zones much easier. A majority of my work just uses dates and ignores time, but there are use cases like audit data that we need this information. Hopefully Microsoft will add functionality here one day. However, users have remarked about this challenge for years. I might be better off waiting for the United States to pass a law eliminating Daylight Savings Time. But that is not looking too good either…

Anyway, how have you handled this in the past? Do you have a different approach? If so, tell me in the comments below!

Which Tool When – Parameters in Power BI

The word parameters is a little ambiguous when it comes to Power BI. I have written a number of articles that reference parameters, but never took the time to clarify their use cases. I wanted to dive into the different types of parameters and understand the best way to use them. Paired with some tips and tricks, you will have a better idea of how to use all kinds of parameters in your reports.

Parameters in Power Query

In my opinion, parameters in Power Query are the most in your face. When you launch the Power Query editor, there is a button on the home ribbon that helps you manage them. But have you ever used them?

Parameters in Power Query allow you to add reusable values in your queries. The premise is simple – if you keep using the same value over and over again, why not make it easy on yourself?

How to Leverage It

To get started, open the Manage Parameters window in Power Query. Create a new parameter and give it a meaningful name. From there, you just need to specify the type of value, add your value, and click OK. Since I am creating a parameter for a reusable API key, I am just using text and adding in my key.

Creating a Parameter in Power Query

Now that I have the parameter in place, I can use it in an existing query. Since I was looking to easily reuse an API key, I can go and update my queries.

Leveraging my Parameter with Another Query

As an added bonus, I can share my full query without having to remember to hide my API key!
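As a rough sketch, a query that previously hard coded the key can simply reference the parameter by name (the endpoint below is a placeholder, and APIKey stands in for whatever you named your parameter):

let
  Source = Json.Document(
    Web.Contents("https://api.example.com/v1/quote?symbol=MSFT&apikey=" & APIKey)
  )
in
  Source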

Common Use Cases

While the API key use case above is one of my favorite stories to tell, there are a few other key use cases.

One of the most famous use cases is when you need to create parameters for Incremental Refresh or Hybrid Tables. You need to create RangeStart and RangeEnd parameters to set up the filtering on the report.
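Once RangeStart and RangeEnd exist as date/time parameters, they are typically used to filter the fact table along these lines (the column and step names here are illustrative):

  // Keep only the rows that fall inside the incremental refresh window
  #"Filtered Rows" = Table.SelectRows(
    #"Changed Type",
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
  )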

Another great use case is when you need to quickly change values to filter a report. In a past life, I worked for a company that had APIs for external reporting. Since customers always wanted similar reports, I created them with parameters so I could quickly update the API key and customer id. When a new customer wanted a custom report, I just updated the parameters to get their data in, made some adjustments, and sent the report out.

Tips and Tricks

Tip #1 – If you save your Power BI Desktop file (PBIX) as a template (PBIT), you will be prompted to enter the parameters when you use it. This helps save you a few clicks.

Tip #2 – All of your parameters can easily be updated in the Power BI Service without having to open the desktop file. Go to your data source settings and you will find the parameters section. This allows you to update your parameters like API keys with minimal effort.

Tip #3 – If you are using a deployment pipeline, you can easily swap between your Dev-Test-Prod databases. Just set parameters for your server and database fields in the advanced editor window and update them in the Power BI Service.

Function Parameters

Not as common of a use case, but function parameters allow you to reuse a query quickly. I go into great detail on why to use functions with my article on how to use REST APIs in Power BI, but in short, they allow us to scale a base query rapidly.

How to Leverage It

One of my favorite endpoints pulls stock information by ticker symbol. I could build multiple queries and append them, but a function makes it easier. I just need to add a little code at the top to convert my base query to accept a parameter.

Parameters in a Power Query Function

I specified the parameter that I wanted to gather and then apply it in the query. When I go to use the function, I will be prompted to specify the parameter so the query works properly.
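A stripped-down version of that pattern looks like this (the endpoint is a placeholder; the real query from my REST API series has more steps):

(Symbol as text) =>
let
  // Call the stock endpoint for a single ticker symbol
  Source = Json.Document(
    Web.Contents("https://api.example.com/v1/quote?symbol=" & Symbol)
  )
in
  Source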

Common Use Cases

I frequently use function parameters with APIs. Because some API endpoints require dynamic values, you will need to iterate with a function like I did with the stock symbols.

Another common use case for me is building common elements I might use in a report. If I am running into queries that are using paging, I can grab a function and invoke it to build out my base. Another common use case is to build out a date table when one is not available in my data source. Of course these are things we want to do as far upstream as possible, but if you are not in control of your database, you might need to create your own.

Tips and Tricks

Tip #1 – Invoking a function as a column is an easy way to specify your parameters with existing column values. The function will run on each row of your query using the values in the columns identified. It is a quick and easy way to apply a function!

Tip #2 – You can edit a query and add a function by hand. Just make sure you specify your parameter in the function or it will not work. I use this all the time for token based authentication with APIs.

Tip #3 – If you are using a few different API endpoints, you can create a base function to handle the majority of your query. Just add a parameter to the function to specify the endpoint. From there, you can add your endpoint or table name in the parameter field and invoke the function, which speeds up how quickly you can build new connections.
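A rough sketch of that pattern is below. The base URL, header name, and parameter are all placeholders; the important part is that only the relative path changes between calls.

(Endpoint as text) =>
let
    // The base URL stays constant; only the relative path changes per call.
    // Passing the endpoint through RelativePath also plays nicer with
    // scheduled refresh in the Power BI Service than building the full URL dynamically.
    Source = Json.Document(
        Web.Contents(
            "https://api.example.com/v1/",
            [
                RelativePath = Endpoint,
                Headers = [#"X-API-KEY" = ApiKey]
            ]
        )
    )
in
    Source

From there, invoking the function with "quotes" or "markets/summary" reuses the same base query for each endpoint.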

What If Parameters

What if parameters are different from the other two types we discussed. Power Query and function parameters help us query data efficiently. What if parameters are all about what you could do with your data.

How to Leverage It

You often find what if parameters when you want to run simulated scenarios. I recently did a deep dive on what if parameters, but at a high level, there are two components.

The first component is creating the parameter itself. Go to the modeling ribbon and select New Parameter. A new window will open where you can specify the key components of your parameter: the name, data type, minimum, maximum, increment value, and default. Once you click OK, a new slicer will appear.

But that was the easy part. We now have a new measure named <Parameter Name> Value, which returns the value currently selected in the slicer. We can then reference this measure inside other measures. You will need to write a little DAX, but it is worth the extra effort.

Common Use Cases

The most common use case is to simulate revenue increases. I add the parameter to a formula that takes current revenue and shows the projected change. Everyone is trying to make more money, so it makes sense.

One of my favorite use cases is to help filter data. I built a report that compared property assessments in my town. I used a what if parameter to help filter 3,500 properties to only a handful. Using key values such as year built, livable square footage, and lot size, the parameters found similar houses to the one I selected. Instead of having to fool around with filters, I could just quickly adjust my report parameters instead.

Tips and Tricks

Tip #1 – Always let the parameter create the slicer when you can. You can hide it in the selection pane if you don’t want to use it, but the slider is so nice. It makes the mobile experience even better!

Tip #2 – If you are using a number of parameters, don’t be afraid to create a dedicated page for them. You can just sync the slicers to other report pages. This prevents clutter and speeds up load time for a page since there are fewer objects in the view.

Tip #3 – You can create your own what if parameters with a custom table. This is nice when you want to use a non-numeric parameter. This is a common practice for filtering reports on things like top/bottom views.

Parameters on Parameters

You might have gotten to this point and said “Wow – I had no clue that this term is so widely used!”. I think it is important to understand the different kinds of parameters because they really can make an impact on your reports. Some of my reports may only use one of these while others use all three. The key is understanding how to properly use them.

Have you used any of these parameters before? Do you find them helpful in your report design and data models? If so, tell me in the comments below!

Sample Data Sets You Can Use for Trying New Ideas

We all want to hone our skills, but sometimes we struggle to find good sample data sets for trying out new ideas. Sometimes you need a specific data structure, or maybe you want to show off an idea but cannot use production data. A lot of what I share comes from real scenarios I have encountered. To share these tips, I have had to use different data sources over the years.

This week, I want to share some data sets that I find fun and helpful for trying ideas out in Power BI. Some will be easier to use than others. You might even need to have a SQL Server to make them work. Regardless, you should be able to find something you can use.

The Most Basic of All Sample Data Sets

If you are brand new to Power BI, the Contoso Financial Sample workbook is a great place to start. It is a free and easy-to-use data set for beginners. While it is not a great resource for data modeling, it does serve as a quick and easy model for learning the basics of Power BI.

When I was a Power BI trainer, I liked using this data set for basic DAX calculations because some of the key measures, such as cost of goods sold (COGS), were included in the model. You can perform some basic calculations that result in net profit.

Check out the Contoso Financial Sample data set here.

Learn How to Find Insights

Another one of my favorite sample data sets that is easy to use is the Pima Indians Diabetes Database from Kaggle.com. Like some of you, I cringe typing out the name of this data set. If published more recently, it would have likely been given a more culturally sensitive name. However, I use this data set for demonstrating the key influencers visual. I have also used it for predictive modeling with Azure Machine Learning, but that is for another day.

This data set was assembled by the National Institute of Diabetes and Digestive and Kidney Diseases. The purpose of the data set was to perform predictive modeling on diabetes in the Pima community. Identified as a high risk population for Type 2 Diabetes, this data represents the Pima community through years of research. Kaggle provides this data set for free. You just need to sign up for an account to access it.

Kaggle is a great resource for other data sets. There are so many to choose from, it is hard to just pick one. However, you are welcome to peruse their catalogue as you might find something interesting. With a little searching, you will find a data set which you can use to build a report on Settlers of Catan!

Check out the Pima Indians Diabetes data set here.

Simplest of SQL Sample Data Sets

Adventure Works is likely the world’s best-known SQL database. A common data set used for training, it is easy to implement. Experience with SQL Server Management Studio will serve you well as you implement this data set. Microsoft provides clear instructions on restoring the database, but I find a little extra know-how helps. It is wise to make friends with a database administrator if you don’t have one. Offer to buy them a drink or two at happy hour for their help and you will probably make a new friend out of the experience.

Download the Adventure Works data set here.

Binge The Office While Building Reports

Fans of The Office rejoice! TDMitch created a Dunder Mifflin sales data set from Microsoft’s Northwind Traders database. Just like Adventure Works, this is a SQL data set. Implementing it requires a little more effort than the Adventure Works database: you must follow the instructions and run a few SQL scripts to finalize the setup.

I recommend this data set for anyone trying to build something that connects with end users. I also recommend it for people who are expanding their Transact-SQL knowledge.

Check out the Dunder Mifflin data set here.

Simplest of REST API Sample Data Sets

REST APIs are great resources for third-party data. They work well, but you might find implementing them frustrating at first. I have used this data set before in my series on the Basics of REST APIs in Power BI. While each API endpoint is unique, you can learn the basics using the Yahoo Finance API.

Offered for free up to 100 calls per day, it is an effortless way to learn the basics at no cost. If you are really into stocks, you might even consider purchasing a paid subscription. Spend some time digging through the endpoints and become comfortable with how you can use APIs with Power BI.

You can review the Yahoo Finance API documentation here.

Big Data Sample Data Sets

Sometimes you want to throw a lot of data at a solution to test it out. The New York City Taxi data set is a massive trove of data that is free to use. Available as CSVs or APIs, you can choose how you want to access the data. I used it to benchmark refresh speeds between various Azure data sources such as blob, table, data lake, and Azure SQL storage solutions.

The Taxi and Limousine Commission provides quality documentation around the data set, including clear descriptions in the data dictionary, maps of taxi zones, and dimension tables. It even explains the difference between yellow taxis, green taxis, and for-hire car services.

Check out the NYC Taxi and Limousine Commission data mart here.

Did Not Find Something To Fit Your Needs?

No need to fear! There are tons of free data sources out there for you to use. My favorite place to go is data.gov to check out the different data sets available from the US Federal Government. You can also search for open data from many states and cities. You might even be able to use it for some of your solutions.

Google also has a data set search that will help you find some samples. Search for different topics such as commodities or labor statistics and see what comes back. My only caution is that not every result will be free. However, if you are looking for something specific, this search will help you find the data you need.

How about you? What are some of your favorite sample data sets? If you have a good one or used one of these, tell me in the comments below!

Using the POST Method with REST APIs Inside Power BI

Earlier this week I spoke about the Basics of Connecting to REST APIs at the Global Power BI Summit. A lot of people found it valuable, but wanted to know more about how to use the POST method to get data. Specifically, they wanted to know how to add a payload with the request to make it work. Since there was a lot of interest, I decided to write an article on this topic and share it with everyone.

For this article, I will be using the Microsoft OAuth2 endpoint to get an authorization token. You can find detailed documentation on the endpoint here. I will not be going into the details of the process to create an app registration or apply it in the Microsoft environment as that will take some time to cover. Instead, I decided to focus on the most important part of the request – the POST Method!

No/Low Code No More!

For many, this will be the scariest part of the entire process. This might be your first time ever writing code for a query from scratch. We do not have a connector to help us with this process. Fear not as we will walk through this entire process together and break it down. So to get everything started, go to Get Data and select Blank Query.

Adding a Blank Query to your report.

Once you select Blank Query, you will be redirected into Power Query with a blank canvas. I like to quickly update the query name on the right hand side before I forget. I am calling mine “Get-AuthToken”. Once that is in place, we will open up the Advanced Editor in the home ribbon. That is where the real fun will begin…

POST Method Query

Now that the Advanced Editor is open, we are going to start writing out the query. We will need to write a web connector query and a message body to make this work. I like to start with the web connector and build out from there.

Building the Initial Connection Query

Currently, we have Source = "" in our editor window. Those quotes are going to be the base of our query. We are going to start by adding the endpoint URL in between the quotes. We should see something that looks like this:

let
    Source = "https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token"
in
    Source

In order to specify how we are connecting to the endpoint, we will wrap it with Web.Contents:

let
    Source = Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token")
in
    Source

After that is in place, we need to add our headers. These are specified in the documentation and will be unique to your API. This example uses two headers, and they are required to achieve the desired result. Note the comma between the URL and the headers.

let
    Source = Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token", [Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]])
in
    Source

Lastly, because the Accept header specifies a JSON response, we will wrap the entire Web.Contents call with Json.Document to parse the response.

let
    Source = Json.Document(Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token", [Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]]))
in
    Source

With that code in place we have our base query built. If you want to save some typing, you could always use an existing query as a template. Just copy and paste it into the blank query and make your changes.

Now that we have the base query written out it is time to focus on the message body.

Body Building without Protein

The body goes by a few different names. You might see it called the message body, query, payload, or simply “the data”. Essentially, it is additional information that is passed so the endpoint works as expected. While you don’t have to lift weights for this step, you might find it just as frustrating. I have tried this a few different ways, but found this to be the easiest method.

The first thing we will do is create a line right above our Source line and below the let line. From here, we will enter ContentBody = "", as a placeholder. Make sure you add the comma at the end of the line!

let
    ContentBody = "",
    Source = Json.Document(Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token", [Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]]))
in
    Source

My example requires that five different fields be specified in the body. The formula is simple: field name = field data. We will add an ampersand at the end of each line and continue on the next line to make it easier to read.

let
    ContentBody = 
        "Content-Type=application/x-www-form-urlencoded&
        scope=https://graph.microsoft.com/.default&
        grant_type=client_credentials&
        client_id=<CLIENTID>&
        client_secret=<CLIENTSECRET>",
    Source = Json.Document(Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token", [Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]]))
in
    Source

As a best practice, I keep elements that might need to be changed at the bottom for easier access. This example leverages a client secret that will need to be updated over time. To make it easier to access, I leave it at the bottom.

Add the Body to the Web Connector

To finalize your query using the POST Method, you have to add the message body to the Source line. We need to pass the body as content, but first we must convert it into binary. We are just going to add Content=Text.ToBinary(ContentBody) inside the first set of brackets so the body is passed through the web connector.

let
    ContentBody = 
        "Content-Type=application/x-www-form-urlencoded&
        scope=https://graph.microsoft.com/.default&
        grant_type=client_credentials&
        client_id=<CLIENTID>&
        client_secret=<CLIENTSECRET>",
    Source = Json.Document(Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token", [Content=Text.ToBinary(ContentBody), Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]]))
in
    Source

Now that everything is in place, we can click Done and let the query do its magic!

Response from my POST Method query that brings back an access token for authentication.

To take things to the next level, I can drill down to the actual token by right-clicking the access_token value and selecting Drill Down. Next I will convert the query to a function by adding ()=> above the let line. In doing so, I will be able to leverage it for multiple endpoints that require token-based authentication with a simple call.

()=>
let
    ContentBody = 
        "Content-Type=application/x-www-form-urlencoded&
        scope=https://graph.microsoft.com/.default&
        grant_type=client_credentials&
        client_id=<CLIENTID>&
        client_secret=<CLIENTSECRET>",
    Source = Json.Document(Web.Contents("https://login.microsoftonline.com/<TENANTID>/oauth2/v2.0/token", [Content=Text.ToBinary(ContentBody), Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]])),
    access_token = Source[access_token]
in
    access_token

Make Life Easier with Parameters

I plan to write a more detailed article on Parameters inside Power Query but I want to share a little tip that will make your life much easier. With this example, we are connecting to an app registration inside Azure Active Directory through a Client ID and Client Secret. The Client Secret will expire at some point and needs to be refreshed. To make your life easier, you can add your dynamic values as Parameters so you can update the query quickly.

Go to your home ribbon and click Manage Parameters. Click new and add your parameter name. Lastly, add your parameter value and click OK when finished.

Setting up parameters to simplify the process to update my client secret.

Now we just need to replace the values in our query. We will have to use ampersands to concatenate our values so they come together in a single string.

()=>
let
    ContentBody = 
        "Content-Type=application/x-www-form-urlencoded&
        scope=https://graph.microsoft.com/.default&
        grant_type=client_credentials&
        client_id=" & ClientId & "&
        client_secret=" & ClientSecret,
    Source = Json.Document(Web.Contents("https://login.microsoftonline.com/" & TenantId & "/oauth2/v2.0/token", [Content=Text.ToBinary(ContentBody), Headers=[Accept="application/json", ContentType="application/x-www-form-urlencoded"]])),
    access_token = Source[access_token]
in
    access_token

Now if I need to update my client secret, I can just update the parameter instead of breaking open the Advanced Editor. I can even do it from the settings view in the Power BI Service. But that will have to wait for another article.

Final Thoughts on the POST Method

It is important to note that not all POST Method endpoints are meant for querying in Power BI. The primary purpose of the method is to create items. However, it is not unheard of to use it for querying rows of data. Before you do anything with an endpoint that uses the POST Method, make sure you review the documentation. You might accidentally write data to your endpoint if you are not careful!

Do you have a use case for querying using the POST Method? Any favorite endpoints? If so, leave a comment below!

How do I connect to REST APIs with Power BI? (Part 4)

In my last article, I shared how to scale a query by using a function in Power Query. However, I noted there might be some challenges with this approach and considerations you should keep in mind when connecting to REST APIs. Some of these considerations are simple and are similar to normal query optimization activities. While not as technical as other articles, I feel it is important to take some time to focus on potential challenges you might encounter when leveraging REST APIs.

Limits, Throttling, and Speed

When reading your documentation, you might see words such as “limits” and “throttling” and wonder what they mean. You will build your report and everything will be fine until one day your refresh fails. In reviewing the refresh errors, it will suddenly make sense what “limits” and “throttling” are about.

To ensure APIs are not overburdened, providers employ limits and throttling to prevent abuse. This protects the endpoints and provides a reliable service. Understanding how to navigate them is essential to prevent your refreshes from failing as your solution scales.

What are API Request Limits?

API limits restrict the number of calls made in a day, week, or month. If you signed up for the Yahoo Finance API, you will notice they have three tiers of membership. The free tier limits your calls to 100 per day. Because of how we built our function, you would have to limit your report to 100 stocks and only refresh it once a day.

Now there are additional tiers that you can purchase which increase your limits. The Yahoo Finance API offers tiers that give you 20,000 or 75,000 calls per month, which might accommodate your use case but requires an additional financial investment. Even if you increase your subscription, note that these tiers still have a hard limit, meaning you might need to keep purchasing a higher tier if you blow through your request limit.

Subscription pricing and limits for the Yahoo Finance API (as of February 2022)

To be a good steward of your subscription, you will want to regularly monitor your API usage and verify that you are not burning through your quotas too quickly. It never fails – you run out of requests just when you need the data the most.

What is Throttling?

APIs use throttling to protect endpoints from being heavily queried over a short period of time. Queries are limited to a number of calls per minute, per 15-minute window, or per hour. This protects endpoints from rogue code that could degrade the service through abuse.

With the Yahoo Finance API, our documentation states that there is a rate limit of 300 calls per minute. If you had a list of 1,200 stocks, it would take at least four minutes for the refresh to complete (1,200 stocks divided by 300 calls per minute = 4 minutes). Just as every API is unique, each one manages throttling differently.

In case you missed it above, you can see the rate limit for each subscription tier.

Last year I built a Power BI report that leveraged Google’s mapping APIs to get coordinates and street views of addresses in my town. Google enacted throttling to limit requests to 100 per minute. When refreshing my data, I hit my 100 request limit and the API paused until the full minute had passed. In doing so, I was able to proceed with the refresh. With 3,500 houses in town, it took 35 minutes for the refresh to complete. It was slow, but it worked great.

On the other end of the spectrum, the Microsoft Graph APIs have a limit of 10,000 calls per 10-minute period. When I ran over this limit, instead of the API pausing, I started receiving responses with a 429 error stating I had reached the throttling limits. To manage this, I had to get a little more creative with my query to make sure I did not receive that error.
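I will save the full pattern for another article, but one possible approach (shown here only as a sketch, not necessarily how I solved it) is to tell Web.Contents not to fail on a 429 and then retry after a pause. The URL is a placeholder and the wait time is arbitrary.

(Url as text) =>
let
    // Ask Web.Contents to hand back a 429 instead of raising an error
    Response = Web.Contents(Url, [ManualStatusHandling = {429}]),
    Status = Value.Metadata(Response)[Response.Status],
    // If throttled, wait a minute and try once more (IsRetry skips the cache);
    // otherwise parse the response we already have
    Result =
        if Status = 429 then
            Function.InvokeAfter(() => Json.Document(Web.Contents(Url, [IsRetry = true])), #duration(0, 0, 1, 0))
        else
            Json.Document(Response)
in
    Result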

Impact on Refresh Time

Regardless of limits, you will need to consider refresh time when using REST APIs. If your primary data sources reside in SQL Server, this will be an adjustment for you. Normally you cast a wide net with your query, bring in everything you can, and work from there. REST API refreshes tend to be slower, so you want to narrow that scope to only what you really need. Adding some filtering will speed up your queries and make them more efficient.

If you are in a situation where you need a large amount of data from an API, you might want to consider leveraging Azure Data Factory or Databricks to incrementally replicate that data into a data warehouse instead.

Paging Queries

For this series of articles, our endpoint was fairly basic. It only brought back one row of data for each stock symbol. This was by design because our overall goal was to cover the basics. As you explore other APIs, you might find endpoints that bring back several rows at a time.

If you expand your horizon with REST APIs, eventually you are going to run into something called paging. Simply put, paging is the splitting of responses over multiple queries. Similar to limits, this is to prevent overburdening the endpoint with your query.

As an example, let’s say you are querying an endpoint that has 50,000 records. Paging has been enabled on this query and is limiting the responses to 1,000 records at a time. In order to receive all 50,000 records, you will need to build logic into your query to pull each page individually.

So how do you know your endpoint is using paging? It is all about the documentation. If an endpoint has pagination, there are usually a few parameters that will tip you off, such as “PageNo”, “PageSize”, or “Top”.
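To give you a taste before that article, here is a rough sketch of what the paging logic can look like for the 50,000-record example above. The endpoint, the PageNo and PageSize parameter names, and the assumption that each response is a JSON array are all illustrative.

let
    // 50,000 records at 1,000 per page means 50 pages
    PageCount = 50,
    // Helper that requests a single page; Query values must be text
    GetPage = (page as number) =>
        Json.Document(
            Web.Contents(
                "https://api.example.com/v1/records",
                [Query = [PageNo = Number.ToText(page), PageSize = "1000"]]
            )
        ),
    // Call the helper once per page number, then flatten the results
    Pages = List.Transform({1..PageCount}, each GetPage(_)),
    Combined = List.Combine(Pages)
in
    Combined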

It is also important to consider that you can often avoid pagination with quality filtering. If your request is targeted and removes the noise of other records, it may avoid the problem altogether.

In the coming weeks I will be doing an article specifically on paging, so keep an eye out for it!

Token Authentication

Another challenge you might encounter is token based authentication. Instead of passing an API key for authentication, you would need to call another endpoint to receive a token. Once you have the token, you would pass it through a header.

If you encounter this situation, you would need to create a function in Power Query that would solely return a token. While it takes a few extra steps, it is likely reusable with other endpoints in your report. In addition to that, it will simplify your query code and keep it tight and compact.
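As a quick sketch, once that token function exists (here I am assuming it is named Get-AuthToken and returns just the token text), using it is a single header entry. The endpoint below is only an example.

let
    // Call the token function and pass the result as a Bearer token;
    // the Web.Contents credential itself can then be set to Anonymous
    Source = Json.Document(
        Web.Contents(
            "https://api.example.com/v1/reports",
            [Headers = [Authorization = "Bearer " & #"Get-AuthToken"()]]
        )
    )
in
    Source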

Conclusion

Congratulations on making it to the end of this series. There are a lot of challenges when implementing REST APIs as a data source, but the juice is worth the squeeze. Taking yourself out of the mix and automating your refreshes is the end goal. And while the experience may not be as efficient as a SQL data source, it still gets the job done.

Have you been following along? Were you able to successfully connect and extract data from Yahoo Finance? How about other REST APIs? Let me know in the comments below!

