Over the past year, I have been helping customers with their Copilot for Microsoft 365 deployments. The number one complaint I have received from customers is the difficulty of producing quality Copilot reports. If you are reading this, you are probably not alone. However, some recent changes have come about that will help us get better reporting.
Microsoft recently released reporting APIs for Copilot user activity. While these endpoints are in preview with a target release of November 2024, we can start taking advantage of them today. The biggest challenge is that quality reporting requires a history of data, so we need to start populating it now. To make this a reality, this week’s post will be 100% focused on populating data in your Fabric Lakehouse.
It is important to note that I cover a lot of the mechanics of how to make this happen in previous posts. I have linked to the appropriate posts where applicable but wanted to highlight this now. There are a lot of moving pieces and some prerequisite knowledge is required to be successful, so please take the time to reference these posts when applicable.
Elephant in the Room with Copilot Reports
Before we get started, we need to address the headwinds with Copilot Reports. If you know me personally, you know I am a balls and strikes kind of person. I call things as I see them and sometimes that means calling out some of the issues that come with new solutions.
Challenges with the Data
If you are a data person like me, you find the reports available from Microsoft to be frustrating. They detail specific behaviors inside of Viva Insights, but you cannot drill down far enough to see where use cases are being leveraged. The only way to look at individual user activity has been with the Microsoft 365 Admin Reports for Copilot, and that data is what the API endpoints are pulling at this time. While the individual detail is nice, it does not resolve the frustration we have been seeing.
The reason for this frustration is twofold. First, unlike the endpoints we explored in my previous series, I cannot pull activity for a specific date; I can only pull for a period of activity. Second, I do not get a count of activity for each use of Copilot. Instead, I just get the date Copilot was last used. Most of my customers have been vocal about this second point.
Overcoming the Challenges
I will not speculate on why Microsoft has chosen this path. Instead, I pushed to find a way to work around the challenge and make something happen. Instead of focusing on individual days of data, we will need to look at data week over week. Is this an imperfect system? Absolutely! But I would rather focus on what I can do than what I cannot do. Previously, I was able to do this by exporting the data on a weekly basis. Annoying? Yes, but now we can automate it with the advent of the new reporting endpoint. And I was even able to enhance other elements of my analysis as a result!
This all means our reporting will simply look at user activity week over week with Copilot. This will help us drive the value we seek in some meaningful way instead of guessing at how people are leveraging Copilot. My customers have found this insight valuable in driving better adoption which results in stronger utilization.
Foundations of Copilot Reports
If you followed along with my series on Graph API reports, you should already be familiar with the key steps to get things going for this project. While we did a lot of this in that series, we will be revisiting parts of it again. To simplify the process, we will go through the steps we need to complete at a high level. If you need a primer, please go back to my previous series for the details on how to prepare a Lakehouse in Microsoft Fabric.
Build a Fabric Lakehouse
To get started, sign into Microsoft Fabric and create a workspace for your Copilot Reports. You will likely want to share these reports with others, so it might make sense to create a dedicated workspace for this. The key point is that a new workspace will streamline the data collection process as it changes over time. Once the workspace is created, add a new Lakehouse to it.
With our Lakehouse in place, we are ready for the next step in the process.
Creating Tables in the Lakehouse
Just like we did before, we will use a Fabric Notebook to create our tables. To get started, download the Create Graph Tables notebook I created for you. Remember that you must swap your context to the Data Engineering view before you can import the notebook.
Once imported, you can open the notebook and connect it to your new Lakehouse. From there, you can run the notebook. It will take a few minutes, but it should make parts of this process much easier for you long term.
With our tables created, we need to start harvesting data from Microsoft Graph.
Harvesting Data for our Copilot Reports
Now that we have our foundation in place, it is all about getting our data liberated from Graph and into the Lakehouse. To get this process started, download the Copilot Reporting Template I created for your Dataflow. Once downloaded, create a Dataflow Gen2 and import your template:
Once imported, you will need to update your parameters with your Service Principal information. If you have never done this before, follow the steps I outlined in my Reporting with the Graph API post a few weeks back to ensure you have one configured properly. If you do, just update the parameters:
Now that you have updated your Service Principal information, you just need to update your connection properties, ensuring all endpoints use the anonymous method since the Service Principal handles authentication. From there, you just need to map your data to your new Lakehouse:
We will need to map the following tables with the following properties:
Query Name                 Lakehouse Table Name         Behavior
GraphUsers                 graph_users                  Replace
Date_Table                 date_table                   Replace
LicensingData              licensing_data               Replace
TeamsUserActivity          teams_user_activity          Append
OneDriveUserActivity       onedrive_user_activity       Append
SharePointUserActivity     sharepoint_user_activity     Append
VivaEngageUserActivity     viva_engage_user_activity    Append
CopilotData                copilot_data                 Append
CopilotLastActivity        copilot_last_activity        Append
Table mappings for our Dataflow to our Fabric Lakehouse
Once this is completed, we can publish our Dataflow.
Next Steps
Now that we have all of this in place, it is time to set up the refresh. If you remember from the beginning, we will be looking at this data on a weekly basis. Therefore, we will want to set up the refresh to pull the data weekly. For myself, I set up the refresh to occur Sunday mornings at 9 AM as that is a quiet time of the week. It also gives me the best shot at pulling the data on the same day of the week. Remember, we are at the mercy of Microsoft and when they release the data, so we have to somewhat hope for the best in this situation.
With the refreshes running weekly, we will start to collect some meaningful data we can use. Next week, we will work on preparing the semantic model to allow us to connect our Power BI report. Until then, we will keep harvesting data to help bring better value to our report!
Welcome back to my series on the Microsoft Graph API. We have spent a lot of time laying the groundwork for our data. In this article, it is all about getting our data into our Lakehouse so we can start to trend our data.
As a quick reminder, the purpose of us having to go through this exercise is because we must retain data on our own. The Microsoft Graph API will only furnish detailed data over the past thirty days. Any further back and it is lost. Therefore, if we want to trend data over three months, we must warehouse it.
In this article, we will be finalizing our queries and loading them into our Lakehouse. To get there, we are going to go through some deep levels of M Query. However, hang in there until the end; I should be able to make it easier for you!
Getting Teams User Activity Detail
To get started, we are going to have to write out some code. While you can just copy/paste the code from this article, I think it is worth walking through it together to help you understand the method behind the madness. It seems complicated – and it really is. But understanding the rationale of what was done will help you with your own projects. To help us on this journey, you will need to create three more parameters to simplify the process:
Parameter Name         Parameter Type     Parameter Value
LakehouseId            Text               Value of Lakehouse ID from URL
WorkspaceId            Text               Value of Workspace ID from URL
HistoricalDayCount     Decimal Number     List of values (5, 15, 27) with the default set to 5
Parameters required for capturing Microsoft Graph API data
With your parameters in place, it is time to write out our query. Open a new blank query and strap in!
Start With a Function
Because we are pulling data for a single date at a time, we will need to create a function. We have done this before, but this time we are going to embed it within an existing query:
let
GraphDataFunction = (ExportDate as date) =>
let
GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
[RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
in
GraphData
in
GraphDataFunction
We need to call a few things out. First and foremost, you will note we are using the RelativePath option. This is because our dates will be updated with each call of the function, therefore we must use this option to avoid errors.
Next, you will notice the Authorization header is using our Get-AuthToken function we created in part 2 of this series. This is key to ensure we use our token to complete the authentication process.
Checking the Lakehouse for Existing Data
We will need to ensure we only query Graph data that is not already in the Lakehouse. Because of how finicky the Graph API can be, we cannot guarantee data to be available the next day. Sometimes it is a few days behind. We also do not want duplicate data to be added to the Lakehouse.
To prevent this from happening, we will query the Lakehouse and the tables we created. By using the parameters we created, we will call our Lakehouse and pull our historical data so we can transform it:
let
GraphDataFunction = (ExportDate as date) =>
let
GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
[RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
in
GraphData,
LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"})
in
HistoricalDatesExported
These five lines of code connect to our teams_user_activity table, remove unnecessary columns, remove duplicates, and format the result into a usable column. This will be empty for right now, but more will be added as we move forward.
Create a Range of Dates and Merge with Historical Data
Next, we will pull data from the last few days. To make life easier on us, we will limit the pull to five days to speed up our development process. We just need to use today’s date as a base and create a range going back five days that we can use:
Once we have a list of dates in place, we will merge it with our historical data from the HistoricalDatesExported step. From there, expand the values and keep only the rows where the merged date is null, so the only dates left are the ones where we are missing data.
Execute the Microsoft Graph API Function
Reaching all the way back to the first step, GraphDataFunction, we are going to pass our list of missing dates to the function and pull our data through:
let
GraphDataFunction = (ExportDate as date) =>
let
GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
[RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
in
GraphData,
LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -5),
ListDatesForExport = List.Dates(TodaysDate, 5, #duration(1, 0, 0, 0)),
TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null)),
ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], 33)),
RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", {"Report Refresh Date", " User Id", " User Principal Name", " Last Activity Date", " Is Deleted", " Deleted Date", " Assigned Products", " Team Chat Message Count", " Private Chat Message Count", " Call Count", " Meeting Count", " Meetings Organized Count", " Meetings Attended Count", " Ad Hoc Meetings Organized Count", " Ad Hoc Meetings Attended Count", " Scheduled One-time Meetings Organized Count", " Scheduled One-time Meetings Attended Count", " Scheduled Recurring Meetings Organized Count", " Scheduled Recurring Meetings Attended Count", " Audio Duration In Seconds", " Video Duration In Seconds", " Screen Share Duration In Seconds", " Has Other Action", " Urgent Messages", " Post Messages", " Tenant Display Name", " Shared Channel Tenant Display Names", " Reply Messages", " Is Licensed", " Report Period"}),
FilterMissingResults = Table.SelectRows(ExpandResults, each [Report Refresh Date] <> null)
in
FilterMissingResults
After we execute the function, we will combine our files and expand the results. To do this, we are using a custom function titled Transform file to combine our files seamlessly:
(Parameter as binary, ColumnCount as number) => let
Source = Csv.Document(Parameter, [Delimiter = ",", Columns = ColumnCount, Encoding = 65001, QuoteStyle = QuoteStyle.None]),
#"Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
#"Promoted headers"
The result should be something that looks like this:
Wow – that was a whole lot for a single endpoint. Only four more applications left! Frustrated that you will have to do this again? Well, instead of being frustrated, let’s make it a little easier.
Turning Microsoft Graph API Reports into a Flexible Function
We have worked so hard to get the first query in place. But if we think through things a little further, we can simplify the process for future endpoints. Instead of going through this entire process over and over again, we can convert this to a function and pass data to make it easier for us to use.
Prepare Your Endpoint Data
To simplify our process, we will start by creating a table we can reference. This will highlight the application, Microsoft Graph API endpoint, Fabric Lakehouse table, columns pulled from the endpoint, and the column names. Create a blank query called Endpoints and add the following code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3VXbbtswDP0Vwc/pQ9cvCOIMDZCtRt0FGIIiECzOIWpTgURny75+lOPc5aB+3YttkkeUeXRILZfJtNZYJaMEwnvVeHArXTBukXfiLYFbwLhz/ZB4Crxf8vgkj1fYWMfqFX458GuVaoaRCjCVOaQCN7pS33UtzhT9ptK7zpp5lUIFDGZ0+OgWz7Vnddiw8+VARk1sQzySrQrALZxMfQx9A2CkUk0c6JDwyj0jBifVnSJj77EksTNnTVOwD/nagjJwaE3yPlomLwSpkx2lWivY8Bkj6gCLc/U4iKth7CwQfgvgxampwQD9itWRn3xHxbVrrZ24WjpIV9UuFp3+iUY/xVibIrNILFX7YGyCEWPtBI3z9uU/4W2Bvt0i0+UwLt9A115I4PCOMdgCAhtRAp/uN+nM9DAZ46uf3Ugl4bfUZK1Z2s/7s6plp60sisYmwttNM1+YXo6r1IR/b9rbqzGzzImz7jbq2RZ3Vl4DrhPkxRpMUwWNEDww1nAn2T1wf2KZZo1zoc7PZI6gb4puDFqVNk4zWhKxyvAsLBkfNGigJ5QXDoD2Go4jnrX8Ga9FKkEVlkQ3rgQ6HqFAMuvPzTcgLfHLud91iRw+EVQxzL4LpH1OmUR4cyyAfFDebYssRFBqSqWARexbsVbQWrF2+anruqdXBg7py8JaSM7764o7gZNIu3/+BL7AxC6yOX4MuaTe/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Application = _t, Lakehouse_Table = _t, Graph_Endpoint = _t, Column_Count = _t, Column_Mapping = _t]),
#"Changed column type" = Table.TransformColumnTypes(Source, {{"Column_Count", Int64.Type}, {"Application", type text}, {"Lakehouse_Table", type text}, {"Graph_Endpoint", type text}, {"Column_Mapping", type text}})
in
#"Changed column type"
You can choose to pull this data from somewhere else if it is easier for you, but this table is easy to maintain in this scenario.
Convert our Query to a Function
Our next step is to take the first query we created in this article and make it a function. We will start by adding a parameter for the application at the top and filtering the Endpoints table with the value passed in the parameter.
From there, we will take the data from the EndpointData step and inject it into our query in the following places (a sketch of the finished function follows this list):
In the GraphDataFunction step, place a dynamic endpoint name
In the HistoricalExports step, place a dynamic Lakehouse table name
In the MergeExportedFiles step, place a dynamic column count in the Transform file function
In the ExpandResults step, place dynamic column mappings and wrap it with a Text.Split() function
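Before we invoke it, here is a sketch of what the finished function can look like once those four substitutions are made. It reuses the parameters and helper queries created earlier (WorkspaceId, LakehouseId, HistoricalDayCount, Get-AuthToken, and Transform file) and assumes the Endpoints table stores the report function name in Graph_Endpoint and a comma-separated list of column names in Column_Mapping. Treat it as a reference for checking your own conversion rather than an exact copy of my template:
(Application as text) =>
let
    // Assumes the value passed in matches the Application column of the Endpoints table exactly
    EndpointData = Table.SelectRows(Endpoints, each [Application] = Application){0},
    GraphDataFunction = (ExportDate as date) =>
        let
            // Assumes Graph_Endpoint holds just the report function name, such as getTeamsUserActivityUserDetail
            GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
                [RelativePath = EndpointData[Graph_Endpoint] & "(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
                Headers = [Accept = "application/json", Authorization = "Bearer " & #"Get-AuthToken"()]])
        in
            GraphData,
    LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
    HistoricalExports = LakehouseConnection{[Id = EndpointData[Lakehouse_Table], ItemKind = "Table"]}[Data],
    ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
    FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
    HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
    // The hard-coded 5 from the development query is swapped for the HistoricalDayCount parameter
    TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -HistoricalDayCount),
    ListDatesForExport = List.Dates(TodaysDate, HistoricalDayCount, #duration(1, 0, 0, 0)),
    TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
    SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
    MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
    ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
    IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each [Report_Refresh_Date] = null),
    ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
    FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
    MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], EndpointData[Column_Count])),
    RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
    // Assumes Column_Mapping is a comma-separated list of the column headers returned by the endpoint
    ColumnNames = Text.Split(EndpointData[Column_Mapping], ","),
    ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", ColumnNames, ColumnNames),
    FilterMissingResults = Table.SelectRows(ExpandResults, each Record.Field(_, ColumnNames{0}) <> null)
in
    FilterMissingResults
Invoking it with one of the values from the Application column in your Endpoints table should return the same table the long-hand Teams query produced above.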
Now that this is complete, we can call the function by typing in the application name and clicking invoke:
With this in place, we can invoke our function for OneDrive, SharePoint, Email, and Viva Engage with a few clicks!
Get Current Users from Entra ID
For our last step, we just need to pull our users out of Entra ID into our Dataflow. To make this easier, I created a function titled Get-Users and invoked it with this code:
() =>
let
GetUserDetail = (Params) =>
let
Path = if Text.Length(Params) > 0 then Params
else "$select=displayName,givenName,jobTitle,mail,officeLocation,surname,userPrincipalName,id,accountEnabled,userType,companyName,department,employeeType&$expand=manager",
Source = Json.Document(Web.Contents("https://graph.microsoft.com/v1.0/users",[RelativePath = "?" & Path, Headers=[Authorization="Bearer " & #"Get-AuthToken"()]])),
NextList = @Source[value],
result = try @NextList & @GetUserDetail(Text.AfterDelimiter(Source[#"@odata.nextLink"], "?")) otherwise @NextList
in
result,
UserDetail = GetUserDetail("")
in
UserDetail
You will notice the function inside of the function. This is because we need to iterate through the entire user list to receive all users. Because the API returns users in pages rather than all at once, we follow the @odata.nextLink and work through this a few times to get the results we desire.
HELP! THIS IS TOO MUCH!
Is your head spinning? Mine too. In fact, this was a lot to even get coded. And if I ever want to reuse this, I do not want to go through this level of effort again. I am probably going to be like that math teacher you had where you learned how to do everything the hard way to pass an exam and then the next chapter they teach you an easier way to solve the problem. I hated it when that happened to me, but I am still going to do it to you.
To streamline this, I created a Power Query template for this project. You can download the template and import it into your new Dataflow:
Once imported, you will need to update the following parameters:
TenantId
AppId
ClientSecret
WorkspaceId
LakehouseId
This should streamline the process considerably if you are just getting started!
From Dataflow to Lakehouse
We now just need to map our data into our Lakehouse. This will take a little diligence, but it should move along pretty quickly. We just need to match our queries and columns from our Dataflows into tables within our Lakehouse.
Adding Teams Data to the Lakehouse
Starting with our Teams user activity data, we will go to the bottom right-hand corner of the screen and click the + icon to add it to our Lakehouse. Once you have a connection to your Lakehouse, we will use an existing table and navigate to the table where your data is stored:
Once we have selected our table, we need to map our columns. We also need to update the setting that will allow us to append our data:
Now we just need to save our settings. Once we publish and refresh our dataflow, our data will be added into the Lakehouse. We just need to repeat these steps for OneDrive, SharePoint, Email, and Viva Engage.
Adding Graph Users to the Lakehouse
If you remember from my last article, we created all of the tables for our Microsoft Graph API report endpoints. However, we did not need to create anything for our users. Why is that?!
The answer is simple: for the report data, pre-creating the tables lets us check the Lakehouse during the refresh and avoid duplicating data for dates that have already been loaded, since data availability is a moving target and hard to hit. With our users, it is a different story, as we simply replace the data each time. So, just like we did before, we will go ahead and add our Lakehouse as a data destination. However, this time we will create a new table. Once in place, we can make adjustments to our mappings and settings:
Once completed, we can click save and move to publishing our Dataflow.
One More Thing Before We Publish!
The last thing we need to do is adjust our HistoricalDayCount parameter from 5 days to 27 days. This will ensure that all the available historical data is captured in the refresh and loaded into the Lakehouse:
Once completed, we can click publish. It will take a few minutes to validate the Dataflow. After that, we just need to set up a daily refresh and our data will start to populate into the Lakehouse!
Next Steps and Conclusion
Alright, that was a ton of information. It was hard to split up, but I wanted to keep it together in a single article as these pieces are all interconnected. It is a lot, but it is complete. Now that you have your data, you can create your model and measures and start building out your reports.
In other news, I have been able to get the endpoint for Copilot. However, there are some data quality challenges that I want to address before doing a full write-up. In the meantime, this will get you started with understanding how complicated this level of reporting can be.
Did you make it to the end? Were you successful? If so, tell me about it in the comments below!
In my last post, I went over the why and covered the building blocks for connecting to the Microsoft Graph API for reporting. With this in place, it is time to prepare the landing zone for our data in Microsoft Fabric! To achieve this, we will need to set up our Fabric environment so we can start to build out our solution.
Before we get started, if you do not have Microsoft Fabric, do not fear! You can start a Microsoft Fabric trial for 60 days for free. When you sign into Power BI, you can click on your user avatar, and start your trial. If this is not available to you, it might be because your Fabric Administrator has turned off the trial feature. You might need to submit a request to re-enable this feature.
Creating Your Lakehouse
To get things started, we need to create a new workspace that has a Fabric capacity assigned to it. Just like you would before with Power BI, create a new workspace inside of Microsoft Fabric and ensure it is tied to your capacity.
Once you have your workspace in place, it is time to build your Lakehouse. If you go to the new button in the upper left hand corner of the workspace window, a screen will appear where you can select a Lakehouse:
From there, you can give your Lakehouse a name and click create. You will be redirected to your new Lakehouse. Once created, we will need to extract the Workspace ID and Lakehouse ID from the URL. The easiest way to capture this is to copy the URL in the address bar to the notepad where you kept your Service Principal data in part 1 of this series. Once you capture this information, you can close your Lakehouse:
Now that you have a Lakehouse in place, it is time to prepare the tables. While there is an opportunity to create these tables directly from our Dataflow, we will want to create a few ahead of time. This simply reduces the amount of effort on your side and makes a complex process a little more seamless.
While I am going to give you a Notebook you can upload, we will pause and create one from scratch so you understand the process. When you click new item in your workspace, we will scroll down to prepare data and select Notebook:
We will add our Lakehouse to the Notebook on the left hand side. Once in place, we can add our code using PySpark into a cell:
Using documentation from the Microsoft Graph API, I was able to identify the endpoints and their associated fields. This allowed me to create a block of code that would create our teams_user_activity table:
In our scenario, we will need five tables for Teams, OneDrive, SharePoint, Email, and Viva Engage. To simplify the process for you, I have created a notebook you can upload into Fabric and run. You just need to connect your Lakehouse and select run all. To perform this task, you must switch to the Data Engineering context of Fabric and select Import on your workspace:
Once you have run the script successfully, you can go to your SQL Endpoint for the Lakehouse and verify the tables have been created and that they are empty with the following query:
SELECT * FROM teams_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM onedrive_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM sharepoint_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM email_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM viva_engage_user_activity ORDER BY Report_Refresh_Date DESC
Preparing Your Dataflow
After creating your Lakehouse and preparing your tables, you can create a Dataflow in the same workspace. This will query the data and push it to the right tables within our Lakehouse. However, you need to be careful to select a Dataflow Gen2 as opposed to a Dataflow:
Adding a Dataflow Gen2 to your Fabric workspace
This is critical for us to ensure our data makes it into the Lakehouse later on, so be careful with your selection! Once you select it, you can rename your Dataflow in the upper left hand corner:
With your Lakehouse and Dataflow created, it is time to start getting some data.
Connecting to the Microsoft Graph API
If you remember from my last article, we will be using application permissions with our app registration. As a result, we must acquire an OAuth2 token to facilitate the authentication process. Luckily, we made it easy for you to deploy this in your solution.
Adding Parameters in Power Query
I covered the basics of Parameters in Power Query in an older article. Instead of going through the details, I will just highlight what we need to have created.
To facilitate our authentication, we need to create three parameters that will be used to hold data from our app registration. The three parameters we need are as follows:
Parameter Name     Parameter Type     Parameter Value
TenantId           Text               Directory (tenant) ID
AppId              Text               Application (client) ID
ClientSecret       Text               Client secret value
Parameters for our authentication process
Remember, capitalization counts! If you copy/paste the parameters above, you will not have an issue with the next step. It seems easy, but take great care with this process!
Creating an Authentication Function
Once you have your parameters set, you need to create a new blank query. Making this connection happen requires you to build out the query in code, but I have made that easier for you. Just copy the code below and paste it into the advanced editor of your new query:
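If you would rather type it out, the function is a standard OAuth2 client credentials request against the Microsoft identity platform. Here is a minimal sketch of what it can look like using the TenantId, AppId, and ClientSecret parameters above; the downloadable version may be structured slightly differently, but the shape of the request is the same:
() =>
let
    // Client credentials token request; assumes the TenantId, AppId, and ClientSecret parameters exist
    TokenResponse = Json.Document(
        Web.Contents(
            "https://login.microsoftonline.com/",
            [
                RelativePath = TenantId & "/oauth2/v2.0/token",
                Headers = [#"Content-Type" = "application/x-www-form-urlencoded"],
                Content = Text.ToBinary(
                    Uri.BuildQueryString(
                        [
                            grant_type = "client_credentials",
                            client_id = AppId,
                            client_secret = ClientSecret,
                            scope = "https://graph.microsoft.com/.default"
                        ]
                    )
                )
            ]
        )
    ),
    // The queries that follow expect a plain token string they can prefix with "Bearer "
    AccessToken = TokenResponse[access_token]
in
    AccessToken
Because the function takes no parameters, the rest of our queries can simply call #"Get-AuthToken"() whenever they need a token.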
Once we have pasted the code into the query, click next. After that, rename the query to Get-AuthToken. Again, capitalization is important so take care with this step!
Now that we have our authentication function in place, it is time to take the next step towards building out our Microsoft Graph API functions to pull user information and Teams user activity.
Conclusion and Next Steps
There is a lot of setup involved in this process. Part of the reason is the complexity of the pull we need to perform, but it also helps us avoid duplicates and reduce the chance of throttling against the endpoints. I know this seems like a lot of work, but I promise it will pay off in the end!
In my next post, we will finally grab our Graph API data and load it into our Lakehouse! Stay tuned to see how it happens!
The Microsoft Graph API library has been around since November 2015 and continues to expand. One of the key endpoint collections centers around utilization reports for the Microsoft 365 ecosystem. These reports can be leveraged to drive better overall adoption of the Microsoft 365 platform and understand trends in use.
Leveraging the Microsoft Graph API, the Microsoft 365 Utilization Metrics app has been the easiest solution to deploy for working through this challenge. With a few clicks in AppSource, you can deploy this app within Power BI, update some details on the semantic model, and hit refresh to understand utilization. And while it does show data for the past year, customers frequently want more out of their data.
People forget about the Microsoft Graph API until new technology finds its way into the ecosystem. Enter stage left, Copilot for Microsoft 365, the newest belle at the ball! With new technology coming at a premium price point, companies want to understand utilization to ensure value. The $30 per user per month cost of Copilot suddenly has people wanting to confirm people are leveraging the tool. If not, they want to reclaim the license and get it to someone else.
This gave me the idea to revisit this challenge and see how we could leverage Fabric to make it a reality. While the Microsoft 365 Roadmap has this rolling out soon, a little preparation will allow you to deliver this data when it is finally available.
What are the Microsoft Graph API Reports?
The Microsoft Graph API reports are a collection of endpoints that provide utilization data for users. Aggregated by action, these reports provide visibility into how Microsoft 365 is leveraged. While you can pull data for users over 7, 30, 90, and 180 day periods, the data is aggregated by user. This makes it a challenge to get the data into a meaningful format if you are looking to track trends over time.
While those periods work, there is another option. Most endpoints allow you to specify an individual date for your query. However, you can only collect that data for the past 30 days. This is a huge restriction and can limit your ability to trend data over a longer period of time.
In the past, if you did not have a data warehousing strategy in place, you were severely limited in what Power BI could do. However, with the advent of Microsoft Fabric, we can simplify this process and enable you to track this data with minimal effort!
Preparing Access to Microsoft Graph API Reports
To get ready for the data warehousing process, you must deploy an app registration inside of Entra ID. To get started, sign into Entra ID, navigate to Applications, select App Registration, and start a new registration. Name your new app and keep everything else the same:
Once you click register, copy the Application (Client) ID and Tenant ID from the overview page and save it for later. Next, go to API permissions and click add permission. You will select Microsoft Graph and then application permissions. From there, you will need to add the following permissions:
Permission Name      Purpose
User.Read.All        Provides dimensional data around users within your organization from Entra ID. While User.Read has already been added to the app, it will not work in this situation.
Reports.Read.All     Provides access to all reporting endpoints for Microsoft 365, including Microsoft Teams, SharePoint, OneDrive, Exchange, and Viva Engage.
Details about API permissions for your app
Once these permissions are assigned, grant admin consent for your organization. The permissions you added should have green checkmarks next to them:
Lastly, you need to generate a client secret. Go to Certificates & Secrets and add a new Client Secret. Once created, make sure you copy the value of your secret and save it. You cannot retrieve it again later; if you lose it, you will have to create a new one.
Turn off Report Masking
As a security feature, the Microsoft Graph API has user masking available for this data. This helps protect the anonymity of user activity in the tenant. However, if we want to track Copilot for Microsoft 365 utilization, we cannot be anonymous. For some of you, this will not apply. However, this process could require some approvals, so asking early will ensure you can get the report mask turned off ahead of time.
To turn off report masking, go to the Microsoft 365 Admin Center and expand settings on the left side of the screen. From there, select org settings and then select reports. A panel will appear on the right side of the screen. Uncheck the option titled display concealed user, group, and site names in all reports:
It is important to note that this feature is either on or off for everyone in the organization for all reports. There are no granular controls for this feature. Therefore, make sure you understand the implications of turning off this setting and communicate it to fellow Microsoft 365 administrators who might be using these reporting features whether by API endpoint or in their respective administrator portal.
What else do I need to know?
This project will require you to understand how to use REST APIs in Dataflows. If you have not read my series on leveraging REST APIs in Power BI, this would be a good place to start. We will need to build a number of connections leveraging this method.
Next, we will have to navigate an authentication process that will require us to use the POST method. I have also written an article on navigating this challenge. This will be a little more complicated, but the code found in this article will be helpful for understanding how we will build a function to drive authentication for these endpoints.
Lastly, you will find yourself writing some M Query code. If you are new to this, the code will be provided to assist you. However, you will want to learn a little more about the language so you understand the method behind the madness.
Conclusion and Next Steps
This setup only takes a few minutes, but it will go a long way in preparing our environment. As mentioned above, if you are not familiar with using REST APIs in Power Query, please take some time to learn about them. I promise you that you will thank me later. This can get very complicated, so understanding the process ahead of time goes a long way!
This will be enough to get things started for right now. In my next article, we will explore setting up our Fabric environment to query this data and prepare it for the data warehouse.
In preparing for the Global Power BI and Fabric Summit this year, I got a little nostalgic. I went back to my presentation and early blog posts on connecting to REST APIs. Two years, several blog posts, and a book on Power Query later, I felt it was time to add a fifth part to this series. I realized that I left out one additional step I could take to optimize my query a little more to allow me to squeeze the most out of my API call limits.
I will be picking up where we left off with part 4 of how to connect to data with REST APIs. However, if you are new to my blog, you can start from part 1 and get caught up. If you recall, I am limited to 100 API calls a day with my scenario. If I am watching more than 100 stocks, I might need to consider moving to the next subscription level. However, I have an option to squeeze even more out and potentially return 1,000 stock symbols. This is a bonus as I do not need to increase my subscription!
What Did Dom Miss in Part 4?
To note what I missed, I need to go back to part 2 on REST APIs. You must Read the Fine Manual! In reading the documentation, I realized that I could batch multiple stock symbols at once. Instead of making individual calls for each symbol, I can group ten of them in a single call.
With this knowledge in hand, I set out to take my query one step further and get more out of my subscription. It is not as straightforward as it may seem, but with a little manipulation we can make it work.
Prepare my List
We already have a spreadsheet of stock symbols we have brought in. We need to start by removing all of the columns except the stock symbol. Just right click on the symbol column and remove other columns. Once that is complete, insert an index column that starts from 1. Your query should look like this:
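If you prefer to see those two steps in M rather than the user interface, they look something like this (the step names are illustrative and assume your symbols loaded into a step named Source with a column named Symbol):
RemovedOtherColumns = Table.SelectColumns(Source, {"Symbol"}),
AddedIndex = Table.AddIndexColumn(RemovedOtherColumns, "Index", 1, 1)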
Next, we need to add a batch column. We will create a custom column that identifies every tenth row and assigns it a batch number. Using modulo division, the tenth row in each group will have a remainder of zero. If the modulo division equals zero, we divide the index by 10 to get the batch number; otherwise, we return a null value. The formula looks like this:
if
Number.Mod([Index], 10) = 0
then Number.IntegerDivide([Index], 10)
else null
Next, we just need to use the fill up feature to make sure the rows are assigned to a batch. The last few rows will still be null. This is not an issue as it will still be its own batch. The result looks like this:
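For reference, the fill up is a single step in M. Assuming the custom column step above is named AddedBatch (a placeholder name), it would look like this:
FilledBatch = Table.FillUp(AddedBatch, {"Batch"})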
With our batch in place, we can remove the index column. Next, we need to group our symbols together.
Use Group By to Bring Symbols Together
Going back to our documentation, each symbol must be separated by a comma. We will use the group by function to bring ten rows at a time together into a single column. We will select our batch column and select the group by function in the transform ribbon. Using the sum aggregation, we are going to summarize our symbol column. Naturally, this will result in an error, but we will fix that next.
Next, we will need to update the M code in the advanced editor. We need to replace the List.Sum() function with the Text.Combine() function. Our code for this step will look like this:
Table.Group(#"Removed Columns", {"Batch"},
{{"Symbols", each Text.Combine([Symbol], ","),
type nullable text}})
The result is an updated query with our stock symbols combined:
Now that the hard part is over, we just need to apply our function.
Functions and REST APIs
We already created a function in part 3 of my REST APIs series. We just need to apply it to our new query. Go to the add column ribbon and insert a column from function. Using the symbols column for the parameter, we can query all of the symbols in the cell.
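In M terms, this is just one more added column. The names below are placeholders, since your function from part 3 will have its own name and your group-by step may be called something else, but it shows the idea of passing the combined Symbols value to the function:
InvokedFunction = Table.AddColumn(#"Grouped Rows", "StockData", each GetStockData([Symbols]))
Each cell of the new column then holds the results for every symbol in that batch.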
All that is left to do is expand the results so they can be leveraged in the model:
And just like that, we turned twenty-five API calls into three. Not too shabby!
Conclusion
It is extra work, but it is worthwhile if you can save a few dollars and minutes with your refresh. It may seem excessive, but it might make the difference. Besides, I could not resist a post where I tell people to Read the Fine Manual as I know my friend Mike will get a good laugh out of it again!
Have you ever used the group by function in Power BI? Or ever extended like this? Did I miss another way to optimize this query? If so, tell me in the comments below!
I cannot begin to tell you how many people I meet who think they need a Power BI Gateway to refresh their dataset using a file as a source. Yes, if you have selected a file data source, you will need to use a Power BI Gateway. However, there is a way around this challenge. It takes a little extra work to prepare your dataset, but you can make it work.
Before everyone starts moaning about using files as a data source, sometimes this is unavoidable. I worked with a third-party business travel platform that did not provide an endpoint for me to connect directly to the data. As a result, I needed to export data on a weekly basis to feed my dashboard. Since then, that platform has resolved this issue.
I also have a current scenario where I do not have access to the APIs for an ERP solution. Because of roles and permissions, I am not allowed to access this data. A creative solution is to export the data through a scheduled job that sends a CSV file to my email. From there, I can use Power Automate to position the data so it can be accessed by the Power BI Service without using a gateway. Is it ugly? No doubt. Somewhat obnoxious of a process? You betcha! But does it work? Absolutely!
What is the Power BI Gateway?
The Power BI Gateway might be new to you. If you are not an administrator of your Power BI environment, you probably do not know about it. The concept is simple – the gateway brokers access to your data sources that reside on premises within your network. Instead of opening up firewall ports or creating crazy firewall rules, you can deploy a gateway to broker the connection.
There are two types of gateways that can be used. The enterprise gateway provides an enterprise-wide connection for your data sources with the controls to manage access to them. Generally, you install the gateway on a server that has access to your data sources. Once it is in place, you configure your data connections within the Power BI Service by adding credentials and delegating access. The biggest benefit, besides the controls, is that you can provide redundancy for your gateway.
The other type of gateway is the personal gateway. This is deployed by an individual user. The data source must be accessible from the device running the personal gateway, and controls over the data source are essentially inherited from the user. This is a quick way to get a gateway in place, but it creates a risk because it is tied to an individual user with no oversight. In addition, your device must remain on for the refresh to be successful, which does not always work out.
Regardless of the type, the Power BI Gateway allows you to connect to any data source that resides on premises. While files work with gateways, they are a bit of a burden to manage. Unlike other sources such as SQL Servers, they have a tendency to move around or adjust. If we can manage this without the gateway, it will be easier for end users to manage without having to involve an administrator in the process.
Bypassing the Power BI Gateway
In short, the easiest way to bypass using a Power BI Gateway for your files is to use SharePoint Online. Since it is part of Microsoft 365 and already in the cloud, you do not need a gateway to connect to your file. Start by making sure your file is in SharePoint and open it in the desktop app from the site. From there, go to the file ribbon, select info, and click on copy path.
Once you have a file path, create a new source in Power Query and select the web connector. Paste your URL into the field, but remove the ?web=1 at the end of the path. Click OK and use your Organizational Account to access your file. From there, you will be prompted to go through and access the data based on the file structure. I talk about the web connector with REST APIs, but the concept is the same if you have never used this connector before.
Now, this process works well for Excel Workbooks and CSV files as they can be opened with Microsoft Excel. But what about JSON, XML, or Parquet files? You can get around this challenge by simply adding any Microsoft Office file to the directory, copying its file path, and replacing the file name with the correct one. This will save you the effort of building out the URL.
But I Already Have Done So Much!
So often, I have people tell me that they have spent hours transforming their query already and fear they need to start from scratch. But believe it or not, it is easy to resolve. Let’s start with this block of code in your advanced editor:
let
Source = Excel.Workbook("c:\data\taxi_zone_lookup.xlsx",
null, true),
ExcelData = Source{[Item = "TaxiZoneTable", Kind = "Table"]}[Data]
in
ExcelData
We can see from the code block above that we are accessing the file from a folder in the root of our hard drive. Once we have it inside of SharePoint, we can replace the “c:\data\taxi_zone_lookup.xlsx” with a web connector:
let
    Source = Excel.Workbook(
        Web.Contents("https://<tenant>.sharepoint.com/sites/NYCTaxi/Shared%20Documents/Taxi%20Data/taxi_zone_lookup.xlsx"),
        null,
        true
    ),
    ExcelData = Source{[Item = "TaxiZoneTable", Kind = "Table"]}[Data]
in
    ExcelData
As you can see, it was a simple swap in the code. With the change in place, we did not need to make any additional changes to the downstream code. While this code block is small, a real query could have many more transformations downstream, and none of them would need to be reworked.
Anything Else to Consider?
A big benefit to moving to the cloud location is that you can easily share your Power BI Desktop file for development work without having to update the file path. You will need to share the file with users who will be working directly with the source, but that is easy to manage.
In addition, you can easily add files to your data source without having to get an administrator involved. Regardless of the type of Power BI Gateway you want to use, you will likely need an administrator to help you. If you are using the enterprise gateway, you will need a platform administrator to configure the connection. If you want to use a personal gateway, you might need someone with administrator rights to install the software on your machine. Regardless of the type, you will likely need some help.
Conclusion
I wish more people talked about managing file refreshes this way. This is a great option, yet Microsoft tends to keep quiet about it. In fact, you can leverage this method to provide a refreshable report from a form you created. Patrick from Guy in a Cube created a video on how to achieve this. Hard to believe how a few small changes could make your life so much easier!
So have you ever created a refreshable interface with Power BI like this? Do you have a use case for this? Tell me about it in the comments below!
Unless you have lived under a rock for the past year, you know that artificial intelligence is a popular topic. Whether it is Copilot, ChatGPT, Bard, or OpenAI, artificial intelligence is here to stay. It is time to embrace artificial intelligence and learn how to leverage it for our benefit.
But surely, the use of artificial intelligence inside of Power Query is brand new, right? Wrong! I remember seeing it available five years ago and it is a powerful tool to integrate with your models! I am not going to be able to go into all of the details of artificial intelligence in this article, but rather help you understand how to integrate it within your existing queries.
What Is Artificial Intelligence?
In the past few years, Artificial Intelligence (AI) has become a force in society. It permeates the world we live in and has become an integral part of how things are done. We see examples of it in our lives through things like chat bots, self-driving cars, and even social media. While some may find it intimidating, the reality of AI is that it has the power to make our lives easier.
Artificial Intelligence is used to describe technologies that perform tasks that once required human intelligence to complete. When properly deployed, AI models can recognize patterns, solve problems, make decisions, and understand natural language. These models do not organically exist. They are created through extensive training activities to ensure the results you receive are valid. The work we perform with Power Query may result in the creation of a model while others will leverage one.
While generative artificial intelligence (GenAI) is all the rage right now, we will not be using it for our queries. Instead, we will be integrating models that have been trained to provide a response. You might be the creator of these models while others are creating them for you. The result is the opportunity to bring back a predictive response based upon your query.
How to Get Started?
While Power Query is available in both Microsoft Excel and Power BI, this functionality is only found in Power BI. While you can leverage similar AI models in Microsoft Excel, it is done outside of Power Query. This is an important item to note if you prefer using Power Query inside of Excel.
Second, you must have some form of a premium workspace to enable this functionality. This can be in the form of a Premium Per User, Premium Capacity, or Azure Power BI Embed environment. If you already have one of these environments at your disposal, you are set. If not, check out my article on licensing strategies around Power BI. I recommend spinning up an Azure Power BI Embed SKU or activating a Premium Per User trial if you want to try these features at a minimal cost.
What if you do not have a Power BI environment available to you? Unfortunately, you will not be able to take advantage of this functionality inside of Power Query. I recommend you work with your organization to spin up a trial to explore this functionality.
Pre-Built Models in Power Query
Power Query allows us to leverage pre-built models. Leveraging Azure Cognitive Services, we can perform text and image analysis on our queries. We can do this from the add column ribbon inside of Power Query:
If you are doing this for the first time, you will be prompted to allow the analysis of your data. Once you have done that, you can select which model you wish to use:
As you can see, you can select multiple pre-built models to perform your analysis. With text-based analysis, you should consider both the text and the language being used. This helps ensure the proper analysis is being performed. By default, it will use your tenant’s default language. If your column uses the same language consistently in the query, you can hard code it in this screen. However, if you have multiple languages, you might need to use the detect language model to bolster your analysis.
You can also identify tags for images you ingest with Power Query. This is not easily performed and is only practical for small images, so you will likely not use the vision model often. However, you can try it out for yourself:
These models can provide quick analysis and meaningful results with minimal effort. I find that combining these results with other dimensions allows me to sort through results and find the most meaningful responses in surveys.
Custom Artificial Intelligence Models
If you have created a custom model using Azure Machine Learning, you can leverage it within Power Query. Leveraging a public dataset around diabetes, I created a custom model with Azure Machine Learning. With an endpoint created, I can access it within Power Query with a few clicks:
Once I have selected Azure Machine Learning, I can select which endpoint I wish to use. I have already deployed my model and selected it below:
As you can see, there are multiple inputs my endpoint needs before it can return a predictive response. I map the different columns in my query to those inputs, and a value must be supplied for each field for this to work.
It is important to consider that Azure Machine Learning uses a consumption pricing model. Every time you refresh your query, it is seeking new predictions. Keep this in mind when building out your queries as you could rack up some extra costs without realizing it!
Any Other Options?
If you are a user of R or Python, you can leverage custom algorithms and models inside of Power Query as well. You need to deploy software on your device to make this work, which can be prohibitive if you do not have admin access on your machine. I do go into detail on these tools in Chapter 11 of my book, but decided not to discuss them in this post. Primarily, I wanted this to be about out-of-the-box, easy implementation.
Conclusion
It is amazing how artificial intelligence has infiltrated Power Query. Many people have no clue it is there or what is needed to activate it. With a little effort, you can really transform your queries with a few clicks. I recommend you try it out to see how you can harness the value of artificial intelligence in your queries!
So did you know about this feature in Power Query? Have you used it before? Are you going to try it out? Tell me about it in the comments below!
This is a difficult post to write. From a personal perspective, I really struggle with this topic. In my daily work, I assist organizations with enabling a self service data strategy. Collaborating closely, we spend hours working to find the right path that matches the organization’s needs and goals with its realities. I am a huge advocate for self service analytics and believe in it. Because of this, I want to be ridiculously helpful with everyone I encounter.
Sadly, I run into people on a regular basis that want “a little help” to perform some analysis with Power BI or Excel. The problem with this is that the ask is quite complicated. The reality of my situation is that I can make the ask happen quickly. I can spend an hour or two on a report and it will look beautiful. Having worked with Power BI for almost ten years, I have a pile of experience to make it happen efficiently. But the requestor wants me to help them build it themselves. This takes three to five times longer because I am guiding someone through the process. And I am all for doing this because it helps enable another person to use Power BI or Excel.
While I am all for helping people, I find that their interest in learning about these tools is often non-existent. They have a goal in mind that needs to be completed and they think Power BI or Excel is an easy way out. It is a self service analytics tool, so it should be easy! And yet they find the opposite. Without taking the time to understand the basics, they end up lost.
What makes this hard for myself and others is that it cheapens the skills we possess as data professionals. Data is not easy. Personally, I find this approach to be a little insulting. So how do we fix this problem?
Setting Expectations for Self Service Analytics
I started responding to every request I receive with clear expectations around self service analytics. I explain the challenges and skills they need to possess before they embark on such a project. Sometimes this scares them off, but it is a reasonable way to find alignment.
I often share the basics they will need to learn to be successful. I start with understanding data modeling. We start by talking about things like building a star schema, using a date table, and managing relationships. Sometimes a flat table from Excel or SharePoint is fine, but other times they want more depth and dimension to their reports. Without this in place, they will never achieve their goal.
Next, we discuss how to build out measures. Like every beginner, including myself at one time, they rely on implicit measures. These measures often provide basic aggregations and get the job done. However, they often ask for something more complex. They need to perform time based analysis or use filtering. They are going to need to learn how to write DAX expressions for explicit measures.
Lastly, we will discuss what report design looks like. Often, I have a theme file or template for the organization available, so this helps with branding. But it is important to address best practices with data visualizations. This avoids confusion and provides a better consumer experience.
Assign Some Homework
With expectations set, it is time to assign some homework. Self service analytics requires some investment. But before helping anyone, I assign some basic training. The concept is simple – if someone does not know the basics, they are never going to understand the advanced topics.
Microsoft Learn is my go-to for training others on the basics of Power BI. Because it is free, the only real investment is time. It covers a lot of the basics of the tool and allows me to focus on the more advanced topics. I, just like you, need to protect my time. The more I can push to other resources, the easier it is for me to stay focused on my work.
There are two benefits to this method. First, I can ask for a badge validating that they completed the training. Second, it helps gauge the interest of someone who is asking for help. The course linked above takes about three hours to complete. If they cannot invest three hours in themselves, then why should you invest three hours in them? The answer to that question is up to you, but it sets expectations for the relationship going into the engagement.
Create Working Sessions
At this point, if they have completed their homework, I recommend creating working sessions to set boundaries. I start by setting up 45-minute sessions with a clear agenda. Ideally, a weekly working session is best. Naturally, you might need to schedule more frequent sessions based upon the business requirements. It is important that you agree upon the cadence to avoid conflict in the future.
For each working session, start with a recap of the previous session. Review the previous topic and any progress made since you last met. Then, work through the assigned topic for the current week. Make sure it is recorded so it can be referenced afterwards. Lastly, decide on the topic of the next working session. This helps keep future sessions focused and your timeline compact.
I recommend that you continue assigning homework. For example, if you have an upcoming working session that will focus on a particular DAX statement, provide resources ahead of time. If you have a measure that needs to use the SWITCH() function, send them to an article like this one to help them understand the concept ahead of time. You can also assign work to be completed in their report before the next session to streamline your working sessions.
As a rule of thumb, I work to keep sessions scheduled. An ad-hoc working session often ends up being a mess. You also have other deliverables and expectations from your leadership that could get in the way. If someone needs time with me between sessions, I generally ask them to schedule it. I feel like the “Can we hop on a quick call? I have a quick question!” conversation always comes at the most inopportune time. Setting these boundaries goes a long way to ensuring you can maintain a balance in your workday.
Provide Some Learning
It never hurts to provide additional learning and training. There are several books and sites out there to help. If you need help with Power Query, you can check out my book as an option to assist. If it is with DAX, I recommend The Definitive Guide to DAX by Marco Russo and Alberto Ferrari. Lastly, you can suggest Now You See It by Stephen Few if they need help with data visualization.
At some point, they will need to embrace self service analytics on their own. By providing materials, they can focus on their learning and sustain their own solutions. These resources will provide assistance both during your working sessions and in the future.
If self service analytics is a goal, investment in learning is key. I see customers who continue to struggle with this because they do not invest in training and learning. When you are working with an individual, at some point they will need to invest more in themselves to achieve that goal. If you do not help them head down that path, you will be stuck helping them forever.
Conclusion
I love helping people discover the world of self service analytics! I hope you do too! Just make sure you help others in a sustainable manner. These types of relationships are good to have, but you need to manage them well. Be ridiculously helpful – just take care of yourself as well!
Have you encountered this situation? Do you struggle with people who think self service analytics is easy? If so, tell me about it in the comments below!
In working with a customer a few weeks back, I ran into another odd scenario. This was a new ask for me, but it made sense in a post COVID-19 world. Normally when I build out time intelligence with Calculation Groups, I only perform comparisons between the previous year and the current year. However, this customer wanted to go back a few years to be able to see the post COVID-19 impact on their metrics. Naturally, Calculation Groups make quick work of a request such as this. However, there was no way for them to create their own custom headers, and they found the generic labels confusing:
The use of generic headers led to confusion when reading the visualization. On top of that, they had a fiscal year that started on July 1st. This made it more complicated to understand which fiscal year they were currently in. In short, they wanted to provide dynamic column headers in the matrix visual that aligned with the associated fiscal year. If you were looking at the 2023-2024 fiscal year, the labels need to translate accordingly:
Calculation Group Label | Custom Column Label
CYm3                    | 2020-2021
CYm2                    | 2021-2022
CYm1                    | 2022-2023
CY                      | 2023-2024
YoY                     | Change from Last Year
YoY_Pct                 | % Change from Last Year
CYm3 Year End           | 2020-2021 YE
CYm3 Variance YE        | 2020-2021 Var YE
CYm2 Year End           | 2021-2022 YE
CYm2 Variance YE        | 2021-2022 Var YE
CYm1 Year End           | 2022-2023 YE
CYm1 Variance YE        | 2022-2023 Var YE
Generic versus custom labels
Well, this was a challenge. After a lot of research, I found a possible solution. And while I was able to get it to work, I am not 100% sure that it is the best solution. However, it worked for my situation. And while the performance is not the greatest, it still got the job done.
To get started, you do need to understand Calculation Groups and how to use Tabular Editor. If you have not used them before, the guys over at SQLBI have a great series on how to use Calculation Groups. And if you have not already done so, head over to TabularEditor.com to download the latest version to create your Calculation Groups. For our scenario today, we are going to assume you already have your Time Intelligence measures set up. Our focus is on applying the logic for the headers.
Creating the Column Headers
There are a few challenges with the creation of these column headers. One, they must map to a calculation group item. Two, they must follow the naming convention outlined above. And three, they must be properly sequenced.
We are building our logic inside of Power Query under the assumption we do not have access to our data source to enact this solution. This allows us to see how versatile Power Query can be for our projects. However, you are welcome to perform this logic upstream in a SQL server if you have one available.
If you are not comfortable with Power Query, I suggest you check out my book which will give you the knowledge and experience to create this table.
Reference your date table
The first step in the process is to right click on our date table and reference it in another query. From there, we will right click on the Fiscal Year column and remove all other columns. Once completed, we will remove the duplicates in the fiscal year column so our results look like this:
This is the easiest way to ensure that your custom column headers will align with your current date table. This will eliminate any issues with it being out of sync in the future.
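If you prefer to see these first steps in the Advanced Editor, they look roughly like this (assuming your date table query is named bi_date and the column is FiscalYear, matching the full query at the end of this post):
let
    Source = bi_date,
    #"Removed Other Columns" = Table.SelectColumns(Source, {"FiscalYear"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns")
in
    #"Removed Duplicates"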
Add mappings for Calculation Groups
Our next step is to add our mappings so our custom columns will leverage the right calculation group measures. Referencing the table above, we will add a new column for each calculation group item. We will also fill in the column with the sequence value in the order we wish to display these items.
For each calculation group item, go to the Add Column ribbon and select Custom Column. Change the column name from “Custom” to the label from the Calculation Group. Then add the sequence number in the formula box below:
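If you are curious what that generates behind the scenes, the step for the CYm3 label ends up looking like this (the same pattern repeats for each label in the full query below):
#"Added CY - 3" = Table.AddColumn(#"Removed Duplicates", "CYm3", each 0, Int64.Type)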
When finished, your list of labels should look something like this:
The last thing we need to do is unpivot our columns so they become rows in the dataset. Right click on the Fiscal Year column and select “Unpivot Other Columns”. This will turn the other columns into rows. Rename the new columns to Measure and MeasureSequence respectively:
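In the Advanced Editor, the unpivot and rename show up as a single step (again, matching the full query below):
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Added CY - 1 Var YE", {"FiscalYear"}, "Measure", "MeasureSequence")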
Now that this has been done, it is time to build out our logic for the labels.
Assigning labels with a conditional column
At this point, you have two choices. Because of the complex nature of our labels, you will eventually need to write them out in the Advanced Editor. The question is whether you would prefer to get the conditional column logic in place first. If you are not comfortable with the logic, you can use the graphical interface to rough it in. Otherwise, you can write the if/then/else logic directly in the Advanced Editor.
I am not going to get into the weeds with this code as there were a number of functions used. However, I will provide a full printout of the code for this query when we are finished. In the end, our goal is to have our labels show up accordingly:
This process will take some time, but the trick is to lean on the measure sequence to make your life easier. It takes some practice, but you will get there quickly.
Address sequence of columns
This was the trickiest part. We will implement this solution using conditional columns, but there is a catch. We will be sorting our custom header column with this new sequence column, which means every custom header must return a distinct sequence value. So how do we make that work?
The trick is grouping. CY, CYm1, CYm2, and CYm3 will be part of group one. YoY will be in group two, and YoY_Pct will be group three. Then the year end and year end variance columns will be part of group four.
Now, you might already be concerned about a collision. And you are correct, there is one at this moment. But we are going to do one more thing to fix it. For groups one and four, we are going to add the fiscal year as a decimal value. This allows us to use consistent headers and maintain sequencing. I promise you that I have spent hours figuring this out and I do not know of a better way to achieve this outcome. If you think there is a better way to do it, give it a try. But I know for a fact that this outcome works:
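To see how this plays out, take the 2023-2024 fiscal year as an example. In the final query below, CYm3 through CY receive sort values 1.2020 through 1.2023, YoY and YoY_Pct sit at 2 and 3, and the year end pairs land at 4.2020 and 4.20201, 4.2021 and 4.20211, and 4.2022 and 4.20221. Every header gets a distinct sort value, and the decimals keep each fiscal year's year end column right next to its variance column.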
Now that we have this in place, we can look at the final code.
Final Code – Custom Column Headers
As you can tell, this was not an easy process. It took a lot of custom coding to make it work. However, the results will speak for themselves shortly. Regardless, here is the code we used to achieve our outcome:
let
Source = bi_date,
#"Removed Other Columns" = Table.SelectColumns(
Source,
{"FiscalYear"}
),
#"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
#"Added CY - 3" = Table.AddColumn(#"Removed Duplicates", "CYm3", each 0, Int64.Type),
#"Added CY - 2" = Table.AddColumn(#"Added CY - 3", "CYm2", each 1, Int64.Type),
#"Added CY - 1" = Table.AddColumn(#"Added CY - 2", "CYm1", each 2, Int64.Type),
#"Added CY" = Table.AddColumn(#"Added CY - 1", "CY", each 3, Int64.Type),
#"Added YoY" = Table.AddColumn(#"Added CY", "YoY", each 4, Int64.Type),
#"Added YoY%" = Table.AddColumn(#"Added YoY", "YoY_Pct", each 5, Int64.Type),
#"Added CY - 3 YE" = Table.AddColumn(#"Added YoY%", "CYm3 Year End", each 6, Int64.Type),
#"Added CY - 3 Var YE" = Table.AddColumn(
#"Added CY - 3 YE",
"CYm3 Variance YE",
each 7,
Int64.Type
),
#"Added CY - 2 YE" = Table.AddColumn(
#"Added CY - 3 Var YE",
"CYm2 Year End",
each 8,
Int64.Type
),
#"Added CY - 2 Var YE" = Table.AddColumn(
#"Added CY - 2 YE",
"CYm2 Variance YE",
each 9,
Int64.Type
),
#"Added CY - 1 YE" = Table.AddColumn(
#"Added CY - 2 Var YE",
"CYm1 Year End",
each 10,
Int64.Type
),
#"Added CY - 1 Var YE" = Table.AddColumn(
#"Added CY - 1 YE",
"CYm1 Variance YE",
each 11,
Int64.Type
),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(
#"Added CY - 1 Var YE",
{"FiscalYear"},
"Measure",
"MeasureSequence"
),
#"Added ColumnHeaders" = Table.AddColumn(
#"Unpivoted Columns",
"ColumnHeaders",
each
if [MeasureSequence] = 0 then
Number.ToText([FiscalYear] - 3)
& "-"
& Number.ToText([FiscalYear] - 2)
else if [MeasureSequence] = 1 then
Number.ToText([FiscalYear] - 2)
& "-"
& Number.ToText([FiscalYear] - 1)
else if [MeasureSequence] = 2 then
Number.ToText([FiscalYear] - 1)
& "-"
& Number.ToText([FiscalYear])
else if [MeasureSequence] = 3 then
Number.ToText([FiscalYear])
& "-"
& Number.ToText([FiscalYear] + 1)
else if [MeasureSequence] = 4 then
"Change from Last Year"
else if [MeasureSequence] = 5 then
"% Change from Last Year"
else if [MeasureSequence] = 6 then
Number.ToText([FiscalYear] - 3)
& "-"
& Number.ToText([FiscalYear] - 2)
& " YE"
else if [MeasureSequence] = 7 then
Number.ToText([FiscalYear] - 3)
& "-"
& Number.ToText([FiscalYear] - 2)
& " Var YE"
else if [MeasureSequence] = 8 then
Number.ToText([FiscalYear] - 2)
& "-"
& Number.ToText([FiscalYear] - 1)
& " YE"
else if [MeasureSequence] = 9 then
Number.ToText([FiscalYear] - 2)
& "-"
& Number.ToText([FiscalYear] - 1)
& " Var YE"
else if [MeasureSequence] = 10 then
Number.ToText([FiscalYear] - 1)
& "-"
& Number.ToText([FiscalYear])
& " YE"
else if [MeasureSequence] = 11 then
Number.ToText([FiscalYear] - 1)
& "-"
& Number.ToText([FiscalYear])
& " Var YE"
else
"Error",
type text
),
#"Added HeaderSequence" = Table.AddColumn(
#"Added ColumnHeaders",
"HeaderSequence",
each
if [MeasureSequence] = 0 then
(([FiscalYear] - 3) / 10000) + 1
else if [MeasureSequence] = 1 then
(([FiscalYear] - 2) / 10000) + 1
else if [MeasureSequence] = 2 then
(([FiscalYear] - 1) / 10000) + 1
else if [MeasureSequence] = 3 then
(([FiscalYear]) / 10000) + 1
else if [MeasureSequence] = 4 then
2
else if [MeasureSequence] = 5 then
3
else if [MeasureSequence] = 6 then
(([FiscalYear] - 3) / 10000) + 4
else if [MeasureSequence] = 7 then
(([FiscalYear] - 3) / 10000) + 4.00001
else if [MeasureSequence] = 8 then
(([FiscalYear] - 2) / 10000) + 4
else if [MeasureSequence] = 9 then
(([FiscalYear] - 2) / 10000) + 4.00001
else if [MeasureSequence] = 10 then
(([FiscalYear] - 1) / 10000) + 4
else if [MeasureSequence] = 11 then
(([FiscalYear] - 1) / 10000) + 4.00001
else
0,
type number
)
in
#"Added HeaderSequence"
It probably looks worse than you imagined. However, keep in mind that some of this is there to ensure we have a solid experience with our custom headers.
At this point, we will close and apply our query changes and load them into our data model.
Preparing the data model
At this point, we have created our custom headers. Next, we must map them for use with our Calculation Groups and fit them into our model. We will need to build a relationship to our custom headers query from our date table and create a custom measure to leverage our calculation group.
Connect date and custom headers table
Our first task is to connect our date table with our custom column headers table. This will filter down our custom column to only show the headers we need for a selected fiscal year. We will simply connect the fiscal year columns on both tables to build the relationship.
Now, you might be concerned about the many to many relationship being established. While often discouraged, there are times where they are required. This is one of those moments. We will go ahead and click OK to create the relationship:
With this relationship in place, we can now focus on creating a measure to return our desired results!
Custom measure for Calculation Groups
The last step before we create our visualization is to create a measure that leverages our calculation groups, but it is performed from the custom column headers table. A fun fact that you might not realize is that you can use your calculation groups as a filter in a CALCULATE() function. We are going to use the measures we created inside of the custom column headers table to perform that exact filtering!
The formula is simple. We are going to use CALCULATE(), call our Revenue measure, and filter the calculation group to the item that matches the selected custom header. The result is a measure that looks like this:
Total Revenue =
CALCULATE (
[Revenue],
TimeIntelligence[Name] = MAX ( CustomColumnHeaders[Measure] )
)
With our measures set, we just need to create our matrix.
Add custom headers to our matrix
To complete the process, we simply need to update our matrix. For the column headers, replace the time intelligence field with the headers from our custom column headers table. We then need to replace our measure with our new custom measure to ensure the data comes back correctly. The result is the following:
This is quite the process to set up, but it can be rewarding when it works out. It will stretch your skills in a few areas within Power BI. However, the results speak for themselves.
Anything else to consider?
Keep in mind that if you go this route, updating the logic with your calculation groups will grow increasingly complex. If you need to add a new measure, this will add an additional burden to the update process. Each new item will need to be updated manually in Power Query resulting in additional effort.
My recommendation is that you take the time upfront to establish all of the requirements. This will help you avoid rework in the future.
Conclusion
First of all, if you read all of this and understood it – congratulations! This is a very complex topic and not easy to digest. You might need to re-read this article a few times to fully comprehend what happened, but I promise it will work out. What is important is that you have a grasp of the key elements.
So have you ever run into a requirement like this? Do you have a more efficient way to achieve this outcome? Do you plan on using something like this in the future? Tell me about it in the comments below!
I ran into a unique challenge this past week when it came to managing time zones in Power Query. I have a dataset that comes in using Coordinated Universal Time (UTC). Like a majority of the world, my key stakeholders do not reside in the UTC time zone. On top of that, they wanted to use multiple time zones, so this could become quite complicated.
You might be reading this article and asking why this would be such a complex issue. Well, there are a couple of factors you need to consider. Naturally, you need to consider the time zone you need to use. In addition, you must consider daylight savings time. In the United States, every state except Arizona and Hawaii observes daylight savings time. As a result, I have to adjust my model accordingly. And while 99% of my work is based in the United States, many other countries across the world observe Daylight Savings Time.
To add to the complication, one of the required time zones was Indian Standard Time (IST). If you are not familiar with the IST time zone, its offset from UTC includes an additional 30 minutes on top of the whole hours. In short, if it is 1:00 AM UTC, then it is 6:30 AM IST (UTC +5:30).
Since I know I am not the only person with this challenge, I thought it would be helpful to share my experience and how you might be able to work around this issue.
Convert UTC to IST Time Zone
Let’s start with an easy conversion. When setting the IST time zone, you do not have to worry about daylight savings time. This means it is an easy conversion!
To get started, we are going to create a custom column in Power Query and convert our date/time in the UTC time zone. However, the dataset does not assign the time zone to the timestamp. We will assign it using the DateTime.AddZone() function. The function accepts three parameters:
Date/Time value (in our case, the Timestamp column)
Hour offset
Minute offset
=DateTime.AddZone([Timestamp], 0, 0)
This formula will replicate your current timestamp and add “+00:00” at the end to show no offset from UTC. Now that we have a time zone assigned, we need to convert to IST. We will wrap our last formula with the DateTimeZone.SwitchZone() function to adjust the offset. The parameters follow the same pattern: the value, an hour offset, and a minute offset.
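Put together, the custom column formula for IST now looks like this:
=DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), 5, 30)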
Now the column has been updated with “+05:30” at the end, but the time has been incremented by the same amount. It is that easy!
Now, you might not want the time zone offset showing in the report canvas. We can easily remove that from the view by wrapping the last formula with DateTimeZone.RemoveZone()
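That wrapper leaves us with a formula along these lines:
=DateTimeZone.RemoveZone(DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), 5, 30))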
Now there is just one pesky problem left – our new column comes through without a proper data type instead of being a date/time. This is an easy fix! Just crack open your Advanced Editor and add the column type on the same step!
#"Custom Column" = Table.AddColumn(#"Changed Type", "TimestampIST", each DateTimeZone.RemoveZone(DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), 5, 30)), type datetime)
Yes, you can update the column type with the next step. But I am a fan of keeping things compact and clean. Since you are already doing the transformation, it is an easy add to save some compute time!
If your time zone is not IST, you can look up the offset here. You just need to update the code above with the correct offset.
What About Daylight Savings Time?
The biggest challenge is to know what rules define the time shift. In the United States, our formula is simple. Daylight Savings Time starts on the second Sunday in March and ends on the first Sunday in November. So how do we put that knowledge to work?
One option is to add a field on your date dimension to show if daylight savings time applies. That works, but it requires maintenance long term. Another option is to create a table of dates, but once again it is a lot of work to keep current.
But there is a way to scale this logic using a few functions in Power Query.
Find the Start and End of Daylight Savings Time
Like before, let’s start off easy by finding the end of Daylight Savings Time. The first thing we need to do is establish the first day of November. We will use the #date(year, month, day) function to set the date. But remember, our goal is to make it scalable, so we need to make the year dynamic:
=#date(Date.Year([Timestamp]), 11, 1)
Now that we have the first day of the month established, we will use it to find the first Sunday of the month. We will use the Date.EndOfWeek() function to find the last day of the week that contains our date. We will add a second parameter with the value of 1 to shift the end of the week from Saturday to Sunday.
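The resulting formula for the end of Daylight Savings Time looks like this:
=Date.EndOfWeek(#date(Date.Year([Timestamp]), 11, 1), 1)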
Now to find the second Sunday of March, we will use the same formula as a base. We will change the 11 to 3 for March. But there is one difference: we are wrapping this formula in Date.AddDays(). We will use that function to add 7 days to our date, which gives us the second Sunday in March.
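That gives us a formula along these lines for the start of Daylight Savings Time:
=Date.AddDays(Date.EndOfWeek(#date(Date.Year([Timestamp]), 3, 1), 1), 7)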
As I said above, your time zone might have a different set of rules. However, with some research, I found that the rules are generally similar, and you just need to add or remove weeks to get to your desired date. Maybe you need to use the end of the month and work backwards. If your change happens on a Saturday, you can ignore the extra parameter. And if you are in one of the countries that observes the change on a Friday, just use Date.AddDays() and work forward or back the appropriate number of days to make it work!
Apply Dates to Time Zone Offset
Now that we have the start and end of Daylight Savings Time, we just need a basic conditional statement to provide the offset. Because our start and end dates are pure dates with no time associated, we will need to do our comparison using the DateTime.Date() function to convert our timestamp.
Since I am using Eastern Time in the US, there is an offset of 5 hours from UTC, but only 4 during Daylight Savings Time. With this all in place, we now have a function that will scale with Daylight Savings Time over the years. Feels like a lot of work to get here, but it works well!
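As a rough sketch, the conditional custom column for Eastern Time could look something like this (negative offsets because Eastern Time is behind UTC; the same comparison appears inside the function in the next section):
=DateTimeZone.RemoveZone(
    if DateTime.Date([Timestamp]) >= Date.AddDays(Date.EndOfWeek(#date(Date.Year([Timestamp]), 3, 1), 1), 7)
        and DateTime.Date([Timestamp]) < Date.EndOfWeek(#date(Date.Year([Timestamp]), 11, 1), 1)
    then DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), -4, 0)
    else DateTimeZone.SwitchZone(DateTime.AddZone([Timestamp], 0, 0), -5, 0)
)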
Get Fancy with a Time Zone Function
Now, you might be thinking that this could get painful if you need to provide multiple time zones over and over again. For those of us in the United States, we are using the same logic for several time zones. So instead of doing everything manually, you can create a function to make your life easier!
Using the same logic, the function will accept two parameters: the timestamp and the offset. We then can use a few lines of code and make a function that we can reuse over and over:
(TimestampValue as datetime, TimeOffset as number) =>
let
    TimestampDate = DateTime.Date(TimestampValue),
    TimestampYear = Date.Year(TimestampValue),
    // Daylight Savings Time runs from the second Sunday in March to the first Sunday in November (US rules)
    Source = DateTimeZone.RemoveZone(
        if TimestampDate >= Date.AddDays(Date.EndOfWeek(#date(TimestampYear, 3, 1), 1), 7)
            and TimestampDate < Date.EndOfWeek(#date(TimestampYear, 11, 1), 1)
        then DateTimeZone.SwitchZone(DateTime.AddZone(TimestampValue, 0, 0), TimeOffset + 1, 0)
        else DateTimeZone.SwitchZone(DateTime.AddZone(TimestampValue, 0, 0), TimeOffset, 0)
    )
in
    Source
With a few substitutions, I can simplify the logic and make it easier to understand. Now I can just invoke the function by using the timestamp and adding the offset. Not too shabby!
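For example, if the function is saved as fnConvertTimeZone (a name I am making up here for illustration), adding an Eastern Time column becomes a one-liner:
// fnConvertTimeZone is whatever you named the function above
= Table.AddColumn(#"Changed Type", "TimestampEastern", each fnConvertTimeZone([Timestamp], -5), type datetime)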
Anything Else to Consider?
This solution is just using dates and ignores the official time when the change takes place. In my use case, the precision was not that important when we are looking at 4 hours out of 8,760 hours in a year. If you need that level of precision, you will need to add time to the logic in the formulas.
I will say, this is a gaping hole in Power BI. With Power Automate, you are able to perform precise time zone conversions much more easily. A majority of my work just uses dates and ignores time, but there are use cases, like audit data, where we need this information. Hopefully Microsoft will add functionality here one day, though users have been remarking on this challenge for years. I might be better off waiting for the United States to pass a law eliminating Daylight Savings Time. But that is not looking too good either…
Anyway, how have you handled this in the past? Do you have a different approach? If so, tell me in the comments below!