Copilot Reports have been a challenge for a while, but Microsoft Fabric has made life so much easier! In last week’s article, we went through the process to start harvesting data from our endpoints. This week is about getting the model in place so we can deploy an easy-to-use report to track utilization trends.
While this process is a little tedious, this should make your life much easier. With a report that is already designed with the key metrics and outcomes built for you, it should only take minutes for you to find some key insights. Just make sure you pay attention to the details and everything should work out smoothly!
Add Tables to your Semantic Model
The first step is adding tables to your default semantic model. This is critical to ensure the right data is available to you. We will start by going to your workspace and opening the SQL analytics endpoint tied to your Lakehouse:
Once opened, we can go to the Modeling tab and manage your default semantic model. Once the new window appears, select the tables holding your data within the Lakehouse and click confirm:
Now that we have our tables in place, it is time to set up some relationships to enable our reports.
Building Your Relationships
Now that we have our tables selected, we need to connect them together. If you have been using Power BI already, you know how this works. In our situation, we will need to drag our tables to the model layout so we can use it for our Copilot Reports:
You will note that I added all tables except one. The collaboration_levels table was intentionally omitted because it will not have any relationships. It exists purely for labeling and will not directly interact with the tables displayed in this layout.
Connecting the Date Dimension for Copilot Reports
To get started, we will connect our date_table to our different activity tables. To make this simple, all of our relationships will use the same column name. We just need to connect the dots. We will simply connect date_table[date] to the following columns with a one-to-many relationship:
onedrive_user_activity[Report_Refresh_Date]
teams_user_activity[Report_Refresh_Date]
sharepoint_user_activity[Report_Refresh_Date]
viva_engage_user_activity[Report_Refresh_Date]
copilot_last_activity[Report_Refresh_Date]
You will note that I left one table out of the mix. We will need to create a relationship with the copilot_data table, but this one will be a little different. For this one, we need to make sure the relationship is inactive:
This might not make sense at the moment, but it will work out later. I promise! (The short version: an inactive relationship can be activated on demand inside specific measures, for example with USERELATIONSHIP, without interfering with the rest of the model.) With these relationships in place, your model layout should look something like this:
With our date dimension in place, it is time to build our user data.
Connecting the User Dimension for Copilot Reports
Our graph_users table holds a lot of valuable information for deeper analysis. For example, we can provide dimensional analysis based on a user’s department, office, company, manager, or even job title! This will obviously only be as effective as the data associated with your users in Entra ID. However, many organizations find this a great place to start.
The process of building these relationships is far easier. Both sides of the relationship use the [User_Principal_Name] column, so we just need to build out our connections. To make this happen, we will create a one-to-many relationship between graph_users[User_Principal_Name] and the following columns:
onedrive_user_activity[User_Principal_Name]
teams_user_activity[User_Principal_Name]
sharepoint_user_activity[User_Principal_Name]
viva_engage_user_activity[User_Principal_Name]
Wait a minute! What about the copilot activity data tables?! Yes, I did exclude them for the moment. The reason is we need to create a bi-directional relationship between our user dimension and these tables, as it will aid in some of our reporting:
We are getting close, but we have one last relationship to create.
Connecting the License Dimension for Copilot Reports
The last dimension we are going to provide is the ability to filter users based on which licenses they have assigned to them. For some organizations, there is a split between enterprise and frontline licenses. Others want to be hyper-focused on only Copilot for Microsoft 365 users.
Regardless of your use case, this could provide some additional insight. To make things easier for us, we will simply create a relationship between the licensing_data and graph_users tables using the [User_Principal_Name] column on both sides. This will also be a bi-directional relationship.
With our relationships in place, our model should look something like this:
At this point, please double and triple check your relationships to make sure nothing is missing or incorrectly mapped. If anything is off, next week’s article is going to be painful, and you will find yourself coming back to this one.
Modeling Choices for our Copilot Reports
So why did I make some of the choices I did with relationships? Part of it comes from the challenges I have already experienced with the stock labels. For example, I saw some volatility in how copilots were labeled in the copilot_data table prior to the release of the API endpoints. To avoid having to rework the model later, I have taken a few steps now to ensure this is not an issue going forward.
I also fully expect this solution to be obsolete at some point in the future. When the announcement of the Copilot Reports endpoint was made, I was hoping this would provide some more granular results. For whatever reason, this has not happened yet. However, I fully expect this to eventually fall in line with the other reporting endpoints in the future. No clue on the timeline and I won’t speculate on it, but I have to imagine that this will happen eventually. Therefore, I hope to be writing an article that says this series is obsolete and you should be using the new version of the endpoint.
Fun Fact About This Article
I think one of the biggest aspects of leveraging Microsoft Fabric is the power to build everything within a web browser. My favorite part of writing this article is that I did it completely at 37,000 feet in the air! If you are a fan of Chick Corea, I had 500 Miles High playing on repeat in my head as I wrote it.
While the soundtrack is fun, I just wanted to highlight the ability to manage this project entirely while in the air. While working on an airplane might not be your thing, it is still amazing to see what we can get done with a limited internet connection. Naturally, your mileage may vary by airline, but it is still pretty cool in my book!
Next Steps
At this point, we are harvesting our data and we have our model set. By next week, we should have three weeks of data that we can trend and start to provide better Copilot for Microsoft 365 data points to review. I am so excited to share this and look forward to publishing that article soon!
Over the past year, I have been helping customers with their Copilot for Microsoft 365 deployments. The number one complaint I have received from customers has to do with the ability to provide quality Copilot Reports. If you are reading this, you are probably not alone. However, some recent changes have come about that will help us get better reporting.
Microsoft recently released reporting APIs for Copilot user activity. While these endpoints are still in preview, with a target release of November 2024, we can start taking advantage of them today. The biggest challenge is that in order to provide some quality reporting, we will need to start populating data. To make this a reality, this week’s post will be 100% focused on populating data in your Fabric Lakehouse.
It is important to note that I cover a lot of the mechanics of how to make this happen in previous posts. I have linked to the appropriate posts where applicable but wanted to highlight this now. There are a lot of moving pieces to make this happen and there is some prerequisite knowledge for you to be successful. Please take the time to reference these posts when applicable.
Elephant in the Room with Copilot Reports
Before we get started, we need to address the headwinds with Copilot Reports. If you know me personally, you know I am a balls and strikes kind of person. I call things as I see them and sometimes that means calling out some of the issues that come with new solutions.
Challenges with the Data
If you are a data person like me, you find the reports available from Microsoft to be frustrating. They detail specific behaviors inside of Viva Insights, but you cannot drill down far enough to see which use cases are actually being leveraged. The only way to look at individual user activity has been the Microsoft 365 Admin Reports for Copilot, and that is the data these API endpoints pull today. While the individual detail is nice, it does not resolve the frustration we have been seeing.
The frustration is twofold. Unlike the endpoints we explored in my previous series, these endpoints have two drawbacks. First, I cannot identify activity for a specified date; I can only pull for a period of activity. Second, I do not get a count of activity for each use of Copilot. Instead, I just get a date for the last time Copilot was used. Most of my customers have been vocal about this second point.
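To make the period-based behavior concrete, here is a minimal sketch of what a pull against the new Copilot usage detail report looks like in Power Query M. The endpoint name shown here (the beta getMicrosoft365CopilotUsageUserDetail report) is my assumption based on the preview documentation, and the exact path may shift as this comes out of preview. Get-AuthToken is the authentication function from my previous series.

let
    // The report takes a trailing period (D7/D30/D90/D180), not a specific calendar
    // date - that is the first drawback described above
    RawResponse = Web.Contents(
        "https://graph.microsoft.com/beta/reports/",
        [RelativePath = "getMicrosoft365CopilotUsageUserDetail(period='D7')",
         Headers = [Authorization = "Bearer " & #"Get-AuthToken"()]])
in
    // Each user row only surfaces last-activity dates per app rather than activity
    // counts - the second drawback above; the Dataflow template handles the parsing
    RawResponse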
Overcoming the Challenges
I will not speculate on why Microsoft has chosen this path. Instead, I pushed to find a way to work around the challenge and make something happen. Instead of focusing on individual days of data, we will need to look at data week over week. Is this an imperfect system? Absolutely! But I would rather focus on what I can do than what I cannot do. Previously, I was able to do this by exporting the data manually every week. Annoying? Yes, but with the advent of the new reporting endpoint we can automate it. And I was even able to enhance other elements of my analysis as a result!
This all means our reporting will simply look at user activity week over week with Copilot. This will help us drive the value we seek in some meaningful way instead of guessing at how people are leveraging Copilot. My customers have found this insight valuable in driving better adoption which results in stronger utilization.
Foundations of Copilot Reports
If you followed along with my series on Graph API reports, you should already be familiar with the key steps to get this project going. While we did a lot of this in that series, we will be revisiting parts of it again. To simplify things, we will walk through the steps we need to complete at a high level. If you need a primer, please go back to my previous series for the details on how to prepare a Lakehouse in Microsoft Fabric.
Build a Fabric Lakehouse
To get started, sign into Microsoft Fabric and create a workspace for your Copilot Reports. You will likely want to share these reports with others, so it might make sense to create a dedicated workspace for this. Either way, a fresh workspace keeps the data collection process clean since it will be changing over time. Once your workspace is ready, create a new Lakehouse inside it, just like we did in the previous series.
With our Lakehouse in place, we are ready for the next step in the process.
Creating Tables in the Lakehouse
Just like we did before, we will use a Fabric Notebook to create our tables. To get started, download the Create Graph Tables notebook I created for you. Remember that you must swap your context to the Data Engineering view before you can import the notebook.
Once imported, you can open the notebook and connect it to your new Lakehouse. From there, you can run the notebook. This will take a few minutes, but it will hopefully make parts of this project much easier for you long term.
With our tables created, we need to start harvesting data from Microsoft Graph.
Harvesting Data for our Copilot Reports
Now that we have our foundation in place, it is all about getting our data liberated from Graph and into the Lakehouse. To get this process started, download the Copilot Reporting Template I created for your Dataflow. Once downloaded, create a Dataflow Gen2 and import your template:
Once imported, you will need to update your parameters with your Service Principal information. If you have never done this before, follow the steps I outlined in my Reporting with the Graph API post a few weeks back to ensure you have one configured properly. If you do, just update the parameters:
Now that you have updated your Service Principal information, you just need to update your connection properties, ensuring all endpoints use an anonymous authentication method since the Service Principal handles authentication for us. From there, you just need to map your data to your new Lakehouse:
We will need to map the following tables with the following properties:
Query Name               Lakehouse Table Name         Behavior
GraphUsers               graph_users                  Replace
Date_Table               date_table                   Replace
LicensingData            licensing_data               Replace
TeamsUserActivity        teams_user_activity          Append
OneDriveUserActivity     onedrive_user_activity       Append
SharePointUserActivity   sharepoint_user_activity     Append
VivaEngageUserActivity   viva_engage_user_activity    Append
CopilotData              copilot_data                 Append
CopilotLastActivity      copilot_last_activity        Append

Table mappings for our Dataflow to our Fabric Lakehouse
Once this is completed, we can publish our Dataflow.
Next Steps
Now that we have all of this in place, it is time to set up the refresh. If you remember from the beginning, we will be looking at this data on a weekly basis, so we will want to schedule the refresh to pull the data weekly. For myself, I set up the refresh to occur Sunday mornings at 9 AM as that is a quiet time of the week. It also gives me the best shot at pulling the data on the same day of the week. Remember, we are at the mercy of Microsoft’s schedule for releasing the data, so we have to hope for the best in this situation.
With the refreshes running weekly, we will start to collect some meaningful data we can use. Next week, we will work on preparing the semantic model to allow us to connect our Power BI report. Until then, we will keep harvesting data to help bring better value to our report!
Welcome back to my series on the Microsoft Graph API. We have spent a lot of time laying the groundwork for our data. In this article, it is all about getting our data into our Lakehouse so we can start to trend our data.
As a quick reminder, the purpose of us having to go through this exercise is because we must retain data on our own. The Microsoft Graph API will only furnish detailed data over the past thirty days. Any further back and it is lost. Therefore, if we want to trend data over three months, we must warehouse it.
In this article, we will be finalizing our queries and loading them into our Lakehouse. To get there, we are going to go deep into Power Query M. Hang in there until the end, though; I should be able to make it easier for you!
Getting Teams User Activity Detail
To get started, we are going to have to write out some code. While you can just copy/paste the code from this article, I think it is worth walking through it together so you understand the method behind the madness. It seems complicated – and it really is. But understanding the rationale of what was done will help you with your own projects. To help us on this journey, you will need to create three more parameters to simplify the process:
Parameter Name        Parameter Type     Parameter Value
LakehouseId           Text               Value of Lakehouse ID from URL
WorkspaceId           Text               Value of Workspace ID from URL
HistoricalDayCount    Decimal Number     List of values (5, 15, 27) with the default set to 5

Parameters required for capturing Microsoft Graph API data
With your parameters in place, it is time to write out our query. Open a new blank query and strap in!
Start With a Function
Because we are pulling data for a single date at a time, we will need to create a function. We have done this before, but this time we are going to embed it within an existing query:
let
GraphDataFunction = (ExportDate as date) =>
let
GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
[RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
in
GraphData
in
GraphDataFunction
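At this stage the query simply returns the function itself, so you can sanity check it by invoking it from a throwaway query with a recent date. The query name Get-TeamsActivity below is just a placeholder for whatever you named yours:

let
    // Invoke the function query for a single recent date to confirm authentication works;
    // the result is the raw response from the Graph reports endpoint
    TestPull = #"Get-TeamsActivity"(Date.AddDays(Date.From(DateTime.LocalNow()), -5))
in
    TestPull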
We need to call a few things out. First and foremost, you will note we are using the RelativePath option. Because the date changes with every call of the function, keeping it in RelativePath leaves the base URL static, which avoids errors when the Dataflow refreshes.
Next, you will notice the Authorization header is using our Get-AuthToken function we created in part 2 of this series. This is key to ensure we use our token to complete the authentication process.
Checking the Lakehouse for Existing Data
We will need to ensure we only query Graph data that is not already in the Lakehouse. Because of how finicky the Graph API can be, we cannot guarantee the data will be available the next day; sometimes it is a few days behind. We also do not want duplicate data to be added to the Lakehouse.
To prevent this from happening, we will query the Lakehouse and the tables we created. By using the parameters we created, we will call our Lakehouse and pull our historical data so we can transform it:
let
GraphDataFunction = (ExportDate as date) =>
let
GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
[RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
in
GraphData,
LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"})
in
HistoricalDatesExported
These five lines of code connect to our teams_user_activity table, keep only the Report_Refresh_Date column, filter out blank values, and remove duplicates so we are left with a single column of dates we have already exported. It will be empty right now, but more will be added as we move forward.
Create a Range of Dates and Merge with Historical Data
Next, we will pull data from the last few days. To make life easier on us, we will limit the pull to five days to speed up our development process, which is why the HistoricalDayCount parameter defaults to 5. We just need to use today’s date as a base and create a range going back that many days:
Once we have a list of dates in place, we will merge it with our historical data from the HistoricalDatesExported step. From there, we expand the values and keep only the rows where the matched date is null, leaving us with just the dates that are missing from the Lakehouse.
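If you want to build this up step by step, the new lines look like the following; they reappear in the full query in the next section:

// Build a table of candidate export dates going back HistoricalDayCount days from today
TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -HistoricalDayCount),
ListDatesForExport = List.Dates(TodaysDate, HistoricalDayCount, #duration(1, 0, 0, 0)),
TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
// Left join against the dates already in the Lakehouse and keep only the gaps
MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each [Report_Refresh_Date] = null)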
Execute the Microsoft Graph API Function
Reaching all the way back to the first step, GraphDataFunction, we are going to pass our list of missing dates through the function to pull our data:
let
GraphDataFunction = (ExportDate as date) =>
let
GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
[RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
in
GraphData,
LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -HistoricalDayCount),
ListDatesForExport = List.Dates(TodaysDate, HistoricalDayCount, #duration(1, 0, 0, 0)),
TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null)),
ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], 33)),
RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", {"Report Refresh Date", " User Id", " User Principal Name", " Last Activity Date", " Is Deleted", " Deleted Date", " Assigned Products", " Team Chat Message Count", " Private Chat Message Count", " Call Count", " Meeting Count", " Meetings Organized Count", " Meetings Attended Count", " Ad Hoc Meetings Organized Count", " Ad Hoc Meetings Attended Count", " Scheduled One-time Meetings Organized Count", " Scheduled One-time Meetings Attended Count", " Scheduled Recurring Meetings Organized Count", " Scheduled Recurring Meetings Attended Count", " Audio Duration In Seconds", " Video Duration In Seconds", " Screen Share Duration In Seconds", " Has Other Action", " Urgent Messages", " Post Messages", " Tenant Display Name", " Shared Channel Tenant Display Names", " Reply Messages", " Is Licensed", " Report Period"}),
FilterMissingResults = Table.SelectRows(ExpandResults, each [Report Refresh Date] <> null)
in
FilterMissingResults
After we execute the function, we will combine our files and expand the results. To do this, we are using a custom function titled Transform file to combine our files seamlessly:
(Parameter as binary, ColumnCount as number) => let
Source = Csv.Document(Parameter, [Delimiter = ",", Columns = ColumnCount, Encoding = 65001, QuoteStyle = QuoteStyle.None]),
#"Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
#"Promoted headers"
The result should be something that looks like this:
Wow – that was a whole lot for a single endpoint. Only four more applications left! Frustrated that you will have to do this again? Well, instead of being frustrated, let’s make it a little easier.
Turning Microsoft Graph API Reports into a Flexible Function
We have worked so hard to get the first query in place. But if we think through things a little further, we can simplify the process for future endpoints. Instead of going through this entire process over and over again, we can convert this to a function and pass data to make it easier for us to use.
Prepare Your Endpoint Data
To simplify our process, we will start by creating a table we can reference. It will hold the application name, the Microsoft Graph API endpoint, the Fabric Lakehouse table, the number of columns returned by the endpoint, and the column names to expand. Create a blank query called Endpoints and add the following code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3VXbbtswDP0Vwc/pQ9cvCOIMDZCtRt0FGIIiECzOIWpTgURny75+lOPc5aB+3YttkkeUeXRILZfJtNZYJaMEwnvVeHArXTBukXfiLYFbwLhz/ZB4Crxf8vgkj1fYWMfqFX458GuVaoaRCjCVOaQCN7pS33UtzhT9ptK7zpp5lUIFDGZ0+OgWz7Vnddiw8+VARk1sQzySrQrALZxMfQx9A2CkUk0c6JDwyj0jBifVnSJj77EksTNnTVOwD/nagjJwaE3yPlomLwSpkx2lWivY8Bkj6gCLc/U4iKth7CwQfgvgxampwQD9itWRn3xHxbVrrZ24WjpIV9UuFp3+iUY/xVibIrNILFX7YGyCEWPtBI3z9uU/4W2Bvt0i0+UwLt9A115I4PCOMdgCAhtRAp/uN+nM9DAZ46uf3Ugl4bfUZK1Z2s/7s6plp60sisYmwttNM1+YXo6r1IR/b9rbqzGzzImz7jbq2RZ3Vl4DrhPkxRpMUwWNEDww1nAn2T1wf2KZZo1zoc7PZI6gb4puDFqVNk4zWhKxyvAsLBkfNGigJ5QXDoD2Go4jnrX8Ga9FKkEVlkQ3rgQ6HqFAMuvPzTcgLfHLud91iRw+EVQxzL4LpH1OmUR4cyyAfFDebYssRFBqSqWARexbsVbQWrF2+anruqdXBg7py8JaSM7764o7gZNIu3/+BL7AxC6yOX4MuaTe/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Application = _t, Lakehouse_Table = _t, Graph_Endpoint = _t, Column_Count = _t, Column_Mapping = _t]),
#"Changed column type" = Table.TransformColumnTypes(Source, {{"Column_Count", Int64.Type}, {"Application", type text}, {"Lakehouse_Table", type text}, {"Graph_Endpoint", type text}, {"Column_Mapping", type text}})
in
#"Changed column type"
You can choose to pull this data from somewhere else if it is easier for you, but this table is easy to maintain in this scenario.
Convert our Query to a Function
Our next step is to take the first query we created in this article and turn it into a function. We will start by adding a parameter for the application at the top and filtering the Endpoints table with the value passed in the parameter:
From there, we will call the data from the EndpointData step and inject it into our query with the following items (a full sketch follows this list):
In the GraphDataFunction step, place a dynamic endpoint name
In the HistoricalExports step, place a dynamic Lakehouse table name
In the MergeExportedFiles step, place a dynamic column count in the Transform file function
In the ExpandResults step, place dynamic column mappings and wrap it with a Text.Split() function
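Here is a rough sketch of how those pieces slot into place once the query becomes a function. The parameter name, the comma delimiter used to split Column_Mapping, and the final filter step are my assumptions; your Endpoints table and the template I share below may differ slightly, but the structure is the same:

(ApplicationName as text) =>
let
    // Grab the Endpoints row for the application we were passed
    EndpointData = Table.SelectRows(Endpoints, each [Application] = ApplicationName){0},
    GraphDataFunction = (ExportDate as date) =>
        let
            // Dynamic endpoint name from the Endpoints table
            GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/",
                [RelativePath = EndpointData[Graph_Endpoint] & "(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
                 Headers = [Accept = "application/json", Authorization = "Bearer " & #"Get-AuthToken"()]])
        in
            GraphData,
    LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
    // Dynamic Lakehouse table name
    HistoricalExports = LakehouseConnection{[Id = EndpointData[Lakehouse_Table], ItemKind = "Table"]}[Data],
    ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
    FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
    HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
    TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -HistoricalDayCount),
    ListDatesForExport = List.Dates(TodaysDate, HistoricalDayCount, #duration(1, 0, 0, 0)),
    TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
    SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
    MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
    ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
    IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each [Report_Refresh_Date] = null),
    ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
    // Dynamic column count passed into the Transform file function
    MergeExportedFiles = Table.AddColumn(ExecuteGraphFunction, "Transform file", each #"Transform file"([Invoked custom function], EndpointData[Column_Count])),
    RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
    // Dynamic column mappings, stored as a delimited string in the Endpoints table
    ColumnNames = Text.Split(EndpointData[Column_Mapping], ","),
    ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", ColumnNames, ColumnNames),
    // Drop the rows where the endpoint returned nothing for a date
    FilterMissingResults = Table.SelectRows(ExpandResults, each Record.Field(_, List.First(ColumnNames)) <> null)
in
    FilterMissingResults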
Now that this is completed, we can call the function by typing in the application name and clicking invoke:
With this in place, we can invoke our function for OneDrive, SharePoint, Email, and Viva Engage with a few clicks!
Get Current Users from Entra ID
For our last step, we just need to pull our users out of Entra ID into our Dataflow. To make this easier, I created a function titled Get-Users and invoked it with this code:
() =>
let
    GetUserDetail = (Params) =>
        let
            // Use the paging parameters when provided; otherwise start with the full $select and $expand
            Path = if Text.Length(Params) > 0 then Params
                else "$select=displayName,givenName,jobTitle,mail,officeLocation,surname,userPrincipalName,id,accountEnabled,userType,companyName,department,employeeType&$expand=manager",
            Source = Json.Document(Web.Contents("https://graph.microsoft.com/v1.0/users",[RelativePath = "?" & Path, Headers=[Authorization="Bearer " & #"Get-AuthToken"()]])),
            NextList = @Source[value],
            // Keep calling ourselves with the @odata.nextLink query string until there are no more pages
            result = try @NextList & @GetUserDetail(Text.AfterDelimiter(Source[#"@odata.nextLink"], "?")) otherwise @NextList
        in
            result,
    UserDetail = GetUserDetail("")
in
    UserDetail
You will notice the function inside of the function. This is because we need to iterate through the entire user list to receive all users. The API returns users in pages, so we follow the @odata.nextLink a few times until we have the full list.
HELP! THIS IS TOO MUCH!
Is your head spinning? Mine too. In fact, this was a lot to even get coded. And if I ever want to reuse this, I do not want to go through this level of effort again. I am probably going to be like that math teacher you had who made you learn how to do everything the hard way to pass an exam, and then in the next chapter taught you an easier way to solve the problem. I hated it when that happened to me, but I am still going to do it to you.
To streamline this, I created a Power Query template for this project. You can download the template and import it into your new Dataflow:
Once imported, you will need to update the following parameters:
TenantId
AppId
ClientSecret
WorkspaceId
LakehouseId
This should streamline the process considerably if you are just getting started!
From Dataflow to Lakehouse
We now just need to map our data into our Lakehouse. This will take a little diligence, but it should move along pretty quickly. We just need to match our queries and columns from our Dataflows into tables within our Lakehouse.
Adding Teams Data to the Lakehouse
Starting with our Teams user activity data, we will go to the bottom right-hand corner of the screen and click the + icon to add it to our Lakehouse. Once you have a connection to your Lakehouse, we will use an existing table and navigate to the table where your data is stored:
Once we have selected our table, we need to map our columns. We also need to update the setting that will allow us to append our data:
Now we just need to save our settings. Once we publish and refresh our dataflow, our data will be added into the Lakehouse. We just need to repeat these steps for OneDrive, SharePoint, Email, and Viva Engage.
Adding Graph Users to the Lakehouse
If you remember from my last article, we created all of the tables for our Microsoft Graph API report endpoints. However, we did not need to create anything for our users. Why is that?!
The answer is simple: the activity tables had to exist ahead of time because our queries check them for the dates that have already been loaded, which keeps refreshes efficient and prevents duplicate data. With our users, it is a different story, as we will simply replace the data each time. So, just like we did before, we will add our Lakehouse as a data destination. However, this time we will create a new table. Once in place, we can make adjustments to our mappings and settings:
Once completed, we can click save and move to publishing our Dataflow.
One More Thing Before We Publish!
The last thing we need to do is adjust our HistoricalDayCount parameter from 5 days to 27 days. This will ensure that all the historical data available is captured in the refresh and loaded into the Lakehouse:
Once completed, we can click publish. It will take a few minutes to validate the Dataflow. After that, we just need to set up a daily refresh and our data will start to populate into the Lakehouse!
Next Steps and Conclusion
Alright, that was a ton of information in a single article. It was hard to split up, but I wanted to keep it together in a single article as these pieces were all interconnected. It is a lot, but it is complete. Now that you have your data, you can create your model, measures, and start building out your reports.
In other news, I have been able to get the endpoint for Copilot. However, there are some data quality challenges that I want to address before doing a full write-up. In the meantime, this will get you started with understanding how complicated this level of reporting can be.
Did you make it to the end? Were you successful? If so, tell me about it in the comments below!
In my last post, I went over the why and covered the building blocks for connecting to the Microsoft Graph API for reporting. With this in place, it is time to prepare the landing zone for our data in Microsoft Fabric! To achieve this, we will need to set up our Fabric environment so we can start to build out our solution.
Before we get started, if you do not have Microsoft Fabric, do not fear! You can start a Microsoft Fabric trial for 60 days for free. When you sign into Power BI, you can click on your user avatar, and start your trial. If this is not available to you, it might be because your Fabric Administrator has turned off the trial feature. You might need to submit a request to re-enable this feature.
Creating Your Lakehouse
To get things started, we need to create a new workspace that has a Fabric capacity assigned to it. Just like you would before with Power BI, create a new workspace inside of Microsoft Fabric and ensure it is tied to your capacity.
Once you have your workspace in place, it is time to build your Lakehouse. If you click the new button in the upper left-hand corner of the workspace window, a screen will appear where you can select a Lakehouse:
From there, you can give your Lakehouse a name and click create. You will be redirected to your new Lakehouse. Once created, we will need to extract the Workspace ID and Lakehouse ID from the URL. The easiest way to capture this is to copy the URL in the address bar to the notepad where you kept your Service Principal data in part 1 of this series. Once you capture this information, you can close your Lakehouse:
Now that you have a Lakehouse in place, it is time to prepare the tables. While we could create these tables directly from our Dataflow, we will want to create a few ahead of time. This is simply to reduce the amount of effort on your side and make a complex process a little more seamless.
While I am going to give you a Notebook you can upload, we will pause and create one from scratch so you understand the process. When you click new item in your workspace, scroll down to prepare data and select Notebook:
We will add our Lakehouse to the Notebook on the left hand side. Once in place, we can add our code using PySpark into a cell:
Using documentation from the Microsoft Graph API, I was able to identify the endpoints and their associated fields. This allowed me to create a block of code that would create our teams_user_activity table:
In our scenario, we will need five tables for Teams, OneDrive, SharePoint, Email, and Viva Engage. To simplify the process for you, I have created a notebook you can upload into Fabric and run. You just need to connect your Lakehouse and select run all. To perform this task, you must switch to the Data Engineering context of Fabric and select Import on your workspace:
Once you have run the script successfully, you can go to your SQL Endpoint for the Lakehouse and verify the tables have been created and that they are empty with the following query:
SELECT * FROM teams_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM onedrive_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM sharepoint_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM email_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM viva_engage_user_activity ORDER BY Report_Refresh_Date DESC
Preparing Your Dataflow
After creating your Lakehouse and preparing your tables, you can create a Dataflow in the same workspace. This will query the data and push it to the right tables within our Lakehouse. However, you need to be careful to select a Dataflow Gen2 as opposed to a Dataflow:
Adding a Dataflow Gen2 to your Fabric workspace
This is critical for us to ensure our data makes it into the Lakehouse later on, so be careful with your selection! Once you select it, you can rename your Dataflow in the upper left hand corner:
With your Lakehouse and Dataflow created, it is time to start getting some data.
Connecting to the Microsoft Graph API
If you remember from my last article, we will be using application permissions with our app registration. As a result, we must acquire an OAuth2 token to facilitate the authentication process. Luckily, we made it easy for you to deploy this in your solution.
Adding Parameters in Power Query
I covered the basics of Parameters in Power Query in an older article. Instead of going through the details, I will just highlight what we need to have created.
To facilitate our authentication, we need to create three parameters that will be used to hold data from our app registration. The three parameters we need are as follows:
Parameter Name    Parameter Type    Parameter Value
TenantId          Text              Directory (tenant) ID
AppId             Text              Application (client) ID
ClientSecret      Text              Client secret value

Parameters for our authentication process
Remember, capitalization counts! If you copy/paste the parameters above, you will not have an issue with the next step. It seems easy, but take great care with this process!
Creating an Authentication Function
Once you have your parameters set, you need to create a new blank query. Creating this connection requires you to build out the query in code, but I have made that easier for you. Just copy the code below and paste it into the advanced editor of your new query.
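It simply requests a client credentials token from Entra ID using the three parameters we just created. A minimal version looks like the sketch below; if your copy differs slightly in structure, that is fine as long as it returns the access token:

() =>
let
    // Request a client credentials token for Microsoft Graph using our three parameters
    TokenResponse = Json.Document(Web.Contents(
        "https://login.microsoftonline.com/",
        [RelativePath = TenantId & "/oauth2/v2.0/token",
         Headers = [#"Content-Type" = "application/x-www-form-urlencoded"],
         Content = Text.ToBinary(
             "grant_type=client_credentials" &
             "&client_id=" & AppId &
             "&client_secret=" & Uri.EscapeDataString(ClientSecret) &
             "&scope=" & Uri.EscapeDataString("https://graph.microsoft.com/.default"))])),
    // The bearer token our report queries append to their Authorization headers
    AccessToken = TokenResponse[access_token]
in
    AccessToken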
Once we have pasted the code into the query, click next. After that, rename the query to Get-AuthToken. Again, capitalization is important so take care with this step!
Now that we have our authentication function in place, it is time to take the next step towards building out our Microsoft Graph API functions to pull user information and Teams user activity.
Conclusion and Next Steps
There is a lot of setup involved in this process. Part of the reason is the complexity of the pull we need to perform, but it also helps us avoid duplicates and reduce the chance of throttling against the endpoints. I know this seems like a lot of work, but I promise it will pay off in the end!
In my next post, we will finally grab our Graph API data and load it into our Lakehouse! Stay tuned to see how it happens!