Category: Dataflows

Copilot Reports with Microsoft Fabric – Part 1

On November 5, 2024

In Copilot for Microsoft 365, Dataflows, Lakehouse, Power Query

Over the past year, I have been helping customers with their Copilot for Microsoft 365 deployments. The number one complaint I have received from customers has to do with the ability to provide quality Copilot Reports. If you are reading this, you are probably not alone. However, some recent changes have come about that will help us get better reporting.

While in preview at the moment, Microsoft released reporting APIs for Copilot User Activity. While these endpoints are in preview with a target release of November 2024, we can start taking advantage of them today. The biggest challenge is that in order to provide some quality reporting, we will need to start populating data. To make this a reality, this week’s post will be 100% focused on populating data in your Fabric Lakehouse.

It is important to note that I cover a lot of the mechanics of how to make this happen in previous posts. I have linked to the appropriate posts where applicable but wanted to highlight this now. There are a lot of moving pieces to make this happen and there is some prerequisite knowledge for you to be successful. Please take the time to reference these posts when applicable.

Elephant in the Room with Copilot Reports

Before we get started, we need to address the headwinds with Copilot Reports. If you know me personally, you know I am a balls and strikes kind of person. I call things as I see them and sometimes that means calling out some of the issues that come with new solutions.

Challenges with the Data

If you are a data person like me, you find the reports available from Microsoft to be frustrating. They are detailed with specific behaviors inside of Viva Insights, but you cannot drill down enough to see where use cases are being leveraged. The only way to look at individual user activity has been with the Microsoft 365 Admin Reports for Copilot. This data is what the API endpoints are pulling at this time. While the individual detail is nice, it does not resolve the frustration we have been seeing.

The reason for this frustration is twofold. Unlike the endpoints we explored in my previous series, these endpoints have two drawbacks. The first is that I cannot identify activity for a specified date. I can only pull for a period of activity. Second, I do not get a count of activity for each use of Copilot. Instead, I just get a date for the last time Copilot was used. Most of my customers have been vocal about this second point.

Overcoming the Challenges

I will not speculate on why Microsoft has chosen this path. Instead, I pushed to find a way to work around the challenge and make something happen. Instead of focusing on individual days of data, we will need to look at data week over week. Is this an imperfect system? Absolutely! But I rather focus on what I can do than what I cannot do. Previously, I was able to do this by exporting this data on a weekly basis. Annoying? Yes, but now we can automate it with the advent of the new reporting endpoint. And I was even able to enhance other elements of my analysis as a result!

This all means our reporting will simply look at user activity week over week with Copilot. This will help us drive the value we seek in some meaningful way instead of guessing at how people are leveraging Copilot. My customers have found this insight valuable in driving better adoption which results in stronger utilization.

Foundations of Copilot Reports

If you followed along for my series on Graph API reports, you should already be familiar with the key steps to get things going for this project. While we did a lot of this in that series, we will be revisiting parts of it again. To simplify the process, we will go through high level the steps we need to complete. If you need a primer on this, please go back to my previous series to go through the details on how to prepare a Lakehouse in Microsoft Fabric.

Build a Fabric Lakehouse

To get started, sign into Microsoft Fabric and create a workspace for your Copilot Reports. You will likely want to share these reports with others and it might make sense to create a dedicated workspace for this. The key point here is that we will want a new workspace to streamline the data collection process as it will be changing.

Screenshot of my new Fabric Lakehouse for my Copilot Reports — *New Fabric Lakehouse for my Copilot Reports*

With our Lakehouse in place, we are ready for the next step in the process.

Creating Tables in the Lakehouse

Just like we did before, we will use a Fabric Notebook to create our tables. To get started, download the Create Graph Tables notebook I created for you. Remember, that you must swap your context to the Data Engineering view before you can import the notebook.

Screenshot of the Data Engineering view inside of Microsoft Fabric where we will import our notebook. — *Swapping to the Data Engineering view to import your Notebook*

Once imported, you can open the notebook and connect it to your new Lakehouse. From there, you can run the notebook. This process will take a few minutes, but will hopefully make some parts of this process much easier for you long term.

Screenshot of our notebook creating tables in our Fabric Lakehouse for our Copilot Reports — *Creating tables in our Fabric Lakehouse*

With our tables connected, we need to start harvesting data from Microsoft Graph.

Harvesting Data for our Copilot Reports

Now that we have our foundation in place, it is all about getting our data liberated from Graph and into the Lakehouse. To get this process started, download the Copilot Reporting Template I created for your Dataflow. Once downloaded, create a Dataflow Gen2 and import your template:

Screenshot of the process to import a Power Query template in a Dataflow. — *Importing my Power Query template in a Dataflow*

Once imported, you will need to update your parameters with your Service Principal information. If you never did this before, follow the steps I outlined in my Reporting with the Graph API post a few weeks back to ensure you have one configured properly. If you do, just update the parameters:

Screenshot of the Dataflow with the required parameters that need to be updated for your Copilot Reports — *Update your parameters with your Service Principal details*

Now that you have updated your Service Principal information, you just need to update your connection properties ensuring all endpoints are using an anonymous method as we are using the Service Principal to authenticate. From there, you just need to map your data to your new Lakehouse:

Screenshot of us mapping data from our Dataflow to our new Fabric Lakehouse — *Mapping our data to our Lakehouse*

We will need to map the following tables with the following properties:

Query Name	Lakehouse Table Name	Behavior
GraphUsers	graph_users	Replace
Date_Table	date_table	Replace
LicensingData	licensing_data	Replace
TeamsUserActivity	teams_user_activity	Append
OneDriveUserActivity	onedrive_user_activity	Append
SharePointUserActivity	sharepoint_user_activity	Append
VivaEngageUserActivity	viva_engage_user_activity	Append
CopilotData	copilot_data	Append
CopilotLastActivity	copilot_last_activity	Append

Table mappings for our Dataflow to our Fabric Lakehouse

Once this is completed, we can publish our Dataflow.

Next Steps

Now that we have all of this in place, it is time to setup the refresh. If you remember from the beginning, we will be looking this data on a weekly basis. Therefore, we will want to setup the refresh to pull the data weekly. For myself, I setup the refresh to occur Sunday mornings at 9 AM as that is a quiet time of the week. It also gives me the best shot at pulling the data on the same day of the week. Remember, we are at the mercy of Microsoft and when they are releasing the data, so we have to somewhat hope for the best in this situation.

With the refreshes running weekly, we will start to collect some meaningful data we can use. Next week, we will work on preparing the semantic model to allow us to connect our Power BI report. Until then, we will keep harvesting data to help bring better value to our report!

Microsoft Graph API Reporting in Fabric – Part 3

By Dominick Raimato

On October 10, 2024

In Dataflows, Fabric, Lakehouse, Power Query

Welcome back to my series on the Microsoft Graph API. We have spent a lot of time laying the groundwork for our data. In this article, it is all about getting our data into our Lakehouse so we can start to trend our data.

As a quick reminder, the purpose of us having to go through this exercise is because we must retain data on our own. The Microsoft Graph API will only furnish detailed data over the past thirty days. Any further back and it is lost. Therefore, if we want to trend data over three months, we must warehouse it.

In this article, we will be finalizing our queries and load them into our Lakehouse. To get there, we are going to go through some deep levels of M Query. However, hang in there until the end, I should be able to make it easier for you!

Getting Teams User Activity Detail

To get started, we are going to have to write out some code. While you can just copy/paste the code into this article, I think it is worth walking through together to help you understand the method behind the madness. It seems complicated – and it really is. But understanding the rationale of what was done will help you with your own projects. To help us on this journey, you will need to create three more parameters to simplify the process:

Parameter Name	Parameter Type	Parameter Value
LakehouseId	Text	Value of Lakehouse ID from URL
WorkspaceId	Text	Value of Workspace ID from URL
HistoricalDayCount	Decimal Number	List of values (5, 15, 27) with the default set to 5

Parameters required for capturing Microsoft Graph API data

With your parameters in place, it is time to write out our query. Open a new blank query and strap in!

Start With a Function

Because we are pulling data for a single date at a time, we will need to create a function. We have done this before, but this time we are going to embed it within an existing query:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData
in
  GraphDataFunction

We need to call a few things out. First and foremost, you will note we are using the RelativePath option. This is because our dates will be updated with each all of the function, therefore we must use this option to avoid errors.

Next, you will notice the Authorization header is using our Get-AuthToken function we created in part 2 of this series. This is key to ensure we use our token to complete the authentication process.

Checking the Lakehouse for Existing Data

We will need to ensure we only query Graph data that is not already in the Lakehouse. Because of how finicky the Graph API can be, we cannot guarantee data to be available the next day. Sometimes it is a few days behind. We also do not want duplicate data to be added to the Lakehouse.

To prevent this from happening, we will query the Lakehouse and the tables we created. By using the parameters we created, we will call our Lakehouse and pull our historical data so we can transform it:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"})
in
  HistoricalDatesExported

These five lines of code will allow us to connect to our teams_user_activity table, remove unnecessary columns, remove duplicates, and format into a usable column. This will be empty for right now, but more will be added as we move forward.

Create a Range of Dates and Merge with Historical Data

Next, we will pull data from the last few days. To make life easier on us, we will limit the pull to five days to speed up our development process. We just need to use today’s date as a base and create a range going back five days that we can use:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
  TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -5),
  ListDatesForExport = List.Dates(TodaysDate, 5, #duration(1, 0, 0, 0)),
  TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
  SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
  MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
  ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
  IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null))
in
  IdentifyMissingData

Once we have a list of dates in place, we will merge it to our historical data from the HistoricalDatesExported step. From there, expand the values and filter out the null values so we only have dates where we are missing data left.

Execute the Microsoft Graph API Function

Reaching all the way back to the first step, GraphDataFunction, we are going to pass our list of dates through to get our data. We will pass our column and pull our data through:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
  TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -5),
  ListDatesForExport = List.Dates(TodaysDate, 5, #duration(1, 0, 0, 0)),
  TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
  SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
  MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
  ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
  IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null)),
  ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
  FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
  MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], 33)),
  RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
  ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", {"Report Refresh Date", " User Id", " User Principal Name", " Last Activity Date", " Is Deleted", " Deleted Date", " Assigned Products", " Team Chat Message Count", " Private Chat Message Count", " Call Count", " Meeting Count", " Meetings Organized Count", " Meetings Attended Count", " Ad Hoc Meetings Organized Count", " Ad Hoc Meetings Attended Count", " Scheduled One-time Meetings Organized Count", " Scheduled One-time Meetings Attended Count", " Scheduled Recurring Meetings Organized Count", " Scheduled Recurring Meetings Attended Count", " Audio Duration In Seconds", " Video Duration In Seconds", " Screen Share Duration In Seconds", " Has Other Action", " Urgent Messages", " Post Messages", " Tenant Display Name", " Shared Channel Tenant Display Names", " Reply Messages", " Is Licensed", " Report Period"}),
  FilterMissingResults = Table.SelectRows(ExpandResults, each [Report Refresh Date] <> null)
in
  FilterMissingResults

After we execute the function, we will combine our files and expand the results. To do this, we are using a custom function titled Transform file to combine our files seamlessly:

(Parameter as binary, ColumnCount as number) => let
  Source = Csv.Document(Parameter, [Delimiter = ",", Columns = ColumnCount, Encoding = 65001, QuoteStyle = QuoteStyle.None]),
  #"Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
  #"Promoted headers"

The result should be something that looks like this:

Screenshot of Microsoft Graph API report data. — *Screenshot of the Microsoft Graph API report data for Teams*

Wow – that was a whole lot for a single endpoint. Only four more applications left! Frustrated that you will have to do this again? Well, instead of being frustrated, let’s make it a little easier.

Turning Microsoft Graph API Reports into a Flexible Function

We have worked so hard to get the first query in place. But if we think through things a little further, we can simplify the process for future endpoints. Instead of going through this entire process over and over again, we can convert this to a function and pass data to make it easier for us to use.

Prepare Your Endpoint Data

To simplify our process, we will start by creating a table we can reference. This will highlight the application, Microsoft Graph API endpoint, Fabric Lakehouse table, columns pulled from the endpoint, and the column names. Create a blank query called Endpoints and add the following code:

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3VXbbtswDP0Vwc/pQ9cvCOIMDZCtRt0FGIIiECzOIWpTgURny75+lOPc5aB+3YttkkeUeXRILZfJtNZYJaMEwnvVeHArXTBukXfiLYFbwLhz/ZB4Crxf8vgkj1fYWMfqFX458GuVaoaRCjCVOaQCN7pS33UtzhT9ptK7zpp5lUIFDGZ0+OgWz7Vnddiw8+VARk1sQzySrQrALZxMfQx9A2CkUk0c6JDwyj0jBifVnSJj77EksTNnTVOwD/nagjJwaE3yPlomLwSpkx2lWivY8Bkj6gCLc/U4iKth7CwQfgvgxampwQD9itWRn3xHxbVrrZ24WjpIV9UuFp3+iUY/xVibIrNILFX7YGyCEWPtBI3z9uU/4W2Bvt0i0+UwLt9A115I4PCOMdgCAhtRAp/uN+nM9DAZ46uf3Ugl4bfUZK1Z2s/7s6plp60sisYmwttNM1+YXo6r1IR/b9rbqzGzzImz7jbq2RZ3Vl4DrhPkxRpMUwWNEDww1nAn2T1wf2KZZo1zoc7PZI6gb4puDFqVNk4zWhKxyvAsLBkfNGigJ5QXDoD2Go4jnrX8Ga9FKkEVlkQ3rgQ6HqFAMuvPzTcgLfHLud91iRw+EVQxzL4LpH1OmUR4cyyAfFDebYssRFBqSqWARexbsVbQWrF2+anruqdXBg7py8JaSM7764o7gZNIu3/+BL7AxC6yOX4MuaTe/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Application = _t, Lakehouse_Table = _t, Graph_Endpoint = _t, Column_Count = _t, Column_Mapping = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"Column_Count", Int64.Type}, {"Application", type text}, {"Lakehouse_Table", type text}, {"Graph_Endpoint", type text}, {"Column_Mapping", type text}})
in
  #"Changed column type"

You can choose to pull this data from somewhere else if it is easier for you, but this table is easy to maintain in this scenario.

Convert our Query to a Function

Our next step is to take the first query we created in this article and make it a function. We will start by adding the parameter for an application at the top and filter the Endpoints table with the value passed in the parameter:

(Application as text) =>
let
  EndpointTable = Endpoints,
  EndpointData = Table.SelectRows(EndpointTable, each [Application] = Application),
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath=EndpointData{0}[Graph_Endpoint] & "(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 

  //Endpoint = "teams_user_activity",
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = EndpointData{0}[Lakehouse_Table], ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
  TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -HistoricalDayCount),
  ListDatesForExport = List.Dates(TodaysDate, HistoricalDayCount, #duration(1, 0, 0, 0)),
  TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
  SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
  MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
  ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
  IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null)),
  ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
  FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
  MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], EndpointData{0}[Column_Count])),
  RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
  ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", Text.Split(EndpointData{0}[Column_Mapping], ", ")),
  FilterMissingResults = Table.SelectRows(ExpandResults, each [Report Refresh Date] <> null)
in
  FilterMissingResults

From there, we will call the data from the EndpointData step and inject it into our query with the following items:

In the GraphDataFunction step, place a dynamic endpoint name
In the HistoricalExports step, place a dynamic Lakehouse table name
In the MergeExportedFiles step, place a dynamic column count in the Transform file function
In the ExpandResults step, place dynamic column mappings and wrap it with a Text.Split() function

Now that is completed, we can call the function by typing in the application name and clicking invoke:

Screenshot of us invoking our Microsoft Graph API reports function — *Invoking our Microsoft Graph API reports function.*

With this in place, we can invoke our function for OneDrive, SharePoint, Email, and Viva Engage with a few clicks!

Get Current Users from Entra ID

For our last step, we just need to pull our users out of Entra ID into our Dataflow. To make this happen, I like to use a function to make it easier. I created a function titled Get-Users and invoked it with this code:

() =>
let
  GetUserDetail = (Params) =>
  let    
    Path = if Text.Length(Params) > 0  then Params
      else "$select=displayName,givenName,jobTitle,mail,officeLocation,surname,userPrincipalName,id,accountEnabled,userType,companyName,department,employeeType&$expand=manager",  
    Source = Json.Document(Web.Contents("https://graph.microsoft.com/v1.0/users",[RelativePath = "?" & Path, Headers=[Authorization="Bearer " & #"Get-AuthToken"()]])),  
    NextList = @Source[value], result = try @NextList & @GetUserDetail(Text.AfterDelimiter(Source[#"@odata.nextLink"], "?")) otherwise @NextList    
  in
    result,
  UserDetail = GetUserDetail("")
in
  UserDetail

You will notice the function inside of the function. This is because we will need to iterate through the entire user list to receive all users. Because the API limits us to batches of 1,000, we will need to work through this a few times to get the results we desire.

HELP! THIS IS TOO MUCH!

Is your head spinning? Mine too. In fact, this was a lot to even get this coded. And if I ever want to reuse this, I do not want to go through any of this level of effort again. I am probably going to be like that math teacher you had where you learned how to do everything the hard way to pass an exam and then the next chapter they teach you a way to complete a problem easier. I hated it when that happened to be, but I am still going to do it to you.

To streamline this, I created a Power Query template for this project. You can download the template and import it into your new Dataflow:

Screenshot of the Microsoft Graph API Power Query template being imported into a dataflow. — *Importing a Power Query template for our Microsoft Graph API reports*

Once imported, you will need to update the following parameters:

TenantId
AppId
ClientSecret
WorkspaceId
LakehouseId

This should streamline the process considerably if you are just getting started!

From Dataflow to Lakehouse

We now just need to map our data into our Lakehouse. This will take a little diligence, but it should move along pretty quickly. We just need to match our queries and columns from our Dataflows into tables within our Lakehouse.

Adding Teams Data to the Lakehouse

Starting with our Teams user activity data, we will go to the bottom right-hand corner of the screen and click the + icon to add it to our Lakehouse. Once you have a connection to your Lakehouse, we will use an existing table and navigate to the table where your data is stored:

Screenshot of placing Microsoft Graph API data into our Lakehouse — *Adding Teams data into our Lakehouse*

Once we have selected our table, we need to map our columns. We also need to update the setting that will allow us to append our data:

Mapping our Microsoft Graph API report fields into our Fabric Lakehouse — *Mapping columns and appending our data*

Now we just need to save our settings. Once we publish and refresh our dataflow, our data will be added into the Lakehouse. We just need to repeat these steps for OneDrive, SharePoint, Email, and Viva Engage.

Adding Graph Users to the Lakehouse

If you remember from my last article, we created all of the table for our Microsoft Graph API report endpoints. However, we did not need to create anything for our users. Why is that?!

The answer is simple – with our approach, we will optimize the refresh experience and eliminate data duplication for these dates. This is because it is a moving target and hard to hit. However, with our users, it is a different story as we will replace our data each time. So, just like we did before, we will go ahead and add our Lakehouse as a data destination. However, this time we will create a new table. Once in place, we can make adjustments to our mappings and settings:

Screenshot of our Entra ID users mapping to the Fabric Lakehouse — *Mapping Entra ID users to the Lakehouse*

Once completed, we can click save and move to publishing our Dataflow.

One More Thing Before We Publish!

Last thing we need to do is adjust our HistoricalDayCounter from 5 days to 27 days. This will ensure that all the historical data available is captured in the refresh and will be loaded into the Lakehouse:

Screenshot of the HistoricalDayCounter parameter being updated to 30 days. — *Update our HistoricalDayCounter to 27 days*

Once completed, we can click publish. It will take a few minutes to validate the Dataflow. After that, we just need to setup a daily refresh and our data will start to populate into the Lakehouse!

Next Steps and Conclusion

Alright, that was a ton of information in a single article. It was hard to split up, but I wanted to keep it together in a single article as these pieces were all interconnected. It is a lot, but it is complete. Now that you have your data, you can create your model, measures, and start building out your reports.

In other news, I have been able to get the endpoint for Copilot. However, there are some data quality challenges that I want to address before doing a full write-up. In the meantime, this will get you started with understanding how complicated this level of reporting can be.

Did you make it to the end? Were you successful? If so, tell me about it in the comments below!

Microsoft Graph API Reporting in Fabric – Part 2

By Dominick Raimato

On October 8, 2024

In Dataflows, Fabric, Lakehouse, Power Query

In my last post, I went over they why and covered the building blocks for connecting to the Microsoft Graph API for reporting. With this in place, it is time to prepare the landing zone for our data in Microsoft Fabric! To achieve this, we will need to setup our Fabric environment so we can start to build out our solution.

Before we get started, if you do not have Microsoft Fabric, do not fear! You can start a Microsoft Fabric trial for 60 days for free. When you sign into Power BI, you can click on your user avatar, and start your trial. If this is not available to you, it might be because your Fabric Administrator has turned off the trial feature. You might need to submit a request to re-enable this feature.

Creating Your Lakehouse

To get things started, we need to create a new workspace that has a Fabric capacity assigned to it. Just like you would before with Power BI, create a new workspace inside of Microsoft Fabric and ensure it is tied to your capacity.

Once you have your workspace in place, it is time to build your Lakehouse. If you go to the new button in the upper left hand corner of the workspace window, a screen will appear where you can select a Lakehouse:

Screenshot of the screen where you can select a Lakehouse to deploy. — *Deploying a Lakehouse in Microsoft Fabric*

From there, you can give your Lakehouse a name and click create. You will be redirected to your new Lakehouse. Once created, we will need to extract the Workspace ID and Lakehouse ID from the URL. The easiest way to capture this is to just copy the URL in the address bar to a notepad where you kept your Service Principal data in part 1 of this series. Once you capture this information, you can go can close your Lakehouse:

https://app.powerbi.com/groups/<WorkspaceId>/lakehouses/<LakehouseId>?experience=power-bi

Preparing Your Lakehouse

Now that you have a Lakehouse in place, it is time to prepare the tables. While there is an opportunity to create these tables directly out of our Dataflow, we will want to create a few ahead of time. This is simply to reduce the amount of effort on your side. The goal is to make it easy, but the process can be complex. This should make the process a little more seamless for you as a result.

While I am going to give you a Notebook you can upload, we will pause and create one from scratch so you understand the process. When you click new item in your workspace, we will scroll down to prepare data and select Notebook:

Screenshot of adding a Notebook to your Fabric Workspace. — *Adding a Notebook to your Fabric Workspace*

We will add our Lakehouse to the Notebook on the left hand side. Once in place, we can add our code using PySpark into a cell:

Screenshot of our Notebook in Microsoft Fabric that has been connected to a Lakehouse and PySpark code to create a table. — *Preparing our Notebook to build tables in our Lakehouse*

Using documentation from the Microsoft Graph API, I was able to identify the endpoints and their associated fields. This allowed me to create a block of code that would create our teams_user_activity table:

#Build out teams_user_activity table in the Lakehouse

from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType, BooleanType
schema = StructType([
    StructField("Report_Refresh_Date", DateType(), True),
    StructField("User_Id", StringType(), True),
    StructField("User_Principal_Name", StringType(), True),
    StructField("Last_Activity_Date", DateType(), True),
    StructField("Is_Deleted", BooleanType(), True),
    StructField("Deleted_Date", StringType(), True),
    StructField("Assigned_Products", StringType(), True),
    StructField("Team_Chat_Message_Count", IntegerType(), True),
    StructField("Private_Chat_Message_Count", IntegerType(), True),
    StructField("Call_Count", IntegerType(), True),
    StructField("Meeting_Count", IntegerType(), True),
    StructField("Meetings_Organized_Count", IntegerType(), True),
    StructField("Meetings_Attended_Count", IntegerType(), True),
    StructField("Ad_Hoc_Meetings_Organized_Count", IntegerType(), True),
    StructField("Ad_Hoc_Meetings_Attended_Count", IntegerType(), True),
    StructField("Scheduled_One-time_Meetings_Organized_Count", IntegerType(), True),
    StructField("Scheduled_One-time_Meetings_Attended_Count", IntegerType(), True),
    StructField("Scheduled_Recurring_Meetings_Organized_Count", IntegerType(), True),
    StructField("Scheduled_Recurring_Meetings_Attended_Count", IntegerType(), True),
    StructField("Audio_Duration_In_Seconds", IntegerType(), True),
    StructField("Video_Duration_In_Seconds", IntegerType(), True),
    StructField("Screen_Share_Duration_In_Seconds", IntegerType(), True),
    StructField("Has_Other_Action", StringType(), True),
    StructField("Urgent_Messages", IntegerType(), True),
    StructField("Post_Messages", IntegerType(), True),
    StructField("Tenant_Display_Name", StringType(), True),
    StructField("Shared_Channel_Tenant_Display_Names", StringType(), True),
    StructField("Reply_Messages", IntegerType(), True),
    StructField("Is_Licensed", StringType(), True),
    StructField("Report_Period", IntegerType(), True)
])
df = spark.createDataFrame([], schema)
df.write.format("delta").saveAsTable("teams_user_activity")

In our scenario, we will need five tables for Teams, OneDrive, SharePoint, Email, and Viva Engage. To simplify the process for you, I have created an notebook you can upload into Fabric and run. You just need to connect your Lakehouse and select run all. To perform this task, you must switch to the Data Engineering context of Fabric and select Import on your workspace:

Screenshot of the process to import a notebook into your Fabric Workspace — *Importing a Notebook into your Microsoft Fabric Workspace*

Once you have run the script successfully, you can go to your SQL Endpoint for the Lakehouse and verify the tables have been created and that they are empty with the following query:

SELECT * FROM teams_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM onedrive_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM sharepoint_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM email_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM viva_engage_user_activity ORDER BY Report_Refresh_Date DESC

Preparing Your Dataflow

After creating your Lakehouse and preparing your tables, you can create a Dataflow in the same workspace. This will query the data and push it to the right tables within our Lakehouse. However, you need to be careful to select a Dataflow Gen2 as opposed to a Dataflow:

Screenshot of the screen where you can select a Dataflow Gen2 to deploy.

Adding a Dataflow Gen2 to your Fabric workspace

This is critical for us to ensure our data makes it into the Lakehouse later on, so be careful with your selection! Once you select it, you can rename your Dataflow in the upper left hand corner:

Screenshot of renaming your Dataflow in Microsoft Fabric. — *Renaming your Dataflow*

With your Lakehouse and Dataflow created, it is time to start getting some data.

Connecting to the Microsoft Graph API

If you remember from my last article, we will be using the application permissions with our app registration. As a result, we must acquire a OAuth2 token to facilitate the authentication process. Luckily, we made it easy for you to deploy this in your solution.

Adding Parameters in Power Query

I covered the basics of Parameters in Power Query in an older article. Instead of going through the details, I will just highlight what we need to have created.

To facilitate our authentication, we need to create three parameters that will be used to hold data from our app registration. The three parameters we need are as follows:

Parameter Name	Parameter Type	Parameter Value
TenantId	Text	Directory (tenant) ID
AppId	Text	Application (client) ID
ClientSecret	Text	Client secret value

Parameters for our authentication process

Remember, capitalization counts! If you copy/paste the parameters above, you will not have an issue with the next step. It seems easy, but take great care with this process!

Creating an Authentication Function

Once you have your parameters set, you need to create a new blank query. To create what we need to make this connection happen, it requires you to build out the query in code. However, I have made that easier for you. Just copy the code below and paste it into the advanced editor with your new query:

() =>
  let
    ContentBody = "Content-Type=application/x-www-form-urlencoded&
        scope=https://graph.microsoft.com/.default&
        grant_type=client_credentials&
        client_id=" & AppId & "&
        client_secret=" & ClientSecret, 
    Source = Json.Document(
      Web.Contents(
        "https://login.microsoftonline.com/" & TenantId & "/oauth2/v2.0/token", 
        [
          Content = Text.ToBinary(ContentBody), 
          Headers = [
            Accept = "application/json", 
            ContentType = "application/x-www-form-urlencoded"
            ]
        ]
      )
    )[access_token]
  in
    Source

Once we have pasted the code into the query, click next. After that, rename the query to Get-AuthToken. Again, capitalization is important so take care with this step!

Now that we have our authentication function in place, it is time to take the next step towards building out our Microsoft Graph API functions to pull user information and Teams user activity.

Conclusion and Next Steps

There is a lot of setup involved in this process. Part of the reason is because of the complexity of the pull we need to perform, but also to avoid running into duplicates and reduce the chance for throttling with the endpoints. I know this seems like a lot of work, but I promise it will pay off in the end!

In my next post, we will finally grab our Graph API data and load it into our Lakehouse! Stay tuned to see how it happens!