Making sense of challenging data topics one step at a time.

Category: Dataflows

Microsoft Graph API Reporting in Fabric – Part 3

Welcome back to my series on the Microsoft Graph API. We have spent a lot of time laying the groundwork for our data. In this article, it is all about getting our data into our Lakehouse so we can start to trend our data.

As a quick reminder, the purpose of us having to go through this exercise is because we must retain data on our own. The Microsoft Graph API will only furnish detailed data over the past thirty days. Any further back and it is lost. Therefore, if we want to trend data over three months, we must warehouse it.

In this article, we will be finalizing our queries and load them into our Lakehouse. To get there, we are going to go through some deep levels of M Query. However, hang in there until the end, I should be able to make it easier for you!

Getting Teams User Activity Detail

To get started, we are going to have to write out some code. While you can just copy/paste the code into this article, I think it is worth walking through together to help you understand the method behind the madness. It seems complicated – and it really is. But understanding the rationale of what was done will help you with your own projects. To help us on this journey, you will need to create three more parameters to simplify the process:

Parameter NameParameter TypeParameter Value
LakehouseIdTextValue of Lakehouse ID from URL
WorkspaceIdTextValue of Workspace ID from URL
HistoricalDayCountDecimal NumberList of values (5, 15, 27) with the default set to 5
Parameters required for capturing Microsoft Graph API data

With your parameters in place, it is time to write out our query. Open a new blank query and strap in!

Start With a Function

Because we are pulling data for a single date at a time, we will need to create a function. We have done this before, but this time we are going to embed it within an existing query:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData
in
  GraphDataFunction

We need to call a few things out. First and foremost, you will note we are using the RelativePath option. This is because our dates will be updated with each all of the function, therefore we must use this option to avoid errors.

Next, you will notice the Authorization header is using our Get-AuthToken function we created in part 2 of this series. This is key to ensure we use our token to complete the authentication process.

Checking the Lakehouse for Existing Data

We will need to ensure we only query Graph data that is not already in the Lakehouse. Because of how finicky the Graph API can be, we cannot guarantee data to be available the next day. Sometimes it is a few days behind. We also do not want duplicate data to be added to the Lakehouse.

To prevent this from happening, we will query the Lakehouse and the tables we created. By using the parameters we created, we will call our Lakehouse and pull our historical data so we can transform it:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"})
in
  HistoricalDatesExported

These five lines of code will allow us to connect to our teams_user_activity table, remove unnecessary columns, remove duplicates, and format into a usable column. This will be empty for right now, but more will be added as we move forward.

Create a Range of Dates and Merge with Historical Data

Next, we will pull data from the last few days. To make life easier on us, we will limit the pull to five days to speed up our development process. We just need to use today’s date as a base and create a range going back five days that we can use:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
  TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -5),
  ListDatesForExport = List.Dates(TodaysDate, 5, #duration(1, 0, 0, 0)),
  TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
  SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
  MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
  ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
  IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null))
in
  IdentifyMissingData

Once we have a list of dates in place, we will merge it to our historical data from the HistoricalDatesExported step. From there, expand the values and filter out the null values so we only have dates where we are missing data left.

Execute the Microsoft Graph API Function

Reaching all the way back to the first step, GraphDataFunction, we are going to pass our list of dates through to get our data. We will pass our column and pull our data through:

let
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath="getTeamsUserActivityUserDetail(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = "teams_user_activity", ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
  TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -5),
  ListDatesForExport = List.Dates(TodaysDate, 5, #duration(1, 0, 0, 0)),
  TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
  SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
  MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
  ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
  IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null)),
  ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
  FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
  MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], 33)),
  RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
  ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", {"Report Refresh Date", " User Id", " User Principal Name", " Last Activity Date", " Is Deleted", " Deleted Date", " Assigned Products", " Team Chat Message Count", " Private Chat Message Count", " Call Count", " Meeting Count", " Meetings Organized Count", " Meetings Attended Count", " Ad Hoc Meetings Organized Count", " Ad Hoc Meetings Attended Count", " Scheduled One-time Meetings Organized Count", " Scheduled One-time Meetings Attended Count", " Scheduled Recurring Meetings Organized Count", " Scheduled Recurring Meetings Attended Count", " Audio Duration In Seconds", " Video Duration In Seconds", " Screen Share Duration In Seconds", " Has Other Action", " Urgent Messages", " Post Messages", " Tenant Display Name", " Shared Channel Tenant Display Names", " Reply Messages", " Is Licensed", " Report Period"}),
  FilterMissingResults = Table.SelectRows(ExpandResults, each [Report Refresh Date] <> null)
in
  FilterMissingResults

After we execute the function, we will combine our files and expand the results. To do this, we are using a custom function titled Transform file to combine our files seamlessly:

(Parameter as binary, ColumnCount as number) => let
  Source = Csv.Document(Parameter, [Delimiter = ",", Columns = ColumnCount, Encoding = 65001, QuoteStyle = QuoteStyle.None]),
  #"Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
  #"Promoted headers"

The result should be something that looks like this:

Screenshot of Microsoft Graph API report data.
Screenshot of the Microsoft Graph API report data for Teams

Wow – that was a whole lot for a single endpoint. Only four more applications left! Frustrated that you will have to do this again? Well, instead of being frustrated, let’s make it a little easier.

Turning Microsoft Graph API Reports into a Flexible Function

We have worked so hard to get the first query in place. But if we think through things a little further, we can simplify the process for future endpoints. Instead of going through this entire process over and over again, we can convert this to a function and pass data to make it easier for us to use.

Prepare Your Endpoint Data

To simplify our process, we will start by creating a table we can reference. This will highlight the application, Microsoft Graph API endpoint, Fabric Lakehouse table, columns pulled from the endpoint, and the column names. Create a blank query called Endpoints and add the following code:

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3VXbbtswDP0Vwc/pQ9cvCOIMDZCtRt0FGIIiECzOIWpTgURny75+lOPc5aB+3YttkkeUeXRILZfJtNZYJaMEwnvVeHArXTBukXfiLYFbwLhz/ZB4Crxf8vgkj1fYWMfqFX458GuVaoaRCjCVOaQCN7pS33UtzhT9ptK7zpp5lUIFDGZ0+OgWz7Vnddiw8+VARk1sQzySrQrALZxMfQx9A2CkUk0c6JDwyj0jBifVnSJj77EksTNnTVOwD/nagjJwaE3yPlomLwSpkx2lWivY8Bkj6gCLc/U4iKth7CwQfgvgxampwQD9itWRn3xHxbVrrZ24WjpIV9UuFp3+iUY/xVibIrNILFX7YGyCEWPtBI3z9uU/4W2Bvt0i0+UwLt9A115I4PCOMdgCAhtRAp/uN+nM9DAZ46uf3Ugl4bfUZK1Z2s/7s6plp60sisYmwttNM1+YXo6r1IR/b9rbqzGzzImz7jbq2RZ3Vl4DrhPkxRpMUwWNEDww1nAn2T1wf2KZZo1zoc7PZI6gb4puDFqVNk4zWhKxyvAsLBkfNGigJ5QXDoD2Go4jnrX8Ga9FKkEVlkQ3rgQ6HqFAMuvPzTcgLfHLud91iRw+EVQxzL4LpH1OmUR4cyyAfFDebYssRFBqSqWARexbsVbQWrF2+anruqdXBg7py8JaSM7764o7gZNIu3/+BL7AxC6yOX4MuaTe/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Application = _t, Lakehouse_Table = _t, Graph_Endpoint = _t, Column_Count = _t, Column_Mapping = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"Column_Count", Int64.Type}, {"Application", type text}, {"Lakehouse_Table", type text}, {"Graph_Endpoint", type text}, {"Column_Mapping", type text}})
in
  #"Changed column type"

You can choose to pull this data from somewhere else if it is easier for you, but this table is easy to maintain in this scenario.

Convert our Query to a Function

Our next step is to take the first query we created in this article and make it a function. We will start by adding the parameter for an application at the top and filter the Endpoints table with the value passed in the parameter:

(Application as text) =>
let
  EndpointTable = Endpoints,
  EndpointData = Table.SelectRows(EndpointTable, each [Application] = Application),
  GraphDataFunction = (ExportDate as date) =>
    let
      GraphData = Web.Contents("https://graph.microsoft.com/v1.0/reports/", 
        [RelativePath=EndpointData{0}[Graph_Endpoint] & "(date=" & Date.ToText(ExportDate, "yyyy-MM-dd") & ")",
        Headers=[Accept="application/json",Authorization="Bearer " & #"Get-AuthToken"()]])
    in
      GraphData, 

  //Endpoint = "teams_user_activity",
  LakehouseConnection = Lakehouse.Contents(null){[workspaceId = WorkspaceId]}[Data]{[lakehouseId = LakehouseId]}[Data],
  HistoricalExports = LakehouseConnection{[Id = EndpointData{0}[Lakehouse_Table], ItemKind = "Table"]}[Data],
  ReportRefreshDates = Table.SelectColumns(HistoricalExports, {"Report_Refresh_Date"}),
  FilterRefreshDates = Table.SelectRows(ReportRefreshDates, each [Report_Refresh_Date] <> null and [Report_Refresh_Date] <> ""),
  HistoricalDatesExported = Table.Distinct(FilterRefreshDates, {"Report_Refresh_Date"}),
  TodaysDate = Date.AddDays(Date.From(DateTime.LocalNow()), -HistoricalDayCount),
  ListDatesForExport = List.Dates(TodaysDate, HistoricalDayCount, #duration(1, 0, 0, 0)),
  TableOfDates = Table.FromList(ListDatesForExport, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  RenameDateColumn = Table.RenameColumns(TableOfDates, {{"Column1", "ExportDates"}}),
  SetProperDataType = Table.TransformColumnTypes(RenameDateColumn, {{"ExportDates", type date}}),
  MergeHistoricalData = Table.NestedJoin(SetProperDataType, {"ExportDates"}, HistoricalDatesExported, {"Report_Refresh_Date"}, "HistoricalDatesExported", JoinKind.LeftOuter),
  ExpandHistoricalDates = Table.ExpandTableColumn(MergeHistoricalData, "HistoricalDatesExported", {"Report_Refresh_Date"}, {"Report_Refresh_Date"}),
  IdentifyMissingData = Table.SelectRows(ExpandHistoricalDates, each ( [Report_Refresh_Date] = null)),
  ExecuteGraphFunction = Table.AddColumn(IdentifyMissingData, "Invoked custom function", each GraphDataFunction([ExportDates])),
  FilterMissingData = Table.SelectRows(ExecuteGraphFunction, each [Attributes]?[Hidden]? <> true),
  MergeExportedFiles = Table.AddColumn(FilterMissingData, "Transform file", each #"Transform file"([Invoked custom function], EndpointData{0}[Column_Count])),
  RemoveNonDataColumns = Table.SelectColumns(MergeExportedFiles, {"Transform file"}),
  ExpandResults = Table.ExpandTableColumn(RemoveNonDataColumns, "Transform file", Text.Split(EndpointData{0}[Column_Mapping], ", ")),
  FilterMissingResults = Table.SelectRows(ExpandResults, each [Report Refresh Date] <> null)
in
  FilterMissingResults

From there, we will call the data from the EndpointData step and inject it into our query with the following items:

  • In the GraphDataFunction step, place a dynamic endpoint name
  • In the HistoricalExports step, place a dynamic Lakehouse table name
  • In the MergeExportedFiles step, place a dynamic column count in the Transform file function
  • In the ExpandResults step, place dynamic column mappings and wrap it with a Text.Split() function

Now that is completed, we can call the function by typing in the application name and clicking invoke:

Screenshot of us invoking our Microsoft Graph API reports function
Invoking our Microsoft Graph API reports function.

With this in place, we can invoke our function for OneDrive, SharePoint, Email, and Viva Engage with a few clicks!

Get Current Users from Entra ID

For our last step, we just need to pull our users out of Entra ID into our Dataflow. To make this happen, I like to use a function to make it easier. I created a function titled Get-Users and invoked it with this code:

() =>
let
  GetUserDetail = (Params) =>
  let    
    Path = if Text.Length(Params) > 0  then Params
      else "$select=displayName,givenName,jobTitle,mail,officeLocation,surname,userPrincipalName,id,accountEnabled,userType,companyName,department,employeeType&$expand=manager",  
    Source = Json.Document(Web.Contents("https://graph.microsoft.com/v1.0/users",[RelativePath = "?" & Path, Headers=[Authorization="Bearer " & #"Get-AuthToken"()]])),  
    NextList = @Source[value], result = try @NextList & @GetUserDetail(Text.AfterDelimiter(Source[#"@odata.nextLink"], "?")) otherwise @NextList    
  in
    result,
  UserDetail = GetUserDetail("")
in
  UserDetail

You will notice the function inside of the function. This is because we will need to iterate through the entire user list to receive all users. Because the API limits us to batches of 1,000, we will need to work through this a few times to get the results we desire.

HELP! THIS IS TOO MUCH!

Is your head spinning? Mine too. In fact, this was a lot to even get this coded. And if I ever want to reuse this, I do not want to go through any of this level of effort again. I am probably going to be like that math teacher you had where you learned how to do everything the hard way to pass an exam and then the next chapter they teach you a way to complete a problem easier. I hated it when that happened to be, but I am still going to do it to you.

To streamline this, I created a Power Query template for this project. You can download the template and import it into your new Dataflow:

Screenshot of the Microsoft Graph API Power Query template being imported into a dataflow.
Importing a Power Query template for our Microsoft Graph API reports

Once imported, you will need to update the following parameters:

  • TenantId
  • AppId
  • ClientSecret
  • WorkspaceId
  • LakehouseId

This should streamline the process considerably if you are just getting started!

From Dataflow to Lakehouse

We now just need to map our data into our Lakehouse. This will take a little diligence, but it should move along pretty quickly. We just need to match our queries and columns from our Dataflows into tables within our Lakehouse.

Adding Teams Data to the Lakehouse

Starting with our Teams user activity data, we will go to the bottom right-hand corner of the screen and click the + icon to add it to our Lakehouse. Once you have a connection to your Lakehouse, we will use an existing table and navigate to the table where your data is stored:

Screenshot of placing Microsoft Graph API data into our Lakehouse
Adding Teams data into our Lakehouse

Once we have selected our table, we need to map our columns. We also need to update the setting that will allow us to append our data:

Mapping our Microsoft Graph API report fields into our Fabric Lakehouse
Mapping columns and appending our data

Now we just need to save our settings. Once we publish and refresh our dataflow, our data will be added into the Lakehouse. We just need to repeat these steps for OneDrive, SharePoint, Email, and Viva Engage.

Adding Graph Users to the Lakehouse

If you remember from my last article, we created all of the table for our Microsoft Graph API report endpoints. However, we did not need to create anything for our users. Why is that?!

The answer is simple – with our approach, we will optimize the refresh experience and eliminate data duplication for these dates. This is because it is a moving target and hard to hit. However, with our users, it is a different story as we will replace our data each time. So, just like we did before, we will go ahead and add our Lakehouse as a data destination. However, this time we will create a new table. Once in place, we can make adjustments to our mappings and settings:

Screenshot of our Entra ID users mapping to the Fabric Lakehouse
Mapping Entra ID users to the Lakehouse

Once completed, we can click save and move to publishing our Dataflow.

One More Thing Before We Publish!

Last thing we need to do is adjust our HistoricalDayCounter from 5 days to 27 days. This will ensure that all the historical data available is captured in the refresh and will be loaded into the Lakehouse:

Screenshot of the HistoricalDayCounter parameter being updated to 30 days.
Update our HistoricalDayCounter to 27 days

Once completed, we can click publish. It will take a few minutes to validate the Dataflow. After that, we just need to setup a daily refresh and our data will start to populate into the Lakehouse!

Next Steps and Conclusion

Alright, that was a ton of information in a single article. It was hard to split up, but I wanted to keep it together in a single article as these pieces were all interconnected. It is a lot, but it is complete. Now that you have your data, you can create your model, measures, and start building out your reports.

In other news, I have been able to get the endpoint for Copilot. However, there are some data quality challenges that I want to address before doing a full write-up. In the meantime, this will get you started with understanding how complicated this level of reporting can be.

Did you make it to the end? Were you successful? If so, tell me about it in the comments below!

Microsoft Graph API Reporting in Fabric – Part 2

In my last post, I went over they why and covered the building blocks for connecting to the Microsoft Graph API for reporting. With this in place, it is time to prepare the landing zone for our data in Microsoft Fabric! To achieve this, we will need to setup our Fabric environment so we can start to build out our solution.

Before we get started, if you do not have Microsoft Fabric, do not fear! You can start a Microsoft Fabric trial for 60 days for free. When you sign into Power BI, you can click on your user avatar, and start your trial. If this is not available to you, it might be because your Fabric Administrator has turned off the trial feature. You might need to submit a request to re-enable this feature.

Creating Your Lakehouse

To get things started, we need to create a new workspace that has a Fabric capacity assigned to it. Just like you would before with Power BI, create a new workspace inside of Microsoft Fabric and ensure it is tied to your capacity.

Once you have your workspace in place, it is time to build your Lakehouse. If you go to the new button in the upper left hand corner of the workspace window, a screen will appear where you can select a Lakehouse:

Screenshot of the screen where you can select a Lakehouse to deploy.
Deploying a Lakehouse in Microsoft Fabric

From there, you can give your Lakehouse a name and click create. You will be redirected to your new Lakehouse. Once created, we will need to extract the Workspace ID and Lakehouse ID from the URL. The easiest way to capture this is to just copy the URL in the address bar to a notepad where you kept your Service Principal data in part 1 of this series. Once you capture this information, you can go can close your Lakehouse:

https://app.powerbi.com/groups/<WorkspaceId>/lakehouses/<LakehouseId>?experience=power-bi

Preparing Your Lakehouse

Now that you have a Lakehouse in place, it is time to prepare the tables. While there is an opportunity to create these tables directly out of our Dataflow, we will want to create a few ahead of time. This is simply to reduce the amount of effort on your side. The goal is to make it easy, but the process can be complex. This should make the process a little more seamless for you as a result.

While I am going to give you a Notebook you can upload, we will pause and create one from scratch so you understand the process. When you click new item in your workspace, we will scroll down to prepare data and select Notebook:

Screenshot of adding a Notebook to your Fabric Workspace.
Adding a Notebook to your Fabric Workspace

We will add our Lakehouse to the Notebook on the left hand side. Once in place, we can add our code using PySpark into a cell:

Screenshot of our Notebook in Microsoft Fabric that has been connected to a Lakehouse and PySpark code to create a table.
Preparing our Notebook to build tables in our Lakehouse

Using documentation from the Microsoft Graph API, I was able to identify the endpoints and their associated fields. This allowed me to create a block of code that would create our teams_user_activity table:

#Build out teams_user_activity table in the Lakehouse

from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType, BooleanType
schema = StructType([
    StructField("Report_Refresh_Date", DateType(), True),
    StructField("User_Id", StringType(), True),
    StructField("User_Principal_Name", StringType(), True),
    StructField("Last_Activity_Date", DateType(), True),
    StructField("Is_Deleted", BooleanType(), True),
    StructField("Deleted_Date", StringType(), True),
    StructField("Assigned_Products", StringType(), True),
    StructField("Team_Chat_Message_Count", IntegerType(), True),
    StructField("Private_Chat_Message_Count", IntegerType(), True),
    StructField("Call_Count", IntegerType(), True),
    StructField("Meeting_Count", IntegerType(), True),
    StructField("Meetings_Organized_Count", IntegerType(), True),
    StructField("Meetings_Attended_Count", IntegerType(), True),
    StructField("Ad_Hoc_Meetings_Organized_Count", IntegerType(), True),
    StructField("Ad_Hoc_Meetings_Attended_Count", IntegerType(), True),
    StructField("Scheduled_One-time_Meetings_Organized_Count", IntegerType(), True),
    StructField("Scheduled_One-time_Meetings_Attended_Count", IntegerType(), True),
    StructField("Scheduled_Recurring_Meetings_Organized_Count", IntegerType(), True),
    StructField("Scheduled_Recurring_Meetings_Attended_Count", IntegerType(), True),
    StructField("Audio_Duration_In_Seconds", IntegerType(), True),
    StructField("Video_Duration_In_Seconds", IntegerType(), True),
    StructField("Screen_Share_Duration_In_Seconds", IntegerType(), True),
    StructField("Has_Other_Action", StringType(), True),
    StructField("Urgent_Messages", IntegerType(), True),
    StructField("Post_Messages", IntegerType(), True),
    StructField("Tenant_Display_Name", StringType(), True),
    StructField("Shared_Channel_Tenant_Display_Names", StringType(), True),
    StructField("Reply_Messages", IntegerType(), True),
    StructField("Is_Licensed", StringType(), True),
    StructField("Report_Period", IntegerType(), True)
])
df = spark.createDataFrame([], schema)
df.write.format("delta").saveAsTable("teams_user_activity")

In our scenario, we will need five tables for Teams, OneDrive, SharePoint, Email, and Viva Engage. To simplify the process for you, I have created an notebook you can upload into Fabric and run. You just need to connect your Lakehouse and select run all. To perform this task, you must switch to the Data Engineering context of Fabric and select Import on your workspace:

Screenshot of the process to import a notebook into your Fabric Workspace
Importing a Notebook into your Microsoft Fabric Workspace

Once you have run the script successfully, you can go to your SQL Endpoint for the Lakehouse and verify the tables have been created and that they are empty with the following query:

SELECT * FROM teams_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM onedrive_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM sharepoint_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM email_user_activity ORDER BY Report_Refresh_Date DESC;

SELECT * FROM viva_engage_user_activity ORDER BY Report_Refresh_Date DESC

Preparing Your Dataflow

After creating your Lakehouse and preparing your tables, you can create a Dataflow in the same workspace. This will query the data and push it to the right tables within our Lakehouse. However, you need to be careful to select a Dataflow Gen2 as opposed to a Dataflow:

Screenshot of the screen where you can select a Dataflow Gen2 to deploy.

Adding a Dataflow Gen2 to your Fabric workspace

This is critical for us to ensure our data makes it into the Lakehouse later on, so be careful with your selection! Once you select it, you can rename your Dataflow in the upper left hand corner:

Screenshot of renaming your Dataflow in Microsoft Fabric.
Renaming your Dataflow

With your Lakehouse and Dataflow created, it is time to start getting some data.

Connecting to the Microsoft Graph API

If you remember from my last article, we will be using the application permissions with our app registration. As a result, we must acquire a OAuth2 token to facilitate the authentication process. Luckily, we made it easy for you to deploy this in your solution.

Adding Parameters in Power Query

I covered the basics of Parameters in Power Query in an older article. Instead of going through the details, I will just highlight what we need to have created.

To facilitate our authentication, we need to create three parameters that will be used to hold data from our app registration. The three parameters we need are as follows:

Parameter NameParameter TypeParameter Value
TenantIdTextDirectory (tenant) ID
AppIdTextApplication (client) ID
ClientSecretTextClient secret value
Parameters for our authentication process

Remember, capitalization counts! If you copy/paste the parameters above, you will not have an issue with the next step. It seems easy, but take great care with this process!

Creating an Authentication Function

Once you have your parameters set, you need to create a new blank query. To create what we need to make this connection happen, it requires you to build out the query in code. However, I have made that easier for you. Just copy the code below and paste it into the advanced editor with your new query:

() =>
  let
    ContentBody = "Content-Type=application/x-www-form-urlencoded&
        scope=https://graph.microsoft.com/.default&
        grant_type=client_credentials&
        client_id=" & AppId & "&
        client_secret=" & ClientSecret, 
    Source = Json.Document(
      Web.Contents(
        "https://login.microsoftonline.com/" & TenantId & "/oauth2/v2.0/token", 
        [
          Content = Text.ToBinary(ContentBody), 
          Headers = [
            Accept = "application/json", 
            ContentType = "application/x-www-form-urlencoded"
            ]
        ]
      )
    )[access_token]
  in
    Source

Once we have pasted the code into the query, click next. After that, rename the query to Get-AuthToken. Again, capitalization is important so take care with this step!

Now that we have our authentication function in place, it is time to take the next step towards building out our Microsoft Graph API functions to pull user information and Teams user activity.

Conclusion and Next Steps

There is a lot of setup involved in this process. Part of the reason is because of the complexity of the pull we need to perform, but also to avoid running into duplicates and reduce the chance for throttling with the endpoints. I know this seems like a lot of work, but I promise it will pay off in the end!

In my next post, we will finally grab our Graph API data and load it into our Lakehouse! Stay tuned to see how it happens!

Powered by WordPress & Theme by Anders Norén