In my last post, I went over the why and covered the building blocks for connecting to the Microsoft Graph API for reporting. With that in place, it is time to prepare the landing zone for our data in Microsoft Fabric! To achieve this, we will need to set up our Fabric environment so we can start building out our solution.
Before we get started, if you do not have Microsoft Fabric, do not fear! You can start a Microsoft Fabric trial for free for 60 days. When you sign in to Power BI, click on your user avatar and start your trial. If this option is not available to you, it might be because your Fabric administrator has turned off the trial feature, and you might need to submit a request to re-enable it.
Creating Your Lakehouse
To get things started, we need to create a new workspace that has a Fabric capacity assigned to it. Just as you would with Power BI, create a new workspace inside Microsoft Fabric and ensure it is tied to your capacity.
Once you have your workspace in place, it is time to build your Lakehouse. If you go to the New button in the upper left-hand corner of the workspace window, a screen will appear where you can select a Lakehouse:
From there, you can give your Lakehouse a name and click Create. You will be redirected to your new Lakehouse. Once it is created, we will need to extract the Workspace ID and Lakehouse ID from the URL. The easiest way to capture these is to copy the URL from the address bar into the notepad file where you kept your Service Principal data in part 1 of this series. Once you have captured this information, you can close your Lakehouse:
https://app.powerbi.com/groups/<WorkspaceId>/lakehouses/<LakehouseId>?experience=power-bi
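If you prefer to script that step, here is a minimal Python sketch (using the placeholder URL above) showing how the two IDs can be pulled out of the address. It is purely illustrative; copying the values by hand into your notepad works just as well.

import re

# Placeholder: paste the URL you copied from the address bar here
url = "https://app.powerbi.com/groups/<WorkspaceId>/lakehouses/<LakehouseId>?experience=power-bi"

# The Workspace ID sits after /groups/ and the Lakehouse ID after /lakehouses/
match = re.search(r"/groups/([^/]+)/lakehouses/([^/?]+)", url)
if match:
    workspace_id, lakehouse_id = match.groups()
    print(f"Workspace ID: {workspace_id}")
    print(f"Lakehouse ID: {lakehouse_id}")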
Preparing Your Lakehouse
Now that you have a Lakehouse in place, it is time to prepare the tables. While we could create these tables directly from our Dataflow, we will create a few of them ahead of time. This reduces the effort on your side later and, because the overall process can be complex, should make things a little more seamless for you.
While I am going to give you a Notebook you can upload, let's pause and create one from scratch so you understand the process. Click New item in your workspace, scroll down to Prepare data, and select Notebook:
We will add our Lakehouse to the Notebook on the left-hand side. Once it is in place, we can add our PySpark code to a cell:
Using the Microsoft Graph API documentation, I identified the endpoints and their associated fields. This allowed me to create a block of code that creates our teams_user_activity table:
# Build out the teams_user_activity table in the Lakehouse
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType, BooleanType

schema = StructType([
    StructField("Report_Refresh_Date", DateType(), True),
    StructField("User_Id", StringType(), True),
    StructField("User_Principal_Name", StringType(), True),
    StructField("Last_Activity_Date", DateType(), True),
    StructField("Is_Deleted", BooleanType(), True),
    StructField("Deleted_Date", StringType(), True),
    StructField("Assigned_Products", StringType(), True),
    StructField("Team_Chat_Message_Count", IntegerType(), True),
    StructField("Private_Chat_Message_Count", IntegerType(), True),
    StructField("Call_Count", IntegerType(), True),
    StructField("Meeting_Count", IntegerType(), True),
    StructField("Meetings_Organized_Count", IntegerType(), True),
    StructField("Meetings_Attended_Count", IntegerType(), True),
    StructField("Ad_Hoc_Meetings_Organized_Count", IntegerType(), True),
    StructField("Ad_Hoc_Meetings_Attended_Count", IntegerType(), True),
    StructField("Scheduled_One-time_Meetings_Organized_Count", IntegerType(), True),
    StructField("Scheduled_One-time_Meetings_Attended_Count", IntegerType(), True),
    StructField("Scheduled_Recurring_Meetings_Organized_Count", IntegerType(), True),
    StructField("Scheduled_Recurring_Meetings_Attended_Count", IntegerType(), True),
    StructField("Audio_Duration_In_Seconds", IntegerType(), True),
    StructField("Video_Duration_In_Seconds", IntegerType(), True),
    StructField("Screen_Share_Duration_In_Seconds", IntegerType(), True),
    StructField("Has_Other_Action", StringType(), True),
    StructField("Urgent_Messages", IntegerType(), True),
    StructField("Post_Messages", IntegerType(), True),
    StructField("Tenant_Display_Name", StringType(), True),
    StructField("Shared_Channel_Tenant_Display_Names", StringType(), True),
    StructField("Reply_Messages", IntegerType(), True),
    StructField("Is_Licensed", StringType(), True),
    StructField("Report_Period", IntegerType(), True)
])

df = spark.createDataFrame([], schema)
df.write.format("delta").saveAsTable("teams_user_activity")
In our scenario, we will need five tables: Teams, OneDrive, SharePoint, Email, and Viva Engage. To simplify the process for you, I have created a notebook you can upload into Fabric and run. You just need to connect your Lakehouse and select Run all. To perform this task, switch to the Data Engineering experience in Fabric and select Import in your workspace:
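If you are curious what that notebook is doing under the hood, it simply repeats the pattern above for each report: define a schema, create an empty DataFrame, and save it as a Delta table. Below is a condensed sketch of that loop. The three columns shown are abbreviated placeholders for illustration only; run the provided notebook to create the tables with their full schemas.

# Condensed sketch only; the downloadable notebook defines the full schema for each report
from pyspark.sql.types import StructType, StructField, StringType, DateType

# Placeholder columns shared by the usage reports; each real schema has many more fields
placeholder_schema = StructType([
    StructField("Report_Refresh_Date", DateType(), True),
    StructField("User_Principal_Name", StringType(), True),
    StructField("Last_Activity_Date", DateType(), True)
])

# teams_user_activity was already created in the cell above
remaining_tables = [
    "onedrive_user_activity",
    "sharepoint_user_activity",
    "email_user_activity",
    "viva_engage_user_activity"
]

for table_name in remaining_tables:
    df = spark.createDataFrame([], placeholder_schema)
    df.write.format("delta").saveAsTable(table_name)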
Once you have run the notebook successfully, you can go to the SQL Endpoint for your Lakehouse and verify that the tables have been created and are empty with the following queries:
SELECT * FROM teams_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM onedrive_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM sharepoint_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM email_user_activity ORDER BY Report_Refresh_Date DESC;
SELECT * FROM viva_engage_user_activity ORDER BY Report_Refresh_Date DESC;
Preparing Your Dataflow
After creating your Lakehouse and preparing your tables, you can create a Dataflow in the same workspace. This will query the data and push it to the right tables within our Lakehouse. However, you need to be careful to select a Dataflow Gen2 as opposed to a standard Dataflow:
Adding a Dataflow Gen2 to your Fabric workspace
This is critical because only a Dataflow Gen2 can write our data to the Lakehouse later on, so be careful with your selection! Once you have created it, you can rename your Dataflow in the upper left-hand corner:
With your Lakehouse and Dataflow created, it is time to start getting some data.
Connecting to the Microsoft Graph API
If you remember from my last article, we will be using application permissions with our app registration. As a result, we must acquire an OAuth2 token via the client credentials flow to authenticate our requests. Luckily, I have made it easy for you to deploy this in your solution.
Adding Parameters in Power Query
I covered the basics of parameters in Power Query in an older article, so instead of going through the details again, I will just highlight what we need to create.
To facilitate our authentication, we need to create three parameters that will be used to hold data from our app registration. The three parameters we need are as follows:
| Parameter Name | Parameter Type | Parameter Value |
| --- | --- | --- |
| TenantId | Text | Directory (tenant) ID |
| AppId | Text | Application (client) ID |
| ClientSecret | Text | Client secret value |
Remember, capitalization counts! The authentication function below references these parameter names directly, so if you copy and paste them exactly as shown above, you will not have an issue with the next step. It seems easy, but take great care with this process!
Creating an Authentication Function
Once you have your parameters set, you need to create a new blank query. Making this connection happen requires building out the query in code, but I have made that easier for you. Just copy the code below and paste it into the Advanced editor for your new query:
() =>
let
    ContentBody = "Content-Type=application/x-www-form-urlencoded"
        & "&scope=https://graph.microsoft.com/.default"
        & "&grant_type=client_credentials"
        & "&client_id=" & AppId
        & "&client_secret=" & ClientSecret,
    Source = Json.Document(
        Web.Contents(
            "https://login.microsoftonline.com/" & TenantId & "/oauth2/v2.0/token",
            [
                Content = Text.ToBinary(ContentBody),
                Headers = [
                    Accept = "application/json",
                    #"Content-Type" = "application/x-www-form-urlencoded"
                ]
            ]
        )
    )[access_token]
in
    Source
Once you have pasted the code into the query, click Next. After that, rename the query to Get-AuthToken. Again, capitalization is important, so take care with this step!
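If you want to sanity-check your app registration before wiring it into the Dataflow, the same client credentials request can be made outside of Power Query. Here is a minimal Python sketch using the requests library, with placeholder values standing in for your three parameters; if the Service Principal from part 1 is configured correctly, it should return an access token.

import requests

# Placeholders: use the same values as your TenantId, AppId, and ClientSecret parameters
tenant_id = "<TenantId>"
app_id = "<AppId>"
client_secret = "<ClientSecret>"

# Same client credentials request that the Get-AuthToken function makes
response = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
    data={
        "client_id": app_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
        "grant_type": "client_credentials"
    }
)
response.raise_for_status()
print(response.json()["access_token"][:20] + "...")  # prints the start of the token if it worked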
Now that we have our authentication function in place, it is time to take the next step towards building out our Microsoft Graph API functions to pull user information and Teams user activity.
Conclusion and Next Steps
There is a lot of setup involved in this process. Part of the reason is the complexity of the data pull we need to perform, but this setup also helps us avoid duplicate records and reduces the chance of being throttled by the endpoints. I know it seems like a lot of work, but I promise it will pay off in the end!
In my next post, we will finally grab our Graph API data and load it into our Lakehouse! Stay tuned to see how it happens!