We all want to hone our skills, but sometimes struggle to find good sample data sets to try out new ideas. Sometimes you need a specific data structure, or you want to show off an idea but cannot use production data. A lot of what I share comes from real scenarios I have encountered. To share these tips, I have had to use different data sources over the years.
This week, I want to share some data sets that I find fun and helpful for trying ideas out in Power BI. Some will be easier to use than others. You might even need to have a SQL Server to make them work. Regardless, you should be able to find something you can use.
The Most Basic of All Sample Data Sets
If you are brand new to Power BI, the Contoso Financial Sample workbook is a great place to start. It is a free and easy-to-use data set for beginners. While it is not a great resource for data modeling, it does serve as a quick and easy model to learn the basics of Power BI.
When I was a Power BI trainer, I liked using this data set for basic DAX calculations, as some of the key measures such as cost of goods sold (COGS) were included in the model. I could perform some basic calculations that result in net profit.
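If you want to check your measure logic outside of Power BI, the arithmetic behind that kind of calculation can be sketched in a few lines of Python. This is a minimal sketch, not the DAX itself: it assumes profit is simply sales minus COGS, and the `SalesRow` fields and the two sample rows are made up for illustration rather than taken from the workbook.

```python
from dataclasses import dataclass


@dataclass
class SalesRow:
    """One row of a sales table; field names are assumptions for illustration."""
    product: str
    sales: float  # revenue for the row
    cogs: float   # cost of goods sold for the row


def net_profit(rows):
    """Sum sales minus COGS across all rows."""
    return sum(row.sales - row.cogs for row in rows)


def profit_margin(rows):
    """Net profit as a fraction of total sales; 0.0 when there are no sales."""
    total_sales = sum(row.sales for row in rows)
    if total_sales == 0:
        return 0.0
    return net_profit(rows) / total_sales


# Made-up rows, not taken from the actual workbook.
rows = [
    SalesRow("Widget A", sales=1000.0, cogs=600.0),
    SalesRow("Widget B", sales=500.0, cogs=400.0),
]
print(net_profit(rows))     # 500.0
print(profit_margin(rows))  # roughly 0.33
```

The zero check in `profit_margin` mirrors the divide-by-zero handling you would want in any ratio measure.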
Check out the Contoso Financial Sample data set here.
Learn How to Find Insights
Another one of my favorite sample data sets that is easy to use is the Pima Indians Diabetes Database from Kaggle.com. Like some of you, I cringe typing out the name of this data set. If published more recently, it would have likely been given a more culturally sensitive name. However, I use this data set for demonstrating the key influencers visual. I have also used it for predictive modeling with Azure Machine Learning, but that is for another day.
This data set was assembled by the National Institute of Diabetes and Digestive and Kidney Diseases. The purpose of the data set was to perform predictive modeling on diabetes in the Pima community. The Pima community has been identified as a high-risk population for Type 2 diabetes, and this data represents the community through years of research. Kaggle provides this data set for free. You just need to sign up for an account to access it.
Kaggle is a great resource for other data sets. There are so many to choose from, it is hard to just pick one. However, you are welcome to peruse their catalogue as you might find something interesting. With a little searching, you will find a data set which you can use to build a report on Settlers of Catan!
Check out the Pima Indians Diabetes data set here.
Simplest of SQL Sample Data Sets
Adventure Works is likely the world’s best-known SQL database. A common data set used for training, it is easy to implement. Experience with SQL Server Management Studio will serve you well as you implement this data set. Microsoft provides clear instructions on restoring the database, but I find a little extra know-how helps. It is wise to make friends with a database administrator if you don’t have one. Offer to buy them a drink or two at happy hour for their help and you will probably make a new friend out of the experience.
Download the Adventure Works data set here.
Binge The Office While Building Reports
Fans of The Office rejoice! TDMitch created a Dunder Mifflin sales data set from the Northwind Traders database by Microsoft. Just like Adventure Works, this is a SQL data set. Implementing this data set requires additional effort compared to the Adventure Works database. You must follow instructions and run a few SQL scripts to finalize the setup of this data set.
I recommend this data set for someone who is trying to make something that connects with end users. I also recommend this data set for people who are expanding their Transact-SQL knowledge.
Check out the Dunder Mifflin data set here.
Simplest of REST API Sample Data Sets
REST APIs are great resources for third-party data. They work well, but you might find implementing them frustrating. While each API endpoint is unique, you can capture the basics using the Yahoo Finance API. I have used it before in my series on the Basics of REST APIs in Power BI.
Offered for free up to 100 calls per day, it is an effortless way to learn the basics with no costs. If you are really into stocks, you might even consider purchasing a paid subscription. Spend some time digging through the endpoints and become comfortable with how you can use APIs with Power BI.
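With a daily cap like that, it helps to count your calls while you experiment so you do not burn through the free tier in one afternoon. Here is a minimal sketch of a daily call budget in Python. The `DailyCallBudget` class is hypothetical and not part of any API client, and it assumes the quota resets on your local calendar day, which may not match how the provider actually counts.

```python
from datetime import date


class DailyCallBudget:
    """Tracks API calls made today against a fixed daily limit (hypothetical helper)."""

    def __init__(self, limit=100):
        self.limit = limit
        self._day = None
        self._used = 0

    def try_consume(self, today=None):
        """Count one call and return True if budget remains, else return False."""
        today = today or date.today()
        if today != self._day:
            # Assumption: the quota resets when the local calendar day changes.
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True

    def remaining(self):
        """Calls left for the most recent day seen by try_consume."""
        return self.limit - self._used


budget = DailyCallBudget(limit=100)
if budget.try_consume():
    # Make the real request here with whatever HTTP client you prefer.
    print("calls left today:", budget.remaining())
```

Calling `try_consume()` before every request keeps the counting in one place, so a refresh loop can stop cleanly instead of running into errors once the limit is reached.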
You can review the Yahoo Finance API documentation here.
Big Data Sample Data Sets
Sometimes you want to throw a lot of data at a solution to test it out. The New York City Taxi data set is a massive trove of data that is free to use. Available as CSVs or APIs, you can choose how you want to access the data. I used it to benchmark refresh speeds between various Azure data sources such as blob, table, data lake, and Azure SQL storage solutions.
The Taxi and Limousine Commission provides quality documentation around the data set, including clear descriptions in the data dictionary, maps of taxi zones, and dimension tables. It even explains the difference between yellow taxi, green taxi, and for-hire car services.
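If you want to run a comparison like that yourself, the general idea can be sketched as a small timing harness in Python. This is only a sketch of the approach, not a record of how the Power BI refreshes were measured: `benchmark`, `summarize`, and `fake_load` are hypothetical names, and `fake_load` is a stand-in you would swap for a real load from each source.

```python
import statistics
import time


def benchmark(load, repeats=3):
    """Run load() several times and return the elapsed seconds for each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(results):
    """Map each source name to its median load time in seconds."""
    return {name: statistics.median(times) for name, times in results.items()}


def fake_load():
    """Stand-in for a real load, such as reading one month of trips from a CSV."""
    sum(range(100_000))


# One entry per source you want to compare; only a stand-in is shown here.
results = {"stand-in source": benchmark(fake_load, repeats=3)}
print(summarize(results))
```

Using the median of several runs rather than a single timing smooths out one-off slowdowns from caching or network hiccups.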
Check out the NYC Taxi and Limousine Commission data mart here.
Did Not Find Something To Fit Your Needs?
No fear about that! There are tons of free data sources out there for you to use. My favorite place to go is data.gov, where you can check out the different data sets available from the US Federal Government. You can also search for open data from many states and cities. You might even be able to use it for some of your solutions.
Google also has a data set search that will help you find some samples. Search for different topics such as commodities or labor statistics and see what comes back. My only caution is that not every result will be free. However, if you are looking for something specific, this search will help you find the data you need.
How about you? What are some of your favorite sample data sets? If you have a good one or used one of these, tell me in the comments below!