In my last article, I shared how to scale a query by using a function in Power Query. However, I noted there are challenges with this approach and considerations you should keep in mind when connecting to REST APIs. Some of these considerations are straightforward and similar to normal query optimization work. While this article is not as technical as the others, I feel it is important to take some time to focus on the potential challenges you might encounter when leveraging REST APIs.
Limits, Throttling, and Speed
When reading your API's documentation, you might see words such as “limits” and “throttling” and wonder what they mean. You will build your report and everything will be fine until one day your refresh fails. When you review the refresh errors, it will suddenly make sense what “limits” and “throttling” are all about.
To ensure APIs are not overburdened, limits and throttling are employed to prevent abuse. This protects the endpoints and provides a reliable service. Understanding how to navigate them is essential to prevent your refreshes from failing as your solution scales.
What are API Request Limits?
API limits restrict the number of calls you can make in a day, week, or month. If you sign up for the Yahoo Finance API, you will notice it has three membership tiers. The free tier limits you to 100 calls per day. Because of how we built our function, you would have to limit your report to 100 stocks and refresh it only once a day.
There are additional tiers you can purchase to increase your limits. The Yahoo Finance API offers tiers with 20,000 or 75,000 calls per month, which might accommodate your use case but require an additional financial investment. Even if you increase your subscription, note that these tiers still have a hard limit, meaning you might need to keep purchasing a higher tier if you blow through your request quota.
To be a good steward of your subscription, you will want to regularly monitor your API usage and verify that you are not burning through your quotas too quickly. It never fails – you run out of requests just when you need the data the most.
What is Throttling?
APIs use throttling to protect endpoints from being heavily queried over a short period of time. Throttling limits the number of calls you can make per minute, per 15-minute window, or per hour. This protects endpoints from rogue code that could degrade the service through abuse.
With the Yahoo Finance API, the documentation states that there is a rate limit of 300 calls per minute. If you had a list of 1,200 stocks, the refresh would take at least four minutes to complete (1,200 stocks divided by 300 calls per minute = 4 minutes). Just as every API is unique, each one manages throttling differently.
Last year I built a Power BI report that leveraged Google’s mapping APIs to get coordinates and street views of addresses in my town. Google throttled requests to 100 per minute. When refreshing my data, I would hit the 100 request limit and the API would pause until the full minute had passed, after which the refresh proceeded. With 3,500 houses in town, the refresh took 35 minutes to complete. It was slow, but it worked great.
On the other end of the spectrum, the Microsoft Graph APIs have a limit of 10,000 calls per 10-minute period. When I ran over this limit, instead of the API pausing, I received responses with a 429 error stating that I had reached the throttling limit. To manage this, I had to get a little more creative with my query to make sure I did not receive that error.
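If you find yourself in a similar situation, one option is to slow your own calls down so you stay under the limit. Here is a minimal sketch of that idea in Power Query, assuming a GetStockData function like the one we built earlier in this series; the symbol list and the two-second delay are placeholders you would tune to your API's throttling rules.

let
    // Placeholder symbols; in practice this would come from your stock list
    Symbols = {"MSFT", "AAPL", "GOOG"},
    // Wait two seconds before each call so the series of requests stays under the per-minute limit
    PacedResults = List.Transform(
        Symbols,
        (symbol) => Function.InvokeAfter(() => GetStockData(symbol), #duration(0, 0, 0, 2))
    )
in
    PacedResults

Function.InvokeAfter simply delays each invocation, so the trade-off is a longer refresh in exchange for fewer 429 errors.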
Impact on Refresh Time
Regardless of limits, you will need to consider refresh time when using REST APIs. If your primary data sources reside in SQL Server, this will be an adjustment for you. With SQL you normally cast a wide net with your query, bring in everything you can, and work from there. REST API refreshes tend to be slower, so you want to narrow the scope to only what you really need. Adding some filtering will speed up your queries and make them more efficient.
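Most REST APIs let you do this filtering through query string parameters. The sketch below shows the general pattern in Power Query; the endpoint, parameter names, and API key header are hypothetical, so check your own API's documentation for the filters it actually supports.

let
    Source = Web.Contents(
        "https://api.example.com/v1/orders",   // hypothetical endpoint
        [
            Query = [
                status = "open",               // only bring back what you need
                createdAfter = "2023-01-01"
            ],
            Headers = [#"x-api-key" = "YOUR_API_KEY"]
        ]
    ),
    Parsed = Json.Document(Source)
in
    Parsed

Narrowing the request at the source means fewer records travel over the wire and less work for Power Query afterwards.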
If you are in a situation where you need a large amount of data from an API, you might want to consider leveraging Azure Data Factory or Databricks to incrementally replicate that data into a data warehouse instead.
Paging Queries
For this series of articles, our endpoint was fairly basic. It only brought back one row of data for each stock symbol. This was by design because our overall goal was to cover the basics. As you explore other APIs, you might find endpoints that bring back several rows at a time.
If you expand your horizon with REST APIs, eventually you are going to run into something called paging. Simply put, paging is the splitting of responses over multiple queries. Similar to limits, this is to prevent overburdening the endpoint with your query.
As an example, let’s say you are querying an endpoint that has 50,000 records. Paging has been enabled on this endpoint and limits responses to 1,000 records at a time. To receive all 50,000 records, you will need to build logic into your query to pull each page individually and combine the results.
So how do you know your endpoint is using paging? It is all about the documentation. If an endpoint supports pagination, there are usually a few parameters that will tip you off. If you see a parameter called “PageNo”, “PageSize”, or “Top”, you will likely need to handle paging.
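To give you an idea of what that logic looks like, here is a minimal paging sketch in Power Query using List.Generate. The endpoint, the PageNo and PageSize parameter names, the assumption that the response body is a JSON array of records, and the “stop when a page comes back empty” rule are all placeholders; your API's documentation will spell out the real behavior.

let
    BaseUrl = "https://api.example.com/v1/records",   // hypothetical endpoint
    PageSize = 1000,
    // Request one page; assumes the response body is a JSON array of records
    GetPage = (pageNo as number) =>
        Json.Document(
            Web.Contents(
                BaseUrl,
                [Query = [PageNo = Text.From(pageNo), PageSize = Text.From(PageSize)]]
            )
        ),
    // Keep requesting pages until the API returns an empty page
    Pages = List.Generate(
        () => [Page = 1, Data = GetPage(1)],
        (state) => List.Count(state[Data]) > 0,
        (state) => [Page = state[Page] + 1, Data = GetPage(state[Page] + 1)],
        (state) => state[Data]
    ),
    Combined = List.Combine(Pages)
in
    Combined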
It is also worth noting that you can sometimes avoid pagination with quality filtering. If your request is targeted and removes the noise of other records, it could avoid this problem altogether.
In the coming weeks I will be doing an article specifically on paging, so keep an eye out for it!
Token Authentication
Another challenge you might encounter is token-based authentication. Instead of passing an API key for authentication, you need to call another endpoint to receive a token. Once you have the token, you pass it in a request header.
If you encounter this situation, you will need to create a function in Power Query that solely returns a token. While it takes a few extra steps, the function is likely reusable with other endpoints in your report, and it will keep your query code simple and compact.
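As a rough sketch, the token function and a call that uses it could look something like this in Power Query. The token endpoint, credential fields, and access_token property are all hypothetical, so treat this as a pattern rather than a recipe and follow your API's authentication documentation.

let
    // Hypothetical token endpoint; posting the credentials returns a JSON body containing the token
    GetToken = () =>
        let
            Response = Web.Contents(
                "https://api.example.com/auth/token",
                [
                    Content = Json.FromValue([client_id = "YOUR_ID", client_secret = "YOUR_SECRET"]),
                    Headers = [#"Content-Type" = "application/json"]
                ]
            )
        in
            Json.Document(Response)[access_token],
    // Pass the token in the Authorization header of the data request
    Data = Json.Document(
        Web.Contents(
            "https://api.example.com/v1/data",
            [Headers = [Authorization = "Bearer " & GetToken()]]
        )
    )
in
    Data

In a real report you would keep the token function in its own query so any endpoint that needs it can reuse it.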
Conclusion
Congratulations on making it to the end of this series. There are a lot of challenges when implementing REST APIs as a data source, but the juice is worth the squeeze. Taking yourself out of the mix and automating your refreshes is the end goal. And while the experience may not be as efficient as a SQL data source, it still gets the job done.
Have you been following along? Were you able to successfully connect and extract data from Yahoo Finance? How about other REST APIs? Let me know in the comments below!