Part of a project I was working on required mashing up data from SharePoint with data stored in a data lake. We settled on creating a Databricks notebook to read an input file, query the data lake using that file, and then export an enriched file.
Here’s a high-level overview of what’s going to be created:
Call the notebook, parse the JSON response, loop until the notebook has finished, then respond to the notebook’s output.
In my case, triggering the notebook requires the workspace URL, a bearer token, the job ID, and the input parameters.
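In the Flow this is a plain HTTP action, but to make the request concrete, here's a minimal Python sketch of the same call against the Databricks Jobs run-now endpoint. The host, token, job ID, and parameter names below are placeholders, not values from my setup:

```python
import requests

# Placeholder values -- substitute your own workspace URL, token, and job ID.
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXXXXXX"  # the bearer token (a PAT in my case)
JOB_ID = 42

# Trigger the notebook job via the Jobs API run-now endpoint.
response = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        # Input parameters passed through to the notebook's widgets;
        # the parameter name here is hypothetical.
        "notebook_params": {"input_file": "incoming/sharepoint_export.csv"},
    },
)
response.raise_for_status()
```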
Next, parse the response from the HTTP call:
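If the trigger succeeds, the response body is a small JSON document whose `run_id` identifies this specific execution; that's the value the rest of the flow polls against. Continuing the sketch above:

```python
# The run-now response looks roughly like {"run_id": 455644833, ...};
# run_id is the handle used to poll this run's status later.
run = response.json()
run_id = run["run_id"]
```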
The notebook will take a little time to spin up and then process the input file. The best way to handle this is a basic Do until loop that checks the status of the notebook job: I opted for a one-minute delay, then a call to the API to get the job status, parse the response, and evaluate whether it's finished.
One thing to note about the Do until action: you don't want it to run for eternity, but you also don't want to complicate the loop condition with extra evaluations like "if looped X times, stop."
Instead, if you expand the Change limits option, you can set how many times the action loops or change the timeout duration. Here I've set it to stop looping after 20 tries. For more info on this, please check SPGuides for a detailed overview.
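In the Flow this is a Do until action wrapping a Delay, an HTTP call, and a Parse JSON step. As a rough Python equivalent of the same logic, continuing the sketch above, with the one-minute delay and the 20-try cap baked in:

```python
import time

MAX_TRIES = 20      # mirrors the Change limits count on the Do until action
DELAY_SECONDS = 60  # the one-minute delay between status checks

result_state = None
for _ in range(MAX_TRIES):
    time.sleep(DELAY_SECONDS)

    # Ask the Jobs API for the current state of this run.
    status = requests.get(
        f"{DATABRICKS_HOST}/api/2.1/jobs/runs/get",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"run_id": run_id},
    ).json()

    # life_cycle_state reaches a terminal value (e.g. TERMINATED) once the
    # run has finished; result_state (SUCCESS, FAILED, ...) is set then.
    if status["state"]["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        result_state = status["state"].get("result_state")
        break
```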
The last step in the flow is to process the response from the notebook. If the job is successful, get the file from blob storage and load it into SharePoint; otherwise, create a Slack alert.
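In the Flow this is a condition plus the SharePoint and Slack connectors. A hypothetical Python equivalent of the branch might look like the following, where the connection string, container, blob path, and webhook URL are all placeholders (the SharePoint upload itself is left as a comment, since in the Flow it's just a Create file action):

```python
from azure.storage.blob import BlobClient

if result_state == "SUCCESS":
    # The notebook wrote the enriched file to blob storage; download it
    # so it can be loaded into SharePoint.
    blob = BlobClient.from_connection_string(
        conn_str="DefaultEndpointsProtocol=...",  # placeholder connection string
        container_name="exports",
        blob_name="enriched/sharepoint_export.csv",
    )
    enriched_bytes = blob.download_blob().readall()
    # ...load enriched_bytes into SharePoint (the Create file action in the Flow)...
else:
    # Alert the team via a Slack incoming webhook.
    requests.post(
        "https://hooks.slack.com/services/T000/B000/XXXX",  # placeholder webhook
        json={"text": f"Databricks job {JOB_ID} finished with state {result_state}"},
    )
```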
That's it; using the example above, you can trigger a Databricks notebook from a Flow.
When I set this up, my company allowed the use of personal access tokens (PATs), so the PAT served as the bearer token the Flow used to trigger the notebook.