SharePoint Audit Using Purview

Today, I had a customer ask the following question:
“How can I pull a report showing all of the files in a library that have not been viewed?”
Typically, users request a report showing all the items or files accessed, so this request was unique.

You can run an audit search on most everything in a cloud tenant using Microsoft Purview. Every file downloaded, viewed, list item opened, edited, deleted, page views, Onedrive actions, Teams actions, and the list goes on and on.

In Purview, click on the Audit link in the left nav, and it will open the audit search page.
Select the time range you want to target
Activities: Accessed file
Record types: SharePointFileOperation
Search name: this can be anything you want, i.e. SP Library Search
File, folder, or site: https://taco.sharepoint.com/sites/test/TheLibrary/*
Workload: SharePoint
The key items to note are the record type and file options. You can use a wildcard * to return results for everything in the target library. This will return much information, so filtering after the report is downloaded is needed. Once you’ve populated the fields, click Search, then wait a bit for it to complete. The amount of data in your tenant and current workloads will determine how long the search will take.


The completed search will look like this:

Clicking on the report name will open a detailed view of the audit search. From the results page, click the Export button and wait a few minutes for the file to be generated. If the page gets stuck at 0%, refresh your browser, and it should trigger the download.

Next, I needed to get all the files in the SharePoint library. To do this, I used PowerShell to connect to the target site and then downloaded the file info to a CSV.

# Connect to the SharePoint site interactively
Connect-PnPOnline -Url "https://taco.sharepoint.com/sites/test" -Interactive

# Specify the target library
$libraryName = "TheLibrary"

# Get all files from the target library
$files = Get-PnPListItem -List $libraryName -Fields "FileLeafRef", "ID", "FileRef", "Created", "Modified", "UniqueId", "GUID"

# Create an empty array to store the file metadata
$fileMetadata = @()

# Loop through each file and extract the relevant metadata
foreach ($file in $files) {
    $fileMetadata += [PSCustomObject]@{
        FileName = $file["FileLeafRef"]
        ID = $file["ID"]
        GUID = $file["GUID"]
        UniqueId = $file["UniqueId"]
        URL = $file["FileRef"]
        Created = $file["Created"]
        Modified = $file["Modified"]
    }
}

# Export the file metadata to a CSV file
$fileMetadata | Export-Csv -Path "C:\code\library_audit.csv" -NoTypeInformation

If you take anything away from this post, please take note of this: Purview uses a field named ListItemUniqueId to identify a SharePoint file or list item. My first thought was to use the GUID from the SharePoint library to match up to the Purview data. This is 100% incorrect! From SharePoint, UniqueId is the field that matches the Purview field ListItemUniqueId.

Basic logic:
SELECT SharePoint.*
FROM SharePoint
INNER JOIN Purview
ON SharePoint.UniqueId = Purview.ListItemUniqueId

I used Power BI to format and mash the exported Purview data with the SharePoint data. Power BI is unnecessary; you can easily use Power Query in Excel to do the same thing. Below, I’m including my M code statement that parses the JSON from the Purview file, and counts how many times the files were accessed and the last time they were accessed.

let
    Source = Csv.Document(File.Contents("C:\ian240329\Purview_Audit.csv"),[Delimiter=",", Columns=8, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"RecordId", type text}, {"CreationDate", type datetime}, {"RecordType", Int64.Type}, {"Operation", type text}, {"UserId", type text}, {"AuditData", type text}, {"AssociatedAdminUnits", type text}, {"AssociatedAdminUnitsNames", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type",{{"AuditData", Json.Document}}),
    #"Expanded AuditData" = Table.ExpandRecordColumn(#"Parsed JSON", "AuditData", {"ListItemUniqueId", "SourceFileExtension", "SourceFileName", "ObjectId"}, {"ListItemUniqueId", "SourceFileExtension", "SourceFileName", "ObjectId"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded AuditData",{"AssociatedAdminUnits", "AssociatedAdminUnitsNames", "RecordId"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"ObjectId", "File URL"}, {"SourceFileName", "File Name"}, {"SourceFileExtension", "File Extension"}}),
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([File Extension] <> "aspx")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each true),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Operation", "RecordType"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns1", {"ListItemUniqueId"}, {{"View Count", each Table.RowCount(_), Int64.Type}, {"Last Viewed", each List.Max([CreationDate]), type nullable datetime}})
in
    #"Grouped Rows"

Still working in Power Query, I created a new query to show what SharePoint files had not been accessed. My Purview license is limited to 6 months‘ worth of data, so this is one hindrance to painting a full picture of what has/has not been accessed.

let
    Source = Table.NestedJoin(SharePoint_Audit, {"UniqueId"}, Purview_Audit, {"ListItemUniqueId"}, "Purview_Audit", JoinKind.LeftAnti),
    #"Expanded Purview_Audit" = Table.ExpandTableColumn(Source, "Purview_Audit", {"File Name"}, {"Purview_Audit.File Name"}),
    #"Sorted Rows1" = Table.Sort(#"Expanded Purview_Audit",{{"Purview_Audit.P File Name", Order.Descending}}),
    #"Filtered Rows1" = Table.SelectRows(#"Sorted Rows1", each true)
in
    #"Filtered Rows1"

With this data, I can now report to the user what files have not been accessed in the past 6 months.


Leave a Reply

Your email address will not be published. Required fields are marked *