SharePoint Audit Using Purview

Today, I had a customer ask the following question:
“How can I pull a report showing all of the files in a library that have not been viewed?”
Typically, users request a report showing all the items or files accessed, so this request was unique.

You can run an audit search on most everything in a cloud tenant using Microsoft Purview. Every file downloaded, viewed, list item opened, edited, deleted, page views, Onedrive actions, Teams actions, and the list goes on and on.

In Purview, click on the Audit link in the left nav, and it will open the audit search page.
Select the time range you want to target
Activities: Accessed file
Record types: SharePointFileOperation
Search name: this can be anything you want, i.e. SP Library Search
File, folder, or site: https://taco.sharepoint.com/sites/test/TheLibrary/*
Workload: SharePoint
The key items to note are the record type and file options. You can use a wildcard * to return results for everything in the target library. This will return much information, so filtering after the report is downloaded is needed. Once you’ve populated the fields, click Search, then wait a bit for it to complete. The amount of data in your tenant and current workloads will determine how long the search will take.


The completed search will look like this:

Clicking on the report name will open a detailed view of the audit search. From the results page, click the Export button and wait a few minutes for the file to be generated. If the page gets stuck at 0%, refresh your browser, and it should trigger the download.

Next, I needed to get all the files in the SharePoint library. To do this, I used PowerShell to connect to the target site and then downloaded the file info to a CSV.

# Connect to the SharePoint site interactively
Connect-PnPOnline -Url "https://taco.sharepoint.com/sites/test" -Interactive

# Specify the target library
$libraryName = "TheLibrary"

# Get all files from the target library
$files = Get-PnPListItem -List $libraryName -Fields "FileLeafRef", "ID", "FileRef", "Created", "Modified", "UniqueId", "GUID"

# Create an empty array to store the file metadata
$fileMetadata = @()

# Loop through each file and extract the relevant metadata
foreach ($file in $files) {
    $fileMetadata += [PSCustomObject]@{
        FileName = $file["FileLeafRef"]
        ID = $file["ID"]
        GUID = $file["GUID"]
        UniqueId = $file["UniqueId"]
        URL = $file["FileRef"]
        Created = $file["Created"]
        Modified = $file["Modified"]
    }
}

# Export the file metadata to a CSV file
$fileMetadata | Export-Csv -Path "C:\code\library_audit.csv" -NoTypeInformation

If you take anything away from this post, please take note of this: Purview uses a field named ListItemUniqueId to identify a SharePoint file or list item. My first thought was to use the GUID from the SharePoint library to match up to the Purview data. This is 100% incorrect! From SharePoint, UniqueId is the field that matches the Purview field ListItemUniqueId.

Basic logic:
SELECT SharePoint.*
FROM SharePoint
INNER JOIN Purview
ON SharePoint.UniqueId = Purview.ListItemUniqueId

I used Power BI to format and mash the exported Purview data with the SharePoint data. Power BI is unnecessary; you can easily use Power Query in Excel to do the same thing. Below, I’m including my M code statement that parses the JSON from the Purview file, and counts how many times the files were accessed and the last time they were accessed.

let
    Source = Csv.Document(File.Contents("C:\ian240329\Purview_Audit.csv"),[Delimiter=",", Columns=8, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"RecordId", type text}, {"CreationDate", type datetime}, {"RecordType", Int64.Type}, {"Operation", type text}, {"UserId", type text}, {"AuditData", type text}, {"AssociatedAdminUnits", type text}, {"AssociatedAdminUnitsNames", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type",{{"AuditData", Json.Document}}),
    #"Expanded AuditData" = Table.ExpandRecordColumn(#"Parsed JSON", "AuditData", {"ListItemUniqueId", "SourceFileExtension", "SourceFileName", "ObjectId"}, {"ListItemUniqueId", "SourceFileExtension", "SourceFileName", "ObjectId"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded AuditData",{"AssociatedAdminUnits", "AssociatedAdminUnitsNames", "RecordId"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"ObjectId", "File URL"}, {"SourceFileName", "File Name"}, {"SourceFileExtension", "File Extension"}}),
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([File Extension] <> "aspx")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each true),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Operation", "RecordType"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns1", {"ListItemUniqueId"}, {{"View Count", each Table.RowCount(_), Int64.Type}, {"Last Viewed", each List.Max([CreationDate]), type nullable datetime}})
in
    #"Grouped Rows"

Still working in Power Query, I created a new query to show what SharePoint files had not been accessed. My Purview license is limited to 6 months‘ worth of data, so this is one hindrance to painting a full picture of what has/has not been accessed.

let
    Source = Table.NestedJoin(SharePoint_Audit, {"UniqueId"}, Purview_Audit, {"ListItemUniqueId"}, "Purview_Audit", JoinKind.LeftAnti),
    #"Expanded Purview_Audit" = Table.ExpandTableColumn(Source, "Purview_Audit", {"File Name"}, {"Purview_Audit.File Name"}),
    #"Sorted Rows1" = Table.Sort(#"Expanded Purview_Audit",{{"Purview_Audit.P File Name", Order.Descending}}),
    #"Filtered Rows1" = Table.SelectRows(#"Sorted Rows1", each true)
in
    #"Filtered Rows1"

With this data, I can now report to the user what files have not been accessed in the past 6 months.


Using New-PnPSite With A Multi Geo Tenant

If you try to run the PnP PowerShell command New-PnPSite using a managed identity or App Registration, in a multi-geo tenant, it will create the site in the default tenant. To get around this, you can use the PreferredDataLocation parameter to set the desired location, but you’ll also need to update your MS Graph permissions.


If you run the New-PnPSite command with the -PreferredDataLocation parameter and your permission are not correct, you will receive this error:

Code: Authorization_RequestDenied Message: The requesting principal is not authorized to set group preferred data location.

Open your App Registration and add the following MS Graph application permissions:
Group.Create, Group.ReadWrite.All, Directory.ReadWrite.All

Other people who had the same issue:
https://github.com/pnp/powershell/issues/2629
https://github.com/pnp/PnP-PowerShell/issues/2682
https://learn.microsoft.com/en-us/answers/questions/1099399/unable-to-create-modern-team-site-using-pnp-powers

Complete list of the geo codes can be found here:
https://learn.microsoft.com/en-us/microsoft-365/enterprise/multi-geo-add-group-with-pdl?view=o365-worldwide#geo-location-codes

Set of geo codes as of March 2024:

Modernizing Authentication in SharePoint Online

Starting a year or two ago, Microsoft announced it would stop supporting and/or blocking access to Azure Access Control Services (ACS) and the SharePoint Add-In model. This is important because ACS has been used for many years to grant app/script API access to a SharePoint site(s), and you likely have many sites where this has been used. Moving forward, Azure Access Control (AAC) will be used in place of ACS.

Historically, you would start the permissions journey by generating the client ID and secret at this endpoint:
_layouts/15/AppRegNew.aspx
From there, you grant the newly created identity access to a tenant, sites, lists, libraries, or a combination.
_layouts/15/appinv.aspx
The other option was to create an Azure App Registration and then grant it access to the target objects. When working with SharePoint Online and AppRegNew.aspx, the App Registration is generated automatically. Depending on what is/is not configured, this can be an issue and set off alarms in Azure.

With that out of the way, how do you wire-up a new connection to SharePoint Online, allowing PowerShell, Python, script, or app access to the SharePoint API or the Microsoft Graph API?

Overview of what I’m doing:
Create a self-signed cert
Add the cert to your personal cert store and upload it to Azure, creating an App Registration
Adjust permissions as needed
Grant the App Registration access to a specific SharePoint site
Use the newly created credentials to access the SharePoint site

#create cert with a password
New-PnPAzureCertificate `
    -CommonName "Demo_SP_Azure_2024" `
    -OutPfx c:\code\Demo_SP_Azure_2024.pfx `
    -OutCert c:\code\Demo_SP_Azure_2024.cer `
    -CertificatePassword (ConvertTo-SecureString -String "Taco@Good" -AsPlainText -Force) `
    -ValidYears 1

#import the cert. for this to work, run as Admin.
Import-PfxCertificate `
    -Exportable `
    -CertStoreLocation Cert:\LocalMachine\My `
    -FilePath c:\code\Demo_SP_Azure_2024.pfx `
    -Password (ConvertTo-SecureString -String "Taco@Good" -AsPlainText -Force)

I highly suggest not skipping the ‘-interactive’ part for the next command. It will open a browser window where you must authenticate with an account with adequate permissions to create a new App Registration in Azure. The script’s most important thing to note is SharePointApplicationPermissions Site.Selected. Why is this important? This is extremely useful if you want to limit permissions to a single SharePoint site and not every site in the tenant.

Register-PnPAzureADApp `
   -ApplicationName Demo_SP_Azure_2024 `
   -Tenant tacoranch.onmicrosoft.com `
   -Store CurrentUser `
   -SharePointApplicationPermissions "Sites.Selected" `
   -Interactive

After that runs, the output will include the Client ID and Certificate Thumprint. Take note of it, or you can open Azure, navigate to App Registrations, and select all applications. In the left nav, click Certificates & secrets, where you’ll find the thumbprint; again, in the left nav, click Overview, and you’ll see the Application ID, aka Client ID.

In the next two commands, you will connect to the SharePoint Admin site interactive, then grant the newly created App Registration write access to the target SharePoint site.

Connect-PnPOnline -Url "https://tacoranch-admin.sharepoint.com" `
    -Interactive

#grant the App Reg write access to the site
Grant-PnPAzureADAppSitePermission `
    -AppId 'a95ddafe-6eca-488a-b26a-dc62a64d9105' `
    -DisplayName 'Demo_SP_Azure_2024' `
    -Site 'https://tacoranch.sharepoint.com/sites/cert-demo' `
    -Permissions Write


Now that the App Registration can access the SharePoint site, how do you connect to it using the certificate and thumbprint?

Connect-PnPOnline `
    -Tenant "tacoranch.onmicrosoft.com" `
    -Url "https://tacoranch.sharepoint.com/sites/cert-demo" `
    -ClientId "a95ddafe-6eca-488a-b26a-dc62a64d9105" `
    -Thumbprint "5C5891197B54B9535D171D6D9DD7D6D351039C8E" 

Get-PnPList | Select-Object Title

Using the above commands, you can create a cert, register it in Azure, and grant access to a single SharePoint site.

I’ve included a copy of the full script here:
https://www.sharepointed.com/wp-content/uploads/2024/03/Azure-App-Reg-Cert-Demo.txt

Error(s) and fixes:
Error:
Grant-PnPAzureADAppSitePermission: {"error":{"code":"accessDenied","message":"Access denied","innerError":{"date":"2024-03-21T16:29:47","request-id":"","client-request-id":""}}}
Fix:
Ensure the account running this command Grant-PnPAzureADAppSitePermission , has access to the target SharePoint site.

Error:
Connect-PnPOnline: A configuration issue is preventing authentication - check the error message from the server for details. You can modify the configuration in the application registration portal. See https://aka.ms/msal-net-invalid-client for details. Original exception: AADSTS700027: The certificate with identifier used to sign the client assertion is not registered on application. [Reason - The key was not found., Thumbprint of key used by client: '6C5891197B54B9535D179D6D9DD7D6D351039D8Z', Please visit the Azure Portal, Graph Explorer or directly use MS
Graph to see configured keys for app Id 'eb7f9fbc-f4ee-4a48-a008-49d6bcdc6c40'. Review the documentation at https://docs.microsoft.com/en-us/graph/deployments to determine the corresponding service endpoint and https://docs.microsoft.com/en-us/graph/api/application-get?view=graph-rest-1.0&tabs=http to build a query request URL, such as 'https://graph.microsoft.com/beta/applications/aeb7f9fbc-f4ee-4a48-a008-49d6bcdc6c40']. Trace ID: 3e1e7ab3-60c0-4126-acb8-e2fdb6e28000 Correlation ID: 62ddc80b-aeb2-49c5-8c31-83a04e70bf6e Timestamp: 2024-04-01 12:21:59Z

Fix:
This one is easy; ensure you use the correct Client ID and thumbprint. You can get this from the app registration page in the Azure portal.

How do you find ALL the Flows that reference a SharePoint site or list?

I asked this question when I first started down the path of learning about Flow:
How do you find all the Flows running on or referencing a SharePoint list?

UPDATE / EDIT – READ THIS Part
Before you start on this, please ensure that your account or the account you are using to run the script has sufficient permissions to the target environment(s).

$oneFlow = Get-AdminFlow -FlowName "00000-ae95-4cab-96d8-0000000" -EnvironmentName "222222-4943-4068-8a2d-11111111"

$refResources = $oneFlow.Internal.properties.referencedResources
Write-Host $refResources



If you run that command and look at the returned properties and see an error, that means you do not have the correct permissions to move forward. You can check your permissions in the Power Platform admin center: https://admin.powerplatform.microsoft.com/

/end of update

Think about it: someone in your company creates a Flow that runs when a SharePoint item is updated. Fast forward a year or so, and that coworker has moved on, and the Flow needs to be updated. If you work for a small company or one that hasn’t fallen in love with Power Platform and Flow, you’re likely in luck, and finding the Flow will take a few minutes. In my case, there are currently 2,712 Flows in my tenant that span several environments.

The PowerShell script I’ve created will query a tenant using the Get-AdminFlow command, return all Flows, and then loop through them. The script can be adjusted to target a single environment using the EnvironmentName parameter. Note: running the script using the Get-Flow action will return all the Flows your AD account can access.

#Install-Module AzureAD
#Install-Module -Name Microsoft.PowerApps.Administration.PowerShell  
#Install-Module -Name Microsoft.PowerApps.PowerShell -AllowClobber 

#connect-AzureAD

function Get-UserFromId($id) {
    try {
        $usr = Get-AzureADUser -ObjectId $id
        return $usr.displayName
    }
    catch {
        return $null
    }
}

#get all flows in the tenant
$adminFlows = Get-AdminFlow 

#set path for output
$Path = "$([Environment]::GetFolderPath('Desktop'))\Flow_Search_for_SharePoint_$(Get-Date -Format "yyyyMMdd_HHmmss").csv"

#set target site
$targetSPSite = "https://yourTenant.sharepoint.com/sites/yourSITE"
$targetSPList = "4f4604d2-fa8f-4bae-850f-4908b4708b07"
$targetSites = @()

foreach ($gFlow in $adminFlows) {

    #check if the flow references the target site
    $refResources = $gFlow.Internal.properties.referencedResources | Where-Object { $_.resource.site -eq $targetSPSite }

    #check if the flow references the target list
    #$refResources = $gFlow.Internal.properties.referencedResources | Where-Object { $_.resource.list -eq $targetSPList }

    if ($refResources -ne $null) {

        #optional - get the user who created the Flow
        $createdBy = Get-UserFromId($gFlow.internal.properties.creator.userId)

        $row = @{}
        $row.Add("EnvironmentName", $gFlow.EnvironmentName)
        $row.Add("Name", $gFlow.DisplayName)
        $row.Add("FlowEnabled", $gFlow.Enabled)
        $row.Add("FlowGUID", $gFlow.FlowName)
        $row.Add("CreatedByUser", $createdBy)
        $row.Add("CreatedDate", $gFlow.CreatedTime)
        $row.Add("LastModifiedDate", $gFlow.lastModifiedTime)
        
        $targetSites += $(new-object psobject -Property $row)
    }
}

#output to csv
$targetSites | Export-Csv -Path $Path -NoTypeInformation

If you don’t want to get the display name of the user who created the Flow, comment out the part of the script that calls the Get-UserFromId function, and you won’t need to connect to Azure.

And to answer my original question: How do you find all the Flows running on or referencing a SharePoint list?
In the script, comment out the part of the script that references $targetSPSite and un-comment $targetSPList. You can get the GUID of the list by navigating to list settings and looking at the URL. Another option is to open the list, view the Page Source, then look for the “listId” property.

In a future post(s), I will outline how to search for all Flows that use different connectors, Dynamics 365 tables (dataverse), triggered from Power Apps, or other objects. All of the info is in the properties of the Flow; getting to it can be a little fun.

Use PowerShell PNP to Create an Alphabetical Directory of Folders in SharePoint

I’m in the process of reorganizing a document library and wanted to store all of the documents in alphabetical folders. Yes, I’m using metadata, but I’ve passed the magic 5,000 item threshold and want to rearrange the library and leverage a rich search experience.

So, using PowerShell, how do you create a bunch of folders going from A to Z?

$siteURL = "https://sharepointed.sharepoint.com/sites/parent/child"

$conn = Connect-PnPOnline -Url $siteURL -Credentials (Get-Credential) -ReturnConnection

try{
(65..(65+25)).ForEach({     
$xy = [char]$_    
Add-PnPFolder -Name $xy -Folder "/mylibrary" -Connection $conn
})
}

catch{ Write-host -f Red "Error:" $_.Exception.Message}

More information about creating folders using ASCII:
https://devblogs.microsoft.com/scripting/use-powershell-and-ascii-to-create-folders-with-letters/

Get Files From a SharePoint Folder Using PowerShell PNP

How do you get all the files from a folder in SharePoint using PowerShell PNP?

$devConn = Connect-PnPOnline -Url "https://sharepointed.sharepoint.com/sites/siteA/siteB" -Credentials -Credentials (Get-Credential) -ReturnConnection

$folderName = "/Shared Documents/myfolder/anotherfolder"

$folderItems = Get-PnPFolderItem -FolderSiteRelativeUrl $folderName -Connection $devConn

foreach($item in $folderItems)
{
    Write-Host $item.Name
}

Write-Host "done"

Depending on your needs, you could also use a search query with a path filter to get the files.

Example of using the Get-PnPListItem cmdlet with the FolderServerRelativeUrl parameter.

$devConn = Connect-PnPOnline -Url "https://sharepointed.sharepoint.com/sites/siteA/siteB" -Credentials -Credentials (Get-Credential) -ReturnConnection

$folderName = "/sites/spdev2/bw2/Shared Documents/myfolder/anotherfolder"

$folderItems = Get-PnPListItem -List "Shared Documents" -FolderServerRelativeUrl $folderName -Connection $devConn 

foreach($item in $folderItems)
{
   Write-Host $item
}
    

Get-PnPSearchCrawlLog Filter to a List or Library

Using the Get-PnpSearchCrawlLog cmdlet wanted to filter the returned result set to a specific list. Before you begin, you’ll want to make sure you have access to the Crawl Log: https://yourSite-admin.sharepoint.com/_layouts/15/searchadmin/crawllogreadpermission.aspx

Connect-PnPOnline -Url "https://sharepointed.sharepoint.com/sites/food" -Credentials (Get-Credential)

$logs = Get-PnPSearchCrawlLog  -filter "https://sharepointed.sharepoint.com/sites/food/Lists/tacos/"

foreach($l in $logs)
{
    Write-Host "    "
    Write-Host $l.Url
    Write-Host $l.ItemId
    Write-Host $l.LogLevel
    Write-Host $l.CrawlTime
}

This will filter the returned results to a specific list. Note: when using Connect-PnPOnline I use my email address and App Password. App Password can be created/found here: https://account.activedirectory.windowsazure.com/AppPasswords.aspx

Get-PnPSearchCrawlLog details: https://docs.microsoft.com/en-us/powershell/module/sharepoint-pnp/get-pnpsearchcrawllog?view=sharepoint-ps

The Web application at X could not be found.

Error: The Web application at https://sharepoint.sharepointed.com could not be found. Verify that you have typed the URL correctly. If the URL should be serving existing content, the system administrator may need to add a new request URL mapping to the intended application.

I created a .net console app to update some stuff in SharePoint.  I received the above error when executing the .exe file with a new service account. 

First, I tried granting Shell access to the content db I was working with, but that didn’t solve the problem.

$cDb = Get-SPContentDatabase -site "https://taco.sharepointed.com/" Add-SPShellAdmin -UserName "domain\userAccount -database $cDb

Running the same command without the database switch fixed my problem.

Add-SPShellAdmin -UserName "domain\userAccount"

Use PowerShell to Execute SharePoint Search Queries

In this example, I’m narrowing my search to one library and a search term.
At a high level, the script searches the FoodSite for the word GoodTaco.

function Query-SPSearch {
    param(
        [Parameter(Mandatory=$true)][String]$WebApplicationPath,
        [Parameter(Mandatory=$true)][String]$KeywordQuery,
        [Parameter()][Int32]$Count = 10
    )
 
    $QueryXml = @"
 
<QueryPacket xmlns="urn:Microsoft.Search.Query" >
    <Query>
        <Context>
            <QueryText type="STRING">$KeywordQuery</QueryText>
        </Context>
        <Range>
            <Count>$Count</Count>
        </Range>    
        <IncludeSpecialTermResults>false</IncludeSpecialTermResults>
        <PreQuerySuggestions>false</PreQuerySuggestions>
        <HighlightQuerySuggestions>false</HighlightQuerySuggestions>
        <IncludeRelevantResults>true</IncludeRelevantResults>
        <IncludeHighConfidenceResults>false</IncludeHighConfidenceResults>
    </Query>
</QueryPacket>
"@
    $ServicePath = "/_vti_bin/search.asmx"
    $SearchWS = New-WebServiceProxy -Uri ($WebApplicationPath + $ServicePath) -UseDefaultCredential
    $Results = $SearchWS.QueryEx( $QueryXml )
    # we excluded all other result sets, but just in case get the one we want:
    $Results.Tables["RelevantResults"]
}
 
Query-SPSearch -WebApplicationPath "https://sharepointed.com/sites/foodsite" -KeywordQuery "GoodTaco AND path:https://sharepointed.com/sites/foodsite/tacos" -Count 20 | Format-Table Title, Author, Path