Use Python to upload a LARGE file to SharePoint

In this post, I will quickly show how to use the Office365-REST-Python-Client library to upload a large file to a SharePoint library.

For this to work, you will need a certificate, Azure App registration, and access to the target SharePoint site. I outlined all the necessary parts in this post: Modernizing Authentication in SharePoint Online Note: the linked post will output a .PFX cert, and the script below will need a .PEM cert. You can use this Python command to convert the cert:

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography import x509
from cryptography.hazmat.primitives.serialization import pkcs12

# Load the PFX file
pfx_file = open('C:\\path_to_cert\\EXAMPLE.pfx', 'rb').read()  # replace with your pfx file path
(private_key, certificate, additional_certificates) = pkcs12.load_key_and_certificates(pfx_file, None, default_backend())

with open('NewCERT.pem', 'wb') as f:
    f.write(private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption()
    ))
    f.write(certificate.public_bytes(serialization.Encoding.PEM))

# install this library if needed
# pip install cryptography

Ok, with that out of the way, you can use this script to upload to a SharePoint library. In the script, I’ve commented out the line that would be used to upload to a folder within a library.

import os
from office365.sharepoint.client_context import ClientContext

cert_credentials = {
    "tenant": "abcxyz-1234-4567-8910-0e3d638792fb",
    "client_id": "abcddd-4444-4444-cccc-123456789111",
    "thumbprint": "7D8D8DF7D8D2F4DF8DF45D4FD8FD48DF5D8D",
    "cert_path": "RestClient\\NewCERT.pem"
}
ctx = ClientContext("https://tacoranch.sharepoint.com/sites/python").with_client_certificate(**cert_credentials)
current_web = ctx.web.get().execute_query()
print("{0}".format(current_web.url))

filename = "LargeExcel.xlsx"
folder_path = "C:\\code\py"

def print_upload_progress(offset):
    # type: (int) -> None
    file_size = os.path.getsize(local_path)
    print(
        "Uploaded '{0}' bytes from '{1}'...[{2}%]".format(
            offset, file_size, round(offset / file_size * 100, 2)
        )
    )

#upload to a folder
#target_url = "Shared Documents/folderA/folderB"

target_url = "Shared Documents"
target_folder = ctx.web.get_folder_by_server_relative_url(target_url)
size_chunk = 1000000
local_path = os.path.join(folder_path, filename)
with open(local_path, "rb") as f:
    uploaded_file = target_folder.files.create_upload_session(
        f, size_chunk, print_upload_progress, filename
    ).execute_query()

print("File {0} has been uploaded successfully".format(uploaded_file.serverRelativeUrl))

If you receive an error stating you don’t have access, double-check that you’ve added the App Registration to the target SharePoint site permissions. Again, this is noted in the blog post linked at the being of this post.

Consider this a workaround until MS Graph is out of its latest beta and there’s more support for easily uploading to SharePoint.

What if you need to upload a file and set a column value? When working with SharePoint via the API, you must be mindful of the column names. The column name in the UI might not be the same as the internal name, so I will use the script above as my starting point and add the following script to the end. In this example, I’m setting two fields: ReportName and ReportDate.

#get the file that was just uploaded
file_item = uploaded_file.listItemAllFields

# Define a dictionary of field names and their new values
fields_to_update = {
    "ReportName": "My TPS Report",
    "ReportDate": datetime.datetime.now().isoformat(),
    # Add more fields here as needed
}

# Iterate over the dictionary and update each field
for field_name, new_value in fields_to_update.items():
    file_item.set_property(field_name, new_value)

# Commit the changes
file_item.update()
ctx.execute_query()

print("Report fields were updated")

How do you get a list of all the columns in a list or library? The script below will output all the column’s internal and display names.

from office365.sharepoint.client_context import ClientContext

cert_credentials = {
    "tenant": "abcxyz-1234-4567-8910-0e3d638792fb",
    "client_id": "abcddd-4444-4444-cccc-123456789111",
    "thumbprint": "7D8D8DF7D8D2F4DF8DF45D4FD8FD48DF5D8D",
    "cert_path": "RestClient\\NewCERT.pem"
}

ctx = ClientContext("https://tacoranch.sharepoint.com/sites/python").with_client_certificate(**cert_credentials)
current_web = ctx.web.get().execute_query()
print("{0}".format(current_web.url))

# Get the target list or library
list_or_library = ctx.web.lists.get_by_title('TPS-Reports')

# Load the fields
fields = list_or_library.fields.get().execute_query()

# Print the field names
for field in fields:
    print("Field internal name: {0}, Field display name: {1}".format(field.internal_name, field.title))



How to upload a large file to SharePoint using the Microsoft Graph API

What started as a simple question from a co-worker turned into a rabbit hole exploration session that lasted a bit longer than anticipated. ‘Hey, I need to upload a report to SharePoint using Python.’

In the past, I’ve used SharePoint Add-in permissions to create credentials allowing an external service, app, or script to write to a site, library, list, or all of the above. However, the environment I’m currently working in does not allow Add-in permissions, and Microsoft has been slowly depreciating the service for a long time.

As of today (March 18, 2024) this is the only way I could find to upload a large file to SharePoint. Using the MS Graph SDK, you can upload files smaller than 4mb, but that is useless in most cases.

For the script below, the following items are needed:
Azure App Registration:
Microsoft Graph application permissions:
Files.ReadWrite.All
Sites.ReadWrite.All
SharePoint site
SharePoint library (aka drive)
File to test with

import requests
import msal
import atexit
import os.path
import urllib.parse
import os

TENANT_ID = '19a6096e-3456-7890-abcd-19taco8cdedd'
CLIENT_ID = '0cd0453d-cdef-xyz1-1234-532burrito98'
CLIENT_SECRET  = '.i.need.tacos-and.queso'
SHAREPOINT_HOST_NAME = 'tacoranch.sharepoint.com'
SITE_NAME = 'python'
TARGET_LIBRARY = 'reports'
UPLOAD_FILE = 'C:\\code\\test files\\LargeExcel.xlsx'
UPLOAD_FILE_NAME = 'LargeExcel.xlsx'
UPLOAD_FILE_DESCRIPTION = 'A large excel file' #not required

AUTHORITY = 'https://login.microsoftonline.com/' + TENANT_ID
ENDPOINT = 'https://graph.microsoft.com/v1.0'

SCOPES = [
    'Files.ReadWrite.All',
    'Sites.ReadWrite.All'
]

cache = msal.SerializableTokenCache()

if os.path.exists('token_cache.bin'):
    cache.deserialize(open('token_cache.bin', 'r').read())

atexit.register(lambda: open('token_cache.bin', 'w').write(cache.serialize()) if cache.has_state_changed else None)

SCOPES = ["https://graph.microsoft.com/.default"]

app = msal.ConfidentialClientApplication(CLIENT_ID, authority=AUTHORITY, client_credential=CLIENT_SECRET, token_cache=cache)

result = None
result = app.acquire_token_silent(SCOPES, account=None)

drive_id = None

if result is None:
    result = app.acquire_token_for_client(SCOPES)

if 'access_token' in result:
    print('Token acquired')
else:
    print(result.get('error'))
    print(result.get('error_description'))
    print(result.get('correlation_id')) 

if 'access_token' in result:
    access_token = result['access_token']
    headers={'Authorization': 'Bearer ' + access_token}

    # get the site id
    result = requests.get(f'{ENDPOINT}/sites/{SHAREPOINT_HOST_NAME}:/sites/{SITE_NAME}', headers=headers)
    result.raise_for_status()
    site_info = result.json()
    site_id = site_info['id']

    # get the drive / library id
    result = requests.get(f'{ENDPOINT}/sites/{site_id}/drives', headers=headers)
    result.raise_for_status()
    drives_info = result.json()
    
    for drive in drives_info['value']:
        if drive['name'] == TARGET_LIBRARY:
            drive_id = drive['id']
            break

    if drive_id is None:
        print(f'No drive named "{TARGET_LIBRARY}" found')

    # upload a large file to
    file_url = urllib.parse.quote(UPLOAD_FILE_NAME)
    result = requests.post(
        f'{ENDPOINT}/drives/{drive_id}/root:/{file_url}:/createUploadSession',
        headers=headers,
        json={
            '@microsoft.graph.conflictBehavior': 'replace',
            'description': UPLOAD_FILE_DESCRIPTION,
            'fileSystemInfo': {'@odata.type': 'microsoft.graph.fileSystemInfo'},
            'name': UPLOAD_FILE_NAME
        }
    )

    result.raise_for_status()
    upload_session = result.json()
    upload_url = upload_session['uploadUrl']

    st = os.stat(UPLOAD_FILE)
    size = st.st_size
    CHUNK_SIZE = 10485760
    chunks = int(size / CHUNK_SIZE) + 1 if size % CHUNK_SIZE > 0 else 0
    with open(UPLOAD_FILE, 'rb') as fd:
        start = 0
        for chunk_num in range(chunks):
            chunk = fd.read(CHUNK_SIZE)
            bytes_read = len(chunk)
            upload_range = f'bytes {start}-{start + bytes_read - 1}/{size}'
            print(f'chunk: {chunk_num} bytes read: {bytes_read} upload range: {upload_range}')
            result = requests.put(
                upload_url,
                headers={
                    'Content-Length': str(bytes_read),
                    'Content-Range': upload_range
                },
                data=chunk
            )
            result.raise_for_status()
            start += bytes_read

else:
    raise Exception('no access token')

In the script, I’m uploading the LargeExcel file to a library named reports in the python site. It is important to note that the words drive and library are used interchangeably when working with MS Graph. If you see a script example that does not specify a target library but only uses root, it will write the files to the default Documents / Shared Documents library.

Big thank you to Keath Milligan for providing the foundation of the script.
https://gist.github.com/keathmilligan/590a981cc629a8ea9b7c3bb64bfcb417

Use Python to Query a LARGE SharePoint list.

When querying a SharePoint list that has more than 5,000 items, you’ll likely receive an error like this:

This view cannot be displayed because it exceeds the list view threshold (5000 items) enforced by the administrator. 


Microsoft.SharePoint.SPQueryThrottledException', 'The attempted operation is prohibited because it exceeds the list view threshold.', "500 Server Error: Internal Server Error for url

Or, your query will only return the default 100 items. To get around this, pagination can be used to query the list and return ALL of the items.
Example:
all_items = list_to_export.items.paged(1000).get().execute_query()

Full script using VS Code:

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext


app_settings = {
    'url': 'https://taco.sharepoint.com/sites/queso/',
    'client_id': 'ID here',
    'client_secret': 'shhhh its a secret',
}

context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'], client_secret=app_settings['client_secret'])

#connect to the site
ctx = ClientContext(app_settings['url'], context_auth)
ctx.execute_query()

#get the target list
list_title = "List of Tacos"
list_to_export = ctx.web.lists.get_by_title(list_title)

#get all of the list items
all_items = list_to_export.items.paged(1000).get().execute_query()
list_items = [item for item in all_items]

print("Item count: {0}".format(len(list_items)))

The example above connects to a SharePoint site using a client ID and secret, then queries the list. Again, the key here is using pagination (paged). You can adjust the page size to better fit your needs, but be sure to leave it under 5,000, or you will be back to square one.

Download a File From SharePoint Online Using Python

How do you download a file from a SharePoint Online library using Python?

Update – If you scroll to the bottom, I’ve outlined another approach that uses a username and password to connect via the SharePlum library.

Items needed to run the script in this example:
Office365 Rest Python Client library:
https://pypi.org/project/Office365-REST-Python-Client/
SharePoint App Only Client Id and Secret:
Microsoft documentation:
https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
You can create an app principle that is limited to a single site, list, library, or a combination of them:
https://piyushksingh.com/2018/12/26/register-app-in-sharepoint/

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

app_settings = {
    'url': 'https://YOURtenant.sharepoint.com/sites/somesite/',
    'client_id': '12344-abcd-efgh-1234-1a2d12a21a2121a',
    'client_secret': 'Oamytacohungry234343224534543=',
}

context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'], client_secret=app_settings['client_secret'])

ctx = ClientContext(app_settings['url'], context_auth)
web = ctx.web
ctx.load(web)
ctx.execute_query()

response = File.open_binary(ctx, "/Shared Documents/Invoice.pdf")
with open("./Invoice.pdf", "wb") as local_file:
    local_file.write(response.content)

If the above script does not work, step back and ensure you are connected to the site. The following script connects to a site and outputs its title. This is useful to validate that a site connection can be made.

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

app_settings = {
    'url': 'https://YOURtenant.sharepoint.com/sites/somesite/',
    'client_id': '12344-abcd-efgh-1234-1a2d12a21a2121a',
    'client_secret': 'Oamytacohungry234343224534543=',
}

context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'], client_secret=app_settings['client_secret'])

ctx = ClientContext(app_settings['url'], context_auth)
web = ctx.web
ctx.load(web)
ctx.execute_query()

print("Site title: {0}".format(web.properties['Title']))

SharePlum connection example using a username and password to connect to SharePoint Online. More details about SharePlum can be found here: https://github.com/jasonrollins/shareplum

from shareplum import Site
from shareplum import Office365

sharepoint_url = 'https://YOURtenant.sharepoint.com/sites/spdev'
username = 'You@YourDomain.com'
password = 'Password'

authcookie = Office365('https://YOURtenant.sharepoint.com',
                       username=username,
                       password=password).GetCookies()
site = Site('https://YOURtenant.sharepoint.com/sites/DEV/',
            authcookie=authcookie)
sp_list = site.List('Your List')
data = sp_list.GetListItems('All Items', row_limit=200)

If you get this error, you won’t be able to connect with a username and password, and you’ll need to use an App Password.

File “C:\Python311\Lib\site-packages\shareplum\office365.py”, line 80, in get_security_token
raise Exception(‘Error authenticating against Office 365. Error from Office 365:’, message[0].text)
Exception: (‘Error authenticating against Office 365. Error from Office 365:’, “AADSTS50076: Due to a configuration change made by your administrator, or because you moved
to a new location, you must use multi-factor authentication to access ”.”)

I’ve created this post to outline how to upload a large file to a SharePoint library:
https://www.sharepointed.com/2024/03/how-to-upload-a-large-file-to-sharepoint-using-the-microsoft-graph-api/