Download files in Playwright Python
Most actions on a webpage generate some network activity. When an action starts a file download, Playwright raises a "download" event on the page, so we just have to listen for this event.
Once we have the download object, we can read the suggested name of the file, decide where to store it, or read its contents.
This example uses the website below, but any website that offers a downloadable file will work.
https://filesamples.com/formats/csv
Method 1: Perform And Download
Different web pages start a download in different ways. Some trigger it when you click a button, while others trigger it when you hover over a link. Use this method when you know which action starts the download and the file begins downloading immediately or within a few seconds. The action that triggers the download goes inside the expect_download block, and the timeout (in milliseconds) sets how long Playwright waits for the download to start.
from playwright.sync_api import sync_playwright

playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=False)
page = browser.new_page()
page.goto("https://filesamples.com/formats/csv")
# Wait up to 30 seconds for the download triggered inside this block
with page.expect_download(timeout=30000) as downloading_file:
    page.locator("(//*[@class='output'] //a)[1]").click()
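Since the trigger differs from site to site (a click on one page, a hover on another), it can help to wrap the pattern in a small helper that accepts any action. The following is only a sketch: download_via_action and its parameters are hypothetical names that are not part of Playwright, and page is assumed to be a Playwright sync Page object.

```python
from typing import Any, Callable


def download_via_action(page: Any, action: Callable[[], None], timeout_ms: float = 30000) -> Any:
    """Run `action` while waiting for a download and return the download object.

    Hypothetical helper: `page` is assumed to be a Playwright sync Page.
    """
    # expect_download() has to wrap the action that triggers the download,
    # otherwise the event could fire before anything is listening for it.
    with page.expect_download(timeout=timeout_ms) as download_info:
        action()
    # .value holds the download object once the event has been received.
    return download_info.value
```

With the page from the example above, a click-triggered download could be written as download_via_action(page, lambda: page.locator("(//*[@class='output'] //a)[1]").click()), and a hover-triggered one would pass a lambda that calls hover() instead.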
Methods and properties available on the downloading file:
There are a few other operations you can perform on the downloading file.
- downloading_file.is_done() – whether the download event has been received; returns True if it has
- downloading_file.value.failure() – waits for the download to finish and returns the error message if it failed, otherwise None
- downloading_file.value.cancel() – cancels the currently downloading file
- downloading_file.value.url – gives the URL of the file (note: this is not the page URL)
- downloading_file.value.path() – waits for the download to finish and returns the location on your local machine where the file was stored (a temporary path)
- downloading_file.value.suggested_filename – the suggested file name of the downloaded file
- downloading_file.value.save_as(path="/path/where/to/save") – saves a copy of the file to a custom path
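The operations above can be combined into a small helper that checks for failure and then saves the file under its suggested name. This is a minimal sketch under a few assumptions: save_download and target_dir are hypothetical names, and download is expected to be the object you get from downloading_file.value.

```python
from pathlib import Path
from typing import Any


def save_download(download: Any, target_dir: str) -> Path:
    """Save a finished download into `target_dir` under its suggested file name.

    Hypothetical helper: `download` is assumed to be a Playwright Download object.
    """
    directory = Path(target_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # failure() returns an error message if the download failed, otherwise None.
    error = download.failure()
    if error is not None:
        raise RuntimeError(f"Download failed: {error}")
    # Keep only the final path component so the file stays inside target_dir.
    destination = directory / Path(download.suggested_filename).name
    download.save_as(destination)
    return destination
```

Calling save_download(downloading_file.value, "downloads") after the expect_download block would then return the path of the saved copy.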
from playwright.sync_api import sync_playwright

playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=False)
page = browser.new_page()
page.goto("https://filesamples.com/formats/csv")
with page.expect_download(timeout=30000) as downloading_file:
    page.locator("(//*[@class='output'] //a)[1]").click()
# The download object is read after the with block has received the event
print("suggested_filename", downloading_file.value.suggested_filename)
print("path", downloading_file.value.path())
print("url", downloading_file.value.url)
# failure() returns None on success, or an error message on failure
print("Is failed", downloading_file.value.failure())
downloading_file.value.save_as(path="/Users/pavan/Downloads/new_download_with_playwright.csv")
print("is done", downloading_file.is_done())
Once the file is downloaded, you can read it, check its size, or do whatever your scenario requires.
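As one illustration of reading the file afterwards, the sketch below uses only the Python standard library to report the size and row count of a saved CSV. The function name summarize_csv is hypothetical, and it assumes the file is UTF-8 encoded, which may not hold for every download.

```python
import csv
from pathlib import Path


def summarize_csv(path: str) -> dict:
    """Return the size in bytes, the row count, and the first row of a CSV file."""
    file_path = Path(path)
    # Assumption: the file is UTF-8 encoded; adjust the encoding if it is not.
    with file_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    return {
        "size_bytes": file_path.stat().st_size,
        "row_count": len(rows),
        "header": rows[0] if rows else [],
    }
```

You would pass it the path given to save_as, for example summarize_csv("/Users/pavan/Downloads/new_download_with_playwright.csv").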
Method 2: When You Do Not Know When The File Downloads
Sometimes you do not know when a file is going to be downloaded. In such cases you cannot use the above method, because it requires you to know which action starts the download.
Instead, we listen for the event the page raises when a download starts, page.on("download"), and write a handler function to process it.
All the methods from Method 1 are also available here. The difference is that the handler receives the download object directly, so you call them on downloading_file itself rather than on downloading_file.value.
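from playwright.sync_api import sync_playwright

playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=False)
page = browser.new_page()
page.goto("https://filesamples.com/formats/csv")

def download_file_function(downloading_file):
    # Called by Playwright each time the page starts a download
    print("hello")
    # path() is a method; it waits for the download to finish
    print(downloading_file.path())

# Register the handler before triggering the download
page.on("download", download_file_function)
page.locator("(//*[@class='output'] //a)[1]").click()
# Keep the script alive long enough for the download event to be handled
page.wait_for_timeout(5000)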
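If a page may start several downloads at unpredictable times, the handler can also save each file and remember where it went. The sketch below is one possible way to do this; make_download_collector and target_dir are hypothetical names, and each download passed to the handler is assumed to be a Playwright Download object.

```python
from pathlib import Path
from typing import Any, Callable, List, Tuple


def make_download_collector(target_dir: str) -> Tuple[Callable[[Any], None], List[Path]]:
    """Build a "download" event handler that saves every file into `target_dir`.

    Returns the handler and the list that it fills with the saved paths.
    """
    saved_paths: List[Path] = []
    directory = Path(target_dir)

    def handle_download(download: Any) -> None:
        directory.mkdir(parents=True, exist_ok=True)
        # Keep only the final path component of the suggested name.
        destination = directory / Path(download.suggested_filename).name
        download.save_as(destination)
        saved_paths.append(destination)

    return handle_download, saved_paths
```

To use it, create the pair with handler, saved = make_download_collector("downloads"), register it with page.on("download", handler), and inspect the saved list once the page has had time to finish its downloads.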