Running Playwright as a Docker Container in AWS Lambda
Playwright
Playwright has quickly become a go-to tool for end-to-end testing and browser automation. The Playwright team includes developers from Google's Puppeteer project, and anyone who has worked with both can see that Playwright takes its inspiration from Puppeteer. Rather than copying it, though, the team improved on it.
Understanding Serverless and AWS Lambdas
Serverless is definitely a buzzword in the tech industry, but there are places where it shines. Autoscaling can be hard to get right, but with serverless, all that complexity is handled by the cloud provider. All you have to do is write your code in the language of your choice and deploy it to the cloud, where it runs on demand.
Like with most of the internet, AWS leads the way in serverless with its Lambda service.
The Challenge with Playwright on Serverless Environments
So how do we go about running Playwright on AWS Lambda? If it were as simple as running npm install, I would not be writing this article. The problem is that Lambda's runtime environment is a very bare-bones Linux distribution. It is kept that way to minimize cold starts and keep execution times fast, but this also means that it lacks many of the libraries that Playwright and Chrome need to run.
There are a couple of solutions to this problem. One is to supply all the libraries that the AWS Linux distribution is missing, along with a custom browser build that can run in a serverless environment, as packages such as chrome-aws-lambda do. The other is to run your own container directly in AWS Lambda.
Migration to Dockerized Playwright
We at Sematext had been using Puppeteer with the above-mentioned package for some time. But the new features and improvements in Playwright, together with its similarity in syntax to Puppeteer, convinced us to make the switch.
We had been burned by third-party Chromium packages, where things could easily break with a single security patch from AWS. So we decided to run our own containers built only on official Playwright images. This gives us a stable runtime environment and keeps us insulated from changes or problems in third-party packages.
Setting Up Docker for AWS Lambdas
Thankfully, the good people at Microsoft provide a Playwright Docker image that contains the Playwright browsers and most of the dependencies needed to run them. We do need a few extra packages in our container, which I describe below. Below is the Dockerfile that you can use directly to run Playwright in AWS Lambda.
Even though we’re using Playwright’s official image, we still need to install packages like libcurl4-openssl-dev and libtool to ensure Chrome runs properly. Another important point is the aws-lambda-ric package. If we had used one of the official AWS images, those would already include the necessary tools to launch Lambdas. However, since we’re using Playwright’s image, we need to install this package separately. Additionally, because it’s a large package, we install it in its own step, separate from our application code, so that Docker can cache the installation.
Regarding the application code, I haven’t included the package.json. You can add any packages your application requires in it. Just be sure to install both Playwright and Playwright Core, as they are not included in the base image.
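To make the application side more concrete, here is a minimal sketch of what a handler could look like. It is not our production code: the createHandler name, the url field on the event, and the response shape are all assumptions made for illustration. The browser type is passed in as an argument so the handler can be exercised without a real browser.

```javascript
// Hypothetical handler sketch. createHandler and the `url` event field are
// illustrative names, not part of Playwright or the Lambda runtime.
function createHandler(browserType, launchOptions = {}) {
  return async function handler(event) {
    // Fall back to a placeholder URL when the event does not carry one.
    const url = (event && event.url) || 'https://example.com';

    const browser = await browserType.launch({ headless: true, ...launchOptions });
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'load' });
      const title = await page.title();
      return { statusCode: 200, body: JSON.stringify({ url, title }) };
    } finally {
      // Close the browser on every invocation so a warm container
      // does not accumulate stray Chrome processes.
      await browser.close();
    }
  };
}

// In the image, the wiring would look roughly like this:
//   const { chromium } = require('playwright');
//   exports.handler = createHandler(chromium);
```

Launching and closing the browser inside each invocation is the simplest option. Reusing a browser across warm invocations is possible too, but it needs more care around crashed or stale browser processes.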
Chrome Development Flags
Since we are running Chrome in a constrained environment, we need to disable some Chrome features so that it runs smoothly. The following flags helped us, but you might not need all of them.
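As a rough illustration of how such flags are passed to Playwright, here is a minimal sketch. The list is an assumed starting set of Chromium switches commonly used in constrained containers, not necessarily the exact set from our setup, and buildLaunchOptions is a hypothetical helper name.

```javascript
// Assumed starting set of Chromium flags for a constrained container.
// Treat this as an illustrative list and trim it to what you actually need.
const LAMBDA_CHROME_ARGS = [
  '--no-sandbox',             // run Chrome without its sandbox
  '--disable-setuid-sandbox', // also skip the setuid sandbox helper
  '--disable-dev-shm-usage',  // use a temp directory instead of /dev/shm for shared memory
  '--disable-gpu',            // no GPU hardware acceleration
  '--single-process',         // run the renderer in the same process as the browser
  '--no-zygote',              // do not use a zygote process to fork child processes
];

// Hypothetical helper: merges the base flags with caller-supplied ones.
function buildLaunchOptions(extraArgs = []) {
  // De-duplicate so callers can pass overlapping flags safely.
  const args = [...new Set([...LAMBDA_CHROME_ARGS, ...extraArgs])];
  return { headless: true, args };
}

// Usage with Playwright:
//   const { chromium } = require('playwright');
//   const browser = await chromium.launch(buildLaunchOptions(['--mute-audio']));
```

Keeping the flags in one array makes it easy to remove them one at a time and check which ones your workload really depends on.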
If you are looking to optimize Chrome, you can disable many other features. In fact, the official Playwright package disables several features, which you can review here.
At this point, our Playwright setup is ready for deployment on AWS Lambda. Next, let’s explore how to do that step by step.
Deploying to AWS Lambdas
Deployment can be divided into three parts: building the Docker image, pushing it to ECR, and finally deploying it to Lambda. Building the Docker image is simple and can be done with the docker build command.
To use Docker images with AWS Lambda, we need to push the image to ECR. First, create a fresh repository using the AWS console; we then tag our local image so that it can be pushed to ECR. However, before we can push the image, we need to log Docker in to ECR using the AWS CLI. This can be done with the following command:
We tag our local image using the following command:
Finally we push the image to ECR with the following command:
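The build, login, tag, and push steps can be sketched as a small Node script that assembles the commands. The account ID, region, repository, and image names are placeholders, ecrDeployCommands is a hypothetical helper, and the sketch assumes the values contain no spaces or shell metacharacters.

```javascript
// Builds the shell commands for the build, login, tag, and push steps.
// Every value passed in is a placeholder; substitute your own.
function ecrDeployCommands({ accountId, region, repository, localImage, tag = 'latest' }) {
  const registry = `${accountId}.dkr.ecr.${region}.amazonaws.com`;
  const remoteImage = `${registry}/${repository}:${tag}`;
  return {
    remoteImage,
    commands: [
      // 1. Build the image from the Dockerfile in the current directory.
      `docker build -t ${localImage}:${tag} .`,
      // 2. Log Docker in to the ECR registry using a token from the AWS CLI.
      `aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${registry}`,
      // 3. Tag the local image with the ECR repository URI.
      `docker tag ${localImage}:${tag} ${remoteImage}`,
      // 4. Push the tagged image to ECR.
      `docker push ${remoteImage}`,
    ],
  };
}

const { commands } = ecrDeployCommands({
  accountId: '123456789012',      // placeholder AWS account ID
  region: 'us-east-1',            // placeholder region
  repository: 'playwright-lambda', // placeholder ECR repository name
  localImage: 'playwright-lambda', // placeholder local image name
});
console.log(commands.join('\n'));
```

Running the script prints the four commands in order, so they can be reviewed before being pasted into a shell or a CI job.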
Now that our image is in ECR, we can deploy it to Lambda. This is done by creating a new Lambda function, selecting the Container image option, and using Browse images to pick the image we just pushed to ECR.
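If you would rather script this step than click through the console, the same function can be created with the AWS CLI. The sketch below only assembles the command. The function name, image URI, and role ARN are placeholders, createFunctionCommand is a hypothetical helper, and the timeout and memory values are assumptions rather than recommendations from our setup.

```javascript
// Assembles an `aws lambda create-function` command for a container image.
// Assumes an IAM execution role already exists; all values are placeholders.
function createFunctionCommand({ functionName, imageUri, roleArn, timeoutSeconds = 60, memoryMb = 1024 }) {
  return [
    'aws lambda create-function',
    `--function-name ${functionName}`,
    '--package-type Image',          // deploy from a container image, not a zip
    `--code ImageUri=${imageUri}`,   // the image we pushed to ECR
    `--role ${roleArn}`,
    `--timeout ${timeoutSeconds}`,   // assumed value; Lambda's default is 3 seconds
    `--memory-size ${memoryMb}`,     // assumed value; Lambda's default is 128 MB
  ].join(' ');
}

console.log(createFunctionCommand({
  functionName: 'playwright-lambda',
  imageUri: '123456789012.dkr.ecr.us-east-1.amazonaws.com/playwright-lambda:latest',
  roleArn: 'arn:aws:iam::123456789012:role/lambda-execution-role',
}));
```

Whichever route you take, the default timeout and memory are likely too low for a browser workload, so it is worth raising both.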
Conclusion
At this stage, you should have a working Playwright Lambda function capable of performing whatever tasks you need. As noted above, this setup simplifies our Playwright execution and makes scaling up very simple. We have been using it for many months and have seen improvements in both performance and stability.