Frontend With HasanAbout Me

Running Playwright as a Docker Container in AWS Lambda.

Playwright

Playwright has quickly become a go-to tool for end-to-end testing and browser automation. The Playwright team includes developers from Google's Puppeteer project and anyone who has worked with both can clearly see that it takes inspiration from Puppeteer but instead of copying it they improved upon it.

Understanding Serverless and AWS Lambdas

Serverless is definitely a buzzword in the tech industry, but there are places where it shines. Autoscaling can be hard to get right, but with serverless, all that complexity is handled by the cloud provider. All one has to do is write your specific code in the language of your choice and deploy it on the cloud where it will be run on demand.

Like with most of the internet, AWS leads the way in serverless with its Lambda service.

The Challenge with Playwright on Serverless Environments

So how do we go about running Playwright on AWS Lambda? If it was as simple as running npm install I would not be writing this article. The problem comes from the fact that Lambdas runtime environment is a very bare bones linux distribution, it is like to this have minimal cold-starts and fast execution times. But this also means that it lacks many of the libraries that Playwright and Chrome needs to run.

There are couple of solutions to this problem, one is to provide all the libraries that AWS Linux distribution is missing and provide a custom implementation of browser that can run in a serverless environment. Similar to packages such as this: chrome-aws-lambda. Or run on your own container directly in AWS Lambda.

Migration to Dockerized Playwright

We at sematext had been using Puppeteer with the above mentioned package for some time now. But the new features and improvements in Playwright and similarity in syntax with Puppeteer really convinced us to make the switch.

We had been burned by using third party chromium packages and things could easily break with one security patch from AWS. So we decided to have our own containers running only official Playwright images. This guarantees a stable runtime environment and kept us immune from changes or problems in the third party packages.

Setting Up Docker for AWS Lambdas

Thankfully the good people at Microsoft provide a Playwright Docker image that contains Playwright browsers and most of the dependencies needed to run it. We do need a few packages to run things in our container which I will mention later. Below is the Dockerfile that you can use directly to run Playwright in AWS Lambda.

# Use the Playwright base image, last tested with v1.44.1
FROM mcr.microsoft.com/playwright:v1.44.1-focal-amd64

RUN apt-get update && apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev \
autoconf \
libtool

ENV NPM_CONFIG_CACHE=/tmp/.npm

# To cache the npm install step
RUN npm install -g aws-lambda-ric

COPY package.json ./

# Install the application dependencies
RUN npm install

# Copy the application code to the container
COPY . .

RUN npm run lint

RUN npm run build

ENTRYPOINT ["/bin/npx", "aws-lambda-ric"]

# Set the command to run your application
CMD ["dist/index.handler"]

Even though we’re using Playwright’s official image, we still need to install packages like libcurl4-openssl-dev and libtool to ensure Chrome runs properly. Another important point is the aws-lambda-ric package. If we had used one of the official AWS images, those would already include the necessary tools to launch Lambdas. However, since we’re using Playwright’s image, we need to install this package separately. Additionally, because it’s a large package, we cache its installation by installing it outside our application code.

Regarding the application code, I haven’t included the package.json. You can add any packages your application requires in it. Just be sure to install both Playwright and Playwright Core, as they are not included in the base image.

Chrome Development Flags

Since we are running chrome in a limited environment, we need to disable some chrome-features which help us run chrome smoothly. The following flags helped us but you might not need all.

"--no-zygote",
"--disable-setuid-sandbox",
"--no-sandbox",
"--single-process",

If you are looking to optimize Chrome, you can disable many other features. In fact, the official Playwright package disables several features, which you can review here.

At this point, our Playwright setup is ready for deployment on AWS Lambda. Next, let’s explore how to do that step by step.

Deploying to AWS Lambdas

Deployment can be divided into three parts: building the Docker image, pushing it to a ECR and finally deploying it to Lambda. Building the Docker image is simple and can be done with

docker build -t <image-name> .

In order to use Docker images with AWS Lambda, we need to push the image to ECR. First create a fresh repository using the AWS console then we tag our local image to be able to push to ECR. However before we can push the image we need to login with Docker in our AWS CLI. This can be done with the following command:

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

We tag our local image using the following command:

docker tag <image-name>:latest <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>

Finally we push the image to ECR with the following command:

docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>

Now that our image is in ECR, we can deploy it to Lambda. This can be done easily by creating a new Lambda function, selecting container image and browse image to select the image we just pushed to ECR.

Conclusion

At this stage you should have a working Playwright Lambda function capable of performing whatever tasks you need. Like I noted above this simplifies our Playwright execution and makes scaling up very simple. We have been using this new setup for many months and have seen improvements both in performance and stability.