Using Python’s LXML in AWS Lambda

By Hector Castro on June 27th, 2016

We recently set out to do some XML processing in AWS Lambda using Python and the lxml library. When it came time to deploy the function, we realized that the standard method for creating a deployment package was not going to cut it. Why? Because lxml's C extensions must be compiled against libxml2 and libxslt in a way that is compatible with the AWS Lambda execution environment.

Deployment packages

Amazon already has some pretty straightforward documentation on creating Lambda deployment packages with pip and virtualenv. For pure Python dependencies, the packaging process can look something like this:

$ pip install requests -t .
$ zip -r9 requests/
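Whichever tool creates the archive, the dependency directories need to sit at the root of the ZIP file, next to the handler module, because Lambda imports code from the root of the extracted deployment package. As a rough sketch of that layout rule, here is a Python 3.7+ function (the name `build_package` is hypothetical) that mirrors `zip -r9` with the standard library's `zipfile` module and stores every path relative to the source directory:

```python
import os
import zipfile


def build_package(source_dir, archive_path):
    """Zip everything under source_dir, storing paths relative to it.

    A dependency such as requests ends up as "requests/..." at the root of
    the archive, which is where Lambda expects to import it from.
    """
    source_dir = os.path.abspath(source_dir)
    archive_path = os.path.abspath(archive_path)
    # compresslevel=9 matches the -9 flag passed to zip above.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # walk in a stable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                if full_path == archive_path:
                    continue  # never add the archive to itself
                arcname = os.path.relpath(full_path, source_dir)
                zf.write(full_path, arcname)
    return archive_path
```

For example, `build_package(".", "function.zip")` run from the directory used with `pip install requests -t .` would produce an archive with `requests/` and your handler at its top level (`function.zip` is a placeholder name).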

For dependencies with C extensions, things get more complicated, because the extensions must be compiled against the same system libraries that are available in the AWS Lambda execution environment. Luckily, we know the execution environment runs Amazon Linux, right down to the Amazon Machine Image (AMI) ID and Linux kernel version.
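One way to see why a build from a macOS or Windows workstation fails is to look at the compiled extension files themselves. The sketch below (Python 3, standard library only; `describe_binary` and `find_non_linux_extensions` are hypothetical helpers) reads the first 20 bytes of each `.so` file and flags any that are not 64-bit x86 ELF binaries, the format an x86_64 Linux environment such as Lambda's can load. It only checks file format and architecture; it cannot detect a mismatch in the libxml2, libxslt, or glibc versions an extension was linked against.

```python
import os
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2   # e_ident[EI_CLASS] value for 64-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA] value for little-endian objects
EM_X86_64 = 62   # e_machine value for AMD x86-64


def describe_binary(path):
    """Classify a file by its header: 'ELF x86-64', 'ELF (other)', or 'not ELF'."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return "not ELF"
    # e_machine is a 2-byte field at offset 18, stored in the file's byte order.
    byte_order = "<" if header[5] == ELFDATA2LSB else ">"
    (machine,) = struct.unpack(byte_order + "H", header[18:20])
    if header[4] == ELFCLASS64 and machine == EM_X86_64:
        return "ELF x86-64"
    return "ELF (other)"


def find_non_linux_extensions(package_dir):
    """List .so files under package_dir that are not 64-bit x86 ELF binaries."""
    suspicious = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(".so"):
                path = os.path.join(root, name)
                if describe_binary(path) != "ELF x86-64":
                    rel = os.path.relpath(path, package_dir)
                    suspicious.append(rel.replace(os.sep, "/"))
    return sorted(suspicious)
```

A macOS build of lxml also produces `.so` files, but they are Mach-O binaries, so they would show up in the returned list; an empty list is a necessary (not sufficient) sign that the bundle was built on Linux.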

Launching an Amazon Linux instance with the AWS CLI

To smooth out the process of launching an Amazon Linux instance, below is a single command that launches the Amazon Linux AMI used by AWS Lambda (current at the time of writing) as a t2.micro within the default VPC of an AWS account:

$ aws ec2 run-instances \
--image-id ami-60b6c60a --instance-type t2.micro --count 1 \
--key-name MyKeyPair --security-group-ids MySecurityGroupId

Note: Replace MyKeyPair and MySecurityGroupId with your own values before running this command. The security group should also have rules that allow inbound SSH and outbound HTTP/HTTPS, so that you can connect to the instance and it can download packages.

Building an LXML bundle on Amazon Linux

Once you have connected to the Amazon Linux instance over SSH, the following steps produce a ZIP archive containing an lxml bundle suitable for the AWS Lambda execution environment.

First, we create, activate, and navigate into a virtualenv:

$ virtualenv builder
$ source ./builder/bin/activate
$ pushd builder

Next, we install dependencies for lxml, along with lxml itself:

$ sudo yum install -y gcc libxml2-devel libxslt-devel
$ ./bin/pip install --upgrade pip
$ ./bin/pip install lxml==3.6.0

Lastly, we navigate into the virtualenv's site-packages directory and create a ZIP archive of the lxml package and its egg-info metadata:

$ pushd lib64/python2.7/site-packages/
$ zip -r9 lxml lxml-3.6.0-py2.7.egg-info/
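Because the archive is created from inside site-packages, `lxml/` and the egg-info directory end up at the top level of the ZIP file rather than under `lib64/python2.7/site-packages/`. A small Python 3 check of that layout might look like this; the helper names are hypothetical, and the check only inspects paths, not whether the compiled modules actually load:

```python
import zipfile


def top_level_entries(archive_path):
    """Return the sorted set of first path components in a ZIP archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


def looks_like_lxml_bundle(archive_path):
    """True if lxml/ and an lxml-*.egg-info entry sit at the archive root."""
    tops = top_level_entries(archive_path)
    has_package = "lxml" in tops
    has_metadata = any(
        t.startswith("lxml-") and t.endswith(".egg-info") for t in tops
    )
    return has_package and has_metadata
```

An archive zipped from the virtualenv root instead would have `lib64` as its only top-level entry, and `looks_like_lxml_bundle` would return `False`.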

Repackaging and deployment

At this point, we have an lxml bundle that is ready for AWS Lambda, but it lives on an Amazon Linux EC2 instance without any of our function code. Assuming the code you want to deploy is on your local workstation, the following steps download the lxml bundle and repackage it with the Lambda function code.

First, download the lxml bundle from the Amazon Linux instance:

$ scp -i ~/.ssh/MyKeyPair.pem \ .

Next, extract the bundle’s contents into the directory that holds your existing Lambda function code:

$ unzip
$ ls -l
total 5960
drwxrwxr-x 27 hector staff 918 Mar 28 19:56 lxml
drwxrwxr-x 9 hector staff 306 Mar 28 19:56 lxml-3.6.0-py2.7.egg-info
-rw-r--r-- 1 hector staff 3035310 Mar 28 19:58
-rw-r--r-- 1 hector staff 399 Mar 28 18:23

Lastly, create a new archive containing both lxml and the function code, then use it to create a new python2.7 Lambda function:

$ zip -r9 lxml
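If you would rather not unzip the bundle into your project directory, the same merge can be sketched in Python 3.7+ by copying each entry of the lxml bundle into a new archive and then adding the function files at the root. This is an alternative to the `unzip`/`zip` steps above, not what we ran, and `merge_package` is a hypothetical helper:

```python
import os
import zipfile


def merge_package(bundle_path, function_files, output_path):
    """Copy every entry of bundle_path into output_path, then add function files.

    function_files is a list of local paths (e.g. a handler module) that are
    stored at the root of the new archive under their base names.
    """
    with zipfile.ZipFile(bundle_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info)
            # Copy name, timestamp and Unix permission bits into a fresh
            # ZipInfo so the source archive's metadata is left untouched.
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            entry.external_attr = info.external_attr
            dst.writestr(entry, data, compress_type=zipfile.ZIP_DEFLATED,
                         compresslevel=9)
        for path in function_files:
            dst.write(path, os.path.basename(path))
    return output_path
```

For example, `merge_package("lxml_bundle.zip", ["main.py"], "function.zip")` would produce a single deployment package; all three file names are placeholders.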
$ aws lambda create-function \
--function-name funcValidateXML --zip-file fileb:// \
--role arn:aws:iam::AWS_ACCOUNT_ID:role/lambda_basic_execution \
--handler main.handle --runtime python2.7
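The `--handler main.handle` value tells Lambda to import the module `main` (that is, `main.py` at the root of the package) and call its `handle` function. Before uploading, a quick Python 3 check that the archive actually contains that module could look like the following; it assumes the handler module is a top-level `.py` file, and both helper names are hypothetical:

```python
import zipfile


def handler_module_path(handler):
    """Map a handler string like 'main.handle' to ('main.py', 'handle')."""
    module, sep, function = handler.rpartition(".")
    if not sep or not module or not function:
        raise ValueError("expected 'module.function', got %r" % handler)
    if "." in module or "/" in module:
        # Assumption: this sketch only handles top-level handler modules.
        raise ValueError("nested handler modules are not handled: %r" % handler)
    return module + ".py", function


def package_has_handler(archive_path, handler):
    """True if the handler's module file sits at the root of the archive."""
    module_file, _function = handler_module_path(handler)
    with zipfile.ZipFile(archive_path) as zf:
        return module_file in zf.namelist()
```

This only confirms the file is present; it does not check that `handle` is defined or that the module imports cleanly under python2.7.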

That’s it. Now you’ve got a fully functional AWS Lambda function that can make use of lxml!