How to Structure a Python AWS Serverless Project
I haven't been able to find much guidance on how to structure AWS serverless projects written in Python. There are plenty of "hello world" examples out there, where all the code fits into a single file, and a whole lot of questions about module resolution issues in Python lambda projects on Stack Overflow, but precious little advice on how to set up a repository for a larger project. What is the best way to share code between lambdas? How do you overcome the local development module resolution issues that frequently plague projects of this type? In short, how do you set things up so that Python tooling - language servers, type checkers and test runners - works as expected?
After reading this post you'll know how to:
- use Python packaging tools to transparently share code between lambda handlers
- avoid module resolution issues in the local development environment
- package shared code as a lambda layer during deployment
- set up pytest to correctly run a test suite located in a separate directory
- help mypy type-check the project correctly despite its non-standard structure
Note: A finished reference serverless project is available on GitHub. Feel free to consult it at any stage or just read the finished code instead of the description below.
Shared code as an internal package
The basic structure of the example project repository looks as follows:
├── functions/
│ ├── add/
│ │ └── handler.py
│ └── multiply/
│ │ └── handler.py
├── layer/
│ └── shared/
│ ├── __init__.py
│ ├── math.py
│ └── py.typed
├── tests/
The example application is a service that performs mathematical operations. It's a pointless service, or rather its only point is to provide an excuse for me to talk about structuring the project. The shared code is located in the layer/shared folder, while the lambda handlers live in the functions folder. Tests have been separated from the application code in the tests folder, since we don't want them to be included with the deployed code.
The problem and the solution
When the functions and the layer are deployed, the function handlers will be able to import the shared package from the global namespace. This "magical" behavior is courtesy of the lambda layer machinery working behind the scenes. Things will work when deployed, which is great, but what about the local development experience? If you clone the example repository and open either of the handlers in your code editor, you'll find the import statements referencing the shared module underlined in red. Module resolution is broken, since Python doesn't automatically understand a codebase structured as described above. It seems that many projects end up accepting this state of affairs as a fact of life when building with lambdas - some really hacky workarounds for this very issue can be found, for example, in official serverless project examples published by AWS. We can do better!
The proper "Pythonic" solution to the problem is to have the shared package installed in the development environment so that it can be imported in other parts of the project irrespective of the project's directory structure. Python, in fact, has a well established pattern for installing packages in "editable" mode to ease local development. We can leverage this feature to effectively create an editable simulated layer that can be developed alongside the handlers. Yes, a little bit of initial setup is required, and we will always need to install the shared package locally as a prerequisite to doing development work and/or running the test suite, but the tradeoff is well worth it.
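To make the goal concrete, here is a rough sketch of what a handler could look like once the shared package is importable. The event shape (keys "a" and "b") and the returned dictionary are my assumptions for illustration, and the Addition class defined inline is only a stand-in so the snippet runs on its own; in the project the handler would import it with `from shared.math import Addition`, the same class used in the interpreter session further down.

```python
from typing import Any, Dict


class Addition:
    """Stand-in for shared.math.Addition so this sketch is self-contained.

    In the project this class would live in layer/shared/math.py and be
    imported with: from shared.math import Addition
    """

    def add(self, a: float, b: float) -> float:
        return a + b


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Hypothetical event shape: {"a": <number>, "b": <number>}
    result = Addition().add(event["a"], event["b"])
    return {"result": result}
```

The point is that the handler file contains nothing but a plain top-level import of the shared package, with no path manipulation.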
Creating the internal package
The files and directories that comprise the internal package look as follows:
├── layer/
│ └── shared/
│ ├── __init__.py
│ ├── math.py
│ └── py.typed
├── tests/
├── pyproject.toml
└── setup.cfg
This structure is essentially a variant of what's known as the src package layout - with the src directory renamed as layer. For this project I'm using setuptools as the packaging tool, and I'm configuring the package declaratively using a setup.cfg file.
Declaring packages in this style requires, per PEP 517 and PEP 518, a tiny bit of boilerplate in the pyproject.toml file:
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
This is just to instruct the tools (such as pip or build) on how to build the package.
The bulk of package configuration lives in the setup.cfg file:
[metadata]
name = shared
version = 0.1.0
[options]
package_dir =
    =layer
packages = find:
include_package_data = True
[options.packages.find]
where = layer
[options.package_data]
* = py.typed
The [metadata] section holds some basic information about the project. We don't need much here, since this package will only be used internally and will not be published to external package repositories.
The [options] section accomplishes two things:
- It informs packaging tools that they should automatically find and include all packages located inside the layer subdirectory, and that the layer directory itself should be excluded from the packaged module hierarchy. The [options.packages.find] section points the package auto-discovery logic at the layer directory.
- It states that the package is allowed to contain data files, i.e. files that don't contain Python code, as long as they are declared in the [options.package_data] section. This is required in order to include the py.typed file from the layer/shared folder in the package. This empty marker file informs mypy that the packaged code contains type definitions.
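If the package_dir entry looks cryptic, it may help to see how the value breaks down. Since setup.cfg is an INI-style file, Python's standard configparser module can read it. The sketch below is only an approximation for illustration, not setuptools' own parsing code: it shows that the indented "=layer" continuation line maps the empty (root) package name to the layer directory.

```python
import configparser

# The [options] part of the setup.cfg shown above, inlined for the demo.
SETUP_CFG = """
[options]
package_dir =
    =layer
packages = find:

[options.packages.find]
where = layer

[options.package_data]
* = py.typed
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# package_dir is a multi-line value: each non-empty line is "<package> = <directory>".
raw_package_dir = parser["options"]["package_dir"]
package_dir = {}
for line in raw_package_dir.splitlines():
    if line.strip():
        package, _, directory = line.partition("=")
        package_dir[package.strip()] = directory.strip()

print(package_dir)                                # mapping of package name to directory
print(parser["options.packages.find"]["where"])   # where auto-discovery starts
print(dict(parser["options.package_data"]))       # data files to include per package
```

The empty key is how setuptools spells "the root of the package hierarchy", which is why the layer directory never shows up in import paths.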
We can install the shared package locally in editable mode using the following command:
❯ pip install --editable .
The shared package is now installed and we can import it like any other package:
❯ python
>>> from shared.math import Addition
>>> a = Addition()
>>> print(a.add(2, 2))
4
The red squiggles should now be gone from the handlers and the test suite should run without any issues. If you're using a Python language server in your code editor you should be able to jump around the code, find definitions, and get completion suggestions for the packaged code throughout the codebase. Finally, any changes to the shared package code will be immediately applied throughout the project, without the need to re-install it.
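If you want to double-check where Python resolves the package from, a small helper along these lines can do it. This is just a sketch using the standard importlib machinery; the locate function name is mine. With the editable install in place I'd expect the path to point into the layer/shared directory of the repository rather than into site-packages, and None means the package isn't installed in the active environment.

```python
import importlib.util
from typing import Optional


def locate(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it can't be found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Prints a path when the package is installed, None otherwise.
print(locate("shared"))
```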
Deploying the internal package in a layer
We've managed to get things working nicely in the local environment. Now we just have to figure out how to include our internal package in a lambda layer on deployment. While the example project repository uses AWS SAM for deployment, the solution I'm going to describe is tool-agnostic and should be possible to adapt to any other AWS deployment tool/framework (we use this approach with Terraform at work, for example).
The first step is to turn the internal package into a wheel (a *.whl file). We can use the build tool for this purpose. After installing build with pip we can run it as follows:
❯ python -m build -w
We run it as a Python module, adding the -w flag to build the wheel only. By default the build artifacts are placed in the dist folder:
├── dist
│ └── shared-0.1.0-py3-none-any.whl
Now we can begin assembling the layer.
A lambda layer is packaged as a zipped python directory containing Python modules. These modules can be anything Python understands as modules - individual Python files or directories containing __init__.py files. The example project uses the build directory as a staging area, so let's create a python directory as a subdirectory of build:
❯ mkdir -p build/python
Now we can use pip to install the shared package wheel to the build/python directory:
❯ python -m pip install dist/*.whl -t build/python
This should produce the following structure under the build directory:
├── build
│ └── python
│ ├── shared
│ │ ├── __init__.py
│ │ ├── math.py
│ │ └── py.typed
│ └── shared-0.1.0.dist-info
│ ├── (...)
We can use an analogous approach to install any external dependencies that should be included in the layer. Provided they are listed in the requirements.txt file, we run:
❯ python -m pip install -r requirements.txt -t build/python
The final step is to zip the python directory:
❯ cd build; zip -rq ../layer.zip python; cd ..
This will produce a layer.zip file located in the root directory of the project. This file is ready to be deployed as a layer using an AWS deployment tool of your preference.
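If you'd rather not depend on the zip command-line tool (on Windows, for example), the same step can be sketched in Python with the standard library. This is an alternative I'm suggesting rather than something the example repository does, and the package_layer function name is my own. The important detail is that the archive entries must start with python/, which is what base_dir takes care of.

```python
import shutil
from pathlib import Path


def package_layer(build_dir: Path, output_zip: Path) -> Path:
    """Zip the 'python' directory found in build_dir into output_zip."""
    # make_archive appends ".zip" itself, so pass the target path without the suffix.
    base_name = str(output_zip.resolve().with_suffix(""))
    archive = shutil.make_archive(
        base_name,
        "zip",
        root_dir=str(build_dir),  # archive paths are relative to this directory
        base_dir="python",        # only this subdirectory goes into the archive
    )
    return Path(archive)


# Usage from the project root, mirroring the shell command above:
# package_layer(Path("build"), Path("layer.zip"))
```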
In the example project repository I use a Makefile to perform the manual steps described above automatically:
ARTIFACTS_DIR ?= build
# (...)
.PHONY: build
build:
	rm -rf dist || true
	python -m build -w
.PHONY: build_layer
build_layer: build
	rm -rf "$(ARTIFACTS_DIR)/python" || true
	mkdir -p "$(ARTIFACTS_DIR)/python"
	python -m pip install -r requirements.txt -t "$(ARTIFACTS_DIR)/python"
	python -m pip install dist/*.whl -t "$(ARTIFACTS_DIR)/python"
.PHONY: package_layer
package_layer: build build_layer
	cd "$(ARTIFACTS_DIR)"; zip -rq ../layer.zip python
Running make build will build the package, running make build_layer will populate the layer python directory, and running make package_layer will turn the python directory into a zip archive. The ARTIFACTS_DIR variable defaults to "build" if not set, so by default the make targets behave like the manual commands described earlier. The single command to package the layer as a zip file is make package_layer (this target will run the build and build_layer targets as its prerequisites/dependencies).
Getting pytest to work
With the shared package installed in the local Python environment, pytest mostly works with this repository structure. This is because pytest uses its own module discovery logic that's more permissive regarding directory layout compared to the Python default.
The tests should always work when pytest is invoked as follows from the root of the project:
❯ python -m pytest
The handler tests (tests/unit/functions_add_test.py and tests/unit/functions_multiply_test.py) will fail, however, with the following error when invoking pytest directly (i.e. not as a Python module with python -m) from the root of the project:
❯ pytest
tests/unit/functions_add_test.py:2: in <module>
from functions.add.handler import handler
E ModuleNotFoundError: No module named 'functions'
(...)
tests/unit/functions_multiply_test.py:2: in <module>
from functions.multiply.handler import handler
E ModuleNotFoundError: No module named 'functions'
The difference in behavior is explained in the pytest documentation - running python -m pytest has the side effect of adding the current directory to sys.path, per standard python behavior.
If you prefer calling pytest directly, you can work around this quirk by including a conftest.py file in the root of the project. This will effectively force pytest to include the project root in its hierarchy of discovered modules, and the command should run without module resolution errors.
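The underlying mechanism is easy to reproduce outside of pytest. The sketch below builds a throwaway project in a temporary directory (the demo_functions name is made up to avoid clashing with anything real) and shows that the handler module only becomes importable once the project root is on sys.path, which is what both python -m pytest and the root-level conftest.py end up arranging.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str) -> bool:
    """Report whether module_name can currently be imported."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as project_root:
    # Mimic the functions/add/handler.py layout under a hypothetical name.
    handler_dir = Path(project_root) / "demo_functions" / "add"
    handler_dir.mkdir(parents=True)
    (handler_dir / "handler.py").write_text(
        "def handler(event, context):\n    return 'ok'\n"
    )

    # The project root is not on sys.path yet, so the import fails.
    before = can_import("demo_functions.add.handler")

    # With the project root on sys.path the very same import succeeds.
    sys.path.insert(0, project_root)
    after = can_import("demo_functions.add.handler")
    sys.path.remove(project_root)

print(before, after)
```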
Getting mypy to work
This one took a while to figure out. While mypy will run happily against the layer directory, it throws an error when asked to type-check the functions directory:
❯ mypy functions
functions/multiply/handler.py: error: Duplicate module named "handler" (also at "functions/add/handler.py")
Found 1 error in 1 file (errors prevented further checking)
The problem has to do with the fact that the functions directory contains multiple subdirectories, each with a file called handler.py. From mypy's perspective this indicates an invalid package structure.
There is a closed issue in the mypy repo with a discussion about this problem. The problem can be boiled down to this: mypy only understands Python packages and relationships between them, while our functions folder holds multiple discrete, parallel entry-points into the codebase that don't make sense when interpreted as a package. The contents of the functions directory, in other words, are a bit like a monorepo with multiple distinct projects located in separate directories, and mypy doesn't understand monorepos.
There are different possible ways of working around the problem. One way would be to use distinct handler file names for each function, but that seems like addressing the symptom rather than the cause of the problem. Instead, I ended up writing a simple make target that runs mypy separately on each directory that ought to be type-checked:
MYPY_DIRS := $(shell find functions layer -mindepth 1 -maxdepth 1 -type d ! -path '*.egg-info*' | xargs)
# (...)
.PHONY: mypy
mypy: $(MYPY_DIRS)
	$(foreach d, $(MYPY_DIRS), python -m mypy $(d);)
The MYPY_DIRS variable holds all direct subdirectories of the layer and functions directories (except the egg-info directory that's created by installing the shared package in editable mode). The make mypy command will run python -m mypy for each of those directories.
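For anyone not using make, roughly the same logic can be sketched as a small Python script. This is an assumed equivalent, not code from the example repository, and the function names are mine. One difference worth noting: this version reports a non-zero exit code if any of the mypy runs fails.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def mypy_targets(root: Path) -> List[Path]:
    """Collect direct subdirectories of functions/ and layer/, skipping egg-info."""
    targets: List[Path] = []
    for parent in ("functions", "layer"):
        base = root / parent
        if not base.is_dir():
            continue
        for child in sorted(base.iterdir()):
            if child.is_dir() and ".egg-info" not in child.name:
                targets.append(child)
    return targets


def run_mypy(root: Path) -> int:
    """Run mypy once per target directory and return the first non-zero exit code."""
    exit_code = 0
    for target in mypy_targets(root):
        result = subprocess.run([sys.executable, "-m", "mypy", str(target)])
        exit_code = exit_code or result.returncode
    return exit_code


# Usage from the project root (requires mypy to be installed):
# sys.exit(run_mypy(Path(".")))
```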
Conclusion
The general idea I was hoping to get across in this blog post is that it's possible to leverage Python packaging tooling to decouple project directory structure from the issue of module discovery/resolution in Python. This happens to be particularly helpful in the case of Python AWS serverless projects.
The template of the solution described above could be adjusted to suit many types of projects. If you're working on a system that's comprised of multiple micro-services, this project layout might be used for individual micro-services, with an additional abstraction, such as packages published to an internal repository, to share code between services. In the case of very large projects it might be beneficial to package shared code into multiple layers, which is also possible in principle, with a few adjustments.