How to Structure a Python AWS Serverless Project
I haven't been able to find much guidance on how to structure AWS serverless projects written in Python. There is plenty of "hello world" examples out there, where all code fits into a single file, and a whole lot of questions about module resolution issues in Python lambda projects on StackOverflow, but precious little advice on how to set up a repository for a larger project. What is the best way to share code between lambdas? How to overcome local development module resolution issues that frequently plague projects of this type? In short, how to set things up so that Python tooling - language servers, type checkers and test runners - all work as expected?
After reading this post you'll know how to:
- use Python packaging tools to transparently share code between lambda handlers
- avoid module resolution issues in local development environment
- package shared code as a lambda layer during deployment
pytestto correctly run a test suite located in a separate directory
mypytype-check the project correctly despite its non-standard structure
Note: A finished reference serverless project is available on Github. Feel free to consult it at any stage or just read the finished code instead of the description below.
Shared code as an internal package
The basic structure of the example project repository looks as follows:
├── functions/ │ ├── add/ │ │ └── handler.py │ └── multiply/ │ │ └── handler.py ├── layer/ │ └── shared/ │ ├── __init__.py │ ├── math.py │ └── py.typed ├── tests/
The example application is a service that performs mathematical operations. It's a pointless service, or rather its only point is to provide an excuse for me to talk about structuring the project. The shared code is located in the
layer/shared folder, while the lambda handlers live in the
functions folder. Tests have been separated from the application code in the
tests folder - since we don't want them to be included with the deployed code.
The problem and the solution
When functions and the layer are deployed, the function handlers will be able to import the
shared package from the global namespace. This "magical" behavior is courtesy of the lambda layer machinery working behind the scenes. Things will work when deployed, that's great, but what about the local development experience? If you clone the example repository and open either of the handlers in your code editor you'll find the import statements referencing the
shared module underlined in red. Module resolution is broken, since Python doesn't automatically understand a codebase structured as described above. It seems that many projects end up accepting this state of affairs as the fact of life when building with lambdas - some really hacky workarounds for this very issue can be found, for example, in official serverless project examples published by AWS. We can do better!
The proper "Pythonic" solution to the problem is to have the
shared package installed in the development environment so that it can be imported in other parts of the project irrespective of the project's directory structure. Python, in fact, has a well established pattern for installing packages in "editable" mode to ease local development. We can leverage this feature to effectively create an editable simulated layer that can be developed alongside the handlers. Yes, a little bit of initial setup is required, and we will always need to install the
shared package locally as a prerequisite to doing development work and/or running the test suite, but the tradeoff is well worth it.
Creating the internal package
The files and directories that comprise the internal package look as follows:
├── layer/ │ └── shared/ │ ├── __init__.py │ ├── math.py │ └── py.typed ├── tests/ ├── pyproject.toml └── setup.cfg
This structure is essentially a variant of what's known as the src package layout - with the
src directory renamed as
layer. For this project I'm using setuptools as the packaging tool, and I'm configuring the package declaratively using a
Declaring packages in this style requires, per PEP 621, a tiny bit of boilerplate in the
[build-system] requires = ["setuptools", "wheel"] build-backend = "setuptools.build_meta"
This is just to instruct the tools (such as
build) on how to build the package.
The bulk of package configuration lives in the
[metadata] name = shared version = 0.1.0 [options] package_dir = =layer packages = find: include_package_data = True [options.packages.find] where = layer [options.package_data] * = py.typed
[metadata] section holds some basic information about the project. We don't need much here, since this package will be only used internally and will not be published to external package repositories.
[options] section accomplishes two things:
It informs packaging tools that they should automatically find and include all modules located inside the
layersubdirectory, and that the
layerdirectory itself should be excluded from the packaged module hierarchy. The
[options.packages.find]section points the package auto-discovery logic at the
It states that the package is allowed to contain data files, i.e. files that don't contain Python code, as long as they are referenced defined in the
[options.package_data]section. This is required in order to include the
py.typedfile from the
layer/sharedfolder in the package. This empty marker file informs
mypythat the packaged code contains type definitions.
We can install the
shared package locally in editable mode using the following command:
❯ pip install --editable .
shared package is now installed and we can import it like any other package:
❯ python >>> from shared.math import Addition >>> a = Addition() >>> print(a.add(2, 2)) 4
The red squiggles should now be gone from the handlers and the test suite should run without any issues. If you're using a Python language server in your code editor you should be able to jump around the code, find definitions, and get completion suggestions for the packaged code throughout the codebase. Finally, any changes to the
shared package code will be immediately applied throughout the project, without the need to re-install it.
Deploying the internal package in a layer
We've managed to get things working nicely in the local environment, now we just have to figure out how to include our internal package in a lambda layer on deployment. While the example project repository uses AWS SAM for deployment, the solution I'm going to describe is tool agnostic and should be possible to adapt to any other AWS deployment tool/framework (we use this approach with Terraform at work, for example).
The first step is to turn the internal package into a wheel (a
*.whl file). We can use the build tool for this purpose. After installing
pip we can run it as follows:
❯ python -m build -w
We run it as a Python module, adding the
-w flag to build the wheel only. By default the build artifacts are placed in the
├── dist │ └── shared-0.1.0-py3-none-any.whl
Now we can begin assembling the layer.
A lambda layer is packaged as a zipped
python directory containing Python modules. These modules can be anything Python understands as modules - individual Python files or directories containing
__init__.py files. The example project uses the
build directory as staging area - let's, therefore, create
python directory as a subdirectory of
❯ mkdir -p build/python
Now we can use
pip to install the
shared package wheel to the
❯ python -m pip install dist/*.whl -t build/python
This should produce the following structure under the
├── build │ └── python │ ├── shared │ │ ├── __init__.py │ │ ├── math.py │ │ └── py.typed │ └── shared-0.1.0.dist-info │ ├── (...)
We can use analogous approach to install any external dependencies that should be included in the layer - so provided they are listed in the
requirements.txt file we run:
❯ python -m pip install -r requirements.txt -t build/python
The final step is to zip the
❯ cd build; zip -rq ../layer.zip python; cd ..
This will produce a
layer.zip file located in the root directory of the project. This file is ready to be deployed as a layer using a AWS deployment tool of your preference.
In the example project repository I use a
Makefile to perform the above-described manuals steps automatically:
ARTIFACTS_DIR ?= build # (...) .PHONY: build build: rm -rf dist || true python -m build -w .PHONY: build_layer build_layer: build rm -rf "$(ARTIFACTS_DIR)/python" || true mkdir -p "$(ARTIFACTS_DIR)/python" python -m pip install -r requirements.txt -t "$(ARTIFACTS_DIR)/python" python -m pip install dist/*.whl -t "$(ARTIFACTS_DIR)/python" .PHONY: package_layer package_layer: build build_layer cd "$(ARTIFACTS_DIR)"; zip -rq ../layer.zip python
make build will build the package, running
make build_layer will populate the layer
python directory, and running
make package_layer will turn the
python directory into a zip archive. The
ARTIFACTS_DIR defaults to "build" if not set, so the default behavior of the make targets will be like in the manual commands described earlier. The single command to package the layer as a zip file is
make package_layer (this target will run
build_layer targets as its prerequisites/dependencies).
Getting pytest to work
shared package installed in the local Python environment,
pytest mostly works with this repository structure. This is because
pytest uses its own module discovery logic that's more permissive regarding directory layout compared to the Python default.
The tests should always work when
pytest is invoked as follows from the root of the project:
❯ python -m pytest
The handler tests (
tests/unit/functions_multiply_test.py) will fail, however, with the following error when invoking
pytest directly (i.e. not as a Python module with
python -m) from the root of the project:
❯ pytest tests/unit/functions_add_test.py:2: in <module> from functions.add.handler import handler E ModuleNotFoundError: No module named 'functions' (...) tests/unit/functions_multiply_test.py:2: in <module> from functions.multiply.handler import handler E ModuleNotFoundError: No module named 'functions'
The difference in behavior is explained in PyTest documentation - running
python -m pytest has a side-effect of adding the current directory to
sys.path per standard
If you prefer calling
pytest directly you can work around this quirk by including a
conftest.py file in the root of the project. This will effectively force
pytest to include project root in its hierarchy of discovered modules and the command should run without module resolution errors.
Getting mypy to work
This one took a while to figure out. While
mypy will run happily against the layer directory, it throws an error when asked to type-check the
❯ mypy functions functions/multiply/handler.py: error: Duplicate module named "handler" (also at "functions/add/handler.py") Found 1 error in 1 file (errors prevented further checking)
The problem has to do with the fact that the
functions directory contains multiple subdirectories, each with a file called
mypy's perspective this indicates an invalid package structure.
There is a closed issue in the
mypy repo with a discussion about this problem. The problem can be boiled down to this:
mypy only understands Python packages and relationships between them, while our
functions folder holds multiple discrete, parallel entry-points into the codebase that don't make sense when interpreted as a package. Contents of the
functions directory, in other words, is a bit like a monorepo with multiple distinct projects located in separate directories, and
mypy doesn't understand monorepos.
There are different possible ways of working around the problem. One way would be to use distinct handler file names for each function, but that seems like addressing the symptom not the cause of the problem. Instead, I ended up writing a simple
make target that runs
mypy separately on each directory that ought to be type-checked:
MYPY_DIRS := $(shell find functions layer ! -path '*.egg-info*' -type d -maxdepth 1 -mindepth 1 | xargs) # (...) .PHONY: mypy mypy: $(MYPY_DIRS) $(foreach d, $(MYPY_DIRS), python -m mypy $(d);)
MYPY_DIRS variable holds all direct subdirectories of
functions directories (except the
egg-info directory that's created by installing the
shared package in editable mode). The
make mypy command will run
python -m mypy for each of those directories.
The general idea I was hoping to get across in this blog post is that it's possible to leverage Python packaging tooling to decouple project directory structure from the issue of module discovery/resolution in Python. This happens to be particularly helpful in case of Python AWS serverless projects.
The template of the solution described above could be adjusted to suit many types of projects. If you're working on a system that's comprised of multiple micro-services, this project layout might be used for individual micro-services, with an additional abstraction, such as packages published to an internal repository, to share code between services. In case of very large projects it might be beneficial to package shared code into multiple layers, which is also possible in principle, with few adjustments.