Publishing Your First Python Package
And Automating Releases With GitHub Actions
If you're reading this, you're probably already somewhat familiar with Python packages, how they work for an end user, and the various types of packages. For those of you who are not familiar, I'll briefly explain. Python, like many other languages, uses a module-based import system to load functionality that is not available by default. You can also install third-party tools and libraries to extend what Python can do or to act as independent Python-based utilities. These packages are generally installed using a package manager. Python provides a package manager (pip), but other options do exist.
Package managers rely on a centralized repository to know what packages are available to be installed. Python package managers typically rely on the Python Package Index (PyPI) list of packages. This is what I'll be discussing today: how to publish a package on PyPI so that it can be installed by anyone.
Structuring the Package
I recently created a new Python package and this tutorial will be based on publishing my new utility: Runtime-Environment-Capture (REC). My project already exists, but let's assume that you're starting from scratch. You'll need a directory for your new project so go ahead and create one. A common convention for structuring packages is to create a parent folder that will store top-level information and config files about the project and then create a secondary directory (named after the package) that will be where source code is written. For example, with my package REC, I structured the project like this:
All of the source code for REC lies in the rec/rec folder. From now on I'll refer to the top-level folder as the root folder and to the subdirectory as the module folder. This structure is not absolutely necessary, but it helps to logically separate your project. Feel free to experiment with whatever format works best for you. Aside from those recommendations, you can structure your project more or less however you like. There are two other important things to keep in mind, though. For an application-type package, a __main__.py file is necessary, as it acts as the default entry point for your program. If you're familiar with C program linking, this is similar to the _start symbol that a compiler creates to tell your computer where to start executing code. This main file will do just that for our Python module. Once you've created the file, you can have it call any other function in any other file, but it's important to start here. Inside the __main__.py file, you should have the following code block:
if __name__ == '__main__':
    # Your Code Here
    pass
That logic should lie outside of any function. It checks that the file is being run as the program's starting point (for example with python -m [package name]) rather than being imported by other code. If you've defined a start function for your program elsewhere, this is a good place to call it. Something like this is fine (be sure to adapt it to your own code):
def main():
    print("Hello World")

if __name__ == '__main__':
    main()
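To see this entry-point behaviour in action, here is a minimal sketch that builds a throwaway package in a temporary directory and runs it with python -m. The package name demo_pkg is hypothetical and exists only for this illustration; adapt it to your own project.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical package's __main__.py
MAIN_SOURCE = '''\
def main():
    print("Hello World")


if __name__ == '__main__':
    main()
'''


def run_demo_package():
    """Create a throwaway package and run it with `python -m`."""
    with tempfile.TemporaryDirectory() as root:
        package_dir = Path(root) / "demo_pkg"  # hypothetical package name
        package_dir.mkdir()
        # An empty __init__.py marks the directory as a package
        (package_dir / "__init__.py").write_text("")
        # __main__.py is the file that `python -m demo_pkg` executes
        (package_dir / "__main__.py").write_text(MAIN_SOURCE)

        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=root,  # run from the folder that contains the package
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(run_demo_package(), end="")
```

Because the package is started with -m, Python runs __main__.py with __name__ set to '__main__', so the guard passes and main() is called.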
The second thing to keep in mind is that, inside the nested directory that contains your source code, you need an __init__.py file. This file can be empty; it simply lets Python know that the directory is a package named after that directory. It's also important to note that any subdirectories of your project need one as well if you want them included in the final package that is published.
Next, it is a good idea to have a README.md file in your root-level directory which provides a detailed description of your project. This will become the primary description of your project on the PyPI page which lists your package.
You should also choose a license for your project and place it in a License.txt file in the root directory. The license you choose is up to you, but if you need help choosing one, check out this site: https://choosealicense.com.
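As a rough illustration of the layout described so far, the sketch below creates the root folder, the module folder, and empty placeholder files. The scaffold_project helper is hypothetical (it is not part of any packaging tool), and the file names simply mirror the ones used in this tutorial.

```python
import tempfile
from pathlib import Path


def scaffold_project(root, package_name):
    """Create the root folder, the module folder, and placeholder files."""
    root = Path(root)
    module_dir = root / package_name
    module_dir.mkdir(parents=True, exist_ok=True)

    # Module folder: package marker and default entry point
    (module_dir / "__init__.py").touch()
    (module_dir / "__main__.py").touch()

    # Root folder: project description and license text
    (root / "README.md").touch()
    (root / "License.txt").touch()

    # Return the created files relative to the root, sorted for display
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold_project(Path(tmp) / "rec", "rec"):
            print(created)
```

The demo writes into a temporary directory so that it never touches an existing project; point it at a real path only once you are happy with the layout.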
Once you have done that, you now have all the necessary components to publish the Python package. Now you need to build the package into a distributable file. We'll cover that in the next section.
Building the Package
So you want to build a package?
To build a package, we're going to need to create two more files in the root directory of your project. First, create a file called pyproject.toml and place the following code inside of it:
[build-system]
requires = ["setuptools>=42"]
build-backend = "setuptools.build_meta"
This file tells the build system that you're going to be using the "setuptools" method of building your package. Setuptools is a widely used library, maintained by the Python Packaging Authority, for building a package for distribution. You can specify other things in this file, but for this tutorial, we'll skip that as it's not strictly necessary.
The next file you should create is a setup.py file in your root directory. I'll explain what every line in the file does, but first, here is the file in full:
from setuptools import setup, find_packages

# Read the README so it can be used as the long description on PyPI
with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setup(
    name="runtime-environment-capture",
    version="0.2.1",
    author="Carson Woods",
    description="A wrapper to collect data for ensuring reproducibility.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/carsonwoods/rec",
    # setup() has no repository= or documentation= keywords;
    # extra links are passed through project_urls instead
    project_urls={
        "Repository": "https://github.com/carsonwoods/rec",
        "Documentation": "https://github.com/carsonwoods/REC/wiki",
    },
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "Intended Audience :: Science/Research",
        "Operating System :: MacOS",
        "Operating System :: POSIX :: Linux",
        "Operating System :: Unix",
    ],
    python_requires=">=3.6",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            "rec=rec.__main__:main",
        ]
    },
)
Let's start from the first line. You're importing the all-important setup() function from setuptools, along with the find_packages() function. The setup function defines the metadata that will appear on your package's listing, as well as some other build information.
Next, we read a file. Specifically, we read the README.md we made earlier. That is going to be our longer description. Since the PyPI website can render Markdown, it's easier to define this in an external file and read it into a variable than to try to include it inline. Plus, it makes your project look better documented on GitHub or GitLab if you're making it available there as well.
Then we call the setup() function with a lot of parameters. First is the name. This name MUST be unique. For better or worse, PyPI does not use a [username]/[project-name] scheme like GitHub does, so your project will end up with a URL that looks like "pypi.org/project/runtime-environment-capture". If you try to upload a package with the same name as some other package, it will fail. Then, you'll want to specify a version. I use the Major.Minor.BugFix style for my versions; however, you may prefer a date-based version system. It's up to you.
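One detail worth knowing when picking a name: package indexes compare names in a normalized form (described in PEP 503), where runs of hyphens, underscores, and periods are treated as a single hyphen and letter case is ignored. The sketch below applies that normalization; the names_collide helper is a hypothetical convenience for illustration and does not query PyPI.

```python
import re


def normalize_name(name):
    """Normalize a project name as described in PEP 503."""
    # Collapse runs of "-", "_" and "." into one "-" and lowercase the result
    return re.sub(r"[-_.]+", "-", name).lower()


def names_collide(candidate, existing):
    """Return True if two names are the same project name once normalized."""
    return normalize_name(candidate) == normalize_name(existing)


if __name__ == "__main__":
    print(normalize_name("Runtime_Environment.Capture"))
    print(names_collide("Runtime_Environment.Capture", "runtime-environment-capture"))
```

In other words, changing only the capitalization or swapping a hyphen for an underscore does not give you a different name.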
The next few lines are very simple. You simply write information that will appear in the sidebar and main page for your package on the PyPI listing. The description and long_description are self-explanatory (though do note that our long_description uses the variable we created from our README file earlier in the file). Then we specify the type of text that we're using for the long description, so that the PyPI website renders it properly into a human-readable format. The URL, repository, and documentation links are also self-explanatory. They create links to the homepage, source code page, and documentation page for your project. In many cases, these could all be the same link. It is not required to include these links, but I recommend including a URL at the very least. Finally, you should also specify which license you are releasing your software under. This plays a large role in whether some companies can use your package, so it's always good to include. To give you an idea of what these parameters do, they populate the following page for your package:
The next line (or few lines) is a list of "classifiers". These are PyPI-provided tags that act as a form of SEO for your package. They group together packages that match the specified tags. A full list of classifiers is available here: https://pypi.org/classifiers/.
The last few parts of the file define some build components of the package. The first, python_requires, defines the minimum Python version that the package works on. The second is the packages=find_packages() line. This line tells the setup process to recursively look for packages in subdirectories (the directories that contain an __init__.py file). This is incredibly important because you'll presumably need the code in all of the subdirectories that you've created for your project. The last parameter is entry_points. This is particularly useful in Python packages that act as standalone utilities, as it lets users invoke your package without having to type out python -m [package name] [args] every time. In my example above, I'm stating that the rec command will invoke the main() function in the file __main__.py in the folder rec.
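To make the "command=module:function" format a little more concrete, here is a rough sketch of how such a string can be taken apart and resolved to a callable. This is an illustration of the idea only, not the code that setuptools or pip actually generate, and both helper names are hypothetical.

```python
import importlib


def parse_entry_point(spec):
    """Split 'command=package.module:function' into its three parts."""
    command, _, target = spec.partition("=")
    module_name, _, function_name = target.partition(":")
    return command.strip(), module_name.strip(), function_name.strip()


def load_entry_point(spec):
    """Import the module and return the function the command would call."""
    _, module_name, function_name = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


if __name__ == "__main__":
    # The entry point from the setup.py above, parsed but not imported
    print(parse_entry_point("rec=rec.__main__:main"))
```

Note that a command installed this way imports rec.__main__ and calls main() directly, so the if __name__ == '__main__' guard is not what starts the program in that case. This is why the start logic needs to live in a function such as main().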
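Now, you're finally done. All that is left is to build the project. First, quickly make sure that you have setuptools and wheel installed (the bdist_wheel command used below has traditionally relied on the wheel package) by running:
pip install setuptools wheel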
Then you can run
python setup.py sdist bdist_wheel
If everything goes well, you should see three new directories: build/, dist/, and a directory named after your package that ends in .egg-info. If you ever need to rebuild your package, you can delete these directories and recreate them using the last command.
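If you find yourself rebuilding often, the cleanup step can be scripted. The sketch below removes the three kinds of generated directories mentioned above from a project root; clean_build_artifacts is a hypothetical helper name, and the script deliberately leaves everything else alone.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(root):
    """Remove build/, dist/, and any *.egg-info directory under root."""
    root = Path(root)
    targets = [root / "build", root / "dist", *root.glob("*.egg-info")]
    removed = []
    for target in targets:
        # Only delete directories that actually exist
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target.name)
    return sorted(removed)


if __name__ == "__main__":
    # Run from the project's root folder
    for name in clean_build_artifacts("."):
        print(f"removed {name}")
```

After running it, the build command from earlier will regenerate all three directories from scratch.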
And ta-da. You're done. This is, by far, the most time-consuming part of preparing your package. Now you're ready to move on to publishing it for all to see.
Publishing the Package
Now that you've got a project ready to be published, you're going to want to create an account on each of the following websites:
- https://pypi.org
- https://test.pypi.org
As mentioned, PyPI is the primary package index and this is where we'll publish our package. The test.pypi.org page is an entirely separate instance of the pypi.org server and can be used to test releases before uploading them to the main PyPI instance.
Once you've created an account, you're ready to start uploading a package. From your terminal, install a utility called twine using pip:
pip install twine
Twine is a utility that uploads projects to PyPI. It can also check that your build went smoothly, in particular that your package's long description will render correctly on PyPI. To do so, run the following on your newly created dist/ folder:
twine check dist/*
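The twine check above looks at the package metadata. If you also want to confirm that your source files made it into the build (for example, that no subdirectory was skipped because of a missing __init__.py), you can look inside the wheel yourself, since a wheel is a ZIP archive. The following is a small sketch; list_wheel_modules is a hypothetical helper name.

```python
import zipfile


def list_wheel_modules(wheel_path):
    """Return the Python source files stored in a wheel, sorted by path."""
    # Wheels are ZIP archives, so the standard zipfile module can read them
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith(".py"))
```

Call it with the path to the .whl file that the build placed in your dist/ folder and compare the result against the files you expect to ship.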
Testing the Waters
First, let's test our package on the PyPI test server. When you're ready to publish, run
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
You'll be asked to authenticate with the credentials for your PyPI test account. Do so now. If everything goes well, your package should now be available to install on the test repository. If you go to the test repo homepage, you should see your package under the "New releases" section and it should be searchable.
Go to the package listing page and make sure everything looks correct. Next, try installing the package using pip with the following command (insert your own package name of course):
pip install -i https://test.pypi.org/simple/ [package-name]
If the package installs correctly and runs as expected, you're good to go on the final upload.
Ready to Launch
Uploading your project is simple. Make sure that your build is ready for upload and run the following:
twine upload dist/*
Enter your credentials for the main version of PyPI and watch as your project gets uploaded!!
Congratulations! You're now a Python package author!!
Important Notes
A few important extras before moving on:
- You can't rename a project once it has been created. You can publish it again under a new name, but that creates a new, separate listing and breaks any packages that explicitly depend on the old name. Be very deliberate in choosing a package name.
- You can't overwrite a version that has already been uploaded. PyPI does not allow a file name to be reused, even after the release has been deleted from the PyPI dashboard. Before publishing a new version of a package on PyPI, you'll need to increase the version in the setup.py file.
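Since every upload needs a fresh version number, it can help to compute the next one rather than edit it by hand. Here is a minimal sketch for the Major.Minor.BugFix style used in this tutorial; bump_version is a hypothetical helper, and it assumes the version consists of exactly three dot-separated integers.

```python
def bump_version(version, part="bugfix"):
    """Return the next Major.Minor.BugFix version for the chosen part."""
    major, minor, bugfix = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets the minor and bugfix numbers
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets the bugfix number
        return f"{major}.{minor + 1}.0"
    if part == "bugfix":
        return f"{major}.{minor}.{bugfix + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    print(bump_version("0.2.1"))
    print(bump_version("0.2.1", "minor"))
    print(bump_version("0.2.1", "major"))
```

The resulting string is what you would place in the version argument of setup() before rebuilding.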
Automating Updates with GitHub Actions
Now, if you've made it this far, you're clearly a programmer. And if you're a developer, you almost certainly love automating things. Sure, you can just run the twine upload dist/* command every time, but why do something by hand when you don't have to? If this was exactly your train of thought, then I have great news for you! Now that we have our setup.py and project, we can completely automate this deployment using GitHub Actions!
This part of the tutorial assumes you're publishing your project on GitHub. Technically it's not strictly necessary, but it really helps for that to be the case. With that assumption out of the way, let's get going.
First, you'll need to go to PyPI and generate an API token for your account. This can be done in the settings for your account. For security reasons, I recommend scoping the token to only the project you'll be deploying. We'll be storing this API token as a GitHub Secret, but the principle of least privilege exists for a reason. Once you have the API token, go to the settings page of the repository where your package's code is stored. Under the Security tab in Settings, you'll see a Secrets section with an Actions subsection. Using that menu, create a new secret with the name PYPI_API_TOKEN and set its value to the previously generated API token.
Once you've done this, click on the Actions tab for your repository and click "New Workflow". "Publish a Python Package" is a preset option for Python-based projects, but I'll include the code for the workflow file below:
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: Upload Python Package

on:
  release:
    types: [published]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build
    - name: Build package
      run: python -m build
    - name: Publish package
      uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
As you can see, the password that is passed to the PyPI publish action is the PYPI_API_TOKEN secret we created. This workflow will create a new version on PyPI whenever a new version is released on GitHub.
To create a new release on GitHub, simply make the changes you want to make to your project, increment the version in your setup file, then commit and tag your changes. Then, go to your project's page on GitHub and click on the Releases tab to create a new release. Write up the release changes, name the version, and upload the files found in your dist/ directory. I'm glossing over the details here because they fall outside the scope of the tutorial and aren't really that relevant given that most users install packages with pip. These release files are only for users who might install your package from source on GitHub. The key thing here is that publishing a new release on GitHub will trigger the workflow we made earlier, which will build the package from the release's commit and upload it to PyPI. Once you publish the release, you can track the progress of the automation from the Actions tab. It will display an error message if the publishing process fails, but if all goes well, the new version of your package will be visible on PyPI.
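An easy mistake in this process is tagging a release without having incremented the version in the setup file, in which case the upload is rejected as a duplicate. A small pre-release check along the following lines can catch that. It is only a sketch: both helper names are hypothetical, the regular expression assumes the version is written as a plain string literal, and the optional leading "v" on tags is an assumed naming convention.

```python
import re


def read_setup_version(setup_source):
    """Extract the version string from the text of a setup.py file."""
    match = re.search(r"""version\s*=\s*["']([^"']+)["']""", setup_source)
    if match is None:
        raise ValueError("no version= argument found")
    return match.group(1)


def tag_matches_version(tag, setup_source):
    """Compare a release tag such as 'v0.2.1' with the setup.py version."""
    # Assumed convention: tags may carry a single leading "v"
    bare_tag = tag[1:] if tag.startswith("v") else tag
    return bare_tag == read_setup_version(setup_source)


if __name__ == "__main__":
    with open("setup.py", encoding="utf-8") as fh:
        print(read_setup_version(fh.read()))
```

Running the check before you publish the release keeps the tag, the setup file, and the PyPI listing in agreement.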
Conclusions
Hopefully, this was a clear and reasonably comprehensive introduction to publishing a Python package. It is by no means exhaustive, and Python's setuptools offers substantially more options for configuring your package. There are also other build systems aside from setuptools; however, I wanted to keep this to the basics for a first post on the topic.
Thanks so much for reading, and if you've made it this far, I'd love it if you could take a look at my new Python package: runtime-environment-capture (REC). It's still very much in "beta", but I'm rapidly adding new features. If you want an extra tool to help ensure reproducible scripts/digital experiments/etc., you might want to see what it can do!