Talk on Packaging 2010-03-13

These are the notes from a talk at Meeting20100313 by David Fraser on Python packaging for Fedora, RHEL, Windows, etc.

General Packaging

  • We have source code for Python libraries, applications and tools
  • These have dependencies on other Python libraries, as well as non-Python libraries (database drivers etc)
  • We need users to run our applications etc on various platforms

Terms of Endearment

  • Distribution here refers to a packaged collection of Python modules, in source or built form (as distutils uses the term)
  • Distro refers to a Linux distribution
  • Package refers to a Python package (a module with submodules)
  • rpm refers to a package file in the RPM format (to distinguish from Python packages)


  • distutils is the standard Python packaging system
  • Various extensions exist that augment its capabilities, and because the setup script is itself Python code, you can write your own too
  • Input: you write a setup script (and optionally a setup.cfg configuration file) that specifies:
    • Package metadata (name, version, authorship, licensing, dependencies, etc)
    • Package contents (pure Python modules/packages, extensions, executable Python scripts, non-code data files that need to be in the package, scripts that aid installation, etc)
  • Output: running the setup script supports various commands that produce different outputs:
    • sdist produces a source distribution - it helps if you can regenerate the distribution from a clean source distribution
      • The manifest template (MANIFEST.in) is expanded into the MANIFEST list of files to include; MANIFEST is not regenerated automatically by default (use --force-manifest)
    • bdist produces a generic built distribution (but I've hardly ever seen it used, as it's not a useful package format)
    • bdist_rpm generates RPM and SRPM package files
    • bdist_wininst produces a Windows executable installer that will install the distribution into an existing Python installation
    • The output generally gets put in a dist subdirectory
  • Commands include the above output commands, as well as intermediate commands that are run on the way to producing the above
    • Each command can have command-specific options (as well as the generic options passed to the setup script itself)
    • Options can be specified per command in the setup script, in setup.cfg, or on the command line
    • build is an intermediate command which runs:
      • build_py for pure modules (simple copy to the build directory)
      • build_ext for compiling C/C++ extensions and linking them into the build directory
      • build_clib for building C/C++ libraries
      • build_scripts for copying scripts and rewriting their #! line
    • install then installs everything from the build directory to the target (as separate sub-commands, like build)
    • clean cleans up the build directory
    • The register and upload commands register your distribution with the Python Package Index (PyPI) and upload it there
  • Scripts can include things to run (which can be installed into the path), as well as a post-installation/pre-removal script
  • Things to consider:
    • If you've got a bunch of related files, should they be in a package? Otherwise they can clutter the standard Python site-packages directory
    • Can you cleanly regenerate your source distribution from itself?
    • Package dependencies: a distribution can provide, require or obsolete packages
    • Consider whether to build distributions on the platform you're targeting, on other platforms, or both
    • How are you going to build your distribution for different platforms?
    • How are you going to deliver your distribution to people on different platforms, including dependencies?
    • How will your distribution interact with the native packaging on the target platforms (if any)?
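
To make the per-command options concrete: each section of setup.cfg is named after a command, and its options mirror that command's long command-line flags. The sketch below uses the standard library's configparser to turn a hypothetical setup.cfg into the equivalent command-line arguments. The option values and the cfg_to_argv helper are illustrative assumptions, and treating a value of 1 as a bare boolean flag is a simplification.

```python
import configparser

# Hypothetical setup.cfg contents: each section is named after a command.
SETUP_CFG = """
[sdist]
formats = gztar,zip
force_manifest = 1

[bdist_rpm]
requires = python-babel
group = Development/Libraries
"""


def cfg_to_argv(cfg_text):
    """Translate per-command setup.cfg options into command-line arguments."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    argv = []
    for command in parser.sections():
        # The command name comes first, followed by its own options.
        argv.append(command)
        for option, value in parser.items(command):
            flag = "--" + option.replace("_", "-")
            # Simplification: a value of "1" is treated as a boolean flag.
            argv.append(flag if value == "1" else f"{flag}={value}")
    return argv
```

Under these assumptions the result corresponds to `python setup.py sdist --formats=gztar,zip --force-manifest bdist_rpm --requires=python-babel --group=Development/Libraries`, since distutils accepts several commands in one invocation, each followed by its own options.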

The world of eggs

setuptools is a set of extensions to distutils that try to bring it into the modern age:

  • Adds proper dependency support to Python packages
  • Lots of surrounding tools - easy_install, pkg_resources etc - including very simple ways of installing stuff from the standard Python repository (PyPI)
  • Lots more tools being built around this format
  • To use it, simply import setup from setuptools instead of from distutils.core
  • Does not integrate directly with distros' packaging systems
  • Supports parallel versions of the same library reasonably well (you can require a specific version and use it even if a different one is present - however conflicts can arise)
  • Not that good at uninstallation etc
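
In setuptools, dependencies are declared with install_requires in the setup script and can be checked at runtime with pkg_resources.require(). Since setuptools may not be installed everywhere, the sketch below swaps in the standard library's importlib.metadata (Python 3.8+), which reads installed distribution metadata, including .egg-info directories, to check a minimum version. The helper names are hypothetical, and the numeric-only version comparison is a deliberate simplification of real requirement parsing.

```python
from importlib import metadata


def version_tuple(version):
    """Turn a purely numeric dotted version such as '0.9.0' into (0, 9).

    Trailing zero components are dropped so that '0.9' and '0.9.0' compare
    equal. Pre-release tags like '1.0b2' raise ValueError; handling them is
    beyond this simplified sketch.
    """
    parts = [int(part) for part in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def meets_minimum(dist_name, minimum):
    """Return True if dist_name is installed at a version >= minimum."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    try:
        return version_tuple(installed) >= version_tuple(minimum)
    except ValueError:
        # A non-numeric version component: outside this sketch, report False.
        return False
```

For example, meets_minimum("Babel", "0.9") would report whether a hypothetical Babel>=0.9 requirement is satisfied; unlike setuptools, it does nothing to choose between parallel installed versions.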

The world of RPM

  • RPM is a packaging format that originated with Red Hat, and is now used in RHEL/CentOS/Fedora, SUSE Enterprise/openSUSE and Mandriva, as well as being part of the Linux Standard Base
  • Input: A .spec file defines an rpm's metadata, how to build both a source and a binary rpm, what sources to use, patches to apply, etc.
    • This is a more data-driven format than a distutils setup script
    • The .spec file contains a header section for general information as well as:
      • script sections: %prep for preparing the build (usually via the %setup macro, which unpacks the source), %build for building the source, %install for installing into a build root directory ($RPM_BUILD_ROOT), %check for checking the results, and %clean for cleaning up
      • %pre and %post installation scripts, and %preun and %postun uninstallation scripts
      • A list of the %files included in the rpm (including categories such as %doc for documentation, and file attributes via %attr)
      • A %changelog
    • Support for macros, including shell execution to define macros
    • Support for subpackages - multiple rpms can be built from the same .spec file
  • From distutils, bdist_rpm generates a source distribution, creates a spec file, generates a source rpm from that, and then a binary rpm
  • What's really useful about rpm is dependency resolution based on repositories
    • Different distros have different tools for this, and they vary in capability between releases; RHEL/CentOS/Fedora use yum
    • For the user, there isn't that much functional difference between the .deb and .rpm formats, or between apt-get and yum etc. There is for the packager...
    • You can specify dependencies as options to bdist_rpm (in the setup script, in setup.cfg or on the command line); they don't seem to get picked up from the normal package metadata
  • Distros/repositories have different standards and requirements for inclusion of rpms and .spec files
    • bdist_rpm's automatically generated .spec files will generally not meet these requirements
    • The general feeling is that for inclusion into a distro, .spec files should be hand-generated and maintained
    • Are you targeting your rpms for inclusion in a Linux distro? Then read the rules and follow the procedures...
    • Otherwise you may be happy with the standard distutils stuff
    • Generally, packages using setuptools can install their .egg-info metadata alongside the source code and include that in the rpm
  • Targeting older distros can be tricky if you are using lots of modern Python features
    • Distros have their own version of Python - you may require a newer one. Typically this can be installed alongside as something like python25.rpm - often these rpms exist, sometimes you have to rebuild them
    • In that case you will need to package all the dependencies for the new Python - usually called python25-babel etc
    • Doing this on multiple distros can be tiring. Since you're not targeting inclusion in the old distros, try to get away with murder (or at least, functional packages rather than beautifully crafted ones)
  • We have a tool called centuryegg for targeting older distros
    • This is currently reasonably specific to our set of requirements
    • Order of priority:
      • Use existing rpms from the target distro
      • Backport rpms from newer releases of the target distro
      • Spin our own rpms from the eggs in PyPI
    • Target is to be able to automatically source and download all the requirements from a simple list, generate the rpms, and upload them into a repository
    • We should make it more generic - is anyone interested?
  • Generating your own repository
    • You will generally need different repositories for different distros and versions (even apparently equivalent ones like RHEL4/CentOS 4)
    • Usually this just involves compiling (we strongly recommend doing this on the target distro; we had strange crashes due to minor library versioning differences etc)
    • You then just need to run something like createrepo and put the files in a web space
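
To show the shape of the .spec sections described above, here is a sketch that writes a minimal spec for a pure-Python distribution. It is loosely modelled on the kind of file bdist_rpm generates (build and install via the setup script, with --record feeding %files), but make_spec and its parameters are hypothetical, and a spec intended for a distro would need hand-maintenance against that distro's guidelines. The python parameter allows for a parallel interpreter such as python2.5 on an older distro.

```python
def make_spec(name, version, summary, license_name, requires=(), python="python"):
    """Return the text of a minimal .spec file for a pure-Python distribution.

    Hypothetical helper: assumes the sdist tarball is named
    <name>-<version>.tar.gz and that the setup script is setup.py.
    """
    lines = [
        # Header section: general information about the rpm.
        f"Name: {name}",
        f"Version: {version}",
        "Release: 1",
        f"Summary: {summary}",
        f"License: {license_name}",
        f"Source0: {name}-{version}.tar.gz",
        "BuildArch: noarch",
        "BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-buildroot",
    ]
    lines += [f"Requires: {req}" for req in requires]
    lines += [
        "",
        "%description",
        summary,
        "",
        "%prep",
        "%setup -q",  # unpack Source0
        "",
        "%build",
        f"{python} setup.py build",
        "",
        "%install",
        # --record writes the list of installed files for %files below.
        f"{python} setup.py install --root=$RPM_BUILD_ROOT --record=INSTALLED_FILES",
        "",
        "%clean",
        "rm -rf $RPM_BUILD_ROOT",
        "",
        "%files -f INSTALLED_FILES",
        "%defattr(-,root,root)",
    ]
    return "\n".join(lines) + "\n"
```

Generating the spec yourself like this keeps it editable, which matters if you later move towards the hand-maintained spec files that distros expect.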
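
The centuryegg order of priority above amounts to a simple fallback chain. The sketch below is not centuryegg's code: it assumes you already have sets of the rpm names available in the target distro and in newer releases of it, and it uses the python25-babel style of naming mentioned above. All function names are hypothetical.

```python
def rpm_name(project, python_prefix="python"):
    """Map a PyPI project name to a conventional rpm name, e.g. python25-babel."""
    return f"{python_prefix}-{project.lower()}"


def choose_source(project, distro_rpms, newer_rpms, python_prefix="python"):
    """Decide where the rpm for `project` should come from.

    Returns (source, rpm_name), trying in order: the target distro's own
    rpm, a backport from a newer release, then one spun from PyPI.
    """
    name = rpm_name(project, python_prefix)
    if name in distro_rpms:
        return ("distro", name)
    if name in newer_rpms:
        return ("backport", name)
    return ("pypi", name)
```

Note that with a prefix like python25 the distro's own rpms will rarely match, so most requirements fall through to being built from PyPI, which is consistent with the extra packaging work described for parallel Python installs.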

The world of Windows

  • bdist_wininst is fine for lots of purposes, especially if distributing packages to other developers
  • bdist_msi has now also been added; it produces packages in the Windows Installer MSI format (the format also used for the Python installer itself)
  • For distributing applications, most Windows users expect a single install, and may be confused by having the Python runtime environment set up on their machine with lots of libraries
  • py2exe is the most popular of a variety of tools for producing a frozen Python distribution on Windows:
    • An extension to distutils
    • Packages up the Python runtime, and a set of Python packages, modules, extensions, scripts and data, into a target directory
    • Scripts are converted to stub win32 executables that load the Python dll and execute some code
    • Automatic search for Python library dependencies (by scanning your source code for import statements), as well as manual specification of requirements
    • All the libraries you depend on need to be included - sometimes having things installed as eggs on the build environment can produce problems
    • Running in frozen mode often requires some changes to the underlying code for compatibility - location of files etc - lots of tips on the wiki site
  • It's fairly common to use Inno Setup to produce an installer containing all the required files, Start menu items, etc
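
As an example of the frozen-mode changes mentioned above: py2exe sets a sys.frozen attribute, and module __file__ paths may then point inside the bundled library archive rather than at real files, so data files are usually located relative to the executable instead. A minimal sketch, assuming data files are shipped next to the .exe (the helper names are hypothetical):

```python
import os
import sys


def resource_dir(module_file):
    """Return the directory holding the application's data files."""
    if getattr(sys, "frozen", False):
        # Frozen by py2exe (or similar): look next to the executable.
        return os.path.dirname(os.path.abspath(sys.executable))
    # Running from source: look next to the given module file.
    return os.path.dirname(os.path.abspath(module_file))


def resource_path(module_file, *parts):
    """Join path components onto resource_dir(module_file)."""
    return os.path.join(resource_dir(module_file), *parts)
```

Application code would then call something like resource_path(__file__, "templates", "main.html") instead of building paths from __file__ directly, so the same code works both from source and when frozen.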

Putting it all together

  • It makes life easier if you can do all the above, and whatever other targets you require, from the same script
  • Separate out options for the different commands as much as possible
  • We found we had to hack distutils a lot with derived code to make it all work
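
One way to keep per-command options separate while driving every target from the same script is the options keyword of setup(), a dictionary keyed by command name. The sketch below groups hypothetical option values by platform; the grouping and the build_options helper are assumptions rather than the approach used in the talk, and anything that is not Windows is simply assumed to be an rpm-based Linux.

```python
# Hypothetical option values, keyed by distutils command name.
COMMON_OPTIONS = {
    "sdist": {"formats": "gztar,zip"},
}
WINDOWS_OPTIONS = {
    "py2exe": {"compressed": 1, "optimize": 2},
}
LINUX_OPTIONS = {
    "bdist_rpm": {"requires": "python-babel"},
}


def build_options(platform):
    """Return an options dict for setup(), given a sys.platform value."""
    # Copy each per-command dict so callers cannot mutate the shared tables.
    options = {cmd: dict(opts) for cmd, opts in COMMON_OPTIONS.items()}
    extra = WINDOWS_OPTIONS if platform.startswith("win") else LINUX_OPTIONS
    options.update({cmd: dict(opts) for cmd, opts in extra.items()})
    return options
```

The setup script would then pass options=build_options(sys.platform) to setup(), keeping each target's settings in one clearly separated table.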
