= Talk on Packaging 2010-03-13 =

== General Packaging ==

These are the notes from a talk at [wiki:Meeting20100313] by David Fraser on Python packaging for Fedora, RHEL, Windows, etc.

 * We have source code for Python libraries, applications and tools
 * These have dependencies on other Python libraries, as well as non-Python libraries (database drivers etc)
 * We need users to run our applications etc on various platforms

=== Terms of Endearment ===

 * ''Distribution'' here refers to a built Python thing
 * ''Distro'' refers to a Linux distribution
 * ''Package'' refers to a Python package (a module with submodules)
 * ''rpm'' refers to a package file in the RPM format (to distinguish from Python packages)

=== Distutils ===

 * [http://docs.python.org/distutils/ distutils] is the standard Python packaging system
 * Various extensions exist that augment its capabilities, and because you use Python, you can do that too
 * Input: you write a `setup.py` script (and optionally a `setup.cfg` configuration file) that specifies:
   * Package metadata (name, version, authorship, licensing, dependencies, etc)
   * Package contents (pure Python modules/packages, extensions, executable Python scripts, ''data'' - non-code that needs to be in the package, scripts that aid installation etc)
 * Output: running `setup.py` supports various commands that produce different outputs:
   * `sdist` produces a source distribution - it helps if you can regenerate the distribution from a clean source distribution
     * `MANIFEST.in` template `->` `MANIFEST` list of files to include - the `MANIFEST` is not regenerated automatically by default (use `--force-manifest` to force it)
   * `bdist` produces a generic ''built'' installation (but I've hardly ever seen it used as it's not a useful package format)
   * `bdist_rpm` generates RPM and SRPM package files
   * `bdist_wininst` produces a Windows executable installer that will install the distribution into an existing Python installation
   * The output generally gets put in a `dist` subdirectory
 * Commands include the above output commands, as well as intermediate commands that are run on the way to producing the above
   * Each command can have command-specific options (as well as the generic options passed to `setup.py`)
   * Options can be specified per-command in `setup.py`, in `setup.cfg`, or on the command line
   * `build` is an intermediate command which runs:
     * `build_py` for pure modules (a simple copy to the build directory)
     * `build_ext`, which compiles C/C++ extensions and links them into the build directory
     * `build_clib` for building C/C++ libraries
     * `build_scripts`, which copies scripts and alters the `#!` line
   * `install` then installs everything from the build directory to the target (in separate steps, like `build`)
   * `clean` cleans up the build directory
   * You can also `register` the distribution with the Python package index, and `upload` it
 * Scripts can include things to run (which can be installed into the path), as well as a post-installation/pre-removal script
 * Things to consider:
   * If you've got a bunch of related files, should they be in a package? Otherwise they can clutter the standard Python `site-packages` directory
   * Can you cleanly regenerate your source distribution from itself?
   * Package dependencies: a distribution can provide, require or obsolete packages
   * Consider creating distributions on the same platform as you're targeting, or from other platforms, or both
   * How are you going to build your distribution for different platforms?
   * How are you going to deliver your distribution to people on different platforms, including dependencies?
   * How will your distribution interact with the native packaging on the target platforms (if any)?

=== The world of eggs ===

[http://pypi.python.org/pypi/setuptools setuptools] is a set of extensions to `distutils` that try to bring it into the modern age:

 * Adds proper dependency support to Python packages
 * Lots of surrounding tools - `easy_install`, `pkg_resources` etc - very simple ways of installing stuff from the standard Python repository
 * Lots more tools being built around this format
 * Simply import `setup` from `setuptools` instead of from `distutils`
 * Does not integrate directly with distros' packaging systems
 * Supports parallel versions of the same library reasonably well (you can `require` a specific version and use it even if a different one is present - however conflicts can arise)
 * Not that good at uninstallation etc

=== The world of RPM ===

 * [http://www.rpm.org/ RPM] is a packaging format that originated with [http://www.redhat.com/ RedHat], and is now used in [http://www.redhat.com/rhel/ RHEL]/[http://centos.org/ CentOS]/[http://fedoraproject.org/ Fedora], [http://www.novell.com/linux/ SUSE Enterprise]/[http://opensuse.org/ openSUSE], [http://www.mandriva.com/en/linux/ Mandriva] as well as being part of the [http://www.linuxfoundation.org/en/Specifications Linux Standard Base]
 * Input: A `.spec` file defines an rpm's metadata, how to build both a source and a binary rpm, what sources to use, patches to apply, etc
   * This is a more data-driven format than `setup.py`
   * The `.spec` file contains a header section for general information as well as
     * script sections for `%prep`aring the build, `%setup` (unpacking the source and packaging), `%build`ing the source, `%install`ing (into a `BUILD_ROOT` directory), `%check`ing the results, and `%clean`ing up
     * `%pre` and `%post` installation scripts, and `%preun` and `%postun` uninstallation scripts
     * A list of the `%files` included in the rpm (including categories like `%doc`umentation, and file attributes)
     * A `%changelog`
   * Support for macros, including shell execution to define macros
   * Support for subpackages - multiple rpms can be built from the same `.spec` file
 * From `distutils`, `bdist_rpm` generates a source distribution, creates a spec file, generates a source rpm from that, and then a binary rpm
 * What's really useful about `rpm` is dependency resolution based on repositories
   * Different distros have different tools for this, and they vary in capability between releases - RHEL/CentOS/Fedora use [http://yum.baseurl.org/ yum]
   * For the user, there isn't that much functional difference between the `.deb` and `.rpm` formats, or between `apt-get` and `yum` etc. There is for the packager...
   * You can specify dependencies as options to `bdist_rpm` (in `setup.py`, `setup.cfg` or on the command line) - they don't seem to get included from the normal package metadata
 * Distros/repositories have different standards and requirements for inclusion of rpms and `.spec` files
   * `bdist_rpm`'s automatically generated `.spec` files will generally not meet these requirements
   * The general feeling is that for inclusion into a distro, `.spec` files should be hand-generated and maintained
   * Are you targeting your rpms for inclusion in a Linux distro? Read the rules and follow the procedures...
   * Otherwise you may be happy with the standard `distutils` stuff
   * Generally packages using `setuptools` can install their `.egg` information alongside the source code and include that in the rpm
 * Targeting older distros can be tricky if you are using lots of modern Python stuff
   * Distros have their own version of Python - you may require a newer one. Typically this can be installed alongside as something like `python25.rpm` - often these rpms exist, sometimes you have to rebuild them
   * In that case you will need to package all the dependencies for the new Python - usually called `python25-babel` etc
   * Doing this on multiple distros can be tiring. Since you're not targeting inclusion in the old distros, try and get away with murder (or at least, functional packages rather than beautifully crafted ones)
 * We have a tool called [http://trac.sjsoft.com/browser/packaging/centuryegg centuryegg] for targeting older distros
   * This is currently reasonably specific to our set of requirements
   * Order of priority:
     * Use existing `rpm`s from the target distro
     * Backport `rpm`s from newer releases of the target distro
     * Spin our own `rpm`s from the eggs in PyPI
   * Target is to be able to automatically source and download all the requirements from a simple list, generate the rpms, and upload them into a repository
   * We should make it more generic - is anyone interested?
 * Generating your own repository
   * You will generally need different repositories for different distros and versions (even apparently equivalent ones like `RHEL4`/`CentOS 4`)
   * Usually this just involves compiling (strongly recommend doing this ''on the target distro'' - we had strange crashes due to minor library versioning differences etc)
   * You then just need to run something like `createrepo` and put the files in a web space

=== The world of Windows ===

 * `bdist_wininst` is fine for lots of purposes - especially if distributing packages to other developers
 * `bdist_msi` has now also been added, which produces packages in Windows Installer's MSI format (this is also used to produce the Python msi itself)
 * Building extension libraries can be a nightmare (if they use the Visual Studio C library) due to DLL hell - they need to match both the exact VS C library that Python was built with, and the exact VS C library that any other libraries they use link against
 * For distributing applications, most Windows users expect a single install, and may be confused by having the Python runtime environment set up on their machine with lots of libraries
 * [http://www.py2exe.org/ py2exe] is the most popular of a variety of tools for producing a ''frozen'' Python distribution on Windows:
   * An extension to `distutils`
   * Packages up the Python runtime, and a set of Python packages, modules, extensions, scripts and data, into a target directory
   * Scripts are converted to stub `win32` executables that load the Python dll and execute some code
   * Automatic search for Python library dependencies (by scanning your source code for `import` statements), as well as manual specification of requirements
   * All the libraries you depend on need to be included - sometimes having things installed as `eggs` on the build environment can produce problems
   * Running in frozen mode often requires some changes to the underlying code for compatibility - location of files etc - lots of tips on the wiki site
   * It's fairly common to use [http://www.jrsoftware.org/isinfo.php InnoSetup] or [http://nsis.sourceforge.net/ NSIS] to produce an installer containing all the required files, Start menu items, etc

=== Putting it all together ===

 * It makes life easier if you can do all the above, and whatever other targets you require, from the same `setup.py` script
 * Separate out options for the different commands as much as possible
 * We found we had to hack `distutils` a lot with derived code to make it all work
 * Beware of dependencies from `setup.py` - somebody may be trying to do something that doesn't need the dependency, so trap `ImportError`s etc

== Debian Packaging ==

[http://tumbleweed.org.za/ tumbleweed]'s talk on building Debian packages:

 * Package your source up nicely using `sdist`
   * Things get named `name_version.orig.tar.gz`
 * Unpack the source distribution, and run `dh_make`
 * Look through the different example files that are produced:
   * There are loads of them, you don't have to use all of them
   * `debian/control` tells what different packages should be produced
     * You can have separate `binary` and `source` files
   * `debian/copyright` is important to signal the licenses etc, following Debian rules
   * `debian/rules` is a makefile that generates a deb
     * It's a lot of work to write all the rules, so there are a lot of helper tools
     * This used to involve lots of separate build steps
     * debhelper 7 is the way to go nowadays... Now you just use `dh`:
{{{
#!/usr/bin/make -f
%:
	dh $@
}}}
 * You can have pre- and post-install scripts
 * For a Python package, you just need a few files (those mentioned above)
 * `dch` lets you edit `debian/changelog`
 * To install scripts, create a little script that loads the module (`debian/hellopy`) and add the instruction to install in `debian/install`: `debian/hellopy /usr/bin`
 * Call the package `python-$libname` to follow convention
 * `Architecture`: `any` means build on every architecture; `all` means just build one package for all architectures
 * `Description` - first line is summary, the rest is detail
   * Format is the same as email: to continue on a new line, put a space as the first character; for a blank line use `.` like so: ` .`
 * `XS-Python-Version`: You can set this to `all` or limit to greater than `2.5` etc
 * `Depends`: uses the substitution variable `${python:Depends}`
   * There are two competing ways to do Python dependencies on Debian: `python-central` and `python-support`
     * `python-central` is maintained by the Python maintainer, used by 10-20% of the packages
     * `python-support` is much more popular, and the maintainer is more reactive. Integrates nicely with `debhelper 7`
   * Specify `Build-Depends: debhelper (>= 7), python-support`
 * Patching the source directly is frowned upon, there are systems for managing patches
   * Put `3.0 (quilt)` in `debian/source/format` and then you can use `quilt` to maintain patches
 * Then you build with `debuild`:
   * Your package should automatically clean everything it creates - `clean` gets run before `build`
   * Everything gets installed under a fake root, so you need to specify anything that your build requires
   * `debc` lists the contents of the deb file
 * Then you install...
   * As you install, `python-support` will automatically byte-compile your Python code for you by calling `update-python-modules`
   * `/usr/lib/pymodules/python2.x` contains symlinks to the original code under `/usr/share/pyshared/` as well as the byte-compiled files for that version
   * Extensions go into `/usr/lib/python-support/python-$libname/python2.x/`
   * These symlinks and byte-compiled files aren't actually owned by the package, but will be regenerated/cleaned up if you run `update-python-modules` for that package again
 * More advanced ideas:
   * You now don't have to have a `.diff.gz` containing the Debian additions to a package, you can use `.debian.tar.gz` containing them instead
   * A lot of the packaging work is being pedantic, making sure it will be correct no matter who runs it
   * You can override `dh` rules in `debian/rules` to do things like adjusting automatic compression of files
 * To get your package into Debian, hop onto the `debian-python` IRC channel and ask for someone to help - they'll check through your package and help you fix mistakes

== Future of Python packaging ==

Simon Cross's description of where things are going:

 * Tarek Ziadé is working towards producing a decent standard packaging library for Python, codenamed `distutils2`.
 * This basically replaces things as follows:
   * `distutils2` replaces `distutils`, as well as adding various things from `setuptools`: `egg-info`, recording installed files (for uninstall etc), entry points, and version info
   * `Distribute` replaces the rest of `setuptools` (focus on '''building''')
   * `pip` replaces `easy_install` (focus on '''install/uninstall''')
 * The idea is that `distutils2` will be included in the standard library and will be extensible enough for `Distribute` (or any similar library) to produce building tools, or `pip` (or any similar library) to produce installation tools
 * Thus the Linux distros will be able to make their own tools to integrate with their packaging systems (if they so desire), on top of the standard `distutils2`
 * To produce ''low quality'' `deb` files there are a few options - `stdeb` is the most actively maintained and useful one at the moment
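
Going back to the advice under "Putting it all together", here is a minimal sketch of a `setup.py` that keeps per-command options separate and traps a missing optional dependency. It is not the script used in the talk: the project name `hellopy`, the helpers `optional_import`, `build_setup_kwargs` and `main`, and all the option values are assumptions made up for illustration.

```python
"""Hypothetical setup.py sketch: one script, several targets."""


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return __import__(name)
    except ImportError:
        return None


def build_setup_kwargs(have_py2exe):
    """Collect keyword arguments for setup(); all values are illustrative."""
    kwargs = {
        "name": "hellopy",  # hypothetical project name
        "version": "0.1",
        "packages": ["hellopy"],
        # Options are grouped per command, so each target stays independent.
        "options": {
            "bdist_rpm": {"requires": "python >= 2.5"},
        },
    }
    if have_py2exe:
        # Only add the Windows freezing options when py2exe is importable,
        # so that sdist, bdist_rpm etc still work on machines without it.
        kwargs["console"] = ["scripts/hellopy"]
        kwargs["options"]["py2exe"] = {"packages": ["hellopy"]}
    return kwargs


def main():
    """Entry point a real setup.py would call; not invoked in this sketch."""
    try:
        from setuptools import setup
    except ImportError:
        from distutils.core import setup  # older Pythons without setuptools
    setup(**build_setup_kwargs(optional_import("py2exe") is not None))
```

A real `setup.py` would end with a call to `main()`. Building the arguments in a plain function, separate from the `setup()` call, also makes it easy to check what each target will receive without running a build.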