Comparing RPM to GPT

Overview

The first question people ask when you present a new packaging tool is "Why not just use the Red Hat Package Manager (RPM)?" Although RPM has achieved remarkable success as a popular package manager, it is not meant to be a solution for all deployment challenges. In fact, the RPM website lists a number of tools designed to supplement RPM. GPT, in turn, is not meant to be a replacement for RPM. Instead, it was developed to address some deployment issues that RPM does not. This paper discusses these deployment issues.

Relocatable Binary Packages

Making a binary package independent of any location information is extremely difficult because this information is embedded in the installed files in many different ways. RPM does not need to address this issue because most binary packages installed with RPM can be installed in the same location on every machine. There are a few exceptions, and libc is a famous example: two versions of this system library had to be installed in order to support all of the packages. RPM solves this by changing the installed filenames and the name of the older libc.

On large systems such as Grid computers or computing clusters, it is difficult to support packages that are not relocatable. These types of systems usually have custom disk storage solutions, so it is impossible to enforce the same directory structure across all platforms. For example, /usr/local may not have sufficient disk space on some machines to hold all of the packages that expect to be installed there.

GPT does not solve the binary package relocatability problem either. What it does is provide an environment variable (currently $GLOBUS_LOCATION) that all packages can use as a root for locations. This of course means that all GPT packages have to be relocatable, but it also means that administrators have more flexibility in where software is installed. Future versions of GPT will manage a list of locations and provide a set of variables.
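As an illustration of the idea only, the following Python sketch shows how a package could resolve its files against a single root variable. Only the variable name GLOBUS_LOCATION comes from the text above; the function names and the example paths are hypothetical and are not part of GPT.

```python
import os
from pathlib import PurePosixPath

ROOT_VARIABLE = "GLOBUS_LOCATION"  # variable name taken from the text


def install_root(env=None):
    """Return the installation root named by the root variable."""
    env = os.environ if env is None else env
    root = env.get(ROOT_VARIABLE)
    if not root:
        raise RuntimeError(ROOT_VARIABLE + " is not set")
    return PurePosixPath(root)


def resolve(relative_path, env=None):
    """Resolve a package-relative path against the installation root.

    Absolute paths are rejected, because a relocatable package must not
    depend on a fixed location such as /usr/local.
    """
    rel = PurePosixPath(relative_path)
    if rel.is_absolute():
        raise ValueError("relocatable packages must use relative paths")
    return install_root(env) / rel


if __name__ == "__main__":
    # Hypothetical root and file name, used only for the demonstration.
    demo_env = {ROOT_VARIABLE: "/opt/grid"}
    print(resolve("lib/libexample.so", demo_env))
```

Moving the installation then amounts to changing the value of the variable; nothing inside the package refers to an absolute path.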

Installation of Multiple Versions

GPT was developed to deploy software for the Grid. This software consists of a large collection of inter-dependent packages that have independent release cycles. Because of this, multiple versions of a package will often need to be installed on the same machine to fulfill dependencies for different applications and/or services. However, installing two versions of the same package in the same location leads to a file conflict. One solution is to install one version of the package, together with the software packages that depend on it, in one location, and the other version, with its dependent packages, in another location. To facilitate this, GPT stores the packaging data for an installation location at the location itself (currently $GLOBUS_LOCATION/etc/globus_packages). Switching between installations is thus simply a matter of changing the location variable. This kind of operation is not normally done with RPM, although it can be accomplished by creative use of the --dbpath command-line flag.
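The following Python sketch illustrates why keeping the packaging data inside each installation makes switching simple. The relative path etc/globus_packages comes from the text; the one-directory-per-package layout beneath it, the helper names, and the package names are assumptions made only for this example.

```python
import tempfile
from pathlib import Path

# Relative path taken from the text. The one-directory-per-package layout
# below it is an assumption made only for this sketch.
PACKAGING_SUBDIR = Path("etc") / "globus_packages"


def installed_packages(location):
    """List the package names recorded under one installation location."""
    data_dir = Path(location) / PACKAGING_SUBDIR
    if not data_dir.is_dir():
        return []
    return sorted(entry.name for entry in data_dir.iterdir() if entry.is_dir())


def make_demo_install(root, packages):
    """Create a throwaway installation tree using the assumed layout."""
    for name in packages:
        (Path(root) / PACKAGING_SUBDIR / name).mkdir(parents=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # Two locations, each holding a different version of the same library.
        make_demo_install(a, ["libfoo-1.0", "app_one"])
        make_demo_install(b, ["libfoo-2.0", "app_two"])
        for location in (a, b):
            print(location, installed_packages(location))
```

Because each location carries its own record of what is installed, a tool pointed at a different location sees a different, self-consistent set of packages without consulting any system-wide database.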

Multi-platform Building Issues

On large systems such as the Grid, software is frequently compiled with specialized build options that enable advanced operating system features. Examples of these features are number sizes (64-bit vs. 32-bit) and message passing interface libraries. Unfortunately, these features affect the linking compatibility of the binaries, so packages built with these features need to link with libraries from other packages that were built the same way. To prepare for this situation, a source package needs to generate several binary packages, each built with a separate make clean/configure/make/make install cycle using a separate set of options. RPM, and for that matter most GNU build tools such as autoconf, assume that a package will be built in only one way.[1] To wit, the RPM spec file is formatted to support one build cycle. When we evaluated RPM, we did manage to create a spec file that supported multiple build cycles through extensive use of RPM macros, but the result was difficult to read and maintain. In addition, it was very difficult to maintain a spec file that was architecture independent.[2]

GPT has a build system that manages sets of build options and generates binary packages from multiple build cycles. It also has a patch-n-build process, similar to an RPM spec file, which allows build options to be translated into local configure variables and flags.
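To make the idea of multiple build cycles concrete, here is a minimal Python sketch that expands sets of build options into one build cycle per combination. It does not reproduce GPT's build system: the option names, the configure flags, and the function names are all hypothetical. Only the make clean/configure/make/make install sequence is taken from the text.

```python
from itertools import product

# Hypothetical option sets; each combination yields one binary package.
OPTION_SETS = {
    "bits": ["32", "64"],
    "mpi": ["nompi", "mpi"],
}

# Hypothetical translation of each choice into local configure flags.
CONFIGURE_FLAGS = {
    "32": [],
    "64": ["--enable-64bit"],
    "nompi": [],
    "mpi": ["--with-mpi"],
}


def build_flavors(option_sets):
    """Return one dict per combination of build options."""
    keys = list(option_sets)
    combos = product(*(option_sets[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def flavor_name(flavor):
    """Build a label for one combination, e.g. '64-mpi'."""
    return "-".join(flavor.values())


def build_cycle(flavor, flags):
    """Return the command sequence for one complete build cycle."""
    configure = ["./configure"]
    for choice in flavor.values():
        configure += flags.get(choice, [])
    return [["make", "clean"], configure, ["make"], ["make", "install"]]


if __name__ == "__main__":
    # The commands are only printed here, not executed.
    for flavor in build_flavors(OPTION_SETS):
        print(flavor_name(flavor))
        for command in build_cycle(flavor, CONFIGURE_FLAGS):
            print("   ", " ".join(command))
```

Keeping the option sets as data, separate from the build commands, is what lets a single source package describe several binary packages without repeating the build recipe for each one.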

Accessing the Packaging Data

RPM comes with a rich set of packaging data fields. However, we found some items that we felt should be included in the packaging data. For example, one of the features that grew out of GPT development was a tighter coupling between the packaging data and the build system. We defined build dependencies and other build characteristics[3] inside the packaging data; the build system extracts them and passes them into a package's makefiles during the configure stage of the build process. For instance, the package Netlogger consists of a library that is linked into communication libraries to provide logging of network events. In the packaging data we can store the name of the library (-lnetlogger) as well as a macro that needs to be added to CFLAGS (-DENABLE_NETLOGGER). A package that has a build dependency on Netlogger will automatically have these items passed into its makefiles.
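The Netlogger example can be sketched in a few lines of Python. The flags -lnetlogger and -DENABLE_NETLOGGER come from the text; the dictionary layout, the package name comm_lib, and the function name are hypothetical, and the sketch considers direct build dependencies only, which is a simplifying assumption rather than a statement about GPT's behavior.

```python
# Hypothetical in-memory form of the packaging data. Only the two
# Netlogger flags are taken from the text.
PACKAGING_DATA = {
    "netlogger": {
        "build_deps": [],
        "cflags": ["-DENABLE_NETLOGGER"],
        "libs": ["-lnetlogger"],
    },
    "comm_lib": {  # hypothetical package that depends on Netlogger
        "build_deps": ["netlogger"],
        "cflags": [],
        "libs": [],
    },
}


def build_flags(package, data):
    """Collect the flags exported by a package's direct build dependencies."""
    cflags, libs = [], []
    for dep in data[package]["build_deps"]:
        cflags.extend(data[dep]["cflags"])
        libs.extend(data[dep]["libs"])
    return {"CFLAGS": " ".join(cflags), "LIBS": " ".join(libs)}


if __name__ == "__main__":
    # These values would be handed to the package's configure stage.
    print(build_flags("comm_lib", PACKAGING_DATA))
```

The dependent package never names the flags itself; it only declares the dependency, and the values travel with the package that provides them.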

It is difficult to add fields such as these to RPM's packaging data, although it can be done through clever use of the "Provides:" field. Adding fields to the GPT packaging data is much easier and does not disrupt the tools, because the data is stored in XML.

XML provides another benefit: parsing libraries exist in multiple programming languages, so the packaging data can be accessed by a variety of tools. RPM, by contrast, has to provide a library for each language because its packaging data format is internal to RPM.
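As a rough illustration of both points, the Python sketch below reads packaging data with nothing but the standard library's XML parser. The element and attribute names are invented for this example and do not reproduce the actual GPT schema; the version number is likewise made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical packaging data. The <site_note> element stands in for a
# field added later that this reader knows nothing about.
SAMPLE = """\
<package name="netlogger" version="1.0">
  <build_flags>
    <cflags>-DENABLE_NETLOGGER</cflags>
    <libs>-lnetlogger</libs>
  </build_flags>
  <site_note>added by a local tool</site_note>
</package>
"""


def read_package(xml_text):
    """Extract the fields this tool understands and ignore the rest."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "version": root.get("version"),
        "cflags": root.findtext("build_flags/cflags", default=""),
        "libs": root.findtext("build_flags/libs", default=""),
    }


if __name__ == "__main__":
    print(read_package(SAMPLE))
```

The reader looks up only the elements it knows, so the extra field passes through harmlessly. A tool written in another language could read the same file with that language's own XML parser.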

Installing the Package Manager

One of the major Grid software requirements that GPT has to contend with is the large variety of UNIX platforms on which this software needs to be deployed. Most of these UNIX systems do not have RPM natively installed. Installing RPM on them is a significant effort because of all the libraries (Zlib, bzip, BerkeleyDB) the program requires. A key point that drove the decision not to use RPM was the fact that it could not be compiled on the Cray T3E's UNICOS operating system. GPT is a set of Perl modules and scripts that requires much less effort to install (although GPT does also use the Zlib library). A package manager that is complicated to install diminishes the enthusiasm a user has for the program.

Summary

This discussion was designed to give some insight into why GPT was developed as an alternative or supplement to RPM. GPT can be deployed either as a standalone tool or as an addition to RPM. For example, the Globus Toolkit is currently being deployed as GPT packages, and the European Data Grid delivers the Globus Toolkit as GPT packages wrapped into RPMs.

Work is currently under way to have the GPT build tools generate RPMs as well as other types of packages, using both RPM and the ESP Package Manager. All of this shows how GPT can be used to supplement a native package manager as well as to provide package management tools for platforms that lack a package manager.


1 The exception is the GNU Compiler Collection (GCC), which has a feature called multilib. This is used to compile compiler libraries such as libgcc and libjava with various options, such as threaded/non-threaded and with/without runtime exceptions. Unfortunately, this feature only applies to internal libraries used by the compilers.

2 OpenPKG is a variant of RPM that provides a more flexible build system. Unfortunately, it still seems to support only one build cycle.

3 The latest version of RPM (4.x) does support build dependencies but not build environment flags.