Comparing RPM to GPT
Overview
The first question that people ask when you present a new packaging
tool is "Why not just use the Red Hat Package Manager (RPM)?" Although
RPM has achieved remarkable success as a popular package manager, it is
not meant to be a solution for all deployment challenges. In fact, the
RPM website lists a number of tools that are designed to supplement
RPM. GPT, in turn, is not meant to be a replacement for RPM. Instead, it
was developed to address some deployment issues that RPM does not. This
paper discusses these deployment issues.
Relocatable Binary Packages
Making a binary package independent of any location information is
extremely difficult because this information is embedded in the
installed files in many different ways. RPM does not need to address
this issue because most of the binary packages installed with RPM can
be installed in the same location on every machine. There are a few
exceptions; libc is a famous example. RPM had to install two versions
of this system library in order to support all of the packages, and it
solves this by changing the installed filenames and the package name
of the older libc.
On large systems such as Grid computers or computing clusters, it is
difficult to support packages that are not relocatable. These types of
systems usually have custom disk storage solutions, so it is
impossible to enforce the same directory structure across all
platforms. For example, /usr/local may not have sufficient disk space
on some machines to hold all of the packages that expect to be there.
GPT does not solve the binary package relocatability problem either.
What it does is provide an environment variable (currently
$GLOBUS_LOCATION) that all packages can use as the root for their
locations. This, of course, means that all GPT packages have to be
relocatable relative to that root. But it also gives administrators
more flexibility in where software is installed. Future versions of
GPT will manage a list of locations and provide a set of variables.
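As a minimal sketch of how a package can stay relocatable under this
scheme, assume each package records its files as paths relative to the
installation root and resolves them against $GLOBUS_LOCATION at run
time. The function name resolve_install_path is hypothetical and is
not part of GPT.

```python
import os
from pathlib import Path

def resolve_install_path(relative_path, env=None):
    """Map a package-relative path (e.g. "lib/libfoo.so") onto the
    installation root named by GLOBUS_LOCATION (hypothetical helper)."""
    env = os.environ if env is None else env
    root = env.get("GLOBUS_LOCATION")
    if not root:
        raise RuntimeError("GLOBUS_LOCATION is not set")
    rel = Path(relative_path)
    if rel.is_absolute():
        # A hard-coded absolute path would defeat relocation.
        raise ValueError(f"not relocatable: {relative_path}")
    return Path(root) / rel
```

The same package contents then resolve under /opt/globus on one machine
and under, say, /scratch/globus on another, simply by changing the
variable.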
Installation of Multiple Versions
GPT was developed to deploy software for the Grid. This software
consists of a large collection of inter-dependent packages that have
independent release cycles. Because of this, multiple versions of a
package often need to be installed on the same machine to fulfill
dependencies for different applications and/or services. However,
installing two versions of the same package in the same location leads
to file conflicts. One solution is to install one version of the
package, together with the software packages that depend on it, in one
location, and the other version with its dependent packages in another
location. To facilitate this, GPT stores the packaging data for an
installation location inside that location (currently
$GLOBUS_LOCATION/etc/globus_packages). Thus switching between
installations is simply a matter of changing the location variable.
This kind of operation is not normally done with RPM, although it can
be accomplished by creative use of the --dbpath command-line option.
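The per-location packaging data can be sketched in Python as follows.
Only the etc/globus_packages location comes from GPT; the layout below
it (one directory per package, one subdirectory per installed version)
and the function names are hypothetical simplifications.

```python
import os
from pathlib import Path

# Only etc/globus_packages comes from GPT; the <package>/<version>
# layout below it is a hypothetical simplification.
DB_SUBDIR = Path("etc") / "globus_packages"

def package_db(location):
    """Packaging-data directory stored inside an installation location."""
    return Path(location) / DB_SUBDIR

def active_db(env=None):
    """Database of the installation currently selected by GLOBUS_LOCATION."""
    env = os.environ if env is None else env
    return package_db(env["GLOBUS_LOCATION"])

def installed_versions(db):
    """Map each package recorded in a database to its sorted versions."""
    db = Path(db)
    if not db.is_dir():
        return {}
    return {
        pkg.name: sorted(v.name for v in pkg.iterdir() if v.is_dir())
        for pkg in sorted(db.iterdir())
        if pkg.is_dir()
    }
```

Pointing GLOBUS_LOCATION at a different installation changes which
database active_db returns, so each installation sees only its own
package versions. With RPM, the closest equivalent is giving each
command a separate database directory with --dbpath.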
Multi-platform Building Issues
On large systems such as the Grid, software is frequently compiled
with specialized build options that enable advanced operating system
features. Examples of these features are word sizes (64-bit vs. 32-bit)
and message passing interface (MPI) libraries. Unfortunately, these
features affect the link compatibility of the binaries: packages built
with these features need to link with libraries from other packages
that were built the same way. To prepare for this situation, a source
package needs to generate several binary packages, each built with a
separate make clean/configure/make/make install cycle using a separate
set of options. RPM, and for that matter most GNU build tools such as
autoconf, assume that a package will be built in only one way.[1] To
wit, the RPM spec file is formatted to support one build cycle. When
we evaluated RPM, we did manage to create a spec file that supported
multiple build cycles through extensive use of RPM macros, but the
result was difficult to read and maintain. In addition, it was very
difficult to maintain a spec file that was architecture-independent.[2]
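The following sketch shows the shape of the problem: one source tree,
several build cycles, one staged binary package per set of options.
The flavor names, their options, and the use of DESTDIR staging are
hypothetical illustrations. The function only plans the commands; a
real driver would run each one with subprocess.run(cmd, check=True)
inside the source tree, tolerating a failing make clean on the very
first cycle, when no Makefile exists yet.

```python
from typing import Dict, List

# Hypothetical flavors: each names the configure options for one build.
FLAVORS: Dict[str, List[str]] = {
    "gcc32": ["CFLAGS=-m32"],
    "gcc64": ["CFLAGS=-m64"],
    "gcc64mpi": ["CFLAGS=-m64", "CC=mpicc"],
}

def build_cycles(flavors: Dict[str, List[str]], prefix: str) -> Dict[str, List[List[str]]]:
    """Plan one full make clean/configure/make/make install cycle per flavor."""
    plan = {}
    for name, options in flavors.items():
        plan[name] = [
            ["make", "clean"],
            ["./configure", f"--prefix={prefix}", *options],
            ["make"],
            # Stage each flavor separately so it becomes its own binary package.
            ["make", "install", f"DESTDIR=stage/{name}"],
        ]
    return plan
```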
GPT has a build system that manages sets of build options and
generates binary packages from multiple build cycles. It also has a
patch-n-build process, similar to an RPM spec file, that translates
build options into local configure variables and flags.
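A translation step of this kind can be sketched as a per-package table
that maps abstract build options onto configure variables. The option
names and table contents are hypothetical, not GPT's actual option
vocabulary.

```python
# Hypothetical translation table for one package: abstract build option ->
# configure variables it contributes.
TRANSLATION = {
    "64bit": {"CFLAGS": "-m64"},
    "debug": {"CFLAGS": "-g"},
    "mpi": {"CC": "mpicc"},
}

def configure_args(options, table=TRANSLATION):
    """Turn a set of build options into VAR=value arguments for ./configure.

    Values for the same variable are joined with spaces; an option missing
    from the table raises KeyError so that it is not silently ignored.
    """
    merged = {}
    for opt in options:
        for var, value in table[opt].items():
            merged[var] = f"{merged[var]} {value}" if var in merged else value
    return [f"{var}={value}" for var, value in sorted(merged.items())]
```

For example, configure_args(["64bit", "debug", "mpi"]) yields
["CC=mpicc", "CFLAGS=-m64 -g"].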
Accessing the Packaging Data
RPM comes with a rich set of packaging data fields. However, we found
some items that we felt should be included in the packaging data. For
example, one of the features that grew out of GPT development was a
tighter coupling between the packaging data and the build system. We
defined build dependencies and other build characteristics[3] inside
the packaging data; the build system extracts them and passes them
into a package's makefiles during the configure stage of the build
process. For instance, the NetLogger package consists of a library
that is linked into communication libraries to provide logging of
network events. In the packaging data we can store the name of the
library (-lnetlogger) as well as a macro that needs to be added to
CFLAGS (-DENABLE_NETLOGGER). A package that has a build dependency on
NetLogger automatically has these items passed into its makefiles.
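A minimal sketch of this extraction, using Python's standard
xml.etree.ElementTree module. The element names (build_dependency,
build_env, cflags, libs) and the comm_lib package are hypothetical
stand-ins, not GPT's actual packaging-data schema, and only direct
build dependencies are followed.

```python
import xml.etree.ElementTree as ET

# Hypothetical packaging data; element names are illustrative only.
PACKAGES = {
    "netlogger": """<package name="netlogger">
      <build_env>
        <libs>-lnetlogger</libs>
        <cflags>-DENABLE_NETLOGGER</cflags>
      </build_env>
    </package>""",
    "comm_lib": """<package name="comm_lib">
      <build_dependency name="netlogger"/>
    </package>""",
}

def build_flags(name, packages=PACKAGES):
    """Collect CFLAGS and LIBS contributed by a package's direct build dependencies."""
    cflags, libs = [], []
    for dep in ET.fromstring(packages[name]).findall("build_dependency"):
        env = ET.fromstring(packages[dep.get("name")]).find("build_env")
        if env is None:
            continue  # this dependency exports no build environment
        cflags.append(env.findtext("cflags", default="").strip())
        libs.append(env.findtext("libs", default="").strip())
    return {
        "CFLAGS": " ".join(f for f in cflags if f),
        "LIBS": " ".join(lib for lib in libs if lib),
    }
```

The resulting values would then be substituted into the dependent
package's makefiles at configure time.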
It is difficult to add fields such as these to RPM's packaging data,
although it can be done through clever use of the "Provides:" field.
It is much easier to add fields to the GPT packaging data without
disrupting the tools because the data is stored in XML.
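To illustrate why an added field does not disrupt existing tools, the
sketch below has an older reader that looks up only the element it
knows about; a newer document carrying an extra, hypothetical build_env
element is read without any change to that reader. Element names are
again illustrative.

```python
import xml.etree.ElementTree as ET

def read_version(xml_text):
    """An 'old' tool: it only knows about the <version> element."""
    return ET.fromstring(xml_text).findtext("version")

OLD_DATA = "<package name='foo'><version>1.2</version></package>"
# A newer document that adds a field the old tool has never heard of.
NEW_DATA = ("<package name='foo'><version>1.2</version>"
            "<build_env><cflags>-DFOO</cflags></build_env></package>")
```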
XML provides another benefit: parsing libraries exist for many
programming languages, so the packaging data can be accessed by a
variety of tools. RPM has to provide a library for each language to
access its packaging data because the data format is internal to RPM.
Installing the Package Manager
One of the major Grid software requirements that GPT has to contend
with is the large variety of UNIX platforms this software needs to be
deployed on. Most of these UNIX variants do not have RPM natively
installed. Installing RPM on these systems is a significant effort
because of all the libraries (zlib, bzip2, Berkeley DB) the program
requires. A key point that drove the decision not to use RPM was the
fact that RPM could not be compiled on UNICOS, the Cray T3E's operating
system. GPT is a set of Perl modules and scripts that requires much
less effort to install, although GPT does also use the zlib library.
A package manager that is complicated to install will diminish the
enthusiasm a user has for the program.
Summary
This discussion was designed to give some insight into why GPT was
developed as an alternative/supplement to RPM. GPT can be deployed
either as a standalone tool or as an addition to RPM. For example, the
Globus Toolkit is currently being deployed as GPT packages, and the
European Data Grid delivers the Globus Toolkit as GPT packages wrapped
into RPMs. Work is currently under way to have the GPT build tools
generate RPMs as well as other types of packages, using both RPM and
the ESP Package Manager. All of this shows how GPT can be used to
supplement a native package manager as well as provide package
management tools for platforms lacking a package manager.
[1] The exception is the GNU Compiler Collection (GCC), which has a
feature called multilib. It is used to compile compiler libraries such
as libgcc and libjava with various options, such as
threaded/non-threaded and with/without runtime exceptions.
Unfortunately, this feature only applies to internal libraries used by
the compilers.
[2] OpenPKG is a variant of RPM that provides a more flexible build
system. Unfortunately, it still seems to support only one build cycle.
[3] The latest version of RPM (4.x) does support build dependencies,
but not build environment flags.