A tool to mirror and version YUM repositories
- You invoke pakrat and pass it some information about your repositories.
- Pakrat mirrors the YUM repositories, and optionally arranges the data in a versioned manner.
It is easiest to demonstrate what Pakrat does by shell example:
$ pakrat --repodir /etc/yum.repos.d
repo done/total complete metadata
-------------------------------------------------------
base 357/6381 5% -
updates 112/1100 10% -
extras 13/13 100% complete
total: 482/7494 6%
- Mirror repository packages from remote sources
- Optional repository versioning with user-defined version schema
- Mirror YUM group metadata
- Supports standard YUM configuration files
- Supports YUM configuration directories (repos.d style)
- Supports command-line repos for zero-configuration (
--name
and--baseurl
) - Command-line interface with real-time progress indicator
- Parallel repository downloads for maximum effeciency
- Syslog integration
- Supports user-specified callbacks
Pakrat is available in PyPI as pakrat
. That means you can install it with
easy_install:
# easy_install pakrat
NOTE Installation from PyPI should work on any Linux. However, since Pakrat depends on YUM and Createrepo, which are not available in PyPI, these dependencies will not be detected as missing. The easiest install path is to install on some kind of RHEL like so:
# yum -y install createrepo
# easy_install pakrat
The simplest possible example would involve mirroring a YUM repository in a very basic way, using the CLI:
$ pakrat --name centos --baseurl http://mirror.centos.org/centos/6/os/x86_64
$ tree -d centos
centos/
├── Packages
└── repodata
A slightly more complex example would be to version the same repository. To do this, you must pass in a version number. An easy example is to mirror a repository daily.
$ pakrat \
--repoversion $(date +%Y-%m-%d) \
--name centos \
--baseurl http://mirror.centos.org/centos/6/os/x86_64
$ tree -d centos
centos/
├── 2013-07-29
│ ├── Packages -> ../Packages
│ └── repodata
├── latest -> 2013-07-29
└── Packages
If you were to configure the above to command to run on a daily schedule, eventually you would see something like:
$ tree -d centos
centos/
├── 2013-07-29
│ ├── Packages -> ../Packages
│ └── repodata
├── 2013-07-30
│ ├── Packages -> ../Packages
│ └── repodata
├── 2013-07-31
│ ├── Packages -> ../Packages
│ └── repodata
├── latest -> 2013-07-31
└── Packages
You can also opt to have a combined repository for each of your repos. This is
useful because you could simply point your clients to the root of your
repository, and they will have access to its complete history of RPMs. You can
do this by passing in the --combined
option when versioning repositories.
Pakrat is also capable of handling multiple YUM repositories in the same mirror run. If multiple repositories are specified, each repository will get its own download thread. This is handy if you are syncing from a mirror that is not particularly quick. The other repositories do not need to wait on it to finish.
$ pakrat \
--repoversion $(date +%Y-%m-%d) \
--name centos --baseurl http://mirror.centos.org/centos/6/os/x86_64 \
--name epel --baseurl http://dl.fedoraproject.org/pub/epel/6/x86_64
$ tree -d centos epel
centos/
├── 2013-07-29
│ ├── Packages -> ../Packages
│ └── repodata
├── latest -> 2013-07-29
└── Packages
epel/
├── 2013-07-29
│ ├── Packages -> ../Packages
│ └── repodata
├── latest -> 2013-07-29
└── Packages
Configuration can also be passed in from YUM configuration files. See the CLI
--help
for details.
Pakrat also exposes its interfaces in plain python for integration with other
projects and software. A good starting point for using Pakrat via the python
API is to take a look at the pakrat.sync
method. The CLI calls this method
almost exclusively, so it should be fairly straightforward in its usage (all
arguments are named and optional):
pakrat.sync(basedir, objrepos, repodirs, repofiles, repoversion, delete, callback)
Another handy python method is pakrat.repo.factory
, which creates YUM
repository objects so that no file-based configuration is needed.
pakrat.repo.factory(name, baseurls=None, mirrorlist=None)
Since the YUM team did a decent job at externalizing the progress data, pakrat will return the favor by exposing the same data, plus some extras via user callbacks.
A user callback is a simple class that implements some methods for handling received data. It is not mandatory to implement any of the methods.
A few of the available user callbacks in pakrat come directly from the
urlgrabber
interface (namely, any user callback beginning with download_
.
The other methods are called by pakrat, which explains why the interfaces
are varied.
The supported user callbacks are listed in the following method signatures:
""" Called when the number of packages a repository contains becomes known """
repo_init(repo_id, num_pkgs)
""" Called when 'createrepo' begins running and when it completes """
repo_metadata(repo_id, status)
""" Called when a repository finishes downloading all packages """
repo_complete(repo_id)
""" Called whenever an exception is thrown from a repo thread """
repo_error(repo_id, error)
""" Called when a package becomes known as 'already downloaded' """
local_pkg_exists(repo_id, pkgname)
""" Called when a file begins downloading (non-exclusive) """
download_start(repo_id, fpath, url, fname, fsize, text)
""" Called during downloads, 'size' is bytes downloaded """
download_update(repo_id, size)
""" Called when a file download completes, 'size' is file size in bytes """
download_end(repo_id, size)
The following is a basic example of how to use user callbacks in pakrat.
Note that an instance of the class is passed into the pakrat.sync()
call
as the named argument callback
.
import pakrat
class mycallback(object):
def log(self, msg):
with open('log.txt', 'a') as logfile:
logfile.write('%s\n' % msg)
def repo_init(self, repo_id, num_pkgs):
self.log('Found %d packages in repo %s' % (num_pkgs, repo_id))
def download_start(self, repo_id, _file, url, basename, size, text):
self.fname = basename
def download_end(self, repo_id, size):
if self.fname.endswith('.rpm'):
self.log('%s, repo %s, size %d' % (self.fname, repo_id, size))
def repo_metadata(self, repo_id, status):
self.log('Metadata for repo %s is now %s' % (repo_id, status))
myrepo = pakrat.repo.factory(
'extras',
mirrorlist='http://mirrorlist.centos.org/?repo=extras&release=6&arch=x86_64'
)
mycallback_instance = mycallback()
pakrat.sync(objrepos=[myrepo], callback=mycallback_instance)
If you run the above example, and then take a look in the log.txt
file (which
the user callbacks should have created), you will see something like:
Found 13 packages in repo extras
bakefile-0.2.8-3.el6.centos.x86_64.rpm, repo extras, size 256356
centos-release-cr-6-0.el6.centos.x86_64.rpm, repo extras, size 3996
centos-release-xen-6-2.el6.centos.x86_64.rpm, repo extras, size 4086
freenx-0.7.3-9.4.el6.centos.x86_64.rpm, repo extras, size 99256
jfsutils-1.1.13-9.el6.x86_64.rpm, repo extras, size 244104
nx-3.5.0-2.1.el6.centos.x86_64.rpm, repo extras, size 2807864
opennx-0.16-724.el6.centos.1.x86_64.rpm, repo extras, size 1244240
python-empy-3.3-5.el6.centos.noarch.rpm, repo extras, size 104632
wxBase-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 586068
wxGTK-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 3081804
wxGTK-devel-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 1005036
wxGTK-gl-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 31824
wxGTK-media-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 38644
Metadata for repo extras is now working
Metadata for repo extras is now complete
Pakrat can be easily packaged into an RPM.
- Download a release and name the tarball
pakrat.tar.gz
:
curl -o pakrat.tar.gz -L https://github.com/ryanuber/pakrat/archive/master.tar.gz
- Build it into an RPM:
rpmbuild -tb pakrat.tar.gz
- Unit tests (preliminary work done in unit_test branch)
Thanks to Keith Chambers for help with the ideas and useful input on CLI design.