973375 – yum incorrectly uses cached package information causing pkg checksum failures

Bug 973375 - yum incorrectly uses cached package information causing pkg checksum failures

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	yum
Sub Component:
Version:	6.4
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Packaging Maintenance Team
QA Contact:	BaseOS QE Security Team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-06-11 19:33 UTC by greg.2.harris
Modified:	2014-04-16 18:53 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-04-16 18:53:31 UTC
Target Upstream Version:
Embargoed:

Description greg.2.harris 2013-06-11 19:33:46 UTC

Description of problem:

Yum will incorrectly use its own package cache on subsequent operations if an RPM command directly manipulates the RPM database shortly after a yum install operation.

Version-Release number of selected component (if applicable):
RHEL 6.4
yum-3.2.29.40.el6.centos

How to reproduce:

NOTE - for reasons to be explained below this is much more likely to happen on a filesystem like ext2/ext3 where mtime resolution is limited to 1 second.  On modern filesystems like ext4 this race is much less likely to occur.

The following bash script reproduces the problem for me:
while true; do yum -y install nc || break; rpm -e nc;  done

Actual results:

It should fail shortly with an error like:
Rpmdb checksum is invalid: pkg checksums: nc-0:1.84-22.el6.x86_64

Expected results:

The script should spin forever.

Additional info:

The error is being thrown in yum/rpmsack.py.  The check that is failing is within preloadPackageChecksums and is reproduced below:

        rpmdbv = self.simpleVersion(main_only=True)[0]
        fo = open(self._cachedir + '/pkgtups-checksums')
        frpmdbv = fo.readline()
        if not frpmdbv or rpmdbv != frpmdbv[:-1]:
            return

rpmdbv here is supposed to be the version of the RPM database from /var/lib/rpm/Packages *but* if you follow the trace you'll eventually find out it takes a shortcut in _get_cached_simpleVersion_main() [the same file]:

        #  This test is "obvious" and the only thing to come out of:
        # http://lists.rpm.org/pipermail/rpm-maint/2007-November/001719.html
        # ...if anything gets implemented, we should change.
        rpmdbvfname = self._cachedir + "/version"
        rpmdbfname  = self.root + "/var/lib/rpm/Packages"

        if os.path.exists(rpmdbvfname) and os.path.exists(rpmdbfname):
            # See if rpmdb has "changed" ...
            nmtime = os.path.getmtime(rpmdbvfname)
            omtime = os.path.getmtime(rpmdbfname)
            if omtime <= nmtime:
                rpmdbv = open(rpmdbvfname).readline()[:-1]
                self._have_cached_rpmdbv_data  = rpmdbv
        return self._have_cached_rpmdbv_data

Basically it compares the mtime of the local cache with the mtime of the Packages file, and if the Packages mtime is <= the cache mtime it takes the cached version instead of getting the version information from the real Packages file.

The bug is that if the yum install completes in the *same second* as the rpm -e operation the '/var/lib/rpm/Packages' file will have the *same* mtime as the cache even though it is different than the cache.
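
As a rough illustration of the same-second collision described above, the following Python sketch gives two scratch files an identical whole-second mtime and applies the `<=` test from the excerpt. The file names mirror the ones in this report, but the temporary directory, the `cache_looks_fresh` helper, the placeholder file contents, and the timestamp value are all assumptions made for this example; it is not yum code.

```python
import os
import tempfile


def cache_looks_fresh(cache_path, rpmdb_path):
    """Hypothetical helper mirroring the quoted test: trust the cache if omtime <= nmtime."""
    nmtime = os.path.getmtime(cache_path)
    omtime = os.path.getmtime(rpmdb_path)
    return omtime <= nmtime


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "version")
    rpmdb = os.path.join(tmp, "Packages")

    # "yum install" finishes: the rpmdb and the cache describe the same state.
    for path in (rpmdb, cache):
        with open(path, "w") as fh:
            fh.write("rpmdb-state-1\n")

    # "rpm -e" then changes the rpmdb without touching yum's cache.
    with open(rpmdb, "w") as fh:
        fh.write("rpmdb-state-2\n")

    # Model a 1-second-resolution filesystem: both writes land in the same second.
    same_second = 1371000000  # arbitrary illustrative timestamp
    os.utime(cache, (same_second, same_second))
    os.utime(rpmdb, (same_second, same_second))

    stale_cache_trusted = cache_looks_fresh(cache, rpmdb)
    with open(cache) as c, open(rpmdb) as r:
        contents_match = c.read() == r.read()

print(stale_cache_trusted, contents_match)
```

In this sketch the cache passes the freshness test even though the two files no longer describe the same state.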

The program then errors out because it searches for all the packages listed in the cache and can't find the package that was just removed by the 'rpm -e' command.

The correct fix is to change the '<=' to '<' which ensures the cache *is* more recent than the actual RPM database itself.

You can see this in action by adding a sleep after the yum install in the while loop:
while true; do yum -y install nc || break; sleep 1; rpm -e nc;  done

The above doesn't fail and will loop forever.  The sleep 1 means that the rpm -e command leaves the Packages file with a later mtime than the cache, thus forcing yum to read the rpmdb version from the database itself.

On modern filesystems like ext4 that have higher mtime resolution this problem should occur less often since it will be much harder to have both commands finish in the same millisecond (or microsecond).

Comment 1 Zdeněk Pavlas 2013-06-12 12:06:16 UTC

> The correct fix is to change the '<=' to '<'

ACK, making rpmdb caching a bit more conservative should not hurt.  This should also go upstream.

Comment 2 James Antill 2013-06-18 13:56:56 UTC

Removing dev. ACK flag, as this patch breaks the caching for the common case.

Why are you using rpm directly?

Comment 3 James Antill 2013-06-18 14:01:42 UTC

> On modern filesystems like ext4 that have higher mtime resolution this problem should occur less since it will be much harder to have both commands finish in the same millisecond (or microsecond).

Also this is a slight understatement: rpm is unlikely to be able to run erase transactions 1000000000x faster than it currently does anytime soon (ext4 has nanosecond resolution).

Comment 4 greg.2.harris 2013-06-18 14:26:29 UTC

James,

This problem is being triggered for us by puppet. Specifically the 'remove' package action in puppet uses "rpm -e" while the 'install' package action will use yum directly.  That said, I don't see why using RPM directly should be a problem here.

Can you explain how this breaks the common caching case?  As long as yum does not write out the caches until the RPM install operation is complete they should have a newer timestamp than the RPM caches.  I agree that on filesystems with lower mtime resolution you will have the problem where the timestamp is the same.

From my read, the fallback logic here is also not to invalidate the cache but rather check the version number on the RPM cache against the version number of the yum cache to make sure the databases match.  Assuming they match I believe the cache will still be used?

I don't think this is a matter of 'making the caching more conservative'.  The caching logic as-is will sometimes take invalid caches, which seems like a bug...  Making sure the yum caches are newer (> instead of >=) is the only way to guarantee the cache is safe without inspecting the RPM database versions.

I agree with you that it's very unlikely to happen under ext4 but there are still a lot of machines (including the ones we're seeing this on) that run ext3.

-- Greg

Comment 5 RHEL Program Management 2013-10-13 23:29:13 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 6 James Antill 2014-04-16 18:53:31 UTC

(In reply to greg.2.harris from comment #4)
> James,
> 
> This problem is being triggered for us by puppet. Specifically the 'remove'
> package action in puppet uses "rpm -e" while the 'install' package action
> will use yum directly.  That said, I don't see why using RPM directly should
> be a problem here.

 That's just as broken as the other way around, although maybe less observable.
 history and yumdb are the two obvious things that aren't updated when you go to rpm directly.

> Can you explain how this breaks the common caching case?

 Because the common case is that rpmdb.simpleVersion => _put_cached_simpleVersion_main is called within a second of rpmdb finishing the transaction. So changing the <= to < means we might as well just not bother writing the cache at all.
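
The trade-off between `<=` and `<` can be modelled with plain integer arithmetic. The sketch below is only a model under stated assumptions: the timelines (0.2 s, 0.5 s and 0.7 s into the same second) are invented, `stored_mtime` and `cache_accepted` are hypothetical helpers, and "1s" and "1ns" stand for the mtime granularities this report attributes to ext2/ext3 and ext4 respectively.

```python
NS_PER_SEC = 1000000000


def stored_mtime(t_ns, resolution_ns):
    """Round a timestamp down to the granularity the filesystem records."""
    return t_ns - (t_ns % resolution_ns)


def cache_accepted(rpmdb_ns, cache_ns, resolution_ns, strict):
    """Apply 'omtime < nmtime' (strict) or 'omtime <= nmtime' to the stored mtimes."""
    omtime = stored_mtime(rpmdb_ns, resolution_ns)
    nmtime = stored_mtime(cache_ns, resolution_ns)
    return omtime < nmtime if strict else omtime <= nmtime


# Hypothetical timelines inside one wall-clock second (values in nanoseconds).
common = {"rpmdb_ns": 100 * NS_PER_SEC + 200000000,  # yum transaction ends
          "cache_ns": 100 * NS_PER_SEC + 500000000}  # yum writes the /version cache
race = {"rpmdb_ns": 100 * NS_PER_SEC + 700000000,    # rpm -e modifies the rpmdb later
        "cache_ns": 100 * NS_PER_SEC + 500000000}    # so the cache is now stale

results = {}
for name, times in (("common", common), ("race", race)):
    for fs, resolution in (("1s", NS_PER_SEC), ("1ns", 1)):
        for strict in (False, True):
            op = "<" if strict else "<="
            results[(name, fs, op)] = cache_accepted(
                resolution_ns=resolution, strict=strict, **times)

for key in sorted(results):
    print(key, results[key])
```

Under these assumptions, `<=` wrongly accepts the stale cache only in the race on the 1-second filesystem, while `<` also rejects the valid cache in the common case on that filesystem; with nanosecond granularity both comparisons give the right answer in both timelines.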

> From my read, the fallback logic here is also not to invalidate the cache
> but rather check the version number on the RPM cache against the version
> number of the yum cache to make sure the databases match.  Assuming they
> match I believe the cache will still be used?

 There are multiple layers to the caching. Changing this just breaks the /version cache, so yes if we have to regenerate that and it's valid then we'll still be able to use the /conflicts etc. caches that rely on it. But it's still significant. Eg. compare:

 yum version nogroups
 yum version

...the first is directly reading just the /version cache, and the second is regenerating it (because it doesn't cache the groups data). The first is roughly the same speed as python/yum init. ... the second is almost 4x that _if_ the rpmdb is in page cache, and like 12x that otherwise.

> I agree with you that it's very unlikely to happen under ext4 but there are
> still a lot of machines (including the ones we're seeing this on) that run
> ext3.

 You also need to have puppet (or something) running rpm directly, during the transaction.
 If you want you can have puppet run "yum clean rpmdb", after it alters the rpmdb directly ... which will delete all the cached rpmdb data, which is likely what you want when all that data is going to be bad anyway.
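
A minimal sketch of that workaround, written as a standalone Python wrapper rather than as Puppet configuration: the `erase_and_invalidate` function and its injectable `run` parameter are hypothetical names for this example, and only the two commands themselves ("rpm -e" and "yum clean rpmdb") come from this thread.

```python
import subprocess


def erase_and_invalidate(package, run=subprocess.check_call):
    """Erase a package with rpm directly, then drop yum's cached rpmdb data.

    'run' is injectable so the sequence can be inspected without touching the system.
    """
    commands = [
        ["rpm", "-e", package],     # direct rpmdb change that yum does not see
        ["yum", "clean", "rpmdb"],  # delete the cached rpmdb data afterwards
    ]
    for cmd in commands:
        run(cmd)
    return commands


# Dry run: print the commands instead of executing them.
erase_and_invalidate("nc", run=print)
```

Calling `erase_and_invalidate("nc")` with the default `run` would execute both commands and raise `subprocess.CalledProcessError` if either one exits non-zero.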
