Bug 1619969
| Summary | beaker-repo-update leaves corrupted packages, inconsistent repodata if a package changes while keeping the same NVR |
|---|---|
| Product | [Retired] Beaker |
| Component | general |
| Version | 25 |
| Status | CLOSED CURRENTRELEASE |
| Reporter | Dan Callaghan <dcallagh> |
| Assignee | Dan Callaghan <dcallagh> |
| QA Contact | Roman Joost <rjoost> |
| CC | dcallagh, ed, jmckenzi, mtyson, rjoost |
| Severity | unspecified |
| Priority | unspecified |
| Target Milestone | 25.6 |
| Target Release | --- |
| Keywords | Patch |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2018-09-03 23:31:36 UTC |
Description (Dan Callaghan, 2018-08-22 03:19:48 UTC)
The workaround is to explicitly recreate the repodata for all harness repos on your Beaker server:

```shell
for d in /var/www/beaker/harness/* ; do ( cd $d && createrepo_c --checksum sha --no-database . ) ; done
```

beaker-repo-update messed up the repos on beaker-devel again yesterday -- even the RHEL5 ones, where there should have been no silent NVR switcheroos (I finished those for RHEL5 last week). So there is definitely something wrong with the repodata being produced by beaker-repo-update, though I'm not sure what. I will grab the bad repodata that's currently on there and attach it here, for comparison with the corresponding good repodata.

Seen in https://beaker-devel.app.eng.bos.redhat.com/recipes/25446:

```
http://beaker-devel.app.eng.bos.redhat.com/harness/RedHatEnterpriseLinuxServer5/rhts-test-env-4.74-1.el5bkr.noarch.rpm: [Errno -1] Package does not match intended download
```

Diffing the bad vs. regenerated repodata I do indeed see that this package changed while keeping its NVR the same:

```diff
  <name>rhts-test-env</name>
  <arch>noarch</arch>
  <version epoch="0" ver="4.74" rel="1.el5bkr"/>
- <checksum type="sha" pkgid="YES">baa9aed8ace0df55b1d3c6d8ac6922a2a73183dc</checksum>
+ <checksum type="sha" pkgid="YES">24c7f435f7dc3a19474ca673050220c33119f923</checksum>
  <summary>Testing API</summary>
  <description>This package contains components of the test system used when running tests, either on a developer's workstation, or within a lab.</description>
  <packager>Koji</packager>
  <url></url>
- <time file="1517974945" build="1517968224"/>
- <size package="45056" installed="119750" archive="124456"/>
+ <time file="1534813682" build="1517968224"/>
+ <size package="45077" installed="119750" archive="124456"/>
  <location href="rhts-test-env-4.74-1.el5bkr.noarch.rpm"/>
```

I wonder if our Jenkins is doing something wrong...

Created attachment 1479105 [details]
bad and regenerated repodata
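The kind of mismatch shown in the diff above can be detected mechanically: read the checksum recorded for each package in the repo's primary.xml and compare it against a checksum computed from the package bytes actually on disk. A minimal sketch of that check (the helper names and XML layout here are illustrative, not Beaker code; note createrepo's `type="sha"` means SHA-1):

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace used by createrepo-generated primary.xml
NS = {'c': 'http://linux.duke.edu/metadata/common'}

def recorded_checksums(primary_xml):
    """Map each package's location href -> (checksum type, checksum value)
    as recorded in the repo's primary.xml metadata."""
    root = ET.fromstring(primary_xml)
    result = {}
    for pkg in root.findall('c:package', NS):
        href = pkg.find('c:location', NS).get('href')
        cs = pkg.find('c:checksum', NS)
        result[href] = (cs.get('type'), cs.text)
    return result

def verify(href, data, recorded):
    """Return True if the on-disk package bytes match the repodata checksum."""
    cs_type, cs_value = recorded[href]
    # createrepo writes "sha" for SHA-1; newer repos use "sha256"
    algo = {'sha': 'sha1', 'sha1': 'sha1', 'sha256': 'sha256'}[cs_type]
    return hashlib.new(algo, data).hexdigest() == cs_value
```

Running `verify` over every `<location href>` in a repo would have flagged the corrupted rhts-test-env package immediately, since its bytes no longer hash to the value in either the old or the new repodata.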
I went back through all the console logs of our beaker-redhat-yum-repos jobs on Jenkins. The last time it touched rhts-test-env-4.74-1.el5bkr was job #2121, which ran for 1 hour 12 minutes starting 20 August 2018 23:58:11 UTC. That was the job for the commit "switch to Brew for RHEL4-6 harness repos". So it doesn't seem like Jenkins has done anything wrong. Indeed, the file on disk on beaker-devel has a modtime of 21 August 01:08 UTC, which lines up with the above.

The only mystery here is why this keeps going backwards. I first hit this problem last week (22 August, according to the bug timestamp) and regenerated the repodata for all repos as per comment 1 as a workaround. Then yesterday evening (27 August) I re-ran beaker-repo-update and somehow it changed the repodata back to use the incorrect checksum.

Here is something suspicious though. The checksum of the file on disk in /var/www/beaker/harness/RedHatEnterpriseLinuxServer5/ matches NEITHER the old build from beakerkoji NOR the new build from Brew:

```
$ shasum rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-b*
24c7f435f7dc3a19474ca673050220c33119f923  rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-beaker-devel
baa9aed8ace0df55b1d3c6d8ac6922a2a73183dc  rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-beakerkoji
ca223594bcf85f1cd2e3f41431d9bc68b33853b3  rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-brew
```

It seems like the package on disk on beaker-devel is somehow corrupted.
rpm -q --info shows that it's still the old build according to its RPM header:

```
$ rpm -q --info -p rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-beakerkoji | grep Build
Build Date  : Wed 07 Feb 2018 11:50:24 AEST
Build Host  : test4.dcallagh.beakerdevs.lab.eng.bne.redhat.com
$ rpm -q --info -p rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-brew | grep Build
Build Date  : Fri 27 Apr 2018 15:16:44 AEST
Build Host  : ppc-030.build.eng.bos.redhat.com
$ rpm -q --info -p rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-beaker-devel | grep Build
Build Date  : Wed 07 Feb 2018 11:50:24 AEST
Build Host  : test4.dcallagh.beakerdevs.lab.eng.bne.redhat.com
```

but its size matches the (slightly larger) new build from Brew:

```
-rw-r--r--. 1 dcallagh dcallagh 45077 Aug 21 11:08 rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-beaker-devel
-rw-r--r--. 1 dcallagh dcallagh 45056 Feb  7  2018 rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-beakerkoji
-rw-r--r--. 1 dcallagh dcallagh 45077 Apr 27 15:16 rhts-test-env-4.74-1.el5bkr.noarch.rpm.from-brew
```

I was worried that we might have a mistake in our build-yum-repos.py script that runs in Jenkins, or in the rsync command that runs at the end; I thought it might have accidentally left some mangled file on disk. But that seems not to be the case. Both /var/lib/jenkins/workspace/beaker-redhat-yum-repos/beakerrepos/harness/RedHatEnterpriseLinux5/rhts-test-env-4.74-1.el5bkr.noarch.rpm on the Jenkins slave and /net/aoe-cluster.lab.bos.redhat.com/exports/beakerrepos/harness/RedHatEnterpriseLinux5/rhts-test-env-4.74-1.el5bkr.noarch.rpm (the download mirror) have the expected size and checksum ca223594bcf85f1cd2e3f41431d9bc68b33853b3. The corrupted package only appears on disk in /var/www/beaker/harness on beaker-devel, meaning this must be a bug in beaker-repo-update.

I just noticed /var/www was full on beaker-devel. So beaker-repo-update was failing to write but not actually erroring out.
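A full filesystem is exactly the failure mode that silent, unchecked writes turn into corruption like this. One defensive pattern (a sketch only, not the actual beaker-repo-update code) is to write to a temporary file in the same directory, fsync so that ENOSPC surfaces immediately, verify the size, and only then rename into place:

```python
import os
import tempfile

def write_verified(path, data):
    """Write data to path without ever exposing a partial file.

    On a full filesystem the write or fsync raises OSError (ENOSPC)
    instead of silently leaving a truncated file behind."""
    dirname = os.path.dirname(path) or '.'
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force any ENOSPC error out now, not later
        if os.path.getsize(tmp) != len(data):
            raise IOError('short write to %s' % tmp)
        # rename is atomic on POSIX: readers see the old file or the new
        # file, never a partially written one
        os.rename(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Because the temporary file lives in the destination directory, the rename cannot cross filesystems, and a failed run leaves the previous good copy untouched.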
I am not sure if that could be the cause of the corrupted downloads, although it certainly won't help (and beaker-repo-update is certainly not smart enough to notice that a previous run produced corrupted packages either).

The yum code called by beaker-repo-update appears to be using cache data in /var/tmp/yum-* in spite of all the attempts in beaker-repo-update to prevent that. So that might explain how it sometimes manages to go "back in time": if there is an error, it falls back to an old cached copy of its metadata.

Patches:

* https://gerrit.beaker-project.org/#/c/beaker/+/6281 tests: use http:// instead of file:// for beaker-repo-update
* https://gerrit.beaker-project.org/#/c/beaker/+/6282 beaker-repo-update: handle missing OS majors as a special case
* https://gerrit.beaker-project.org/#/c/beaker/+/6283 beaker-repo-update: verify package checksums when downloading

I also noticed while testing this that if an incomplete file exists on disk, beaker-repo-update (or rather the Yum/urlgrabber code we are calling into) would just download the file again and *append* it to whatever junk was there on disk. :-( At least with the above patch, the checksum verification will catch any bugs like that in future.

Created attachment 1479152 [details]
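The checksum-verification idea in the last patch can be sketched roughly like this (hypothetical helper names, not the actual Gerrit change): after a download completes, compare the file's SHA-1 against the checksum from the source repodata, and unlink the file on mismatch so that nothing can ever serve, or append a resumed download onto, a bad copy:

```python
import hashlib
import os

def sha1_of(path):
    """Compute the SHA-1 digest of a file, reading in chunks."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(64 * 1024), b''):
            h.update(chunk)
    return h.hexdigest()

def accept_download(path, expected_sha1):
    """Keep a downloaded package only if its SHA-1 matches the checksum
    recorded in the source repodata; otherwise unlink it and raise, so
    the next run re-fetches from scratch rather than resuming onto junk."""
    actual = sha1_of(path)
    if actual != expected_sha1:
        os.unlink(path)
        raise ValueError('checksum mismatch for %s: expected %s, got %s'
                         % (path, expected_sha1, actual))
```

Deleting the bad file on mismatch is the important part: it defends against both the stale-cache "back in time" behaviour and the append-to-partial-file bug, because a rejected download never survives to the next run.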
beaker-repo-update run log
Verified on beaker-devel, which while running beaker-repo-update showed several entries like:

```
2018-08-28 07:37:06,394 bkr.server.tools.repo_update INFO Unlinking bad package /var/www/beaker/harness/RedHatStorageSoftwareAppliance3/beaker-system-scan-debuginfo-2.3-3.el6bkr.x86_64.rpm
```
This has been released with Beaker 25.6. Release Notes: https://beaker-project.org/docs/whats-new/release-25.html#beaker-25-6

*** Bug 1625423 has been marked as a duplicate of this bug. ***