Bug 652852
Summary: | meta data sync to ISS slave failed | |
---|---|---|---
Product: | Red Hat Satellite 5 | Reporter: | Luc de Louw <luc>
Component: | Satellite Synchronization | Assignee: | Michael Mráka <mmraka>
Status: | CLOSED ERRATA | QA Contact: | Jiri Kastner <jkastner>
Severity: | urgent | Priority: | high
Version: | 540 | Hardware/OS: | Unspecified
Fixed In Version: | spacewalk-backend-1.2.13-19 | Doc Type: | Bug Fix
Last Closed: | 2010-12-13 14:31:42 UTC | Bug Blocks: | 646488
CC: | cperry, degts, fdewaley, jfenal, jhutar, jkastner, joshuadfranklin, jwest, marcus.moeller, mmraka, mzazrivec, raud, rvandolson, sandro, stanislav.polasek, stephan.duehr, taw, xdmoon | |
Description
Luc de Louw
2010-11-12 23:30:53 UTC
Please also see Case #378636.

We have exactly the same problem on our 5.4 ISS slave. Greets, Marcus

In the meantime, I have detected more affected RPMs:
python https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10403
e2fsprogs https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10400
glibc on some systems https://rhn.redhat.com/rhn/errata/details/Details.do?eid=10405
I was not able to detect a pattern that determines which kinds of packages are affected. Workarounds:
- Temporarily sync your slave directly with RHN if you can.
- Download the packages from the (slave) satellite and install them with yum localinstall.
At the moment it seems that only RHEL5 base channels are affected; RHEL6 is untested.

Once an ISS sync has run between two sat540 installations, the database itself is affected. It does not help to delete /var/cache/rhn/* and sync directly with RHN, as the repodata is built from information in the database. I wonder if restoring the sat530 db and a subsequent spacewalk-schema-upgrade would help to fix the problem.

We have found the cause of this problem: ISS seems to ignore symlinks and directories. Compare the file list of an affected package (e.g. tcsh) on the ISS master and the slave and you will notice the difference. Greets, Marcus

A random pick of packages in the RHEL6 base channel shows more strange things. On the master, https://sat.example.com/network/software/packages/file_list.pxt shows that bzip2-1.0.5-7.el6_0.x86_64.rpm consists only of directories and symlinks, no files, and abrt-1.1.13-4.el6.x86_64.rpm consists only of directories, no files and no symlinks. As Marcus found out, directories and symlinks are ignored. On the slave satellite this leads to displays like "This package contains the following files. No files."

Fixed in Spacewalk nightly:
commit 73c920ce0329b5aa7bde6fbb270b122263ca369e
652852 - dirs and links have no checksum
Package spacewalk-backend-1.3.6-1.

Dear Michael, what happens to already damaged repodata in the db?
Is there a way to force recreation? Greets, Marcus

Hi Marcus, unfortunately there is no better way than removing the channel from the db (spacewalk-remove-channel --just-db should be enough) and letting it sync again. Regards, Michael

Does that have any influence on registered systems? Greets, Marcus

Hi Marcus, unfortunately it seems to be the case:
slave:~# spacewalk-remove-channel --justdb -c rhel-x86_64-server-6
Currently there are systems subscribed to one or more of the specified channels. If you would like to automatically unsubscribe these systems, simply use the --unsubscribe flag. The following systems were found to be subscribed:
org_id id name
--------------------------------
2 1000000000 rhel6-test
slave:~#
Maybe restoring the sat530 db and a subsequent spacewalk-schema-upgrade would be an option? Of course only if you did not register too many systems since the upgrade. Greets, Luc

Hmm, that is not a working scenario in our case. I cannot force 500+ systems to re-register. Greets, Marcus

Marcus and Luc, hold your horses please. This has just been fixed in Spacewalk upstream. The official Satellite erratum will be released soon, once it has gone through the appropriate QA cycle! Regards, Michael

Wild Apache can (hardly) hold back his horses... Nobody is going to install nightlies on a production (already crippled) satellite. Marcus and I were talking about how to recover from the crippled DB once the fix is available. Even once the fix is officially available on RHN, we will still have crippled slave satellites. Will there be a solution, and what does it look like? Greets, Luc

Spacewalk git:
commit d4bee4ec00fc89e00dd5c74a684298ebf0e2f686
added --skip-channels to spacewalk-remove-channel
This is the way to remove all packages from the channel(s): spacewalk-remove-channel in Spacewalk nightly now has the ability to delete all packages from a channel without deleting the channel itself. This is how to refresh packages without being forced to remove and re-register all servers.
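The "dirs and links have no checksum" fix referenced above can be illustrated with a small, self-contained sketch: it builds a throwaway tree and checksums only regular files, skipping directories and symlinks the way the corrected sync code must. The temporary tree and counters here are purely illustrative and are not part of the actual spacewalk-backend code.

```shell
#!/bin/sh
# Illustration only: regular files carry a checksum, while directories
# and symlinks have no payload to checksum. The pre-fix sync code did
# not make this distinction, which is the root cause of this bug.
tmp=$(mktemp -d)
mkdir -p "$tmp/usr/bin"
echo 'payload' > "$tmp/usr/bin/tcsh"
ln -s tcsh "$tmp/usr/bin/csh"

checksummed=0
skipped=0
for entry in $(find "$tmp"); do
    if [ -f "$entry" ] && [ ! -L "$entry" ]; then
        md5sum "$entry" > /dev/null      # regular file: checksum it
        checksummed=$((checksummed + 1))
    else
        skipped=$((skipped + 1))         # dir or symlink: no checksum
    fi
done
echo "checksummed=$checksummed skipped=$skipped"   # prints: checksummed=1 skipped=4
rm -rf "$tmp"
```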
How to recover from the bug:
* install the new spacewalk-backend package (both on ISS master and slave)
* on the ISS slave, flush the sync package cache:
  rm -rf /var/cache/rhn/satsync/packages/*
* on the ISS slave, remove packages synced over ISS from the database (without deleting the channels themselves):
  for i in <list of channels synced over ISS> ; do
      spacewalk-remove-channel -c $i --just-db --skip-channels
  done
* resync the packages

(In reply to comment #21)
Dear Michael, this message greatly relieves my concerns about the database, thanks a lot. Is there any plan for when those fixes will be released? Sorry, I have our security officers at my neck. Thanks, Luc

Hi - in short, as soon as we can. Beyond testing this specific bug and the path to recovery as outlined, we will also, where appropriate, create automated tests for this and run general regression/automated tests. Regards, Cliff.

One more forgotten commit in Spacewalk git:
commit 4bd2be58dc7da4a43804bd3cf7c8610e5afe284f
don't require server unsubscribe when --skip-channels is used

It is impossible to delete data from base channels, as the script always complains that there are existing child channels associated with the base channel:
Error: cannot remove channel rhel-x86_64-ws-4: subchannel(s) exist:
rhel-4ws-x86_64-epel-os
rhel-4ws-x86_64-vmwaretools-4.0
rhel-x86_64-ws-4-extras
rhel-x86_64-ws-4-fastrack
rhn-tools-rhel-4-ws-x86_64
If the --skip-channels parameter is given, child channel associations should simply be ignored.
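The per-channel removal loop from the recovery steps above can be sketched as a small script with a dry-run guard. CHANNELS and DRY_RUN are illustrative assumptions, not part of the original instructions: substitute the channel labels actually synced over ISS, and only disable the dry run on a real ISS slave.

```shell
#!/bin/sh
# Sketch of the per-channel cleanup loop from the recovery steps.
# CHANNELS and DRY_RUN are illustrative; fill in your own channel labels.
CHANNELS="rhel-x86_64-server-5 rhel-x86_64-server-6"
DRY_RUN=1   # set to 0 only on a real ISS slave

for c in $CHANNELS ; do
    cmd="spacewalk-remove-channel -c $c --just-db --skip-channels"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$cmd"   # preview instead of touching the database
    else
        $cmd
    fi
done
```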
NOTE - Channel Dumps on RHN: because the dumps were created from a 5.4 Satellite with this bug, the channel dumps published on RHN on Nov 14, which included RHEL 6 dumps, also have this bug. As such, if you have a Satellite which imported data from those dumps, you will need to remove the package content (as outlined in the "how to recover" comments earlier in this bug) and then reimport once new dumps are published (or sync from RHN). We are going to apply the fix to the 5.4 Satellite used as the Channel Dump creator and start generating fresh dumps. Cliff.

Created attachment 915169 [details]
Comment
(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Verified. The /var/cache/rhn/satsync/* content must be deleted on both master and slave, otherwise the generated repodata will still be inconsistent.

Slight modification of comment #21:
* install the new spacewalk-backend package (both on ISS master and slave)
* on the ISS master, flush the sync cache:
  rm -rf /var/cache/rhn/satsync/*
* on the ISS slave, flush the sync package cache:
  rm -rf /var/cache/rhn/satsync/packages/*
* on the ISS slave, remove packages synced over ISS from the database (without deleting the channels themselves):
  for i in <list of channels synced over ISS> ; do
      spacewalk-remove-channel -c $i --just-db --skip-channels
  done
* on the ISS slave, resync the packages - this will generate a new cache for export on the master, which is then re-imported to the slave Satellite.
Basically, the old instructions were missing the "delete cache from master" step. Anyone using Channel Dumps to import where the dumps were bad should run the same steps as outlined for slave Satellites. Cliff

This issue does not seem to be fixed: we applied the provided steps after having received hotfix_spacewalk-backend-1.2.13-16.el5sat.tar from Red Hat support, and I notice the satellite reports "no files" for certain packages again. So far I have noticed this for glibc and python in some, but not all, channels.

I just noticed that we are now missing on the ISS slave:
- all kickstart profiles
- all kickstartable trees
I don't know if this is directly connected to this hotfix or the provided steps, but it would be a huge coincidence if not.

Slight modification of comments #21 & #40 - we need to remove the xml* directories from the master cache directory, not satsync. This was a mistake on my part in reading private comments that wasn't noticed until today.
* install the new spacewalk-backend package (both on ISS master and slave)
* on the ISS master, flush the sync cache:
  rm -rf /var/cache/rhn/xml*
  rm -rf /var/cache/rhn/satsync/*
* on the ISS slave, flush the sync package cache:
  rm -rf /var/cache/rhn/satsync/packages/*
* on the ISS slave, remove packages synced over ISS from the database (without deleting the channels themselves):
  for i in <list of channels synced over ISS> ; do
      spacewalk-remove-channel -c $i --just-db --skip-channels
  done
* on the ISS slave, resync the packages - this will generate a new cache for export on the master, which is then re-imported to the slave Satellite.
Anyone using Channel Dumps to import where the dumps were bad should run the same steps as outlined for slave Satellites. Cliff

A quick note to anyone on CC following this: due to comments made in this bug earlier today, even though it is VERIFIED by QA, we are re-running some further validation and confirmation before releasing this erratum and bugfix. Cliff

This bug is very closely related to bug 659348. An extension to updatePackages.py has been made to fix data affected by both bugs. New solution:
* install the new spacewalk-backend package (both on ISS master and slave)
* flush the sync and exporter caches on both ISS master and slave:
  rm -rf /var/cache/rhn/xml*
  rm -rf /var/cache/rhn/satsync/*
* to fix previously synced data, run:
  updatePackages.py --update-package-files

The extension to updatePackages.py to fix already synced data (Spacewalk git):
commit c73d420b42791122ca482c3f89ffc8e69b790a59
update-packages: update package file list functionality
--update-package-files fixes the following problems:
1. Bug #652852 - meta data sync to ISS slave failed. The script is able to insert package files which did not make it into the database as a result of a corrupted channel / ISS export.
2. Bug #659348 - file list of a rhel6 package shows (Directory) instead of a checksum. The script is able to set the correct checksum on package files which lack it.
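The "new solution" steps above can be collected into one script. The run() wrapper and DRY_RUN flag are illustrative additions so the commands can be previewed before touching a real Satellite; the commands themselves come from the steps above, and updatePackages.py is assumed to be on PATH on the Satellite.

```shell
#!/bin/sh
# Dry-run walkthrough of the final fix procedure. DRY_RUN and run()
# are illustrative; set DRY_RUN=0 only on a Satellite that already has
# the fixed spacewalk-backend installed.
DRY_RUN=1
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# flush sync and exporter caches on both ISS master and slave
run rm -rf /var/cache/rhn/xml*
run rm -rf /var/cache/rhn/satsync/*

# fix previously synced data in place
run updatePackages.py --update-package-files
```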
All package information is retrieved from the packages on the filer.

Backported to Satellite git:
commit 7cbacad481307ccac8cbec17176002236c729822
update-packages: update package file list functionality
... (cherry picked from commit c73d420b42791122ca482c3f89ffc8e69b790a59)

From our observations, neither the process described in comment 43 nor the one in comment 46 syncs the kickstartable trees. Also, RHEL 6 channels still show packages without content. Created attachment 467404 [details]
output while trying to sync kickstartable trees from ISS master
A little update concerning RHEL 6 packages with no files: it seems that for packages where an errata/update has been published, the updated package shows content, but the initial package does not.

Marcus Moeller - please use Red Hat support to help diagnose this. We suspect you may have issues resulting from an earlier version of the hotfix. While we appreciate the feedback in this bug, it needs to be troubleshot correctly via our support channels. Regards, Cliff

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2010-0974.html