Red Hat Bugzilla – Full Text Bug Listing
|Summary:||corrupted RPMs when installing full-virt guest|
|Product:||[Fedora] Fedora||Reporter:||Angus Thomas <athomas>|
|Component:||xen||Assignee:||Juan Quintela <quintela>|
|Status:||CLOSED WONTFIX||QA Contact:||Martin Jenner <mjenner>|
|Version:||6||CC:||herbert.xu, jmh, katzj, xen-maint|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2008-02-26 18:47:43 EST||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:|
|Bug Blocks:||219275, 224214|
Description Angus Thomas 2006-12-08 10:08:09 EST
Description of problem:
Installing a full-virt Xen guest fails, when exactly the same install succeeds on non-virtual hardware.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-1.2849.fc6
xen-3.0.3-0.1.rc3
virt-manager-0.2.6-2.fc6

How reproducible:
Always

Steps to Reproduce:
1. Create a full-virt guest, via virt-manager
2. Boot RHEL3 U6 32 bit install media
3. Kickstart install of guest, with no manual intervention.

Actual results:
Install halts midway through installing RPMs. Reports that there was an error installing a particular package. /mnt/sysimage/root/install.log contains:

<snip>
Installing comps-3AS-0.20050921.i386.
error: unpacking of archive failed on file /usr/share/comps/i386/hdlist2;4579147a: cpio: MD5 sum mismatch

N.B. This occurrence, with the comps RPM, is the most recent. Earlier, the same issue occurred with the glibc RPM.

/mnt/sysimage/var/tmp/ contains a file called "comps.rpm", whose md5sum and size are not the same as those of the RPM on the install media.

Using the same boot media, and the same kickstart file, repeatedly installs without errors on non-virtual hardware.

Expected results:
Successful installation (as on "real" hardware)

Additional info:
Anaconda is being run with the following arguments:

skipddc nofb nousb ksdevice=eth0 ip=172.16.100.50 netmask=255.255.255.128 gateway=172.16.100.1 dns=184.108.40.206 ks=http://172.16.100.41/172.16.100.50-ks.cfg

Anaconda is being directed to retrieve the RPMs via http.

There's plenty of unused space on each volume under /mnt/sysimage
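The md5sum and size comparison described above can be scripted so that a fetched package is checked against the copy on the install media in one step. This is a minimal sketch, not something used in this report: the function names `md5_and_size` and `same_file_contents` are hypothetical, and the two paths passed in are whatever local copies are being compared.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=1 << 20):
    """Return (hex MD5 digest, size in bytes) for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so a large RPM is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def same_file_contents(path_a, path_b):
    """True when both files have the same MD5 digest and the same size."""
    return md5_and_size(path_a) == md5_and_size(path_b)
```

A mismatch here only says that the two files differ; it does not say whether the difference is truncation or in-place corruption.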
Comment 1 Stephen Tweedie 2006-12-08 10:38:41 EST
I've got only a few ideas. We've had problems with interactions between Xen networking and IP checksum code in the past --- could you perhaps run with an httpd served from the local host to eliminate that possibility? What NIC are you using, too?
Comment 2 Stephen Tweedie 2006-12-08 10:40:09 EST
Adding Herbert to CC, as this appears possibly to involve checksum problems over the network (to be verified.)
Comment 3 Angus Thomas 2006-12-08 12:27:22 EST
Retried, having moved the http install tree to dom0. The guest install still aborts, with anaconda reporting md5sum errors in unpacking comps-3AS-0.20050921.i386. I've verified that the comps-3AS-0.20050921.i386 RPM under the DocumentRoot on dom0 is valid.

Please disregard the reference above to comps.rpm under /mnt/sysimage/var/tmp. That is an uncorrupted copy of the comps.xml from under base/ on the install media and is unrelated to the comps-3AS-0.20050921.i386 RPM that is failing to install.

The machine has two NICs. lspci describes them both as:

06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
Comment 4 Herbert Xu 2006-12-11 00:35:25 EST
Hmm, I'm not sure how a network checksum error can cause file size to change. The most likely result of a checksum error is the packet being dropped and the connection hung. What exactly is the size difference? Is it just truncation? If so is the non-truncated part identical to the original?
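The three questions above (size difference, truncation, and whether the surviving part matches) can be answered mechanically when both copies are available. A minimal sketch, assuming both files fit in memory as bytes; `Difference` and `compare_payloads` are hypothetical names introduced only for illustration:

```python
from typing import NamedTuple, Optional


class Difference(NamedTuple):
    kind: str               # "identical", "truncated", "extended" or "corrupted"
    offset: Optional[int]   # first offset where the copies diverge, if any


def compare_payloads(original: bytes, received: bytes) -> Difference:
    """Classify how `received` differs from `original`."""
    if original == received:
        return Difference("identical", None)
    common = min(len(original), len(received))
    # Look for the first differing byte inside the overlapping region.
    for i in range(common):
        if original[i] != received[i]:
            return Difference("corrupted", i)
    # The overlap matches, so only the length differs.
    if len(received) < len(original):
        return Difference("truncated", common)
    return Difference("extended", common)
```

A "truncated" result means the received copy is a clean prefix of the original, while "corrupted" means data changed in place at the reported offset.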
Comment 5 Angus Thomas 2006-12-11 05:39:27 EST
In fact, there isn't a size difference. I was mistakenly assuming that the comps.rpm on local disk was a copy of the comps package that is failing to install. The two different sizes are actually different packages.

Here's the real problem: The relevant error message is that /mnt/sysimage/root/install.log contains:

<snip>
Installing comps-3AS-0.20050921.i386.
error: unpacking of archive failed on file /usr/share/comps/i386/hdlist2;4579147a: cpio: MD5 sum mismatch

I've used "rpm -K" to verify that the version of the comps-3AS-0.20050921.i386 rpm that is being served by httpd running in dom0 isn't corrupt.
Comment 6 Jan Mark Holzer 2006-12-11 10:56:51 EST
Tried the same on a RHEL5 Beta 2 (2747) system (woodie) and ran into the same problems with checksum errors if HTTP is used to install RHEL-3 (tried U6 and U7). Using NFS or FTP works fine. Installing RHEL-4 via HTTP also works fine, so it seems the problem is "isolated" to RHEL-3 install via HTTP. If someone wants to log in to the RHEL5 Beta 2 systems, please contact me and I'll provide the details offline.
Comment 7 Stephen Tweedie 2006-12-11 11:26:05 EST
OK, many thanks. Is woodie running the same (Intel 1G) NIC?
Comment 8 Stephen Tweedie 2006-12-11 11:29:47 EST
Created attachment 143296
[XEN] Extend emulator to fully decode ModRM and SIB bytes.

Please also try with this patch. It's a hypervisor patch, so much faster to test than anything against the kernel. This is from upstream changeset 12528, backported to current CVS HEAD RHEL5 kernel-xen.
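For readers unfamiliar with the bytes named in the patch title: in 32-bit x86 addressing, the ModRM byte splits into mod (bits 7-6), reg (bits 5-3) and rm (bits 2-0), and a SIB byte follows when mod is not 3 and rm is 4. The sketch below only illustrates that field layout and how it determines operand length; it is not the code from the attached patch or from the Xen emulator, and the names `split_modrm`, `split_sib` and `operand_layout` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class ModRM(NamedTuple):
    mod: int
    reg: int
    rm: int


class SIB(NamedTuple):
    scale: int  # multiplier applied to the index register: 1, 2, 4 or 8
    index: int
    base: int


def split_modrm(byte: int) -> ModRM:
    return ModRM((byte >> 6) & 3, (byte >> 3) & 7, byte & 7)


def split_sib(byte: int) -> SIB:
    return SIB(1 << ((byte >> 6) & 3), (byte >> 3) & 7, byte & 7)


def operand_layout(modrm_byte: int, sib_byte: Optional[int] = None) -> Tuple[bool, int]:
    """Return (has_sib, displacement_bytes) for 32-bit addressing."""
    m = split_modrm(modrm_byte)
    if m.mod == 3:
        return False, 0          # register operand, no memory reference
    has_sib = m.rm == 4
    if has_sib and sib_byte is None:
        raise ValueError("this ModRM byte requires a SIB byte")
    if m.mod == 1:
        return has_sib, 1        # 8-bit displacement
    if m.mod == 2:
        return has_sib, 4        # 32-bit displacement
    # mod == 0: displacement only in the two "no base register" encodings.
    if m.rm == 5 or (has_sib and split_sib(sib_byte).base == 5):
        return has_sib, 4
    return has_sib, 0
```

For example, `mov eax, [esp+8]` encodes as `8B 44 24 08`: ModRM `0x44` has mod 1 and rm 4, so a SIB byte (`0x24`) and a one-byte displacement follow. A decoder that skips any of these cases misjudges the address of a memory operand.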
Comment 9 Jan Mark Holzer 2006-12-12 05:29:02 EST
Yes, our local system is using the same Ethernet 1Gb card:

[root@woodie ~]# lspci|grep Ethe
06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)

I have also been doing some more testing, with no major breakthrough. Tried various newer kernels but still keep getting corrupted RPMs. I installed/booted the following kernels:

                           Installation via
Kernel Revision            RHEL3/HTTP   RHEL3/FTP   RHEL4/HTTP
RHEL5 Beta 2 (rev 2747)    FAIL         SUCCESS     SUCCESS
RHEL5 RC3 (rev 2839)       FAIL         SUCCESS     SUCCESS
RHEL5 RC2 (rev 2817)       FAIL         SUCCESS     SUCCESS

Initial results raised hope that RC3/2839 might indeed help, but after 2 successful installations we still ended up with corrupted RPMs. I have not yet tested with the above patch, as the nightly build had not completed on the 11th. Will try earlier kernels/HVs to see when/if it ever worked.
Comment 10 Angus Thomas 2006-12-12 08:33:32 EST
Using a hypervisor which has the patch from Comment #8, I've been unable to reproduce the error after several attempts.
Comment 11 Jan Mark Holzer 2006-12-12 09:58:46 EST
Update on testing with RHEL5 B2 + HV patch from Comment #8. I have now installed 15 RHEL3 guests using HTTP without any problems. In the past we've never been able to get through more than 3 guest installs without hitting the corruption problem.
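To judge how strongly 15 clean installs point at the patch, one can ask how likely such a run would be if the bug were still present. This is a rough sketch under stated assumptions: installs are treated as independent, and the per-install failure rates below are hypothetical placeholders, since no failure rate was measured in this report.

```python
def chance_of_clean_run(per_install_failure_rate: float, installs: int) -> float:
    """Probability that `installs` independent installs all succeed."""
    if not 0.0 <= per_install_failure_rate <= 1.0:
        raise ValueError("failure rate must be between 0 and 1")
    return (1.0 - per_install_failure_rate) ** installs


if __name__ == "__main__":
    # Hypothetical failure rates; none of these were measured in this bug.
    for rate in (0.1, 0.25, 0.5):
        print(f"failure rate {rate:.2f}: "
              f"P(15 clean installs) = {chance_of_clean_run(rate, 15):.6f}")
```

The higher the assumed failure rate, the less plausible it is that 15 consecutive successes happened by chance on an unfixed hypervisor.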
Comment 12 Stephen Tweedie 2006-12-12 18:07:42 EST
Patch seems to work, reassigning for FC6 merge.
Comment 13 Angus Thomas 2007-01-19 08:18:27 EST
The same behavior has returned. I'm running 32 bit kernel-xen-2.6.18-4.el5 on a 2.0GHz Intel Xeon 5130 Woodcrest Dual-Core, installing RHEL3 via HTTP into a full-virt guest. Same results as previously: the install terminates, complaining about an error installing glibc. /mnt/sysimage/root/install.log reports an MD5 sum mismatch on a specific library from within the RPM. As previously, the RPM is not corrupted.
Comment 14 Angus Thomas 2007-01-19 10:13:41 EST
I should have mentioned that the failure from Comment #13 is reproducible. By contrast, I've just used the same boot media & kickstart, against the same install tree, to successfully create the RHEL3 guest on a woodcrest running the 64 bit kernel-xen-2.6.18-1.2910.el5. Unless the 64/32 bitness makes a comparison irrelevant, the issue may be a regression since 2.6.18-1.2910.el5. I'll retest with an older 32 bit kernel.
Comment 15 Angus Thomas 2007-01-22 11:11:45 EST
Moving back to an older kernel-xen RPM to check when a regression occurred hasn't been successful. I've switched back to the 32 bit kernel-xen-2.6.18-1.3014.el5 RPM, but that fails to install in a totally different way: it hangs during kernel boot. With the latest 32 bit RHEL5 kernel-xen RPM (2.6.18-4.el5), I'm still able to reproduce the memory corruption during install.
Comment 17 Red Hat Bugzilla 2007-07-24 20:03:30 EDT
change QA contact
Comment 18 Chris Lalancette 2008-02-26 18:47:43 EST
This report targets FC6, which is now end-of-life. Please re-test against Fedora 7 or later, and if the issue persists, open a new bug. Thanks