Bug 219275 - corrupted RPMs when installing full-virt guest
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: All Linux
Priority: high Severity: high
Assigned To: Stephen Tweedie
Depends On: 218926
Reported: 2006-12-12 07:38 EST by Stephen Tweedie
Modified: 2007-11-30 17:07 EST
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2007-01-26 16:19:36 EST

Description Stephen Tweedie 2006-12-12 07:38:36 EST
Has also been reproduced on RHEL-5 hosts.

+++ This bug was initially created as a clone of Bug #218926 +++

Description of problem:
Installing a full-virt Xen guest fails, when exactly the same install
succeeds on non-virtual hardware.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-1.2849.fc6
xen-3.0.3-0.1.rc3
virt-manager-0.2.6-2.fc6

How reproducible:
Always

Steps to Reproduce:
1. Create a full-virt guest, via virt-manager 
2. Boot RHEL3 U6 32 bit install media
3. Kickstart install of guest, with no manual intervention.
  
Actual results:
Install halts midway through installing RPMs. Reports that there was an error
installing a particular package.

/mnt/sysimage/root/install.log contains:
<snip>
Installing comps-3AS-0.20050921.i386.
error: unpacking of archive failed on file
/usr/share/comps/i386/hdlist2;4579147a: cpio: MD5 sum mismatch

N.B. This occurrence, with the comps RPM, is the most recent. Earlier, the same
issue occurred with the glibc RPM.

/mnt/sysimage/var/tmp/ contains a file called "comps.rpm", whose md5sum, and
size, are not the same as the rpm in the install media.

Using the same boot media, and the same kickstart file, repeatedly installs
without errors on non-virtual hardware.

Expected results:

Successful installation (as on "real" hardware)

Additional info:

Anaconda is being run with the following arguments: 
skipddc nofb nousb ksdevice=eth0 ip=172.16.100.50 netmask=255.255.255.128
gateway=172.16.100.1 dns=139.149.131.156
ks=http://172.16.100.41/172.16.100.50-ks.cfg

Anaconda is being directed to retrieve the RPMs via http.

There's plenty of unused space on each volume under /mnt/sysimage

-- Additional comment from sct@redhat.com on 2006-12-08 10:38 EST --
I've got only a few ideas.  We've had problems with interactions between Xen
networking and IP checksum code in the past --- could you perhaps run with an
httpd served from the local host to eliminate that possibility?  What NIC are
you using, too?


-- Additional comment from sct@redhat.com on 2006-12-08 10:40 EST --
Adding Herbert to CC, as this appears possibly to involve checksum problems over
the network (to be verified.)

-- Additional comment from athomas@redhat.com on 2006-12-08 12:27 EST --
Retried, having moved the http install tree to dom0. The guest install still
aborts, with anaconda reporting md5sum errors in unpacking comps-3AS-0.20050921.i386

I've verified that the comps-3AS-0.20050921.i386 RPM under the DocumentRoot on
dom0 is valid.

Please disregard the reference above to comps.rpm under /mnt/sysimage/var/tmp.
That is an uncorrupted copy of the comps.xml from under base/ on the install
media and is unrelated to the comps-3AS-0.20050921.i386 RPM that is failing to
install.

The machine has two NICs. lspci describes them both as:
06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
Ethernet Controller (Copper) (rev 01)

-- Additional comment from herbert.xu@redhat.com on 2006-12-11 00:35 EST --
Hmm, I'm not sure how a network checksum error can cause file size to change. 
The most likely result of a checksum error is the packet being dropped and the
connection hung.

What exactly is the size difference? Is it just truncation? If so is the
non-truncated part identical to the original?

-- Additional comment from athomas@redhat.com on 2006-12-11 05:39 EST --
In fact, there isn't a size difference. I was mistakenly assuming that the
comps.rpm on local disk was a copy of the comps package that is failing to
install. The two different sizes are actually different packages.


Here's the real problem:

The relevant error message is that /mnt/sysimage/root/install.log contains:
<snip>
Installing comps-3AS-0.20050921.i386.
error: unpacking of archive failed on file
/usr/share/comps/i386/hdlist2;4579147a: cpio: MD5 sum mismatch

I've used "rpm -K" to verify that the version of the comps-3AS-0.20050921.i386
rpm that is being served by httpd running in dom0 isn't corrupt.
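
As a complement to "rpm -K", the served copy of a package can be compared
against the copy on the install media by size and MD5 digest. The following is
only a minimal sketch in Python; the function names and any paths passed to
them are hypothetical and are not taken from this report.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for the file at `path`.

    The file is read in chunks so that large RPMs do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have the same size and the same MD5 digest."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)
```

For example, same_content() could be called with the path of the RPM under the
DocumentRoot on dom0 and the path of the same RPM on the mounted install media.
A match only shows that the two copies agree; it says nothing about what the
guest actually received.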



-- Additional comment from jmh@redhat.com on 2006-12-11 10:56 EST --

Tried the same on a RHEL5 Beta 2 (2747) system (woodie) and ran into the same
problems with checksum errors if HTTP is used to install RHEL-3 (tried U6 and U7).
Using NFS or FTP works fine. Installing RHEL-4 via HTTP also works fine, so it
seems the problem is "isolated" to RHEL-3 installs via HTTP.
If someone wants to login to the RHEL5 Beta 2 systems please contact me and I'll
provide the details offline.



-- Additional comment from sct@redhat.com on 2006-12-11 11:26 EST --
OK, many thanks.  Is woodie running the same (Intel 1G) NIC?

-- Additional comment from sct@redhat.com on 2006-12-11 11:29 EST --
Created an attachment (id=143296)
[XEN] Extend emulator to fully decode ModRM and SIB bytes.

Please also try with this patch.  It's a hypervisor patch, so much faster to
test than anything against the kernel.  This is from upstream changeset 12528,
backported to current CVS HEAD RHEL5 kernel-xen.
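
For readers unfamiliar with the terms in the patch title, here is a generic
sketch of what decoding the ModRM and SIB bytes of an x86 instruction involves
in 32-bit addressing mode. It is not the code of the attached patch and makes
no claim about how the Xen emulator is structured; the function name and the
returned dictionary layout are assumptions made for illustration.

```python
def decode_modrm(code, offset=0):
    """Decode a ModRM byte, plus optional SIB byte and displacement.

    `code` is a bytes object and `offset` is the index of the ModRM byte.
    Only 32-bit addressing is handled. Returns a dict with the decoded
    fields and the number of bytes consumed ("length").
    """
    modrm = code[offset]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    fields = {"mod": mod, "reg": reg, "rm": rm, "sib": None, "disp": 0}
    pos = offset + 1
    disp_size = 0

    if mod != 3:  # mod == 3 means a register operand: no SIB, no displacement
        if rm == 4:  # rm == 4 means a SIB byte follows
            sib = code[pos]
            pos += 1
            fields["sib"] = {
                "scale": 1 << (sib >> 6),
                "index": (sib >> 3) & 7,
                "base": sib & 7,
            }
            if mod == 0 and (sib & 7) == 5:
                disp_size = 4  # no base register, disp32 instead
        elif mod == 0 and rm == 5:
            disp_size = 4  # absolute disp32, no base register
        if mod == 1:
            disp_size = 1  # signed 8-bit displacement
        elif mod == 2:
            disp_size = 4  # signed 32-bit displacement

    if disp_size:
        raw = code[pos:pos + disp_size]
        fields["disp"] = int.from_bytes(raw, "little", signed=True)
        pos += disp_size

    fields["length"] = pos - offset
    return fields
```

As a usage example, the bytes 0x44 0x8B 0x10 (the operand bytes of
"mov eax, [ebx+ecx*4+0x10]") decode to mod=1, rm=4, a SIB byte with scale 4,
index 1 and base 3, and a displacement of 16. Getting any of these fields wrong
yields a wrong operand address or a wrong instruction length.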

-- Additional comment from jmh@redhat.com on 2006-12-12 05:29 EST --

Yes our local system is using the same Ethernet 1Gb card 
[root@woodie ~]# lspci|grep Ethe
06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet
Controller (Copper) (rev 01)
06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet
Controller (Copper) (rev 01)

I have also been doing some more testing, with no major breakthrough.
I tried various newer kernels but still keep getting corrupted RPMs.

I installed/booted the following kernels:

                                    Installation via 
Kernel Revision                 RHEL3/HTTP   RHEL3/FTP    RHEL4/HTTP
RHEL5 Beta 2 (rev 2747)           FAIL        SUCCESS      SUCCESS
RHEL5 RC3    (rev 2839)           FAIL        SUCCESS      SUCCESS
RHEL5 RC2    (rev 2817)           FAIL        SUCCESS      SUCCESS

Initial results raised hopes that RC3/2839 might help, but
after 2 successful installations we still ended up with corrupted RPMs.

I have not yet tested with the above patch, as the nightly build had not
completed on the 11th.

Will try earlier kernels/HVs to see when/if it ever worked.
Comment 2 Jan Mark Holzer 2006-12-12 09:59:19 EST
Update on testing with RHEL5 B2 + HV patch from sct.
I have now installed 15 RHEL3 guests using HTTP without any problems. In the
past we've never been able to get through more than 3 guest installs without
hitting the corruption problem.
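
To give a rough sense of how strong this evidence is, the sketch below computes
the chance of a run of clean installs if the bug were still present. The
per-install failure rate of 1/3 is purely an assumption chosen for
illustration (it is loosely suggested by "never more than 3 installs" but was
not measured), and installs are assumed to be independent.

```python
def chance_of_clean_run(per_install_failure_rate, installs):
    """Probability that `installs` independent installs all succeed."""
    return (1.0 - per_install_failure_rate) ** installs


# Hypothetical failure rate of 1/3 per install, 15 installs in a row.
probability = chance_of_clean_run(1.0 / 3.0, 15)
print(f"{probability:.4f}")
```

Under those assumptions the chance of 15 clean installs is well below 1%, which
is consistent with the patch having addressed the corruption rather than the
run being luck.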
Comment 5 Jay Turner 2006-12-13 15:20:06 EST
QE ack for RHEL5.
Comment 6 Don Zickus 2006-12-17 22:00:40 EST
184896
Comment 7 Don Zickus 2006-12-18 12:48:05 EST
Please ignore the previous useless comment.
The patch is in 2.6.18-1.2910.el5.
Comment 8 Jay Turner 2007-01-26 16:19:36 EST
2.6.18-7.el5 included in 20070125.0.
