Bug 498394 - Intel i386 PV Guest Container Corrupted
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.7.z
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Xen Maintainance List
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 458302
Reported: 2009-04-30 05:54 EDT by CAI Qian
Modified: 2010-05-14 05:07 EDT
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2010-05-14 05:06:05 EDT


Attachments: None
Description CAI Qian 2009-04-30 05:54:18 EDT
Description of problem:
I have seen several of these errors when trying to boot a new RHEL 4.7.z kernel after upgrading from the RHEL 4.7 GA kernel on an Intel i386 PV guest. Some messages from the logs:

...
end_request: I/O error, dev xvda, sector 20951
Buffer I/O error on device xvda1, logical block 10444
lost page write due to I/O error on xvda1
Buffer I/O error on device xvda1, logical block 10445
lost page write due to I/O error on xvda1
Buffer I/O error on device xvda1, logical block 10446
lost page write due to I/O error on xvda1
Buffer I/O error on device xvda1, logical block 10447
lost page write due to I/O error on xvda1
Buffer I/O error on device xvda1, logical block 10448
lost page write due to I/O error on xvda1
...

Then the guest fails to start.
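
As a rough cross-check, the sector in the end_request line lines up with the first failing block on xvda1 if one assumes 1 KiB filesystem blocks and a DOS-style partition table with xvda1 starting at sector 63. Neither value appears in the logs, so this is only an illustrative sketch under those assumptions: block 10444 maps to sector 63 + 2 * 10444 = 20951, and the following blocks 10445 to 10448 land on every second sector after it.

```python
# Assumptions (not stated in this report): 1 KiB filesystem blocks on xvda1
# and a DOS-style partition table with xvda1 starting at sector 63.
SECTOR_BYTES = 512   # the block layer reports positions in 512-byte sectors
BLOCK_BYTES = 1024   # assumed filesystem block size on xvda1
PART_START = 63      # assumed first sector of xvda1 on xvda


def block_to_disk_sector(block: int) -> int:
    """Map a logical block on xvda1 to the absolute sector on xvda."""
    return PART_START + block * (BLOCK_BYTES // SECTOR_BYTES)


# The first failing block from the log lines up with the end_request sector.
print(block_to_disk_sector(10444))  # 20951
```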

# uname -a
Linux hp-bl480c-01.rhts.bos.redhat.com 2.6.18-92.1.24.el5xen #1 SMP Thu Jan 8 11:35:39 EST 2009 i686 i686 i386 GNU/Linux

# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
  8 rhel4u7_i386_hvm     blocked
  - rhel4u7_i386_pv      shut off

# virsh start rhel4u7_i386_pv
libvir: Xen Daemon error : POST operation failed: (xend.err "Error creating domain: (1, 'Internal error', 'xc_dom_do_gunzip: inflate failed (rc=-3)\\n')")
error: Failed to start domain rhel4u7_i386_pv
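
The rc=-3 in the xc_dom_do_gunzip message is zlib's Z_DATA_ERROR, which inflate() returns when the compressed input is invalid, for example a bad header or a checksum mismatch. That is consistent with the I/O errors above: presumably the compressed kernel (or initrd) image that the domain builder loaded from the damaged container no longer decompresses cleanly. The sketch below only illustrates that return code with Python's zlib module, not libxc itself. It flips one byte of a gzip stream's CRC32 trailer and shows zlib rejecting the stream with -3. The helper inflate_rc and the stand-in image are hypothetical.

```python
import gzip
import re
import zlib

Z_DATA_ERROR = -3  # value of Z_DATA_ERROR in zlib.h


def inflate_rc(blob: bytes) -> int:
    """Hypothetical helper: return 0 if a gzip blob inflates, else zlib's error code."""
    try:
        # 16 + MAX_WBITS: expect a gzip header and trailer, so zlib also
        # verifies the CRC32 stored at the end of the stream.
        zlib.decompress(blob, 16 + zlib.MAX_WBITS)
        return 0
    except zlib.error as exc:
        # CPython formats these as "Error <rc> while decompressing data: <msg>".
        match = re.match(r"Error (-?\d+)", str(exc))
        return int(match.group(1)) if match else -1


image = gzip.compress(b"\x90" * 4096)  # stand-in for a compressed kernel image
damaged = bytearray(image)
damaged[-8] ^= 0xFF                    # corrupt the first byte of the CRC32 trailer

print(inflate_rc(image))                           # 0
print(inflate_rc(bytes(damaged)) == Z_DATA_ERROR)  # True
```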

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.0.22.EL
kernel-xen-2.6.18-92.1.24.el5

How reproducible:
I have seen it on at least two RHTS machines:

hp-bl480c-01.rhts.bos.redhat.com
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56644

dell-pe1955-02.rhts.bos.redhat.com
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56282

Steps to Reproduce:
1. Install a file-container-based RHEL 4.7 PV guest.
2. Upgrade the guest's kernel to the RHEL 4.7.z kernel.
3. Reboot the guest.
  
Actual results:
The guest failed to start.

Expected results:
The guest starts successfully with the new kernel.

Additional info:
The PV guest has the following attributes.

pv install of guest=rhel4u7_i386_pv vcpus=1 memory=1024 container=file installer=nfs
Comment 1 CAI Qian 2009-04-30 11:21:06 EDT
So far, I have investigated this issue with Martin Jenner and Mike Gahagan.

Both machines use Intel Xeon CPUs.

hp-bl480c-01.rhts.bos.redhat.com  
CPUMODEL       Intel(R) Xeon(TM) CPU 3.20GHz
CPUFAMILY      15
CPUMODELNUMBER 6
http://lab.rhts.bos.redhat.com/cgi-bin/rhts/system.cgi?id=1129

dell-pe1955-02.rhts.bos.redhat.com
CPUMODEL       Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
CPUFAMILY      6
CPUMODELNUMBER 15

The problem is not always reproducible.

hp-bl480c-01.rhts.bos.redhat.com
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56643 -- working
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56644 -- corrupted

dell-pe1955-02.rhts.bos.redhat.com
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56563 -- working
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56282 -- corrupted

Even the successful run logged these messages:
 end_request: I/O error, dev xvda, sector 7555901
 Buffer I/O error on device dm-0, logical block 918334
 lost page write due to I/O error on dm-0
https://rhts.redhat.com/testlogs/56563/189640/1586010/guest-rhel4u7_i386_pv.log

Does this suggest that we are still getting corruption, but that it does not always take out the guest?
Comment 2 CAI Qian 2009-05-10 12:28:22 EDT
I have run the same test 10 more times on one of those affected machines. Apart from 2 jobs that were aborted, apparently because 4 guests could not talk to the RHTS scheduler, I can't trigger the problem any more. I'll lower the severity/priority to low/low since it is not reproducible, but feel free to close it if that seems more appropriate.
Comment 3 CAI Qian 2009-05-10 12:30:31 EDT
(In reply to comment #2)
> I have run additional 10 same tests on one of those affected machines, but

Correction: on both of those affected machines (5 runs each).
Comment 4 Paolo Bonzini 2009-06-16 08:41:12 EDT
I couldn't reproduce this either; on the other hand, I got this:

Badness in local_bh_enable at kernel/softirq.c:141
 [<c01213a8>] local_bh_enable+0x47/0x6f
 [<c0217db9>] skb_checksum+0x133/0x25e
 [<c025160a>] udp_poll+0x66/0x113
 [<c0213ba9>] sock_poll+0x19/0x1d
 [<c016d636>] do_select+0x190/0x2c7
 [<c016d345>] __pollwait+0x0/0x9b
 [<c0144d68>] __kmalloc+0x56/0xd3
 [<c016da6c>] sys_select+0x2e7/0x45c
 [<c010740f>] syscall_call+0x7/0xb

with a RHEL 5.2 dom0 and a RHEL 4.7.z guest (more or less random, but it happens often when running up2date). It is unrelated, though.
Comment 5 Paolo Bonzini 2009-06-17 06:28:35 EDT
The badness in local_bh_enable is fixed in RHEL 4.8 (commit 45f38c).
Comment 6 CAI Qian 2010-05-14 05:06:05 EDT
I can't reproduce it. I will reopen it if I see it again.
