Bug 498394
Summary: | Intel i386 PV Guest Container Corrupted | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Qian Cai <qcai> |
Component: | kernel-xen | Assignee: | Xen Maintainance List <xen-maint> |
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 4.7.z | CC: | clalance, mgahagan, mjenner, pbonzini, xen-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-05-14 09:06:05 UTC | Type: | --- |
Bug Blocks: | 458302 |
Description
Qian Cai
2009-04-30 09:54:18 UTC
I have done some investigation with Martin Jenner and Mike Gahagan on this issue so far. Both affected machines use Intel Xeon CPUs.

hp-bl480c-01.rhts.bos.redhat.com
CPUMODEL Intel(R) Xeon(TM) CPU 3.20GHz
CPUFAMILY 15
CPUMODELNUMBER 6
http://lab.rhts.bos.redhat.com/cgi-bin/rhts/system.cgi?id=1129

dell-pe1955-02.rhts.bos.redhat.com
CPUMODEL Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
CPUMODELNUMBER 15
CPUFAMILY 6

The problem is not always reproducible.

hp-bl480c-01.rhts.bos.redhat.com
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56643 -- working
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56644 -- corrupted

dell-pe1955-02.rhts.bos.redhat.com
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56563 -- working
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=56282 -- corrupted

Even the successful run shows these messages:

end_request: I/O error, dev xvda, sector 7555901
Buffer I/O error on device dm-0, logical block 918334
lost page write due to I/O error on dm-0

https://rhts.redhat.com/testlogs/56563/189640/1586010/guest-rhel4u7_i386_pv.log

Does it sound like we are still getting corruption, but it does not take out the guest every time?

I have run an additional 10 of the same tests on one of those affected machines. Apart from 2 jobs that were aborted, apparently because 4 guests could not talk to the RHTS scheduler, I can't trigger the problem any more. I'll change the severity/priority to low/low because the problem is not reproducible, but feel free to close it if that seems more appropriate.

(In reply to comment #2)
> I have run additional 10 same tests on one of those affected machines, but

Correction -- on both of those affected machines (5 each).
I couldn't reproduce this either. OTOH, I got this:

Badness in local_bh_enable at kernel/softirq.c:141
 [<c01213a8>] local_bh_enable+0x47/0x6f
 [<c0217db9>] skb_checksum+0x133/0x25e
 [<c025160a>] udp_poll+0x66/0x113
 [<c0213ba9>] sock_poll+0x19/0x1d
 [<c016d636>] do_select+0x190/0x2c7
 [<c016d345>] __pollwait+0x0/0x9b
 [<c0144d68>] __kmalloc+0x56/0xd3
 [<c016da6c>] sys_select+0x2e7/0x45c
 [<c010740f>] syscall_call+0x7/0xb

with RH5.2 dom0 and RH4.7.z guest (more or less random, but it happens often when running up2date) -- unrelated though.

The badness in local_bh_enable is fixed in RHEL 4.8 (commit 45f38c).

Can't reproduce it. Will re-open it if I see it again.
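Since the I/O errors also appeared in a run that was reported as working, it may help to scan every guest console log for them, not only the logs of failed jobs. The following is a minimal sketch, assuming the log is available as plain text. The function name find_io_errors and the patterns are illustrative: they are derived only from the three messages quoted in this report and are not part of RHTS or any existing tool.

```python
import re

# Patterns derived from the three kernel messages quoted in this report.
# Assumption: other kernels or block drivers may word these differently.
IO_ERROR_PATTERNS = [
    re.compile(r"end_request: I/O error, dev (?P<dev>[\w-]+), sector \d+"),
    re.compile(r"Buffer I/O error on device (?P<dev>[\w-]+), logical block \d+"),
    re.compile(r"lost page write due to I/O error on (?P<dev>[\w-]+)"),
]


def find_io_errors(log_text):
    """Return (line_number, device, line) for each block I/O error line."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for pattern in IO_ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.group("dev"), line.strip()))
                break  # one hit per line is enough
    return hits


if __name__ == "__main__":
    # Sample text copied from the messages in this report.
    sample = (
        "end_request: I/O error, dev xvda, sector 7555901\n"
        "Buffer I/O error on device dm-0, logical block 918334\n"
        "lost page write due to I/O error on dm-0\n"
    )
    for lineno, dev, line in find_io_errors(sample):
        print(f"line {lineno}: {dev}: {line}")
```

A job whose log yields any hits could then be flagged for review even when the guest itself completed the test.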