Bug 515757 - Salvage bug fixes from reverted xen-qemu-aio-scsi.patch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Rik van Riel
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 552573
Reported: 2009-08-05 15:57 UTC by Markus Armbruster
Modified: 2010-04-08 16:22 UTC

Fixed In Version: xen-3.0.3-100.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 08:59:31 UTC
Target Upstream Version:
Embargoed:


Attachments
Possible fixes salvaged from xen-qemu-aio-scsi.patch (4.09 KB, patch)
2009-08-05 15:57 UTC, Markus Armbruster
Formatting the scsi disk for vista64 (95.87 KB, image/png)
2010-01-05 08:33 UTC, Yewei Shao


Links
System: Red Hat Product Errata
ID: RHBA-2010:0294
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: xen bug fix and enhancement update
Last Updated: 2010-03-29 14:20:32 UTC

Description Markus Armbruster 2009-08-05 15:57:07 UTC
Created attachment 356344
Possible fixes salvaged from xen-qemu-aio-scsi.patch

We backported AIO to IDE, SCSI and USB emulation for bug 465116.  SCSI and USB turned out to be flawed, and got reverted.  The flawed AIO code is mixed up with cleanups and bug fixes (upstream's fault, not the backporter's).  The bug fixes should be salvaged.

I attach an untested sub-patch of the reverted patch containing those parts that look like bug fixes to me.  We should check the reverted patch carefully for more.

Comment 2 Markus Armbruster 2009-08-07 07:48:48 UTC
I analyzed the bugs in comment#1's attachment.  I believe they are perfectly capable of corrupting data.  Whether they actually do depends on how the guest OS driver uses the virtual chip.  It's conceivable that no supported guest OS uses it in a way that leads to disaster.  Practically impossible to know for sure.

We could instrument ioemu to detect when a guest OS triggers these bugs.  Triggering doesn't necessarily imply data corruption.  I doubt this is worth the effort.

As far as I know, no customer filesystems on SCSI disks have imploded so far.  This means we should walk, but not run towards a fix.

Let's patch the bugs, have QA test SCSI thoroughly, then z-stream the fix, as soon as practical, but without undue haste.

Comment 8 Yewei Shao 2009-12-31 04:59:50 UTC
QE would like to verify this bug but does not know how. Could anyone give us some steps or advice for verifying it? Thanks.

Comment 9 Jiri Denemark 2010-01-04 09:26:11 UTC
I guess someone who actually knows something about the bugs is better for answering this question. Perhaps Rik or Markus?

Comment 10 Paolo Bonzini 2010-01-04 10:12:26 UTC
The bug fixes are only visible depending on your firmware and on the OS drivers.  Some of the changes probably are not touched in normal usage, since they would break quite horribly.  For some others, Windows seems to be more prone to showing the bugs.

You could try reproducing the scenario of bug 465116: format an emulated SCSI drive on a Windows 2008 or Vista virtual machine.  (Note that even if it doesn't work, this would not be a regression).

Comment 11 Yewei Shao 2010-01-05 08:07:50 UTC
I tried to verify this bug using the scenario from bug 465116: I formatted a disk on a Vista 64-bit guest, which caused the guest to become unresponsive indefinitely. Please see the attached screenshot and my steps below:

Version: xen-3.0.3-102.el5
Host: RHEL5.4-x86_64-xen
Guest: Vista 64-bit

Steps to Reproduce:
1. Install the guest.
2. Shut down the guest, attach a SCSI disk (XML as follows), then start the guest with xm create.
<disk type="file" device="disk">
 <source file="/var/lib/libvirt/images/a.img"/>
 <target dev="sdb" bus="scsi"/>
</disk>
3. In the disk manager, format a 1 GB FAT32 partition.

After step 3, the guest becomes unresponsive indefinitely.

If I instead attach a new IDE disk to the Vista 64-bit guest and format a 1 GB FAT32 partition, the problem does not occur.

I also tried this with a Windows 2008 64-bit guest: with a SCSI disk attached, formatting a 1 GB FAT32 partition makes the formatting program hang indefinitely. With an IDE disk attached, the same format completes successfully.

I also tested Windows 2008 32-bit and Vista 32-bit guests: formatting a SCSI disk produces no error and completes successfully.

Paolo, could you help me determine whether the problem from bug 465116 still exists? Thanks.

Comment 12 Yewei Shao 2010-01-05 08:33:48 UTC
Created attachment 381719
Formatting the scsi disk for vista64

Comment 13 Paolo Bonzini 2010-01-05 14:51:44 UTC
So there is still a problem with SCSI disks (IDE disks are fine, while in bug 465116 neither configuration worked).

However, this BZ was merely to reintroduce some bug fixes; that did not necessarily mean this scenario would work (reading bug 465116 carefully, it's probably the other way round).  I'll open a bug for SCSI.  In the meantime, I suppose you can look at the code in the srpm as a sanity check.  For example, you can look for this code in tools/ioemu/hw/lsi53c895a.c:

            case 3: /* XOR */
                op0 ^= op1;
                break;

which used to say "op0 |= op1" in RHEL5.4.
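
To illustrate why this one-character change matters: OR and XOR give different results whenever the two operands share a set bit. The following is a minimal standalone sketch, not the actual code from lsi53c895a.c; the function names and the 8-bit operand width are assumptions made for illustration.

```c
#include <stdint.h>

/* Behaviour of "case 3" before the fix: labelled XOR, but computes OR. */
static uint8_t alu_case3_before(uint8_t op0, uint8_t op1)
{
    op0 |= op1;
    return op0;
}

/* Behaviour of "case 3" after the fix: a real XOR. */
static uint8_t alu_case3_after(uint8_t op0, uint8_t op1)
{
    op0 ^= op1;
    return op0;
}

/* Returns 1 when the two versions disagree, which happens exactly
 * when the operands have at least one set bit in common. */
static int case3_results_differ(uint8_t op0, uint8_t op1)
{
    return alu_case3_before(op0, op1) != alu_case3_after(op0, op1);
}
```

For example, with op0 = 0x0f and op1 = 0x05 the old code leaves 0x0f in op0, while the fixed code produces 0x0a. With op0 = 0xf0 and op1 = 0x0f the operands share no bits, so both versions agree, which is one reason a guest driver might never expose the bug.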

Comment 14 Yewei Shao 2010-01-06 03:57:08 UTC
I checked the code in xen-3.1.0-src/tools/ioemu/hw/lsi53c895a.c and found the following:

            case 3: /* XOR */
                op0 |= op1;
                break;

Is this the expected result? Judging by the patch attached to this bug, it should be:

             case 3: /* XOR */
+                op0 ^= op1;
                 break;

I also compared the rest of the code with the patch attached to this bug, and most of the changes do not appear in the code. Some examples:
   uint32_t scratch[13]; (according to the patch in this bug, this should be uint32_t scratch[18])
   There is no "int insn_processed = 0" (according to the patch, there should be)

So the code in the srpm does not seem to match the patch attached to this bug.

Comment 15 Yewei Shao 2010-01-06 09:54:07 UTC
Please disregard the result in comment #14.

I verified this bug with the following steps:
(1) rpm -ivh xen-3.0.3-102.el5.src.rpm
(2) rpmbuild -bp xen.spec
(3) Go to /usr/src/redhat/BUILD/xen-3.1.0-src, open tools/ioemu/hw/lsi53c895a.c, and compare it with the patch attached to this bug. The code matches the patch, so this bug is verified in xen-3.0.3-102.el5.
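
The source comparison in step (3) could also be partly automated. As a rough sketch only (the helper name is hypothetical, it is not part of any Xen or Red Hat tooling, and it checks only the XOR case mentioned in comment 13, not the whole patch), a small C function can report whether a source text contains the fixed statement:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical checker for illustration.
 * Returns 1 if the first statement after the XOR case label is
 * "op0 ^= op1;", 0 if it is anything else (such as the old
 * "op0 |= op1;"), and -1 if the label is not found at all. */
static int xor_case_is_fixed(const char *src)
{
    static const char label[] = "case 3: /* XOR */";
    static const char fixed[] = "op0 ^= op1;";
    const char *p = strstr(src, label);

    if (p == NULL)
        return -1;
    p += sizeof(label) - 1;
    /* Skip the whitespace between the label and the statement. */
    p += strspn(p, " \t\r\n");
    return strncmp(p, fixed, sizeof(fixed) - 1) == 0;
}
```

The function works on a string already in memory, so the caller would first read the prepared lsi53c895a.c into a buffer. It assumes the label is spelled exactly as in the snippet quoted in comment 13; a manual diff against the attachment remains the more thorough check.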

Comment 20 errata-xmlrpc 2010-03-30 08:59:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0294.html

Comment 21 Paolo Bonzini 2010-04-08 15:48:12 UTC
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).

