Created attachment 356344 [details] Possible fixes salvaged from xen-qemu-aio-scsi.patch We backported AIO to IDE, SCSI and USB emulation for bug 465116. SCSI and USB turned out to be flawed, and got reverted. The flawed AIO code is mixed up with cleanups and bug fixes (upstream's fault, not the backporter's). The bug fixes should be salvaged. I attach an untested sub-patch of the reverted patch containing those parts that look like bug fixes to me. We should check the reverted patch carefully for more.
I analyzed the bugs in comment#1's attachment. I believe they are perfectly capable of corrupting data. Whether they actually do depends on how the guest OS driver uses the virtual chip. It's conceivable that no supported guest OS uses it in a way that leads to disaster, but that is practically impossible to know for sure. We could instrument ioemu to detect when a guest OS triggers these bugs, although triggering doesn't necessarily imply data corruption. I doubt this is worth the effort. As far as I know, no customer filesystems on SCSI disks have imploded so far. This means we should walk, but not run, towards a fix. Let's patch the bugs, have QA test SCSI thoroughly, then z-stream the fix as soon as practical, but without undue haste.
QE wants to verify this bug but does not know how. Could anyone give us some steps or advice for verifying it? Thanks.
I guess someone who actually knows something about the bugs is better for answering this question. Perhaps Rik or Markus?
The bug fixes are only visible depending on your firmware and on the OS drivers. Some of the changes probably are not touched in normal usage, since they would break quite horribly. For some others, Windows seems to be more prone to showing the bugs. You could try reproducing the scenario of bug 465116: format an emulated SCSI drive on a Windows 2008 or Vista virtual machine. (Note that even if it doesn't work, this would not be a regression).
I tried to verify this bug using the scenario of bug 465116. Formatting a disk on a Vista 64 bit guest causes the guest to become unresponsive forever. Please see the screenshot in the attachment, and my steps here:

Version: xen-3.0.3-102.el5
Host: RHEL5.4-x86_64-xen
Guest: Vista 64 bit

Steps to Reproduce:
1. Install the guest.
2. Shut down the guest, add the attached SCSI disk (XML as follows), then xm create the guest.
    <disk type="file" device="disk">
      <source file="/var/lib/libvirt/images/a.img"/>
      <target dev="sdb" bus="scsi"/>
    </disk>
3. Within the disk manager, format a FAT32 partition of 1GB size.

After step 3, the guest becomes unresponsive forever. But if I attach a new IDE disk to Vista 64 and format a FAT32 partition of 1GB size, no such problem occurs.

I also tried this with Windows 2008 64 bit: with a SCSI disk attached, formatting a FAT32 partition of 1GB size makes the formatting program unresponsive forever. But if I attach an IDE disk and format a FAT32 partition of 1GB size, the format succeeds.

I also tested Windows 2008 32 bit and Vista 32 bit: formatting a SCSI disk gives no error and succeeds.

Paolo, could you please help me check whether the problem of bug 465116 still exists? Thanks.
Created attachment 381719 [details] Formatting the scsi disk for vista64
So there is still a problem with SCSI disks (IDE disks are fine, while in bug 465116 neither configuration worked). However, this BZ was merely to reintroduce some bug fixes, and that didn't necessarily mean that this scenario would work (reading bug 465116 carefully, it's probably the other way round). I'll open a bug for SCSI. In the meantime, I suppose you can look at the code in the srpm as a sanity check. For example, you can look for this code in tools/ioemu/hw/lsi53c895a.c:

    case 3: /* XOR */
        op0 ^= op1;
        break;

which used to say "op0 |= op1" in RHEL5.4.
I checked the code in xen-3.1.0-src/tools/ioemu/hw/lsi53c895a.c and found the example looks like this:

    case 3: /* XOR */
        op0 |= op1;
        break;

Is this the expected result? According to the patch attached to this bug, it should be:

    case 3: /* XOR */
    +   op0 ^= op1;
        break;

I also compared the rest of the code with the patch attached to this bug, and most of the modifications do not appear in the code. Some examples:

    uint32_t scratch[13]; (according to the patch, this should be uint32_t scratch[18])
    There is no "int insn_processed = 0" (according to the patch, there should be)

So it seems the code in the srpm is not the same as the code in the patch attached to this bug.
Please disregard the result in comment #14. I verified this bug with the following steps:

(1) rpm -ivh xen-3.0.3-102.el5.src.rpm
(2) rpmbuild -bp xen.spec
(3) Go to the directory /usr/src/redhat/BUILD/xen-3.1.0-src, and compare the code in tools/ioemu/hw/lsi53c895a.c with the patch attached to this bug. The code is the same.

So this bug is verified in xen-3.0.3-102.el5.
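To illustrate why the "|=" versus "^=" difference matters, here is a minimal standalone sketch. It is not code from lsi53c895a.c: the function names case3_before_fix, case3_after_fix and check_case3 are hypothetical, and only the two one-line operations are taken from the snippet above. It assumes 8-bit operands purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "case 3" as it read in RHEL5.4:
 * the comment says XOR, but the code performs an OR. */
uint8_t case3_before_fix(uint8_t op0, uint8_t op1)
{
    op0 |= op1;
    return op0;
}

/* Hypothetical stand-in for "case 3" with the fix applied. */
uint8_t case3_after_fix(uint8_t op0, uint8_t op1)
{
    op0 ^= op1;
    return op0;
}

/* Self-check: the two variants agree only when the operands
 * share no set bits, and diverge as soon as any bit overlaps. */
void check_case3(void)
{
    /* No overlapping bits: OR and XOR give the same result. */
    assert(case3_before_fix(0xF0, 0x0F) == case3_after_fix(0xF0, 0x0F));

    /* Overlapping bits 0x30: OR keeps them set, XOR clears them. */
    assert(case3_before_fix(0xF0, 0x3C) == 0xFC);
    assert(case3_after_fix(0xF0, 0x3C) == 0xCC);
}
```

Since the two variants only differ when both operands have a bit in common, a guest driver that never issues such an operation would not notice the bug, which would be consistent with the fixes being visible only with some firmware and OS drivers.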
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2010-0294.html
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).