Red Hat Bugzilla – Bug 460693
Xen domU, RAID1, LVM, iscsi target export with blockio bug
Last modified: 2010-12-22 04:04:51 EST
Description of problem:
My goal was to export parts (logical volumes) of a software RAID1 device, created inside a domU, over iSCSI.
The RAID components are two dom0 logical volumes, passed through as block devices to the domU. The RAID1 device, /dev/md0, is created inside the domU; then a PV, VG and LVs are created on top of /dev/md0. Individual logical volumes from /dev/md0 are then exported through iSCSI target software in "blockio" mode.
After starting the iSCSI target software, connecting to the targets from another computer was successful, but creating a filesystem triggers a bug in the domU's blkfront.c, as does writing a larger amount of data (~128 MB) to the target with dd. If there is already a filesystem on the target, mounting it succeeds, but trying to write a large amount of data to it with
dd if=/dev/zero of=dummy-file-1 bs=1024 count=$[1024*512]
brings up the same bug.
When the iSCSI target software uses "fileio" mode, everything works fine.
Exporting the whole /dev/md0 as an iSCSI target also works.
The same thing happens with both the iSCSI Enterprise Target (IET) and scst-iscsi.
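For reference, in IET the difference between the two export modes comes down to a single ietd.conf directive per LUN; a minimal sketch (target names and LV paths are made up for illustration):

```
Target iqn.2008-08.example.com:md0.lv1
    # fails under this bug: the target does direct block I/O to the LV
    Lun 0 Path=/dev/vg_md0/lv1,Type=blockio

Target iqn.2008-08.example.com:md0.lv2
    # works: I/O goes through the page cache instead
    Lun 0 Path=/dev/vg_md0/lv2,Type=fileio
```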
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Make two LVs in dom0 and pass them through to the domU
2. Inside the domU, make a RAID1 /dev/md0 consisting of these two devices
3. Create logical volumes on /dev/md0
4. Export the logical volumes as separate iSCSI targets, in "blockio" mode
5. Connect to the iSCSI target(s) from another computer
6. Try to write a large amount of data to the iSCSI target(s) - either mkfs or dd
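The steps above might be sketched as follows (device, VG and LV names are made up for illustration; the iSCSI export itself depends on which target software is in use):

```shell
# dom0: create two backing LVs and pass them to the domU, e.g. via
#   'phy:/dev/vg0/raid-a,xvdb,w', 'phy:/dev/vg0/raid-b,xvdc,w'
lvcreate -L 10G -n raid-a vg0
lvcreate -L 10G -n raid-b vg0

# domU: assemble the RAID1 and carve LVs out of it
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvdb /dev/xvdc
pvcreate /dev/md0
vgcreate vg_md0 /dev/md0
lvcreate -L 2G -n lv1 vg_md0

# domU: export /dev/vg_md0/lv1 via the iSCSI target in "blockio" mode;
# then, from the initiator host, log in and write a large amount of data:
dd if=/dev/zero of=/dev/sdX bs=1024 count=$((1024*512))   # triggers the BUG
```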
The bug shows up in the domU that is running the iSCSI target software, and the domU reboots:
------------[ cut here ]------------
kernel BUG at drivers/xen/blkfront/blkfront.c:567!
invalid opcode: 0000 [#1]
last sysfs file: /block/ram0/dev
Modules linked in: iscsi_scst(FU) scst_disk(U) scst_vdisk(U) scst(U) iscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) scsi_mod lock_dlm gfs2(U) dlm configfs ipv6 xfrm_nalgo crypto_api dm_multipath raid1 parport_pc lp parport pcspkr xenblk xennet dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
EIP: 0061:[<da0b8704>] Tainted: GF VLI
EFLAGS: 00010046 (2.6.18-92.1.6.el5xen #1)
EIP is at do_blkif_request+0x182/0x37b [xenblk]
eax: 0000000c ebx: c0dd17e0 ecx: 00000008 edx: 0000000b
esi: 00000000 edi: 0000bc26 ebp: d8f97628 esp: c0c52dec
ds: 007b es: 007b ss: 0069
Process md0_raid1 (pid: 2344, ti=c0c52000 task=d4fbc000 task.ti=c0c52000)
Stack: d8f6e468 c08ec000 d8f6abe4 00000003 c08ec000 00000001 00000177 00000000
d261fec4 c0dd17e0 00000008 00000000 0000000b ffffffff d8f6e468 d8f6e468
00000000 00000060 c04d5418 d8f6abe4 c04d7530 00000000 00001000 c0660000
[<da0bfa0b>] raid1d+0xec/0xc44 [raid1]
Code: 0f b7 5b 1a 6b c3 0c 89 5c 24 2c 89 44 24 20 8b 52 30 c7 44 24 30 00 00 00 00 01 d0
Expected results:
No bug :)
domU disk configuration is:
disk = [ 'phy:/dev/system/root.container1,sda1,w',
# Users, who can access this target. The same rules as for discovery
# users apply here.
# Leave them alone if you don't want to use authentication.
#IncomingUser joe secret
#OutgoingUser jim 12charpasswd
# Alias name for this target
# Alias Test
# various iSCSI parameters
# (not all are used right now, see also iSCSI spec for details)
# various target parameters
In IT234267, the customer is experiencing occasional crashes while
installing a DomU. All of the crashes go through the following code path:
[<ed1ef2b4>] unplug_slaves+0x4f/0x83 [raid1]
[<ed1ef300>] raid1_unplug+0xe/0x1a [raid1]
[<ed247840>] dm_table_unplug_all+0x22/0x2e [dm_mod]
[<ed245c79>] dm_unplug_all+0x17/0x21 [dm_mod]
I feel the underlying cause in IT234267 is the same as experienced in this bug.
Issue escalated to RHEL 5 Kernel by: bbraswel.
Internal Status set to 'Waiting on Engineering'
This event sent from IssueTracker by bbraswel
FYI, this problem *may* be solved by the upstream patch posted here:
I've done a quick port of that upstream change to the RHEL-5 kernel, and did a quick test here. Could someone who can reproduce the error (I wasn't able to) download the kernel at:
And see if it fixes the issue for them?
I will test kernel later today or tomorrow in the morning.
Oops, I was not able to reproduce the error either. It looks like something in my test configuration has changed in the last 6 months. I will try several other tests, but I'm not sure that this will lead to anything particularly useful. :(
Ah, OK. Thanks for trying; I appreciate the effort. If you *do* get some result, please be sure to report it here.
In the meantime, there were a couple of other people who had reported problems in this area, so I'm hoping one of them can reproduce the error and try this test patch out.
For anyone else (hint, hint) who was having problems with this bug, I've folded this patch into the main virttest kernels, since the patch referenced in Comment #2 is headed upstream. You can get that kernel at:
Please give it a test to ensure we get it into the next RHEL release!
I've hit this same bug in 5.2--the same BUG in blkfront.c when the kernel panics. This happens mostly when doing a kickstart. I'll be giving 5.3 a test pretty soon.
Unfortunately, because I'm seeing this in the kickstart, I need the right kickstart initrds to get it going--I've tried rolling the new modules into the existing initrd.img I have for 5.2, but there's a problem.
OK. Well, my guess is that 5.3 won't change the issue for you; we didn't do anything in 5.3 to address this. There is a patch in the virttest kernels that may address this problem, although I haven't been able to confirm it since I can't reproduce the issue at all. Do you happen to have a reproduction scenario I could try here?
Oh, I should also mention that the kernels have now moved to:
Created attachment 333658 [details]
Backport of upstream Linux 9e973e64ac6dc504e6447d52193d4fff1a670156
The current patch that we are carrying in the virttest kernels. It still needs verification that it fixes the problem.
My problem is I haven't had the time to make a working initrd for kickstart from these test kernels. I have them running in my Xen DomU guests (already installed and running md RAID1), and they're just fine.
Hrm, more work, something new today:
Created an md RAID1 from two xvd devices, put LVM on it, and ran bonnie++:
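That test might look roughly like this (device, VG and mount point names are made up for illustration):

```shell
# domU: RAID1 over the two xvd devices, LVM on top
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvdb /dev/xvdc
pvcreate /dev/md0
vgcreate vg_md0 /dev/md0
lvcreate -L 4G -n bench vg_md0
mkfs.ext3 /dev/vg_md0/bench
mount /dev/vg_md0/bench /mnt/bench
bonnie++ -d /mnt/bench -u root    # heavy sequential I/O through lvm -> md -> blkfront
```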
Then a panic!
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/xen/blkfront/blkfront.c:567
invalid opcode: 0000  SMP
last sysfs file: /block/ram0/dev
Modules linked in: nls_utf8 hfsplus i2c_dev i2c_core nfs lockd fscache nfs_acl sunrpc xennet ipv6 xfrm_nalgo crypto_api dm_multipath parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod xenblk raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-92.1.22.el5xen #1
RIP: e030:[<ffffffff8808472b>] [<ffffffff8808472b>] :xenblk:do_blkif_request+0x181/0x384
RSP: e02b:ffffffff8062fdd8 EFLAGS: 00010046
RAX: 000000000000000b RBX: 0000000000000008 RCX: 0000000000000000
RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000f48
RBP: ffff88007fd42430 R08: ffff8800471b95f8 R09: 0000070000000335
R10: 0000070000000476 R11: 0000070000000410 R12: ffff88007fe9c000
R13: ffff8800497c2570 R14: ffff8800471b95f8 R15: ffff8800502b9540
FS: 00002aac547fce00(0000) GS:ffffffff805b0000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff805f0000, task ffffffff804d8b00)
Stack: ffff88007ff01928 ffff88007fe9c000 0000000250290ec0 00000000000001e9
000000000000000e 0000000800001000 0000000b00001000 ffffffff8022d658
<IRQ> [<ffffffff8022d658>] __end_that_request_first+0x1b2/0x4ff
<EOI> [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
Cool, this is the same panic. I tried setting up something similar to your test, and ran it overnight, but I didn't get a crash. Is this reproducible for you? If so, can you give the virttest kernels a whirl, and see if the issue goes away then?
(In reply to comment #14)
I've rebooted with the virttest kernel 2.6.18-131.el5virttest9xen #1 SMP Fri
Feb 20 06:20:21 EST 2009 x86_64 x86_64 x86_64 GNU/Linux and the problem hasn't recurred.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However feel free
to provide a comment indicating that this fix has been verified.
~~ Attention - RHEL 5.4 Beta Released! ~~
RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!
If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.
Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.
Questions can be posted to this bug or your customer or partner representative.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.