Bug 490148 - Xen domU, RAID1, LVM, iscsi target export with blockio bug
Summary: Xen domU, RAID1, LVM, iscsi target export with blockio bug
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.8
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Assignee: Andrew Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 460693
Blocks: 458302
 
Reported: 2009-03-13 15:22 UTC by Chris Lalancette
Modified: 2011-02-16 15:58 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 460693
Environment:
Last Closed: 2011-02-16 15:58:25 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description Chris Lalancette 2009-03-13 15:22:51 UTC
+++ This bug was initially created as a clone of Bug #460693 +++

Description of problem:

My goal was to export parts (logical volumes) of a software RAID1 device, created inside a domU, as iSCSI targets.
The RAID components are two dom0 logical volumes, passed as block devices to the domU. The RAID1 device, /dev/md0, is created inside the domU; then a PV, a VG, and LVs are created on top of /dev/md0. Individual logical volumes from /dev/md0 are then exported through iSCSI target software in "blockio" mode.

After starting the iSCSI target software, connecting to the targets from another computer was successful, but creating a filesystem triggers a BUG in the domU's blkfront.c, as does writing a larger amount of data (~128MB) to the target with dd. If a filesystem already exists on the target, mounting it is successful, but trying to write a large amount of data to it with

	dd if=/dev/zero of=dummy-file-1 bs=1024 count=$[1024*512]

triggers the same BUG.

When the iSCSI target software uses "fileio" mode instead, everything works fine.
Exporting the whole /dev/md0 as an iSCSI target also works great.

The same thing happens with both iSCSI Enterprise Target (IET) and scst-iscsi.




Version-Release number of selected component (if applicable):
(CentOS 5.2)
kernel-xen-2.6.18-92.1.6.el5
xen-3.0.3-64.el5_2.1

iscsitarget-0.4.16

or

scst-1.0.0-2.6.18.92.1.6.el5xen, iscsi-scst-1.0.0



How reproducible:
Easy.



Steps to Reproduce:
1. make 2 LVs in dom0 and push them to domU
2. inside domU, make RAID1 /dev/md0 consisting of these two devices
3. create logical volumes in /dev/md0
4. export logical volumes as separate iSCSI targets, with "blockio" mode
5. connect to iscsi target(s) from other computer
6. try to write a large amount of data to the iscsi target(s) - either mkfs or dd (a shell sketch of these steps follows below)
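
For reference, the steps above as a shell sketch. All device, VG, and target names are placeholders, and the IET snippet assumes the standard ietd.conf "Type=blockio" syntax:

	# dom0: create two LVs and pass them to the domU as phy: devices
	lvcreate -L 10G -n test1 containers
	lvcreate -L 10G -n test2 containers
	# (domU config: 'phy:/dev/containers/test1,sdc,w', 'phy:/dev/containers/test2,sdd,w')

	# domU: assemble RAID1 and layer LVM on top of it
	mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
	pvcreate /dev/md0
	vgcreate exportvg /dev/md0
	lvcreate -L 4G -n target1 exportvg

	# domU: export the LV in blockio mode (IET, /etc/ietd.conf):
	#   Target iqn.2008-06.example:target1
	#       Lun 0 Path=/dev/exportvg/target1,Type=blockio

	# initiator: connect, then mkfs or write a large amount of data to trigger the BUG
	dd if=/dev/zero of=dummy-file-1 bs=1024 count=$[1024*512]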


  
Actual results:
The BUG shows up in the domU that is running the iSCSI target software, and the domU reboots:


------------[ cut here ]------------
kernel BUG at drivers/xen/blkfront/blkfront.c:567!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /block/ram0/dev
Modules linked in: iscsi_scst(FU) scst_disk(U) scst_vdisk(U) scst(U) iscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) scsi_mod lock_dlm gfs2(U) dlm configfs ipv6 xfrm_nalgo crypto_api dm_multipath raid1 parport_pc lp parport pcspkr xenblk xennet dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0061:[<da0b8704>]    Tainted: GF     VLI
EFLAGS: 00010046   (2.6.18-92.1.6.el5xen #1)
EIP is at do_blkif_request+0x182/0x37b [xenblk]
eax: 0000000c   ebx: c0dd17e0   ecx: 00000008   edx: 0000000b
esi: 00000000   edi: 0000bc26   ebp: d8f97628   esp: c0c52dec
ds: 007b   es: 007b   ss: 0069
Process md0_raid1 (pid: 2344, ti=c0c52000 task=d4fbc000 task.ti=c0c52000)
Stack: d8f6e468 c08ec000 d8f6abe4 00000003 c08ec000 00000001 00000177 00000000
       d261fec4 c0dd17e0 00000008 00000000 0000000b ffffffff d8f6e468 d8f6e468
       00000000 00000060 c04d5418 d8f6abe4 c04d7530 00000000 00001000 c0660000
Call Trace:
 [<c04d5418>] __generic_unplug_device+0x1d/0x1f
 [<c04d7530>] __make_request+0x31d/0x36a
 [<c04d4824>] generic_make_request+0x248/0x258
 [<c059f59e>] bitmap_unplug+0x135/0x14c
 [<c0429828>] del_timer+0x41/0x47
 [<da0bfa0b>] raid1d+0xec/0xc44 [raid1]
 [<c0607bac>] schedule+0x718/0x7cd
 [<c0607be0>] schedule+0x74c/0x7cd
 [<c0609350>] _spin_lock_irqsave+0x8/0x28
 [<c042914a>] lock_timer_base+0x15/0x2f
 [<c04291a8>] try_to_del_timer_sync+0x44/0x4a
 [<c04291b8>] del_timer_sync+0xa/0x14
 [<c060832e>] schedule_timeout+0x78/0x8c
 [<c0609350>] _spin_lock_irqsave+0x8/0x28
 [<c059c2c1>] md_thread+0xdf/0xf5
 [<c043190f>] autoremove_wake_function+0x0/0x2d
 [<c059c1e2>] md_thread+0x0/0xf5
 [<c043184d>] kthread+0xc0/0xeb
 [<c043178d>] kthread+0x0/0xeb
 [<c0403005>] kernel_thread_helper+0x5/0xb
 =======================
Code: 0f b7 5b 1a 6b c3 0c 89 5c 24 2c 89 44 24 20 8b 52 30 c7 44 24 30 00 00 00 00 01 d0
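
For context, blkfront.c:567 in this kernel sits in the request-queuing path (do_blkif_request / blkif_queue_request). Paraphrasing the 2.6.18-era blkfront source (the exact RHEL line may differ slightly), the check that fires looks like:

	/* blkif_queue_request(): each request becomes one ring entry with at
	 * most BLKIF_MAX_SEGMENTS_PER_REQUEST (11) scatter-gather segments */
	ring_req->nr_segments = 0;
	rq_for_each_bio(bio, req) {
		bio_for_each_segment(bvec, bio, idx) {
			/* fires when merged bios carry more physical segments
			 * than a single blkif ring entry can describe */
			BUG_ON(ring_req->nr_segments == BLKIF_MAX_SEGMENTS_PER_REQUEST);
			...
		}
	}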




Expected results:
No bug :)




Additional info:
domU disk configuration is:
	disk = [ 'phy:/dev/system/root.container1,sda1,w',
	         'phy:/dev/system/swap.container1,sda2,w',
	         'phy:/dev/containers/test1,sdc,w',
	         'phy:/dev/containers/test2,sdd,w' ]



/etc/scst.conf file:

[HANDLER vdisk]
DEVICE SOMETARGET,/dev/somevg/somelv,BLOCKIO,512

#[ASSIGNMENT Default]
DEVICE SOMETARGET,0





/etc/iscsi-scst.conf file:

Target iqn.2008-06.net.panline:sometarget
        # Users, who can access this target. The same rules as for discovery
        # users apply here.
        # Leave them alone if you don't want to use authentication.
        #IncomingUser joe secret
        #OutgoingUser jim 12charpasswd
        # Alias name for this target
        # Alias Test
        # various iSCSI parameters
        # (not all are used right now, see also iSCSI spec for details)
        #MaxConnections         1
        InitialR2T              No
        ImmediateData           Yes
        MaxRecvDataSegmentLength 1048576
        MaxXmitDataSegmentLength 1048576
        MaxBurstLength          1048576
        FirstBurstLength        1048576
        #DefaultTime2Wait       2
        #DefaultTime2Retain     20
        #MaxOutstandingR2T      20
        #DataPDUInOrder         Yes
        #DataSequenceInOrder    Yes
        #ErrorRecoveryLevel     0
        #HeaderDigest           CRC32C,None
        #DataDigest             CRC32C,None
        # various target parameters
        #QueuedCommands         32

--- Additional comment from tao on 2009-01-06 18:31:39 EDT ---

In IT234267, the customer is experiencing occasional crashes while
installing a DomU.  All of the crashes go through the following code
path:

 [<c04d51b8>] __generic_unplug_device+0x1d/0x1f
 [<c04d5f0d>] generic_unplug_device+0x15/0x25
 [<ed1ef2b4>] unplug_slaves+0x4f/0x83 [raid1]
 [<ed1ef300>] raid1_unplug+0xe/0x1a [raid1]
 [<ed247840>] dm_table_unplug_all+0x22/0x2e [dm_mod]
 [<ed245c79>] dm_unplug_all+0x17/0x21 [dm_mod]
 [<c04d7373>] blk_backing_dev_unplug+0x56/0x5d
 [<c044e5c4>] sync_page+0x0/0x3b
 [<c046e748>] block_sync_page+0x31/0x32
 [<c044e5f7>] sync_page+0x33/0x3b
 [<c060811e>] __wait_on_bit_lock+0x2a/0x52
 [<c044e537>] __lock_page+0x52/0x59
 [<c043192c>] wake_bit_function+0x0/0x3c
 [<c0450f4b>] filemap_nopage+0x22e/0x313

I feel the underlying cause in IT234267 is the same as experienced in this
BZ.


Bill


Issue escalated to RHEL 5 Kernel by: bbraswel.
Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by bbraswel 
 issue 234267

--- Additional comment from clalance on 2009-02-04 16:49:59 EDT ---

FYI, this problem *may* be solved by the upstream patch posted here:

http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00117.html

Chris Lalancette

--- Additional comment from clalance on 2009-02-05 08:35:19 EDT ---

I've done a quick port of that upstream change to the RHEL-5 kernel, and did a quick test here.  Could someone who can reproduce the error (I wasn't able to) download the kernel at:

http://new-people.redhat.com/clalance/bz460693

And see if it fixes the issue for them?

Chris Lalancette

--- Additional comment from nenad on 2009-02-05 08:56:42 EDT ---

I will test kernel later today or tomorrow in the morning.

--- Additional comment from nenad on 2009-02-06 09:20:25 EDT ---

Oops, I was not able to reproduce the error either. It looks like something in my test configuration has changed in the last 6 months. I will try several other tests, but I'm not sure that this will lead to anything particularly useful. :(

--- Additional comment from clalance on 2009-02-06 09:29:57 EDT ---

Ah, OK.  Thanks for trying; I appreciate the effort.  If you *do* get some result, please be sure to report it here.

In the meantime, there were a couple of other people who had reported problems in this area, so I'm hoping one of them can reproduce the error and try this test patch out.

Thanks again!
Chris Lalancette

--- Additional comment from clalance on 2009-02-12 08:59:33 EDT ---

For anyone else (hint, hint) who was having problems with this bug, I've folded this patch into the main virttest kernels, since the patch referenced in Comment #2 is headed upstream.  You can get that kernel at:

http://new-people.redhat.com/clalance/virttest

Please give it a test to ensure we get it into the next RHEL release!

Chris Lalancette

--- Additional comment from cchen on 2009-02-20 17:43:45 EDT ---

I've tripped over this same bug in 5.2 -- same line in blkfront.c when the kernel panics. This happens mostly when doing a kickstart. I'll be giving 5.3 a test pretty soon.

Unfortunately, because I'm seeing this during kickstart, I need the right kickstart initrds to get it going -- I've tried rolling the new modules into the existing initrd.img I have for 5.2, but there's a problem.

--- Additional comment from clalance on 2009-02-21 05:13:18 EDT ---

OK.  Well, my guess is that 5.3 won't change the issue for you; we didn't do anything in 5.3 to address this.  There is a patch in the virttest kernels that may address this problem, although I haven't been able to confirm it since I can't reproduce the issue at all.  Do you happen to have a reproduction scenario so I can try to reproduce?

Chris Lalancette

--- Additional comment from clalance on 2009-02-21 05:15:10 EDT ---

Oh, I should also mention that the kernels have now moved to:

http://people.redhat.com/clalance/virttest

Chris Lalancette

--- Additional comment from clalance on 2009-03-01 13:05:18 EDT ---

Created an attachment (id=333658)
Backport of upstream Linux 9e973e64ac6dc504e6447d52193d4fff1a670156

The current patch that we are carrying in the virttest kernels.  It still needs verification that it fixes the problem.
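
For reference, the shape of that upstream change (paraphrased from commit 9e973e64, "xen/blkfront: use blk_rq_map_sg to generate ring entries"; not the exact RHEL backport): instead of walking bio segments by hand and hitting the BUG_ON when merges produce too many, the request is first mapped through a scatterlist, which coalesces physically contiguous pages so the segment count stays within the ring limit:

	/* map the request through a scatterlist; adjacent pages coalesce,
	 * keeping nr_segments within BLKIF_MAX_SEGMENTS_PER_REQUEST */
	ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
	BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);

	for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
		/* fill one blkif ring segment per scatterlist entry */
		...
	}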

--- Additional comment from cchen on 2009-03-11 13:22:00 EDT ---

My problem is I haven't had the time to make a working initrd for kickstart from these test kernels. I have them running in my Xen DomU guests (already-installed systems with a running md RAID1), and they're just fine.

Comment 2 Chris Chen 2009-07-21 17:18:23 UTC
I'm a little confused. Isn't this a RHEL 5 problem? Why is it assigned to 4.8? I haven't noticed this behavior on my 4.7 DomUs...

Comment 3 Bill Burns 2009-07-21 17:45:43 UTC
This is a clone of the RHEL 5 issue, as a fix is needed for the RHEL 4 guest kernel.

Comment 4 Chris Lalancette 2009-07-27 14:46:23 UTC
(In reply to comment #2)
> I'm a little confused. Isn't this a RHEL 5 problem? Why is it assigned to 4.8?
> I haven't noticed this behavior on my 4.7 DomUs...  

Right, as Bill mentions, this bug is for RHEL-4, but we cloned it out of a RHEL-5 bug.  Since this is a guest-side issue, it's theoretically possible for a RHEL-4 guest to run into it.

Chris Lalancette

Comment 5 RHEL Program Management 2010-10-12 17:11:28 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Vivek Goyal 2010-10-14 14:39:47 UTC
Committed in 89.43.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 10 errata-xmlrpc 2011-02-16 15:58:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html

