Bug 695809

Summary: BUG() hit after patch set 582435
Product:          Red Hat Enterprise Linux 5
Component:        kernel
Version:          5.6
Hardware:         All
OS:               Linux
Status:           CLOSED WONTFIX
Severity:         medium
Priority:         unspecified
Target Milestone: rc
Reporter:         Ross Walker <rswwalker>
Assignee:         Red Hat Kernel Manager <kernel-mgr>
QA Contact:       Red Hat Kernel QE team <kernel-qe>
CC:               florian, jmoyer, rwheeler
Doc Type:         Bug Fix
Last Closed:      2011-04-12 18:48:01 UTC

Description Ross Walker 2011-04-12 18:25:25 UTC
Description of problem:

An iSCSI Enterprise Target user reports hitting a kernel BUG() at block/cfq-iosched.c:1883 with kernel 2.6.18-238.5.1.el5, but not with 5.5 kernels.

Version-Release number of selected component (if applicable):

Kernel 2.6.18-238.5.1.el5

How reproducible:

Very

Steps to Reproduce:
1. Start IET service
2. Perform IO to target
3. While IO is being performed, restart IET service
  
Actual results:

Kernel BUG at block/cfq-iosched.c:1883
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:05:00.1/irq
CPU 4
Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink
ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge be2iscsi
ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic
uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2
scsi_transport_iscsi2 scsi_transport_iscsi iscsi_trgt(U) ipmi_devintf
ipmi_si ipmi_msghandler bonding drbd(U) autofs4 ipv6 xfrm_nalgo
crypto_api video backlight sbs power_meter hwmon i2c_ec i2c_core
dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp
parport ksm(U) kvm_intel(U) kvm(U) sg ata_piix libata i7core_edac bnx2
edac_mc tpm_tis tpm tpm_bios serio_raw pcspkr dm_raid45 dm_message
dm_region_hash dm_mem_cache dm_round_robin dm_multipath scsi_dh_rdac
scsi_dh_emc scsi_dh_alua scsi_dh dm_snapshot dm_zero dm_mirror dm_log
dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd
ehci_hcd
Pid: 17366, comm: istiod2 Tainted: G      2.6.18-238.5.1.el5 #1
RIP: 0010:[<ffffffff8001c850>]  [<ffffffff8001c850>]
cfq_set_request+0x12f/0x454
RSP: 0018:ffff810037b51880  EFLAGS: 00010246
RAX: ffff81012e317400 RBX: ffff810121e744a8 RCX: 0000000000000000
RDX: ffff81011b1fca90 RSI: ffff8101186ed728 RDI: ffff810121e74528
RBP: ffff81011b1fca60 R08: ffff81012e0a5c00 R09: 0000000000000000
R10: ffff810121e744a8 R11: 0000000000000080 R12: ffff810121e74508
R13: ffff810121e744a8 R14: ffff81012e317400 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8101043a4240(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000011d01890 CR3: 000000010d0d1000 CR4: 00000000000026e0
Process istiod2 (pid: 17366, threadinfo ffff810037b50000, task
ffff81012c5cf040)
Stack:  0000000000000010 000000102dc34cc0 ffff8100379ec5f8 ffff81012e31d928
 0000000040000001 ffff81012c5cf040 0000000000000000 000000001777a010
 6e016f0169017401 ffff81012e31d948 ffff81012e31d928 ffff8100379ec5f8
Call Trace:
 [<ffffffff800135dd>] get_request+0x220/0x2f2
 [<ffffffff80028d3b>] get_request_wait+0x2a/0x130
 [<ffffffff8000c1ea>] __make_request+0x406/0x4ca
 [<ffffffff8005c2de>] cache_alloc_refill+0x86/0x186
 [<ffffffff8001c414>] generic_make_request+0x211/0x228
 [<ffffffff880e03f2>] :dm_mod:__map_bio+0x4e/0x125
 [<ffffffff880e0f39>] :dm_mod:__split_bio+0x176/0x3b0
 [<ffffffff880e1919>] :dm_mod:dm_request+0x115/0x124
 [<ffffffff8001c414>] generic_make_request+0x211/0x228
 [<ffffffff8001ac96>] bio_alloc_bioset+0x89/0xd9
 [<ffffffff8847ebf0>] :drbd:drbd_make_request_common+0xcd8/0xd0c
 [<ffffffff8847f63c>] :drbd:drbd_make_request_26+0xa18/0xaa8
 [<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8005c35e>] cache_alloc_refill+0x106/0x186
 [<ffffffff8001c414>] generic_make_request+0x211/0x228
 [<ffffffff800233cc>] mempool_alloc+0x31/0xe7
 [<ffffffff80033550>] submit_bio+0xe4/0xeb
 [<ffffffff885166ff>] :iscsi_trgt:blockio_make_request+0x1a6/0x213
 [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff88514731>] :iscsi_trgt:build_write_response+0x9d/0xd7
 [<ffffffff8850f3fc>] :iscsi_trgt:send_scsi_rsp+0xb/0x4a
 [<ffffffff88514458>] :iscsi_trgt:disk_execute_cmnd+0x103/0x25a
 [<ffffffff88511307>] :iscsi_trgt:worker_thread+0x136/0x2af
 [<ffffffff8008e40a>] default_wake_function+0x0/0xe
 [<ffffffff885111d1>] :iscsi_trgt:worker_thread+0x0/0x2af
 [<ffffffff80032af3>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800329f5>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 0f 0b 68 a2 36 2c 80 c2 5b 07 48 89 f0 48 8b 32 48 85 f6 75
RIP  [<ffffffff8001c850>] cfq_set_request+0x12f/0x454
 RSP <ffff810037b51880>
 <0>Kernel panic - not syncing: Fatal exception

Expected results:

No OOPS :-)

Additional info:

iSCSI Enterprise Target uses shared IO contexts to avoid the performance issues CFQ has with cooperating IO streams coming from kernel space (much like nfsiod).

It creates/assigns the shared io context before the thread starts:

        /* If another worker already created the shared ioc, take a
         * reference on it and adopt it as this thread's io_context ... */
        if (info->wthread_ioc &&
            atomic_inc_not_zero(&info->wthread_ioc->refcount)) {
                put_io_context(current->io_context);
                current->io_context = info->wthread_ioc;
        } else
                /* ... otherwise create this thread's io_context and
                 * publish it as the shared one. */
                info->wthread_ioc = get_io_context(GFP_KERNEL);

And removes the shared io context when the thread finishes:

        if (current->io_context) {
                struct io_context *ioc;

                /* Detach the (possibly shared) io_context from this
                 * thread under task_lock, then drop our reference. */
                task_lock(current);
                ioc = current->io_context;
                current->io_context = NULL;
                task_unlock(current);

                put_io_context(ioc);
        }
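
Boiled down, the worker thread lifecycle looks roughly like this (a simplified
sketch, not the verbatim IET source; worker_info and wthread_fn are stand-in
names):

#include <linux/kthread.h>
#include <linux/blkdev.h>       /* struct io_context, get/put_io_context() */
#include <linux/sched.h>

struct worker_info {
        struct io_context *wthread_ioc; /* ioc shared by all worker threads */
};

static int wthread_fn(void *data)
{
        struct worker_info *info = data;

        /* On entry: adopt the shared io_context, or become its creator. */
        if (info->wthread_ioc &&
            atomic_inc_not_zero(&info->wthread_ioc->refcount)) {
                put_io_context(current->io_context);
                current->io_context = info->wthread_ioc;
        } else
                info->wthread_ioc = get_io_context(GFP_KERNEL);

        while (!kthread_should_stop()) {
                /* ... pull SCSI commands off the queue and submit bios ... */
                schedule_timeout_interruptible(HZ);
        }

        /* On exit: detach from the shared ioc and drop our reference. */
        if (current->io_context) {
                struct io_context *ioc;

                task_lock(current);
                ioc = current->io_context;
                current->io_context = NULL;
                task_unlock(current);

                put_io_context(ioc);
        }
        return 0;
}

In steady state every worker thread therefore points at one shared io_context.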

The BUG() only happens if there is active IO during the restart. On shutdown, IET halts all IO before it tears down its threads, clears their iocs and closes their block devices, but some IO may still be in flight in a lower layer of a layered block device (i.e. drbd).

This of course only happens with CFQ. I have told my user base that the deadline scheduler is an adequate workaround until the problem can be solved.

-Ross

Comment 2 Ric Wheeler 2011-04-12 18:39:42 UTC
Hi Ross,

Have you seen this upstream? If so, which kernels?

Thanks!

Comment 3 Jeff Moyer 2011-04-12 18:48:01 UTC
First, drbd isn't supported.  Second, your module is modifying object lifetimes of the very data structure that is causing problems in that BUG().  So, the onus is on you to prove that your module isn't the problem.  I'm going to close this as NOTABUG.  If you can somehow prove that this is a more generic problem, I'll be happy to revisit.

Comment 4 Florian Haas 2011-04-12 19:31:18 UTC
(In reply to comment #3)
> First, drbd isn't supported.

Please, allow me to clarify here (for people hitting this bug via a Bugzilla search). While DRBD isn't supported by Red Hat as part of RHEL, LINBIT is a Red Hat Ready ISV with an existing support procedure for mutual customers. So customers on both LINBIT and Red Hat support contracts will not be forced to forgo Red Hat support just because they are using DRBD.

More information is available on the Partner Locator page at https://www.redhat.com/wapps/partnerlocator/web/home.html#productId=60400

Comment 5 Ross Walker 2011-04-12 19:35:46 UTC
(In reply to comment #3)
> First, drbd isn't supported.  Second, your module is modifying object lifetimes
> of the very data structure that is causing problems in that BUG().  So, the
> onus is on you to prove that your module isn't the problem.  I'm going to
> close this as NOTABUG.  If you can somehow prove that this is a more generic
> problem, I'll be happy to revisit.

As for LINBIT's DRBD not being supported, they are an authorized partner:

https://www.redhat.com/wapps/partnerlocator/web/home.html#productId=60400

Besides that though, I will see if I can have the user reproduce the problem without drbd in the stack.

I don't get the BUG() on ANY OTHER kernel but yours, and IET is run on ALL distributions as well as several embedded systems.

The BUG() didn't occur on RH kernels before that patch set, and now it does.

Is there an outside authority I could bring in on this conversation to validate my claims?

-Ross

Comment 6 Ric Wheeler 2011-04-12 19:41:59 UTC
Just for clarity, Red Hat Bugzilla is not meant to be a tool for escalation from the field or partners. Please do file issues via your normal support contact (LINBIT via the partner channel, those with subscriptions via RH support).

Comment 7 Ross Walker 2011-04-12 19:50:50 UTC
(In reply to comment #2)
> Hi Ross,
> Have you seen this upstream? If so, which kernels?
> Thanks!

I haven't seen this on any other upstream kernels and never saw it on any RH kernels prior to 2.6.18-238.

I "suspect" that removing the spin_locks around cfq_exit_lock in bug 582435 may have inadvertently removed an implicit memory barrier as a side effect.
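
To illustrate what I mean (a generic sketch only, nothing to do with the
actual cfq code): stores made under a lock are ordered for everyone else who
takes the same lock, but once the lock is removed the writer has to add an
explicit barrier before publishing:

#include <linux/spinlock.h>

struct obj {
        int data;
};

static struct obj *shared;
static DEFINE_SPINLOCK(shared_lock);

static void publish_locked(struct obj *new)
{
        spin_lock(&shared_lock);
        new->data = 42;
        shared = new;           /* lock-holding readers see both stores */
        spin_unlock(&shared_lock);
}

static void publish_lockless(struct obj *new)
{
        new->data = 42;
        smp_wmb();              /* order the init before the publish */
        shared = new;           /* readers need smp_rmb()/rcu_dereference() */
}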

If Red Hat comes out and officially says that RHEL 5 does NOT support shared io contexts of any kind, then at least I will have that to go on: I can remove that code for RHEL 5 and put a notice in IET that RHEL 5 users must use 'deadline' to achieve adequate performance.

-Ross

Comment 8 Jeff Moyer 2011-04-12 21:07:21 UTC
OK, I had a look back through the RHEL 5 sources (getting quite long in the tooth already).  You are right, io_context sharing is not supported by RHEL 5.  I had looked into implementing this support some time ago, but it was too invasive, and definitely broke the kABI.

Now, I wouldn't write off the performance of CFQ without testing.  There is support for merging cfqq's, so long as processes are interleaving sequential I/O.  Of course, for hardware other than SATA disks, I would recommend deadline, anyway.

I'll change the state of this bug to WONTFIX, to hopefully at least give you a moral victory.  ;-)  Sorry for the inconvenience.

Cheers!

Comment 9 Ross Walker 2011-04-12 21:30:21 UTC
(In reply to comment #8)
> OK, I had a look back through the RHEL 5 sources (getting quite long in the
> tooth already).  You are right, io_context sharing is not supported by RHEL 5. 
> I had looked into implementing this support some time ago, but it was too
> invasive, and definitely broke the kABI.
> Now, I wouldn't write off the performance of CFQ without testing.  There is
> support for merging cfqq's, so long as processes are interleaving sequential
> I/O.  Of course, for hardware other than SATA disks, I would recommend
> deadline, anyway.
> I'll change the state of this bug to WONTFIX, to hopefully at least give you a
> moral victory.  ;-)  Sorry for the inconvenience.
> Cheers!

Ok, I just looked back in Bugzilla and saw bug #427709, which supposedly fixes the performance issues with CFQ and cooperating threads/processes.

I'll retract the ioc sharing for kernels < 2.6.24 in the code, and put a caveat in the readmes saying that those running RHEL 5 kernels older than 5.5 should consider the 'deadline' scheduler over CFQ due to bug #427709.

And thanks, I'll take any small victory.

-Ross