440506 – panic in aoe:aoecmd_ata_rsp during direct I/O to lvm [snap,mirror,stripe]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	beta
Target Release:	---
Assignee:	Tom Coughlan
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	KernelPrio5.3

Reported:	2008-04-03 20:52 UTC by Corey Marthaler
Modified:	2009-01-20 19:48 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-01-20 19:48:45 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
program to repo this panic on v22 (863 bytes, application/octet-stream) 2008-06-26 22:10 UTC, Corey Marthaler
Here is the output file you requested (9.33 KB, text/plain) 2008-06-26 22:22 UTC, Corey Marthaler
aoe: use bio->bi_idx to access biovecs (1012 bytes, patch) 2008-07-05 15:48 UTC, Ed Cashin

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0225	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update	2009-01-20 16:06:24 UTC

Description Corey Marthaler 2008-04-03 20:52:31 UTC

Description of problem:

I was running block level I/O to snapshot volumes (on aoe devices) and noticed
the following:

Apr  3 15:37:03 hayes-03 qarshd[25811]: Running cmdline: lvs --noheadings -o
lv_attr snapper/origin
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff884f61e7>] :aoe:aoecmd_work+0x1b7/0x268
PGD 21a54c067 PUD 1eb7a1067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:02:00.0/irq
CPU 3
Modules linked in: aoe autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6
xfrm_nalgo crypto_api dm_multipath video sbs backlid
Pid: 0, comm: swapper Not tainted 2.6.18-85.el5 #1
RIP: 0010:[<ffffffff884f61e7>]  [<ffffffff884f61e7>] :aoe:aoecmd_work+0x1b7/0x268
RSP: 0018:ffff8101239f7d00  EFLAGS: 00010046
RAX: ffff81009b4d0df0 RBX: ffff8101079d2df0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: ffff81009b4d0e00 RDI: 0000000000000000
RBP: ffff8101083d5c00 R08: ffff81021faad02a R09: ffffffffffffffff
R10: ffff81011fc4c038 R11: 0000000000000000 R12: 0000000000000000
R13: ffff81011f9f6800 R14: ffff8101083d5bc0 R15: 0000000000000400
FS:  00002aaaaaab98e0(0000) GS:ffff8101239cf8c0(0000) knlGS:00000000f7e5aac0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000002158a4000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011fc2c000, task ffff8101239d8820)
Stack:  ffffffff804c7bc0 ffff81021c0366c0 ffff81011f9f6800 ffff81021faad012
 0000000000000001 ffffffff804c7bc0 ffff8101079d2df0 ffffffff884f6924
 ffff81012384afa0 ffff81011f9f6f18 ffff8101083d5bc0 0000000000000286
Call Trace:
 <IRQ>  [<ffffffff884f6924>] :aoe:aoecmd_ata_rsp+0x49f/0x4e7
 [<ffffffff8008b743>] rebalance_tick+0x183/0x3cc
 [<ffffffff80142d27>] __next_cpu+0x19/0x28
 [<ffffffff884f7149>] :aoe:aoenet_rcv+0x117/0x156
 [<ffffffff8002015c>] netif_receive_skb+0x330/0x3ae
 [<ffffffff882be19a>] :tg3:tg3_poll+0x6ed/0x92f
 [<ffffffff8000c4c1>] net_rx_action+0xa4/0x1a4
 [<ffffffff882b8b02>] :tg3:tg3_interrupt_tagged+0xa2/0xb2
 [<ffffffff80011e47>] __do_softirq+0x5e/0xd6
 [<ffffffff8007810b>] end_level_ioapic_vector+0x9/0x16
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006c55e>] do_softirq+0x2c/0x85
 [<ffffffff8006c3e6>] do_IRQ+0xec/0xf5
 [<ffffffff8006ad28>] default_idle+0x0/0x50
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8006ad51>] default_idle+0x29/0x50
 [<ffffffff80048a90>] cpu_idle+0x95/0xb8
 [<ffffffff80076613>] start_secondary+0x45a/0x469


Code: 48 8b 0a 48 c1 e9 33 48 89 c8 48 c1 e8 09 48 8b 04 c5 00 27
RIP  [<ffffffff884f61e7>] :aoe:aoecmd_work+0x1b7/0x268
 RSP <ffff8101239f7d00>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
 NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: aoe autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6
xfrm_nalgo crypto_api dm_multipath video sbs backlid
Pid: 0, comm: swapper Not tainted 2.6.18-85.el5 #1
RIP: 0010:[<ffffffff80064b60>]  [<ffffffff80064b60>] .text.lock.spinlock+0xe/0x30
RSP: 0018:ffff810123993e58  EFLAGS: 00000086
RAX: 0000000000000212 RBX: ffff810123968000 RCX: ffff81011f9f6ee8
RDX: 00000000000000c8 RSI: 10c1080c78dbebc0 RDI: ffff81011f9f6f18
RBP: ffff81011f9f6800 R08: 000000000c4cea80 R09: 000000000000003f
R10: ffff81011fc4c008 R11: 0000000000000246 R12: ffffffff884f5bc9
R13: 0000000000000002 R14: ffff81011f9f6f18 R15: 000000000000012c
FS:  00002aaaaaab8dc0(0000) GS:ffff810103f99440(0000) knlGS:00000000f7e566c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bdc5c CR3: 0000000109cd2000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011fc02000, task ffff8101238dc7a0)
Stack:  ffffffff884f5bfa 0000000000000000 0000000000000000 000000000720c517
 0000000000000000 ffff81000100dfa0 ffff810123993ee8 0000000000000000
 0000000000000000 7ffffffffffffffe ffff810123993f48 ffffffff8008b743
Call Trace:
 <IRQ>  [<ffffffff884f5bfa>] :aoe:rexmit_timer+0x31/0x21f
 [<ffffffff8008b743>] rebalance_tick+0x183/0x3cc
 [<ffffffff884f5bc9>] :aoe:rexmit_timer+0x0/0x21f
 [<ffffffff80095183>] run_timer_softirq+0x133/0x1af
 [<ffffffff80011e47>] __do_softirq+0x5e/0xd6
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006c55e>] do_softirq+0x2c/0x85
 [<ffffffff8006ad28>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006ad51>] default_idle+0x29/0x50
 [<ffffffff80048a90>] cpu_idle+0x95/0xb8
 [<ffffffff80076613>] start_secondary+0x45a/0x469


Code: 83 3f 00 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe
Kernel panic - not syncing: nmi watchdog


Version-Release number of selected component (if applicable):
2.6.18-85.el5
lvm2-2.02.32-3.el5

How reproducible:
Only once so far

Comment 1 Corey Marthaler 2008-04-03 21:37:50 UTC

This is reproducible.

Comment 2 Corey Marthaler 2008-04-04 18:39:37 UTC

I hit this while running single machine lvm mirror block level I/O.

Comment 3 Corey Marthaler 2008-04-07 20:19:50 UTC

This smells like a regression and potentially a pretty big issue if we support
aoe in rhel5.2.

Comment 4 RHEL Program Management 2008-04-07 20:23:27 UTC

This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 5 Corey Marthaler 2008-04-07 21:24:08 UTC

I've tried this with lvm stripes as well and that also causes the panic. So this
is somehow related to multiple device aoe lvm volumes. I/O to snap, mirrors, and
stripes causes the panics, but linears are fine.

Here is the I/O I was running:
b_iogen -o -m random -f direct -i 500 -s write,writev -t1000b -T10000b -d
/dev/hayes/lvol0 | b_doio -vD

Comment 6 Corey Marthaler 2008-04-07 21:38:06 UTC

I've also attempted this with a stripe, mirror, and a snapshot on the system, but
as long as I only wrote to the linear, I was fine. As soon as I wrote to the
snap/mirror/stripe, it panicked.

Comment 7 Corey Marthaler 2008-05-23 15:56:56 UTC

Just a note that this is still occurring on 2.6.18-92.el5.

Comment 9 Ed Cashin 2008-06-19 16:12:04 UTC

[This bugzilla post follows a similar direct email.]

I would like to help resolve the issue that you are seeing when using
AoE and LVM striping/snapshotting together.

Can you please check whether the same problem is present with the
aoe6-62 driver from the Coraid website?

  http://www.coraid.com/support/linux/

I understand that RHEL might prefer to use the aoe driver in the
2.6.18 kernel, but knowing whether the current aoe driver exhibits the
same behavior will help me to identify any bug.

Could you please provide me with commands that I can run to reproduce
the panics you are seeing?  I see one listed above, but it looks like
recently you were able to cause the problem to manifest more easily
and consistently.

I would like to know something about the kind of AoE target you have.
Tom Coughlan mentions that it is a Coraid AoE storage box.  Could you
please provide the "sos" output from that box?  You can email it to
me, since it is more than a screenful.  If you have questions about
using CEC or the serial console, you can email support.

Information from the AoE initiator would complete the picture.  Can
you please send the file resulting from a run of the "sos-linux"
script?  It is available at the following URL.

  http://www.coraid.com/support/linux/sos-linux

I can try to replicate the problem in our lab here, but if you are
willing to test out patches on your system, I could send them to you
in order first to diagnose and then to fix the problem.

I appreciate the work you have done in characterizing this problem.  I
am also glad that Tom Coughlan brought this issue to my attention.
Thank you both.

Comment 10 Corey Marthaler 2008-06-20 16:06:43 UTC

I have verified that this issue is fixed with the latest aoe driver (aoe6-62).

Comment 11 Ed Cashin 2008-06-20 17:12:49 UTC

OK.  If the aoe6-62 driver doesn't have this panic, will
RHEL use the aoe6-62 driver, or should I attempt to find
and backport the fix?

We do regularly push updates upstream to kernel.org, but
I am running behind on the latest push.  In other words,
I know that if RHEL puts aoe6-62 in RHEL now, the upstream
will catch up, but I cannot say when.

Comment 12 Corey Marthaler 2008-06-23 16:38:26 UTC

That is a good question. Tom, how do we get the aoe6-62 driver into RHEL5 asap?

Comment 13 Tom Coughlan 2008-06-23 19:19:10 UTC

(In reply to comment #11)
> OK.  If the aoe6-62 driver doesn't have this panic, will
> RHEL use the aoe6-62 driver, or should I attempt to find
> and backport the fix?

The highest priority at this stage in RHEL 5 is to avoid regressions. So,
ideally, you would find and backport the specific fix. 

We have some latitude here, though.  If you can make a convincing case that the
risk of a larger update is low and the benefit is large, we can look at it. I
would not be in favor of shipping a version of the driver in RHEL before it has
gotten some significant review and testing upstream, and Fedora. 

Please take a look at the diff between 5.2 and recent driver versions. If you
can isolate the fix, that would be great. If not, suggest a driver version that
has had some upstream exposure, and the smallest amount of change that is likely
to have the fix. Then maybe Corey can test that and see if it has the fix.

Comment 14 Ed Cashin 2008-06-24 20:39:38 UTC

The diff between the aoe driver in 2.6.18, which is aoe6-22,
and the one that is aoe6-62, is huge.  Besides bug fixes, there
have been many new features added.

To identify and backport the fix, I would need to be able to
replicate the problem or to work very closely with Corey Marthaler.

For replicating the problem, I just need to know the software
versions involved and the most simple commands that trigger
a panic.

For working with C.M., 

  * I would provide patches to C.M.'s kernel sources,
  * C.M. would apply the patches and build a modified aoe driver,
  * C.M. would install modified aoe driver, and run the commands,
  * C.M. would send me kernel messages, e.g., from netconsole,
  * I would evaluate the gathered information,

... and then we'd repeat with the next round of patches.
If this loop can iterate quickly, it should not take very
long to identify and backport the fix.

Comment 15 Corey Marthaler 2008-06-24 21:06:39 UTC

The commands that I ran for this can be boiled down pretty easily.

1. Create one of the following with your aoe devices (an lvm
snapshot/mirror/stripe).

For a snapshot:
# pvcreate /dev/etherd/e1p[123]
# vgcreate vg /dev/etherd/e1p[123]
# lvcreate -L 4G -n origin vg
# lvcreate -s vg/origin -L 1G -n snap

2. Run some kind of block level I/O to that snap volume (/dev/vg/snap).  I used
our tool b_iogen/b_doio, but I assume a dd would work as well.

3. That's it, you should have triggered that panic.

Comment 16 Ed Cashin 2008-06-25 19:45:23 UTC

Can you please confirm that the command below can
trigger a panic?

  dd if=/dev/vg/snap of=/dev/null bs=1M count=1000

Comment 17 Ed Cashin 2008-06-25 19:57:16 UTC

Also, could you please email me the file that
results when you run this sos-linux script,

  http://www.coraid.com/support/linux/sos-linux

?  I would like to have more specific information
about your system in case I have trouble reproducing
your problem.  My initial attempts in a VMware
instance running a RHEL clone are not causing a
panic.  I'm working on getting RHEL set up for testing.

Comment 18 Ed Cashin 2008-06-25 20:14:34 UTC

Also, does the panic only occur when you have
created partitions on the aoe device(s)?  What
kind of partition table are you using---fdisk
or GPT?

Comment 19 Corey Marthaler 2008-06-26 22:06:15 UTC

We've narrowed this down to direct I/O. I can't reproduce this using dd (even
with the iflag=direct). However, here is a brain dead program that only does a
direct read. It causes the panic every time.

Also, our aoe device had been partitioned using GPT labels.
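For readers without access to the attachment, the following is a minimal sketch of what a single direct read looks like in C. It is not the attached reproducer: the helper name `direct_read_once`, the 4096-byte alignment, and the error handling are all assumptions made for illustration. The `__O_DIRECT` fallback is a glibc-internal name, used here only so the sketch still compiles if a header was pulled in before `_GNU_SOURCE` was defined.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Fallback in case features were fixed before _GNU_SOURCE was seen
 * (assumption: glibc, which provides __O_DIRECT internally). */
#if !defined(O_DIRECT) && defined(__O_DIRECT)
#define O_DIRECT __O_DIRECT
#endif

/* Hypothetical helper, NOT the attached program: perform one O_DIRECT
 * read of `len` bytes from the start of `path`.
 * Returns the number of bytes read, or -1 on any failure. */
ssize_t direct_read_once(const char *path, size_t len)
{
    void *buf = NULL;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* O_DIRECT requires an aligned buffer; 4096 is an assumed alignment
     * that covers common logical block sizes. */
    if (posix_memalign(&buf, 4096, len) != 0) {
        close(fd);
        return -1;
    }

    n = read(fd, buf, len);

    free(buf);
    close(fd);
    return n < 0 ? -1 : n;
}
```

Calling `direct_read_once("/dev/vg/snap", 1 << 20)` from a small `main` would issue one direct read against the snapshot volume from comment 15. Whether a read of that size is enough to trigger the panic is not something this thread confirms.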

Comment 20 Corey Marthaler 2008-06-26 22:10:22 UTC

Created attachment 310391
program to repo this panic on v22

Comment 21 Corey Marthaler 2008-06-26 22:22:49 UTC

Created attachment 310392
Here is the output file you requested

Comment 22 Ed Cashin 2008-07-05 15:48:55 UTC

Created attachment 311070
aoe: use bio->bi_idx to access biovecs

The attached patch causes the aoe driver to use the bio's
bi_idx field when accessing the biovecs.  The test case
from Corey Marthaler panics consistently without this patch,
but the change in the patch eliminates the panic.

This patch was created using the standalone aoe driver,
(also version aoe6-22) from the Coraid website.  To use it
with the standalone driver requires that the second argument
to skb_linearize be removed as it is in the RHEL 5.2 kernel
sources.

With a "-p2" level, the patch is expected to apply
cleanly to the RHEL kernel sources.

Just in case the Mac I'm using does something
strange to the patch, I've made it available here,
as well:

http://noserose.net/e/temp/aoe6-22-22i.diff
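
The effect of the change described in this comment can be illustrated with a simplified user-space model. This is a sketch only: `mock_bio` and `mock_bio_vec` are hypothetical stand-ins that keep just the `bi_io_vec`, `bi_vcnt`, and `bi_idx` fields of the kernel structures, and the two helper functions are invented names, not driver code. The thread does not spell out why `bi_idx` was nonzero in the failing case; the model only shows what differs when it is.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's struct bio_vec and struct bio.
 * Only the fields discussed in this bug are modeled. */
struct mock_bio_vec {
    unsigned int bv_len;              /* bytes in this segment */
};

struct mock_bio {
    struct mock_bio_vec *bi_io_vec;   /* base of the segment array */
    unsigned short bi_vcnt;           /* number of entries in the array */
    unsigned short bi_idx;            /* first entry this bio still covers */
};

/* Behaviour before the patch: always start at the base of the array,
 * regardless of bi_idx. */
struct mock_bio_vec *first_vec_unpatched(struct mock_bio *bio)
{
    return bio->bi_io_vec;
}

/* Behaviour after the patch: start at the entry selected by bi_idx. */
struct mock_bio_vec *first_vec_patched(struct mock_bio *bio)
{
    assert(bio->bi_idx < bio->bi_vcnt);
    return bio->bi_io_vec + bio->bi_idx;
}
```

With `bi_idx == 0` the two helpers return the same pointer, so the difference only shows up for bios that start partway into their segment array.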

Comment 23 Ed Cashin 2008-07-05 17:16:10 UTC

I should have asked: Please try out the patch,
 "aoe: use bio->bi_idx to access biovecs", and let
me know how it works for you as soon as you can.
I understand there's a RHEL deadline coming up
at the end of this month, when I expect to be quite
busy.

Comment 24 Tom Coughlan 2008-07-11 21:44:30 UTC

(In reply to comment #23)

> I understand there's a RHEL deadline coming up
> at the end of this month, when I expect to be quite
> busy.

Thanks for isolating the patch. 

The RHEL 5.2 deadline was quite a while ago, and we are just beginning
development on 5.3, so we have a while. 

The BZ was marked urgent because it was thought to be a regression in 5.2. I'm
not sure that is true, since the driver did not change in 5.2. 

I'll request Corey test this by setting NEEDINFO. (The BZ should not be in the
VERIFIED state, anyway, because the patch is not in RHEL 5 yet.)

I'll also ask Chip to handle this from here. :)

Tom

Comment 25 Ed Cashin 2008-07-14 14:17:55 UTC

Thank you.  Yes, I thought it was odd that it was
being called a regression, since there were no new
changes.

Please let me know if I can be of further assistance.

Comment 26 Corey Marthaler 2008-08-27 16:01:09 UTC

After once again reproducing this bz on 2.6.18-92.el5, I was unable to reproduce it on the newly built kernel with the fix in it (2.6.18-105.el5.bz440506).

Comment 27 Tom Coughlan 2008-08-28 19:04:29 UTC

(In reply to comment #22)
> Created an attachment (id=311070)
> aoe: use bio->bi_idx to access biovecs

Ed, 

Why did you do it this way

-     buf->bv = buf->bio->bi_io_vec;
+     buf->bv = buf->bio->bi_io_vec + buf->bio->bi_idx;

rather than the way it is done upstream

-       buf->bv = buf->bio->bi_io_vec;
+       buf->bv = &bio->bi_io_vec[bio->bi_idx];

?

Tom

Comment 28 Ed Cashin 2008-08-28 19:15:48 UTC

I think I just saw what needed to be done, did it, and tested it. Only later did I notice that I had used a different idiom in the past, but the two versions are equivalent.
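
The equivalence of the two idioms follows from C's definition of subscripting: `&base[idx]` and `base + idx` name the same address. A small sketch makes that checkable in user space; `mock_vec` and the three function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a biovec entry. */
struct mock_vec {
    unsigned int len;
};

/* Idiom from the attached patch: pointer plus index. */
struct mock_vec *start_by_addition(struct mock_vec *base, unsigned short idx)
{
    return base + idx;
}

/* Idiom quoted from upstream: address of the indexed element. */
struct mock_vec *start_by_subscript(struct mock_vec *base, unsigned short idx)
{
    return &base[idx];
}

/* Returns 1 when both idioms agree for every index in [0, n), else 0. */
int idioms_agree(struct mock_vec *base, unsigned short n)
{
    unsigned short i;

    assert(base != NULL);
    for (i = 0; i < n; i++)
        if (start_by_addition(base, i) != start_by_subscript(base, i))
            return 0;
    return 1;
}
```

The choice between the two forms is therefore a matter of style, which matches the answer given above.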

Comment 29 Don Zickus 2008-09-09 21:15:36 UTC

in kernel-2.6.18-109.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 35 errata-xmlrpc 2009-01-20 19:48:45 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
