Description of problem:

I was running block level I/O to snapshot volumes (on aoe devices) and noticed the following:

Apr  3 15:37:03 hayes-03 qarshd[25811]: Running cmdline: lvs --noheadings -o lv_attr snapper/origin
Unable to handle kernel NULL pointer dereference at 0000000000000000
RIP: [<ffffffff884f61e7>] :aoe:aoecmd_work+0x1b7/0x268
PGD 21a54c067 PUD 1eb7a1067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:02:00.0/irq
CPU 3
Modules linked in: aoe autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlid
Pid: 0, comm: swapper Not tainted 2.6.18-85.el5 #1
RIP: 0010:[<ffffffff884f61e7>]  [<ffffffff884f61e7>] :aoe:aoecmd_work+0x1b7/0x268
RSP: 0018:ffff8101239f7d00  EFLAGS: 00010046
RAX: ffff81009b4d0df0 RBX: ffff8101079d2df0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: ffff81009b4d0e00 RDI: 0000000000000000
RBP: ffff8101083d5c00 R08: ffff81021faad02a R09: ffffffffffffffff
R10: ffff81011fc4c038 R11: 0000000000000000 R12: 0000000000000000
R13: ffff81011f9f6800 R14: ffff8101083d5bc0 R15: 0000000000000400
FS:  00002aaaaaab98e0(0000) GS:ffff8101239cf8c0(0000) knlGS:00000000f7e5aac0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000002158a4000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011fc2c000, task ffff8101239d8820)
Stack:  ffffffff804c7bc0 ffff81021c0366c0 ffff81011f9f6800 ffff81021faad012
 0000000000000001 ffffffff804c7bc0 ffff8101079d2df0 ffffffff884f6924
 ffff81012384afa0 ffff81011f9f6f18 ffff8101083d5bc0 0000000000000286
Call Trace:
 <IRQ>  [<ffffffff884f6924>] :aoe:aoecmd_ata_rsp+0x49f/0x4e7
 [<ffffffff8008b743>] rebalance_tick+0x183/0x3cc
 [<ffffffff80142d27>] __next_cpu+0x19/0x28
 [<ffffffff884f7149>] :aoe:aoenet_rcv+0x117/0x156
 [<ffffffff8002015c>] netif_receive_skb+0x330/0x3ae
 [<ffffffff882be19a>] :tg3:tg3_poll+0x6ed/0x92f
 [<ffffffff8000c4c1>] net_rx_action+0xa4/0x1a4
 [<ffffffff882b8b02>] :tg3:tg3_interrupt_tagged+0xa2/0xb2
 [<ffffffff80011e47>] __do_softirq+0x5e/0xd6
 [<ffffffff8007810b>] end_level_ioapic_vector+0x9/0x16
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006c55e>] do_softirq+0x2c/0x85
 [<ffffffff8006c3e6>] do_IRQ+0xec/0xf5
 [<ffffffff8006ad28>] default_idle+0x0/0x50
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8006ad51>] default_idle+0x29/0x50
 [<ffffffff80048a90>] cpu_idle+0x95/0xb8
 [<ffffffff80076613>] start_secondary+0x45a/0x469

Code: 48 8b 0a 48 c1 e9 33 48 89 c8 48 c1 e8 09 48 8b 04 c5 00 27
RIP  [<ffffffff884f61e7>] :aoe:aoecmd_work+0x1b7/0x268
 RSP <ffff8101239f7d00>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception

NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: aoe autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlid
Pid: 0, comm: swapper Not tainted 2.6.18-85.el5 #1
RIP: 0010:[<ffffffff80064b60>]  [<ffffffff80064b60>] .text.lock.spinlock+0xe/0x30
RSP: 0018:ffff810123993e58  EFLAGS: 00000086
RAX: 0000000000000212 RBX: ffff810123968000 RCX: ffff81011f9f6ee8
RDX: 00000000000000c8 RSI: 10c1080c78dbebc0 RDI: ffff81011f9f6f18
RBP: ffff81011f9f6800 R08: 000000000c4cea80 R09: 000000000000003f
R10: ffff81011fc4c008 R11: 0000000000000246 R12: ffffffff884f5bc9
R13: 0000000000000002 R14: ffff81011f9f6f18 R15: 000000000000012c
FS:  00002aaaaaab8dc0(0000) GS:ffff810103f99440(0000) knlGS:00000000f7e566c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bdc5c CR3: 0000000109cd2000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011fc02000, task ffff8101238dc7a0)
Stack:  ffffffff884f5bfa 0000000000000000 0000000000000000 000000000720c517
 0000000000000000 ffff81000100dfa0 ffff810123993ee8 0000000000000000
 0000000000000000 7ffffffffffffffe ffff810123993f48 ffffffff8008b743
Call Trace:
 <IRQ>  [<ffffffff884f5bfa>] :aoe:rexmit_timer+0x31/0x21f
 [<ffffffff8008b743>] rebalance_tick+0x183/0x3cc
 [<ffffffff884f5bc9>] :aoe:rexmit_timer+0x0/0x21f
 [<ffffffff80095183>] run_timer_softirq+0x133/0x1af
 [<ffffffff80011e47>] __do_softirq+0x5e/0xd6
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006c55e>] do_softirq+0x2c/0x85
 [<ffffffff8006ad28>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006ad51>] default_idle+0x29/0x50
 [<ffffffff80048a90>] cpu_idle+0x95/0xb8
 [<ffffffff80076613>] start_secondary+0x45a/0x469

Code: 83 3f 00 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe
Kernel panic - not syncing: nmi watchdog

Version-Release number of selected component (if applicable):
2.6.18-85.el5
lvm2-2.02.32-3.el5

How reproducible:
Only once so far
This is reproducible.
I hit this while running block level I/O to an lvm mirror on a single machine.
This smells like a regression and potentially a pretty big issue if we support aoe in rhel5.2.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
I've tried this with lvm stripes as well, and that also causes the panic. So this is somehow related to multiple-device aoe lvm volumes: I/O to snapshots, mirrors, and stripes causes the panic, but linears are fine. Here is the I/O I was running:

b_iogen -o -m random -f direct -i 500 -s write,writev -t1000b -T10000b -d /dev/hayes/lvol0 | b_doio -vD
I've also attempted this with a stripe, a mirror, and a snapshot all on the system. As long as I wrote only to the linear volume, everything was fine; as soon as I wrote to the snapshot, mirror, or stripe, it panicked.
Just a note that this is still occurring on 2.6.18-92.el5.
[This bugzilla post follows a similar direct email.]

I would like to help resolve the issue you are seeing when using AoE and LVM striping/snapshotting together. Can you please check whether the same problem is present with the aoe6-62 driver from the Coraid website?

http://www.coraid.com/support/linux/

I understand that RHEL might prefer to use the aoe driver in the 2.6.18 kernel, but knowing whether the current aoe driver exhibits the same behavior will help me to identify the bug.

Could you please provide me with commands that I can run to reproduce the panics you are seeing? I see one listed above, but it looks like you were recently able to cause the problem to manifest more easily and consistently.

I would also like to know something about the kind of AoE target you have. Tom Coughlan mentions that it is a Coraid AoE storage box. Could you please provide the "sos" output from that box? You can email it to me, since it is more than a screenful. If you have questions about using CEC or the serial console, you can email support.

Information from the AoE initiator would complete the picture. Can you please send the file resulting from a run of the "sos-linux" script? It is available at the following URL:

http://www.coraid.com/support/linux/sos-linux

I can try to replicate the problem in our lab here, but if you are willing to test out patches on your system, I could send them to you in order first to diagnose and then to fix the problem.

I appreciate the work you have done in characterizing this problem. I am also glad that Tom Coughlan brought this issue to my attention. Thank you both.
I have verified that this issue is fixed with the latest aoe driver (aoe6-62).
OK. If the aoe6-62 driver doesn't have this panic, will RHEL use the aoe6-62 driver, or should I attempt to find and backport the fix? We do regularly push updates upstream to kernel.org, but I am running behind on the latest push. In other words, I know that if RHEL puts aoe6-62 in RHEL now, the upstream will catch up, but I cannot say when.
That is a good question. Tom, how do we get the aoe6-62 driver into RHEL5 asap?
(In reply to comment #11)
> OK. If the aoe6-62 driver doesn't have this panic, will
> RHEL use the aoe6-62 driver, or should I attempt to find
> and backport the fix?

The highest priority at this stage in RHEL 5 is to avoid regressions. So, ideally, you would find and backport the specific fix.

We have some latitude here, though. If you can make a convincing case that the risk of a larger update is low and the benefit is large, we can look at it. I would not be in favor of shipping a version of the driver in RHEL before it has gotten significant review and testing upstream and in Fedora.

Please take a look at the diff between 5.2 and recent driver versions. If you can isolate the fix, that would be great. If not, suggest a driver version that has had some upstream exposure, and the smallest amount of change that is likely to contain the fix. Then maybe Corey can test that and see if it has the fix.
The diff between the aoe driver in 2.6.18, which is aoe6-22, and aoe6-62 is huge. Besides bug fixes, many new features have been added.

To identify and backport the fix, I would need to be able to replicate the problem or to work very closely with Corey Marthaler.

For replicating the problem, I just need to know the software versions involved and the simplest commands that trigger a panic.

For working with C.M.:

 * I would provide patches against C.M.'s kernel sources,
 * C.M. would apply the patches and build a modified aoe driver,
 * C.M. would install the modified aoe driver and run the commands,
 * C.M. would send me the kernel messages, e.g., from netconsole,
 * I would evaluate the gathered information,

... and then we'd repeat with the next round of patches. If this loop can iterate quickly, it should not take long to identify and backport the fix.
The commands that I ran for this can be boiled down pretty easily.

1. Create one of the following with your aoe devices (an lvm snapshot/mirror/stripe). For a snapshot:

# pvcreate /dev/etherd/e1p[123]
# vgcreate vg /dev/etherd/e1p[123]
# lvcreate -L 4G -n origin vg
# lvcreate -s vg/origin -L 1G -n snap

2. Run some kind of block level I/O to that snapshot volume (/dev/vg/snap). I used our tool b_iogen/b_doio, but I assume a dd would work as well.

3. That's it; you should have triggered the panic.
Can you please confirm that the command below can trigger a panic?

dd if=/dev/vg/snap of=/dev/null bs=1M count=1000
Also, could you please email me the file that results when you run the sos-linux script at http://www.coraid.com/support/linux/sos-linux? I would like more specific information about your system in case I have trouble reproducing your problem. My initial attempts in a VMware instance running a RHEL clone have not caused a panic. I'm working on getting RHEL set up for testing.
Also, does the panic only occur when you have created partitions on the aoe device(s)? What kind of partition table are you using, fdisk or GPT?
We've narrowed this down to direct I/O. I can't reproduce this using dd (even with iflag=direct). However, here is a brain-dead program that only does a direct read. It causes the panic every time. Also, our aoe device had been partitioned using GPT labels.
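The attachment itself is not reproduced in this report, but a minimal sketch of such a direct-read program might look like the following. The device path /dev/vg/snap and the 4096-byte alignment are assumptions for illustration, not taken from the attachment:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/vg/snap"; /* assumed path */
    size_t len = 4096;  /* one page; a multiple of the sector size */
    void *buf;
    int fd;

    /* O_DIRECT requires a suitably aligned buffer. */
    if (posix_memalign(&buf, 4096, len)) {
        perror("posix_memalign");
        return 1;
    }

    fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* A single direct read is reportedly enough to trigger the oops. */
    if (read(fd, buf, len) < 0)
        perror("read");

    close(fd);
    free(buf);
    return 0;
}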
Created attachment 310391 [details] program to repro this panic on v22
Created attachment 310392 [details] Here is the output file you requested
Created attachment 311070 [details] aoe: use bio->bi_idx to access biovecs

The attached patch makes the aoe driver use the bio's bi_idx field when accessing the biovecs. The test case from Corey Marthaler panics consistently without this patch; with the change in the patch, the panic is eliminated.

This patch was created against the standalone aoe driver (also version aoe6-22) from the Coraid website. To use it with the standalone driver, the second argument to skb_linearize must be removed, as it is in the RHEL 5.2 kernel sources. With a "-p2" level, the patch is expected to apply cleanly to the RHEL kernel sources.

Just in case the Mac I'm using does something strange to the patch, I've made it available here as well:

http://noserose.net/e/temp/aoe6-22-22i.diff
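For anyone reading along, the shape of the change is roughly as follows. This is a simplified sketch with stand-in struct definitions and a hypothetical function name (the field names follow the 2.6.18-era struct bio), not the verbatim driver code:

/* Stand-in definitions for illustration only; the real ones live
 * in the kernel headers and in the aoe driver. */
struct bio_vec { void *bv_page; unsigned bv_len, bv_offset; };
struct bio     { struct bio_vec *bi_io_vec; unsigned short bi_idx; };
struct buf     { struct bio *bio; struct bio_vec *bv; };

static void aoe_setup_buf(struct buf *buf)  /* hypothetical name */
{
    /* The old code started at the first biovec unconditionally:
     *     buf->bv = buf->bio->bi_io_vec;
     * When a stacked driver hands down a bio whose bi_idx has
     * already advanced past completed biovecs, that start point
     * is wrong. Honoring bi_idx starts at the current biovec: */
    buf->bv = buf->bio->bi_io_vec + buf->bio->bi_idx;
}

This would also be consistent with the observation that only snapshot/mirror/stripe volumes panicked, presumably because only those dm paths hand the driver bios with a nonzero bi_idx.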
I should have asked: Please try out the patch, "aoe: use bio->bi_idx to access biovecs", and let me know how it works for you as soon as you can. I understand there's a RHEL deadline coming up at the end of this month, when I expect to be quite busy.
(In reply to comment #23)
> I understand there's a RHEL deadline coming up
> at the end of this month, when I expect to be quite
> busy.

Thanks for isolating the patch. The RHEL 5.2 deadline was quite a while ago, and we are just beginning development on 5.3, so we have a while. The BZ was marked urgent because it was thought to be a regression in 5.2. I'm not sure that is true, since the driver did not change in 5.2.

I'll request Corey test this by setting NEEDINFO. (The BZ should not be in the VERIFIED state anyway, because the patch is not in RHEL 5 yet.) I'll also ask Chip to handle this from here. :)

Tom
Thank you. Yes, I thought it was odd that it was being called a regression, since there were no new changes. Please let me know if I can be of further assistance.
After once again reproducing this bz on 2.6.18-92.el5, I was unable to reproduce it on the newly built kernel with the fix in it (2.6.18-105.el5.bz440506).
(In reply to comment #22)
> Created an attachment (id=311070) [details]
> aoe: use bio->bi_idx to access biovecs

Ed,

Why did you do it this way:

-	buf->bv = buf->bio->bi_io_vec;
+	buf->bv = buf->bio->bi_io_vec + buf->bio->bi_idx;

rather than the way it is done upstream:

-	buf->bv = buf->bio->bi_io_vec;
+	buf->bv = &bio->bi_io_vec[bio->bi_idx];

?

Tom
I think I just saw what needed to be done, did it, and tested it; only later did I notice that I had used a different idiom than upstream. The two versions are equivalent.
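For the record, the equivalence is plain C pointer arithmetic: &a[i] is defined as a + i. A standalone check (using a stand-in struct, not the kernel's) demonstrates it:

#include <assert.h>
#include <stdio.h>

struct bv { void *page; unsigned len, offset; };  /* stand-in type */

int main(void)
{
    struct bv vec[4];
    int idx = 2;

    struct bv *p = vec + idx;   /* idiom used in the attached patch */
    struct bv *q = &vec[idx];   /* idiom used upstream              */

    assert(p == q);
    printf("same address: %p == %p\n", (void *)p, (void *)q);
    return 0;
}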
The fix is in kernel-2.6.18-109.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html