Bug 902595 - [multipath] kernel panic at IP: [<ffffffffa01a352c>] multipath_iterate_devices+0x3c/0xa0 [dm_multipath]
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Red Hat Kernel Manager
QA Contact: Storage QE
Blocks: 961662
 
Reported: 2013-01-22 03:48 UTC by Gris Ge
Modified: 2014-01-16 20:19 UTC
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2014-01-16 20:19:25 UTC



Description Gris Ge 2013-01-22 03:48:07 UTC
Description of problem:

When changing the device handler from scsi_dh_alua to scsi_dh_emc for EMC VNX LUNs,
we got a kernel panic:
===
<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
<1>IP: [<ffffffffa01a352c>] multipath_iterate_devices+0x3c/0xa0 [dm_multipath]
<4>PGD 2380a2067 PUD 2391e0067 PMD 0 
<4>Oops: 0000 [#1] SMP 
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:07:00.0/host10/rport-10:0-4/target10:0:4/10:0:4:1/block/sdk/dev
<4>CPU 1 
<4>Modules linked in: bfa bridge bnx2fc cnic uio fcoe libfcoe 8021q libfc garp stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf dm_round_robin ipv6 dm_multipath bna igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci scsi_transport_fc scsi_tgt dm_mirror dm_region_hash dm_log dm_mod scsi_dh_alua scsi_dh_rdac scsi_dh_emc [last unloaded: bfa]
<4>
<4>Pid: 10877, comm: multipath Tainted: G        W  ---------------    2.6.32-355.el6.x86_64 #1 HP ProLiant DL160 G6  
<4>RIP: 0010:[<ffffffffa01a352c>]  [<ffffffffa01a352c>] multipath_iterate_devices+0x3c/0xa0 [dm_multipath]
<4>RSP: 0018:ffff88023808bc78  EFLAGS: 00010213
<4>RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000028
<4>RDX: 0000000000000000 RSI: ffffffffa001be40 RDI: ffffc900124df040
<4>RBP: ffff88023808bcb8 R08: ffff88023c402400 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900124df040
<4>R13: ffff88023808bcd4 R14: ffffffffa001be40 R15: ffff88023808bcd4
<4>FS:  00007f4195cba7a0(0000) GS:ffff88002f620000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 0000000000000038 CR3: 00000002393ea000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process multipath (pid: 10877, threadinfo ffff88023808a000, task ffff880237788ae0)
<4>Stack:
<4> ffff880239393c38 0000000000000000 ffff88023808bce8 0000000000000000
<4><d> ffff88023ae24600 ffff88023808bcd4 0000000000000002 ffffffffffffffea
<4><d> ffff88023808bcf8 ffffffffa001bec3 ffffc90012071040 0000000000000000
<4>Call Trace:
<4> [<ffffffffa001bec3>] dm_table_has_no_data_devices+0x63/0x90 [dm_mod]
<4> [<ffffffffa001a9b8>] dm_swap_table+0x58/0x2e0 [dm_mod]
<4> [<ffffffffa001c168>] ? dm_table_postsuspend_targets+0x18/0x20 [dm_mod]
<4> [<ffffffffa001a70c>] ? dm_suspend+0x3c/0x290 [dm_mod]
<4> [<ffffffff81063310>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0020baf>] dev_suspend+0x12f/0x250 [dm_mod]
<4> [<ffffffffa00219d4>] ctl_ioctl+0x1b4/0x270 [dm_mod]
<4> [<ffffffffa0020a80>] ? dev_suspend+0x0/0x250 [dm_mod]
<4> [<ffffffffa0021aa3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
<4> [<ffffffff81194d42>] vfs_ioctl+0x22/0xa0
<4> [<ffffffff81194ee4>] do_vfs_ioctl+0x84/0x580
<4> [<ffffffff81195461>] sys_ioctl+0x81/0xa0
<4> [<ffffffff810dc565>] ? __audit_syscall_exit+0x265/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 0f 1f 44 00 00 48 8b 47 38 49 89 d7 49 89 fc 49 89 f6 48 8b 50 38 48 83 c0 38 48 89 45 c0 48 39 c2 48 89 55 c8 74 64 48 8b 45 c8 <48> 8b 58 38 49 89 c5 49 83 c5 38 4c 39 eb 75 0c eb 3a 66 90 48 
<1>RIP  [<ffffffffa01a352c>] multipath_iterate_devices+0x3c/0xa0 [dm_multipath]
<4> RSP <ffff88023808bc78>
<4>CR2: 0000000000000038
====

Will upload the kdump vmcore and dmesg in the next comment.

Version-Release number of selected component (if applicable):
kernel-2.6.32-355.el6.x86_64

How reproducible:
Not sure.

Steps to Reproduce:
1. Disable LUNZ (commpath) on EMC VNX/CX and enable ALUA (failover mode 4)
2. Use this configuration for multipath:
====
devices {
        # Device attributes for EMC CLARiiON ALUA  (failover mode 4)
        device {
                vendor                  "DGC"
                product                 "*"
                path_grouping_policy    group_by_prio
                prio                    alua
                hardware_handler        "1 alua"
                #features               #"1 queue_if_no_path"
                path_checker            tur
                no_path_retry           queue
                fast_io_fail_tmo        8
                dev_loss_tmo            999
                failback                immediate
                product_blacklist       "LUNZ"
        }
}
====
3. Start multipathd.
4. Remove all multipath maps via the command "multipath -F".
5. Change the EMC VNX/CX by enabling LUNZ.
6. Remove the configuration above so that the built-in EMC CX configuration is used.
7. Execute the command 'multipath -r' (a consolidated host-side sketch of these commands follows this list).
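
For convenience, a minimal host-side sketch of steps 3-7; the array-side LUNZ and failover-mode changes in steps 1, 5 and 6 still have to be made on the EMC side, and RHEL 6 style service management is assumed:
====
service multipathd start     # step 3: start multipathd with the config above
multipath -F                 # step 4: flush all existing multipath maps
# ... re-enable LUNZ on the array and drop the custom "DGC" device section (steps 5-6) ...
multipath -r                 # step 7: force a reload; this is where the panic was hit
====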

Actual results:
kernel panic.

Expected results:
no kernel panic

Additional info:

Comment 2 Mike Snitzer 2013-01-28 20:56:18 UTC
I did a bit of crash analysis:

 #8 [ffff88023808bbc0] page_fault at ffffffff81510045
    [exception RIP: multipath_iterate_devices+60]
    RIP: ffffffffa01a352c  RSP: ffff88023808bc78  RFLAGS: 00010213
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000028
    RDX: 0000000000000000  RSI: ffffffffa001be40  RDI: ffffc900124df040
    RBP: ffff88023808bcb8   R8: ffff88023c402400   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffc900124df040
    R13: ffff88023808bcd4  R14: ffffffffa001be40  R15: ffff88023808bcd4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

RDI is the dm_target structure passed to multipath_iterate_devices.

crash> dm_target ffffc900124df040
struct dm_target {
  features = 0, 
  table = 0xffff88023ae24600, 
  type = 0xffffffffa01a6820, 
  begin = 0, 
  len = 62914560, 
  split_io = 0, 
  num_flush_requests = 1, 
  num_discard_requests = 1, 
  private = 0xffff880239393c00, 
  error = 0xffffffffa00239d5 "Unknown error", 
  discards_supported = 0, 
  flush_supported = 0, 
  split_discard_requests = 0, 
  discard_zeroes_data_unsupported = 0
}

crash> multipath 0xffff880239393c00

doesn't yield memory that looks to be valid.

The ultimate NULL pointer is due to the pg->pgpaths dereference in multipath_iterate_devices; the faulting address matches the 0x38 offset of pgpaths within struct priority_group:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

crash> struct priority_group -o
struct priority_group {
   [0x0] struct list_head list;
  [0x10] struct multipath *m;
  [0x18] struct path_selector ps;
  [0x28] unsigned int pg_num;
  [0x2c] unsigned int bypassed;
  [0x30] unsigned int nr_pgpaths;
  [0x38] struct list_head pgpaths;
}

But again, I don't think ti->private is a valid multipath structure.

static int multipath_iterate_devices(struct dm_target *ti,
                                     iterate_devices_callout_fn fn, void *data)
{
        struct multipath *m = ti->private;
        struct priority_group *pg;
        struct pgpath *p;
        int ret = 0;

        list_for_each_entry(pg, &m->priority_groups, list) {
                list_for_each_entry(p, &pg->pgpaths, list) {
                        ret = fn(ti, p->path.dev, ti->begin, ti->len, data);
                        if (ret)
                                goto out;
                }
        }

out:
        return ret;
}

Comment 3 Mike Snitzer 2013-01-28 22:03:35 UTC
Gris,

I don't have access to a system to try to reproduce this yet.  It would be ideal if we could get comprehensive multipathd logging when you perform this sequence.  In particular we need to see the libdevmapper logging that shows the DM table line that is being passed down from multipathd to the kernel.

Given that you've flushed all multipath tables (via multipath -F), the multipath -r should just trigger the equivalent of starting anew.  But it could be that multipathd is keeping some state for these devices.  So in addition to getting multipathd logging from the original sequence described in comment#0, I'd recommend killing multipathd and restarting the service (instead of multipathd -r).  If everything then works fine, that at least tells us that multipathd's handling of this corner case is playing a role.
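
One way to approximate that logging without waiting for a table load to complete is a dry run at high verbosity; this is a hedged sketch, assuming this multipath-tools build supports the documented -d (dry run), -r and -v flags:
====
# Hedged sketch, not a verified recipe: with -d, multipath should print the maps
# (including the table params) it would create or reload, without actually loading them.
multipath -d -v3 -r
# Alternatively, run multipathd in the foreground at high verbosity so the table
# line is captured around the reload:
multipathd -d -v4
====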

Comment 4 Ben Marzinski 2013-01-28 23:05:21 UTC
At the default log level, multipathd will log the table information after it loads it.  multipath will instead pretty print the topology. Unfortunately, neither of these happen until after the table load completes, which is after the
panic. I can make some debug packages that will print out the table before multipath tries a create or reload.

Comment 5 Mike Snitzer 2013-01-28 23:32:45 UTC
(In reply to comment #4)
> At the default log level, multipathd will log the table information after it
> loads it.  multipath will instead pretty print the topology. Unfortunately,
> neither of these happen until after the table load completes, which is after
> the
> panic. I can make some debug packages that will print out the table before
> multipath tries a create or reload.

OK, that would be helpful.

Comment 11 Gris Ge 2013-02-18 08:08:06 UTC
Issue reproduced with two scripts running at the same time (it takes about 2 hours to hit this race issue):

1. emc_vnx_fcoe_target_port_link_up_down.sh
Brings all target ports on each SP of the EMC array down and back up with a random interval (100s - 300s).
====
for X in `seq 1 100`;do
    for SPX in SPA SPB;do
        for PORT in 0 1;do
            libsan_utils  -c link_down -a "emc_vnx_nay_${SPX}_FCoE${PORT}";
        done;
        sleep $(($RANDOM % 200 + 100));
        for PORT in 0 1;do
            libsan_utils  -c link_up -a "emc_vnx_nay_${SPX}_FCoE${PORT}";
        done;
        sleep $(($RANDOM % 200 + 100));
    done;
done
====

2. BZ_902595_alua_2_emc.sh, switching the multipath configuration between ALUA and EMC mode (a hedged sketch of such an EMC-mode device section follows this list):
   1. Disable LUNZ, using the ALUA (scsi_dh_alua) configuration.
   2. Disable LUNZ, using the EMC (scsi_dh_emc) configuration. (incorrect config)
   3. Enable LUNZ, using the EMC (scsi_dh_emc) configuration. <===== This is when the panic happens.
   4. Enable LUNZ, using the ALUA (scsi_dh_alua) configuration.
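
For reference, a hedged sketch of what the EMC-mode (scsi_dh_emc) device section referenced above typically looks like; the values mirror the stock "DGC" defaults shipped with device-mapper-multipath, not the internal test script:
====
devices {
        # Hedged example: EMC CLARiiON/VNX in non-ALUA (scsi_dh_emc) mode,
        # based on the stock DGC defaults rather than the BZ_902595 script.
        device {
                vendor                  "DGC"
                product                 "*"
                path_grouping_policy    group_by_prio
                prio                    emc
                hardware_handler        "1 emc"
                path_checker            emc_clariion
                no_path_retry           queue
                failback                immediate
                product_blacklist       "LUNZ"
        }
}
====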

Since BZ_902595_alua_2_emc.sh contains the password of our storage array, it is only available at an internal URL:
http://lacrosse.corp.redhat.com/~fge/tmp/BZ_902595/BZ_902595_alua_2_emc.sh

I will provide the debug log once the debug multipath package is rebuilt.


Will also try the suggestion from comment #3 (killing multipathd and restarting the service instead of the reload) and update later.

Comment 12 Gris Ge 2013-02-18 08:35:43 UTC
(In reply to comment #3)
> I'd recommend killing multipathd and restarting the service
> (instead of multipathd -r).  If all works fine that at least tells us that
> multipathd's handling of this corner case is playing a role in this.

Mike Snitzer,

I got another kernel panic when changing "multipath -r" to "service multipathd restart". It's Bug #912245.

The multipathd restart script is 
http://lacrosse.corp.redhat.com/~fge/tmp/BZ_902595/BZ_902595_alua_2_emc_multipathd_restart.sh

Thanks.

Comment 13 Mike Snitzer 2013-02-19 08:20:43 UTC
Do you have an earlier RHEL6.4 kernel that actually worked well with these various multipathd restart tests?

It would be useful to attempt to isolate when you started seeing these problems.  The nature of the failures is quite different each time (the crash from comment#0 as compared to bug#912245).

Comment 14 Mike Snitzer 2013-02-19 08:27:58 UTC
Also, are all these crashes occurring on the same host?  If so, have any of these crashes been reproduced on a different host?

Comment 15 Gris Ge 2013-02-20 08:01:16 UTC
Triggered the same crash signature (multipath_iterate_devices+0x3c/0xa0) on kernel -279 (RHEL 6.3 GA) using "service multipathd restart".
http://lacrosse.corp.redhat.com/~fge/tmp/BZ_902595/kernel-279/

When using "multipath -r" test script (BZ_902595_alua_2_emc.sh) on kernel -279, "multipath -r" will hang.

Yes, that's on the same host. I will try another host.

Let me know if you still need me to bisect on older kernels.

Comment 16 Gris Ge 2013-02-26 14:05:21 UTC
Mike Snitzer,

On a different host (qla2xxx FC HBA; the previous crashes in this bug were on a host with a bfa FCoE HBA):
"multipath -r" does not panic the kernel (I ran it for 10 hours, about 1000 iterations).
"service multipathd restart" still crashes the kernel:
https://bugzilla.redhat.com/show_bug.cgi?id=912245#c4

Comment 19 Mike Snitzer 2014-01-16 20:19:25 UTC
The patches I worked on to address this BZ never got included upstream:
http://www.redhat.com/archives/dm-devel/2013-April/msg00039.html
http://www.redhat.com/archives/dm-devel/2013-April/msg00040.html

And I followed up with:
http://www.redhat.com/archives/dm-devel/2013-April/msg00126.html

The end of that last message stated:
"I'm now inclined to not care about this issue.  Take away is: don't
switch the device handler (attach the correct one from the start)."

That may not be a satisfying conclusion, but with the scsi_dh attachment fixes/changes that went into RHEL6, users really shouldn't need to change the scsi_dh: the correct scsi_dh should be attached from the beginning.
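
For anyone hitting this, a quick way to confirm which handler is attached before any multipath maps are built; this is a hedged sketch, assuming the scsi_dh "dh_state" sysfs attribute of 2.6.32-era kernels is present (sdk is just the example device from the oops above):
====
# Hedged sketch: assumes this kernel exposes the scsi_dh "dh_state" attribute;
# it prints the attached handler (e.g. "alua" or "emc") or "detached".
cat /sys/block/sdk/device/dh_state
# The loaded handler modules can be listed with:
lsmod | grep scsi_dh
====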

