Bug 481633 - Kernel Panics Observed During Multipath Storage Failover/Failback
Summary: Kernel Panics Observed During Multipath Storage Failover/Failback
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 5.4
Assignee: Mike Christie
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 459808 480792 483784
Reported: 2009-01-26 20:17 UTC by joseph.r.gruher
Modified: 2010-01-12 02:46 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:47:28 UTC
Target Upstream Version:
Embargoed:


Attachments
kernel-2.6.18-120_INTEL_ALUA_MPIO_SUPPORT.el5.x86_64.rpm (16.42 MB, application/octet-stream)
2009-01-26 20:45 UTC, joseph.r.gruher
mpath_prio_intel-1.0.0140-3.x86_64.rpm (8.56 KB, application/octet-stream)
2009-01-26 20:48 UTC, joseph.r.gruher
zip file containing console logs of kernel panics (66.09 KB, application/x-zip-compressed)
2009-01-26 23:33 UTC, joseph.r.gruher
hacky patch to always throw in alua (617 bytes, patch)
2009-02-03 20:55 UTC, Mike Christie
clear request before using (345 bytes, patch)
2009-02-03 21:44 UTC, Mike Christie
new build with scsi_dh_alua's dmesg (45.36 KB, text/plain)
2009-02-10 18:20 UTC, ilgu hong


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:1377 0 normal SHIPPED_LIVE device-mapper-multipath bug-fix and enhancement update 2009-09-01 12:41:23 UTC

Description joseph.r.gruher 2009-01-26 20:17:47 UTC
Description of problem:
Testing on Intel Modular Server with dual redundant storage controllers shows intermittent kernel panics during storage controller failover/failback testing.

How reproducible:
This reliably happens about 1 in 20 failover/failback cycles on RHEL5.3 RC2.

Steps to Reproduce:
1. Install RHEL5.3 RC2 x64 on Intel Modular Server with dual redundant storage controllers and configure multipath storage (see attached patches and BKM)
2. Perform storage controller failover/failback by alternately resetting each storage controller
3. Watch for kernel panic
  
Actual results:
Kernel panic experienced about 1 in 20 failover/failback cycles.

Expected results:
Kernel panic should never be caused by failover/failback.

More details on kernel panic will be posted momentarily.

Comment 1 joseph.r.gruher 2009-01-26 20:27:53 UTC
Note this bug BLOCKS support of RHEL5.3 on Intel Modular Server product line.

Comment 2 Mike Christie 2009-01-26 20:31:43 UTC
Is this the box using ALUA? Is it using the scsi_dh_alua module in this setup?

Comment 3 joseph.r.gruher 2009-01-26 20:40:40 UTC
Procedure for Configuring MPIO on RHEL 5.3

1.	Install RHEL 5.3, with ‘linux mpath’ typed immediately when the CD is chosen for installation.  Make sure ‘Virtualization’ is not selected when prompted to choose the packages to load.
2.	Install the patched kernel using rpm -ivh --force <rpm name as given in the link above>: kernel-2.6.18-120_INTEL_ALUA_MPIO.el5.i686.rpm for 32-bit RHEL, or kernel-2.6.18-120_INTEL_ALUA_MPIO_SUPPORT.el5.x86_64.rpm for 64-bit RHEL.
3.	Reboot, and make sure the new kernel (installed above from the .rpm) is selected for boot.
4.	Run ‘uname -a’ and check that the kernel name is the same as the RPM that was installed.
5.	Then install mpath_prio_intel-1.0.0140-4.i386.rpm for 32-bit or mpath_prio_intel-1.0.0140-4.x86_64.rpm for 64-bit.
6.	Create multipath.conf under /etc/ and edit it to contain the entries shown below.
7.	Reboot and make sure the machine boots up fine.
8.	Run ‘multipath -ll’ to see the devices configured as multipath devices.
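
The check in step 4 can be scripted. The following is a minimal sketch, not part of the original procedure. It assumes the usual kernel RPM naming of kernel-<version>-<release>.<arch>.rpm and that `uname -r` on RHEL 5 reports <version>-<release> without the architecture; the function names are made up for illustration.

```python
import re


def kernel_release_from_rpm(rpm_name):
    """Return the <version>-<release> part of a kernel RPM file name.

    Assumes the name has the form kernel-<version>-<release>.<arch>.rpm.
    """
    m = re.match(r"^kernel-(?P<verrel>.+)\.(?P<arch>[^.]+)\.rpm$", rpm_name)
    if m is None:
        raise ValueError("not a kernel RPM file name: %r" % rpm_name)
    return m.group("verrel")


def running_kernel_matches(rpm_name, uname_r):
    """True if the `uname -r` string equals the release encoded in the RPM name."""
    return uname_r == kernel_release_from_rpm(rpm_name)


if __name__ == "__main__":
    import platform

    rpm = "kernel-2.6.18-120_INTEL_ALUA_MPIO_SUPPORT.el5.x86_64.rpm"
    # platform.release() returns the same string as `uname -r`.
    print(running_kernel_matches(rpm, platform.release()))
```

On a machine booted into the patched 64-bit kernel this would be expected to print True; on any other kernel it prints False.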


Description of /etc/multipath.conf

defaults {
        user_friendly_names yes
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^(hd|xvd)[a-z][[0-9]*]"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}

devices {
        device {
                vendor                  "Intel"
                product                 "Multi-Flex"
                path_grouping_policy    "group_by_prio"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_intel /dev/%n"
                path_checker            tur
                path_selector           "round-robin 0"
                hardware_handler        "1 alua"
                failback                immediate
                rr_weight               uniform
                rr_min_io               100
                no_path_retry           queue
                features                "1 queue_if_no_path"
        }
}
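
To see which device nodes the first blacklist rule above excludes, here is a small sketch. It uses Python's `re` module as a stand-in for the POSIX regular expressions that multipath-tools evaluates, and it covers only the first `devnode` rule; the helper name `is_blacklisted` is hypothetical.

```python
import re

# First devnode rule from the blacklist section, copied verbatim.
FIRST_RULE = re.compile(r"^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*")


def is_blacklisted(devnode):
    """True if the device node name matches the first blacklist rule."""
    return FIRST_RULE.search(devnode) is not None


if __name__ == "__main__":
    for name in ["dm-0", "md0", "ram10", "sr0", "sda", "sdd", "sdf"]:
        state = "blacklisted" if is_blacklisted(name) else "candidate path"
        print("%s: %s" % (name, state))
```

Under this rule the dm-*, md*, ram* and sr* nodes are skipped, while the sd* disks presented by the Multi-Flex controllers remain candidates for multipathing.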

Comment 4 joseph.r.gruher 2009-01-26 20:45:50 UTC
Created attachment 330026 [details]
kernel-2.6.18-120_INTEL_ALUA_MPIO_SUPPORT.el5.x86_64.rpm

Comment 5 joseph.r.gruher 2009-01-26 20:48:23 UTC
Created attachment 330027 [details]
mpath_prio_intel-1.0.0140-3.x86_64.rpm

Comment 6 joseph.r.gruher 2009-01-26 23:30:39 UTC
Hi Mike, in response to your question, yes, this is with scsi_dh_alua.

Please also note the kernel panic failures are new to us in RHEL5.3 RC2.  Earlier testing (such as against beta and snapshot2) did not show the kernel panic problems.

Comment 7 joseph.r.gruher 2009-01-26 23:33:26 UTC
Created attachment 330042 [details]
zip file containing console logs of kernel panics

Comment 8 Mike Christie 2009-01-27 17:52:41 UTC
(In reply to comment #6)
> Hi Mike, in response to your question, yes, this is with scsi_dh_alua.
> 
> Please also note the kernel panic failures are new to us in RHEL5.3 RC2. 
> Earlier testing (such as against beta and snapshot2) did not show the kernel
> panic problems.


There were several bugs in scsi_dh_alua so we ended up dropping it from the final release. In some email or one of the other BZs about this or the call, we told you that we were shooting for RHEL 5.4 with this. We were only going to tech preview this for 5.3, but due to its instability we could not do that.

What is up with this BZ vs the others that you guys made? Didn't you guys make a bugzilla or feature request shooting for 5.4 support for scsi_dh_alua?

Comment 9 Mike Christie 2009-01-27 18:06:53 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > Hi Mike, in response to your question, yes, this is with scsi_dh_alua.
> > 
> > Please also note the kernel panic failures are new to us in RHEL5.3 RC2. 
> > Earlier testing (such as against beta and snapshot2) did not show the kernel
> > panic problems.
> 
> 
> There were several bugs in scsi_dh_alua so we ended up dropping it from the
> final release. In some email or one of the other BZs about this or the call, we
> told you that we were shooting for RHEL 5.4 with this. We were only going to
> tech preview this for 5.3, but due to its instability we could not do that.
> 
> What is up with this BZ vs the others that you guys made? Didn't you guys make
> a bugzilla or feature request shooting for 5.4 support for scsi_dh_alua?

I am a little lost on what is going on, because it seemed like you are using RHEL 5.3 GA in some other comments. Are you guys distributing the scsi_dh_alua module yourselves, because we dropped it, and so this bugzilla is just asking for help with the version you guys are distributing? If so, could you attach it here?

Also did you guys send any of these boxes to Red Hat ever?

Comment 10 Mike Christie 2009-01-27 18:26:38 UTC
Did you guys want to do a real quick call so we can sync up on all this?

Comment 11 Mike Christie 2009-01-27 18:55:14 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > Hi Mike, in response to your question, yes, this is with scsi_dh_alua.
> > 
> > Please also note the kernel panic failures are new to us in RHEL5.3 RC2. 
> > Earlier testing (such as against beta and snapshot2) did not show the kernel
> > panic problems.
> 
> 
> There were several bugs in scsi_dh_alua so we ended up dropping it from the
> final release. In some email or one of the other BZs about this or the call, we
> told you that we were shooting for RHEL 5.4 with this. We were only going to
> tech preview this for 5.3, but due to its instability we could not do that.
> 

Oh yeah, one clarification so people's heads do not explode :)

The only thing we did not ship is our own scsi_dh_alua module.

We did ship the underlying scsi_dh infrastructure so you can load in your own scsi_dh_alua module. For 5.3 you guys had to ship your own scsi_dh_alua module (you had to do this because we never got patches/request/bz/whatever-you-call-it to add your boxes to the default device table that attached the module to the device) and this will hook into our scsi_dh code fine.

Comment 12 ilgu hong 2009-02-03 18:51:37 UTC
Dear Mike,

We only used the patch for scsi_dh_alua which shipped with your kernel source rpm.

So we need your latest scsi_dh_alua code to make a patch.

I checked your RHEL 5.3 GA kernel source rpm.

If there is no change in scsi_dh_alua, we can use scsi_dh_alua by disabling Linux-2.6-scsi-remove-scsi_dh_alua.path (line number 3302 (patch23461)) in the kernel.spec file.

Can you confirm that we can get the latest scsi_dh_alua code with the procedure above?

Thanks.

Comment 13 Mike Christie 2009-02-03 19:02:17 UTC
(In reply to comment #12)
> Dear, Mike.
> 
> We only used the patch for scsi_dh_alua which shipped with your kernel source
> rpm.
> 
> So, we need your latest scsi_dh_alua code to make a patch.
> 

I am still working on this. I said Mon or Tues, so give me a couple extra hours :)

I was also waiting to hear back from upstream about the patches I sent. I will just send you what I have assuming it will be accepted.

For some of the other issues of using the new code:

I am still trying to find info on how to load it for the installer. Are you guys familiar with making driver disks? Is this what you guys were doing before?


And for the boot from the device it is crazy. It was one of the issues that prevented it from being a solid solution (it should be fixed for 5.4). To do this now you have to make an initramfs manually. Are you familiar with this operation or do you need instructions for that too?

Comment 14 ilgu hong 2009-02-03 19:27:36 UTC
Dear Mike,

Previously, we modified the initramfs ourselves, but I would like to get the information from you. It will be more helpful to me.
Thanks.

Comment 15 Mike Christie 2009-02-03 20:31:39 UTC
Here is some info about distributing your own driver:
http://dup.et.redhat.com/
You do not have to use it. You can build it however you want.

If you are going to use the disks attached to the box for partitions used during install you will want to build a driver disk
http://dup.et.redhat.com/ddiskit/
(see the README and INSTALL in the tarball for instructions).

When you boot the OS install disk, you pass it the driver disk argument. I think you type
linux dd
when the command prompt comes up initially (do a help if that does not work). The installer will ask you for a driver disk during the install.

If you have root on a disk accessed through your box or you are going to do multipath root with paths on the box, then you will have to make an initramfs/initrd for the boot. The problem is that mkinitrd did not get changed to be able to handle this for the scsi_dh modules in 5.3. I would basically take the mkinitrd script from 5.3 and hack in some code to just stick in your scsi_dh module, and then just distribute the modified mkinitrd with the other stuff. I will attach an example in the next comment.

Comment 16 Mike Christie 2009-02-03 20:55:23 UTC
Created attachment 330779 [details]
hacky patch to always throw in alua

Here is a really hacky patch to always do multipath and always throw in scsi_dh_alua.

I have no scripting skills. You guys can modify this better than me, but this gives you an idea of what you need to do.

Comment 17 ilgu hong 2009-02-03 21:32:47 UTC
Dear, Mike.

Thanks for your information.

Comment 18 Mike Christie 2009-02-03 21:44:47 UTC
Created attachment 330784 [details]
clear request before using

Could you try this patch with the scsi_dh_alua module from here:

http://people.redhat.com/mchristi/scsi_dh/rhel5.4/testing/0001-Add-scsi_dh_alua.patch

This patch is just the code that was reverted in that patch you referenced with your fixes integrated.

Comment 19 Mike Christie 2009-02-03 21:51:45 UTC
Oh yeah, could you also send the oops?

Comment 20 ilgu hong 2009-02-10 18:20:43 UTC
Created attachment 331447 [details]
new build with scsi_dh_alua's dmesg

This is the dmesg output for a RHEL 5.3 GA system with scsi_dh_alua.
It shows that it fails to configure multipath for the system device.
Please check, Mike.

Thanks.

Comment 21 ilgu hong 2009-02-10 18:22:45 UTC
Hi, Mike.

I built a kernel with the new patch and tried to configure multipath.
There is a problem configuring the boot device as a multipath device.
Can you help me?


These are the build steps.
1.	Patch mkinitrd with the patch (always-throw-in-alua.patch) which you gave me.
2.	Download the kernel source rpm.
3.	Copy the patches (0001-Add-scsi_dh_alua.patch, clear-everything.patch) to the source directory.
4.	Change the kernel.spec file to add these patches.
5.	Add CONFIG_SCSI_DH_ALUA to the .config file in the source directory.
6.	Build with rpmbuild.


On the test machine, I allocated 3 logical drives: one for the system (OS install) and two for data.

I then configured multipath.conf.
The multipath tools show that only the two data logical drives are configured as multipath devices.



<multipath -ll result>
multipath -ll
 mpath2 (222c6000155b1e629) dm-3 Intel,Multi-Flex
[size=5.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 0:0:0:2 sdc 8:32  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:1:2 sdf 8:80  [active][ready]
mpath1 (2221a000155fa44d4) dm-2 Intel,Multi-Flex
[size=5.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 0:0:1:1 sde 8:64  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:0:1 sdb 8:16  [active][ready]

<multipath -v4 result>
multipath -v4
dm-0: blacklisted
dm-1: blacklisted
dm-2: blacklisted
dm-3: blacklisted
md0: blacklisted
ram0: blacklisted
ram10: blacklisted
ram11: blacklisted
ram12: blacklisted
ram13: blacklisted
ram14: blacklisted
ram15: blacklisted
ram1: blacklisted
ram2: blacklisted
ram3: blacklisted
ram4: blacklisted
ram5: blacklisted
ram6: blacklisted
ram7: blacklisted
ram8: blacklisted
ram9: blacklisted
sda: not found in pathvec
sda: mask = 0x1f
sda: bus = 1
sda: dev_t = 8:0
sda: size = 283115520
sda: vendor = Intel
sda: product = Multi-Flex
sda: rev = 0302
sda: h:b:t:l = 0:0:0:0
sda: serial = 4C202020000000000000000025E811AA84D6FFEB
sda: path checker = tur (controller setting)
sda: state = 2
sda: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sda: prio = 50
sda: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
sda: uid = 2222b0001552b7635 (callout)
sdb: not found in pathvec
sdb: mask = 0x1f
sdb: bus = 1
sdb: dev_t = 8:16
sdb: size = 10485760
sdb: vendor = Intel
sdb: product = Multi-Flex
sdb: rev = 0302
sdb: h:b:t:l = 0:0:0:1
sdb: serial = 4C202020000000000000000073564A1C82E53AA8
sdb: path checker = tur (controller setting)
sdb: state = 2
sdb: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sdb: prio = 1
sdb: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
sdb: uid = 2221a000155fa44d4 (callout)
sdc: not found in pathvec
sdc: mask = 0x1f
sdc: bus = 1
sdc: dev_t = 8:32
sdc: size = 10485760
sdc: vendor = Intel
sdc: product = Multi-Flex
sdc: rev = 0302
sdc: h:b:t:l = 0:0:0:2
sdc: serial = 4C2020200000000000000000537C97820BE19F77
sdc: path checker = tur (controller setting)
sdc: state = 2
sdc: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sdc: prio = 50
sdc: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
sdc: uid = 222c6000155b1e629 (callout)
sdd: not found in pathvec
sdd: mask = 0x1f
sdd: bus = 1
sdd: dev_t = 8:48
sdd: size = 283115520
sdd: vendor = Intel
sdd: product = Multi-Flex
sdd: rev = 0302
sdd: h:b:t:l = 0:0:1:0
sdd: serial = 4C202020000000000000000025E811AA84D6FFEB
sdd: path checker = tur (controller setting)
sdd: state = 2
sdd: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sdd: prio = 1
sdd: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
sdd: uid = 2222b0001552b7635 (callout)
sde: not found in pathvec
sde: mask = 0x1f
sde: bus = 1
sde: dev_t = 8:64
sde: size = 10485760
sde: vendor = Intel
sde: product = Multi-Flex
sde: rev = 0302
sde: h:b:t:l = 0:0:1:1
sde: serial = 4C202020000000000000000073564A1C82E53AA8
sde: path checker = tur (controller setting)
sde: state = 2
sde: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sde: prio = 50
sde: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
sde: uid = 2221a000155fa44d4 (callout)
sdf: not found in pathvec
sdf: mask = 0x1f
sdf: bus = 1
sdf: dev_t = 8:80
sdf: size = 10485760
sdf: vendor = Intel
sdf: product = Multi-Flex
sdf: rev = 0302
sdf: h:b:t:l = 0:0:1:2
sdf: serial = 4C2020200000000000000000537C97820BE19F77
sdf: path checker = tur (controller setting)
sdf: state = 2
sdf: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sdf: prio = 1
sdf: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
sdf: uid = 222c6000155b1e629 (callout)
sr0: blacklisted
===== paths list =====
uuid              hcil    dev dev_t pri dm_st  chk_st  vend/prod/rev   
2222b0001552b7635 0:0:0:0 sda 8:0   50  [undef][ready] Intel,Multi-Flex
2221a000155fa44d4 0:0:0:1 sdb 8:16  1   [undef][ready] Intel,Multi-Flex
222c6000155b1e629 0:0:0:2 sdc 8:32  50  [undef][ready] Intel,Multi-Flex
2222b0001552b7635 0:0:1:0 sdd 8:48  1   [undef][ready] Intel,Multi-Flex
2221a000155fa44d4 0:0:1:1 sde 8:64  50  [undef][ready] Intel,Multi-Flex
222c6000155b1e629 0:0:1:2 sdf 8:80  1   [undef][ready] Intel,Multi-Flex
params = 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:32 100 round-robin 0 1 1 8:80 100 
status = 2 0 1 0 2 1 A 0 1 0 8:32 A 0 E 0 1 0 8:80 A 0 
*word = 1, len = 1
*word = queue_if_no_path, len = 16
*word = 1, len = 1
*word = alua, len = 4
*word = 2, len = 1
*word = 1, len = 1
*word = round-robin, len = 11
*word = 0, len = 1
*word = 1, len = 1
*word = 1, len = 1
*word = 8:32, len = 4
*word = 100, len = 3
*word = 1, len = 1
*word = 1, len = 1
*word = 8:80, len = 4
*word = 100, len = 3
*word = 2, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = 2, len = 1
*word = A, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
*word = E, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
params = 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:16 100 
status = 2 0 1 0 2 1 A 0 1 0 8:64 A 0 E 0 1 0 8:16 A 0 
*word = 1, len = 1
*word = queue_if_no_path, len = 16
*word = 1, len = 1
*word = alua, len = 4
*word = 2, len = 1
*word = 1, len = 1
*word = round-robin, len = 11
*word = 0, len = 1
*word = 1, len = 1
*word = 1, len = 1
*word = 8:64, len = 4
*word = 100, len = 3
*word = 1, len = 1
*word = 1, len = 1
*word = 8:16, len = 4
*word = 100, len = 3
*word = 2, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = 2, len = 1
*word = A, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
*word = E, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
Found matching wwid [2222b0001552b7635] in bindings file.
Setting alias to mpath0
sda: ownership set to mpath0
sda: not found in pathvec
sda: mask = 0xc
sda: state = 2
sda: prio = 50
sdd: ownership set to mpath0
sdd: not found in pathvec
sdd: mask = 0xc
sdd: state = 2
sdd: prio = 1
mpath0: pgfailback = -2 (controller setting)
mpath0: pgpolicy = group_by_prio (controller setting)
mpath0: selector = round-robin 0 (controller setting)
mpath0: features = 1 queue_if_no_path (controller setting)
mpath0: hwhandler = 1 alua (controller setting)
mpath0: rr_weight = 1 (internal default)
mpath0: minio = 100 (controller setting)
mpath0: no_path_retry = -2 (controller setting)
pg_timeout = NONE (internal default)
mpath0: set ACT_CREATE (map does not exist)
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed: Invalid argument
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed: Invalid argument
mpath0: domap (0) failure for create/reload map
mpath0: remove multipath map
sda: orphaned
sdd: orphaned
Found matching wwid [2221a000155fa44d4] in bindings file.
Setting alias to mpath1
sdb: ownership set to mpath1
sdb: not found in pathvec
sdb: mask = 0xc
sdb: state = 2
sdb: prio = 1
sde: ownership set to mpath1
sde: not found in pathvec
sde: mask = 0xc
sde: state = 2
sde: prio = 50
mpath1: pgfailback = -2 (controller setting)
mpath1: pgpolicy = group_by_prio (controller setting)
mpath1: selector = round-robin 0 (controller setting)
mpath1: features = 1 queue_if_no_path (controller setting)
mpath1: hwhandler = 1 alua (controller setting)
mpath1: rr_weight = 1 (internal default)
mpath1: minio = 100 (controller setting)
mpath1: no_path_retry = -2 (controller setting)
pg_timeout = NONE (internal default)
mpath1: set ACT_NOTHING (map unchanged)
Found matching wwid [222c6000155b1e629] in bindings file.
Setting alias to mpath2
sdc: ownership set to mpath2
sdc: not found in pathvec
sdc: mask = 0xc
sdc: state = 2
sdc: prio = 50
sdf: ownership set to mpath2
sdf: not found in pathvec
sdf: mask = 0xc
sdf: state = 2
sdf: prio = 1
mpath2: pgfailback = -2 (controller setting)
mpath2: pgpolicy = group_by_prio (controller setting)
mpath2: selector = round-robin 0 (controller setting)
mpath2: features = 1 queue_if_no_path (controller setting)
mpath2: hwhandler = 1 alua (controller setting)
mpath2: rr_weight = 1 (internal default)
mpath2: minio = 100 (controller setting)
mpath2: no_path_retry = -2 (controller setting)
pg_timeout = NONE (internal default)
mpath2: set ACT_NOTHING (map unchanged)
Found matching wwid [2222b0001552b7635] in bindings file.
Setting alias to mpath0
sda: ownership set to mpath0
sda: not found in pathvec
sda: mask = 0xc
sda: path checker = tur (controller setting)
sda: state = 2
sda: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sda: prio = 50
sdd: ownership set to mpath0
sdd: not found in pathvec
sdd: mask = 0xc
sdd: path checker = tur (controller setting)
sdd: state = 2
sdd: getprio = /sbin/mpath_prio_intel /dev/%n (controller setting)
sdd: prio = 1
mpath0: pgfailback = -2 (controller setting)
mpath0: pgpolicy = group_by_prio (controller setting)
mpath0: selector = round-robin 0 (controller setting)
mpath0: features = 1 queue_if_no_path (controller setting)
mpath0: hwhandler = 1 alua (controller setting)
mpath0: rr_weight = 1 (internal default)
mpath0: minio = 100 (controller setting)
mpath0: no_path_retry = -2 (controller setting)
pg_timeout = NONE (internal default)
mpath0: set ACT_CREATE (map does not exist)
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed: Invalid argument
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed: Invalid argument
mpath0: domap (0) failure for create/reload map
mpath0: remove multipath map
sda: orphaned
sdd: orphaned

<dmsetup table result>
dmsetup table
mpath2: 0 10485760 multipath 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:32 100 round-robin 0 1 1 8:80 100 
mpath1: 0 10485760 multipath 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:16 100 
VolGroup00-LogVol01: 0 8257536 linear 8:2 274596224
VolGroup00-LogVol00: 0 274595840 linear 8:2 384

<dmsetup -v table result>
Name:              mpath2
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      1
Major, minor:      253, 3
Number of targets: 1
UUID: mpath-222c6000155b1e629

0 10485760 multipath 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:32 100 round-robin 0 1 1 8:80 100 

Name:              mpath1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      1
Major, minor:      253, 2
Number of targets: 1
UUID: mpath-2221a000155fa44d4

0 10485760 multipath 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:16 100 

Name:              VolGroup00-LogVol01
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-Z0fLniuSVFhdqMPj243e85zZw585dT3959TFjr7VnGjLfVoISYPRH5b117Kyy7kf

0 8257536 linear 8:2 274596224

Name:              VolGroup00-LogVol00
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-Z0fLniuSVFhdqMPj243e85zZw585dT399OY2xjUgaioyvlCTG92NampWSZcRimaK

0 274595840 linear 8:2 384


I also attached the dmesg log earlier; in it you can see that the system logical drive failed to be configured as a multipath device.

Thanks.
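
One thing the pasted output allows cross-checking is which device-mapper tables already reference each path disk. The sketch below is only an illustration of that cross-check, using a condensed copy of the paths list (uuid, hcil, dev, dev_t columns) and the `dmsetup table` output from this comment. It relies on the standard Linux numbering where sd disks use major 8 with 16 minors per disk, so 8:2 is the second partition of sda. Whether an existing holder is related to the failed reload is not established in this thread; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Condensed from the "paths list" section: uuid, hcil, dev, dev_t.
PATHS = """\
2222b0001552b7635 0:0:0:0 sda 8:0
2221a000155fa44d4 0:0:0:1 sdb 8:16
222c6000155b1e629 0:0:0:2 sdc 8:32
2222b0001552b7635 0:0:1:0 sdd 8:48
2221a000155fa44d4 0:0:1:1 sde 8:64
222c6000155b1e629 0:0:1:2 sdf 8:80
"""

# Copied from the `dmsetup table` output.
DMSETUP_TABLE = """\
mpath2: 0 10485760 multipath 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:32 100 round-robin 0 1 1 8:80 100
mpath1: 0 10485760 multipath 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:16 100
VolGroup00-LogVol01: 0 8257536 linear 8:2 274596224
VolGroup00-LogVol00: 0 274595840 linear 8:2 384
"""

SD_MINORS_PER_DISK = 16  # whole disk plus up to 15 partitions


def parse_paths(text):
    """Group (dev, major, minor) tuples by WWID."""
    by_wwid = defaultdict(list)
    for line in text.splitlines():
        wwid, _hcil, dev, dev_t = line.split()[:4]
        major, minor = (int(x) for x in dev_t.split(":"))
        by_wwid[wwid].append((dev, major, minor))
    return dict(by_wwid)


def parse_table_devices(text):
    """Map each dm table name to the major:minor pairs it references."""
    users = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        users[name] = [(int(a), int(b))
                       for a, b in re.findall(r"\b(\d+):(\d+)\b", rest)]
    return users


def holders_of_disk(major, minor, table_users):
    """Names of dm tables that reference the disk or one of its partitions."""
    found = []
    for name, devs in table_users.items():
        if any(dmaj == major and minor <= dmin < minor + SD_MINORS_PER_DISK
               for dmaj, dmin in devs):
            found.append(name)
    return found


def report(paths_text, table_text):
    """For every WWID, list the dm tables holding each of its path disks."""
    table_users = parse_table_devices(table_text)
    return {
        wwid: {dev: holders_of_disk(maj, mnr, table_users)
               for dev, maj, mnr in paths}
        for wwid, paths in parse_paths(paths_text).items()
    }


if __name__ == "__main__":
    for wwid, devs in report(PATHS, DMSETUP_TABLE).items():
        print(wwid, devs)
```

With the data above, the two data LUNs are referenced only by mpath1 and mpath2, while sda (one of the two paths for WWID 2222b0001552b7635, the map that fails to load) is referenced through partition 8:2 by the two VolGroup00 logical volumes, and sdd has no holder.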

Comment 22 Mike Christie 2009-02-10 18:52:51 UTC
Is mpath0 the problem here?

I saw this:

mpath0: set ACT_CREATE (map does not exist)
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed:
Invalid argument
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed:
Invalid argument
mpath0: domap (0) failure for create/reload map
mpath0: remove multipath map


I am not familiar with multipath tools. I am not sure if that is the cause.


Where was this log taken? Was it for a normal boot up? Was it during the initramfs part of the boot up? Was it from anaconda startup?

Did the simple case work, where you boot from a local drive, then start up multipath when the system is booted up?

Comment 23 ilgu hong 2009-02-10 22:01:09 UTC
Hi, Mike
The message below is the result of "multipath -v4".
>>
mpath0: set ACT_CREATE (map does not exist)
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed:
Invalid argument
libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed:
Invalid argument
mpath0: domap (0) failure for create/reload map
mpath0: remove multipath map

We did the following steps.

1. Build a new kernel which includes the scsi_dh_alua module.
 a. Please refer to comment #21 above.
2. Boot from the new image.
3. Configure multipath.conf.
 a. Please refer to comment #3 above.
4. Install mpath_prio_intel, which is attached in the list above.
5. Run multipath -v4.


I already attached the boot messages in comment #20.
We also use the new initramfs (which was created in step 1) when booting up.

Comment 24 joseph.r.gruher 2009-02-17 22:13:52 UTC
Note: This BZ corresponds to IT 267309.

Comment 25 joseph.r.gruher 2009-02-18 00:51:04 UTC
Any response from RH to comment #23?  No update in the past week.

Comment 26 ilgu hong 2009-02-18 01:30:44 UTC
Hi, Joe.

So far we have not received any response from RH. Pradeep is working on the setup for remote access.

Comment 27 Mike Christie 2009-02-18 18:19:25 UTC
(In reply to comment #23)
> Hi, Mike
> Below message is result of "multipath -v4".
> >>
> mpath0: set ACT_CREATE (map does not exist)
> libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed:
> Invalid argument
> libdevmapper: ioctl/libdm-iface.c(1634): device-mapper: reload ioctl failed:
> Invalid argument
> mpath0: domap (0) failure for create/reload map
> mpath0: remove multipath map
> 

I was asking if you had traced this to being the reason for the device not being added.

Ben, do you know?

Maybe promise/intel guys, you could dig a little deeper and give the multipath-tools guys more info to help them out.

Comment 28 Bryn M. Reeves 2009-02-18 18:25:52 UTC
You need the corresponding log from dmesg. That should show why the mpath constructor is returning -EINVAL.

Comment 29 Mike Christie 2009-02-18 18:30:48 UTC
I think it is garbled up in one of the logs that was attached, but there is this:

device-mapper: multipath: Using scsi_dh module scsi_dh_alua for failover/failback and device management.
device-mapper: table: 253:4: multipath: error getting device
device-mapper: ioctl: error adding target to table

I do not think this is useful enough, right? Some debug printing is needed from intel/promise?
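
For pulling the relevant lines out of a longer dmesg capture, a minimal sketch follows. It assumes the plain "device-mapper: <subsystem>: <message>" line format shown above with no timestamp prefix, and the function name `dm_errors` is made up for illustration.

```python
DMESG = """\
device-mapper: multipath: Using scsi_dh module scsi_dh_alua for failover/failback and device management.
device-mapper: table: 253:4: multipath: error getting device
device-mapper: ioctl: error adding target to table
"""


def dm_errors(dmesg_text):
    """Return (subsystem, message) for device-mapper lines that mention an error."""
    errors = []
    for line in dmesg_text.splitlines():
        if not line.startswith("device-mapper: "):
            continue
        # Split into at most three fields so "253:4" stays inside the message.
        parts = line.split(": ", 2)
        if len(parts) == 3 and "error" in parts[2]:
            errors.append((parts[1], parts[2]))
    return errors


if __name__ == "__main__":
    for subsystem, message in dm_errors(DMESG):
        print("%s -> %s" % (subsystem, message))
```

This only narrows the log to the failing calls; it does not add the extra debug printing asked for above.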

Comment 30 Mike Christie 2009-02-18 18:33:47 UTC
Oh yeah, promise/intel guys, is this the setup where you are trying to do root on the dm device using the scsi dh alua handler?

Are these logs from an install startup, or are these logs from a normal boot up? Did you install to an alua device, and are now trying to boot from a dm device using alua?

Comment 31 Mike Christie 2009-02-18 18:39:47 UTC
Oh yeah, promise/intel guys, you should modify the initramfs so that it loads the scsi modules like sd_mod and scsi_mod, then loads scsi_dh_alua so it is there when the scsi devices are getting scanned and set up. It looked like it was getting loaded when dm multipath starts up.

I do not think it will fix this. It should just remove some of the error messages during startup.
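
A quick way to check the load order in a modified initramfs is sketched below. It assumes the init script inside the initramfs loads modules with lines of the form "insmod /lib/<module>.ko", which is an assumption about the generated script and not something shown in this bug; the sample script text is hypothetical, and the HBA driver is left out because it is not named here.

```python
import re


def insmod_order(init_script):
    """Return module names in the order the init script insmods them."""
    order = []
    for line in init_script.splitlines():
        m = re.match(r"\s*insmod\s+\S*?([\w-]+)\.ko\b", line)
        if m:
            # Treat dm-multipath and dm_multipath as the same module.
            order.append(m.group(1).replace("-", "_"))
    return order


def alua_loaded_in_time(order):
    """True if scsi_dh_alua follows scsi_mod and sd_mod and precedes dm_multipath."""
    if "scsi_dh_alua" not in order:
        return False
    alua = order.index("scsi_dh_alua")
    after_scsi = all(mod in order and order.index(mod) < alua
                     for mod in ("scsi_mod", "sd_mod"))
    before_dm = ("dm_multipath" not in order
                 or alua < order.index("dm_multipath"))
    return after_scsi and before_dm


# Hypothetical init script fragment, for illustration only.
SAMPLE_INIT = """\
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
insmod /lib/sd_mod.ko
insmod /lib/scsi_dh.ko
insmod /lib/scsi_dh_alua.ko
insmod /lib/dm-mod.ko
insmod /lib/dm-multipath.ko
"""

if __name__ == "__main__":
    print(alua_loaded_in_time(insmod_order(SAMPLE_INIT)))
```

For the sample fragment the check passes; a script that only loads scsi_dh_alua after dm-multipath would fail it.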

Comment 32 Bryn M. Reeves 2009-02-18 18:50:47 UTC
Sorry, didn't realise we already had the matching logs. No, there's not much to
go on in that set of output.

It'd be useful to see the table that multipath is trying to load but iirc we
only log the table as a string if the reload succeeds..

Comment 33 ilgu hong 2009-02-24 18:30:22 UTC
Sorry for the late response.

At the Promise site, the system setup is complete, but our IT engineer needs more time to set up the network for remote access. We asked, but we cannot estimate the time. I hope it will be done in the next two days.

When it is finished, I will let you know.

Thanks.

Comment 35 Mike Christie 2009-02-25 20:23:31 UTC
I am going to mark that last comment private because anyone can view this bz currently.

Comment 36 joseph.r.gruher 2009-03-11 02:03:46 UTC
In the latest testing we are able to set up multipath and run failover/failback testing, and the only problem we experience is bug 455678.  We will do more testing to see if we can mark this bug as resolved.

Comment 37 Keve Gabbert 2009-04-22 17:54:24 UTC
(In reply to comment #36)
> In latest testing we are able setup multipath and run failover/failback testing
> and the only problem we experience is bug 455678.  We will do more testing to
> see if we can mark this bug as resolved.  

What is the status of your testing?

Comment 38 joseph.r.gruher 2009-04-22 18:28:52 UTC
We still see the failure described in bug 455678 but this bug (kernel panic) has not been observed in quite some time.  We can close this bug.

Comment 40 errata-xmlrpc 2009-09-02 11:47:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1377.html

