Bug 226990 - Multipath sometime breaks on device disconnects
Summary: Multipath sometime breaks on device disconnects
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.3
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Ben Marzinski
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-02-02 14:13 UTC by Jimmy Olsen
Modified: 2012-06-20 13:20 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 13:20:46 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Jimmy Olsen 2007-02-02 14:13:27 UTC
Description of problem:

We've got a rig of four machines, connected via dual cables to a SAN through a
QLogic Corp. QLA2312 adapter. We recently disconnected one of the switches (thus
failing one of the paths) to move it physically in preparation of moving the
servers themselves. After failing the path, one server has gotten some problems
with its multipath (the three others simply marked it as failed and continued as
normal). The problematic server continues to use the SAN devices as it should,
so the service is not affected (we didn't fail the active path, after all), but
after about one day "multipath -l" no longer works (it just hangs).

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.4.5-12.0.RHEL4

How reproducible:

Not very reproducible. This happened only to one of the four machines, using the
same hardware, connected to the same SAN, with the same fibre channel adapter,
drivers, and kernel.

Steps to Reproduce:
1.Set up dual connections to a SAN, and set up multipath to handle them.
2.Disconnect one of the paths (the non-active one in this case)
3.Watch "multipath -l" hang after 1-2 days.
  
Actual results:

multipath -l hangs; Not sure if multipathd will still be able to handle a
reconnect of the old path or if it is hanging as well.

Expected results:

The failed path being marked as failed, and "multipath -l" continuing to work.

Additional info:

We failed the path on 2007-01-24. We killed and restarted multipathd on
2007-01-29 in a (failed) attempt to make the problem go away. The following is
cut from a ps list with processes that might or might not be relevant:

root      8565  0.0  0.0  2380  200 ?        S    Jan26   0:00
/sbin/mpath_prio_emc /dev/sde
root      9275  0.0  0.0  5896  684 ?        S    Jan26   0:00 /sbin/multipath -l
root      9292  0.0  0.0  5896  684 ?        S    Jan26   0:00 /sbin/multipath -l
root      9315  0.0  0.0  5896  684 ?        S    Jan26   0:00 /sbin/multipath -l
root     10077  0.0  0.0  5896  684 ?        S    Jan26   0:00 /sbin/multipath -l
root     19746  0.0  0.0  5900  688 ?        S    Jan26   0:00 multipath -v 3 -l
root     17640  0.0  0.0  9744 2596 ?        SLl  Jan29   0:00 /sbin/multipathd
root     17647  0.0  0.0  3644  544 ?        S    Jan29   0:00  \_ /sbin/scsi_id
-g -u -s /block/sdd

All the "multipath -l" from Jan26" was made by our monitoring system before we
spotted the problem and disabled the monitoring. The monitoring system polls
multipath every 5 minutes, and managed to work (i.e. not get hanging processes)
for more than a day before the problems started.

From /var/log/messages when we failed the path:

Jan 24 13:24:00 pund kernel: end_request: I/O error, dev sdb, sector 20324440
Jan 24 13:24:01 pund kernel: device-mapper: dm-multipath: Failing path 8:16.
Jan 24 13:24:01 pund kernel: SCSI error : <1 0 0 0> return code = 0x20000
Jan 24 13:24:01 pund kernel: end_request: I/O error, dev sdb, sector 78476736
Jan 24 13:24:01 pund multipathd: 8:16: mark as failed
Jan 24 13:24:01 pund multipathd: nfs: remaining active paths: 0
Jan 24 13:24:05 pund multipathd: 8:16: emc_clariion_checker: Path healthy
Jan 24 13:24:05 pund multipathd: 8:16: reinstated
Jan 24 13:24:05 pund multipathd: nfs: remaining active paths: 1
Jan 24 13:24:05 pund kernel: device-mapper: dm-emc: emc_pg_init: sending
switch-over command
Jan 24 13:24:05 pund kernel: SCSI error : <2 0 0 0> return code = 0x10000
Jan 24 13:24:05 pund multipathd: nfs: switch to path group #2
Jan 24 13:24:05 pund multipathd: nfs: switch to path group #2
Jan 24 13:24:05 pund kernel: SCSI error : <2 0 0 0> return code = 0x10000
Jan 24 13:24:05 pund kernel: SCSI error : <2 0 0 0> return code = 0x10000
Jan 24 13:24:05 pund kernel: device-mapper: dm-emc: emc_pg_init: sending
switch-over command
Jan 24 13:24:05 pund multipathd: nfs: switch to path group #2
Jan 24 13:24:05 pund kernel: SCSI error : <2 0 0 1> return code = 0x10000
Jan 24 13:24:05 pund kernel: SCSI error : <2 0 0 1> return code = 0x10000
Jan 24 13:24:06 pund kernel: device-mapper: dm-emc: emc_pg_init: sending
switch-over command

there is no further logging from multipathd after this.

Running a strace on multipathd shows that it's hanging on:

read(8,  <unfinished ...>

Running strace on the scsi_id child of multipathd shows that it's hanging on:

write(2, "2:0:0:0: sg_io failed status 0x0"..., 45 <unfinished ...>

Running strace on any of the multipath -l processes show that they're hanging on:

connect(4, {sa_family=AF_FILE, path="/var/run/multipathd.sock"}, 110 <unfinished>

running strace on the mpath_prio_emc process shows that it's hanging on:

write(2, "mmand indicates error", 21 <unfinished ...>

Please let me know if there is any further information that might help in
debugging this issue.

Comment 1 Jimmy Olsen 2007-02-02 14:33:09 UTC
...and just as I had fired off a bug report about it, more information pops up. :-)

We found the problem. FD 2 on the processes that were hanging in write(), was
/dev/console. We connected to the remote management interface (the ILO - this is
Sun machines) and opened the remote console, everything stopped hanging.

/dev/console seems to be able to "fill up" on Sun X4100 with ILO (integrated
lights-out) management interfaces.

I don't know whether this is something you want to fix in multipath (i.e. make
it not log to /dev/console) or hand the bug off to someone taking care of
/dev/console, so I'll leave it to you at RH to decide which component the bug
should be assigned to. :)

Comment 2 Dave Wysochanski 2007-05-21 16:55:57 UTC
multipathing still worked right, just the "multipath -l" command didn't?

I'm not sure what exactly we could do here.  "multipath -l" by default goes to
the console, display of output being the purpose.  The blocking could be viewed
as a feature, since it prompted you to find the problem right?  What would be
our alternative?  Stop writing output if it's blocked?  Write to some file? 
Neither option seems reasonable.

Closing as wont_fix.  Fee free to re-open if you disagree or I've misinterpreted
the problem.

Comment 3 Tore Anderson 2007-06-06 14:08:01 UTC
"multipath -l" output does _not_ go to the console by default, it goes to
stdout.  The reason multipath starts hanging is because multipathd is hanging,
read on...

That /dev/console might block could be considered a feature - I'm guessing flow
control on the serial port, but if I understand
Documentation/kernel-parameters.txt correctly the kernel should default to _not_
using RTS but maybe that's incorrect or that Red Hat patches change it.  Doesn't
matter.  If it's a feature, multipathd needs to be able to cope with it, but
right now it doesn't.  That's the bug.

The thing is that when /dev/console starts blocking, multipathd also start
blocking, because in daemon mode it sends all output to /dev/console.  From
multipathd/main.c:

        out_fd = open("/dev/console", O_WRONLY);
        if (out_fd < 0){
                fprintf(stderr, "cannot open /dev/console for output : %s\n",
                        strerror(errno));
                _exit(0);
        }
[...]
        close(STDOUT_FILENO);
        dup(out_fd);
        close(STDERR_FILENO);
        dup(out_fd);
[...]
        close(out_fd);

So if /dev/console blocks you're in big trouble if you're writing to it without
O_NONBLOCK.  multipathd itself might not do this, but its helper programs
inherits its file descriptors, and when there's an error at least scsi_id and
mpath_prio_emc wants to print stuff as shown by the strace.  Those calls never
return, and multipathd gets stuck waiting for them.  Which means that it won't
be checking and reinstating paths either, which is really bad as this is the
exact moment where that functionality is needed.

That multipath also ends up hanging is because it attempts to contact multipathd
through /var/run/multipathd.sock.  This takes forever because multipathd is
hanging and never gets around to accept()ing multipath's connect().  Not really
critical, but that's the way we noticed it because we run multipath periodically
from our monitoring system to look for failed paths.

The fix is to stop multipathd from dup()ing stdout and stderr to /dev/console
(/dev/null should be used instead) - output from its child processes should just
be discarded (or even better, captured and logged).

I'd like to reopen the bug but Bugzilla won't let me.  And Jimmy's on sick leave
right now.  Can you reopen it, Dave?

Regards
Tore

Comment 4 Alasdair Kergon 2007-06-06 14:31:17 UTC
Messages should be written via syslog so the sysadmin can configure where they 
go with syslog.conf.  /dev/console should not be used directly!

Comment 5 Alasdair Kergon 2007-06-06 14:33:03 UTC
"Those calls never return, and multipathd gets stuck waiting for them."

That's also a bug - it should time out.

Comment 6 Tore Anderson 2007-06-06 15:12:50 UTC
Hi Alasdair, long time no see.  :-)

I'm afraid I don't really remember exactly what FD 8 was in the "read(8,
<unfinished ...>" strace, but I _think_ it was a pipe to a child process.  If I
read callout.c correctly it grabs stdout from the child process here:

                while (1) {
                        count = read(fds[0], value + i, len - i-1);
                        if (count <= 0)
                                break;

                        i += count;
                        if (i >= len-1) {
                                retval = -1;
                                break;
                        }
                }

...and I see no alarms set up around it either so I guess that's where it got
stuck.  The child process got stuck on writing to stderr a.k.a. /dev/console
because no pipe is set up for stderr and thus the /dev/console fd was inherited.

At least so I think..  My C skills are not only rusty, they're almost gone.  :-(

Another thing though, since I see it's you.  :-)  I see that you've got a
dm-mpath-rdac.patch in your kernel.org people subdir.  I'm drooling...  Been
trying to set up several clusters against Engenio/LSI gear and it's hopeless
because only AVT mode is working.  Every time one cluster members boots or
reloads the HBA module, causing the kernel to scan for partition tables, or
mount goes looking for filesystem labels, or LVM go looking for PVs, and so on,
volumes bounce around and paths are failing like crazy on all the other cluster
members.  It's just pain, pain, pain.  Is there a chance of such a RDAC hardware
handler making its way into the RHEL and Linus' kernel any time soon?

Regards
Tore

Comment 7 Magnus 2007-07-05 17:58:35 UTC
Just tried to get multipathing up and running on a StorageTek 6140 and it breaks
the same way. 6140 LSI-based indeed. Multipathd cannot call out to the path, the
diskarray fails the controller and all sorts of disasters as soon as something
tries to access the array.

Googling comes up with device-mapper and kernel-patches, I guess we will be
unsupported as far as RedHat is concerned - if we decide to go that route.

We are running RHEL 4.5 x86_64 and this is a major showstopper right now.

Regards
Magnus


Comment 8 Tore Anderson 2007-09-20 07:56:13 UTC
Any hope of this bug being fixed?  It is quite nasty as it prevents multipathd
from working correctly when /dev/console blocks, which seems to be the case with
SunFire X4x00 machines console outputted to the serial port of the ILO card.

We've had problems with this bug causing I/O interruptions several times now.

Magnus - I suspect you're having other issues.  This bug is not specific to the
6140, it will hit no matter what array you have.  There's a thread about the
6140 at the dm-devel list you might want to take a look at. 
https://www.redhat.com/archives/dm-devel/2007-September/msg00064.html

Tore

Comment 9 Charlie Brady 2009-03-31 20:35:05 UTC
> Any hope of this bug being fixed? 

Not fixed as of RHEL 5.2.

Comment 11 Ben Marzinski 2010-10-21 19:10:59 UTC
Is anyone still hitting this issue on RHEL4.  There's not much chance of getting this fixed there.  Fixing it on RHEL5 and RHEL6 are definitely doable, since multipathd is still opening dev/console on both of them.  Looking through the current code, it doesn't look like there are many multipathd messages that don't go through multipath's condlog interface, which should handle printing correctly, and most of the ones that there are obviously happen during the initial startup, before multipathd is completely daemonized.  But a couple look wrong, and a review of multipthd's output is probably overdue.

Does anyone have an objection if I change this from a RHEL4 bug to a RHEL5 bug? If people are still running into this problem on RHEL4, I'll clone it, and leave this bug open for RHEL4.

Comment 12 Jiri Pallich 2012-06-20 13:20:46 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please See https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.


Note You need to log in before you can comment on or make changes to this bug.