228689 – 2.6.19: EFAIL on MPATH failback

Bug 228689 - 2.6.19: EFAIL on MPATH failback

Summary: 2.6.19: EFAIL on MPATH failback

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	6
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	LVM and device-mapper development team
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:	bzcl34nup
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-02-14 14:48 UTC by Charlie Bennett
Modified:	2008-05-06 19:13 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-05-06 19:13:47 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Charlie Bennett 2007-02-14 14:48:36 UTC

Description of problem:

FC6, 2.6.19-1.2895

As posted to LKML:

  I have two systems running 2.6.19 from fedora-updates.

  System 1: vanilla x86 SMP server box with 2x Emulex LP-101
  System 2: vanilla x86 SMP server box with 2x QLogic 2310F

Both boxes are multipath to Hitachi USP TagmaStore.  All LUNs including
LUN0 (/, /boot, swap, /var via dm-linear/kpartx) are multipathed.

Userspace multipath-tools is 0.4.7, latest from fedora-updates.

With no IO load, multipath failover and failback work flawlessly after
fibre pull/reinsert.

Under IO load, failback fails on both nodes.

With QLogic:
Feb 12 12:48:12 arc-node-112 kernel: qla2xxx 0000:06:01.0: LOOP DOWN
detected (4
).
Feb 12 12:48:47 arc-node-112 kernel:  rport-1:0-0: blocked FC remote
port time o
ut: removing target and saving binding
Feb 12 12:48:47 arc-node-112 kernel: Synchronizing SCSI cache for disk
sde: 
Feb 12 12:48:47 arc-node-112 kernel: FAILED
Feb 12 12:48:47 arc-node-112 kernel:   status = 0, message = 00, host =
1, drive
r = 00
Feb 12 12:48:47 arc-node-112 kernel:   <5>Synchronizing SCSI cache for
disk sdf:
 
Feb 12 12:48:47 arc-node-112 kernel: FAILED
Feb 12 12:48:47 arc-node-112 kernel:   status = 0, message = 00, host =
1, drive
r = 00
Feb 12 12:48:47 arc-node-112 kernel:   <5>Synchronizing SCSI cache for
disk sdg:
 
Feb 12 12:48:47 arc-node-112 kernel: FAILED
Feb 12 12:48:47 arc-node-112 kernel:   status = 0, message = 00, host =
1, drive
r = 00
Feb 12 12:48:47 arc-node-112 kernel:   <5>Synchronizing SCSI cache for
disk sdh:
 
Feb 12 12:48:47 arc-node-112 kernel: FAILED
Feb 12 12:48:47 arc-node-112 kernel:   status = 0, message = 00, host =
1, drive
r = 00
Feb 12 12:49:15 arc-node-112 kernel:   <6>qla2xxx 0000:06:01.0: LIP
reset occure
d (f7f7).
Feb 12 12:49:16 arc-node-112 kernel: qla2xxx 0000:06:01.0: LOOP UP
detected (2 G
bps).
Feb 12 12:49:18 arc-node-112 kernel: scsi 1:0:0:0: Direct-Access
HITACHI  OP
EN-V           5004 PQ: 0 ANSI: 3
Feb 12 12:49:18 arc-node-112 kernel: SCSI device sdi: 211077120 512-byte
hdwr se
ctors (108071 MB)
Feb 12 12:49:18 arc-node-112 kernel: sdi: Write Protect is off
Feb 12 12:49:18 arc-node-112 kernel: SCSI device sdi: drive cache: write
back
Feb 12 12:49:18 arc-node-112 kernel: SCSI device sdi: 211077120 512-byte
hdwr se
ctors (108071 MB)
Feb 12 12:49:18 arc-node-112 kernel: sdi: Write Protect is off
Feb 12 12:49:18 arc-node-112 kernel: SCSI device sdi: drive cache: write
back
Feb 12 12:49:18 arc-node-112 kernel:  sdi: sdi1 sdi2 sdi3 sdi4 < sdi5
sdi6 sdi7 
sdi8 >
Feb 12 12:49:18 arc-node-112 kernel: sd 1:0:0:0: Attached scsi disk sdi
Feb 12 12:49:18 arc-node-112 kernel: sd 1:0:0:0: Attached scsi generic
sg4 type 
0
Feb 12 12:49:18 arc-node-112 kernel: scsi 1:0:0:0: Direct-Access
HITACHI  OP
EN-V           5004 PQ: 0 ANSI: 3
Feb 12 12:49:18 arc-node-112 kernel: kobject_add failed for 1:0:0:0 with
-EEXIST
, don't try to register things with the same name in the same directory.
Feb 12 12:49:18 arc-node-112 kernel:  [<c0405018>] dump_trace+0x69/0x1b6
Feb 12 12:49:18 arc-node-112 kernel:  [<c040517d>] show_trace_log_lvl
+0x18/0x2c
Feb 12 12:49:18 arc-node-112 kernel:  [<c0405778>] show_trace+0xf/0x11
Feb 12 12:49:18 arc-node-112 kernel:  [<c0405875>] dump_stack+0x15/0x17
Feb 12 12:49:18 arc-node-112 kernel:  [<c04eef7a>] kobject_add
+0x16d/0x196
Feb 12 12:49:19 arc-node-112 kernel:  [<c055cdd3>] device_add+0x9f/0x46f
Feb 12 12:49:19 arc-node-112 kernel:  [<f8863168>] scsi_sysfs_add_sdev
+0x2a/0x1d
f [scsi_mod]
Feb 12 12:49:19 arc-node-112 kernel:  [<f8861847>]
scsi_probe_and_add_lun+0x82e/
0x93e [scsi_mod]
Feb 12 12:49:19 arc-node-112 kernel:  [<f8862259>] __scsi_scan_target
+0x447/0x60
a [scsi_mod]
Feb 12 12:49:19 arc-node-112 kernel:  [<f88626c0>] scsi_scan_target
+0x69/0x7b [s
csi_mod]
Feb 12 12:49:19 arc-node-112 kernel:  [<f88b5a13>] fc_scsi_scan_rport
+0x53/0x71 
[scsi_transport_fc]
Feb 12 12:49:19 arc-node-112 kernel:  [<c04368c7>] run_workqueue
+0x97/0xdd
Feb 12 12:49:19 arc-node-112 kernel:  [<c0437284>] worker_thread
+0xd9/0x10d
Feb 12 12:49:19 arc-node-112 kernel:  [<c0439810>] kthread+0xc0/0xec
Feb 12 12:49:19 arc-node-112 kernel:  [<c0404c03>] kernel_thread_helper
+0x7/0x10
Feb 12 12:49:19 arc-node-112 kernel:  =======================
Feb 12 12:49:19 arc-node-112 kernel: error 1
Feb 12 12:49:19 arc-node-112 kernel: scsi 1:0:0:0: Unexpected response
from lun 
0 while scanning, scan aborted



Similar on Emulex:
Feb 12 12:52:03 arc-node-109 kernel: lpfc 0000:06:01.0: 1:1305 Link Down
Event x2 received Data: x2 x20 x110
Feb 12 12:52:03 arc-node-109 kernel: lpfc 0000:06:01.0: 1:1305 Link Down
Event x4 received Data: x4 x4 x100
Feb 12 12:52:29 arc-node-109 mgetty[3772]: failed dev=ttyS0, pid=3772,
login time out
Feb 12 12:52:33 arc-node-109 kernel:  rport-1:0-2: blocked FC remote
port time out: removing target and saving binding
Feb 12 12:52:33 arc-node-109 kernel: lpfc 0000:06:01.0: 1:0203 Devloss
timeout on WWPN 50:6:e:80:4:2b:e5:44 NPort x102ae Data: x8 x7 x4
Feb 12 12:52:33 arc-node-109 kernel: Synchronizing SCSI cache for disk
sde: 
Feb 12 12:52:33 arc-node-109 kernel: FAILED
Feb 12 12:52:33 arc-node-109 kernel:   status = 0, message = 00, host =
1, driver = 00
Feb 12 12:52:33 arc-node-109 kernel:   <5>Synchronizing SCSI cache for
disk sdf: 
Feb 12 12:52:33 arc-node-109 kernel: FAILED
Feb 12 12:52:33 arc-node-109 kernel:   status = 0, message = 00, host =
1, driver = 00
Feb 12 12:52:33 arc-node-109 kernel:   <5>Synchronizing SCSI cache for
disk sdg: 
Feb 12 12:52:33 arc-node-109 kernel: FAILED
Feb 12 12:52:33 arc-node-109 kernel:   status = 0, message = 00, host =
1, driver = 00
Feb 12 12:52:33 arc-node-109 kernel:   <5>Synchronizing SCSI cache for
disk sdh: 
Feb 12 12:52:33 arc-node-109 kernel: FAILED
Feb 12 12:52:33 arc-node-109 kernel:   status = 0, message = 00, host =
1, driver = 00
Feb 12 12:53:00 arc-node-109 kernel:   <3>lpfc 0000:06:01.0: 1:1303 Link
Up Event x5 received Data: x5 x0 x8 x2
Feb 12 12:53:02 arc-node-109 kernel: scsi 1:0:0:0: Direct-Access
HITACHI  OPEN-V           5004 PQ: 0 ANSI: 3
Feb 12 12:53:03 arc-node-109 kernel: SCSI device sdi: 211077120 512-byte
hdwr sectors (108071 MB)
Feb 12 12:53:03 arc-node-109 kernel: sdi: Write Protect is off
Feb 12 12:53:03 arc-node-109 kernel: SCSI device sdi: drive cache: write
back
Feb 12 12:53:03 arc-node-109 kernel: SCSI device sdi: 211077120 512-byte
hdwr sectors (108071 MB)
Feb 12 12:53:03 arc-node-109 kernel: sdi: Write Protect is off
Feb 12 12:53:03 arc-node-109 kernel: SCSI device sdi: drive cache: write
back
Feb 12 12:53:03 arc-node-109 kernel:  sdi: sdi1 sdi2 sdi3 sdi4 < sdi5
sdi6 sdi7 sdi8 >
Feb 12 12:53:03 arc-node-109 kernel: sd 1:0:0:0: Attached scsi disk sdi
Feb 12 12:53:03 arc-node-109 kernel: sd 1:0:0:0: Attached scsi generic
sg4 type 0
Feb 12 12:53:03 arc-node-109 kernel: scsi 1:0:0:0: Direct-Access
HITACHI  OPEN-V           5004 PQ: 0 ANSI: 3
Feb 12 12:53:03 arc-node-109 kernel: kobject_add failed for 1:0:0:0 with
-EEXIST, don't try to register things with the same name in the same
directory.
Feb 12 12:53:03 arc-node-109 kernel:  [<c0405018>] dump_trace+0x69/0x1b6
Feb 12 12:53:03 arc-node-109 kernel:  [<c040517d>] show_trace_log_lvl
+0x18/0x2c
Feb 12 12:53:03 arc-node-109 kernel:  [<c0405778>] show_trace+0xf/0x11
Feb 12 12:53:03 arc-node-109 kernel:  [<c0405875>] dump_stack+0x15/0x17
Feb 12 12:53:03 arc-node-109 kernel:  [<c04eef7a>] kobject_add
+0x16d/0x196
Feb 12 12:53:03 arc-node-109 kernel:  [<c055cdd3>] device_add+0x9f/0x46f
Feb 12 12:53:03 arc-node-109 kernel:  [<f8863168>] scsi_sysfs_add_sdev
+0x2a/0x1df [scsi_mod]
Feb 12 12:53:03 arc-node-109 kernel:  [<f8861847>]
scsi_probe_and_add_lun+0x82e/0x93e [scsi_mod]
Feb 12 12:53:03 arc-node-109 kernel:  [<f8862259>] __scsi_scan_target
+0x447/0x60a [scsi_mod]
Feb 12 12:53:04 arc-node-109 kernel:  [<f88626c0>] scsi_scan_target
+0x69/0x7b [scsi_mod]
Feb 12 12:53:04 arc-node-109 kernel:  [<f88b5a13>] fc_scsi_scan_rport
+0x53/0x71 [scsi_transport_fc]
Feb 12 12:53:04 arc-node-109 kernel:  [<c04368c7>] run_workqueue
+0x97/0xdd
Feb 12 12:53:04 arc-node-109 kernel:  [<c0437284>] worker_thread
+0xd9/0x10d
Feb 12 12:53:04 arc-node-109 kernel:  [<c0439810>] kthread+0xc0/0xec
Feb 12 12:53:04 arc-node-109 kernel:  [<c0404c03>] kernel_thread_helper
+0x7/0x10
Feb 12 12:53:04 arc-node-109 kernel:  =======================
Feb 12 12:53:04 arc-node-109 kernel: error 1
Feb 12 12:53:04 arc-node-109 kernel: scsi 1:0:0:0: Unexpected response
from lun 0 while scanning, scan aborted


Have I configured something poorly?  Are there known fixes out there?

Smells to me like whoever it is in the scsi midlayer keeping track of
/sys/block/foo is either

1. not fully cleaning up things that should be cleaned up on path-fail
or
2. not aware that path-fail is not supposed to clean up sysfs and that it needs
to match us back up to our original sysfs entries


Some notes:
- system was installed with a custom install (no Anaconda)
- my multipath.conf pins the wwids of the disks to specific user_friendly_names
- multipathd is running
- I've nuked 40-multipath-blahblah from udev since multipathd and udev like to
fight over who's creating the dev special files
- I'm also seeing this with QLA2340 and IBM DS6000.

ps - have Peter Jones put a long sleep (8-10 secs) in initrd after loading FC
hbas.  Without the sleep nash tries to grab the root filesystem label before the
IO subsystem has its act together.


Thanks in Advance,
ccb

Comment 1 Chuck Ebbert 2007-02-15 19:45:01 UTC

What does the log look like when failover works the way it should?

And exactly which rpm is multipath-tools? I can't find it
anywhere on the Fedora site or distro CD.

Comment 2 Charlie Bennett 2007-02-16 04:03:16 UTC

I'll scan my logs for a successful failover-failback test in the morning and
snip my logs for you.

The RPM which packages multipath-tools is device-mapper-multipath.
According to my records, it hasn't been updated since the original release
of FC6.

Comment 3 Dave Wysochanski 2007-05-21 14:29:10 UTC

I think the reason it passes with no I/O load is probably because the kernel
sees the failover / failback request from userspace, not the LLD in the kernel.
 In the I/O load case, the kernel sees the failure and this causes a removal of
the scsi target / device which multipath does not handle well.

Comment 4 Mike Christie 2007-05-22 16:00:52 UTC

scsi commands have a reference to the scsi device so if there were commands
lingering then the device would not be completely removed and you would hit the
above errors. This is probably not likely given the trace in the first comment
(looks like IO is being failed correctly).

So the other problem that is more likely is that older dm/dm-multipaths could
not make changes to the dm table if IO was running through a device on it. So if
you do IO, fail all paths, then have dm-multipath internally queue the IO (use
queue_if_no_path or no_path_retry or whatver it is called in the version you are
using), then dm itself has a reference to the open scsi devices because of the
table reference and so when the rport is deleted and it tries to delete the scsi
devices, it only partially cleans up the mess. Because dm still has a reference
it still sits around and when we fix the link and try to readd devices at the
scsi level we hit the nice collision we see in the log.

There are work around patches for this problem but James Smart and Hannes have
worked on real fixes for the scsi clean up and readdition problem

For dm, the NEC guys fixed the table/queueing problem so we can now completely
remove the devices and so that also sort of fixes the problem you saw. I am not
sure what kernel those patches went in. We should ask alasdair.

Comment 5 Mike Christie 2007-05-22 16:05:58 UTC

(In reply to comment #4)
> For dm, the NEC guys fixed the table/queueing problem so we can now completely
> remove the devices and so that also sort of fixes the problem you saw. I am not
> sure what kernel those patches went in. We should ask alasdair.

Is there a FC6 kernel based on 2.6.21? If so this has the dm fixes which would
prevent getting into the unproperly cleaned up scsi devices mess.

Comment 6 Bug Zapper 2008-04-04 06:14:16 UTC

Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reporduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 7 Bug Zapper 2008-05-06 19:13:45 UTC

This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen thus bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.