
Bug 821580

Summary: [device-mapper] System hang/freeze when multipath over iSCSI got 1 iface down.
Product: Red Hat Enterprise Linux 6
Reporter: Gris Ge <fge>
Component: kernel
Assignee: Mike Snitzer <msnitzer>
Status: CLOSED DUPLICATE
QA Contact: Storage QE <storage-qe>
Severity: medium
Priority: unspecified
Version: 6.3
CC: bdonahue, bmarzins, czhang, lvm-team, msnitzer, thenzl
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-09-21 19:36:01 UTC
Type: Bug
Bug Blocks: 840683

Attachments:

Description	Flags
console log when trigger this bug	none
block and scsi error throttling patch	none

Description Gris Ge 2012-05-15 02:12:56 UTC

Description of problem:

When 50+ LUNs are connected via 4 targets each, taking 1 interface down causes the system to hang/freeze.

The console is flooded with these errors:
====
end_request: I/O error, dev dm-9, sector 1445847
====

Bug #800555 applies a rate limit to SCSI-layer printk; we need to check whether device-mapper needs this rate limit too.

As each mpath device has 50+ partitions, udev might be the one issuing the I/O.
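To put a number on the flood, a captured console log (for example the one attached later in this bug) can be summarised per device. This is a minimal sketch, not a tool used in this bug: the function name `count_io_errors` is hypothetical, and it assumes each error line contains the `end_request: I/O error, dev <dev>, sector <n>` text quoted above, possibly after a timestamp prefix.

```shell
#!/bin/bash
# count_io_errors: read a console log on stdin and print "<count> <device>"
# for every device named in an "end_request: I/O error" line, noisiest first.
count_io_errors() {
    awk '/end_request: I\/O error, dev / {
             dev = $0
             sub(/.*end_request: I\/O error, dev /, "", dev)  # drop prefix
             sub(/,.*/, "", dev)                              # drop ", sector N"
             n[dev]++
         }
         END { for (d in n) print n[d], d }' | sort -k1,1nr -k2,2
}

# Usage (console.log is a hypothetical file name):
#   count_io_errors < console.log
```

Lines that do not carry the `end_request` text (for example the device-mapper table errors) are ignored.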


Version-Release number of selected component (if applicable):
kernel -268

How reproducible:
100%

Steps to Reproduce:
1. Use the attached tool (tgtd.sh) to create 50 LUNs via 4 iSCSI targets each.
2. Use these commands to log in to the iSCSI targets:
====
iscsiadm -m discovery -t st -p localhost
iscsiadm -m node -l
====
3. Use these commands to enable multipath:
====
mpathconf --enable
service multipathd start
====
4. Use these commands to create 52 partitions on each mpath device (not sure whether this step is necessary):
===========
fdisk /dev/mapper/mpathe << EOF
n
e
1


w
EOF

for X in `seq 5 54`;do
fdisk /dev/mapper/mpathe << EOF
n
l

+10M
t
$X
8e
w
EOF
done

for X in `multipath -l |grep mpath \
| perl -ne 'print "$1 " if /(mpath[a-z]+)/'`;
do
    sfdisk /dev/mapper/mpathe -d -f \
      | sfdisk /dev/mapper/$X;
done
===========
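The last loop above also copies mpathe's table onto mpathe itself. Below is a sketch of the same copy step that skips the source map and can be dry-run. It assumes, as the perl one-liner does, that each map name is the first `mpath<letters>` token on its header line in `multipath -l` output. The function names and the `DRY_RUN` variable are hypothetical, and `-f` is passed to the writing `sfdisk` (where forcing the write matters), which differs slightly from the original command.

```shell
#!/bin/bash
# list_mpath_names: read `multipath -l` output on stdin and print the first
# "mpath<letters>" token of every line that has one (one map name per line).
list_mpath_names() {
    awk 'match($0, /mpath[a-z]+/) { print substr($0, RSTART, RLENGTH) }'
}

# clone_partition_table SRC: copy the partition table of /dev/mapper/SRC to
# every map named on stdin except SRC itself. With DRY_RUN=1 the commands
# are only printed, not executed.
clone_partition_table() {
    local src=$1 name
    while read -r name; do
        [ "$name" = "$src" ] && continue   # never rewrite the source map
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "sfdisk -d /dev/mapper/$src | sfdisk -f /dev/mapper/$name"
        else
            sfdisk -d "/dev/mapper/$src" | sfdisk -f "/dev/mapper/$name"
        fi
    done
}

# Usage (as root, after partitioning mpathe):
#   multipath -l | list_mpath_names | DRY_RUN=1 clone_partition_table mpathe
```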

5. Use these commands to create the mpath partition devices (the kpartx rule differs from the udev rule, so we use the udev way):
======
multipath -F
multipath -r
======

6. Log out of the iSCSI sessions:
======
iscsiadm -m node -u
======

Actual results:
The console is flooded with "end_request: I/O error, dev dm-9, sector 1445847".
The OS freezes.

Expected results:
The OS does not freeze.

Additional info:

This bug only requests rate limiting of the error messages the kernel prints for request exceptions.

Comment 4 Mike Snitzer 2012-08-07 20:45:59 UTC

So you're creating 50 mpath devices, each with 52 partitions, with tgt target and iscsi client on the same machine.

Once the multipath devices (and partitions) are active, you're tearing down all the iscsi sessions.

This causes _all_ paths to the multipath devices to fail simultaneously.

Odd test.  Unlikely we'll do anything to throttle the kernel's error messages.  The OS freezing needs to be understood though.

Do you happen to have console access, and do you have any understanding of what went wrong? (Do you have a console trace that shows a stack trace and/or crash?)

It just needs reproducing, preferably against RHEL 6.3. I really doubt all the partition creation has anything to do with this issue.

Comment 5 Gris Ge 2012-08-08 05:43:05 UTC

Mike,

It might be the console that slows the OS down when kernel error messages flood it.

It seems an error-message rate-limit patch has been applied to the SCSI layer, which might fix this issue.

I will try to reproduce on RHEL 6.3 GA again and keep you posted.
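One way to probe the hypothesis that console output is what slows the OS down is to check whether these messages reach the console at all. This is a minimal sketch under two assumptions: that the `end_request` message is logged at KERN_ERR (level 3), as in kernels of this era, and that, as documented for /proc/sys/kernel/printk, its first field is the current console loglevel and only messages with a numerically lower level are written to the console. `console_prints_level` is a hypothetical helper.

```shell
#!/bin/bash
# console_prints_level PRINTK LEVEL
#   PRINTK: contents of /proc/sys/kernel/printk, i.e.
#           "<console> <default message> <minimum console> <boot default>"
#   LEVEL:  kernel log level of a message (KERN_ERR is 3)
# Succeeds if a message at LEVEL would be written to the console, i.e. if
# LEVEL is numerically lower than the current console loglevel.
console_prints_level() {
    local printk=$1 level=$2 console_loglevel
    read -r console_loglevel _ <<< "$printk"   # first field only
    [ "$level" -lt "$console_loglevel" ]
}

# Usage (reading the file needs no special privileges):
#   if console_prints_level "$(cat /proc/sys/kernel/printk)" 3; then
#       echo "KERN_ERR messages are written to the console"
#   fi
```

As root, `dmesg -n 3` lowers the console loglevel so that KERN_ERR messages stay in the kernel ring buffer and syslog but no longer go to the console; whether that alone avoids the freeze here has not been tested.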

Comment 6 Gris Ge 2012-08-08 08:11:25 UTC

Created attachment 602962 [details]
console log when trigger this bug

Mike,

I reproduced this problem on RHEL 6.3 GA.

The console was flooded by I/O errors on dm-XX (multipath devices), which froze the OS. I would like rate limiting applied to these error messages.

I have attached the console log.

Comment 7 Mike Snitzer 2012-08-08 13:24:51 UTC

(In reply to comment #6)

It seems there is something pathological about all iscsi sessions being dropped simultaneously. multipathd is attempting to reload all the multipath tables, but that fails because none of the iscsi devices exist any longer (hence "multipath: error getting device" for each path).

It'd be useful to get the /var/log/messages from the same test cycle; this should give us more information about what multipathd is doing.

I'm not sure what the right response would be to this situation; but if a device no longer exists there clearly isn't any point trying to push down a multipath table that references the missing device(s).
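To see which paths a multipath table still references after the sessions are gone, the `major:minor` tokens in a `dmsetup table` line can be checked against sysfs. This is a sketch that assumes the multipath target lists its path devices as `major:minor` tokens and that /sys/dev/block/<major>:<minor> exists only while the block device does. The function names and the `SYSFS` override are hypothetical; the override only exists so the logic can be exercised without real devices.

```shell
#!/bin/bash
# table_path_devs TABLE_LINE
# Print every "major:minor" token in one line of `dmsetup table` output;
# for the multipath target these are the path devices.
table_path_devs() {
    local -a toks
    local tok
    read -r -a toks <<< "$1"
    for tok in "${toks[@]}"; do
        if [[ $tok =~ ^[0-9]+:[0-9]+$ ]]; then
            echo "$tok"
        fi
    done
}

# missing_path_devs TABLE_LINE
# Print the referenced path devices that no longer have a sysfs entry.
# SYSFS (default /sys) is a hypothetical override used for testing.
missing_path_devs() {
    local sysfs=${SYSFS:-/sys} dev
    table_path_devs "$1" | while read -r dev; do
        if [ ! -e "$sysfs/dev/block/$dev" ]; then
            echo "$dev"
        fi
    done
}

# Usage (as root):
#   dmsetup table mpathe | while read -r line; do missing_path_devs "$line"; done
```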

Cc'ing Ben to get his insight.

Comment 8 Ben Marzinski 2012-08-08 21:26:59 UTC

The issue is that multipathd gets those remove uevents one at a time. So when it gets the request to remove the first path, it doesn't know that the others have been removed. I suppose it would be possible to revalidate all of a multipath device's paths whenever one of them is removed. I'm not sure that this would be the best idea for all cases. Those uevents can pile up, and multipathd needs to deal with them quickly. Also, this wouldn't change the amount of IO error messages.

Comment 9 Mike Snitzer 2012-09-04 13:27:32 UTC

Upstream has started to accept an error-throttling patch for block and SCSI (the block part was accepted; the SCSI part hasn't been yet):
http://www.open-fcoe.org/patchwork/patch/2655/

But looking at the log from comment #6, it seems the block patch would help the most.

Though we might look to rate limit these DM messages too:
device-mapper: table: 253:8: multipath: error getting device
device-mapper: ioctl: error adding target to table
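The kernel's generic ratelimit helper works on an interval/burst basis: at most `burst` messages are let through per `interval`, the rest are dropped, and the number dropped is reported once a new interval starts. The sketch below is a userspace illustration of that idea in bash, loosely modelled on the helper rather than copied from it or from the proposed patch. It reads `<seconds> <message>` lines; the function name is hypothetical, and unlike the kernel helper it also reports drops still pending when input ends.

```shell
#!/bin/bash
# ratelimit_filter INTERVAL BURST
# Reads "<seconds> <message>" lines on stdin. Within each window (opened by
# the first message after the previous window expired) at most BURST
# messages are printed; the rest are counted and reported when the next
# window opens, or when input ends.
ratelimit_filter() {
    local interval=$1 burst=$2
    local begin="" printed=0 missed=0 ts msg
    while read -r ts msg; do
        # Open a new window on the first message or once the current one expired.
        if [ -z "$begin" ] || [ "$ts" -gt $((begin + interval)) ]; then
            if [ "$missed" -gt 0 ]; then
                echo "$missed messages suppressed"
            fi
            begin=$ts printed=0 missed=0
        fi
        if [ "$printed" -lt "$burst" ]; then
            printed=$((printed + 1))
            printf '%s\n' "$msg"
        else
            missed=$((missed + 1))
        fi
    done
    # Report drops from the last window (an addition for this sketch).
    if [ "$missed" -gt 0 ]; then
        echo "$missed messages suppressed"
    fi
}

# Usage: at most 10 messages per 5 seconds (errors.txt is hypothetical)
#   ratelimit_filter 5 10 < errors.txt
```

With 50 maps failing at once, a filter like this turns a burst of thousands of identical lines into a handful of lines plus one count per interval, which is the effect the throttling patch aims for on the console.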

Comment 10 Mike Snitzer 2012-09-04 13:33:42 UTC

Created attachment 609686 [details]
block and scsi error throttling patch

Proposed patch from http://www.open-fcoe.org/patchwork/patch/2655/

Comment 12 Mike Snitzer 2012-09-21 19:36:01 UTC


*** This bug has been marked as a duplicate of bug 800555 ***