Bug 821580
| Summary: | [device-mapper] System hang/freeze when multipath over iSCSI got 1 iface down. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Gris Ge <fge> |
| Component: | kernel | Assignee: | Mike Snitzer <msnitzer> |
| Status: | CLOSED DUPLICATE | QA Contact: | Storage QE <storage-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | bdonahue, bmarzins, czhang, lvm-team, msnitzer, thenzl |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-09-21 19:36:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 840683 | | |
Description
Gris Ge
2012-05-15 02:12:56 UTC
So you're creating 50 mpath devices, each with 52 partitions, with the tgt target and the iSCSI client on the same machine. Once the multipath devices (and partitions) are active, you tear down all the iSCSI sessions. This causes _all_ paths to the multipath devices to fail simultaneously. Odd test. It's unlikely we'll do anything to throttle the kernel's error messages. The OS freezing needs to be understood, though. Do you happen to have console access, and do you have any understanding of what went wrong? (Do you have a console trace that shows a stack trace and/or crash?)

Just needs reproducing, preferably against RHEL 6.3. I really doubt all the partition creation has anything to do with this issue.

Mike, it might be the console that slows the OS down when kernel error messages flood it. It seems there is an error-message rate-limit patch for the SCSI layer that might fix this issue. I will try to reproduce on RHEL 6.3 GA again and keep you posted.

Created attachment 602962
console log when trigger this bug
Mike,
I reproduced this problem on RHEL 6.3 GA.
The console was flooded by I/O errors on dm-XX (the multipath devices), which froze the OS. I would like a rate limit applied to these error messages.
I have attached the console log.
(In reply to comment #6)

Seems there is something pathological about all iSCSI sessions being dropped simultaneously. multipathd is attempting to reload all the multipath tables, but that is failing because none of the iSCSI devices exist any longer (hence "multipath: error getting device" for each path). It'd be useful to get /var/log/messages from the same test cycle; this should give us more information about what multipathd is doing.

I'm not sure what the right response would be in this situation, but if a device no longer exists, there clearly isn't any point in trying to push down a multipath table that references the missing device(s). Cc'ing Ben to get his insight.

The issue is that multipathd gets those remove uevents one at a time. So, when it gets the request to remove the first path, it doesn't know that the others have been removed. I suppose it would be possible to revalidate all of a multipath device's paths whenever one of them is removed, but I'm not sure that this would be the best idea for all cases. Those uevents can pile up, and multipathd needs to deal with them quickly. Also, this wouldn't change the number of I/O error messages.

Upstream has started to accept an error-throttling patch for block and SCSI (the block part was accepted; the SCSI part hasn't been yet): http://www.open-fcoe.org/patchwork/patch/2655/ But looking at the log from comment #6, it seems the block patch would help the most.
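The revalidation idea raised in the discussion above (on each path remove uevent, re-check the map's other paths, and skip the table reload if none remain) can be sketched as follows. This is a minimal illustration under stated assumptions: `struct mpath_map`, `handle_path_remove()`, the `path_exists_fn` probe and the two example probes are all hypothetical and are not multipath-tools' real data structures or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PATHS 8

/* Hypothetical, simplified stand-in for multipathd's per-map state;
 * NOT multipath-tools' real data structure. */
struct mpath_map {
    const char *alias;
    const char *paths[MAX_PATHS];  /* path device names, e.g. "sdb" */
    int npaths;
};

/* Caller-supplied probe; a real daemon might check sysfs instead. */
typedef bool (*path_exists_fn)(const char *dev);

/* Handle a remove uevent for removed_dev: drop it from the map, then
 * revalidate every remaining path and drop any that no longer exist.
 * Returns true if a table reload is worth attempting (paths remain),
 * false if the reload would only reference missing devices. */
static bool handle_path_remove(struct mpath_map *m, const char *removed_dev,
                               path_exists_fn exists)
{
    int kept = 0;
    for (int i = 0; i < m->npaths; i++) {
        const char *dev = m->paths[i];
        if (strcmp(dev, removed_dev) == 0)
            continue;               /* the path named in this uevent */
        if (!exists(dev))
            continue;               /* already gone; its uevent is still queued */
        m->paths[kept++] = dev;     /* compact survivors in place (kept <= i) */
    }
    m->npaths = kept;
    return kept > 0;
}

/* Hypothetical example probes, for illustration only. */
static bool none_exist(const char *dev) { (void)dev; return false; }
static bool only_sdc_exists(const char *dev) { return strcmp(dev, "sdc") == 0; }
```

With all sessions torn down, as in this test, `handle_path_remove()` returns false on the first remove uevent, so no doomed table load is attempted. The cost is one probe per remaining path on every remove uevent, which is the overhead the comment is wary of when uevents pile up, and it does nothing to reduce the I/O error messages themselves.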
Though we might look to rate-limit these DM messages too:

`device-mapper: table: 253:8: multipath: error getting device`
`device-mapper: ioctl: error adding target to table`

Created attachment 609686
block and scsi error throttling patch

Proposed patch from http://www.open-fcoe.org/patchwork/patch/2655/

*** This bug has been marked as a duplicate of bug 800555 ***