Bug 454093
| Summary: | Install successfully but fail to failover | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | I-Chung Ho <tom.ho> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | urgent | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.0 | CC: | agk, Anthony.Laker, bmarzins, bmr, christophe.varoqui, dennis.zhou, dwysocha, egoggin, heinzm, iannis, junichi.nomura, kueda, lmb, mark.rugare, mbroz, prockai, tom.ho, tranlan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-05-12 17:24:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
First off, I trust you have multipathd running. If it is not, please run

# service multipathd start
# chkconfig multipathd on

and see if that fixes your problem. Could you please send me the output of /var/log/messages while you are pulling the cables?

You could also try setting no_path_retry to something other than "fail". For instance, with

no_path_retry 10

multipath will not fail the IO until it has checked for active paths 10 times in a row without finding any (with the default polling_interval of 5 seconds, this should allow multipath to deal with all paths being down for up to 50 seconds). However, if this fixes the problem, please let me know: if, as your steps to reproduce say, you are making sure that the paths lost during the last cable pull have come back before pulling the next cable, I don't see why it should be necessary.

Created attachment 311349 [details]
System log when we perform a hot plug of the miniSAS cable
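For reference, the retry behaviour suggested above would look something like this in /etc/multipath.conf (a sketch with illustrative values; the arithmetic is 10 retries × the 5-second polling interval ≈ 50 seconds of queueing before I/O is failed):

```
defaults {
    polling_interval 5      # seconds between path checks (the default)
    no_path_retry    10     # queue I/O through 10 failed checks (~50 s) before failing
}
```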
Looking at your system log, there is no multipathd output at all. Are you sure that multipathd is running? Can you please send me the output of

# multipath -v3 -ll

from before the start of your test. Then run

# service multipathd stop
# multipathd -v3

Then start your test, and collect the output to /var/log/messages for the entire length of your test.

Created attachment 314532 [details]
system log

System log when we perform a hot plug of the miniSAS cable; one path of dm-53 didn't come back.
We have tried the commands you suggested:

# service multipathd stop
# multipathd -v3

It passed 5 cycles of hot-plugging the miniSAS cable. (What is the difference between "# service multipathd stop" and "# multipathd -v3"?) But one path of dm-53 didn't come back after we plugged the miniSAS cable back in. It's a SAS disk. We will try on a SATA disk.

Created attachment 315917 [details]
System log, one disk's path missing

We still have an issue with hot-plugging the SAS cable.
A disk's path () won't come back after several cycles of hot-plug.
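A quick way to spot a map that is missing a path is to count the healthy paths in `multipath -ll` output. The helper below is a hypothetical sketch: it assumes the multipath-tools 0.4.7-era output format, where each healthy path line ends in flags like `[active][ready]`, and the sample output for dm-53 is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count "ready" paths in saved `multipath -ll` output.
# Assumes the multipath-tools 0.4.7 format, where a healthy path line ends
# in [active][ready] and a failed one in e.g. [failed][faulty].

count_ready_paths() {
    # $1: file containing captured `multipath -ll` output
    grep -c '\[active\]\[ready\]' "$1"
}

# Invented sample: a dm-53 map with one healthy and one failed path.
# In real use: multipath -ll > /tmp/mpath-sample.txt
cat > /tmp/mpath-sample.txt <<'EOF'
mpath53 (3600a0b800012345600000000) dm-53 SEAGATE,ST314655SSUN146G
[size=136G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:0 sdq 65:0 [failed][faulty]
EOF

count_ready_paths /tmp/mpath-sample.txt   # prints 1: one of the two paths is down
```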
Is this still an issue? There have been multiple changes that may have solved it. If it is still reproducible, please let me know; otherwise I'm going to close this bug.

Numerous SAS fixes have gone in since 5.0. This bug should be solved, and there is not enough information in this bugzilla for me to debug further.
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; Tablet PC 2.0)

Description of problem:
We have a Sun Storage J4200 and an LSI 1068E HBA card on RHEL5. We try to use device-mapper to handle the multipath. There is no problem installing and enabling multipath, and no problem creating a software RAID with mdadm on the merged devices (/dev/dm*) and running IO on the RAID. The problem occurs when we try to fail over. We hot plug the miniSAS cables in and out on the host. Before plugging out a miniSAS cable we check that all paths are ready, but after several cycles of hot plugging, the IO fails when we plug out the miniSAS cable. The other path remains ready and active, and the link LED indicates normal. We have tried different multipath settings, such as path_grouping_policy=failover or a different path_checker; the symptom remains the same.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-8.el5.i386

How reproducible:
Always

Steps to Reproduce:
1. Connect one host to the Sun Storage J4200 with 2 paths.
2. Enable multipath.
3. Create a RAID (any of RAID 1, 5, 6) with mdadm on the merged devices.
4. Start IO.
5. Plug out one miniSAS cable on the host.
6. Make sure that half of the paths have gone.
7. Plug the miniSAS cable back in.
8. Make sure that all the paths have come back.
9. Plug out the other miniSAS cable on the host.
10. Make sure that half of the paths have gone.
11. Plug the miniSAS cable back in.
12. Make sure that all paths have come back.
13. Repeat steps 5~12.

Actual Results:
After 2~3 cycles of hot plugging the miniSAS cable, the IO hangs.

Expected Results:
When we plug out one miniSAS cable, the other cable should keep carrying the IO, and the IO should keep running no matter how many cycles of hot plugging the miniSAS cable we have performed.
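The "make sure the paths have gone / come back" steps above can be scripted as a polling loop. The sketch below is hypothetical: `wait_for_paths` takes the check command as a parameter so it can be stubbed here, and `fake_check` is an invented stand-in for a real count such as `multipath -ll | grep -c '\[active\]\[ready\]'`.

```shell
#!/bin/sh
# Hypothetical sketch of the "wait until all paths are back" steps:
# poll until the ready-path count reaches a target before the next pull.
wait_for_paths() {
    target=$1
    check=$2       # command that prints the current ready-path count
    while [ "$($check)" -lt "$target" ]; do
        sleep 5    # matches the default multipathd polling_interval
    done
    echo "all $target paths ready"
}

# Invented stub standing in for the real path count, for illustration only.
fake_check() { echo 4; }

wait_for_paths 4 fake_check   # prints "all 4 paths ready"
```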
Additional info:
HBA driver: 4.00.21.00-1

multipath.conf:

    device {
        vendor "SEAGATE"
        product "ST314655SSUN146G"
        path_grouping_policy multibus
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout "none"
        path_checker tur
        path_selector "round-robin 0"
        failback immediate
        rr_weight uniform
        hardware_handler "0"
        no_path_retry fail
        user_friendly_name no
    }