| Summary: | [device-mapper-multipath] [scale] multipath scan takes 10 minutes to return at certain occasions | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> | ||||
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Storage QE <storage-qe> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.1 | CC: | agk, bmarzins, coughlan, cpelland, czhang, ddumas, dwysocha, heinzm, mbroz, mgoldboi, prajnoha, prockai, smizrahi, yeylon, ykaul, zkabelac | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-09-22 12:13:42 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 682015 | ||||||
| Attachments: |
|
||||||
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, > child_tidptr=0x7fae7147fa70) = 9438 > --- SIGCHLD (Child exited) @ 0 (0) --- > close(29) = 0 > read(28, "36006016066102900cc07b2daf2ade01"..., 127) = 34 > read(28, "", 93) = 0 > wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9438 > close(28) = 0 This is the getuid_callout. You can manually run this to verify how long it takes. It should return almost instantly. > ioctl(27, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 01, c0, 00, 80, 00], > mx_sb_len=128, iovec_count=0, dxfer_len=256, timeout=60000, flags=0 This is the prio checker function. Again, it should return quickly, but the ioctl itself has a timeout of 1 minute which, looking at the return value, isn't getting hit. Do you know which one of these is the slow one? Ping Haim - if this is a blocker, we need more information here. |
Created attachment 513639 [details] strace dump Description of problem: host connected to 500 Luns via FC, sometimes, we see that it takes 10 minutes and more for command /sbin/multipath to return. running strace, it appears to be hanging on the following command: clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fae7147fa70) = 9438 --- SIGCHLD (Child exited) @ 0 (0) --- close(29) = 0 read(28, "36006016066102900cc07b2daf2ade01"..., 127) = 34 read(28, "", 93) = 0 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9438 close(28) = 0 ioctl(27, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 01, c0, 00, 80, 00], mx_sb_len=128, iovec_count=0, dxfer_len=256, timeout=60000, flags=0 attached full strace of /sbin/multipath multiapth.conf: # RHEV REVISION 0.6 defaults { polling_interval 5 getuid_callout "/sbin/scsi_id -g -u -d /dev/%n" no_path_retry fail user_friendly_names no flush_on_last_del yes fast_io_fail_tmo 5 dev_loss_tmo 30 max_fds 4096 } device-mapper-multipath-0.4.9-41.el6.x86_64