| Summary: | systemd-udevd inotify_add_watch(7, ...) failed for multipath partitions | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jonathan Edwards <joedward> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED NOTABUG | QA Contact: | Lin Li <lilin> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.3 | CC: | aghadge, agk, bmarzins, dmulford, dvd, dwojewod, heinzm, igreen, jaeshin, jbrassow, jmoon, joedward, jpittman, klaas, lilin, loberman, mjankula, mkarg, msekleta, msnitzer, nchandek, nkshirsa, nyewale, prajnoha, raginjup, rick.beldin, skim, systemd-maint-list, systemd-maint, udev-maint-list, vanhoof |
| Target Milestone: | rc | | |
| Target Release: | 7.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-15 15:16:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1420851, 1473733 | | |
| Attachments: | | | |
Description
Jonathan Edwards
2016-12-07 14:28:04 UTC
Created attachment 1229168 [details]
inotify_add_watch journalctl with udev debug
Created attachment 1229183 [details]
inotify_add_watch journalctl with udev and multipathd debug
Issue is only reproducible with device-mapper-multipath-0.4.9-99.el7.x86_64. Downgrading the package to the latest 7.2 level (device-mapper-multipath-0.4.9-85.el7_2.6.x86_64) seems to work around the issue. As a test I used the old 62-multipath.rules file with the new package and the issue persisted. I also tried the 7.3 kpartx package with the 7.2 device-mapper-multipath package and the issue persisted as well. Attached are two 7.3 journalctl files, one with udev debug enabled and one with udev & multipathd debug enabled.

Attempting to isolate this further, through a loosely followed bisection of the patches, I was able to narrow things down to a few patches:

0166-RHBZ-1323429-dont-allow-new-wwid.patch
0167-RHBZ-1335176-fix-show-cmds.patch
0168-RHBZ-1347769-shared-lock.patch

The package was built with an edited spec file, the new rpms were force installed (minus the debug package), and the system was rebooted. With patches 166 - 188 commented out, I was not able to reproduce the issue.

uncomment 166 - issue does not occur
uncomment 167 - issue occurs once on 1st reboot (1st reboot repro, 2nd reboot no repro, 3rd reboot no repro, 4th reboot no repro, 5th reboot no repro)
uncomment 168 - issue occurs every time

Based on the logs, the problem here is that we're processing a uevent for a device and the processing is still not finished while the device is removed in parallel (by another udevd worker here, as seen in this report - but I don't think that's important - the device can be removed anytime by anything in parallel, not only by another udevd worker; it can be removed from any script or directly by the user). When the udevd worker finishes processing, it tries to update the udev database and also set an inotify watch for the device - but the device was removed during processing, and so we end up with the error/warning messages.

We can never assume the device is still there when we finish rule processing for that device. The only thing we can do is decrease the level of the warning/error messages produced in this situation. The system is dynamic - devices can appear and disappear anytime, even during udevd processing.

This problem can be easily simulated if we add a rule like this (see the consolidated shell sketch below):

1) add /etc/udev/rules.d/50-delay.rules with this content: RUN+="/usr/bin/sleep 10"
2) udevadm control --reload
3) echo add > /sys/block/sda/uevent (or any other device)
4) echo 1 > /sys/block/sda/device/delete (do it within the sleep time, which is 10 seconds in our example here)
5) wait for the sleep to finish (10 s here)
6) look at the logs; you'll see something like:

Apr 04 13:18:12 rhel7-a systemd-udevd[1375]: inotify_add_watch(7, /dev/sda, 10) failed: No such file or directory

(The errors from lvm2 that we see in the report from comment #1:

Dec 06 17:09:38 localhost.localdomain lvm[497]: WARNING: Failed to get udev device handler for device /dev/sda1.

...are just a consequence - lvm2 reads the udev db, and the udev db simply does not reflect the current state where the device is already removed. The udev db still contains the device when it is read by lvm2; it is updated only after the REMOVE event is processed.)

This is not a bug in device-mapper-multipath - the rules for removing the partitions (partx -d ...) just made the situation described in comment #17 more visible, because the device/partition is now removed automatically while another udevd worker can still be in the middle of processing that device/partition.
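The simulation steps above can be run as a short shell sequence; a minimal sketch, assuming sda is a disposable test disk whose removal is harmless:

    # Minimal reproduction sketch (assumes sda is a disposable test disk).
    echo 'RUN+="/usr/bin/sleep 10"' > /etc/udev/rules.d/50-delay.rules
    udevadm control --reload
    echo add > /sys/block/sda/uevent          # start processing of the device
    echo 1 > /sys/block/sda/device/delete     # remove it while the rule is still sleeping
    sleep 15                                  # wait out the 10 second delay
    journalctl -b | grep inotify_add_watch    # expect "failed: No such file or directory"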
Normally, the device is either removed manually, in which case the chance of removing a device while another event for that device is being processed is very small, or the device is removed because it gets unplugged - and again, the chances of running into the situation described in comment #17 are small. So this problem was always here; it's just made more visible now by the added partx -d.

Is it just a scary message then, or does something need to be done? It's sounding to me like the former...

Hi, I understand what you explained in comments 17 and 18. But here (on the customer site) there is a critical issue with Oracle CRS and the removal of the partition table when adding or removing a SAN path.

If Oracle CRS checks partition tables that are on SAN paths via multipathd/lvm2 and fails to read some of them during a SAN path add or remove, CRS forces a reboot of the system, because CRS assumes an emergency case has occurred, such as losing a path.

So we need to find the best way to make sure the partition table is not lost, to prevent Oracle CRS from rebooting the system whenever a SAN path is added or removed. Are there any good ideas?

Thanks and best regards,
Cloud (SangWoon) Kim
Red Hat Platform Consult

(In reply to Jonathan Earl Brassow from comment #19)
> Is it just a scary message then, or does something need to be done? It's
> sounding to me like the former...

As for the warning messages - yes, it's just a message. We can't avoid that completely because a device can be removed anytime, even right while udevd is processing it and the udev rules are being run. So from this point of view, I'd probably suggest lowering the message severity level to "debug" only, instead of "warning" or "error", if possible.

(In reply to Sang Woon Kim from comment #20)
> If Oracle CRS checks partition tables that are on SAN paths via
> multipathd/lvm2 and fails to read some of them during a SAN path add
> or remove, CRS forces a reboot of the system, because CRS assumes an
> emergency case has occurred, such as losing a path.
>
> So we need to find the best way to make sure the partition table is
> not lost, to prevent Oracle CRS from rebooting the system whenever a
> SAN path is added or removed.

This sounds like an improper assumption by Oracle CRS. I think those partitions are removed because we need them on top of the top-level mpath device instead of on the components, but I'm not 100% sure - asking Ben to confirm (...or whether there's any other reason the partx -d is called).

The only reason we do this is to make sure users work with the kpartx-created partition devices, and not the scsi partitions. People kept getting this wrong, so it seemed easier if these devices just weren't around. That's why the rules only delete the partitions one time; if the user recreates the partitions, they are left alone.

It might be better to actually remove the partitions as they come up. Instead of removing all the partitions during udev processing of the whole scsi device, I could remove them individually with a RUN rule during processing of the partition itself. That way a partition couldn't get removed while it was being processed. It's not crucial that we remove the partitions; we could simply mark them as not ready in the udev rules, and that should take care of most automated things that might use them.
But changing it now would confuse customers who are not expecting to see them on their multipathed scsi devices, and it does seem to help avoid user error when working with multipath devices.

I'm confused about the logging level comment (comment 21) about stripping the complaint in systemd-udevd... wouldn't debug be a higher logging level than warn or error?

How can we explain to customers why SAN block I/O hangs, accompanied by the inotify_add_watch messages, when SAN paths are re-added by powering one of four SAN switches off and on?
The customer uses the following multipath configuration options:
    device {
            vendor                "HITACHI"
            product               "OPEN-.*"
            path_grouping_policy  multibus
            path_selector         "round-robin 0"
            path_checker          tur
            features              "1 queue_if_no_path"
            hardware_handler      "0"
            prio                  const
            rr_weight             uniform
            no_path_retry         6
            rr_min_io             1000
            rr_min_io_rq          1
    }
Thanks and best regards,
Cloud(SangWoon),Kim
Red Hat Platform Consult
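For context, a device stanza like the one quoted above sits inside the devices section of /etc/multipath.conf. A minimal sketch of the surrounding structure follows; the defaults shown are illustrative assumptions and not taken from the customer's configuration.

    # /etc/multipath.conf -- illustrative skeleton only, not the customer's full file
    defaults {
            user_friendly_names yes
    }
    devices {
            device {
                    vendor  "HITACHI"
                    product "OPEN-.*"
                    # ... remaining attributes as listed in the comment above ...
            }
    }
    # After editing, the running configuration can be inspected with:
    #   multipathd show config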
I don't think that any of these issues are directly related to the patches in Comment 5. Those simply change the timing of things, which can change the outcome of races. I'm also not sure how removing partitions on a path device could cause I/O to a multipath device to hang. Fixing the issue by clearing the page cache also points away from a block layer issue, since the block layer sits below the page cache. Once a request hits device-mapper, it has already made it past the page cache, and I don't see how the page cache could affect the request anymore.

But just to make sure that removing the partition devices isn't causing an issue, I'll post a patch that you can try. It will stop multipath from removing partitions from its path devices and will instead set SYSTEMD_READY to 0 for them, so that they are ignored in favor of the kpartx devices.

Created attachment 1274753 [details]
patch to stop multipath from deleting path device partitions.
You can apply this patch with
# patch -d /lib/udev/rules.d/ -p1 < multipath-rules.patch
It will stop /lib/udev/rules.d/62-multipath.rules from removing the partition device nodes for the paths of multipath devices. With this patch applied, you should no longer see the inotify udev messages. Do you still see I/O hangs when you run your tests with this applied?
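For reference, the general shape of this approach (marking path partitions as not ready instead of deleting their device nodes) can be sketched as udev rules like the ones below. This is only an illustration of the idea, not the contents of attachment 1274753, and the exact match keys used by the real rules file may differ.

    # Illustrative sketch only -- not the actual attached patch.
    # Pull the multipath path-device flag from the parent disk onto its
    # partitions, then mark those partitions as not ready so systemd and
    # other udev consumers ignore them in favor of the kpartx-created
    # partition devices on top of the multipath map.
    ACTION!="remove", ENV{DEVTYPE}=="partition", IMPORT{parent}="DM_MULTIPATH_DEVICE_PATH"
    ACTION!="remove", ENV{DEVTYPE}=="partition", ENV{DM_MULTIPATH_DEVICE_PATH}=="1", ENV{SYSTEMD_READY}="0"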
Has anyone run the test I mentioned in Comment 29? This bug isn't going to make forward progress without more information.

On a server with multipath partitions, I had the same issue. After applying the patch proposed by Ben Marzinski, the "systemd-udevd inotify_add_watch" log messages disappeared.

I'm moving this bug out from rhel-7.5.

As has been mentioned above, these inotify_add_watch messages should not affect anything. On the other hand, it should be completely safe to apply my patch to remove them, as long as you realize that these partition devices should never be used directly. Instead, you should use the kpartx-created partition devices. If you are worried that these messages are correlated with an actual problem, you should apply my patch to remove them and see if your problem is resolved. If that is the case, please let me know.

I am closing this bug. The messages about paths being removed do not represent a problem; this is being done intentionally. Further, no one has tried my udev rules patch and reported any change other than that the scsi partitions were no longer removed. But we are specifically removing these partitions because leaving them in has caused many issues with customers using them instead of the partition devices on top of multipath. If someone can come up with a good reason why this bug should not be closed, please re-open it with an explanation of an actual problem caused by removing the scsi partitions.
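For anyone following this advice, one quick way to confirm which device nodes to use is to list the kpartx-created partitions under /dev/mapper; a short sketch, where the map name mpatha is illustrative and depends on local naming:

    # List the multipath map and its kpartx-created partitions
    # ("mpatha" is an example name; check "multipath -ll" for yours).
    multipath -ll
    ls -l /dev/mapper/mpatha*
    lsblk /dev/mapper/mpatha
    # Use /dev/mapper/mpathaN for I/O, not the underlying /dev/sdXN scsi partitions.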