| Summary: | [LLNL 7.5 Bug] iSCSI multipath fails to boot 10% of the time | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ben Woodard <woodard> | ||||
| Component: | iscsi-initiator-utils | Assignee: | Chris Leech <cleech> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | Filip Suba <fsuba> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 7.3 | CC: | dracut-maint-list, lnykryn, tdhooge, tgummels | ||||
| Target Milestone: | rc | ||||||
| Target Release: | 7.5 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-08-04 18:46:32 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1599298 | ||||||
| Attachments: |
Description
Ben Woodard
2016-10-11 23:53:07 UTC
Trent, since this might be related to the driver and the switch, and Opal is down for power work, can you fill in the details about those two things?

Could you boot these machines with rd.debug on the kernel command line and, when the issue occurs, upload or save the contents of /run/initramfs/rdsosreport.txt and the output of journalctl before you reboot?

Created attachment 1454837
Console, rdsosreport.txt, and journalctl output

This error is not one I remember; it may be a clue to what is going on. I see it on all the nodes that fail to boot:

sysfs: cannot create duplicate filename '/devices/platform/host10/session1/connection1:0'

Based on the log, this looks more like an iSCSI issue.

Do you have a feel for whether it is server side or client side?

My guess is client. If I don't multipath, I don't have failures, and I have the same amount of load going to the server side. To make things more annoying, I found that when I use rd.break=pre-pivot with multipath, I don't see the issue. The same happens when I turn on rd.debug, so there is a timing issue somewhere in there.

Tested modules.d/95iscsi/iscsiroot.sh from GitHub on 1000 nodes, and this seems to address the issue. All 1000 nodes booted the first time. It looks like they changed from iscsistart to iscsid in commit b31f3fe0d1bea66078ef65c736df03a150f74607.

Hi Lukáš,

Trent @ LLNL has noted that a closer-to-upstream version of the iSCSI startup scripts resolves the issue for them. He is specifically calling out:

https://github.com/dracutdevs/dracut/commit/b31f3fe0d1bea66078ef65c736df03a150f74607

Would it be possible to pull this change into 7.6 this late in the schedule? Could you build an rpm with this change for LLNL to validate?

Thank you,
Travis

Note that I just tested iscsiroot.sh as is. I was just calling out the commit I felt likely brought in the fix.

Definitely not for 7.6. Given the complexity of the patch, I am not sure we want to make such a change in RHEL 7 at all.

LLNL has been carrying a later iscsiroot.sh, which at last report had resolved the issue. Since RHEL 7 isn't entertaining any further enhancements, defects have to clear a high bar for inclusion, and given the earlier concern with even including the change, I'm closing this bug. As far as I can discern, the version Trent was using (or a later version) is in RHEL 8 (dracut v49). If RHEL 8 exhibits the same defect, please log a new bug.
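For anyone triaging a similar failure across many nodes, the duplicate-filename kernel message quoted in the comments is a convenient marker to search for in saved console or journalctl output. The following is only a minimal sketch, not part of the original report: the function names and the idea of keeping one log text per node in a dictionary are assumptions made for illustration, and the pattern assumes the message keeps the same shape as the one line quoted above.

```python
import re

# Kernel message quoted in the report. The host, session and connection
# numbers are assumed to vary from node to node, so they are matched loosely.
DUPLICATE_SYSFS = re.compile(
    r"sysfs: cannot create duplicate filename "
    r"'(/devices/platform/host\d+/session\d+/connection[\d:]+)'"
)


def find_duplicate_sysfs_paths(log_text):
    """Return the sysfs paths named in duplicate-filename errors, in order."""
    return DUPLICATE_SYSFS.findall(log_text)


def failing_nodes(logs_by_node):
    """Given {node_name: log_text}, return the sorted names of nodes whose
    log contains at least one duplicate-filename error.

    The node names and the dictionary layout are hypothetical.
    """
    return sorted(
        name
        for name, text in logs_by_node.items()
        if find_duplicate_sysfs_paths(text)
    )
```

Since the comments note that the failure disappears with rd.debug or rd.break=pre-pivot enabled, a scan like this is probably most useful on console logs captured from ordinary boots.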