Bug 1758223

Summary: In some cases a /dev/disk/by-path/fc---lun-0 is created for some devices.
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: rhosp-director-images Assignee: Bob Fournier <bfournie>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: amoralej, assingh, augol, bbowen, bfournie, dtantsur, gkadam, ietingof, jpittman, kthakre, mburns, rpittau, schhabdi, ssigwald, svigan, systemd-maint-list, tbzatek
Target Milestone: z11 Keywords: Tracking, Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-29 20:22:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1750417    
Bug Blocks:    

Description David Hill 2019-10-03 15:29:25 UTC
Description of problem:
In some cases a /dev/disk/by-path/fc---lun-0 link is created for some devices, and I suspect the following rules are the cause:

~~~
#
# FC WWPN-based by-path links
#

ACTION!="add|change", GOTO="fc_wwpn_end"
KERNEL!="sd*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", IMPORT{program}="fc_wwpn_id %p"
ENV{DEVTYPE}=="partition", IMPORT{parent}="FC_*"
ENV{FC_TARGET_WWPN}!="$*"; GOTO="fc_wwpn_end"
ENV{FC_INITIATOR_WWPN}!="$*"; GOTO="fc_wwpn_end"
ENV{FC_TARGET_LUN}!="$*"; GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}"
ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}-part%n"

LABEL="fc_wwpn_end"
~~~

These rules should instead be (note `,` rather than `;` before each GOTO on the three FC_* match lines):

~~~
#
# FC WWPN-based by-path links
#

ACTION!="add|change", GOTO="fc_wwpn_end"
KERNEL!="sd*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", IMPORT{program}="fc_wwpn_id %p"
ENV{DEVTYPE}=="partition", IMPORT{parent}="FC_*"
ENV{FC_TARGET_WWPN}!="$*", GOTO="fc_wwpn_end"
ENV{FC_INITIATOR_WWPN}!="$*", GOTO="fc_wwpn_end"
ENV{FC_TARGET_LUN}!="$*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}"
ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}-part%n"

LABEL="fc_wwpn_end"
~~~
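
The only difference between the two listings above is the separator after each `!="$*"` match: `;` in the first, `,` in the second. As a minimal sketch, that change could be applied in place with sed. The function name `fix_fc_wwpn_rules` is hypothetical, the path is supplied by the caller, and this mirrors only the change proposed in this description; whether it matches the fix eventually shipped in the package is not stated in this report.

```shell
# Hypothetical helper: replace the stray ';' after each !="$*" match with ','.
# Assumes the file has the layout quoted above. A .bak copy is kept next to it.
fix_fc_wwpn_rules() {
    rules_file=$1
    sed -i.bak 's/!="\$\*"; *GOTO=/!="$*", GOTO=/' "$rules_file"
}
```

For example, `fix_fc_wwpn_rules usr/lib/udev/rules.d/59-fc-wwpn-id.rules` from the root of an unpacked image tree.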


Version-Release number of selected component (if applicable):
Latest

How reproducible:
This customer environment.

Steps to Reproduce:
1. Install RHEL 7.7

Actual results:
/dev/disk/by-path/fc---lun-0 is created

Expected results:
That file shouldn't be created
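
A quick way to check a host for the symptom, sketched under the assumption that the bad links always have empty WWPN fields (names starting with `fc---lun-`). The function name `find_empty_fc_links` is hypothetical.

```shell
# List by-path entries whose FC initiator/target WWPN fields are empty,
# i.e. names of the form fc---lun-<N> (with or without a -part<N> suffix).
# The directory defaults to /dev/disk/by-path.
find_empty_fc_links() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/fc---lun-*; do
        # The glob stays literal when nothing matches; skip that case.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s\n' "$link"
    done
}
```

Running `find_empty_fc_links` with no argument prints nothing on an unaffected system.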

Additional info:

Comment 2 David Hill 2019-10-03 20:12:55 UTC
This is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1750417

Comment 5 John Pittman 2019-10-04 20:54:43 UTC
The code/rule file causing the issue was introduced by sg3_utils-1.37-18.el7_7.1. I'm unsure whether this is possible given OpenStack's needs, but if the customer could temporarily downgrade to sg3_utils-1.37-18.el7, the bad by-path links should not be created at device attach.

# yum downgrade sg3_utils sg3_utils-libs

Comment 7 Tomáš Bžatek 2019-10-10 12:18:18 UTC
Looks like the new udev rules are uncovering hidden bugs in the OpenStack code. To understand the real issue here, could somebody please answer the following questions?

 * is that a stray symlink or is it a misplaced one, missing somewhere else?
 * judging by the output provided in comment 4, why does the "by_path" attribute matter when "name" is present? (assuming the link points to the mentioned block device file)
 * is /dev/disk/by-* structure actually used for block device enumeration?
 * does the OpenStack code interact with udev through libudev or its bindings?

Could you also please provide output of `udevadm info --export-db` at the time of the issue?
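
To narrow that dump down to the affected devices, a small filter along these lines may help. It relies on the `P:` (device path) and `S:` (symlink, relative to /dev) record prefixes that `udevadm info --export-db` prints; the function name `report_empty_fc_links` is hypothetical.

```shell
# Read `udevadm info --export-db` output on stdin and print, for each device
# that carries an fc---lun-* symlink, its device path and that symlink.
report_empty_fc_links() {
    awk '
        /^P: / { devpath = substr($0, 4) }
        /^S: disk\/by-path\/fc---lun-/ { print devpath " -> " substr($0, 4) }
    '
}
```

Usage: `udevadm info --export-db | report_empty_fc_links`.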

Comment 9 Dmitry Tantsur 2019-10-17 09:13:10 UTC
I can only answer the questions from the information-collecting side; I'm not sure how it's used by Ceph:

> is /dev/disk/by-* structure actually used for block device enumeration?

No, we use lsblk for that. /dev/disk/by-* is only used to provide the by_path attribute. I'm not sure how it's consumed.

> does the OpenStack code interact with udev through libudev or its bindings?

We definitely use pyudev to collect additional attributes of block devices.

Relevant code (quite a lot of it): https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L238-L387

Comment 11 John Pittman 2019-10-17 11:41:24 UTC
Hi Ashish.  I just mean that if you downgrade sg3_utils and the libs, the bad fc symlink will not be created in /dev/disk/by-path.  I don't know anything about the stack environment.  Alternatively, sg3_utils can just be un-installed and the libs left on the system.  I don't believe sg3_utils is installed by default anyways.

Comment 12 David Hill 2019-10-17 22:54:54 UTC
This is a possible workaround for the current IPA initramfs issue:
~~~
 mkdir tmp
 cd tmp
 mv ../ironic-python-agent.initramfs ironic-python-agent.initramfs.gz
 gunzip ironic-python-agent.initramfs.gz
 cat ironic-python-agent.initramfs | cpio -ivd
 rm -rf ironic-python-agent.initramfs
 # fix typos in the initramfs 59-fc-wwpn-id.rules 
 find . -print -depth | cpio -ov > ironic-python-agent.initramfs
 gzip ironic-python-agent.initramfs
 mv ironic-python-agent.initramfs.gz ironic-python-agent.initramfs
~~~

Comment 13 David Hill 2019-10-17 23:19:25 UTC
Something like this:
~~~
 mkdir tmp
 cd tmp
 cp ../ironic-python-agent.initramfs ironic-python-agent.initramfs.gz
 gunzip ironic-python-agent.initramfs.gz
 cat ironic-python-agent.initramfs | cpio -ivd
 rm -rf ironic-python-agent.initramfs
 cat << EOF> usr/lib/udev/rules.d/59-fc-wwpn-id.rules
#
# FC WWPN-based by-path links
#

ACTION!="add|change", GOTO="fc_wwpn_end"
KERNEL!="sd*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", IMPORT{program}="fc_wwpn_id %p"
ENV{DEVTYPE}=="partition", IMPORT{parent}="FC_*"
ENV{FC_TARGET_WWPN}!="$*", GOTO="fc_wwpn_end"
ENV{FC_INITIATOR_WWPN}!="$*", GOTO="fc_wwpn_end"
ENV{FC_TARGET_LUN}!="$*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}"
ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}-part%n"

LABEL="fc_wwpn_end"
EOF
 find . -print -depth | cpio -ov > ironic-python-agent.initramfs
 gzip ironic-python-agent.initramfs
 mv ironic-python-agent.initramfs.gz ../ironic-python-agent.initramfs
~~~

Comment 14 David Hill 2019-10-17 23:20:45 UTC
This one would be better as the variables are not expanded:
~~~
 mkdir tmp
 cd tmp
 cp ../ironic-python-agent.initramfs ironic-python-agent.initramfs.gz
 gunzip ironic-python-agent.initramfs.gz
 cat ironic-python-agent.initramfs | cpio -ivd
 rm -rf ironic-python-agent.initramfs
 cat << 'EOF'> usr/lib/udev/rules.d/59-fc-wwpn-id.rules
#
# FC WWPN-based by-path links
#

ACTION!="add|change", GOTO="fc_wwpn_end"
KERNEL!="sd*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", IMPORT{program}="fc_wwpn_id %p"
ENV{DEVTYPE}=="partition", IMPORT{parent}="FC_*"
ENV{FC_TARGET_WWPN}!="$*", GOTO="fc_wwpn_end"
ENV{FC_INITIATOR_WWPN}!="$*", GOTO="fc_wwpn_end"
ENV{FC_TARGET_LUN}!="$*", GOTO="fc_wwpn_end"

ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}"
ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-path/fc-$env{FC_INITIATOR_WWPN}-$env{FC_TARGET_WWPN}-lun-$env{FC_TARGET_LUN}-part%n"

LABEL="fc_wwpn_end"
EOF
 find . -print -depth | cpio -ov > ironic-python-agent.initramfs
 gzip ironic-python-agent.initramfs
 mv ironic-python-agent.initramfs.gz ../ironic-python-agent.initramfs
~~~

Comment 18 Brian 2019-10-20 20:52:55 UTC
Email from Bob Fournier:

There may be a problem with the procedure in Comment 14. It may be safer to try downgrading sg3_utils and sg3_utils-libs to 1.37-18.el7, as indicated in https://bugzilla.redhat.com/show_bug.cgi?id=1758223#c5. The downloads can be found here:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=853138

Note that when you upgrade the image you'll need to make sure that:
- the files have root permissions
- you upload the images after making the change.

There's a KCS article - https://access.redhat.com/solutions/3548611, that describes how to do this, and also the downstream documentation -
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/partner_integration/overcloud_images#initrd_modifying_the_initial_ramdisks

Comment 20 Dmitry Tantsur 2019-10-21 11:45:52 UTC
Did you try following https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/partner_integration/overcloud_images#initrd_modifying_the_initial_ramdisks precisely (modulo different package)? I'd prefer we don't implement custom procedures if there is an official one.

Comment 21 Dmitry Tantsur 2019-10-21 11:47:04 UTC
Also it's not clear from the previous comments: please make sure you're doing everything as a root user. Files inside initramfs are owned by root.

Comment 29 Bob Fournier 2019-11-22 17:07:21 UTC
Shailesh - it's more of a question on the RHEL backport to 7.7 for https://bugzilla.redhat.com/show_bug.cgi?id=1750417.  Once that backport merges we will pick it up in the next OSP release.

Comment 31 Bob Fournier 2019-11-23 22:09:05 UTC
Including storage DFG for Comment 30 as they have the expertise for ceph configuration.  May need to open a separate bug as this appears unrelated to the original issue.

Comment 34 Bob Fournier 2020-01-09 22:02:10 UTC
RHEL 7.7 backport for this fix is here - https://bugzilla.redhat.com/show_bug.cgi?id=1788876.  It will be picked up in the next OSP-13z release - 13z11.

Comment 35 Bob Fournier 2020-07-29 20:22:12 UTC
Fix is available in RHEL 7.7 and later, closing this.