Bug 2148567 - [GSS] OSD prepare job is skipping OSD configuration (multipath devices)
Summary: [GSS] OSD prepare job is skipping OSD configuration (multipath devices)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.11
Hardware: All
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Travis Nielsen
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-11-25 23:50 UTC by kelwhite
Modified: 2023-08-09 17:03 UTC
CC List: 12 users

Fixed In Version: 4.12.0-145
Doc Type: Bug Fix
Doc Text:
Cause: Configuring an ODF cluster with multipath devices may not create OSDs as expected.
Consequence: Devices with the mpath_member label are skipped for OSD creation.
Fix: Allow OSDs to be created even when the mpath_member FSType is set on the device, since the device is specifically provisioned with a PVC.
Result: OSDs are created as expected on clean mpath devices.
Clone Of:
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage rook pull 438 0 None open Bug 2148567: osd: Allow mpath_member filesystem for mpath disks 2022-12-12 17:06:30 UTC
Github rook rook pull 11413 0 None Merged osd: Allow mpath_member filesystem for mpath disks 2022-12-12 14:37:22 UTC
Red Hat Knowledge Base (Solution) 6989736 0 None None None 2022-12-08 19:53:55 UTC

Comment 16 loberman 2022-12-05 22:21:27 UTC
Hello

We are going to need help here and the customer is fast becoming impatient with us.
The case has been escalated.

We do not understand what the next steps are for us.

2022-11-29T17:36:07.332318070Z 2022-11-29 17:36:07.332314 D | exec: Running command: stdbuf -oL ceph-volume --log-path /var/log/ceph/ocs-deviceset-localblock-0-data-0lx4cl raw prepare --bluestore --data /dev/mapper/mpatha
2022-11-29T17:36:08.104559230Z 2022-11-29 17:36:08.104514 I | cephosd: --> Raw device /dev/mapper/mpatha is already prepared.

For example, is this raw prepare going after the wrong mapper device (mpatha), and is that why it's complaining?
When Kelson ran this, we told it which device to use.

Please treat this as very URGENT and give us next steps for what to gather, even if that includes additional debug data.

Thanks
Laurence Oberman

Comment 18 loberman 2022-12-06 14:15:51 UTC
From Laurence

Hello Red Hat Team

When we configured this we changed it to use /dev/disk/by-id, because we had a multipath ordering issue and needed consistency across all nodes.

The other complication here is that this customer boots from SAN, so one of the mpaths is used for the O/S and they have multiple mpaths per node.
Having said this, two out of three worked (actually 3 out of 4), so for me the issue is no longer the naming; it's something else.

The data volumes are all 2.4T

mpatha (3624a93708a2c2aed4e9a423800026b1e) dm-0 PURE,FlashArray        Disk to be used for the OSD
size=2.4T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 1:0:0:2 sdb 8:16  active ready running
  |- 1:0:1:2 sdg 8:96  active ready running
  |- 8:0:0:2 sdi 8:128 active ready running
  `- 8:0:1:2 sdk 8:160 active ready running

mpathb (3624a93708a2c2aed4e9a423800026938) dm-1 PURE,FlashArray         O/S disk
size=250G features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 1:0:0:1 sda 8:0   active ready running
  |- 1:0:1:1 sdf 8:80  active ready running
  |- 8:0:0:1 sdh 8:112 active ready running
  `- 8:0:1:1 sdj 8:144 active ready running

So it's complaining about mpatha, and that is the correct device:

2022-11-29T17:36:07.332318070Z 2022-11-29 17:36:07.332314 D | exec: Running command: stdbuf -oL ceph-volume --log-path /var/log/ceph/ocs-deviceset-localblock-0-data-0lx4cl raw prepare --bluestore --data /dev/mapper/mpatha
2022-11-29T17:36:08.104559230Z 2022-11-29 17:36:08.104514 I | cephosd: --> Raw device /dev/mapper/mpatha is already prepared.
                                                                           *************************************************

Can we not run ceph-volume manually to work around this and trace it to see why it thinks it's already prepared?

Alternatively, temporarily blacklist that mpath device in multipath, restart multipathd, and try again.

Regards
Laurence

Comment 19 loberman 2022-12-06 14:22:57 UTC
I gave the customer this

Hello Tarek

Can we try something? This will give me more data to give to engineering.

For storage1 where the issue is happening

First get a multipath -ll

Save the mapping so we know the current path names for mpatha

Then

edit /etc/multipath.conf

add this in the blacklist part

blacklist {
   wwid 3624a93708a2c2aed4e9a423800026b1e
}


Then run systemctl reload multipathd

multipath -ll should no longer map the device

You should only see mpathb

Then overwrite the whole disk (change sdxxx to one of the path names saved above):
dd if=/dev/zero of=/dev/sdxxx bs=1024K oflag=direct


Then retry the provisioning of the OSD using /dev/disk/by-id again

We can try to do it together if you want.

Regards
Laurence Oberman

Comment 21 loberman 2022-12-06 22:32:24 UTC
Hello

With this being CoreOS and booting from SAN (mpath), I had to jump through hoops to fully blacklist the device.
I managed to manually remove the mpath and we tried again; it failed differently, but still failed.

I am a storage/kernel internals maintenance engineer, but not an OpenShift- or ODF-savvy engineer.
Given that this is escalated and the customer cannot make progress, I think an Engineering developer resource should get on a call and live troubleshoot this.

The device is correct:
2022-12-06 21:49:22.947092 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2 /dev/disk/by-path/fc-0x20000025b510b154-0x524a937863ad1581-lun-2 Size:2684354560000 UUID:ebae9336-0fe9-4090-a176-8567ec7064ac Serial:3624a93708a2c2aed4e9a423800026b1e Type:data Rotational:false Readonly:false Partitions:[] Filesystem:mpath_member Mountpoint: Vendor:PURE Model:FlashArray 

2022-12-06 21:49:22.956262 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-12-06 21:49:22.956268 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.002629 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". exit status 5
2022-12-06 21:49:23.002652 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:23.005127 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-06 21:49:23.005214 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-12-06 21:49:23.258582 D | cephosd: {}
2022-12-06 21:49:23.258611 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-12-06 21:49:23.258618 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.265085 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". Device /mnt/ocs-deviceset-localblock-0-data-0lx4cl is not a valid LUKS device.: exit status 1
2022-12-06 21:49:23.265099 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-0lx4cl --format json
2022-12-06 21:49:23.477027 D | cephosd: {}
2022-12-06 21:49:23.477052 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-06 21:49:23.477057 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-0lx4cl"



Full log
-----------

$ oc logs rook-ceph-osd-prepare-30a979865d0234fdc9c770eb1afbc7cb-l4l5h

2022-12-06 21:49:22.914717 I | cephcmd: desired devices to configure osds: [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-06 21:49:22.916770 I | rookcmd: starting Rook v4.11.3-0.224a35508091e5dcf8f09dd910118b75ef52f84e with arguments '/rook/rook ceph osd provision'
2022-12-06 21:49:22.916783 I | rookcmd: flag values: --cluster-id=4b30fd44-6be2-43f3-8129-f8eb670b82fe, --cluster-name=ocs-storagecluster-cephcluster, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"/mnt/ocs-deviceset-localblock-0-data-0lx4cl","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=ocs-deviceset-localblock-0-data-0lx4cl, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account=
2022-12-06 21:49:22.916792 I | op-mon: parsing mon endpoints: b=172.30.217.124:6789,c=172.30.156.113:6789,a=172.30.79.36:6789
2022-12-06 21:49:22.925634 I | op-osd: CRUSH location=root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-06 21:49:22.925656 I | cephcmd: crush location of osd: root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-06 21:49:22.925663 D | exec: Running command: dmsetup version
2022-12-06 21:49:22.927533 I | cephosd: Library version:   1.02.181-RHEL8 (2021-10-20)
Driver version:    4.43.0
2022-12-06 21:49:22.935929 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-12-06 21:49:22.936078 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-12-06 21:49:22.936137 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid                         = b5c98aa2-4a54-4714-9979-fdd232f0bd46
mon initial members          = c a b
mon host                     = [v2:172.30.156.113:3300,v1:172.30.156.113:6789],[v2:172.30.79.36:3300,v1:172.30.79.36:6789],[v2:172.30.217.124:3300,v1:172.30.217.124:6789]
rbd_mirror_die_after_seconds = 3600
bdev_flock_retry             = 20
mon_osd_full_ratio           = .85
mon_osd_backfillfull_ratio   = .8
mon_osd_nearfull_ratio       = .75
mon_max_pg_per_osd           = 600
mon_pg_warn_max_object_skew  = 0
mon_data_avail_warn          = 15

[osd]
osd_memory_target_cgroup_limit_ratio = 0.8

[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring

2022-12-06 21:49:22.936146 I | cephosd: discovering hardware
2022-12-06 21:49:22.936152 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:22.937939 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-06 21:49:22.938401 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:22.942605 D | exec: Running command: udevadm info --query=property /dev/sde
2022-12-06 21:49:22.947032 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2 /dev/disk/by-path/fc-0x20000025b510b154-0x524a937863ad1581-lun-2\nDEVNAME=/dev/sde\nDEVPATH=/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0/0000:61:00.0/0000:62:00.2/host7/rport-7:0-1/target7:0:0/7:0:0:2/block/sde\nDEVTYPE=disk\nDM_DEL_PART_NODES=1\nDM_MULTIPATH_DEVICE_PATH=1\nFC_INITIATOR_WWPN=0x20000025b510b154\nFC_TARGET_LUN=2\nFC_TARGET_WWPN=0x524a937863ad1581\nID_BUS=scsi\nID_FS_TYPE=mpath_member\nID_MODEL=FlashArray\nID_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nID_PATH=pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2\nID_PATH_TAG=pci-0000_62_00_2-fc-0x524a937863ad1581-lun-2\nID_REVISION=8888\nID_SCSI=1\nID_SCSI_INQUIRY=1\nID_SCSI_SERIAL=8A2C2AED4E9A423800026B1E\nID_SERIAL=3624a93708a2c2aed4e9a423800026b1e\nID_SERIAL_SHORT=624a93708a2c2aed4e9a423800026b1e\nID_TARGET_PORT=0\nID_TYPE=disk\nID_VENDOR=PURE\nID_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nID_WWN=0x624a93708a2c2aed\nID_WWN_VENDOR_EXTENSION=0x4e9a423800026b1e\nID_WWN_WITH_EXTENSION=0x624a93708a2c2aed4e9a423800026b1e\nMAJOR=8\nMINOR=64\nMPATH_SBIN_PATH=/sbin\nSCSI_IDENT_LUN_LOGICAL_UNIT_GROUP=0x0\nSCSI_IDENT_LUN_NAA_REGEXT=624a93708a2c2aed4e9a423800026b1e\nSCSI_IDENT_LUN_T10=PURE_FlashArray:8A2C2AED4E9A423800026B1E\nSCSI_IDENT_LUN_VENDOR=IP-OC-04802-C5C_001\nSCSI_IDENT_PORT_NAME=naa.524a937863ad1581,t,0x0001\nSCSI_IDENT_PORT_RELATIVE=83\nSCSI_IDENT_PORT_TARGET_PORT_GROUP=0x0\nSCSI_IDENT_SERIAL=8A2C2AED4E9A423800026B1E\nSCSI_MODEL=FlashArray\nSCSI_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nSCSI_REVISION=8888\nSCSI_TPGS=1\nSCSI_TYPE=disk\nSCSI_VENDOR=PURE\nSCSI_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nSUBSYSTEM=block\nSYSTEMD_READY=0\nTAGS=:systemd:\nUSEC_INITIALIZED=14283051"
2022-12-06 21:49:22.947062 I | cephosd: creating and starting the osds
2022-12-06 21:49:22.947072 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-06 21:49:22.947074 D | cephosd: context.Devices are:
2022-12-06 21:49:22.947092 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2 /dev/disk/by-path/fc-0x20000025b510b154-0x524a937863ad1581-lun-2 Size:2684354560000 UUID:ebae9336-0fe9-4090-a176-8567ec7064ac Serial:3624a93708a2c2aed4e9a423800026b1e Type:data Rotational:false Readonly:false Partitions:[] Filesystem:mpath_member Mountpoint: Vendor:PURE Model:FlashArray WWN:0x624a93708a2c2aed WWNVendorExtension:0x624a93708a2c2aed4e9a423800026b1e Empty:false CephVolumeData: RealPath:/dev/sde KernelName:sde Encrypted:false}
2022-12-06 21:49:22.947095 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"
2022-12-06 21:49:22.956253 I | cephosd: configuring osd devices: {"Entries":{}}
2022-12-06 21:49:22.956262 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-12-06 21:49:22.956268 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.002629 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". exit status 5
2022-12-06 21:49:23.002652 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:23.005127 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-06 21:49:23.005214 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-12-06 21:49:23.258582 D | cephosd: {}
2022-12-06 21:49:23.258611 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-12-06 21:49:23.258618 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.265085 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". Device /mnt/ocs-deviceset-localblock-0-data-0lx4cl is not a valid LUKS device.: exit status 1
2022-12-06 21:49:23.265099 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-0lx4cl --format json
2022-12-06 21:49:23.477027 D | cephosd: {}
2022-12-06 21:49:23.477052 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-06 21:49:23.477057 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-0lx4cl"

Comment 22 loberman 2022-12-07 15:17:37 UTC
Hello

We are meeting with the customer at 1 PM; we really need an engineering person to live troubleshoot this issue, please.
We are way past simply keeping this customer happy.

It's setting a bad precedent for how we support and service OCS/ODF and Ceph.

Regards
Laurence

Comment 25 Travis Nielsen 2022-12-07 18:47:05 UTC
The latest osd prepare log shows that there is a remnant of the multipath configuration:

cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"


Can this be cleaned up and tried again?

Comment 26 loberman 2022-12-07 18:55:41 UTC
Hello Travis

I removed the multipath using multipathd -k to delete the map.
Of course there are multiple devices that have the same wwid.
I could try deleting all paths but one so we only have a single device.

What I would like to know (if possible) is:
We successfully configured two of them already with multipath in place.
Why would only this third one have an issue with the multiple devices pointing to the same LUN?

I am concerned about going back yet again to this customer, who is fast becoming frustrated, unless you are fairly confident this is the issue.
If you really think it's going to work this time I can try, but what about the others that worked with multipath in place, including /dev/mapper/mpathxxx?

Regards
Laurence

Comment 27 Travis Nielsen 2022-12-07 19:09:10 UTC
The osd prepare job is querying lsblk for any existing filesystems:

2022-12-06 21:49:23.002652 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:23.005127 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""

You can run lsblk on the disk in advance and see if it has the FSTYPE populated or not. For now it is returning mpath_member.


I'm not sure why the other two OSDs were able to initialize successfully while the third one has an issue. Multipath configuration is an advanced support area. Perhaps this blog, which Bipin pointed out in a separate thread, would help:
https://source.redhat.com/communities/communities_of_practice/infrastructure/storage_cop/storage_community_of_practice_blog/can_we_use_san_fc_and_iscsi_storage_appliances_with_odf

Comment 28 Travis Nielsen 2022-12-07 19:10:31 UTC
To be clear on my previous comment, Rook will skip creating an OSD on any device that appears to have a filesystem. So this property must not be set.

Comment 29 loberman 2022-12-07 19:16:08 UTC
Hello

Firstly let me apologize for the back and forth.
OK, I will remove all paths but one, but are you saying that a second device makes Rook think it's a filesystem?

We did this a few times before and overwrote all 2.4 TB:
dd if=/dev/zero of=/dev/sdxxx bs=1024K oflag=direct

I will try again after my meeting.

Regards
Laurence

Comment 30 Travis Nielsen 2022-12-07 19:24:40 UTC
Rook is calling "lsblk" and interpreting the existence of FSTYPE to mean that the disk may be in use and is not available for OSD creation. Perhaps Rook should allow creation even when FSTYPE=mpath_member, if we can confirm that is really the expected behavior and will not risk creating an OSD on top of some unintended mpath device; currently Rook does not allow it.
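
To illustrate the kind of change being discussed, here is a rough Go sketch of how that decision could incorporate an mpath_member allowance. This is illustrative only; the function name and the pvcBacked flag are assumptions made for the example, not the actual Rook code.

package main

import "fmt"

// fsBlocksOSDCreation mimics the kind of check the prepare job makes on the
// FSTYPE reported by lsblk/udev before deciding whether to skip a device.
func fsBlocksOSDCreation(fstype string, pvcBacked bool) bool {
	if fstype == "" {
		return false // no filesystem signature; the device is available
	}
	if fstype == "mpath_member" && pvcBacked {
		// Proposed allowance: mpath_member only marks the block device as part
		// of a multipath map. For a device explicitly handed to Rook through a
		// PVC, it does not mean the disk already holds data.
		return false
	}
	return true // any other filesystem suggests the device is in use
}

func main() {
	fmt.Println(fsBlocksOSDCreation("mpath_member", true)) // false: OSD would be created
	fmt.Println(fsBlocksOSDCreation("ext4", true))         // true: device would be skipped
}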

Comment 31 loberman 2022-12-07 19:56:49 UTC
Thanks I will document what we are going to try

Sent to customer
-----------------
Engineering does not understand how two of the three worked, because they say multipathed devices,
i.e. devices that have the same wwid, will have issues.

They are suggesting we remove all but 1 device.

To do this we would run

multipath -ll
get the list of sd devices making up the storage for the OSD

Last capture looked like this

mpatha (3624a93708a2c2aed4e9a423800026b1e) dm-5 PURE,FlashArray
size=2.4T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 6:0:0:2 sdb 8:16  active ready running
  |- 6:0:1:2 sdh 8:112 active ready running
  |- 7:0:0:2 sde 8:64  active ready running
  `- 7:0:1:2 sdg 8:96  active ready running

Run the same steps as before, which you showed worked, to get rid of the mapper device:

multipathd> del multipath mpatha
ok
multipathd> exit
sh-4.4# ls -lt /dev/mapper/mpatha
ls: cannot access '/dev/mapper/mpatha': No such file or directory
sh-4.4# 

Then for all but one of the devices

Change as appropriate to match the paths
Example from above

for disk in sdh sde sdg
do
 echo 1 > /sys/block/$disk/device/delete
done

Now sdb should be the only remaining way to reach the device mpatha used to be on.

Overwrite the device (change the sd name as appropriate):
dd if=/dev/zero of=/dev/sdb bs=1024K oflag=direct

Try the provisioning again.

Regards
Laurence

Comment 32 loberman 2022-12-07 21:25:15 UTC
We ran the test

It still failed; we are getting the logs.

lsblk -t did not show /dev/sdb as an mpath member, and we deleted the other three sd devices pointing to LUN2.

I believe something in the config has saved this device as an mpath member, because we definitely turned it into a single LUN2 device after we
deleted the mpath and all subpaths but one.

In some config it still thinks it's an mpath member.

The logs from our call show we only have mpathb, which is the O/S disk.

/dev/sdb is a single lone device now and we overwrote the first 50GB of the drive.
There is no FS signature from what I can see.

But we still see
cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"

sh-4.4# lsblk -t

NAME        ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE   RA WSAME
sda                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
sdb                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M  ************ Note
sdc                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
sdd                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
sdf                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
rbd0                0  65536   65536     512     512    0 none            128  128    0B


It fails, thinking it's an FS or an mpath:

oc logs rook-ceph-osd-prepare-30a979865d0234fdc9c770eb1afbc7cb-wn5mx

2022-12-07 21:08:50.684247 I | cephcmd: desired devices to configure osds: [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-07 21:08:50.686380 I | rookcmd: starting Rook v4.11.4-0.96e324244ec878d70194179a2892ec7193f6b591 with arguments '/rook/rook ceph osd provision'
2022-12-07 21:08:50.686394 I | rookcmd: flag values: --cluster-id=4b30fd44-6be2-43f3-8129-f8eb670b82fe, --cluster-name=ocs-storagecluster-cephcluster, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"/mnt/ocs-deviceset-localblock-0-data-0lx4cl","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=ocs-deviceset-localblock-0-data-0lx4cl, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account=
2022-12-07 21:08:50.686401 I | op-mon: parsing mon endpoints: b=172.30.217.124:6789,c=172.30.156.113:6789,a=172.30.79.36:6789
2022-12-07 21:08:50.696189 I | op-osd: CRUSH location=root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-07 21:08:50.696200 I | cephcmd: crush location of osd: root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-07 21:08:50.696206 D | exec: Running command: dmsetup version
2022-12-07 21:08:50.698008 I | cephosd: Library version:   1.02.181-RHEL8 (2021-10-20)
Driver version:    4.43.0
2022-12-07 21:08:50.706525 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-12-07 21:08:50.706653 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-12-07 21:08:50.706723 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid                         = b5c98aa2-4a54-4714-9979-fdd232f0bd46
mon initial members          = b c a
mon host                     = [v2:172.30.217.124:3300,v1:172.30.217.124:6789],[v2:172.30.156.113:3300,v1:172.30.156.113:6789],[v2:172.30.79.36:3300,v1:172.30.79.36:6789]
rbd_mirror_die_after_seconds = 3600
bdev_flock_retry             = 20
mon_osd_full_ratio           = .85
mon_osd_backfillfull_ratio   = .8
mon_osd_nearfull_ratio       = .75
mon_max_pg_per_osd           = 600
mon_pg_warn_max_object_skew  = 0
mon_data_avail_warn          = 15

[osd]
osd_memory_target_cgroup_limit_ratio = 0.8

[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring

2022-12-07 21:08:50.706728 I | cephosd: discovering hardware
2022-12-07 21:08:50.706733 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-07 21:08:50.708494 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-07 21:08:50.708522 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-07 21:08:50.710157 D | exec: Running command: udevadm info --query=property /dev/sdb
2022-12-07 21:08:50.714171 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.1-fc-0x524a937863ad1590-lun-2 /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/fc-0x20000025b510a057-0x524a937863ad1590-lun-2\nDEVNAME=/dev/sdb\nDEVPATH=/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0/0000:61:00.0/0000:62:00.1/host6/rport-6:0-1/target6:0:0/6:0:0:2/block/sdb\nDEVTYPE=disk\nDM_DEL_PART_NODES=1\nDM_MULTIPATH_DEVICE_PATH=1\nFC_INITIATOR_WWPN=0x20000025b510a057\nFC_TARGET_LUN=2\nFC_TARGET_WWPN=0x524a937863ad1590\nID_BUS=scsi\nID_FS_TYPE=mpath_member\nID_MODEL=FlashArray\nID_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nID_PATH=pci-0000:62:00.1-fc-0x524a937863ad1590-lun-2\nID_PATH_TAG=pci-0000_62_00_1-fc-0x524a937863ad1590-lun-2\nID_REVISION=8888\nID_SCSI=1\nID_SCSI_INQUIRY=1\nID_SCSI_SERIAL=8A2C2AED4E9A423800026B1E\nID_SERIAL=3624a93708a2c2aed4e9a423800026b1e\nID_SERIAL_SHORT=624a93708a2c2aed4e9a423800026b1e\nID_TARGET_PORT=1\nID_TYPE=disk\nID_VENDOR=PURE\nID_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nID_WWN=0x624a93708a2c2aed\nID_WWN_VENDOR_EXTENSION=0x4e9a423800026b1e\nID_WWN_WITH_EXTENSION=0x624a93708a2c2aed4e9a423800026b1e\nMAJOR=8\nMINOR=16\nMPATH_SBIN_PATH=/sbin\nSCSI_IDENT_LUN_LOGICAL_UNIT_GROUP=0x0\nSCSI_IDENT_LUN_NAA_REGEXT=624a93708a2c2aed4e9a423800026b1e\nSCSI_IDENT_LUN_T10=PURE_FlashArray:8A2C2AED4E9A423800026B1E\nSCSI_IDENT_LUN_VENDOR=IP-OC-04802-C5C_001\nSCSI_IDENT_PORT_NAME=naa.524a937863ad1590,t,0x0001\nSCSI_IDENT_PORT_RELATIVE=131\nSCSI_IDENT_PORT_TARGET_PORT_GROUP=0x1\nSCSI_IDENT_SERIAL=8A2C2AED4E9A423800026B1E\nSCSI_MODEL=FlashArray\nSCSI_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nSCSI_REVISION=8888\nSCSI_TPGS=1\nSCSI_TYPE=disk\nSCSI_VENDOR=PURE\nSCSI_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nSUBSYSTEM=block\nSYSTEMD_READY=0\nTAGS=:systemd:\nUSEC_INITIALIZED=14278666"
2022-12-07 21:08:50.714204 I | cephosd: creating and starting the osds
2022-12-07 21:08:50.714223 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-07 21:08:50.714228 D | cephosd: context.Devices are:
2022-12-07 21:08:50.714251 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.1-fc-0x524a937863ad1590-lun-2 /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/fc-0x20000025b510a057-0x524a937863ad1590-lun-2 Size:2684354560000 UUID:8c2c51aa-f3c4-4888-a06c-2c58828d0c2c Serial:3624a93708a2c2aed4e9a423800026b1e Type:data Rotational:false Readonly:false Partitions:[] Filesystem:mpath_member Mountpoint: Vendor:PURE Model:FlashArray WWN:0x624a93708a2c2aed WWNVendorExtension:0x624a93708a2c2aed4e9a423800026b1e Empty:false CephVolumeData: RealPath:/dev/sdb KernelName:sdb Encrypted:false}
2022-12-07 21:08:50.714260 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"
2022-12-07 21:08:50.720987 I | cephosd: configuring osd devices: {"Entries":{}}
2022-12-07 21:08:50.721010 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-12-07 21:08:50.721023 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-07 21:08:50.776569 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". exit status 5
2022-12-07 21:08:50.776603 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-07 21:08:50.778599 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-07 21:08:50.778718 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-12-07 21:08:51.038399 D | cephosd: {}
2022-12-07 21:08:51.038426 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-12-07 21:08:51.038433 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-07 21:08:51.046160 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". Device /mnt/ocs-deviceset-localblock-0-data-0lx4cl is not a valid LUKS device.: exit status 1
2022-12-07 21:08:51.046177 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-0lx4cl --format json
2022-12-07 21:08:51.258855 D | cephosd: {}
2022-12-07 21:08:51.258882 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-07 21:08:51.258888 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-0lx4cl"
sh-4.4# lsblk -t

NAME        ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE   RA WSAME
sda                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
sdb                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
sdc                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
sdd                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
sdf                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M
`-mpathb            0    512 4194304     512     512    0 mq-deadline     256 8192   32M
  |-mpathb1         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb2         0    512 4194304     512     512    0                 128 8192   32M
  |-mpathb3         0    512 4194304     512     512    0                 128 8192   32M
  `-mpathb4         0    512 4194304     512     512    0                 128 8192   32M
rbd0                0  65536   65536     512     512    0 none            128  128    0B
Thanks,
Tarek Karam

Comment 33 loberman 2022-12-07 21:33:33 UTC
Just to be sure, we ran:

sh-4.4# wipefs /dev/sdb
sh-4.4# blkid /dev/sdb

Empty
The device is clean, so I think we have a bug, folks.

Regards
Laurence

Comment 34 Travis Nielsen 2022-12-07 21:52:21 UTC
The question is why lsblk is still reporting the old fstype and whether the kernel is reporting a stale value, or if there is some other multipath config. Searching online, here are a couple ideas [1]:
1. Use partprobe to reload the partition table
2. Restart the node where the osd prepare job is running

That article doesn't discuss this in the context of multipath, but it's a very similar issue. If we can at least confirm the kernel isn't simply reporting a stale value, then we can narrow it down to the multipath config still being at fault. And I'm afraid I'm not very helpful with multipath config.

[1] https://unix.stackexchange.com/questions/516381/why-is-lsblk-showing-the-old-fstype-and-label-of-a-device-that-was-formatted

Comment 35 loberman 2022-12-07 22:05:01 UTC
Hello
 
I am excellent with multipath; I live in all that storage space.

Where are you seeing lsblk report that, besides in your messages?
I suppose it's cached somewhere then, because look here.

Do you run lsblk -f to check?

sh-4.4# wipefs /dev/sdb
sh-4.4# blkid /dev/sdb

And lsblk here shows nothing

sh-4.4# lsblk -t

NAME        ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE   RA WSAME
sdb                 0    512 4194304     512     512    0 mq-deadline     256 8192   32M


So I will have them run partprobe, but it should not make a difference.
Of course I hope it does.

Let me add additional notes here
----------------------------------
We cannot fully remove multipath; they boot CoreOS from a multipath device on the SAN.
I tried to blacklist the wwid for mpatha in /etc/multipath.conf, but it's RHCOS and I could not get that to work.
After a reboot it came back as a multipath device even with it blacklisted in /etc/multipath.conf.
That is because it is in the initramfs.

I asked sbr-shift and nobody had anything for me that worked to get the multipath config saved into the initramfs
so I could fully blacklist it. Hence I am deleting the map manually.

We did try 

rpm-ostree initramfs --enable

Thinking it would rebuild the initramfs on reboot, but it seemed not to pick up my change in /etc/multipath.conf.

Checking out tree a46d360... done
Generating initramfs... done
Writing OSTree commit... done
Staging deployment... done
Initramfs regeneration is now: enabled

Comment 36 loberman 2022-12-07 22:12:04 UTC
hmmmm

lsblk -f seems to think it's still an mpath member, so that is exactly what you are tripping over:

NAME        FSTYPE    LABEL      UUID                                 MOUNTPOINT
sdb         mpath_mem

We are trying partprobe

Comment 37 loberman 2022-12-07 22:14:35 UTC
It turns out partprobe is not installed on RHCOS.

I will try blockdev
blockdev  --rereadpt /dev/sdb

Comment 38 loberman 2022-12-08 15:32:22 UTC
So blockdev --rereadpt did not help get rid of the mpath_member FSTYPE seen in lsblk -f.

I tried a hack but got stuck with the read-only CoreOS again.
We could not do the mv on CoreOS.

sh-4.4# mv /usr/bin/lsblk /usr/bin/lsblk.orig

mv: cannot move '/usr/bin/lsblk' to '/usr/bin/lsblk.orig': Read-only file system


So now back to trying to blacklist the OSD device in multipath permanently.

My attempt at a hack was to change lsblk -f into plain lsblk so that the provisioning is allowed:

mv /usr/bin/lsblk /usr/bin/lsblk.orig
vi /usr/bin/lsblk

add this

#!/bin/bash
## Ignore arguments and just run lsblk
/usr/bin/lsblk.orig


chmod +x /usr/bin/lsblk

Basically it does not run with -f.

Then retry the provisioning

Afterwards 
rm /usr/bin/lsblk
mv /usr/bin/lsblk.orig /usr/bin/lsblk

Comment 39 loberman 2022-12-08 17:30:15 UTC
We are trying some things

We managed to get the blacklist into the initramfs, but we never removed rd.multipath=default from the kernel command line.
So on reboot the mpath device was back.

Asked them to try this
rpm-ostree kargs --delete rd.multipath=default

Then reboot again

Quite honestly, Travis,
I think it's time to modify the code so we don't reject mpath_member anymore when we know multipath is in use.

The amount of frustration this has caused everybody, and most importantly the customer's view of the product, means
we need a code change.

The customer had an automated install on FC devices, so by default multipath gets enabled for CoreOS.
Had the first two OSD deployments not worked, we would have known from the get-go that having multipath enabled was an issue.

So how about allowing mpath devices for the OSD?

Regards
Laurence

Comment 40 Travis Nielsen 2022-12-08 18:04:53 UTC
I've opened an upstream issue to start the change that would allow OSD creation where fstype=mpath_member. This seems like a safe enough change for other upstream scenarios, but I'd like to get upstream feedback to confirm there aren't other risks with this. Feel free to also comment on the issue:
https://github.com/rook/rook/issues/11409

While the change will be small, getting it to the customer will depend on the schedule for the next release. 

Is the meeting with the customer still needed today? Seems like we've investigated as much as we can already until that fix is available.

Comment 41 loberman 2022-12-08 18:28:12 UTC
Hello Travis
We will meet with the customer.

I would like to ask that you remain on standby to join in case we get the multipath blacklist to work and then still have issues.
So we will join the call from support and reach out if we need you.

Would that work?

Regards
Laurence

Comment 42 Travis Nielsen 2022-12-08 18:34:27 UTC
Sounds good, I'll be on standby. Ping me in gchat if needed.

Comment 43 loberman 2022-12-08 18:38:25 UTC
It's working!
The OSD is now up and provisioned, so we are finally past this.

So we are good. We will never know how the other two worked, though :)

I will reply to the thread as well
Regards
Laurence

Comment 44 loberman 2022-12-08 19:00:16 UTC
Closing, and I will write up a KCS.
Will close as NOTABUG, but the expectation is that the next release will allow mpath_member devices to be used.
Thanks
Laurence

Comment 45 Travis Nielsen 2022-12-08 19:05:37 UTC
Since we still need to allow mpath_member devices, how about reopening this, or opening a new issue? Then we can work on it for 4.12.

Comment 47 kelwhite 2022-12-08 19:18:52 UTC
Re-opening this for work to be done in 4.12.

Comment 50 Elad 2022-12-13 14:08:57 UTC
We don't test with multipath devices. The BZ will be verified based on regression testing.

Comment 68 samy 2023-01-05 20:15:08 UTC
The Innovapost customer who opened case 03369252, which is the origin of this bug, is asking a couple of questions, in case someone could provide some answers:
1. What will the ODF StorageSystem creation look like after the release of this fix? Do they still need to create a LocalVolume and specify the mpath UUID, or should they follow the documentation and rely on LocalVolumeDiscovery pods to discover the volumes?
2. Will any documentation be created regarding this fix?
3. Was this fix tested on a bare metal cluster? If not, is there any risk that it won't work properly in the customer's cluster, which is bare metal?

Comment 69 Travis Nielsen 2023-01-05 21:09:26 UTC
(In reply to samy from comment #68)
> The Innovapost customer who opened case 03369252, which is the origin of this
> bug, is asking a couple of questions, in case someone could provide some
> answers:
> 1. What will the ODF StorageSystem creation look like after the release of
> this fix? Do they still need to create a LocalVolume and specify the mpath
> UUID, or should they follow the documentation and rely on
> LocalVolumeDiscovery pods to discover the volumes?

Either way should work, depending on whether you want to define the PVs statically
or discover them dynamically. But it is critical that there is only one PV
per device; otherwise ODF will attempt, and fail, to create two OSDs on the same
underlying device.

> 2. Will any documentation be created regarding this fix?

See the doc text in the BZ for now.

> 3. Was this fix tested on a bare metal cluster? If not, is there any risk
> that it won't work properly in the customer's cluster, which is bare metal?

See comment 50 regarding testing. Since mpath devices have not been tested,
there is a risk that the device is not configured as expected.

