Bug 2148567
| Summary: | [GSS] OSD prepare job is skipping OSD configuration (multipath devices) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | kelwhite |
| Component: | rook | Assignee: | Travis Nielsen <tnielsen> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Prasad Desala <tdesala> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.11 | CC: | bhull, ebenahar, jverreng, loberman, muagarwa, nberry, ocs-bugs, odf-bz-bot, owasserm, rar, szemmour, tnielsen |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | ODF 4.12.0 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.12.0-145 | Doc Type: | Bug Fix |
| Doc Text: | Cause: Configuring an ODF cluster with multipath devices may not create OSDs as expected.<br>Consequence: Devices with the mpath_member label will be skipped for OSD creation.<br>Fix: Allow OSDs to be created even when the mpath_member FSType is set on the device, since the device is specifically provisioned with a PVC.<br>Result: OSDs are expected to be created on clean mpath devices. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-02-08 14:06:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
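As context for the Doc Text above: mpath_member appears to be a label that udev's multipath rules put on the individual path devices of a multipath map, rather than a filesystem signature written on the disk, which is why the thread below sees a "clean" disk still reported with an FSType. A hedged way to see the value the OSD prepare job trips over (the device name is an example from this report):

lsblk --nodeps --output NAME,FSTYPE /dev/sde                 # prints FSTYPE=mpath_member for a multipath path device
udevadm info --query=property /dev/sde | grep ID_FS_TYPE     # the udev property lsblk reports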
Comment 16
loberman
2022-12-05 22:21:27 UTC
From Laurence
Hello Red Hat Team
When we configured this we changed it to use /dev/disk/by-id for consistency across all nodes, because we had a multipath ordering issue.
The other complication here is that this customer boots from SAN, so one of the mpaths is used for the O/S and they have multiple mpaths per node.
Having said this, two out of three worked (actually 3 out of 4), so for me the issue is no longer the naming; it's something else.
The data volumes are all 2.4T
mpatha (3624a93708a2c2aed4e9a423800026b1e) dm-0 PURE,FlashArray Disk to be used for the OSD
size=2.4T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 1:0:0:2 sdb 8:16 active ready running
|- 1:0:1:2 sdg 8:96 active ready running
|- 8:0:0:2 sdi 8:128 active ready running
`- 8:0:1:2 sdk 8:160 active ready running
mpathb (3624a93708a2c2aed4e9a423800026938) dm-1 PURE,FlashArray O/S disk
size=250G features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 1:0:0:1 sda 8:0 active ready running
|- 1:0:1:1 sdf 8:80 active ready running
|- 8:0:0:1 sdh 8:112 active ready running
`- 8:0:1:1 sdj 8:144 active ready running
So it's complaining about mpatha, and that is the correct device:
2022-11-29T17:36:07.332318070Z 2022-11-29 17:36:07.332314 D | exec: Running command: stdbuf -oL ceph-volume --log-path /var/log/ceph/ocs-deviceset-localblock-0-data-0lx4cl raw prepare --bluestore --data /dev/mapper/mpatha
2022-11-29T17:36:08.104559230Z 2022-11-29 17:36:08.104514 I | cephosd: --> Raw device /dev/mapper/mpatha is already prepared.
*************************************************
Can we not run ceph-volume manually as a workaround and trace it to see why it thinks the device is already prepared?
Either that, or temporarily blacklist that mpath device from multipath, restart multipathd, and try again.
Regards
Laurence
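A hedged sketch of that manual trace, reusing the exact ceph-volume invocation from the log above (the log path and device are examples; run from a debug shell on the node):

stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw prepare --bluestore --data /dev/mapper/mpatha
ceph-volume --log-path /tmp/ceph-log raw list /dev/mapper/mpatha --format json   # shows what ceph-volume believes is already on the device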
I gave the customer this
Hello Tarek
Can we try something as this will give me more data to give to engineering
For storage1 where the issue is happening
First get a multipath -ll
Save the mapping so we know the current path names for mpatha
Then
edit /etc/multipath.conf
add this in the blacklist part
blacklist {
wwid 3624a93708a2c2aed4e9a423800026b1e
}
Then run systemctl reload multipathd
multipath -ll should no longer map the device
You should only see mpathb
Then overwrite the whole disk (change sdxxx to one of the path names saved from above):
dd if=/dev/zero of=/dev/sdxxx bs=1024K oflag=direct
Then retry the provisioning of the OSD using /dev/disk/by-id again
We can try to do it together if you want.
Regards
Laurence Oberman
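Taken together, the procedure above is roughly this (a sketch; the wwid and device names are the ones from this case):

# 1. Blacklist the data LUN's wwid and reload multipathd
cat >> /etc/multipath.conf <<'EOF'
blacklist {
    wwid 3624a93708a2c2aed4e9a423800026b1e
}
EOF
systemctl reload multipathd
multipath -ll                                       # mpatha should no longer be mapped; only mpathb remains
# 2. Wipe the disk through one of the saved path names, then retry provisioning
dd if=/dev/zero of=/dev/sdb bs=1024K oflag=direct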
Hello
With this being CoreOS and booting from SAN (mpath), I had to jump through hoops to fully blacklist the device.
I managed to manually remove the mpath and we tried again; it failed differently, but still failed.
I am a storage/kernel internals maintenance engineer, but not an OpenShift- or ODF-savvy engineer.
I think, given this is escalated and the customer cannot make progress, an engineering developer resource should get on and live-troubleshoot this.
The device is correct:
2022-12-06 21:49:22.947092 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2 /dev/disk/by-path/fc-0x20000025b510b154-0x524a937863ad1581-lun-2 Size:2684354560000 UUID:ebae9336-0fe9-4090-a176-8567ec7064ac Serial:3624a93708a2c2aed4e9a423800026b1e Type:data Rotational:false Readonly:false Partitions:[] Filesystem:mpath_member Mountpoint: Vendor:PURE Model:FlashArray
2022-12-06 21:49:22.956262 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-12-06 21:49:22.956268 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.002629 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". exit status 5
2022-12-06 21:49:23.002652 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:23.005127 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-06 21:49:23.005214 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json
2022-12-06 21:49:23.258582 D | cephosd: {}
2022-12-06 21:49:23.258611 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-12-06 21:49:23.258618 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.265085 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". Device /mnt/ocs-deviceset-localblock-0-data-0lx4cl is not a valid LUKS device.: exit status 1
2022-12-06 21:49:23.265099 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-0lx4cl --format json
2022-12-06 21:49:23.477027 D | cephosd: {}
2022-12-06 21:49:23.477052 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-06 21:49:23.477057 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-0lx4cl"
Full log
-----------
$ oc logs rook-ceph-osd-prepare-30a979865d0234fdc9c770eb1afbc7cb-l4l5h
2022-12-06 21:49:22.914717 I | cephcmd: desired devices to configure osds: [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-06 21:49:22.916770 I | rookcmd: starting Rook v4.11.3-0.224a35508091e5dcf8f09dd910118b75ef52f84e with arguments '/rook/rook ceph osd provision'
2022-12-06 21:49:22.916783 I | rookcmd: flag values: --cluster-id=4b30fd44-6be2-43f3-8129-f8eb670b82fe, --cluster-name=ocs-storagecluster-cephcluster, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"/mnt/ocs-deviceset-localblock-0-data-0lx4cl","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=ocs-deviceset-localblock-0-data-0lx4cl, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account=
2022-12-06 21:49:22.916792 I | op-mon: parsing mon endpoints: b=172.30.217.124:6789,c=172.30.156.113:6789,a=172.30.79.36:6789
2022-12-06 21:49:22.925634 I | op-osd: CRUSH location=root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-06 21:49:22.925656 I | cephcmd: crush location of osd: root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-06 21:49:22.925663 D | exec: Running command: dmsetup version
2022-12-06 21:49:22.927533 I | cephosd: Library version: 1.02.181-RHEL8 (2021-10-20)
Driver version: 4.43.0
2022-12-06 21:49:22.935929 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-12-06 21:49:22.936078 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-12-06 21:49:22.936137 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid = b5c98aa2-4a54-4714-9979-fdd232f0bd46
mon initial members = c a b
mon host = [v2:172.30.156.113:3300,v1:172.30.156.113:6789],[v2:172.30.79.36:3300,v1:172.30.79.36:6789],[v2:172.30.217.124:3300,v1:172.30.217.124:6789]
rbd_mirror_die_after_seconds = 3600
bdev_flock_retry = 20
mon_osd_full_ratio = .85
mon_osd_backfillfull_ratio = .8
mon_osd_nearfull_ratio = .75
mon_max_pg_per_osd = 600
mon_pg_warn_max_object_skew = 0
mon_data_avail_warn = 15
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2022-12-06 21:49:22.936146 I | cephosd: discovering hardware
2022-12-06 21:49:22.936152 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:22.937939 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-06 21:49:22.938401 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:22.942605 D | exec: Running command: udevadm info --query=property /dev/sde
2022-12-06 21:49:22.947032 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2 /dev/disk/by-path/fc-0x20000025b510b154-0x524a937863ad1581-lun-2\nDEVNAME=/dev/sde\nDEVPATH=/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0/0000:61:00.0/0000:62:00.2/host7/rport-7:0-1/target7:0:0/7:0:0:2/block/sde\nDEVTYPE=disk\nDM_DEL_PART_NODES=1\nDM_MULTIPATH_DEVICE_PATH=1\nFC_INITIATOR_WWPN=0x20000025b510b154\nFC_TARGET_LUN=2\nFC_TARGET_WWPN=0x524a937863ad1581\nID_BUS=scsi\nID_FS_TYPE=mpath_member\nID_MODEL=FlashArray\nID_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nID_PATH=pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2\nID_PATH_TAG=pci-0000_62_00_2-fc-0x524a937863ad1581-lun-2\nID_REVISION=8888\nID_SCSI=1\nID_SCSI_INQUIRY=1\nID_SCSI_SERIAL=8A2C2AED4E9A423800026B1E\nID_SERIAL=3624a93708a2c2aed4e9a423800026b1e\nID_SERIAL_SHORT=624a93708a2c2aed4e9a423800026b1e\nID_TARGET_PORT=0\nID_TYPE=disk\nID_VENDOR=PURE\nID_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nID_WWN=0x624a93708a2c2aed\nID_WWN_VENDOR_EXTENSION=0x4e9a423800026b1e\nID_WWN_WITH_EXTENSION=0x624a93708a2c2aed4e9a423800026b1e\nMAJOR=8\nMINOR=64\nMPATH_SBIN_PATH=/sbin\nSCSI_IDENT_LUN_LOGICAL_UNIT_GROUP=0x0\nSCSI_IDENT_LUN_NAA_REGEXT=624a93708a2c2aed4e9a423800026b1e\nSCSI_IDENT_LUN_T10=PURE_FlashArray:8A2C2AED4E9A423800026B1E\nSCSI_IDENT_LUN_VENDOR=IP-OC-04802-C5C_001\nSCSI_IDENT_PORT_NAME=naa.524a937863ad1581,t,0x0001\nSCSI_IDENT_PORT_RELATIVE=83\nSCSI_IDENT_PORT_TARGET_PORT_GROUP=0x0\nSCSI_IDENT_SERIAL=8A2C2AED4E9A423800026B1E\nSCSI_MODEL=FlashArray\nSCSI_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nSCSI_REVISION=8888\nSCSI_TPGS=1\nSCSI_TYPE=disk\nSCSI_VENDOR=PURE\nSCSI_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nSUBSYSTEM=block\nSYSTEMD_READY=0\nTAGS=:systemd:\nUSEC_INITIALIZED=14283051"
2022-12-06 21:49:22.947062 I | cephosd: creating and starting the osds
2022-12-06 21:49:22.947072 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-06 21:49:22.947074 D | cephosd: context.Devices are:
2022-12-06 21:49:22.947092 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.2-fc-0x524a937863ad1581-lun-2 /dev/disk/by-path/fc-0x20000025b510b154-0x524a937863ad1581-lun-2 Size:2684354560000 UUID:ebae9336-0fe9-4090-a176-8567ec7064ac Serial:3624a93708a2c2aed4e9a423800026b1e Type:data Rotational:false Readonly:false Partitions:[] Filesystem:mpath_member Mountpoint: Vendor:PURE Model:FlashArray WWN:0x624a93708a2c2aed WWNVendorExtension:0x624a93708a2c2aed4e9a423800026b1e Empty:false CephVolumeData: RealPath:/dev/sde KernelName:sde Encrypted:false}
2022-12-06 21:49:22.947095 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"
2022-12-06 21:49:22.956253 I | cephosd: configuring osd devices: {"Entries":{}}
2022-12-06 21:49:22.956262 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-12-06 21:49:22.956268 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.002629 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". exit status 5
2022-12-06 21:49:23.002652 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:23.005127 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-06 21:49:23.005214 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json
2022-12-06 21:49:23.258582 D | cephosd: {}
2022-12-06 21:49:23.258611 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-12-06 21:49:23.258618 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-06 21:49:23.265085 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". Device /mnt/ocs-deviceset-localblock-0-data-0lx4cl is not a valid LUKS device.: exit status 1
2022-12-06 21:49:23.265099 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-0lx4cl --format json
2022-12-06 21:49:23.477027 D | cephosd: {}
2022-12-06 21:49:23.477052 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-06 21:49:23.477057 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-0lx4cl"
Hello,
We are meeting with the customer at 1PM. We really need an engineering person to live-troubleshoot this issue, please. We are way past a situation of keeping this customer happy. It's setting a bad precedent for how we support and service OCS/ODF and Ceph.
Regards
Laurence

Most recent comment: On 2022-12-07 08:18:05, Karam, Tarek commented: "Meeting link for 1:00pm EST: https://teams.microsoft.com/l/meetup-join/19%3ameeting_MWU0MTM2MTQtYjc3MC00ZjBhLWE4ZTItOTRhOWZiZDlhYjE4%40thread.v2/0?context=%7b%22Tid%22%3a%2275056d76-b628-4488-82b0-80b08b52d854%22%2c%22Oid%22%3a%22232f3a44-7aa8-4a48-97a8-1736bae8c1d1%22%7d"

The meeting has been re-scheduled for tomorrow 3PM Eastern time. Remote link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_MWU0MTM2MTQtYjc3MC00ZjBhLWE4ZTItOTRhOWZiZDlhYjE4%40thread.v2/0?context=%7b%22Tid%22%3a%2275056d76-b628-4488-82b0-80b08b52d854%22%2c%22Oid%22%3a%22232f3a44-7aa8-4a48-97a8-1736bae8c1d1%22%7d

The latest osd prepare log shows that there is a remnant of the multipath:
cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"
Can this be cleaned and tried again?

Hello Travis,
I removed the multipath using multipathd -k to delete the map. Of course there are multiple devices that have the same wwid. I could try to delete all paths but 1 so we only have a single device. What I would like to know (if possible) is: we successfully configured 2 of them already with multipath in place, so why would only this 3rd one have an issue finding the multiple devices pointing to the same LUN? I am concerned about going back yet again to this customer, who is fast becoming frustrated, unless you are fairly confident this is the issue. If you really think it's going to work this time I can try, but what about the others that worked with multipath in place, including /dev/mapper/mpathxxx?
Regards
Laurence

The osd prepare job is querying lsblk for any existing filesystems:
2022-12-06 21:49:23.002652 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-06 21:49:23.005127 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sde\" KNAME=\"/dev/sde\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
You can run lsblk on the disk in advance and see whether it has the FSTYPE populated or not. For now it is returning mpath_member. I'm not sure why the other two OSDs were able to initialize successfully while the third one has an issue. Multipath configuration is an advanced support area. Perhaps this blog would help, as Bipin pointed out in a separate thread:
https://source.redhat.com/communities/communities_of_practice/infrastructure/storage_cop/storage_community_of_practice_blog/can_we_use_san_fc_and_iscsi_storage_appliances_with_odf

To be clear on my previous comment, Rook will skip creating an OSD on any device that appears to have a filesystem. So this property must not be set.

Hello,
Firstly, let me apologize for the back and forth. OK, I will remove all paths but 1, but are you saying that a second device makes Rook think it's a file system? We did this prior a few times:
dd if=/dev/zero of=/dev/sdxxx bs=1024K oflag=direct
and overwrote all 2.4TB. I will try again after my meeting.
Regards
Laurence

Rook is calling "lsblk", then Rook is interpreting the existence of FSTYPE to mean that the disk may be in use and is not available for OSD creation.
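A hedged pre-flight form of that check, using the same lsblk invocation the prepare job runs (the device path is an example from this report):

lsblk /dev/sde --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
# If FSTYPE comes back non-empty (here, mpath_member), Rook will skip the device.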
Perhaps Rook should allow creation even if FSTYPE=mpath_member, if we can understand that this is really the expected behavior and will not risk creation of an OSD on top of some unintended mpath device; but currently Rook does not allow it.

Thanks, I will document what we are going to try.

Sent to customer
-----------------
Engineering does not understand how two of 3 worked, because they say multipathed devices, i.e. devices that have the same wwid, will have issues. They are suggesting we remove all but 1 device. To do this we would run multipath -ll to get the list of sd devices making up the storage for the OSD. The last capture looked like this:
mpatha (3624a93708a2c2aed4e9a423800026b1e) dm-5 PURE,FlashArray
size=2.4T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 6:0:0:2 sdb 8:16 active ready running
  |- 6:0:1:2 sdh 8:112 active ready running
  |- 7:0:0:2 sde 8:64 active ready running
  `- 7:0:1:2 sdg 8:96 active ready running
Run the same as before, which you showed worked here, to get rid of the mapper device:
multipathd> del multipath mpatha
ok
multipathd> exit
sh-4.4# ls -lt /dev/mapper/mpatha
ls: cannot access '/dev/mapper/mpatha': No such file or directory
sh-4.4#
Then, for all but one of the devices (change as appropriate to match the paths; example from above):
for disk in sdh sde sdg
do
echo 1 > /sys/block/$disk/device/delete
done
Now we should only have sdb as a way to reach the device mpatha used to be on. Overwrite the device (change the sd name as appropriate):
dd if=/dev/zero of=/dev/sdb bs=1024K oflag=direct
Try the provisioning again.
Regards
Laurence

We ran the test.
It still failed; getting the logs.
lsblk -t did not show /dev/sdb as an mpath member, and we deleted the other three sd devices pointing to LUN2.
I believe something in the config has saved this device as an mpath, because we definitely turned it into a single LUN2 device after we deleted the mpath and all subpaths but 1.
In some config it still thinks it's an mpath member.
The logs from our call show we only have mpathb, which is the O/S disk.
/dev/sdb is a single lone device now and we overwrote the first 50GB of the drive.
There is no FS signature from what I can see.
But we still see
cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"
sh-4.4# lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
sdb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M ************ Note
sdc 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
sdd 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
sdf 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
rbd0 0 65536 65536 512 512 0 none 128 128 0B
It fails, thinking it's an FS or an mpath:
oc logs rook-ceph-osd-prepare-30a979865d0234fdc9c770eb1afbc7cb-wn5mx
2022-12-07 21:08:50.684247 I | cephcmd: desired devices to configure osds: [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-07 21:08:50.686380 I | rookcmd: starting Rook v4.11.4-0.96e324244ec878d70194179a2892ec7193f6b591 with arguments '/rook/rook ceph osd provision'
2022-12-07 21:08:50.686394 I | rookcmd: flag values: --cluster-id=4b30fd44-6be2-43f3-8129-f8eb670b82fe, --cluster-name=ocs-storagecluster-cephcluster, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"/mnt/ocs-deviceset-localblock-0-data-0lx4cl","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=ocs-deviceset-localblock-0-data-0lx4cl, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account=
2022-12-07 21:08:50.686401 I | op-mon: parsing mon endpoints: b=172.30.217.124:6789,c=172.30.156.113:6789,a=172.30.79.36:6789
2022-12-07 21:08:50.696189 I | op-osd: CRUSH location=root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-07 21:08:50.696200 I | cephcmd: crush location of osd: root=default host=storage1-npd-ocp-dc-cpggpc-ca
2022-12-07 21:08:50.696206 D | exec: Running command: dmsetup version
2022-12-07 21:08:50.698008 I | cephosd: Library version: 1.02.181-RHEL8 (2021-10-20)
Driver version: 4.43.0
2022-12-07 21:08:50.706525 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-12-07 21:08:50.706653 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-12-07 21:08:50.706723 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid = b5c98aa2-4a54-4714-9979-fdd232f0bd46
mon initial members = b c a
mon host = [v2:172.30.217.124:3300,v1:172.30.217.124:6789],[v2:172.30.156.113:3300,v1:172.30.156.113:6789],[v2:172.30.79.36:3300,v1:172.30.79.36:6789]
rbd_mirror_die_after_seconds = 3600
bdev_flock_retry = 20
mon_osd_full_ratio = .85
mon_osd_backfillfull_ratio = .8
mon_osd_nearfull_ratio = .75
mon_max_pg_per_osd = 600
mon_pg_warn_max_object_skew = 0
mon_data_avail_warn = 15
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2022-12-07 21:08:50.706728 I | cephosd: discovering hardware
2022-12-07 21:08:50.706733 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-07 21:08:50.708494 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-07 21:08:50.708522 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-07 21:08:50.710157 D | exec: Running command: udevadm info --query=property /dev/sdb
2022-12-07 21:08:50.714171 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.1-fc-0x524a937863ad1590-lun-2 /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/fc-0x20000025b510a057-0x524a937863ad1590-lun-2\nDEVNAME=/dev/sdb\nDEVPATH=/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0/0000:5f:00.0/0000:60:00.0/0000:61:00.0/0000:62:00.1/host6/rport-6:0-1/target6:0:0/6:0:0:2/block/sdb\nDEVTYPE=disk\nDM_DEL_PART_NODES=1\nDM_MULTIPATH_DEVICE_PATH=1\nFC_INITIATOR_WWPN=0x20000025b510a057\nFC_TARGET_LUN=2\nFC_TARGET_WWPN=0x524a937863ad1590\nID_BUS=scsi\nID_FS_TYPE=mpath_member\nID_MODEL=FlashArray\nID_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nID_PATH=pci-0000:62:00.1-fc-0x524a937863ad1590-lun-2\nID_PATH_TAG=pci-0000_62_00_1-fc-0x524a937863ad1590-lun-2\nID_REVISION=8888\nID_SCSI=1\nID_SCSI_INQUIRY=1\nID_SCSI_SERIAL=8A2C2AED4E9A423800026B1E\nID_SERIAL=3624a93708a2c2aed4e9a423800026b1e\nID_SERIAL_SHORT=624a93708a2c2aed4e9a423800026b1e\nID_TARGET_PORT=1\nID_TYPE=disk\nID_VENDOR=PURE\nID_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nID_WWN=0x624a93708a2c2aed\nID_WWN_VENDOR_EXTENSION=0x4e9a423800026b1e\nID_WWN_WITH_EXTENSION=0x624a93708a2c2aed4e9a423800026b1e\nMAJOR=8\nMINOR=16\nMPATH_SBIN_PATH=/sbin\nSCSI_IDENT_LUN_LOGICAL_UNIT_GROUP=0x0\nSCSI_IDENT_LUN_NAA_REGEXT=624a93708a2c2aed4e9a423800026b1e\nSCSI_IDENT_LUN_T10=PURE_FlashArray:8A2C2AED4E9A423800026B1E\nSCSI_IDENT_LUN_VENDOR=IP-OC-04802-C5C_001\nSCSI_IDENT_PORT_NAME=naa.524a937863ad1590,t,0x0001\nSCSI_IDENT_PORT_RELATIVE=131\nSCSI_IDENT_PORT_TARGET_PORT_GROUP=0x1\nSCSI_IDENT_SERIAL=8A2C2AED4E9A423800026B1E\nSCSI_MODEL=FlashArray\nSCSI_MODEL_ENC=FlashArray\\x20\\x20\\x20\\x20\\x20\\x20\nSCSI_REVISION=8888\nSCSI_TPGS=1\nSCSI_TYPE=disk\nSCSI_VENDOR=PURE\nSCSI_VENDOR_ENC=PURE\\x20\\x20\\x20\\x20\nSUBSYSTEM=block\nSYSTEMD_READY=0\nTAGS=:systemd:\nUSEC_INITIALIZED=14278666"
2022-12-07 21:08:50.714204 I | cephosd: creating and starting the osds
2022-12-07 21:08:50.714223 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-12-07 21:08:50.714228 D | cephosd: context.Devices are:
2022-12-07 21:08:50.714251 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-0lx4cl Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/pci-0000:62:00.1-fc-0x524a937863ad1590-lun-2 /dev/disk/by-id/wwn-0x624a93708a2c2aed4e9a423800026b1e /dev/disk/by-path/fc-0x20000025b510a057-0x524a937863ad1590-lun-2 Size:2684354560000 UUID:8c2c51aa-f3c4-4888-a06c-2c58828d0c2c Serial:3624a93708a2c2aed4e9a423800026b1e Type:data Rotational:false Readonly:false Partitions:[] Filesystem:mpath_member Mountpoint: Vendor:PURE Model:FlashArray WWN:0x624a93708a2c2aed WWNVendorExtension:0x624a93708a2c2aed4e9a423800026b1e Empty:false CephVolumeData: RealPath:/dev/sdb KernelName:sdb Encrypted:false}
2022-12-07 21:08:50.714260 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" because it contains a filesystem "mpath_member"
2022-12-07 21:08:50.720987 I | cephosd: configuring osd devices: {"Entries":{}}
2022-12-07 21:08:50.721010 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-12-07 21:08:50.721023 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-07 21:08:50.776569 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". exit status 5
2022-12-07 21:08:50.776603 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-0lx4cl --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-07 21:08:50.778599 D | sys: lsblk output: "SIZE=\"2684354560000\" ROTA=\"0\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"mpath_member\""
2022-12-07 21:08:50.778718 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json
2022-12-07 21:08:51.038399 D | cephosd: {}
2022-12-07 21:08:51.038426 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-12-07 21:08:51.038433 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-0lx4cl
2022-12-07 21:08:51.046160 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-0lx4cl" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-0lx4cl". Device /mnt/ocs-deviceset-localblock-0-data-0lx4cl is not a valid LUKS device.: exit status 1
2022-12-07 21:08:51.046177 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-0lx4cl --format json
2022-12-07 21:08:51.258855 D | cephosd: {}
2022-12-07 21:08:51.258882 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-07 21:08:51.258888 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-0lx4cl"
sh-4.4# lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
sdb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
sdc 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
sdd 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
sdf 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
`-mpathb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
|-mpathb1 0 512 4194304 512 512 0 128 8192 32M
|-mpathb2 0 512 4194304 512 512 0 128 8192 32M
|-mpathb3 0 512 4194304 512 512 0 128 8192 32M
`-mpathb4 0 512 4194304 512 512 0 128 8192 32M
rbd0 0 65536 65536 512 512 0 none 128 128 0B
Thanks,
Tarek Karam
Just to be sure, we ran:
sh-4.4# wipefs /dev/sdb
sh-4.4# blkid /dev/sdb
Empty. The device is clean, so I think we have a bug, folks.
Regards
Laurence

The question is why lsblk is still reporting the old fstype, and whether the kernel is reporting a stale value or there is some other multipath config. Searching online, here are a couple of ideas [1]:
1. Use partprobe to reload the partition table
2. Restart the node where the osd prepare job is running
That article doesn't discuss this in the context of multipath, but it's a very similar issue. If we can at least confirm something isn't just being reported incorrectly by the kernel, then we can narrow down that the multipath config is still guilty. And I'm afraid I'm not very helpful with multipath config.
[1] https://unix.stackexchange.com/questions/516381/why-is-lsblk-showing-the-old-fstype-and-label-of-a-device-that-was-formatted

Hello,
I am excellent with multipath; I live in all that storage space. Where are you seeing lsblk report that besides your messages? I suppose it's cached somewhere then, because look here. Do you run lsblk -f to check?
sh-4.4# wipefs /dev/sdb
sh-4.4# blkid /dev/sdb
And lsblk here shows nothing:
sh-4.4# lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sdb 0 512 4194304 512 512 0 mq-deadline 256 8192 32M
So I will have them run partprobe, but it should not make a difference. Of course I hope it does.

Let me add additional notes here
----------------------------------
We cannot fully remove multipath; they boot the CoreOS from a multipath device on the SAN.
I tried to blacklist the wwid for mpatha in /etc/multipath.conf, but it's RHCOS and I could not get that to work. After reboot it came back as a multipath even with it blacklisted in /etc/multipath.conf. That is because it is in the initramfs. I asked sbr-shift and nobody had anything for me that worked to get the multipath blacklist saved in the initramfs so I could fully blacklist. Hence why I am deleting the map manually.
We did try rpm-ostree initramfs --enable, thinking it would rebuild the initramfs on reboot, but it seemed not to take my change in /etc/multipath.conf:
Checking out tree a46d360... done
Generating initramfs... done
Writing OSTree commit... done
Staging deployment... done
Initramfs regeneration is now: enabled

Hmmm, lsblk -f seems to think it's still an mpath member, so that is exactly what you are tripping over:
sh-4.4# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sdb mpath_mem
We are trying partprobe.

Turns out partprobe is not installed on RHCOS. I will try blockdev:
blockdev --rereadpt /dev/sdb

So blockdev --rereadpt did not help get rid of the mpath_mem seen in lsblk -f.
I tried a hack but got stuck with the read-only CoreOS again; we could not do the mv on the CoreOS.
sh-4.4# mv /usr/bin/lsblk /usr/bin/lsblk.orig
mv: cannot move '/usr/bin/lsblk' to '/usr/bin/lsblk.orig': Read-only file system
So now back to trying to blacklist the OSD device in multipath permanently.
My attempted hack to change lsblk -f into just lsblk so it allows the provisioning:
mv /usr/bin/lsblk /usr/bin/lsblk.orig
vi /usr/bin/lsblk
add this
#!/bin/bash
## Ignore arguments and just run lsblk
lsblk.orig
chmod +x /usr/bin/lsblk
Basically it does not run the -f. Then retry the provisioning. Afterwards:
rm /usr/bin/lsblk
mv /usr/bin/lsblk.orig /usr/bin/lsblk

We are trying some things. We managed to get the blacklist into the initramfs, but we never removed the multipath default from the kernel line, so on reboot the mpath device was back.
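One hedged explanation for the disagreement above: lsblk takes FSTYPE from the udev database when one is available, while wipefs and blkid read the device directly, so udev's cached ID_FS_TYPE (set on path devices by the multipath udev rules) can survive an on-disk wipe. A sketch of forcing udev to re-probe (the device name is an example):

udevadm trigger --action=change --sysname-match=sdb   # ask udev to re-read the device properties
udevadm settle                                        # wait for the event queue to drain
lsblk -f /dev/sdb                                     # FSTYPE should now reflect the on-disk state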
Asked them to try this:
rpm-ostree kargs --delete rd.multipath=default
Then reboot again.

Quite honestly, Travis, I think it's time to modify the code so we don't check the mpath_member anymore when we know multipath is in use. The amount of frustration this has caused everybody, and most importantly the customer's view of the product, means we need a code change. The customer had an automated install on F/C devices, so by default multipath gets enabled for CoreOS. Had the first two OSD deployments not worked, we would have known from the get-go that multipath being enabled was an issue. So how about allowing mpath devices for the OSD?
Regards
Laurence

I've opened an upstream issue to start the change that would allow OSD creation where fstype=mpath_member. This seems like a safe enough change for other upstream scenarios as well, but I'd like to get upstream feedback to confirm there aren't other risks with this. Feel free to also comment on the issue:
https://github.com/rook/rook/issues/11409
While the change will be small, getting it to the customer will depend on the schedule for the next release. Is the meeting with the customer still needed today? It seems like we've investigated as much as we can until that fix is available.

Hello Travis,
We will meet with the customer. I would like to ask that you remain on standby to join in case we get the multipath blacklist to work and then still have issues. So we will join the call from support and reach out if we need you. Would that work?
Regards
Laurence

Sounds good, I'll be on standby. Ping me in gchat if needed.

It's working. The OSD is up and provisioned, so we are finally over this. So we are good. We will never know how the other two worked, though :) I will reply to the thread as well.
Regards
Laurence

Closing and will write up a KCS.

Will close as notabug, but the expectation is that the next release will allow mpath_members to be used. Thanks, Laurence. Since we still need to allow mpath_members, how about reopening this, or opening a new issue? Then we can work on it for 4.12.

Re-opening this for work to be done in 4.12.

We don't test with multipath devices. The BZ will be verified based on the regression testing.

Innovapost, the customer who opened case 03369252, which is the origin of this bug, is asking a couple of questions, in case someone could provide some answers:
1. How will the ODF StorageSystem creation look after the release of this fix? Do they still need to create a LocalVolume and specify the mpath UUID, or should they follow the documentation and rely on LocalVolumeDiscovery pods to discover the volumes?
2. Will any documentation be created regarding this fix?
3. Was this fix tested in a bare metal cluster? If not, is there any risk that it won't work properly in the customer's cluster, which is a bare metal cluster?

(In reply to samy from comment #68)
> 1. How will the ODF StorageSystem creation look after the release of this fix? Do they still need to create a LocalVolume and specify the mpath UUID, or should they follow the documentation and rely on LocalVolumeDiscovery pods to discover the volumes?
Either way should work, depending on whether you want to define the PVs statically or discover them dynamically. But it is critical that there is only one PV per device, otherwise ODF will attempt, and fail, to create two OSDs on the same underlying device.
> 2. Will any documentation be created regarding this fix?
See the doc text in the BZ for now.
> 3. Was this fix tested in a bare metal cluster? If not, is there any risk that it won't work properly in the customer's cluster, which is a bare metal cluster?
See comment 50 regarding testing. Since mpath devices have not been tested, there is a risk that the device is not configured as expected.
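For the static option in question 1, a hypothetical sketch of a LocalVolume that points at the multipath device via its stable by-id path (the name, namespace, and device path are examples from this report; one devicePaths entry yields one PV, so only one OSD lands on the underlying LUN):

oc apply -f - <<'EOF'
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: localblock
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      devicePaths:
        - /dev/disk/by-id/scsi-3624a93708a2c2aed4e9a423800026b1e
EOF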