Description of problem (please be as detailed as possible and provide log snippets):

After a cluster-wide reboot on cert auth, an ODF node reboot removed the DASD partition and all 3 OSDs were lost.

Customer followed this IBM documentation to partition the DASD:
https://www.ibm.com/docs/en/linux-on-systems?topic=architecture-storage
See Section "4.1.2 Steps specific for DASD devices"

ODF deployed successfully with LSO and the OSDs mapped to dasde1.

To use host binaries, run `chroot /host`
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop1        7:1    0 811.6G  0 loop
dasda       94:0    0 103.2G  0 disk
|-dasda1    94:1    0   384M  0 part /host/boot
`-dasda2    94:2    0 102.8G  0 part /host/sysroot
dasde       94:16   0 811.6G  0 disk
`-dasde1    94:17   0 811.6G  0 part

After the cluster-wide reboot on cert auth, the OSD pods fail with:

MapVolume.EvalHostSymlinks failed for volume "local-pv-ef04e88d" : lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory

Events log:

3m27s  Warning  FailedMapVolume  pod/rook-ceph-osd-0-59c9db848-5rp9f  MapVolume.EvalHostSymlinks failed for volume "local-pv-ef04e88d" : lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory

23m  Warning  FailedMount  pod/rook-ceph-osd-0-59c9db848-5rp9f  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-odf-cluster-storage-0-data-1wgxd7], unattached volumes=[ocs-deviceset-odf-cluster-storage-0-data-1wgxd7 ocs-deviceset-odf-cluster-storage-0-data-1wgxd7-bridge kube-api-access-25kpz rook-data rook-config-override rook-ceph-log rook-ceph-crash run-udev]: timed out waiting for the condition

NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
dasda       94:0    0 103.2G  0 disk
|-dasda1    94:1    0   384M  0 part /boot
`-dasda2    94:2    0 102.8G  0 part /sysroot
dasde       94:16   0 811.6G  0 disk

Note that the dasde1 partition is gone after the reboot.

Version of all relevant components (if applicable):
OCP/ODF 4.12

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Node reboot destroys the OSD path to dasde1.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
5

Is this issue reproducible?
Yes, on reboot of the node.

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
The DASD partition persists across reboot.

Additional info:
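One way to confirm this failure mode is to compare the device path the local PV references with what actually exists on the node. A minimal triage sketch, assuming oc debug access to the affected node (the PV name and by-id link are taken from the events above; <node-name> is a placeholder):

# Check whether the by-id symlink the PV needs still exists on the node:
oc debug node/<node-name> -- chroot /host sh -c \
  'ls -l /dev/disk/by-id/ | grep ccw-IBM.750000000KHF61.baee.40'

# Compare with the device path recorded in the LSO-provisioned PV:
oc get pv local-pv-ef04e88d -o jsonpath='{.spec.local.path}{"\n"}'

If the first command shows the base ccw-... link but no -part1 entry while the PV still points at the -part1 path, the partition itself was lost across the reboot.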
Hi. I got to know that it could just be an issue with how the partition was configured and not a real issue at all. Are there any new updates on this BZ?
Hi Santosh,

I believe you're correct. It seemed to be an issue with the partition config. The customer is going to do some testing to ensure partition persistence on reboot. I asked the customer to upload the testing details to the case and will share them here when provided. The next action items will most likely target documentation, supportability, and QE testing. I will open a new Doc BZ for that.
Thanks for the update, Kevan. I'll wait for more details.
Hi Santosh,

This was definitely a misconfiguration by the customer. I suspect they ran # chzdev -e dasde instead of the ID.

======================================

lszdev shows that the persistent flag is not set:

[root@c02ns001 ~]# lszdev
TYPE         ID                          ON   PERS  NAMES
dasd-eckd    0.0.0100                    yes  no    dasda
dasd-eckd    0.0.0190                    no   no
dasd-eckd    0.0.0191                    no   no
dasd-eckd    0.0.01fd                    yes  no
dasd-eckd    0.0.01fe                    yes  no
dasd-eckd    0.0.01ff                    yes  no
dasd-eckd    0.0.0592                    no   no
dasd-eckd    0.0.0a00                    yes  no    dasde
dasd-eckd    0.0.0afc                    yes  no
dasd-eckd    0.0.0afd                    yes  no
dasd-eckd    0.0.0afe                    yes  no
dasd-eckd    0.0.0aff                    yes  no
qeth         0.0.2d00:0.0.2d01:0.0.2d02  yes  no    enc2d00
generic-ccw  0.0.0009                    yes  no
generic-ccw  0.0.000c                    no   no
generic-ccw  0.0.000d                    no   no
generic-ccw  0.0.000e                    no   no

Then the chzdev command is issued:

[root@c02ns001 ~]# chzdev -e 0.0.0a00
ECKD DASD 0.0.0a00 configured

Now the persistent flag is set:

[root@c02ns001 ~]# lszdev
TYPE         ID                          ON   PERS  NAMES
dasd-eckd    0.0.0100                    yes  no    dasda
dasd-eckd    0.0.0190                    no   no
dasd-eckd    0.0.0191                    no   no
dasd-eckd    0.0.01fd                    yes  no
dasd-eckd    0.0.01fe                    yes  no
dasd-eckd    0.0.01ff                    yes  no
dasd-eckd    0.0.0592                    no   no
dasd-eckd    0.0.0a00                    yes  yes   dasde
dasd-eckd    0.0.0afc                    yes  no
dasd-eckd    0.0.0afd                    yes  no
dasd-eckd    0.0.0afe                    yes  no
dasd-eckd    0.0.0aff                    yes  no
qeth         0.0.2d00:0.0.2d01:0.0.2d02  yes  no    enc2d00
generic-ccw  0.0.0009                    yes  no
generic-ccw  0.0.000c                    no   no
generic-ccw  0.0.000d                    no   no
generic-ccw  0.0.000e                    no   no

The format is issued with LDL instead of CDL, which creates a partition as part of the formatting:

[root@c02ns001 ~]# dasdfmt /dev/dasde -b 4096 -p -y -F -d ldl
Releasing space for the entire device...
Skipping format check due to --force.
Finished formatting the device.
Rereading the partition table... ok

lsblk shows the partition dasde1:

[root@c02ns001 ~]# lsblk
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
dasda       94:0    0 103.2G  0 disk
|-dasda1    94:1    0   384M  0 part /boot
`-dasda2    94:2    0 102.8G  0 part /sysroot
dasde       94:16   0 811.6G  0 disk
`-dasde1    94:17   0 811.6G  0 part

After a reboot, the partition persists:

[core@c02ns001 ~]$ lsblk
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0        7:0    0 811.6G  0 loop
dasda       94:0    0 103.2G  0 disk
|-dasda1    94:1    0   384M  0 part /boot
`-dasda2    94:2    0 102.8G  0 part /sysroot
dasde       94:16   0 811.6G  0 disk
`-dasde1    94:17   0 811.6G  0 part

I have put together the following KCS for the RH ODF team for awareness on DASD config:
https://access.redhat.com/solutions/7022104
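As a follow-up for anyone checking other nodes: this condition is easy to audit, since lszdev prints the ON and PERS columns shown above. A minimal sketch, assuming the default lszdev table layout (TYPE ID ON PERS NAMES):

#!/bin/bash
# Flag dasd-eckd devices that are enabled in the active configuration
# ($3 == "yes") but missing from the persistent configuration ($4 == "no").
lszdev dasd-eckd | awk 'NR > 1 && $3 == "yes" && $4 == "no" { print $2 }' |
while read -r id; do
    echo "DASD $id is active but not persistent; consider: chzdev -e $id"
done

Note that chzdev -e <ID> updates both the active and the persistent configuration by default, which is why the PERS column for 0.0.0a00 flips to yes above.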
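For completeness: if a customer wants to stay on CDL (the layout the IBM doc in the description covers), the partition must be created explicitly after formatting, since only LDL creates one implicitly. A sketch of that alternative path, reusing the device and blocksize from above (this is not what was run in this case):

# CDL format does not create a partition by itself:
dasdfmt /dev/dasde -b 4096 -y -d cdl
# fdasd -a auto-creates a single partition spanning the whole device:
fdasd -a /dev/dasde

Either way the partition shows up as dasde1, and it persists across reboots as long as the device itself is persistently enabled via chzdev.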
We may need to put this on hold for a moment. There are still pending issues in the customer environment, so I'm not sure we have a 100% complete picture of the doc update items needed at the moment.