Bug 2222728
| Summary: | [IBM Z] ODF deployed on IBM Z with DASD (BlueStore FAILED ceph_assert) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | khover |
| Component: | ceph | Assignee: | Adam Kupczyk <akupczyk> |
| ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | akandath, akupczyk, bniver, gjose, glaw, jquinn, mgokhool, mhackett, muagarwa, nojha, odf-bz-bot, rzarzyns, saliu, sarora, sostapov, tstober |
| Version: | 4.12 | Flags: | khover: needinfo? (tstober) |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
khover
2023-07-13 14:54:30 UTC
Regarding: "Errors like this one usually are symptoms of an underlying hardware problem. Has it been excluded?" We will need an IBM Z expert for that.

Abdul tried to reproduce the issue on my DASD based cluster. After applying the certificate change, the nodes got rebooted and ODF came back completely healthy without issues. Versions tried: OCP 4.12.23, ODF 4.12.4.

Question: Was the customer able to create a new OSD and re-attach it to the cluster? And did that replaced cluster work? And one more question: can someone please verify that the logs provided by the customer were created at the correct date (July 14th, NOT June 26th)?

The customer has not replaced the OSD at this time in case further info is needed for this BZ. All logs surrounding this BZ are from July 13/14.

2023-07-12T09:21:44.004+0000 3ff8ab68800 -1 rocksdb: Corruption: SST file is ahead of WALs

Errors like this one usually are symptoms of an underlying hardware problem. Has it been excluded?

(In reply to khover from comment #7)
> The customer has not replaced the OSD at this time in case further info was
> needed for this BZ.
>
> All logs surrounding this BZ are from July 13/14.
>
> 2023-07-12T09:21:44.004+0000 3ff8ab68800 -1 rocksdb: Corruption: SST file is
> ahead of WALs
>
> Errors like this one usually are symptoms of an underlying hardware
> problem. Has it been excluded?

Hi Kevan, the customer ran the script dbginfo.sh on Friday during the call and collected a DBGINFO-...tarball. Did you get that data?

Hi, yes, the tarball is uploaded to supportshell.

supportshell/cases/03539410
DBGINFO-2023-06-26-07-08-03-c02ns001-25B658.tgz

(In reply to khover from comment #11)
> Hi,
>
> Yes, the tarball is uploaded to supportshell.
>
> supportshell/cases/03539410
>
> DBGINFO-2023-06-26-07-08-03-c02ns001-25B658.tgz

But this is not the one taken last Friday, it's an old one. The one taken during the call on Friday should have a name like DBGINFO-2023-07-14-...

I'll reach out to the customer for that.

Hi, the customer states that DBGINFO-2023-07-14-13-30-46 was collected for the case 03562792 / BZ 2223380 cluster.

entries (bad CRC)
2023-07-17T08:06:34.180507921Z debug -2> 2023-07-17T08:06:34.146+0000 3ff8926a500 -1 osd.0 0 failed to load OSD map for epoch 14, got 0 bytes

=======================================================================

Do we need a fresh debug collection for this cluster? I can get that if needed.

We may have identified the issue in gchat collab. I asked whether there was a difference between the IBM test env and the customer env. ID 0.0.0a00 is being partitioned on each node for OSDs; would the correct way to configure it look like the following?

# lszdev
TYPE       ID        ON   PERS  NAMES
dasd-eckd  0.0.0100  yes  no    dasda
dasd-eckd  0.0.0190  no   no
dasd-eckd  0.0.0191  no   no
dasd-eckd  0.0.01fd  yes  no    <<-- use for node 1
dasd-eckd  0.0.01fe  yes  no    <<-- use for node 2
dasd-eckd  0.0.01ff  yes  no
dasd-eckd  0.0.0592  no   no
dasd-eckd  0.0.0a00  yes  yes   <<-- use for node 3

@Abdul

Yes, Kevan, please use different DASD IDs for each OSD as you listed below. As Sa mentioned, we always had different DASD IDs for each OSD disk in our test environment.

=============================================================

This will be set up in the customer ENV for case 03562792/BZ 2223380 for testing initially, in case we need additional info collected for this case/BZ.

Raising the escalation flag on the BZ ticket to match the ongoing internal escalation. The deployment of applications on OpenShift on Z is being blocked by these issues.
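To illustrate the recommendation above, a minimal sketch of enabling a different DASD ID on each node (the node names are placeholders; the device IDs are the ones marked in the lszdev listing, and the oc debug pattern follows the customer script further down in this BZ):

# oc debug node/<node1> -- chroot /host chzdev -e 0.0.01fd   <<-- node 1
# oc debug node/<node2> -- chroot /host chzdev -e 0.0.01fe   <<-- node 2
# oc debug node/<node3> -- chroot /host chzdev -e 0.0.0a00   <<-- node 3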
This morning in the customer call a problem was identified: the block device /dev/dasde1 had been accidentally overwritten and showed up as a file of size 0 in the filesystem. The customer would like to correct it and reinstall ODF. The result will be shown in the customer call later today. This is not the root cause of the initial CRC errors.

- We observed that reading 4k bytes from /dev/dasde1 returned 0 bytes. Running strace further showed the read() call returning 0. From the CLI history we observed that dd if=/dev/zero of=/dev/dasde1 had been run to wipe the disk as part of earlier troubleshooting.

This dd was run with the customer in an attempt to recover the OSD down / CLBO outlined in the initial BZ description.

Yesterday in the customer call we did a fresh installation of OCP and ODF. After the certificate update and reboot, everything worked fine. The only difference compared with the setup before is the disk layout: CDL (compatible disk layout) is used instead of LDL (Linux® disk layout). https://www.ibm.com/docs/en/linux-on-systems?topic=know-disk-layout-summary

Although we could reproduce the customer issue using LDL formatting in our lab, the CDL formatting is the default for DASD disks and more recommended. It is also required for OCP installation on IBM Z: https://docs.openshift.com/container-platform/4.13/installing/installing_ibm_z/installing-ibm-z.html

The difference is that after formatting with CDL, the customer needs to run another command, "fdasd", to do the partitioning. This step is not needed with LDL. This may lead to some confusion.
> Although we could reproduce the customer issue using LDL formatting in our
> lab, the CDL formatting is the default for DASD disks and more recommended.
Just reviewed the comment I wrote yesterday and need to correct: We could NOT reproduce the customer issue using LDL formatting...
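For reference, a minimal sketch of the CDL flow described above (assuming device ID 0.0.0a00 enumerates as /dev/dasde, as in the examples below; CDL is the dasdfmt default, so no -d option is given):

# chzdev -e 0.0.0a00
# dasdfmt /dev/dasde -b 4096 -p -y -F
# fdasd -a /dev/dasde   <<-- extra partitioning step required with CDL, not with LDL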
Customer updated today:

After the call we have done some testing. We did a reinstall of the OpenShift and ODF cluster using LDL-formatted disks and noticed inconsistencies with the install compared to the one we did on the call. We couldn't create an ODF Storage Cluster as it wasn't available; only a generic storage system was available. We then rebuilt the OpenShift and ODF cluster again using CDL formatting and used our scripts and YAMLs rather than the UI, and it worked fine. We then performed multiple updates that triggered reboots and everything is fine at the moment. I have attached the YAMLs for the storage components; if these could be checked just to make sure they look fine, that would be good.

@saliu
I am still working on RH documentation for dasd.
You stated:
> the customer needs to run another command "fdasd" to do the partitioning. This step is not needed with LDL.
Here is what I have documented for RH, please help me verify or correct any errors.
Are we missing a "fdasd" command?
( EXAMPLE )
# chzdev -e 0.0.0a00
dasd-eckd 0.0.0a00 yes yes dasde
Formatting is then issued, which (with LDL) gives the device a partition as part of the formatting:
# dasdfmt /dev/dasde -b 4096 -p -y -F -d ldl
Releasing space for the entire device...
Skipping format check due to --force.
Finished formatting the device.
Rereading the partition table... ok
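As a quick check after LDL formatting (a hedged example; the device names match the lsblk output under ITEM 2 below), the implicit partition should already be visible without running fdasd:

# lsblk /dev/dasde
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
dasde       94:16   0 811.6G  0 disk
`-dasde1    94:17   0 811.6G  0 part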
ITEM 2
If for some reason we had a failed OSD/device, should the cleaning of the device be done:
# lsblk
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
dasda       94:0   0 103.2G  0 disk
|-dasda1    94:1   0   384M  0 part /boot
`-dasda2    94:2   0 102.8G  0 part /sysroot
dasde      94:16   0 811.6G  0 disk
`-dasde1   94:17   0 811.6G  0 part
On the partition?

# /usr/bin/dd if=/dev/zero of=/dev/dasde1 bs=1M count=10 conv=fsync

Or on the whole device dasde?

# /usr/bin/dd if=/dev/zero of=/dev/dasde bs=1M count=10 conv=fsync
# formatting disks and partitioning from customer script
for x in $storage_nodes; do oc debug node/${x} -- chroot /host chzdev -e 0.0.0a00; done
for x in $storage_nodes; do oc debug node/${x} -- chroot /host dasdfmt /dev/dasde -b 4096 -p -y -F; done
for x in $storage_nodes; do oc debug node/${x} -- chroot /host fdasd -a /dev/dasde; done
As discussed in the call, the commands for formatting look good. Perhaps add a sleep 60 after the dasdfmt command to wait for the formatting to finish, but only because the customer is using ESE DASD, which is by default formatted in quick mode (only the first two tracks). A full-mode dasdfmt of a big disk can take longer than an hour.

One additional recommendation to the customer: to make best use of PAV when formatting a DASD that has one base device (0a00) and four alias devices (0afc, 0afd, 0afe, 0aff), specify five cylinders by adding the option "-r 5":

dasdfmt /dev/dasde -b 4096 -p -y -F -r 5

I'm wondering why they need to run 3 loops to execute these 3 commands. Isn't it possible to run all three commands at once for each node?
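A minimal sketch of how the three loops could be collapsed into one, folding in the sleep 60 and -r 5 suggestions above (the $storage_nodes variable, device ID, and device name are taken from the customer script; whether the sleep is still needed once the commands run sequentially in a single shell invocation is an assumption):

for x in $storage_nodes; do
  oc debug node/${x} -- chroot /host sh -c \
    'chzdev -e 0.0.0a00 && dasdfmt /dev/dasde -b 4096 -p -y -F -r 5 && sleep 60 && fdasd -a /dev/dasde'
done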