Bug 2312245 - [GSS] OSD device headers are being wiped
Summary: [GSS] OSD device headers are being wiped
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.12
Hardware: All
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Santosh Pillai
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-09-13 17:14 UTC by kelwhite
Modified: 2024-10-11 19:55 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OCSBZM-9250 0 None None None 2024-09-19 16:07:20 UTC
Red Hat Issue Tracker OCSBZM-9253 0 None None None 2024-09-18 18:14:38 UTC
Red Hat Issue Tracker OCSBZM-9254 0 None None None 2024-09-13 17:16:56 UTC

Description kelwhite 2024-09-13 17:14:40 UTC

Comment 4 kelwhite 2024-09-13 17:17:31 UTC
Marking this BZ as urgent. Assuming replacing the OSDs is the only way forward, this could cause data loss for other customers...

Also, to add to the env: this is a fresh cluster that was deployed last week:

$  cat storagecluster.yaml 
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-09-05T17:22:12Z"

Comment 5 kelwhite 2024-09-13 17:19:33 UTC
I also forgot to mention: this happened on the ocs4 node earlier this week. We ended up replacing the OSDs to resolve the issue.

Comment 6 kelwhite 2024-09-13 17:22:05 UTC
Would we need to configure something like [1] to see what is accessing the device and causing the wipe? Is it too late to do anything?

[1] https://access.redhat.com/solutions/7039896
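If we go that route, the rule could look something like the sketch below, left in place until the next wipe. This assumes the affected device is /dev/sdb and auditd is running; the rule file name and key are hypothetical:

```
# /etc/audit/rules.d/osd-wipe.rules (hypothetical file name and key)
# Log any write or attribute change on the suspect device node
-w /dev/sdb -p wa -k osd-header-wipe
```

After the next wipe, `ausearch -k osd-header-wipe` should show which process wrote to the device.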

Comment 12 kelwhite 2024-09-17 22:06:42 UTC
The customer did a full redeployment, installed 4.12.12, and still has the same issue. I don't think this is the same bug, or if it is, we didn't fix it in this version.

Customer is uploading a fresh ODF must-gather now... I'm also reviewing and will post findings when I've finished.

Comment 13 kelwhite 2024-09-17 22:09:48 UTC
Sorry, 4.12.14.

Comment 14 kelwhite 2024-09-17 22:19:48 UTC
Getting the file for 'dd if=/dev/sdb of=/tmp/block.8k.dump bs=4K count=2'
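For reference, this is roughly how I plan to check the dump once it arrives: a healthy BlueStore main device starts with the magic string `bluestore block device` at offset 0, so its absence from the first bytes suggests the header was wiped. A minimal sketch, simulating a healthy and a wiped header with temp files instead of /dev/sdb (against the real dump you'd run the check on /tmp/block.8k.dump):

```shell
# Sketch only: simulate the two cases with temp files.
healthy=$(mktemp)
wiped=$(mktemp)

# A healthy BlueStore main device begins with this magic string.
printf 'bluestore block device\n' > "$healthy"
dd if=/dev/zero of="$healthy" bs=4K seek=1 count=1 conv=notrunc status=none
# A wiped header is typically all zeros.
dd if=/dev/zero of="$wiped" bs=4K count=2 status=none

# Report whether the first 22 bytes carry the BlueStore label.
check_label() {
    if head -c 22 "$1" | grep -q 'bluestore block device'; then
        echo "$1: BlueStore label present"
    else
        echo "$1: no BlueStore label (possibly wiped)"
    fi
}

res_healthy=$(check_label "$healthy")
res_wiped=$(check_label "$wiped")
echo "$res_healthy"
echo "$res_wiped"
rm -f "$healthy" "$wiped"
```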

Comment 23 kelwhite 2024-09-26 18:32:12 UTC
We've still come up empty. The customer won't share their playbooks with us, nor have we been able to reproduce this issue.

At the moment, the case is pending on them to set up another call with us.

