Bug 2312245 - [GSS] OSD device headers are being wiped
Summary: [GSS] OSD device headers are being wiped
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.12
Hardware: All
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Santosh Pillai
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-09-13 17:14 UTC by kelwhite
Modified: 2024-10-11 19:55 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OCSBZM-9250 0 None None None 2024-09-19 16:07:20 UTC
Red Hat Issue Tracker OCSBZM-9253 0 None None None 2024-09-18 18:14:38 UTC
Red Hat Issue Tracker OCSBZM-9254 0 None None None 2024-09-13 17:16:56 UTC

Description kelwhite 2024-09-13 17:14:40 UTC

Comment 4 kelwhite 2024-09-13 17:17:31 UTC
Marking this BZ as urgent. Assuming replacing the OSDs is the only way forward, this could cause data loss for other customers...

Also, to add to the env: this is a fresh cluster that was deployed last week:

$  cat storagecluster.yaml 
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-09-05T17:22:12Z"

Comment 5 kelwhite 2024-09-13 17:19:33 UTC
I also forgot to mention: this happened on the ocs4 node earlier this week. We ended up replacing the OSDs to resolve the issue.

Comment 6 kelwhite 2024-09-13 17:22:05 UTC
Would we need to configure something like [1] to see what is accessing the device and causing the wipe? Is it too late to do anything?

[1] https://access.redhat.com/solutions/7039896
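If we go that route, the rule could look something like the sketch below, left in place until the next wipe. This assumes the affected device is /dev/sdb and auditd is running; the rule file name and key are hypothetical:

```
# /etc/audit/rules.d/osd-wipe.rules (hypothetical file name and key)
# Log any write or attribute change on the suspect device node
-w /dev/sdb -p wa -k osd-header-wipe
```

After the next wipe, `ausearch -k osd-header-wipe` should show which process wrote to the device.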

Comment 12 kelwhite 2024-09-17 22:06:42 UTC
The customer did a full redeployment, installed 4.12.12, and still has the same issue. I don't think this is the same bug, or if it is, we didn't fix it in this version.

Customer is uploading a fresh ODF must-gather now... I'm also reviewing and will post findings when I've finished.

Comment 13 kelwhite 2024-09-17 22:09:48 UTC
Sorry, 4.12.14.

Comment 14 kelwhite 2024-09-17 22:19:48 UTC
Getting the file for 'dd if=/dev/sdb of=/tmp/block.8k.dump bs=4K count=2'
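For reference, this is roughly how I plan to check the dump once it arrives: a healthy BlueStore main device starts with the magic string `bluestore block device` at offset 0, so its absence from the first bytes suggests the header was wiped. A minimal sketch, simulating a healthy and a wiped header with temp files instead of /dev/sdb (against the real dump you'd run the check on /tmp/block.8k.dump):

```shell
# Sketch only: simulate the two cases with temp files.
healthy=$(mktemp)
wiped=$(mktemp)

# A healthy BlueStore main device begins with this magic string.
printf 'bluestore block device\n' > "$healthy"
dd if=/dev/zero of="$healthy" bs=4K seek=1 count=1 conv=notrunc status=none
# A wiped header is typically all zeros.
dd if=/dev/zero of="$wiped" bs=4K count=2 status=none

# Report whether the first 22 bytes carry the BlueStore label.
check_label() {
    if head -c 22 "$1" | grep -q 'bluestore block device'; then
        echo "$1: BlueStore label present"
    else
        echo "$1: no BlueStore label (possibly wiped)"
    fi
}

res_healthy=$(check_label "$healthy")
res_wiped=$(check_label "$wiped")
echo "$res_healthy"
echo "$res_wiped"
rm -f "$healthy" "$wiped"
```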

Comment 23 kelwhite 2024-09-26 18:32:12 UTC
We've still come up empty. The customer won't share their playbooks with us, nor have we been able to reproduce this issue.

At the moment, the case is pending on them to set up another call with us.

