Bug 1974313 - Unable to mount volume with error "Found a atari partition table"
Summary: Unable to mount volume with error "Found a atari partition table"
Keywords:
Status: CLOSED DUPLICATE of bug 1711674
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: aos-storage-staff@redhat.com
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-21 10:55 UTC by Martin Gencur
Modified: 2021-06-22 15:26 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-22 15:26:22 UTC
Target Upstream Version:
Embargoed:



Description Martin Gencur 2021-06-21 10:55:54 UTC
Description of problem:

Mounting a volume using a PersistentVolumeClaim in AWS on OpenShift sometimes fails with the following error, which can be seen in the node's journal:

fsck error: fsck from util-linux 2.32.1
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdch
/dev/xvdch:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>
Found a atari partition table in /dev/xvdch
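The "Found a atari partition table" line means the freshly provisioned EBS volume carried leftover on-disk signatures, so the kubelet's fsck refused to treat it as a blank device. As a workaround sketch (not an official fix: the device name /dev/xvdch is taken from the log above, and wiping signatures is only safe on an empty, newly provisioned volume), stray signatures can be inspected and cleared with `wipefs` from util-linux. The snippet rehearses this on a throwaway image file so it can be run anywhere; on the affected node you would point `wipefs` at the device instead:

```shell
# Safe rehearsal on an image file; substitute /dev/xvdch on the node itself.
truncate -s 64M disk.img   # throwaway stand-in for the EBS device
mkswap disk.img            # plant a stray signature, like the garbage left on the volume
wipefs disk.img            # list the signatures fsck/blkid would detect
wipefs -a disk.img         # erase all signatures (destructive on a real device!)
wipefs disk.img            # prints nothing once the device looks blank
```

After `wipefs -a`, the kubelet's format-and-mount path would see an empty device and run mkfs instead of fsck.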

The CI build with this failure is here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-knative-serverless-operator-main-4.7-e2e-aws-ocp-47-continuous/1400422159498612736/artifacts/e2e-aws-ocp-47-continuous (the must-gather logs contain all the details: events in the kafka namespace and the node's logs)

The failure manifests as a failed volume attachment for Kafka's ZooKeeper pod in the Kafka namespace:
Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[zookeeper-metrics-and-logging zookeeper-nodes cluster-ca-certs my-cluster-zookeeper-token-rtxdb strimzi-tmp data]: timed out waiting for the condition
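When an attach times out like this, the quickest confirmation is to correlate the pod's events (`oc get events -n kafka`, `oc describe pod <zookeeper-pod> -n kafka`) with the node journal. On a live node the journal filter would be `journalctl -u kubelet | grep -iE 'fsck|partition table'`; the sketch below runs the same filter against a captured snippet from this report so it works anywhere:

```shell
# Filter for the tell-tale fsck / partition-table lines.
# Live-node form: journalctl -u kubelet | grep -iE 'fsck|partition table'
cat > journal.txt <<'EOF'
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdch
Found a atari partition table in /dev/xvdch
EOF
grep -iE 'fsck|partition table' journal.txt   # both lines match
```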


Version-Release number of selected component (if applicable):
OCP 4.7.16

How reproducible:
Randomly. Sometimes the installation is successful.

Steps to Reproduce:
1. Checkout https://github.com/openshift-knative/serverless-operator (main branch)
2. Run `make install-strimzi` from the root of the repository

Actual results:
The installation hangs in "WAITING FOR STRIMZI CLUSTER TO BECOME READY"


Expected results:
The Kafka cluster is installed successfully.


Additional info:

Comment 1 Tomas Smetana 2021-06-21 14:06:10 UTC
I don't think we can do anything about this one: it's a known AWS problem; see https://bugzilla.redhat.com/show_bug.cgi?id=1711674#c30 or https://github.com/kubernetes/kubernetes/issues/86064.

Comment 2 Jan Safranek 2021-06-22 15:26:22 UTC
Indeed, this is an existing bug in AWS itself: sometimes it provisions volumes with random garbage that Kubernetes is too afraid to overwrite. I am sorry, but we don't want to fix this in Kubernetes. I suggest you push AWS to fix it (or push whoever manages the AWS environment for you) and make AWS aware that this is a problem.

*** This bug has been marked as a duplicate of bug 1711674 ***

