Description of problem (please be as detailed as possible and provide log snippets):

Following bug bz1965768, we tried MON corruption and recovery of the DBs of all 3 MONs (the MONs recovered successfully). The RBD app pods created before the MON DB corruption came back running and the old data was retrievable.

In the case of CephFS, the cluster was in:

  health: HEALTH_ERR
          1 filesystem is offline
          1 filesystem is online with fewer MDS than max_mds
          3 daemons have recently crashed

We then removed the existing CephFS and created a new CephFS using https://access.redhat.com/solutions/5441711, after which ceph health was OK. However, we could not get the old CephFS app pod (created before the MON DB corruption) back.

From the CephFS app pod describe:

Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  9m1s (x502 over 19h)   kubelet  Unable to attach or mount volumes: unmounted volumes=[fedora-vol], unattached volumes=[fedora-vol]: timed out waiting for the condition
  Warning  FailedMount  3m37s (x565 over 19h)  kubelet  MountVolume.MountDevice failed for volume "pvc-d9d90f00-3222-4259-941b-271888506638" : rpc error: code = Internal desc = pool not found: fscID (1) not found in Ceph cluster

Creation of new CephFS PVCs and pods works.

Patrick analysed the cluster and suggested:
```
The CephFS recovery completed okay using [1] but old PVCs won't bind because ceph-csi remembers the old fscid (unique integer assigned to file systems) when remounting. A procedure needs to be in place to update that (new ceph-csi BZ TBC).

[1] https://access.redhat.com/solutions/5441711
```
(A quick way to confirm this fscid mismatch is sketched under Additional info below.)

Version of all relevant components (if applicable):
ocs-operator.v4.6.4-323.ci
ocp 4.6.0-0.nightly-2021-06-16-061653

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the old CephFS pod data is not accessible.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
2/2

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Following https://bugzilla.redhat.com/show_bug.cgi?id=1965768, corrupt the MON DBs and perform the recovery procedure from https://access.redhat.com/solutions/6100031/
2. After MON recovery, since CephFS was offline, perform the CephFS recovery from https://access.redhat.com/solutions/5441711
3. The new CephFS is active, but the old CephFS app pod data cannot be accessed

Actual results:
The old CephFS app pod is stuck in ContainerCreating:

Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  9m1s (x502 over 19h)   kubelet  Unable to attach or mount volumes: unmounted volumes=[fedora-vol], unattached volumes=[fedora-vol]: timed out waiting for the condition
  Warning  FailedMount  3m37s (x565 over 19h)  kubelet  MountVolume.MountDevice failed for volume "pvc-d9d90f00-3222-4259-941b-271888506638" : rpc error: code = Internal desc = pool not found: fscID (1) not found in Ceph cluster

Expected results:
The old CephFS app pod should be running and the old data from the pod should be retrievable.

Additional info:
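A rough sketch of how to confirm the mismatch Patrick describes. The PV name is the one from the events above; the volumeHandle layout described in the comments is an assumption based on how ceph-csi encodes volume IDs and should be verified against your cluster:
```
# fscid of the recreated filesystem: the number in parentheses in the
# "Filesystem '<name>' (<fscid>)" header line of the dump
ceph fs dump | grep "^Filesystem"

# volumeHandle stored in the old PV; ceph-csi encodes the original fscid in it
# (for example 0001-0009-rook-ceph-0000000000000001-<uuid> -> old fscid 1).
# If this fscid differs from the one reported by "ceph fs dump", the mount
# fails with "fscID (...) not found in Ceph cluster".
oc get pv pvc-d9d90f00-3222-4259-941b-271888506638 \
  -o jsonpath='{.spec.csi.volumeHandle}{"\n"}'
```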
https://bugzilla.redhat.com/show_bug.cgi?id=1975608 is fixed in 5.0z1. We will get a fix in ODF as soon as there is a Ceph container build that includes it.
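For context, a minimal sketch of how the fixed Ceph build is expected to be used during CephFS recovery, assuming the fix is the upstream change that lets "ceph fs new" reuse the original fscid; the --fscid/--recover flags and the ODF default pool names below are assumptions, not validated on this build:
```
# Recreate the file system reusing the original fscid (1 in this report), so
# PVs whose ceph-csi volumeHandle encodes that fscid resolve again.
# Flags and pool names are assumed, not verified on this build.
ceph fs new ocs-storagecluster-cephfilesystem \
    ocs-storagecluster-cephfilesystem-metadata \
    ocs-storagecluster-cephfilesystem-data0 \
    --force --recover --fscid 1
```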
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:5086