Hi @jdobson, do we have any findings on this issue? The customer just got back to us to let us know they were able to trigger this same issue on 4.8.27. They were running on 4.8.20 without any problems, and the problems started as soon as they upgraded to 4.8.27. We do have must-gathers and LSO logs. Please let me know if you want me to attach them to the bug. Best regards,
Alright, I was able to reproduce this twice today with the following steps. It varies how long it takes to reproduce, but both times it took less than 10 minutes for 2032924-delete-recreate-loop.sh to trigger the issue.

1) Create LV

[jon@dagobah deploy]$ cat lv-loop0.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "lv-loop0"
  namespace: "openshift-local-storage"
spec:
  storageClassDevices:
    - storageClassName: "lv-loop0"
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/loop0
[jon@dagobah deploy]$ oc apply -f lv-loop0.yaml
localvolume.local.storage.openshift.io/lv-loop0 created
[jon@dagobah deploy]$ oc get localvolumes -n openshift-local-storage
NAME       AGE
lv-loop0   18s
[jon@dagobah deploy]$ oc get pv | grep lv-loop0
local-pv-e739fe1b   1Gi   RWO   Delete   Available   lv-loop0   22s

2) Create PVC and Pod

[jon@dagobah deploy]$ cat pvc-pod-lv.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pv-claim1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  volumeName: local-pv-e739fe1b
  storageClassName: lv-loop0
  volumeMode: Filesystem
---
apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: example
spec:
  volumes:
    - name: claim1-vol
      persistentVolumeClaim:
        claimName: local-pv-claim1
  containers:
    - image: busybox
      command:
        - "/bin/sh"
        - "-ec"
        - "touch \"/mnt/claim1/$(date -Iseconds).txt\"; sleep 604800"
      imagePullPolicy: IfNotPresent
      name: busybox
      volumeMounts:
        - mountPath: "/mnt/claim1"
          name: claim1-vol
  restartPolicy: Always
[jon@dagobah deploy]$ oc apply -f pvc-pod-lv.yaml
persistentvolumeclaim/local-pv-claim1 created
pod/example created
[jon@dagobah deploy]$ oc get pods
NAME      READY   STATUS    RESTARTS   AGE
example   1/1     Running   0          17s

3) Delete and recreate PVC and Pod in a loop

[jon@dagobah deploy]$ cat 2032924-delete-recreate-loop.sh
#!/bin/bash
set -x
year=$(date +%Y)
claimdir=/mnt/claim1
podname=pod/example
oc apply -f pvc-pod-lv.yaml
sleep 30
while true; do
    oc delete -f pvc-pod-lv.yaml
    sleep 10
    oc apply -f pvc-pod-lv.yaml
    sleep 30
    oc debug $podname -- ls -l $claimdir 2>&1| grep $year
    numfiles=$(oc debug $podname -- ls -l $claimdir 2>&1| grep $year | wc -l)
    if [ $numfiles -gt 1 ]; then
        echo "found $numfiles instead of 1 file in $claimdir"
        exit 1
    fi
done
[jon@dagobah deploy]$ ./2032924-delete-recreate-loop.sh
...
+ oc delete -f pvc-pod-lv.yaml
persistentvolumeclaim "local-pv-claim1" deleted
pod "example" deleted
+ sleep 10
+ oc apply -f pvc-pod-lv.yaml
persistentvolumeclaim/local-pv-claim1 created
pod/example created
+ sleep 30
+ oc debug pod/example -- ls -l /mnt/claim1
+ grep 2022
-rw-r--r-- 1 root root 0 Jan 26 00:31 2022-01-26T00:31:47+00:00.txt
++ oc debug pod/example -- ls -l /mnt/claim1
++ grep 2022
++ wc -l
+ numfiles=1
+ '[' 1 -gt 1 ']'
+ true
+ oc delete -f pvc-pod-lv.yaml
persistentvolumeclaim "local-pv-claim1" deleted
pod "example" deleted
+ sleep 10
+ oc apply -f pvc-pod-lv.yaml
persistentvolumeclaim/local-pv-claim1 created
pod/example created
+ sleep 30
+ oc debug pod/example -- ls -l /mnt/claim1
+ grep 2022
-rw-r--r-- 1 root root 0 Jan 26 00:31 2022-01-26T00:31:47+00:00.txt
-rw-r--r-- 1 root root 0 Jan 26 00:33 2022-01-26T00:33:08+00:00.txt
++ oc debug pod/example -- ls -l /mnt/claim1
++ grep 2022
++ wc -l
+ numfiles=2
+ '[' 2 -gt 1 ']'
+ echo 'found 2 instead of 1 file in /mnt/claim1'
found 2 instead of 1 file in /mnt/claim1
+ exit 1

Notes: The logs in my case were slightly different than the logs in the original bug report. There was no duplicate cleanup job preceding the missed cleanup.
But sig storage deleter did not get triggered to remove the previously successful cleanup job before the next deletion came in and found an existing cleanup job with a successful status. Still, I think this points to the sleep in handlePVChange() being insufficient. To fix this, we need some way to guarantee that the job status in sig storage deleter always gets cleaned up (without starting a second cleanup job) before the next deletion request is processed.

Next steps:
- Work on implementing a fix and verify with the process above.
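To illustrate the failure mode described above, here is a minimal standalone Go sketch. This is NOT the actual sig storage deleter code; the names (jobStatuses, handleReleasedPV) and the status constants are made up. It only models the behavior we are seeing: a "succeeded" cleanup status left over from an earlier deletion is trusted by the next deletion request, so the PV is handed back without the data being scrubbed again.

// Minimal sketch of the stale-success problem; names are hypothetical.
package main

import "fmt"

type status int

const (
	statusNone status = iota
	statusRunning
	statusSucceeded
)

// jobStatuses models the deleter's per-PV cleanup job status tracking.
var jobStatuses = map[string]status{}

// handleReleasedPV models what happens when a released PV is processed:
// a recorded success means "cleanup already done", so the PV is deleted
// without re-running cleanup.
func handleReleasedPV(pv string) {
	switch jobStatuses[pv] {
	case statusSucceeded:
		fmt.Printf("%s: found successful cleanup job, deleting PV without re-running cleanup\n", pv)
		delete(jobStatuses, pv) // the status is only removed once it is consumed
	case statusRunning:
		fmt.Printf("%s: cleanup still running, check again later\n", pv)
	default:
		fmt.Printf("%s: starting cleanup job\n", pv)
		jobStatuses[pv] = statusRunning
	}
}

func main() {
	// A stale "succeeded" entry from the previous deletion was never reaped,
	// e.g. because no reconcile ran between cleanup completion and the PV
	// becoming bound again.
	jobStatuses["local-pv-e739fe1b"] = statusSucceeded

	// The next deletion request arrives: the stale success is trusted and the
	// volume is handed to the next consumer without being cleaned.
	handleReleasedPV("local-pv-e739fe1b")
}

Running this prints the "deleting PV without re-running cleanup" branch, which corresponds to the "Deleting pv ... after successful cleanup" log line seen in the reproducer.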
I discussed this with the team today to clarify a few points, but it wasn't quite clear how best to address this. So I ran through the scenario again with some extra log statements to see exactly what happens step-by-step.

Here are the two files found by ./2032924-delete-recreate-loop.sh:

+ oc debug pod/example -- ls -l /mnt/claim1
+ grep 2022
-rw-r--r-- 1 root root 0 Jan 27 23:25 2022-01-27T23:25:20+00:00.txt
-rw-r--r-- 1 root root 0 Jan 27 23:26 2022-01-27T23:26:48+00:00.txt

23:25:06: The previous cleanup finishes, and the PV is deleted.

I0127 23:25:06.513134 171818 deleter.go:162] jdd: cleanup status for local-pv-e739fe1b: 5
I0127 23:25:06.516267 171818 deleter.go:167] Deleting pv local-pv-e739fe1b after successful cleanup

handlePVChange() gets an UpdateEvent, and the PV is still in the released state:

I0127 23:25:06.529823 171818 cache.go:64] Updated pv "local-pv-e739fe1b" to cache
1.6433259065298407e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:176 found PV with state released {"pvName": "local-pv-e739fe1b"}
1.6433259065298662e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:176 jdd: handlePVChange stack trace

This triggers Reconcile() to call DeletePVs() again:

1.6433259065300732e+09 INFO diskmaker-controllers.Deleter controller/controller.go:114 Deleting Pvs through sig storage deleter {"Request.Namespace": "openshift-local-storage", "Request.Name": ""}

Then handlePVChange() gets a DeleteEvent to remove the cache entry:

I0127 23:25:06.531551 171818 cache.go:73] Deleted pv "local-pv-e739fe1b" from cache
1.6433259065315902e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:182 found PV with state released {"pvName": "local-pv-e739fe1b"}
1.6433259065316565e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:182 jdd: handlePVChange stack trace

But it's too late: DeletePVs() already listed the PVs in the cache before the cache entry was removed, and the new cleanup job started:

I0127 23:25:06.536126 171818 deleter.go:162] jdd: cleanup status for local-pv-e739fe1b: 2
I0127 23:25:06.536136 171818 deleter.go:197] Start cleanup for pv local-pv-e739fe1b
I0127 23:25:06.536139 171818 deleter.go:215] jdd: calling runProcess for local-pv-e739fe1b
...
I0127 23:25:06.541492 171818 deleter.go:309] Cleanup pv "local-pv-e739fe1b": StdoutBuf - "Calling mkfs"

A new cache entry gets created for the PV:

I0127 23:25:06.727279 171818 cache.go:55] Added pv "local-pv-e739fe1b" to cache
1.643325906727301e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:170 jdd: handlePVChange stack trace

Cleanup completes:

I0127 23:25:06.739524 171818 deleter.go:309] Cleanup pv "local-pv-e739fe1b": StdoutBuf - "Quick reset completed"
I0127 23:25:06.739898 171818 deleter.go:287] Completed cleanup of pv "local-pv-e739fe1b"

All of that happened in about 0.2 seconds. But now we have a new cache entry that is NOT in the released state, so Reconcile() does not call DeletePVs() to clean up the successful job status yet.

23:25:20: The new Pod gets created, the volume gets mounted, and this file is created: 2022-01-27T23:25:20+00:00.txt

Some time passes and we request deletion of the new pod and PVC.

23:26:24: The PV state is set to released.
I0127 23:26:24.859352 171818 cache.go:64] Updated pv "local-pv-e739fe1b" to cache
1.643325984859374e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:176 found PV with state released {"pvName": "local-pv-e739fe1b"}
1.643325984859413e+09 INFO diskmaker-controllers.Deleter deleter/reconcile.go:176 jdd: handlePVChange stack trace

Reconcile() calls DeletePVs():

1.6433259848596299e+09 INFO diskmaker-controllers.Deleter controller/controller.go:114 Deleting Pvs through sig storage deleter {"Request.Namespace": "openshift-local-storage", "Request.Name": ""}

It finds the cleanup job that started at 23:25:06, whose status was never checked before now. The status is successful, so it deletes the PV without attempting cleanup.

I0127 23:26:24.865132 171818 deleter.go:162] jdd: cleanup status for local-pv-e739fe1b: 5
I0127 23:26:24.865141 171818 deleter.go:167] Deleting pv local-pv-e739fe1b after successful cleanup

23:26:48: The new Pod gets created, the volume gets mounted, and the new file is created:

-rw-r--r-- 1 root root 0 Jan 27 23:25 2022-01-27T23:25:20+00:00.txt
-rw-r--r-- 1 root root 0 Jan 27 23:26 2022-01-27T23:26:48+00:00.txt

What do we actually expect to happen above?
- PVC / Pod gets deleted
- pv.Status.Phase is set to VolumeReleased
- this triggers handlePVChange as an UpdateEvent
    Updated pv "local-pv-e739fe1b" to cache
    found PV with state released
- handlePVChange pokes Reconcile (q.Add)
- Reconcile calls r.deleter.DeletePVs()
- Cleanup job is started
- At some point in the future, Reconcile() should run again to check for the completed job
  - in-progress jobs are ignored, need to check again later
  - failed jobs will try cleanup again, need to check again later
  - succeeded jobs will remove the cleanup job status and call DeletePV
- DeletePV (in sig storage deleter) should trigger a DeleteEvent and handlePVChange
- handlePVChange should remove the cache entry

Part of what makes this difficult to address is that diskmaker's Reconcile() cannot tell whether DeletePVs() in sig storage deleter is going to clean up a completed job or start a new job. Still evaluating potential fixes. This might require changes upstream in sig storage deleter, which would take some time to deliver to customers.
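To make the timing window in the 23:25:06 sequence concrete, here is a minimal standalone Go sketch. This is NOT the actual diskmaker or sig storage deleter code; pvCache and deletePVs are made-up names. It only models the event ordering from the logs above: the UpdateEvent triggers a reconcile, DeletePVs() takes its snapshot of the cache, and the DeleteEvent that removes the entry arrives too late, which is why tuning the sleep in handlePVChange() cannot close the race.

// Minimal sketch of the event-ordering race; names are hypothetical.
package main

import "fmt"

// pvCache models the deleter's in-memory PV cache (name -> phase).
type pvCache map[string]string

// deletePVs models DeletePVs(): it walks the cache snapshot and would
// check the cleanup job status / start cleanup for each released PV.
func deletePVs(c pvCache) {
	for name, phase := range c {
		if phase == "Released" {
			fmt.Printf("would check cleanup job status / start cleanup for %s\n", name)
		}
	}
}

func main() {
	cache := pvCache{}

	// 23:25:06.529: UpdateEvent lands while the PV is still Released.
	cache["local-pv-e739fe1b"] = "Released"

	// 23:25:06.530: the reconcile triggered by that UpdateEvent calls
	// DeletePVs() and sees the stale Released entry -> starts a new cleanup.
	deletePVs(cache)

	// 23:25:06.531: the DeleteEvent finally removes the cache entry, but the
	// snapshot above was already taken. Sleeping longer in handlePVChange()
	// only shrinks this window; it cannot close it.
	delete(cache, "local-pv-e739fe1b")
}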
Submitted a potential fix upstream for review that resolves the issue in my test environment: https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/pull/287
Instructions for testing the 4.9 dev build below. I'll provide a subscription manifest for 4.8 as well once that image is ready.

Note: this should only be used in test environments to validate the fix for this bug, NOT on any production systems.

1) Remove the existing local storage operator from the cluster

Follow the instructions here:
https://docs.openshift.com/container-platform/4.9/operators/admin/olm-deleting-operators-from-cluster.html

Make sure there are no resources remaining in the openshift-local-storage namespace. The namespace should still exist but be empty.

$ oc get all -n openshift-local-storage
No resources found in openshift-local-storage namespace.

Manually delete the remaining CRD and operator resources.

$ oc delete crd localvolumediscoveries.local.storage.openshift.io localvolumediscoveryresults.local.storage.openshift.io localvolumes.local.storage.openshift.io localvolumesets.local.storage.openshift.io
$ oc delete operators local-storage-operator.openshift-local-storage

And remove the operatorgroup, if one exists.

$ oc get operatorgroup -n openshift-local-storage
NAME                            AGE
openshift-local-storage-hxlwq   85m
$ oc delete operatorgroup -n openshift-local-storage openshift-local-storage-hxlwq

2) Create the subscription manifest for the 4.9 dev image

$ cat subscribe-bug2032924-4.9.yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: localstorage-operator-manifests
  namespace: openshift-local-storage
spec:
  sourceType: grpc
  image: quay.io/jdobson/index:bug2032924-4.9
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-subscription
  namespace: openshift-local-storage
spec:
  channel: preview
  name: local-storage-operator
  source: localstorage-operator-manifests
  sourceNamespace: openshift-local-storage

3) Apply the subscription manifest for the desired release

$ oc apply -f subscribe-bug2032924-4.9.yaml

4) Wait for installation to complete

Monitor the "Installed Operators" page in the web console, or:

$ oc describe operators local-storage-operator.openshift-local-storage

Once it's ready, you can begin testing.

To uninstall the dev build once testing is complete:

$ oc delete -f subscribe-bug2032924-4.9.yaml

Then follow the instructions from Step 1 above to clean up any remaining resources.
(In reply to Jonathan Dobson from comment #21)
> I'll provide a subscription manifest for 4.8 as well once that image is ready.

The only difference for 4.8 is that the image being pulled from quay is built on top of release-4.8. Otherwise, the instructions are the same as described in https://bugzilla.redhat.com/show_bug.cgi?id=2032924#c21

$ cat subscribe-bug2032924-4.8.yaml
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: localstorage-operator-manifests
  namespace: openshift-local-storage
spec:
  sourceType: grpc
  image: quay.io/jdobson/index:bug2032924-4.8
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-subscription
  namespace: openshift-local-storage
spec:
  channel: preview
  name: local-storage-operator
  source: localstorage-operator-manifests
  sourceNamespace: openshift-local-storage
Tested following https://bugzilla.redhat.com/show_bug.cgi?id=2032924#c16 and passed with local-storage-operator.4.11.0-202202172124.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
*** Bug 2219918 has been marked as a duplicate of this bug. ***