Bug 1878499
| Field | Value | Field | Value |
| --- | --- | --- | --- |
| Summary | DV import doesn't recover from scratch space PVC deletion | | |
| Product | Container Native Virtualization (CNV) | Reporter | Alex Kalenyuk <akalenyu> |
| Component | Storage | Assignee | Bartosz Rybacki <brybacki> |
| Status | CLOSED ERRATA | QA Contact | Alex Kalenyuk <akalenyu> |
| Severity | low | Docs Contact | |
| Priority | unspecified | | |
| Version | 2.4.1 | CC | alitke, cnv-qe-bugs, mrashish, ngavrilo |
| Target Milestone | --- | Keywords | Automation |
| Target Release | 2.6.0 | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | virt-cdi-importer 2.6.0-14 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2021-03-10 11:18:00 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Attachments | | | |
Description

Alex Kalenyuk 2020-09-13 13:52:13 UTC
@Bartosz please take a look.

I am on it, trying to recreate.

Recreate successful, but the scratch PVC has to be removed just after it is created, while the pod has not yet been scheduled or started. The pod becomes Unschedulable and the controller does not handle this state correctly:

```json
"status": {
    "conditions": [
        {
            "lastProbeTime": null,
            "lastTransitionTime": "2020-09-22T11:01:03Z",
            "message": "persistentvolumeclaim \"scratch-space-delete-scratch\" not found",
            "reason": "Unschedulable",
            "status": "False",
            "type": "PodScheduled"
        }
    ],
    "phase": "Pending",
    "qosClass": "BestEffort"
}
```

After the creation of the pod and the scratch space is requested:

1. The pod is being created, so it shows in the system as Pending (it waits for all PVCs to be available). The controller observes this and tries to create the scratch PVC again, but the scratch PVC already exists (also Pending), so the controller sets the condition "Claim Pending" and returns.
2. Some external action removes the scratch PVC (if I am not mistaken, the `kubernetes.io/pvc-protection` finalizer does not take effect while the pod is Pending/not yet scheduled). Now the only thing the controller sees is a PVC event, but the scratch PVC is not found, so the controller returns. No further events arrive for the pod (it stays Pending, with no changes).

Not a blocker for 2.5. Pushing out.

Had some problems with the downstream builds; there is no -11 (or newer) build available.

The build should work now (thanks to Gal Ben Haim!).

I got some information from @akalenyu that the original problem no longer shows. The original problem was that whenever the scratch PVC was deleted while the pod was still Pending, the system was left with a pending importer pod and no scratch space, and the import controller did not reconcile this situation (it would only do so after the resync period of 10 hours). After the fix is applied, the import controller requeues the reconcile loop until the DV reaches the Succeeded or Failed state, so in this situation the scratch PVC is recreated. This was proved by running the tests.
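The fix described above can be sketched as a toy reconcile loop. This is a hypothetical, self-contained simulation, not the real CDI import controller (which is built on controller-runtime); the types `cluster`, `phase`, and the `reconcile` function are invented here purely to illustrate "requeue until the DV is Succeeded or Failed, recreating the scratch PVC if it is missing":

```go
package main

import "fmt"

// phase models a DataVolume phase (simplified to the two states we need).
type phase string

const (
	pending   phase = "Pending"
	succeeded phase = "Succeeded"
)

// cluster is a toy stand-in for the API server's view of the world.
type cluster struct {
	scratchPVCExists bool
	dvPhase          phase
}

// reconcile recreates the scratch PVC if it is missing and reports whether
// another reconcile pass should be queued. Before the fix, the controller
// returned without requeueing, so a deleted scratch PVC went unnoticed
// until the next resync (up to 10 hours later).
func reconcile(c *cluster) (requeue bool) {
	if c.dvPhase == succeeded {
		return false // terminal state: stop requeueing
	}
	if !c.scratchPVCExists {
		c.scratchPVCExists = true // recreate the scratch space PVC
	} else {
		c.dvPhase = succeeded // with scratch back, the import can finish
	}
	return c.dvPhase != succeeded
}

func main() {
	// Start from the bug's broken state: scratch PVC deleted, DV still Pending.
	c := &cluster{scratchPVCExists: false, dvPhase: pending}
	for passes := 1; reconcile(c); passes++ {
		fmt.Printf("pass %d: scratch=%v phase=%s\n", passes, c.scratchPVCExists, c.dvPhase)
	}
	fmt.Println("final phase:", c.dvPhase)
}
```

The point of the sketch is only the control flow: the loop keeps running until the terminal phase is reached, instead of returning early and waiting for the resync period.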
Now we discovered that the test fails once in many runs. Analyzing the logs shows this situation: PVC `test-scratch` has status Terminating, pod `importer-test` has status ContainerCreating, and the last event shows:

```
Type     Reason       Age                 From     Message
Warning  FailedMount  4m (x8 over 8m28s)  kubelet  Unable to attach or mount volumes: unmounted volumes=[cdi-scratch-vol], unattached volumes=[cdi-scratch-vol]: error processing PVC test-scratch: PVC is being deleted
```

This looks exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1570606. To resolve this, the user can recreate the DV. I am not sure we can, or should, detect this situation and try to resolve it automatically.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799
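The rare failure above can be described as a simple predicate: a pod stuck in ContainerCreating while one of its claimed PVCs is Terminating (its deletionTimestamp is set) can never mount its volume. The following is a hypothetical illustration only; the `pvc`, `pod`, and `stuckOnTerminatingPVC` names are invented here and are not part of CDI, which (per the comment above) does not attempt this detection:

```go
package main

import "fmt"

// pvc is a toy model of a PersistentVolumeClaim; terminating stands in for
// a set deletionTimestamp.
type pvc struct {
	name        string
	terminating bool
}

// pod is a toy model of the importer pod and the claims it mounts.
type pod struct {
	name   string
	status string // e.g. "ContainerCreating"
	mounts []string
}

// stuckOnTerminatingPVC reports whether the pod is waiting on a PVC that is
// being deleted, i.e. the "error processing PVC ...: PVC is being deleted"
// FailedMount situation from the event log.
func stuckOnTerminatingPVC(p pod, claims map[string]pvc) (string, bool) {
	if p.status != "ContainerCreating" {
		return "", false
	}
	for _, m := range p.mounts {
		if c, ok := claims[m]; ok && c.terminating {
			return c.name, true
		}
	}
	return "", false
}

func main() {
	// The state observed in the failing test run.
	claims := map[string]pvc{
		"test-scratch": {name: "test-scratch", terminating: true},
	}
	importer := pod{name: "importer-test", status: "ContainerCreating", mounts: []string{"test-scratch"}}
	if name, stuck := stuckOnTerminatingPVC(importer, claims); stuck {
		fmt.Printf("pod %s is stuck on terminating PVC %s; recreate the DV\n", importer.name, name)
	}
}
```

As the comment notes, the workaround is manual (recreate the DV); this sketch only shows that the stuck state is mechanically detectable, not that the controller should act on it.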