Bug 1990432
Summary: | Volumes are accidentally deleted along with the machine [vsphere] | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Mike Fedosin <mfedosin> |
Component: | Cloud Compute | Assignee: | Mike Fedosin <mfedosin> |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | Flags: | miyadav: needinfo- |
Version: | 4.9 | ||
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:45:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1993117 |
Description
Mike Fedosin
2021-08-05 11:48:41 UTC
Validated on:

```shell
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-26-040328   True        False         39m     Cluster version is 4.9.0-0.nightly-2021-08-26-040328
```

1. Create the PVC:

```shell
[miyadav@miyadav vsphere]$ oc create -f pvc.yaml
persistentvolumeclaim/pvc4 created
```

Result: PVC created successfully.

2. Create a deployment that uses the PVC, with the yaml below:

```shell
[miyadav@miyadav vsphere]$ oc create -f deploymentyaml.yaml
deployment.apps/dep1 created
```

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "dep1"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: "myfrontend"
          image: "quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
          ports:
            - containerPort: 80
              name: "http-server"
          volumeMounts:
            - mountPath: "/var/www/html"
              name: "pvol"
      volumes:
        - name: "pvol"
          persistentVolumeClaim:
            claimName: "pvc4"
```

Result: deployment created successfully.

3. Stop the kubelet on the node running the pod and delete the machine that backs that node; the machine controller should log that the disks are detached before the VM is destroyed.

```shell
[miyadav@miyadav vsphere]$ oc get pods -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP            NODE                              NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-78bf97c749-5xvkp   2/2     Running   0          38m     10.130.0.30   miyadav-2708-hptnr-master-1       <none>           <none>
cluster-baremetal-operator-688fcf9594-dvwvk    2/2     Running   0          38m     10.130.0.21   miyadav-2708-hptnr-master-1       <none>           <none>
dep1-64495756b4-sqd7c                          1/1     Running   0          4m26s   10.131.0.31   miyadav-2708-hptnr-worker-h8nmj   <none>           <none>
machine-api-controllers-7f49d8bbbb-nfj5g       7/7     Running   0          35m     10.128.0.11   miyadav-2708-hptnr-master-2       <none>           <none>
machine-api-operator-779c45669b-c8dht          2/2     Running   0          38m     10.130.0.25   miyadav-2708-hptnr-master-1       <none>           <none>

[miyadav@miyadav vsphere]$ oc debug node/miyadav-2708-hptnr-worker-h8nmj
Starting pod/miyadav-2708-hptnr-worker-h8nmj-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.249.39
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# systemctl stop kubelet

Removing debug pod ...
[miyadav@miyadav vsphere]$ oc delete machine miyadav-2708-hptnr-worker-h8nmj
machine.machine.openshift.io "miyadav-2708-hptnr-worker-h8nmj" deleted
```

Machine controller logs:

```
...
I0827 04:18:36.039313       1 reconciler.go:284] miyadav-2708-hptnr-worker-h8nmj: node not ready, kubelet unreachable for some reason. Detaching disks before vm destroy.
I0827 04:18:36.053559       1 reconciler.go:792] miyadav-2708-hptnr-worker-h8nmj: Updating provider status
I0827 04:18:36.057589       1 machine_scope.go:102] miyadav-2708-hptnr-worker-h8nmj: patching machine
E0827 04:18:36.082391       1 actuator.go:57] miyadav-2708-hptnr-worker-h8nmj error: miyadav-2708-hptnr-worker-h8nmj: reconciler failed to Delete machine: destroying vm in progress, reconciling
E0827 04:18:36.082442       1 controller.go:239] miyadav-2708-hptnr-worker-h8nmj: failed to delete machine: miyadav-2708-hptnr-worker-h8nmj: reconciler failed to Delete machine: destroying vm in progress, reconciling
E0827 04:18:36.082486       1 controller.go:304] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="miyadav-2708-hptnr-worker-h8nmj: reconciler failed to Delete machine: destroying vm in progress, reconciling" "name"="miyadav-2708-hptnr-worker-h8nmj" "namespace"="openshift-machine-api"
...
```

Additional Info: Looks good to me; I will wait for some time for any input on the test steps, and if there are no comments, will move to VERIFIED.

Test case looks good to me. The only suggestion I would add is to check the PVC/disk to make sure it is still OK, e.g. check that it still exists in vCenter and that no errors are reported on the PVC object.

Thanks @Joel, I checked from the vSphere side as well: the vmdk persisted even after the machine was deleted and a new machine was provisioned in its place.
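The `pvc.yaml` used in step 1 is not included in the report. A minimal claim that would exercise this scenario might look like the following sketch; the storage class name `thin` and the requested size are assumptions, not taken from the report:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc4
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: thin   # assumed; adjust to the cluster's vSphere storage class
  resources:
    requests:
      storage: 1Gi         # assumed size
```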
Moving to VERIFIED. Validated on a different cluster today; even after deleting the machine, the volume still exists:

```shell
[miyadav@miyadav ~]$ govc datastore.ls -l '5137595f-7ce3-e95a-5c03-06d835dea807' | grep 'miyadav-2708'
12.0MB  Mon Aug 30 05:45:14 2021  miyadav-2708-htqh4-dyn-pvc-413e0eaa-549c-4aa4-b969-bbc96550a6d3.vmdk
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
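The controller log lines quoted above show the ordering the fix enforces: when the node is not ready (kubelet unreachable), the reconciler detaches the data disks before destroying the VM, so the PVC-backed vmdk survives machine deletion. The sequencing can be sketched in Python; this is an illustration only, since the actual machine-api reconciler is written in Go and the function name and shapes here are invented:

```python
# Illustrative sketch of the detach-before-destroy ordering from the fix.
# Not the actual machine-api code: delete_machine, its parameters, and the
# action strings are hypothetical.

def delete_machine(node_ready: bool, attached_disks: list[str]) -> list[str]:
    """Return the sequence of actions taken when deleting a machine."""
    actions: list[str] = []
    if not node_ready:
        # Node unreachable: kubelet cannot detach volumes itself, so the
        # reconciler must detach data disks before the VM is destroyed,
        # otherwise vSphere deletes the attached vmdk with the VM.
        for disk in attached_disks:
            actions.append(f"detach:{disk}")
    actions.append("destroy-vm")
    return actions

# With the node down, the disk is detached first and therefore preserved.
actions = delete_machine(node_ready=False, attached_disks=["pvc4.vmdk"])
assert actions == ["detach:pvc4.vmdk", "destroy-vm"]
```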