Bug 1990432
| Summary: | Volumes are accidentally deleted along with the machine [vsphere] | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Fedosin <mfedosin> |
| Component: | Cloud Compute | Assignee: | Mike Fedosin <mfedosin> |
| Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | Flags: | miyadav: needinfo- |
| Version: | 4.9 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-18 17:45:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1993117 | | |
Description
Mike Fedosin
2021-08-05 11:48:41 UTC
Validated on:
[miyadav@miyadav vsphere]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-08-26-040328 True False 39m Cluster version is 4.9.0-0.nightly-2021-08-26-040328
[miyadav@miyadav vsphere]$
1. Create a PVC
[miyadav@miyadav vsphere]$ oc create -f pvc.yaml
persistentvolumeclaim/pvc4 created
Result: PVC created successfully
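The pvc.yaml used above is not included in the report. A minimal claim along the following lines would match the rest of the transcript; the claim name comes from the output above, while the size and storage class are assumptions and would need to match the cluster's vSphere storage class:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc4
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi          # assumed size, not from the report
  storageClassName: thin    # assumed vSphere storage class name
```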
2. Create a deployment that uses the PVC, with the YAML below.
[miyadav@miyadav vsphere]$ oc create -f deploymentyaml.yaml
deployment.apps/dep1 created
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "dep1"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: "myfrontend"
          image: "quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
          ports:
            - containerPort: 80
              name: "http-server"
          volumeMounts:
            - mountPath: "/var/www/html"
              name: "pvol"
      volumes:
        - name: "pvol"
          persistentVolumeClaim:
            claimName: "pvc4"
Result: deployment created successfully
3. Stop the kubelet on the node running the pod, then delete the machine that owns that node object. The logs should show the disks being detached before the VM is destroyed.
[miyadav@miyadav vsphere]$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-autoscaler-operator-78bf97c749-5xvkp 2/2 Running 0 38m 10.130.0.30 miyadav-2708-hptnr-master-1 <none> <none>
cluster-baremetal-operator-688fcf9594-dvwvk 2/2 Running 0 38m 10.130.0.21 miyadav-2708-hptnr-master-1 <none> <none>
dep1-64495756b4-sqd7c 1/1 Running 0 4m26s 10.131.0.31 miyadav-2708-hptnr-worker-h8nmj <none> <none>
machine-api-controllers-7f49d8bbbb-nfj5g 7/7 Running 0 35m 10.128.0.11 miyadav-2708-hptnr-master-2 <none> <none>
machine-api-operator-779c45669b-c8dht 2/2 Running 0 38m 10.130.0.25 miyadav-2708-hptnr-master-1 <none> <none>
[miyadav@miyadav vsphere]$ oc debug node/miyadav-2708-hptnr-worker-h8nmj
Starting pod/miyadav-2708-hptnr-worker-h8nmj-debug ...
To use host binaries, run `chroot /host`
chroot /host
Pod IP: 172.31.249.39
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4#
sh-4.4# systemctl stop kubelet
Removing debug pod ...
[miyadav@miyadav vsphere]$ oc delete machine miyadav-2708-hptnr-worker-h8nmj
machine.machine.openshift.io "miyadav-2708-hptnr-worker-h8nmj" deleted
.
.
I0827 04:18:36.039313 1 reconciler.go:284] miyadav-2708-hptnr-worker-h8nmj: node not ready, kubelet unreachable for some reason. Detaching disks before vm destroy.
I0827 04:18:36.053559 1 reconciler.go:792] miyadav-2708-hptnr-worker-h8nmj: Updating provider status
I0827 04:18:36.057589 1 machine_scope.go:102] miyadav-2708-hptnr-worker-h8nmj: patching machine
E0827 04:18:36.082391 1 actuator.go:57] miyadav-2708-hptnr-worker-h8nmj error: miyadav-2708-hptnr-worker-h8nmj: reconciler failed to Delete machine: destroying vm in progress, reconciling
E0827 04:18:36.082442 1 controller.go:239] miyadav-2708-hptnr-worker-h8nmj: failed to delete machine: miyadav-2708-hptnr-worker-h8nmj: reconciler failed to Delete machine: destroying vm in progress, reconciling
E0827 04:18:36.082486 1 controller.go:304] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="miyadav-2708-hptnr-worker-h8nmj: reconciler failed to Delete machine: destroying vm in progress, reconciling" "name"="miyadav-2708-hptnr-worker-h8nmj" "namespace"="openshift-machine-api"
.
.
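The log ordering above can be summarized in a small sketch. This is illustrative only, not the actual machine-api vSphere reconciler code: when the node's kubelet is unreachable, the controller detaches disks before destroying the VM (and requeues while the destroy is in progress), which is what keeps the PVC-backed volume intact.

```python
# Illustrative sketch of the delete ordering seen in the logs above;
# names and structure are assumptions, not the real reconciler.

def plan_machine_delete(node_ready: bool) -> list[str]:
    """Return the ordered actions for deleting a machine's VM."""
    steps = []
    if not node_ready:
        # The kubelet is unreachable and cannot detach volumes itself,
        # so the reconciler detaches them first to preserve PVC disks.
        steps.append("detach-disks")
    steps.append("destroy-vm")
    return steps

print(plan_machine_delete(node_ready=False))  # ['detach-disks', 'destroy-vm']
print(plan_machine_delete(node_ready=True))   # ['destroy-vm']
```

The key property, matching the `Detaching disks before vm destroy` log line, is that the detach step always precedes the destroy step for an unreachable node.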
Additional Info:
Looks good to me. I will wait a while for any input on the test steps; if there are no comments, I will move this to VERIFIED.
Test case looks good to me. The only suggestion I would add is to check the PVC/disk to make sure it's still OK, e.g. check that it still exists on the vCenter and that no errors are reported on the PVC object.

Thanks @Joel. I checked from the vSphere side as well; the vmdk persisted even after the machine got deleted and a new machine was provisioned in its place. Moving to VERIFIED.

Validated on a different cluster today; even after deleting the machine, the disk still exists.

[miyadav@miyadav ~]$ govc datastore.ls -l '5137595f-7ce3-e95a-5c03-06d835dea807' | grep 'miyadav-2708'
12.0MB Mon Aug 30 05:45:14 2021 miyadav-2708-htqh4-dyn-pvc-413e0eaa-549c-4aa4-b969-bbc96550a6d3.vmdk

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759