Bug 1977637
| Summary: | vSphere Machines stuck in deleting phase if associated Node object is deleted | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Milind Yadav <miyadav> |
| Component: | Cloud Compute | Assignee: | dmoiseev |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | dmoiseev, jhou, mimccune, miyadav, ssoto, welin, zhsun |
| Version: | 4.7 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1977369 | Environment: | |
| Last Closed: | 2021-08-19 10:23:36 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1977634 | ||
| Bug Blocks: | |||
Description
Milind Yadav
2021-06-30 08:14:29 UTC
I have also encountered a machine stuck in Deleting status for a long time:
# oc get machine -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
cluster1-storage-vlan-40-h8mm5 Running 44d
cluster1-storage-vlan-40-s2lv4 Running 44d
cluster1-storage-vlan-40-tqsd6 Running 44d
cluster1-worker-vlan-40-2w7hx Running 7h58m
cluster1-worker-vlan-40-kxbvw Deleting 4d1h
cluster1-worker-vlan-50-lqs9m Deleting 4d1h
# oc describe machine -n openshift-machine-api cluster1-worker-vlan-40-kxbvw
Name: cluster1-worker-vlan-40-kxbvw
Namespace: openshift-machine-api
Labels: machine.openshift.io/cluster-api-cluster=cluster1-vh9zx
machine.openshift.io/cluster-api-machine-role=worker
machine.openshift.io/cluster-api-machine-type=worker
machine.openshift.io/cluster-api-machineset=cluster1-worker-vlan-40
machine.openshift.io/region=
machine.openshift.io/zone=
Annotations: machine.openshift.io/instance-state: poweredOn
API Version: machine.openshift.io/v1beta1
Kind: Machine
Metadata:
Creation Timestamp: 2021-07-08T00:01:50Z
Deletion Grace Period Seconds: 0
Deletion Timestamp: 2021-07-11T15:27:56Z
Finalizers:
machine.machine.openshift.io
Generate Name: cluster1-worker-vlan-40-
Generation: 3
Managed Fields:
API Version: machine.openshift.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:generateName:
f:labels:
.:
f:machine.openshift.io/cluster-api-machine-role:
f:machine.openshift.io/cluster-api-machine-type:
f:machine.openshift.io/cluster-api-machineset:
f:ownerReferences:
.:
k:{"uid":"a7bab249-e4f0-45b3-9d7a-9b403462f69b"}:
.:
f:apiVersion:
f:blockOwnerDeletion:
f:controller:
f:kind:
f:name:
f:uid:
f:spec:
.:
f:metadata:
.:
f:labels:
.:
f:node-role.kubernetes.io/app:
f:node-role.kubernetes.io/vlan-40:
f:providerSpec:
.:
f:value:
.:
f:apiVersion:
f:credentialsSecret:
f:diskGiB:
f:kind:
f:memoryMiB:
f:metadata:
f:network:
f:numCPUs:
f:numCoresPerSocket:
f:snapshot:
f:template:
f:userDataSecret:
f:workspace:
Manager: machineset-controller
Operation: Update
Time: 2021-07-08T00:01:50Z
API Version: machine.openshift.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:machine.openshift.io/instance-state:
f:finalizers:
.:
v:"machine.machine.openshift.io":
f:labels:
f:machine.openshift.io/region:
f:machine.openshift.io/zone:
f:spec:
f:providerID:
f:status:
.:
f:addresses:
f:phase:
f:providerStatus:
.:
f:conditions:
f:instanceId:
f:instanceState:
f:taskRef:
Manager: machine-controller-manager
Operation: Update
Time: 2021-07-11T15:28:16Z
API Version: machine.openshift.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:lastUpdated:
f:nodeRef:
.:
f:kind:
f:name:
f:uid:
Manager: nodelink-controller
Operation: Update
Time: 2021-07-11T18:24:10Z
Owner References:
API Version: machine.openshift.io/v1beta1
Block Owner Deletion: true
Controller: true
Kind: MachineSet
Name: cluster1-worker-vlan-40
UID: a7bab249-e4f0-45b3-9d7a-9b403462f69b
Resource Version: 82676685
Self Link: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/cluster1-worker-vlan-40-kxbvw
UID: 0c3b83b1-9dfb-495d-ac58-a6b831d38435
Spec:
Metadata:
Labels:
node-role.kubernetes.io/app:
node-role.kubernetes.io/vlan-40:
Provider ID: vsphere://422b6bc8-fabd-939d-0d45-ed987f81334d
Provider Spec:
Value:
API Version: vsphereprovider.openshift.io/v1beta1
Credentials Secret:
Name: vsphere-cloud-credentials
Disk Gi B: 120
Kind: VSphereMachineProviderSpec
Memory Mi B: 8192
Metadata:
Creation Timestamp: <nil>
Network:
Devices:
Network Name: OCP4-Lab2-vLan-40
Network Name: OCP4-Lab2-vLan-Trunk
Network Name: OCP4-Lab2-vLan-Trunk
Num CP Us: 2
Num Cores Per Socket: 1
Snapshot:
Template: rhcos-4.7.7-x86_64
User Data Secret:
Name: worker-user-data
Workspace:
Datacenter: Datacenter1
Datastore: datastore3
Folder: /Datacenter1/vm/OCP4-Lab2/03-DataCenter/OCP/Cluster-1/cluster1-worker-vlan-40
Server: 192.168.1.6
Status:
Addresses:
Address: 192.168.40.55
Type: InternalIP
Address: fe80::5c56:1229:ae1e:dfbd
Type: InternalIP
Address: cluster1-worker-vlan-40-kxbvw
Type: InternalDNS
Last Updated: 2021-07-12T00:43:14Z
Node Ref:
Kind: Node
Name: cluster1-worker-vlan-40-kxbvw
UID: 6cb0b173-c2f1-44aa-8cca-cc3671edff10
Phase: Deleting
Provider Status:
Conditions:
Last Probe Time: 2021-07-08T00:01:50Z
Last Transition Time: 2021-07-08T00:01:50Z
Message: Machine successfully created
Reason: MachineCreationSucceeded
Status: True
Type: MachineCreation
Instance Id: 422b6bc8-fabd-939d-0d45-ed987f81334d
Instance State: poweredOn
Task Ref: task-61052
Events: <none>
# oc get nodes
NAME STATUS ROLES AGE VERSION
cluster1-storage-vlan-40-h8mm5 Ready storage,worker 44d v1.20.0+2817867
cluster1-storage-vlan-40-s2lv4 Ready storage,worker 44d v1.20.0+2817867
cluster1-storage-vlan-40-tqsd6 Ready storage,worker 44d v1.20.0+2817867
cluster1-worker-vlan-40-2w7hx Ready app,vlan-40,worker 7h51m v1.20.0+2817867
cluster1-worker-vlan-40-kxbvw Ready,SchedulingDisabled app,vlan-40,worker 4d1h v1.20.0+2817867
cluster1-worker-vlan-50-lqs9m Ready,SchedulingDisabled app,vlan-50,worker 4d1h v1.20.0+2817867
master-01.cluster1.ocp4.example.internal Ready master 56d v1.20.0+2817867
master-02.cluster1.ocp4.example.internal Ready master 56d v1.20.0+2817867
master-03.cluster1.ocp4.example.internal Ready master 56d v1.20.0+2817867
# oc describe nodes cluster1-worker-vlan-40-kxbvw
Name: cluster1-worker-vlan-40-kxbvw
Roles: app,vlan-40,worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=cluster1-worker-vlan-40-kxbvw
kubernetes.io/os=linux
node-role.kubernetes.io/app=
node-role.kubernetes.io/vlan-40=
node-role.kubernetes.io/worker=
node.openshift.io/os_id=rhcos
Annotations: csi.volume.kubernetes.io/nodeid:
{"openshift-storage.cephfs.csi.ceph.com":"cluster1-worker-vlan-40-kxbvw","openshift-storage.rbd.csi.ceph.com":"cluster1-worker-vlan-40-kxb...
machine.openshift.io/machine: openshift-machine-api/cluster1-worker-vlan-40-kxbvw
machineconfiguration.openshift.io/currentConfig: rendered-worker-a7aa7de76b7ef645f66b332beb7766dd
machineconfiguration.openshift.io/desiredConfig: rendered-worker-a7aa7de76b7ef645f66b332beb7766dd
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 08 Jul 2021 08:09:33 +0800
Taints: node.kubernetes.io/unschedulable:NoSchedule
Unschedulable: true
Lease:
HolderIdentity: cluster1-worker-vlan-40-kxbvw
AcquireTime: <unset>
RenewTime: Mon, 12 Jul 2021 09:16:32 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 12 Jul 2021 09:12:01 +0800 Mon, 12 Jul 2021 08:41:58 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 12 Jul 2021 09:12:01 +0800 Mon, 12 Jul 2021 08:41:58 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 12 Jul 2021 09:12:01 +0800 Mon, 12 Jul 2021 08:41:58 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 12 Jul 2021 09:12:01 +0800 Mon, 12 Jul 2021 08:41:58 +0800 KubeletReady kubelet is posting ready status
Addresses:
ExternalIP: 192.168.40.55
InternalIP: 192.168.40.55
Hostname: cluster1-worker-vlan-40-kxbvw
Capacity:
cpu: 2
ephemeral-storage: 125293548Ki
hugepages-2Mi: 0
memory: 8153700Ki
pods: 250
Allocatable:
cpu: 1500m
ephemeral-storage: 114396791822
hugepages-2Mi: 0
memory: 7002724Ki
pods: 250
System Info:
Machine ID: 6a4377b6bbff45ba9b177c0418ee0291
System UUID: c86b2b42-bdfa-9d93-0d45-ed987f81334d
Boot ID: 71ec56c1-6526-4465-900a-2aaf347f1230
Kernel Version: 4.18.0-240.22.1.el8_3.x86_64
OS Image: Red Hat Enterprise Linux CoreOS 47.83.202106032343-0 (Ootpa)
Operating System: linux
Architecture: amd64
Container Runtime Version: cri-o://1.20.3-2.rhaos4.7.gitb53fa9d.el8
Kubelet Version: v1.20.0+2817867
Kube-Proxy Version: v1.20.0+2817867
ProviderID: vsphere://422b6bc8-fabd-939d-0d45-ed987f81334d
Non-terminated Pods: (14 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
openshift-cluster-node-tuning-operator tuned-c4cg8 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 4d1h
openshift-dns dns-default-j8lcv 65m (4%) 0 (0%) 131Mi (1%) 0 (0%) 4d1h
openshift-image-registry node-ca-j5x2h 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 4d1h
openshift-ingress-canary ingress-canary-j994v 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 4d1h
openshift-machine-config-operator machine-config-daemon-jxszk 40m (2%) 0 (0%) 100Mi (1%) 0 (0%) 4d1h
openshift-monitoring node-exporter-dn5kx 9m (0%) 0 (0%) 210Mi (3%) 0 (0%) 4d1h
openshift-multus multus-nn76n 10m (0%) 0 (0%) 150Mi (2%) 0 (0%) 4d1h
openshift-multus network-metrics-daemon-s9h7p 20m (1%) 0 (0%) 120Mi (1%) 0 (0%) 4d1h
openshift-network-diagnostics network-check-target-7sh8h 10m (0%) 0 (0%) 15Mi (0%) 0 (0%) 4d1h
openshift-nmstate nmstate-handler-g79nz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d
openshift-sdn sdn-jv4g5 110m (7%) 0 (0%) 220Mi (3%) 0 (0%) 4d1h
openshift-storage csi-cephfsplugin-nfrlg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d1h
openshift-storage csi-rbdplugin-9tn4h 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d1h
percona-test-1 cluster1-haproxy-1 200m (13%) 0 (0%) 1G (13%) 0 (0%) 4d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 494m (32%) 0 (0%)
memory 2075838976 (28%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasSufficientMemory 98m (x60 over 4d1h) kubelet Node cluster1-worker-vlan-40-kxbvw status is now: NodeHasSufficientMemory
my OCP version is OCP 4.7.16, vSphere UPI + machineset

@welin Not related to this bug. Node is there.
```
cluster1-worker-vlan-40-kxbvw Ready,SchedulingDisabled app,vlan-40,worker 4d1h v1.20.0+2817867
```
*** This bug has been marked as a duplicate of bug 1989648 ***
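The triage step above (checking whether the Node behind a Deleting machine still exists) can be sketched as a small script. This is only an illustration, not part of any OpenShift tooling: it assumes the plain-text output of `oc get machine` and `oc get nodes` has been captured into strings, and that the Node name matches the Machine name, as it does in the `Node Ref` shown in this report. The function names are hypothetical.

```python
# Sample rows copied from the outputs in this report.
MACHINES = """\
NAME PHASE TYPE REGION ZONE AGE
cluster1-worker-vlan-40-2w7hx Running 7h58m
cluster1-worker-vlan-40-kxbvw Deleting 4d1h
cluster1-worker-vlan-50-lqs9m Deleting 4d1h
"""

NODES = """\
NAME STATUS ROLES AGE VERSION
cluster1-worker-vlan-40-2w7hx Ready app,vlan-40,worker 7h51m v1.20.0+2817867
cluster1-worker-vlan-40-kxbvw Ready,SchedulingDisabled app,vlan-40,worker 4d1h v1.20.0+2817867
cluster1-worker-vlan-50-lqs9m Ready,SchedulingDisabled app,vlan-50,worker 4d1h v1.20.0+2817867
"""


def deleting_machines(machine_table):
    """Return names of machines whose PHASE column reads 'Deleting'."""
    names = []
    for line in machine_table.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        # Empty TYPE/REGION/ZONE columns collapse, so PHASE is the second field.
        if len(fields) >= 2 and fields[1] == "Deleting":
            names.append(fields[0])
    return names


def node_names(node_table):
    """Return the set of node names from the first column."""
    names = set()
    for line in node_table.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def node_present_for_deleting(machine_table, node_table):
    """Map each Deleting machine to whether a same-named Node still exists.

    Assumption: Node name == Machine name, as in the Node Ref in this report.
    """
    nodes = node_names(node_table)
    return {name: name in nodes for name in deleting_machines(machine_table)}


if __name__ == "__main__":
    for name, present in node_present_for_deleting(MACHINES, NODES).items():
        print(name, "node present" if present else "node missing")
```

With the sample rows above, both Deleting machines still have a Node, which matches the comment that this case does not fit the "Node object is deleted" condition in the bug summary.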