Bug 2037307 - [CNV-4.8.4] VMI is in LiveMigrate loop when Upgrading Cluster from 2.6.7/4.7.32 to OCP 4.8.13
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.8.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.4
Assignee: Jed Lejosne
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks: 2014509
 
Reported: 2022-01-05 12:19 UTC by Kedar Bidarkar
Modified: 2022-01-20 17:21 UTC
CC: 4 users

Fixed In Version: virt-operator-container-v4.8.4-12 hco-bundle-registry-container-v4.8.4-44
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-20 17:21:23 UTC
Target Upstream Version:
Embargoed:




Links
Github kubevirt kubevirt pull 7028 (open): [release-0.41] virt-handler: use /proc instead of virt-chroot to stat virt-launcher files (last updated 2022-01-05 19:58:37 UTC)
Red Hat Product Errata RHBA-2022:0213 (last updated 2022-01-20 17:21:26 UTC)

Description Kedar Bidarkar 2022-01-05 12:19:56 UTC
This bug was initially created as a copy of Bug #2008511

I am copying this bug because: 



Description of problem:

I have been upgrading a cluster from OCP/CNV 4.6/2.5.8 to OCP/CNV 4.7.32/2.6.7 and then to OCP 4.8.13, as part of verifying another bug.

CNV_SOURCE=osbs; the 2.5.8 to 2.6.7 upgrade path was: 2.5.8 -> 2.6.4 -> 2.6.5 -> 2.6.6 -> 2.6.7.
After reaching 2.6.7, the cluster status was good.


When upgrading from OCP/CNV 4.7.32/2.6.7 to OCP 4.8.13 (that is, CNV 2.6.7 running on OCP 4.8.13), the VMI enters a LiveMigrate loop.

From the pastebin below, the upgrade appears complete, but the worker nodes have not yet been updated to the 4.8 images.


The worker nodes are stuck because the vm2-ocs-rhel84 VM is in a constant LiveMigrate loop.
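
A migration loop like this usually shows up as a stream of short-lived VirtualMachineInstanceMigration objects being created for the same VMI. A minimal way to observe it, as a sketch (namespace and VM name are taken from this report and may differ elsewhere):

$ # List migration objects for the VMI; a loop shows many entries for the same VMI
$ oc get vmim -n default
$ # Watch the VMI phase/node flapping while the loop is running
$ oc get vmi vm2-ocs-rhel84 -n default -w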


I guess that if I stop/start or restart the VM, the worker nodes would continue with the upgrade.
But the VMI should have live-migrated successfully on its own.
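
For reference, stopping/starting or restarting the VM can be done with virtctl; a minimal sketch, assuming the VM lives in the default namespace as in the listings below:

$ # Stop and start the VM
$ virtctl stop vm2-ocs-rhel84 -n default
$ virtctl start vm2-ocs-rhel84 -n default
$ # Or restart in one step
$ virtctl restart vm2-ocs-rhel84 -n default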

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
The upgrade is stuck due to a constant VMI live-migrate loop.

Expected results:
The upgrade is not blocked; the VMI live-migrates successfully instead of looping.

Additional info:
Running must-gather shows:

ClusterOperators:
	clusteroperator/machine-config is not available (Cluster not available for 4.8.13) because Failed to resync 4.8.13 because: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)
	clusteroperator/operator-lifecycle-manager is not upgradeable because ClusterServiceVersions blocking cluster upgrade: openshift-local-storage/local-storage-operator.4.6.0-202109030220 is incompatible with OpenShift minor versions greater than 4.8,openshift-cnv/kubevirt-hyperconverged-operator.v2.6.7 is incompatible with OpenShift minor versions greater than 4.8


1) The local-storage operator was not updated; it remains the version installed with 4.6/2.5.8.
2) OCS was not updated; it remains the version installed with 4.6/2.5.8.
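
The blocking conditions above can be inspected directly with standard oc commands; a sketch, using the resource names from the must-gather output above:

$ # Worker MachineConfigPool reported as degraded / not ready
$ oc get mcp worker
$ oc describe co machine-config
$ # CSVs flagged as incompatible with OpenShift minor versions greater than 4.8
$ oc get csv -n openshift-local-storage
$ oc get csv -n openshift-cnv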

Comment 3 Kedar Bidarkar 2022-01-07 16:43:13 UTC
Summary: Live migration was successful directly from virt-launcher version v2.6.4 (to be precise, v2.6.3-2) to v4.8.4-12.

VERIFIED with: virt-operator-container-v4.8.4-12 hco-bundle-registry-container-v4.8.4-44

[kbidarka@localhost migration]$ oc get pods
NAME                                 READY   STATUS      RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-8nqps    0/1     Completed   0          90m
virt-launcher-vm-rhel84-ocs-pxp5b    1/1     Running     0          7m23s
virt-launcher-vm2-rhel84-ocs-9xldq   1/1     Running     0          9m41s
virt-launcher-vm2-rhel84-ocs-f7pd7   0/1     Error       0          90m
[kbidarka@localhost migration]$ oc get vmi 
NAME             AGE   PHASE     IP             NODENAME
vm-rhel84-ocs    21h   Running   10.xxx.yy.zzz   virt05m-kswp6-worker-0-k2xtk
vm2-rhel84-ocs   21h   Running   10.xxx.yy.zzz  virt05m-kswp6-worker-0-k2xtk
[kbidarka@localhost migration]$ virtctl console vm-rhel84-ocs
Successfully connected to vm-rhel84-ocs console. The escape sequence is ^]

[cloud-user@vm-rhel84-ocs ~]$ uptime
 11:22:46 up 21:15,  1 user,  load average: 0.00, 0.00, 0.00
[cloud-user@vm-rhel84-ocs ~]$ [kbidarka@localhost migration]$ 
[kbidarka@localhost migration]$ virtctl console vm2-rhel84-ocs
Successfully connected to vm2-rhel84-ocs console. The escape sequence is ^]

[cloud-user@vm2-rhel84-ocs ~]$ uptime
 11:22:57 up 21:15,  1 user,  load average: 0.00, 0.00, 0.00
[cloud-user@vm2-rhel84-ocs ~]$ [kbidarka@localhost migration]$ 
[kbidarka@localhost migration]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.25    True        False         19h     Cluster version is 4.8.25
[kbidarka@localhost migration]$ oc get csv -n openshift-cnv 
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.8.4   OpenShift Virtualization   4.8.4     kubevirt-hyperconverged-operator.v2.6.4   Succeeded

Direct CNV upgrade from v2.6.4  to v4.8.4
--------------------------------------------
[kbidarka@localhost ocp-cnv-scripts]$ oc get ip -A
NAMESPACE                 NAME            CSV                                         APPROVAL    APPROVED
openshift-cnv             install-fw9n5   kubevirt-hyperconverged-operator.v2.6.4     Manual      true
openshift-cnv             install-v42hc   kubevirt-hyperconverged-operator.v4.8.4     Manual      true


---

Source Virt-Launcher Pod:
]$ ./fetch_pod_version_info_icsp.sh virt-launcher-vm-rhel84-ocs-8nqps default
Output: container-native-virtualization/virt-launcher/images/v2.6.3-2

Target Virt-Launcher Pod:
]$ ./fetch_pod_version_info_icsp.sh virt-launcher-vm-rhel84-ocs-pxp5b default
Output: container-native-virtualization/virt-launcher/images/v4.8.4-12
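
The fetch_pod_version_info_icsp.sh helper is local to the test environment (it presumably resolves mirrored references via the cluster's ImageContentSourcePolicy); roughly the same information can be read from the pod spec. A sketch, with the pod names listed above:

$ # Print the container image reference(s) of each virt-launcher pod as specified on the pod
$ oc get pod virt-launcher-vm-rhel84-ocs-8nqps -n default -o jsonpath='{.spec.containers[*].image}{"\n"}'
$ oc get pod virt-launcher-vm-rhel84-ocs-pxp5b -n default -o jsonpath='{.spec.containers[*].image}{"\n"}'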

---

[kbidarka@localhost migration]$ oc logs -f virt-launcher-vm2-rhel84-ocs-f7pd7 | grep '\"level\":\"error\"'
{"component":"virt-launcher","level":"error","msg":"internal error: Child process (dmidecode -q -t 0,1,2,3,4,11,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2745","subcomponent":"libvirt","thread":"42","timestamp":"2022-01-07T14:52:15.221000Z"}
{"component":"virt-launcher","level":"error","msg":"Error when closing file","pos":"os_helper.go:34","reason":"invalid argument","timestamp":"2022-01-07T14:52:16.550089Z"}
{"component":"virt-launcher","level":"error","msg":"At least one cgroup controller is required: No such device or address","pos":"virCgroupDetect:415","subcomponent":"libvirt","thread":"30","timestamp":"2022-01-07T14:52:17.836000Z"}
{"component":"virt-launcher","level":"error","msg":"failed to get domain metadata","pos":"libvirt_helper.go:137","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:17.939299Z"}
{"component":"virt-launcher","level":"error","msg":"Could not fetch the Domain specification.","pos":"client.go:231","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:17.939365Z"}
{"component":"virt-launcher","level":"error","msg":"failed to get domain metadata","pos":"libvirt_helper.go:137","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:18.478106Z"}
{"component":"virt-launcher","level":"error","msg":"Could not fetch the Domain specification.","pos":"client.go:231","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:18.478207Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to get domain job info","name":"vm2-rhel84-ocs","namespace":"default","pos":"manager.go:742","reason":"virError(Code=55, Domain=20, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-01-07T16:13:06.657070Z","uid":"b9fe7292-8841-4396-8e76-17f7636d9a0a"}
{"component":"virt-launcher","level":"error","msg":"timeout on stopping the cmd server, continuing anyway.","pos":"server.go:478","timestamp":"2022-01-07T16:13:07.906406Z"}
{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown","pos":"virt-launcher.go:522","reason":"wait: no child processes","timestamp":"2022-01-07T16:13:07.912884Z"}



[kbidarka@localhost migration]$ oc logs -f virt-launcher-vm-rhel84-ocs-8nqps | grep '\"level\":\"error\"'
{"component":"virt-launcher","level":"error","msg":"internal error: Child process (dmidecode -q -t 0,1,2,3,4,11,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2745","subcomponent":"libvirt","thread":"42","timestamp":"2022-01-07T14:52:15.734000Z"}
{"component":"virt-launcher","level":"error","msg":"Error when closing file","pos":"os_helper.go:34","reason":"invalid argument","timestamp":"2022-01-07T14:52:17.242507Z"}
{"component":"virt-launcher","level":"error","msg":"At least one cgroup controller is required: No such device or address","pos":"virCgroupDetect:415","subcomponent":"libvirt","thread":"28","timestamp":"2022-01-07T14:52:18.312000Z"}
{"component":"virt-launcher","level":"error","msg":"failed to get domain metadata","pos":"libvirt_helper.go:137","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:18.410666Z"}
{"component":"virt-launcher","level":"error","msg":"Could not fetch the Domain specification.","pos":"client.go:231","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:18.410741Z"}
{"component":"virt-launcher","level":"error","msg":"failed to get domain metadata","pos":"libvirt_helper.go:137","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:18.981304Z"}
{"component":"virt-launcher","level":"error","msg":"Could not fetch the Domain specification.","pos":"client.go:231","reason":"virError(Code=80, Domain=20, Message='metadata not found: Requested metadata element is not present')","timestamp":"2022-01-07T14:52:18.981376Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to get domain job info","name":"vm-rhel84-ocs","namespace":"default","pos":"manager.go:742","reason":"virError(Code=55, Domain=20, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-01-07T16:15:18.317062Z","uid":"af7fc901-7852-490f-bd60-8532dc7068d1"}
{"component":"virt-launcher","level":"error","msg":"timeout on stopping the cmd server, continuing anyway.","pos":"server.go:478","timestamp":"2022-01-07T16:15:19.550850Z"}
{"component":"virt-launcher","level":"error","msg":"Failed to reap process -1","pos":"virt-launcher.go:487","reason":"no child processes","timestamp":"2022-01-07T16:15:19.555211Z"}

Comment 9 errata-xmlrpc 2022-01-20 17:21:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.8.4 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0213

