Created attachment 1568292 [details]
virt-operator.log

Description of problem:
After creating the HCO resource, not all of the expected pods come up. I can see only the following list:

kubemacpool-system        kubemacpool-mac-controller-manager-cf5589bd6-mmbjm   1/1   Running   1   11m
kubevirt-hyperconverged   cdi-operator-598c5c4587-cwkq6                        1/1   Running   0   12h
kubevirt-hyperconverged   cluster-network-addons-operator-84b9ff7d78-qmlxk     1/1   Running   0   12h
kubevirt-hyperconverged   hco-operator-7bd55465bb-skn7x                        1/1   Running   0   12h
kubevirt-hyperconverged   kubevirt-ssp-operator-7cd75f5fb6-sd5tj               1/1   Running   0   12h
kubevirt-hyperconverged   kubevirt-web-ui-operator-5977749bcc-7tmkj            1/1   Running   0   12h
kubevirt-hyperconverged   node-maintenance-operator-c7c595c98-fmqwd            1/1   Running   0   12h
kubevirt-hyperconverged   virt-operator-7f5bb69654-8lfpd                       1/1   Running   0   12h
kubevirt-hyperconverged   virt-operator-7f5bb69654-v6vsl                       1/1   Running   0   12h
linux-bridge              bridge-marker-22vzt                                  1/1   Running   0   11m
linux-bridge              bridge-marker-96wbp                                  1/1   Running   0   11m
linux-bridge              bridge-marker-h4xmk                                  1/1   Running   0   11m
linux-bridge              bridge-marker-r9rz2                                  1/1   Running   0   11m
linux-bridge              bridge-marker-tcdt6                                  1/1   Running   0   11m
linux-bridge              bridge-marker-zwcgs                                  1/1   Running   0   11m
linux-bridge              kube-cni-linux-bridge-plugin-2kkbp                   1/1   Running   0   11m
linux-bridge              kube-cni-linux-bridge-plugin-2kt4b                   1/1   Running   0   11m
linux-bridge              kube-cni-linux-bridge-plugin-2rvjs                   1/1   Running   0   11m
linux-bridge              kube-cni-linux-bridge-plugin-fqtdr                   1/1   Running   0   11m
linux-bridge              kube-cni-linux-bridge-plugin-l5c7p                   1/1   Running   0   11m
linux-bridge              kube-cni-linux-bridge-plugin-x84hh                   1/1   Running   0   11m

Looking into the HCO log, it appears that only partial resources were created to be picked up by the other operators:

{"level":"info","ts":1557814983.0328276,"logger":"controller_hyperconverged","msg":"Reconciling HyperConverged operator","Request.Namespace":"kubevirt-hyperconverged","Request.Name":"hyperconverged-cluster"}
{"level":"info","ts":1557814983.0390756,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubeVirtConfig"}
{"level":"info","ts":1557814983.045492,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubeVirt"}
{"level":"info","ts":1557814983.052106,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"CDI"}
{"level":"info","ts":1557814983.0592034,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"NetworkAddonsConfig"}
{"level":"info","ts":1557814983.0657887,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubevirtCommonTemplatesBundle"}
{"level":"info","ts":1557814983.072585,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubevirtNodeLabellerBundle"}
{"level":"info","ts":1557814983.0801625,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubevirtTemplateValidator"}
{"level":"info","ts":1557814983.0872798,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KWebUI"}

I can see the following error messages in virt-operator:

{"component":"virt-operator","kind":"","level":"error","msg":"Failed to create all resources: unable to patch scc: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json","name":"kubevirt-hyperconverged-cluster","namespace":"kubevirt-hyperconverged","pos":"kubevirt.go:858","timestamp":"2019-05-14T06:26:17.118421Z","uid":"1aba1544-760e-11e9-a437-fa163e060c36"}
{"component":"virt-operator","level":"info","msg":"reenqueuing KubeVirt kubevirt-hyperconverged/kubevirt-hyperconverged-cluster","pos":"kubevirt.go:419","reason":"unable to patch scc: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json","timestamp":"2019-05-14T06:26:17.118483Z"}

See the attached log for full content.

Version-Release number of selected component (if applicable):
ocp-4.1.0-rc3
hco-2.0.0-11

How reproducible:
100%

Steps to Reproduce:
1. Deploy HCO on OCP.

Actual results:
Installation is not completed.

Expected results:
CNV cluster is up.

Additional info:
It looks like, in order to address CDI, we'll need this change to be vendored into the HCO: https://github.com/kubevirt/containerized-data-importer/pull/798
This log line is the issue:

{"component":"virt-operator","level":"info","msg":"reenqueuing KubeVirt kubevirt-hyperconverged/kubevirt-hyperconverged-cluster","pos":"kubevirt.go:419","reason":"unable to patch scc: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json","timestamp":"2019-05-14T06:26:17.118483Z"}

It indicates that OCP 4 no longer allows us to use the StrategicMerge patch type for updating the SCC. We can fix this by transitioning to a JSON patch, which will require a change to virt-operator.
A PR is posted upstream to address this in KubeVirt: https://github.com/kubevirt/kubevirt/pull/2285
Denys Shchedrivyi mentioned the following issue with the kubevirt-node-labeller on the cnv-devel list (Subj: [cnv] hco-bundle-registry:v2.0.0-13) that I believe comes from the kubevirt-ssp-operator:

[root@dell-r640-010 ~]# oc describe pod -n kubevirt-hyperconverged kubevirt-node-labeller-hzwxl
.
  Warning  Failed   1h (x4 over 1h)     kubelet, working-jww4k-worker-0-6dqp9  Failed to pull image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/kvm-info-nfd-plugin:v0.4.0": rpc error: code = Unknown desc = Error reading manifest v0.4.0 in brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/kvm-info-nfd-plugin: unknown: Not Found
  Warning  Failed   1h (x4 over 1h)     kubelet, working-jww4k-worker-0-6dqp9  Error: ErrImagePull
  Warning  Failed   5m (x331 over 1h)   kubelet, working-jww4k-worker-0-6dqp9  Error: ImagePullBackOff
  Normal   BackOff  44s (x353 over 1h)  kubelet, working-jww4k-worker-0-6dqp9  Back-off pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/kvm-info-nfd-plugin:v0.4.0"

At the current count we have 3 independent issues preventing a successful deployment of CNV:

1) kubevirt-node-labeller fails to start
2) CDI lacks permissions it needs to deploy
3) KubeVirt PR to use JSON patch instead of Strategic Merge

David's PR addresses #3. Bugs for #1 and #2 should also be created. I updated the title to reflect what is being fixed here.
(In reply to David Zager from comment #6)
> At current count we have 3 independent issues preventing a successful
> deployment of CNV:
>
> 1) kubevirt-node-labeller fails to start
> 2) CDI lacks permissions it needs to deploy
> 3) KubeVirt PR to use JSON patch instead of Strategic Merge
>
> David's PR addresses #3. A bug for 1 and 2 should also be created.

Agreed. For #1 an HCO PR could be sufficient, but it needs to be tested. We are working on https://github.com/kubevirt/hyperconverged-cluster-operator/pull/94/commits/0fb498c8a6704ee9cd482dff677169948d4bbe82
Michal, do you maybe have any ideas why this has happened?
(In reply to David Zager from comment #6)
> 1) kubevirt-node-labeller fails to start
This issue is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1710333

> 2) CDI lacks permissions it needs to deploy
This one is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1710261

> 3) KubeVirt PR to use JSON patch instead of Strategic Merge
This bug addresses this last item.
Please add the fixed-in version.
(In reply to Fabian Deutsch from comment #8)
> Michal, do you maybe have any ideas why this has happened?

For reference, Fabian's thread to get an understanding of the background to the change: http://post-office.corp.redhat.com/archives/aos-devel/2019-May/msg00547.html
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1850