Bug 1709677 - virt-operator fails updating SCC using strategic merge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 2.0
Assignee: David Zager
QA Contact: Irina Gulina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-14 07:06 UTC by Lukas Bednar
Modified: 2019-07-24 20:16 UTC (History)
14 users

Fixed In Version: hco-bundle-registry:v2.0.0-20
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-24 20:15:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
virt-operator.log (105.00 KB, text/plain)
2019-05-14 07:06 UTC, Lukas Bednar


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:1850 0 None None None 2019-07-24 20:15:59 UTC

Description Lukas Bednar 2019-05-14 07:06:27 UTC
Created attachment 1568292 [details]
virt-operator.log

Description of problem:

After creating the HCO resource, not all of the expected pods come up.
I can see only the following list:
kubemacpool-system                                      kubemacpool-mac-controller-manager-cf5589bd6-mmbjm                1/1       Running     1          11m
kubevirt-hyperconverged                                 cdi-operator-598c5c4587-cwkq6                                     1/1       Running     0          12h
kubevirt-hyperconverged                                 cluster-network-addons-operator-84b9ff7d78-qmlxk                  1/1       Running     0          12h
kubevirt-hyperconverged                                 hco-operator-7bd55465bb-skn7x                                     1/1       Running     0          12h
kubevirt-hyperconverged                                 kubevirt-ssp-operator-7cd75f5fb6-sd5tj                            1/1       Running     0          12h
kubevirt-hyperconverged                                 kubevirt-web-ui-operator-5977749bcc-7tmkj                         1/1       Running     0          12h
kubevirt-hyperconverged                                 node-maintenance-operator-c7c595c98-fmqwd                         1/1       Running     0          12h
kubevirt-hyperconverged                                 virt-operator-7f5bb69654-8lfpd                                    1/1       Running     0          12h
kubevirt-hyperconverged                                 virt-operator-7f5bb69654-v6vsl                                    1/1       Running     0          12h
linux-bridge                                            bridge-marker-22vzt                                               1/1       Running     0          11m
linux-bridge                                            bridge-marker-96wbp                                               1/1       Running     0          11m
linux-bridge                                            bridge-marker-h4xmk                                               1/1       Running     0          11m
linux-bridge                                            bridge-marker-r9rz2                                               1/1       Running     0          11m
linux-bridge                                            bridge-marker-tcdt6                                               1/1       Running     0          11m
linux-bridge                                            bridge-marker-zwcgs                                               1/1       Running     0          11m
linux-bridge                                            kube-cni-linux-bridge-plugin-2kkbp                                1/1       Running     0          11m
linux-bridge                                            kube-cni-linux-bridge-plugin-2kt4b                                1/1       Running     0          11m
linux-bridge                                            kube-cni-linux-bridge-plugin-2rvjs                                1/1       Running     0          11m
linux-bridge                                            kube-cni-linux-bridge-plugin-fqtdr                                1/1       Running     0          11m
linux-bridge                                            kube-cni-linux-bridge-plugin-l5c7p                                1/1       Running     0          11m
linux-bridge                                            kube-cni-linux-bridge-plugin-x84hh                                1/1       Running     0          11m


Looking into the HCO log, it appears that only partial resources were created to be picked up by the other operators:
{"level":"info","ts":1557814983.0328276,"logger":"controller_hyperconverged","msg":"Reconciling HyperConverged operator","Request.Namespace":"kubevirt-hyperconverged","Request.Name":"hyperconverged-cluster"}
{"level":"info","ts":1557814983.0390756,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubeVirtConfig"}
{"level":"info","ts":1557814983.045492,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubeVirt"}
{"level":"info","ts":1557814983.052106,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"CDI"}
{"level":"info","ts":1557814983.0592034,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"NetworkAddonsConfig"}
{"level":"info","ts":1557814983.0657887,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubevirtCommonTemplatesBundle"}
{"level":"info","ts":1557814983.072585,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubevirtNodeLabellerBundle"}
{"level":"info","ts":1557814983.0801625,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KubevirtTemplateValidator"}
{"level":"info","ts":1557814983.0872798,"logger":"controller_hyperconverged","msg":"Skip reconcile: resource already exists","Kind":"KWebUI"}

I can see the following error message in the virt-operator log:
{"component":"virt-operator","kind":"","level":"error","msg":"Failed to create all resources: unable to patch scc: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json","name":"kubevirt-hyperconverged-cluster","namespace":"kubevirt-hyperconverged","pos":"kubevirt.go:858","timestamp":"2019-05-14T06:26:17.118421Z","uid":"1aba1544-760e-11e9-a437-fa163e060c36"}
{"component":"virt-operator","level":"info","msg":"reenqueuing KubeVirt kubevirt-hyperconverged/kubevirt-hyperconverged-cluster","pos":"kubevirt.go:419","reason":"unable to patch scc: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json","timestamp":"2019-05-14T06:26:17.118483Z"}

See attached log for full content.

Version-Release number of selected component (if applicable):
ocp-4.1.0-rc3
hco-2.0.0-11


How reproducible: 100%


Steps to Reproduce:
1. Deploy HCO on OCP

Actual results: the installation does not complete


Expected results: the CNV cluster is up


Additional info:

Comment 2 David Zager 2019-05-14 18:14:18 UTC
Looks like, in order to address the CDI issue, we'll need this change vendored into the HCO: https://github.com/kubevirt/containerized-data-importer/pull/798

Comment 3 David Vossel 2019-05-14 18:53:59 UTC
This log line is the issue:

{"component":"virt-operator","level":"info","msg":"reenqueuing KubeVirt kubevirt-hyperconverged/kubevirt-hyperconverged-cluster","pos":"kubevirt.go:419","reason":"unable to patch scc: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json","timestamp":"2019-05-14T06:26:17.118483Z"}

That indicates that OCP4 no longer allows us to use the StrategicMerge patch type for updating the SCC. We can fix this by transitioning to a JSON patch. This will require a change to virt-operator.

Comment 5 David Vossel 2019-05-14 19:58:16 UTC
A PR is posted upstream to address this in KubeVirt:

https://github.com/kubevirt/kubevirt/pull/2285

Comment 6 David Zager 2019-05-14 20:54:05 UTC
Denys Shchedrivyi mentioned the following issue with the kubevirt-node-labeller on the cnv-devel list (Subj: [cnv] hco-bundle-registry:v2.0.0-13), which I believe comes from the kubevirt-ssp-operator:

[root@dell-r640-010 ~]# oc describe pod -n kubevirt-hyperconverged  kubevirt-node-labeller-hzwxl
.
  Warning  Failed            1h (x4 over 1h)     kubelet, working-jww4k-worker-0-6dqp9  Failed to pull image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/kvm-info-nfd-plugin:v0.4.0": rpc error: code = Unknown desc = Error reading manifest v0.4.0 in brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/kvm-info-nfd-plugin: unknown: Not Found
  Warning  Failed            1h (x4 over 1h)     kubelet, working-jww4k-worker-0-6dqp9  Error: ErrImagePull
  Warning  Failed            5m (x331 over 1h)   kubelet, working-jww4k-worker-0-6dqp9  Error: ImagePullBackOff
  Normal   BackOff           44s (x353 over 1h)  kubelet, working-jww4k-worker-0-6dqp9  Back-off pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/kvm-info-nfd-plugin:v0.4.0"


At current count we have 3 independent issues preventing a successful deployment of CNV:

1) kubevirt-node-labeller fails to start
2) CDI lacks permissions it needs to deploy
3) KubeVirt PR to use JSON patch instead of Strategic Merge


David's PR addresses #3. Bugs for #1 and #2 should also be created. I updated the title to reflect what is being fixed.

Comment 7 Francesco Romani 2019-05-15 09:13:18 UTC
(In reply to David Zager from comment #6)
> At current count we have 3 independent issues preventing a successful
> deployment of CNV:
> 
> 1) kubevirt-node-labeller fails to start
> 2) CDI lacks permissions it needs to deploy
> 3) KubeVirt PR to use JSON patch instead of Strategic Merge

Agreed. For #1 an HCO PR could be sufficient, but it needs to be tested. We are working on https://github.com/kubevirt/hyperconverged-cluster-operator/pull/94/commits/0fb498c8a6704ee9cd482dff677169948d4bbe82

Comment 8 Fabian Deutsch 2019-05-15 09:25:19 UTC
Michal, do you maybe have any ideas why this has happened?

Comment 9 Lukas Bednar 2019-05-15 11:03:28 UTC
(In reply to David Zager from comment #6)
> 1) kubevirt-node-labeller fails to start

Here about this issue https://bugzilla.redhat.com/show_bug.cgi?id=1710333

> 2) CDI lacks permissions it needs to deploy

Here is for this one https://bugzilla.redhat.com/show_bug.cgi?id=1710261

> 3) KubeVirt PR to use JSON patch instead of Strategic Merge

This bug is addressing this last thing.

Comment 10 Nelly Credi 2019-05-27 08:33:03 UTC
Please add the Fixed In Version.

Comment 11 Mark McLoughlin 2019-05-28 11:19:58 UTC
(In reply to Fabian Deutsch from comment #8)
> Michal, do you maybe have any ideas why this has happened?

For reference - Fabian's thread to get an understanding of the background to the change: http://post-office.corp.redhat.com/archives/aos-devel/2019-May/msg00547.html

Comment 14 errata-xmlrpc 2019-07-24 20:15:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1850

