Bug 1867380 - When using webhooks in OCP 4.5 fails to rollout latest deploymentconfig
Summary: When using webhooks in OCP 4.5 fails to rollout latest deploymentconfig
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1906428 1906640 1907958
 
Reported: 2020-08-09 12:23 UTC by hgomes
Modified: 2023-12-15 18:46 UTC
CC: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The openshift-apiserver now specifies the correct API version for the DeploymentConfig kind. Previously, all requests targeting the "deploymentconfigs/{name}/instantiate" subresource failed with: no kind "DeploymentConfig" is registered for version "apps.openshift.io/"
Clone Of:
Clones: 1906428
Environment:
Last Closed: 2021-02-24 15:15:21 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
- GitHub openshift/openshift-apiserver pull 165 (closed): Bug 1867380: When using webhooks in OCP 4.5 fails to rollout latest deploymentconfig (last updated 2021-02-18 08:37:26 UTC)
- Red Hat Knowledge Base (Solution) 5644641 (last updated 2020-12-13 12:13:00 UTC)
- Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:15:49 UTC)

Description hgomes 2020-08-09 12:23:50 UTC
Description of problem:
After deploying OPA per https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/, rolling out the latest DeploymentConfig fails:


$ oc rollout latest dc/httpd
Error from server (InternalError): Internal error occurred: no kind "DeploymentConfig" is registered for version "apps.openshift.io/" in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"


After increasing the log level to 6 on the API server, the following appears in its logs:

~~~
I0728 13:52:52.747168       1 rest.go:94] New deployment for "httpd" caused by []apps.DeploymentCause{apps.DeploymentCause{Type:"Manual", ImageTrigger:(*apps.DeploymentCauseImageTrigger)(nil)}}
I0728 13:52:52.747564       1 wrap.go:47] POST /apis/apps.openshift.io/v1/namespaces/ifnotporesent-latest-opa-test/deploymentconfigs/httpd/instantiate: (14.391723ms) 500
goroutine 49338 [running]:
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc43f19c690, 0x1f4)
        /builddir/build/BUILD/atomic-openshift-git-0.a5bc32f/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:207 +0xd2
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc43f19c690, 0x1f4)
logging error output: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Internal error occurred: no kind \\\"DeploymentConfig\\\" is registered for version \\\"apps.openshift.io/\\\" in scheme \\\"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29\\\"\",\"reason\":\"InternalError\",\"details\":{\"causes\":[{\"message\":\"no kind \\\"DeploymentConfig\\\" is registered for version \\\"apps.openshift.io/\\\" in scheme \\\"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29\\\"\"}]},\"code\":500}\n"
 [[oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0] 10.0.95.93:45820]
~~~


Looking through the code, the /instantiate method omits the version (v1) when resolving the kind, so the lookup uses "apps.openshift.io/" with an empty version, which is not a registered type.
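
For reference, 'oc rollout latest' just POSTs a DeploymentRequest to that instantiate subresource (see the POST in the log above), so the failing call can be reproduced directly. A sketch, assuming a DeploymentConfig named "httpd" in a namespace "myproject":

~~~
# Sketch: send the same request that `oc rollout latest dc/httpd` sends.
# Namespace and name are assumptions; adjust to your environment.
cat > request.json <<'EOF'
{"kind":"DeploymentRequest","apiVersion":"apps.openshift.io/v1","name":"httpd","latest":true,"force":true}
EOF
oc create --raw "/apis/apps.openshift.io/v1/namespaces/myproject/deploymentconfigs/httpd/instantiate" -f request.json
~~~

With the OPA webhook registered this should return the same "no kind \"DeploymentConfig\" is registered" error; without it, a new rollout is instantiated.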


Version-Release number of selected component (if applicable):

OCP 4.5.4

How reproducible:

100%

Steps to Reproduce:
1. Deploy OPA
2. Use the policy (a complete, loadable version of this policy file is sketched after these steps):
~~~
deny[msg] {
    input.request.kind.kind == "DeploymentConfig"
    msg := "No entry for you"
}
~~~
3. Deploy a sample app such as httpd, then run `oc rollout latest dc/myapache`.
The specific 'rollout latest' command isn't accepted, but creating new apps works fine.
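
For completeness, a sketch of the policy as a loadable file: per the linked OPA docs, the deny rule must live in the kubernetes.admission package, and kube-mgmt picks policies up from configmaps. This assumes OPA and kube-mgmt run in the "opa" namespace:

~~~
# Hypothetical file name; the package header is required by OPA's main admission policy.
cat > dc-policy.rego <<'EOF'
package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "DeploymentConfig"
    msg := "No entry for you"
}
EOF
# kube-mgmt (per the linked docs) loads policies from configmaps in the opa namespace.
oc create configmap dc-policy --from-file=dc-policy.rego -n opa
~~~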


Actual results:

Error from server (InternalError): Internal error occurred: no kind "DeploymentConfig" is registered for version "apps.openshift.io/"
Expected results:

To have 'oc rollout latest' working with DeploymentConfig.

Additional info:

Comment 1 hgomes 2020-08-16 14:51:20 UTC
Hi team,
Any updates? Let me know if you have any questions. This bug has been reproduced a few times.

Comment 2 Stefan Schimanski 2020-08-21 15:53:34 UTC
Please attach must-gather logs.

Comment 4 hgomes 2020-08-24 13:30:37 UTC
A snip of DeploymentConfig rollout 'example'

~~~
2020-08-04T15:44:29.335127517Z I0804 15:44:29.335078       1 rest.go:94] New deployment for "example" caused by []apps.DeploymentCause{apps.DeploymentCause{Type:"Manual"
, ImageTrigger:(*apps.DeploymentCauseImageTrigger)(nil)}}
2020-08-04T15:44:29.335408104Z I0804 15:44:29.335389       1 controller.go:606] quota admission added evaluator for: DeploymentConfig.apps.openshift.io
2020-08-04T15:44:29.336126052Z I0804 15:44:29.336082       1 httplog.go:90] verb="POST" URI="/apis/apps.openshift.io/v1/namespaces/test-opa/deploymentconfigs/example/ins
tantiate" latency=20.094503ms resp=500 UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36" src
IP="10.17.0.1:51318": 
2020-08-04T15:44:29.336126052Z goroutine 13369 [running]:
2020-08-04T15:44:29.336126052Z k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc00319bb90, 0x1f4)
2020-08-04T15:44:29.336126052Z  @/k8s.io/apiserver/pkg/server/httplog/httplog.go:225 +0xc8
2020-08-04T15:44:29.336126052Z k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc00319bb90, 0x1f4)
2020-08-04T15:44:29.336126052Z  @/k8s.io/apiserver/pkg/server/httplog/httplog.go:204 +0x35
2020-08-04T15:44:29.336126052Z k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).WriteHeader(0xc00605e300, 0x1f4)
2020-08-04T15:44:29.336126052Z  @/k8s.io/apiserver/pkg/server/filters/timeout.go:228 +0xb2
2020-08-04T15:44:29.336126052Z k8s.io/apiserver/pkg/endpoints/filters.(*auditResponseWriter).WriteHeader(0xc005cfa500, 0x1f4)
2020-08-04T15:44:29.336126052Z  @/k8s.io/apiserver/pkg/endpoints/filters/audit.go:219 +0x63
2020-08-04T15:44:29.336126052Z k8s.io/apiserver/pkg/endpoints/metrics.(*ResponseWriterDelegator).WriteHeader(0xc004a2fef0, 0x1f4)
2020-08-04T15:44:29.336126052Z  @/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:389 +0x45
2020-08-04T15:44:29.336126052Z k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.(*deferredResponseWriter).Write(0xc005ca4230, 0xc004b26200, 0x1fd, 0x1fd, 0x0, 0x2bd8718, 0x2)
~~~

Comment 5 Stefan Schimanski 2020-09-11 15:12:30 UTC
We have to look into this. Adding UpcomingSprint.

Comment 10 Abu Kashem 2020-10-25 15:31:22 UTC
Iā€™m adding UpcomingSprint, because I was occupied with fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. We will revisit this in a future sprint.

Comment 13 Abu Kashem 2020-11-06 14:27:07 UTC
Hi mtleilia, 
This is on my plate now, I have started looking at it.

Comment 15 Abu Kashem 2020-11-06 16:05:44 UTC
Hi mtleilia, hgomes
Did the customer install OPA using the instructions from here - https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/#3-deploy-opa-on-top-of-kubernetes? 

I used the steps outlined here https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control. The admission webhook seems to run but I am seeing the following error:

> time="2020-11-06T15:45:06Z" level=error msg="Failed to reset OPA data for v1/namespaces (will retry after 30s): Put http://localhost:8181/v1/data/kubernetes/namespaces: dial tcp [::1]:8181: getsockopt: connection refused" 

Maybe this is just noise, but I would like the following in order to make sure I am following the exact same steps:

- The exact steps and all the yaml files that the customer used to install OPA (if different from the above URL)
- The yaml file for the "dc/httpd" that the customer used.

Comment 20 Abu Kashem 2020-11-10 18:28:19 UTC
Hi mtleilia,
It took me a while to set up OPA properly on my cluster, and I can reproduce the issue. I will take a deep dive to find the root cause. Stay tuned.

Comment 21 Abu Kashem 2020-11-10 23:18:24 UTC
Hi mtleilia,
I am still debugging the issue and have not found the root cause yet, but I think I might have a workaround. Please ask the customer to try it and let me know if it works.

Basically, specify the group and version for the openshift 'apps' API group in the webhook rules:
> kind: ValidatingWebhookConfiguration
> apiVersion: admissionregistration.k8s.io/v1beta1
> metadata:
>   name: opa-validating-webhook
> webhooks:
>     rules:
>     - operations: ["CREATE", "UPDATE"]
>       apiGroups:
>       - apps.openshift.io
>       apiVersions:
>       - v1
>       resources: ["*"]
>       scope: 'Namespaced'
>

Make the above the first rule in the "rules" section of the validating webhook configuration and apply the change:
> oc apply -f webhook-configuration.yaml

After this change I expect 'oc rollout latest dc/jenkins' to work. Let me know.

Comment 24 antonio.quintavalle 2020-11-12 13:42:34 UTC
Hello Ashwini,
we have tried to add the reported rule to OPA ValidatingWebhookConfiguration as suggested:

...
  rules:
  - apiGroups:
    - apps.openshift.io
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - '*'
    scope: '*'
  - apiGroups:
    - '*'
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    resources:
    - '*'
    scope: '*'
...

DeploymentConfig rollout still fails because of the same issue:

oc rollout latest <deployment-config-name>
Error from server (InternalError): Internal error occurred: no kind "DeploymentConfig" is registered for version "apps.openshift.io/" in scheme "pkg/api/legacyscheme/scheme.go:30"

Here are some details about our environment:

oc version
Client Version: 4.4.10
Server Version: 4.5.7
Kubernetes Version: v1.18.3+2cf11e2

Thank you,
Best Regards,
Antonio

Comment 25 Abu Kashem 2020-11-12 14:30:02 UTC
Hi akhaire, mtleilia,

Can you please try the following rules? I tested them on my dev cluster and they seem to work. If the following works, I am proposing it as a workaround while we identify the root cause and get a fix in place.

>
> kind: ValidatingWebhookConfiguration
> apiVersion: admissionregistration.k8s.io/v1beta1
> metadata:
>   name: opa-validating-webhook
> webhooks:
>     rules:
>     - apiGroups:
>       - apps.openshift.io
>       apiVersions:
>       - v1
>       operations:
>       - CREATE
>       - UPDATE
>       resources:
>       - '*/*'
>       scope: '*'
>     - apiGroups:
>       - '*'
>       apiVersions:
>       - v1
>       - v1beta1
>       - v1alpha1
>       operations:
>       - CREATE
>       - UPDATE
>       resources:
>       - '*'
>       scope: '*'
>

Comment 26 antonio.quintavalle 2020-11-13 14:25:57 UTC
Hello,
I confirm that the workaround works fine on OCP4 cluster.

This workaround doesn't limit validation to deploymentconfigs; it would match any resource available under apps.openshift.io/*, but it should be just deploymentconfigs:

https://docs.openshift.com/container-platform/4.4/rest_api/workloads_apis/deploymentconfig-apps-openshift-io-v1.html#api-endpoints

Do you confirm? 

Thank you in advance,
Best Regards,
Antonio

Comment 27 Abu Kashem 2020-11-13 14:49:01 UTC
Hi antonio.quintavalle,

> Such workaround wouldn't allow to validate any resources available under apps.openshift.io/*, but it should be just deploymentconfigs:

It does; the rules above just specify 'apps.openshift.io' explicitly. Rules are ORed: if any rule matches, the object being created/updated will be validated.
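
If in doubt about which rules are actually live, you can dump the stored configuration and check:

~~~
# Inspect the rules on the live object (webhook name taken from the examples above):
oc get validatingwebhookconfiguration opa-validating-webhook -o yaml
~~~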


The following rule should work as well

>
> kind: ValidatingWebhookConfiguration
> apiVersion: admissionregistration.k8s.io/v1beta1
> metadata:
>   name: opa-validating-webhook
> webhooks:
>     rules:
>     - apiGroups:
>       - '*'
>       apiVersions:
>       - v1
>       - v1beta1
>       - v1alpha1
>       operations:
>       - CREATE
>       - UPDATE
>       resources:
>       - '*'
>       scope: '*'
>

You just need to specify the versions explicitly rather than use '*'. If you have an API that serves a version different from the ones specified above, you can add it to the list.
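
One way to build that explicit list is to ask the cluster which group/versions it actually serves:

~~~
# Every group/version served by the cluster; a served version missing from the
# webhook's apiVersions list will simply not be matched by that rule.
oc api-versions
# For example, check which versions the openshift apps group serves:
oc api-versions | grep apps.openshift.io
~~~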

Comment 28 Abu Kashem 2020-11-13 16:04:19 UTC
Work continues; the workaround has been suggested to the customer while we find the root cause.

Comment 30 Abu Kashem 2020-11-23 17:05:36 UTC
Hi akhaire,
can you share the yaml of the webhook?

Comment 33 Abu Kashem 2020-12-01 22:08:12 UTC
Hi akhaire,
Which webhook is failing? I see two webhooks in the configuration yaml.

Can you remove the second webhook from the configuration and test?
- clientConfig:
    caBundle: Cg==
    service:
      name: gatekeeper-webhook-service
      namespace: infra-gatekeeper
      path: /v1/admitlabel
  failurePolicy: Fail
  name: check-ignore-label.gatekeeper.sh
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    resources:
    - namespaces
  sideEffects: None
  timeoutSeconds: 5

Comment 34 Stefan Schimanski 2020-12-09 16:44:35 UTC
Is the snippet in c33 from a real webhook config? If yes, this is dangerous. Always exclude control-plane (run-level 0 and 1) namespaces from any such rule.
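
For reference, a sketch of one way to do that with a namespaceSelector, assuming the control-plane namespaces carry the openshift.io/run-level label as they do on OCP 4 (add it to each entry under "webhooks"):

~~~
    namespaceSelector:
      matchExpressions:
      - key: openshift.io/run-level   # label on OCP control-plane namespaces (assumption: present in your cluster)
        operator: NotIn
        values: ["0", "1"]
~~~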

Comment 36 Lukasz Szaszkiewicz 2020-12-10 13:48:44 UTC
Hi there,

We managed to find the root cause. We have opened the following PR https://github.com/openshift/openshift-apiserver/pull/165 to address the issue.
We are going to backport the fix to 4.5.
Feel free to validate it on 4.7 once the mentioned PR merges and share your feedback.
Many thanks.

Comment 39 Xingxing Xia 2020-12-15 14:59:47 UTC
Verified in 4.7.0-0.nightly-2020-12-15-005943:
The steps below follow https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/
but I modified a few things, e.g.:
- in 4.7, ValidatingWebhookConfiguration is v1 instead of v1beta1
- v1 requires sideEffects and admissionReviewVersions to be non-empty, so I added them
- I used "operator: In" and "scope: 'Namespaced'", because the configuration in the original link is too destructive
and so on.

$ cd bug-deploymentconfig-webhook-1867380
$ openssl genrsa -out ca.key 2048
$ openssl req -x509 -new -nodes -key ca.key -days 100000 -out ca.crt -subj "/CN=admission_ca"
$ cat >server.conf <<EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF

$ openssl genrsa -out server.key 2048
$ openssl req -new -key server.key -out server.csr -subj "/CN=opa.opa.svc" -config server.conf
$ openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 100000 -extensions v3_req -extfile server.conf
$ oc create secret tls opa-server --cert=server.crt --key=server.key

# admission-controller.yaml is copied from https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/
# but here I need to add the securityContext fields below for the "opa" container,
# otherwise the container will CrashLoopBackOff with an error binding port 443 in its logs:
#         - name: opa
#           image: openpolicyagent/opa:0.12.2
#           securityContext:
#             privileged: true
#             runAsUser: 0
$ oc apply -f admission-controller.yaml
$ cat > webhook-configuration.yaml <<EOF
kind: ValidatingWebhookConfiguration
apiVersion: admissionregistration.k8s.io/v1
metadata:
  name: opa-validating-webhook
webhooks:
  - name: validating-webhook.openpolicyagent.org
    admissionReviewVersions:
    - v1
    sideEffects: None
    namespaceSelector:
      matchExpressions:
      - key: openpolicyagent.org/webhook
        operator: In
        values:
        - ignore
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["*"]
        apiVersions: ["*"]
        resources: ["*"]
        scope: 'Namespaced'
    clientConfig:
      caBundle: $(cat ca.crt | base64 | tr -d '\n')
      service:
        namespace: opa
        name: opa
EOF

$ oc adm policy add-scc-to-user privileged -z default # to solve the CrashLoopBackOff mentioned above
$ oc get po
NAME                  READY   STATUS    RESTARTS   AGE
opa-fffdf4574-wmfgr   2/2     Running   0          36s
$ oc apply -f webhook-configuration.yaml
validatingwebhookconfiguration.admissionregistration.k8s.io/opa-validating-webhook created
$ oc create ns test-ns
$ oc label ns test-ns openpolicyagent.org/webhook=ignore
$ oc create deploymentconfig mydc --image openshift/hello-openshift -n test-ns
deploymentconfig.apps.openshift.io/mydc created
$ oc rollout latest dc/mydc -n test-ns                         
deploymentconfig.apps.openshift.io/mydc rolled out

oc rollout works well for DeploymentConfig.

Comment 40 Xingxing Xia 2020-12-15 15:24:46 UTC
Per comment 0, I need to define a policy.
So adding more steps below:
$ cat dc-policy.rego 
package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "DeploymentConfig"
    msg:= "No entry for you"
}

$ oc rollout latest dc/mydc -n test-ns
Error from server (No entry for you): admission webhook "validating-webhook.openpolicyagent.org" denied the request: No entry for you

$ oc create deploymentconfig mydc2 --image openshift/hello-openshift -n test-ns
Error from server (No entry for you): admission webhook "validating-webhook.openpolicyagent.org" denied the request: No entry for you

Neither oc rollout nor oc create deploymentconfig hits the bug's error, and the output matches the policy as expected.

Comment 41 Xingxing Xia 2020-12-15 15:27:25 UTC
Forgot to paste: after cat dc-policy.rego, I also ran: oc create configmap dc-policy --from-file=dc-policy.rego

Comment 42 Xingxing Xia 2020-12-16 03:06:58 UTC
(In reply to Xingxing Xia from comment #39)
...
Forgot to paste something; here I also ran:
$ oc create namespace opa
$ oc project opa
$ mkdir bug-deploymentconfig-webhook-1867380
> $ cd bug-deploymentconfig-webhook-1867380
...
>     admissionReviewVersions:
>     - v1
Here use v1beta1 instead of v1:
     admissionReviewVersions:
     - v1beta1
(Otherwise later `oc create deploymentconfig` hits: failed calling webhook "validating-webhook.openpolicyagent.org": converting (v1beta1.AdmissionReview) to (v1.AdmissionReview): unknown conversion)

Comment 46 errata-xmlrpc 2021-02-24 15:15:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 47 Red Hat Bugzilla 2023-09-18 00:21:57 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

