Bug 1694878
Summary: | Unexpected `Unauthorized` errors in e2e extended tests when openshift-apiserver available==true | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gabe Montero <gmontero> |
Component: | apiserver-auth | Assignee: | Mo <mkhan> |
Status: | CLOSED ERRATA | QA Contact: | Chuan Yu <chuyu> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.0 | CC: | adam.kaplan, aos-bugs, ccoleman, dgoodwin, evb, mkhan, nagrawal, slaznick, tkral |
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | buildcop | ||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-06-04 10:46:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
Description
Gabe Montero
2019-04-01 22:51:39 UTC
The apiserver message is unfortunately just a red herring, caused by sending an empty user.Info structure to SAR in the SCC admission plugin. This should probably be fixed, but it is not the root cause of the trouble. The "TLS handshake error from 10.131.0.5:49028: remote error: tls: bad certificate" messages appearing throughout look bad, though; I'll try to investigate that next.

The bug https://bugzilla.redhat.com/show_bug.cgi?id=1695048 that Devan opened yesterday looks an awful lot like this one as well.

So I found the damning 401 audit log entries for an `Unauthorized` failure on an `oc start-build ...`:

ip-10-0-137-162.ec2.internal-audit.log:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"c846969e-c563-4dce-b3de-11685224ca3f","stage":"ResponseComplete","requestURI":"/apis/build.openshift.io/v1/namespaces/e2e-test-cli-start-build-c4jjg/buildconfigs/sample-build/webhooks/mysecret/generic","verb":"create","user":{"username":"system:anonymous","groups":["system:unauthenticated"]},"sourceIPs":["10.0.6.236"],"userAgent":"curl/7.29.0","objectRef":{"resource":"buildconfigs","namespace":"e2e-test-cli-start-build-c4jjg","name":"sample-build","apiGroup":"build.openshift.io","apiVersion":"v1","subresource":"webhooks"},"responseStatus":{"metadata":{},"status":"Failure","reason":"Unauthorized","code":401},"requestReceivedTimestamp":"2019-04-02T21:00:02.514046Z","stageTimestamp":"2019-04-02T21:00:02.565089Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:webhooks\" of ClusterRole \"system:webhook\" to Group \"system:unauthenticated\""}}
ip-10-0-172-97.ec2.internal-audit-2019-04-02T21-13-56.645.log:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"e0f84536-0250-4071-82e0-4436be5e98b6","stage":"ResponseComplete","requestURI":"/apis/build.openshift.io/v1/namespaces/e2e-test-cli-start-build-c4jjg/buildconfigs/sample-build/webhooks/mysecret/generic","verb":"create","user":{"username":"system:anonymous","groups":["system:unauthenticated"]},"sourceIPs":["10.0.6.236"],"userAgent":"curl/7.29.0","objectRef":{"resource":"buildconfigs","namespace":"e2e-test-cli-start-build-c4jjg","name":"sample-build","apiGroup":"build.openshift.io","apiVersion":"v1","subresource":"webhooks"},"responseStatus":{"metadata":{},"code":401},"requestReceivedTimestamp":"2019-04-02T21:00:02.507874Z","stageTimestamp":"2019-04-02T21:00:02.602991Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:webhooks\" of ClusterRole \"system:webhook\" to Group \"system:unauthenticated\""}}

In conversations with Mo, it looks like the authorization provided for the build config webhook is no bueno. So there appears to be some sort of webhook flavor to this, even if the empty user error message from before is in fact benign.

Mo said he is working on another debug PR.

Be sure to run /test e2e-aws-builds and /test e2e-aws-jenkins in that PR. Getting an `Unauthorized` should be pretty likely. I've triggered them in Standa's debug PR.

So neither Mo's nor my debug PRs turned up any more unexpected 401s on the bc webhook flows (my debug PR did uncover one of our e2e tests that intentionally induces an error in the bc webhook flow :-) ).

Today, I saw several `Unauthorized` errors that mapped to 404s on the user lookup during the e2e runs. I posted details on one in https://github.com/openshift/origin/pull/22482#issuecomment-480028888

I'll start trying to figure out how to add missing-user diagnostics to my debug PR. Of course, suggestions/comments are welcome.
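For anyone else digging through audit logs for this, here is a minimal sketch of how entries like the two quoted above could be picked out: events where the RBAC annotation says "allow" but the response code is still 401. It assumes grep-style input lines, optionally prefixed with `filename:`, followed by one JSON audit event per line. The function names `find_allowed_but_unauthorized` and `summarize` are made up for illustration and are not part of any OpenShift tooling.

```python
import json


def find_allowed_but_unauthorized(lines):
    """Yield audit events that authorization allowed but that still returned 401."""
    for line in lines:
        # Skip an optional grep-style "filename:" prefix before the JSON object.
        start = line.find("{")
        if start < 0:
            continue
        try:
            event = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON audit event; ignore the line
        code = (event.get("responseStatus") or {}).get("code")
        decision = (event.get("annotations") or {}).get(
            "authorization.k8s.io/decision"
        )
        if code == 401 and decision == "allow":
            yield event


def summarize(event):
    """Return (auditID, username, requestURI) for a quick overview."""
    user = event.get("user") or {}
    return (event.get("auditID"), user.get("username"), event.get("requestURI"))
```

Feeding it the lines of a collected audit log and printing `summarize(event)` for each hit should give a short list of request URIs to cross-check against the failing e2e tests.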
*** Bug 1695048 has been marked as a duplicate of this bug. ***

*** Bug 1702103 has been marked as a duplicate of this bug. ***

Sally - can you add information about what your initial investigations today found?

Marking this as urgent since 1/2 of CI runs fail in part due to the flakes this causes. It generates a huge amount of noise, impairing our ability to find other issues.

I will update this tomorrow on my efforts over the last few weeks trying to track this down. Sally, please focus on other BZs.

Seems that we have a newer manifestation of this issue when pushing imagestreamtags to the registry. Getting lots of these errors in the latest registry and build tests:

error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.skip_mount_home=false]image-registry.openshift-image-registry.svc/e2e-test-build-multistage-l2px4/multi-stage:v1" to "docker://image-registry.openshift-image-registry.svc/e2e-test-build-multistage-l2px4/multi-stage:v1": Error trying to reuse blob sha256:8ba884070f611d31cb2c42eddb691319dc9facf5e0ec67672fcfa135181ab3df at destination: Error checking whether a blob sha256:8ba884070f611d31cb2c42eddb691319dc9facf5e0ec67672fcfa135181ab3df exists in image-registry.openshift-image-registry.svc/e2e-test-build-multistage-l2px4/multi-stage: unauthorized: authentication required

This is critical as all build tests are failing on this.

https://github.com/openshift/origin/pull/22679 is merged and that will fix this issue. https://github.com/openshift/cluster-authentication-operator/pull/118 is related but not required to fix this issue (so QA can begin testing immediately). #118 fixes a different issue where CAO reports available too soon, which made the bug fixed by #22679 more common (as it is a race).
FWIW, after consistently seeing the various forms of `Unauthorized` throughout last week in my various PRs, since Mo's PRs merged 2 days ago I have had a double-digit number of runs across e2e-aws-build/jenkins/image-ecosystem in my various PRs that were `Unauthorized`-free. Bug reporters from dev do not mark bugs verified, but things are looking good from my end. Thanks Mo.

Verified. Ran the e2e tests locally; no such issue now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758