Bug 1868760

Summary: [4.4] node client cert requests armoring: deny pod's access to /config/master API endpoint
Product: OpenShift Container Platform Reporter: Micah Abbott <miabbott>
Component: Cloud Compute Assignee: Michael McCune <mimccune>
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: agarcial, aos-bugs, danili, jokerman, mimccune, openshift-bugzilla-robot, skunkerk, zhsun
Version: 4.4   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: non-multi-arch
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1868464 Environment:
node client cert requests armoring: [Top Level] node client cert requests armoring: deny pod's access to /config/master API endpoint [Suite:openshift/conformance/parallel]
Last Closed: 2020-10-27 16:28:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1868464, 1876931    

Description Micah Abbott 2020-08-13 18:17:16 UTC
+++ This bug was initially created as a clone of Bug #1868464 +++

test:
node client cert requests armoring: deny pod's access to /config/master API endpoint 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=node+client+cert+requests+armoring%3A+deny+pod%27s+access+to+%2Fconfig%2Fmaster+API+endpoint

fail [github.com/openshift/origin/test/extended/csrapprover/csrapprover.go:48]: Unexpected error:
    <*errors.errorString | 0xc0002981c0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-remote-libvirt-s390x-4.3/1293578108904935424

--- Additional comment from Seth Jennings on 2020-08-12 18:51:14 UTC ---

failure context

=============
[It] deny pod's access to /config/master API endpoint [Suite:openshift/conformance/parallel]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/csrapprover/csrapprover.go:36
Aug 12 17:31:23.259: INFO: Running 'oc --namespace=e2e-test-cluster-client-cert-bn47n --config=/tmp/configfile787210623 run get-bootstrap-creds --labels name=get-bootstrap-creds --image quay.io/fedora/fedora:32-x86_64 --restart Never --command -- /bin/bash -c sleep infinity'
[AfterEach] node client cert requests armoring:
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:101
STEP: Collecting events from namespace "e2e-test-cluster-client-cert-bn47n".
STEP: Found 5 events.
Aug 12 17:34:25.311: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for get-bootstrap-creds: {default-scheduler } Scheduled: Successfully assigned e2e-test-cluster-client-cert-bn47n/get-bootstrap-creds to ci-op-pbbtjczd-416f4-lv9g6-worker-0-hbwd6
Aug 12 17:34:25.311: INFO: At 2020-08-12 17:31:26 +0000 UTC - event for get-bootstrap-creds: {kubelet ci-op-pbbtjczd-416f4-lv9g6-worker-0-hbwd6} Pulling: Pulling image "quay.io/fedora/fedora:32-x86_64"
Aug 12 17:34:25.311: INFO: At 2020-08-12 17:31:38 +0000 UTC - event for get-bootstrap-creds: {kubelet ci-op-pbbtjczd-416f4-lv9g6-worker-0-hbwd6} Pulled: Successfully pulled image "quay.io/fedora/fedora:32-x86_64"
Aug 12 17:34:25.311: INFO: At 2020-08-12 17:31:38 +0000 UTC - event for get-bootstrap-creds: {kubelet ci-op-pbbtjczd-416f4-lv9g6-worker-0-hbwd6} Created: Created container get-bootstrap-creds
Aug 12 17:34:25.311: INFO: At 2020-08-12 17:31:38 +0000 UTC - event for get-bootstrap-creds: {kubelet ci-op-pbbtjczd-416f4-lv9g6-worker-0-hbwd6} Started: Started container get-bootstrap-creds
Aug 12 17:34:25.451: INFO: POD                  NODE                                       PHASE   GRACE  CONDITIONS
Aug 12 17:34:25.451: INFO: get-bootstrap-creds  ci-op-pbbtjczd-416f4-lv9g6-worker-0-hbwd6  Failed         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-08-12 17:31:24 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-08-12 17:31:24 +0000 UTC ContainersNotReady containers with unready status: [get-bootstrap-creds]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-08-12 17:31:24 +0000 UTC ContainersNotReady containers with unready status: [get-bootstrap-creds]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-08-12 17:31:24 +0000 UTC  }]
Aug 12 17:34:25.451: INFO: 
Aug 12 17:34:25.596: INFO: get-bootstrap-creds[e2e-test-cluster-client-cert-bn47n].container[get-bootstrap-creds].log
standard_init_linux.go:211: exec user process caused "exec format error"

Aug 12 17:34:25.731: INFO: skipping dumping cluster info - cluster too large
Aug 12 17:34:25.934: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-cluster-client-cert-bn47n-user}, err: <nil>
Aug 12 17:34:26.152: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-cluster-client-cert-bn47n}, err: <nil>
Aug 12 17:34:26.339: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  P8J7qchYTRC8PB-c4PbdZQAAAAAAAAAA}, err: <nil>
[AfterEach] node client cert requests armoring:
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
Aug 12 17:34:26.339: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-cluster-client-cert-bn47n" for this suite.
Aug 12 17:34:26.682: INFO: Running AfterSuite actions on all nodes
Aug 12 17:34:26.682: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/csrapprover/csrapprover.go:48]: Unexpected error:
    <*errors.errorString | 0xc0002981c0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

failed: (3m11s) 2020-08-12T17:34:26 "node client cert requests armoring: deny pod's access to /config/master API endpoint [Suite:openshift/conformance/parallel]"
=============

in particular 

standard_init_linux.go:211: exec user process caused "exec format error"

The test suite is e2e-remote-libvirt-s390x-4.3, so this is an s390x node trying to exec an x86_64 binary (the test pod runs the image quay.io/fedora/fedora:32-x86_64).
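
To illustrate the mismatch described above, here is a minimal Go sketch that checks whether an image tag names a different architecture than the node it will run on. It is not the fix that landed in origin. The helper name imageArchMismatch, the unameArch table, and the assumption that the architecture appears as a "-<arch>" suffix on the tag (as in "32-x86_64") are all hypothetical and used for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// unameArch maps Go/Kubernetes architecture names (for example the value of a
// node's kubernetes.io/arch label) to the uname-style names that appear in
// tags such as "32-x86_64". The table is an assumption made for this sketch.
var unameArch = map[string]string{
	"amd64":   "x86_64",
	"arm64":   "aarch64",
	"ppc64le": "ppc64le",
	"s390x":   "s390x",
}

// imageArchMismatch (hypothetical helper) reports whether the image tag ends
// in "-<arch>" for an architecture other than nodeArch. Images without a tag,
// or with a tag that carries no known architecture suffix, are not flagged.
func imageArchMismatch(image, nodeArch string) (bool, error) {
	want, ok := unameArch[nodeArch]
	if !ok {
		return false, fmt.Errorf("unknown node architecture %q", nodeArch)
	}
	i := strings.LastIndex(image, ":")
	if i < 0 || strings.Contains(image[i+1:], "/") {
		// No tag present (a colon followed by "/" is a registry port).
		return false, nil
	}
	tag := image[i+1:]
	for _, arch := range unameArch {
		if strings.HasSuffix(tag, "-"+arch) {
			return arch != want, nil
		}
	}
	return false, nil
}

func main() {
	// The image and node architecture from the failing job in this report.
	mismatch, err := imageArchMismatch("quay.io/fedora/fedora:32-x86_64", "s390x")
	fmt.Println(mismatch, err)
}
```

A check along these lines could let a test skip or fail immediately with a clear message, rather than waiting for the pod to hit "exec format error" and the wait to time out.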

--- Additional comment from Sohan Kunkerkar on 2020-08-12 18:56:57 UTC ---



--- Additional comment from Seth Jennings on 2020-08-12 18:59:04 UTC ---

changed in
4.6 https://github.com/openshift/origin/pull/25087

backported in
4.5 https://bugzilla.redhat.com/show_bug.cgi?id=1846091
4.4 https://bugzilla.redhat.com/show_bug.cgi?id=1862171
4.3 https://bugzilla.redhat.com/show_bug.cgi?id=1867402

xref https://bugzilla.redhat.com/show_bug.cgi?id=1845792

The Node team backported the change to 4.4 and 4.3 in response to https://bugzilla.redhat.com/show_bug.cgi?id=1867613, but the change originated with the Cloud team.

--- Additional comment from Seth Jennings on 2020-08-12 19:00:46 UTC ---

Failing against all releases that run this test
https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/?job=*e2e-remote-libvirt-s390x*

--- Additional comment from Michael McCune on 2020-08-12 20:36:37 UTC ---

i don't think this bug belongs to the Cloud Compute component; it should probably be reassigned to the node team.

--- Additional comment from Seth Jennings on 2020-08-13 14:23:41 UTC ---

Assigned to Cloud because https://bugzilla.redhat.com/show_bug.cgi?id=1845792, the change that introduced this break, was assigned to Cloud and Alberto.

--- Additional comment from Michael McCune on 2020-08-13 14:35:38 UTC ---

ack, thanks Seth. i'll spend a little more time reviewing those.

Comment 4 Michael McCune 2020-09-04 19:33:00 UTC
*** Bug 1868464 has been marked as a duplicate of this bug. ***

Comment 7 sunzhaohua 2020-09-14 08:23:53 UTC
According to the test history, this failure has not recurred, so I am moving the bug to Verified.
https://prow.ci.openshift.org/pr-history/?org=openshift&repo=origin&pr=25480

Comment 8 Michael McCune 2020-09-30 18:52:18 UTC
*** Bug 1876931 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2020-10-27 16:28:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 11 Joel Speed 2021-04-28 11:36:15 UTC
*** Bug 1876931 has been marked as a duplicate of this bug. ***