Bug 1991357 - Fresh installation shows kube-apiserver error NodeInstallerDegraded: 1 nodes are failing on revision 4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.9
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Antonio Ojea
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks: 2059659
Reported: 2021-08-09 06:27 UTC by pmali
Modified: 2022-03-01 16:24 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:45:28 UTC
Target Upstream Version:
Embargoed:


Attachments
must gather (11.65 MB, application/gzip)
2021-08-09 06:27 UTC, pmali
no flags


Links
Github openshift/cluster-kube-apiserver-operator pull 1203 (last updated 2021-08-11 13:41:00 UTC)
Github openshift/library-go pull 1179 (last updated 2021-08-10 18:40:04 UTC)
Red Hat Product Errata RHSA-2021:3759 (last updated 2021-10-18 17:45:39 UTC)

Description pmali 2021-08-09 06:27:32 UTC
Created attachment 1812272: must gather

Description of problem:
A fresh OCP 4.9 installation reports an error on the kube-apiserver cluster operator.


Version-Release number of selected component (if applicable):
Server Version: 4.9.0-0.nightly-2021-08-07-175228
Platform: AWS

How reproducible:
Occurred once.

Steps to Reproduce:
1. Install an OCP 4.9 environment.

Actual results:
The kube-apiserver cluster operator reports the following Degraded condition:

    message: "NodeInstallerDegraded: 1 nodes are failing on revision 4:\nNodeInstallerDegraded:
      installer: 30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/user-serving-cert-000\":
      dial tcp 172.30.0.1:443: connect: connection refused\nNodeInstallerDegraded:
      I0809 04:46:58.847903       1 copy.go:24] Failed to get secret openshift-kube-apiserver/user-serving-cert-000:
      Get \"https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/user-serving-cert-000\":
      dial tcp 172.30.0.1:443: connect: connection refused\nNodeInstallerDegraded:
      I0809 04:46:59.116854       1 copy.go:24] Failed to get secret openshift-kube-apiserver/user-serving-cert-000:
      Get \"https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/user-serving-cert-000\":
      dial tcp 172.30.0.1:443: connect: connection refused\nNodeInstallerDegraded:
      W0809 04:46:59.117906       1 recorder.go:198] Error creating event &Event{ObjectMeta:{installer-4-ip-10-0-167-134.us-east-2.compute.internal.169989f37bb86e32
      \ openshift-kube-apiserver    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[]
      map[] [] []  []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:openshift-kube-apiserver,Name:installer-4-ip-10-0-167-134.us-east-2.compute.internal,UID:7c85fc09-408d-4e73-b175-6177507e47da,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:StaticPodInstallerFailed,Message:Installing
      revision 4: Get \"https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/user-serving-cert-000\":
      dial tcp 172.30.0.1:443: connect: connection refused,Source:EventSource{Component:static-pod-installer,Host:,},FirstTimestamp:2021-08-09
      04:46:59.116887602 +0000 UTC m=+19.308927563,LastTimestamp:2021-08-09 04:46:59.116887602
      +0000 UTC m=+19.308927563,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00
      +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}:
      Post \"https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/events\":
      dial tcp 172.30.0.1:443: connect: connection refused\nNodeInstallerDegraded:
      F0809 04:46:59.118050       1 cmd.go:96] failed to copy: Get \"https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/user-serving-cert-000\":
      dial tcp 172.30.0.1:443: connect: connection refused\nNodeInstallerDegraded: "
    reason: NodeInstaller_InstallerPodFailed
    status: "True"
    type: Degraded
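The condition message above repeats the same "connection refused" error many times. To pull the condition from a live cluster and reduce it to the failing revision plus the distinct endpoints that refused connections, a small shell helper like the one below may help. This is a hypothetical sketch, not an OpenShift tool: the function name summarize_installer_error is made up here, and it assumes the Degraded condition message is piped in on stdin (for example from the oc jsonpath query shown in the comment).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenShift): summarize a NodeInstallerDegraded
# message read from stdin. Prints the failing revision once, then each distinct
# "connect: connection refused" endpoint once.
summarize_installer_error() {
  local msg
  msg="$(cat)"
  # Extract N from "... are failing on revision N" and print "revision: N".
  printf '%s\n' "$msg" \
    | grep -o 'failing on revision [0-9]*' \
    | head -n 1 \
    | awk '{print "revision:", $4}'
  # Collapse the repeated dial errors into unique lines.
  printf '%s\n' "$msg" \
    | grep -o 'dial tcp [0-9.]*:[0-9]*: connect: connection refused' \
    | sort -u
}

# Example usage against a cluster (assumes oc is logged in):
# oc get clusteroperator kube-apiserver \
#   -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}' \
#   | summarize_installer_error
```

For the message in this report, the output would name revision 4 and the single endpoint 172.30.0.1:443, which makes it clear that every failure is the installer pod being unable to reach the API server through the service IP.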

Expected results:
The kube-apiserver cluster operator should not report any error.



Additional info:

Comment 2 Antonio Ojea 2021-08-10 18:32:23 UTC
https://github.com/openshift/library-go/pull/1179

Comment 4 Ke Wang 2021-08-24 03:30:37 UTC
This bug has two PRs; the last one was merged 8 days ago. Searching CI results from the past 14 days, a window that includes runs from before the merge, still finds many such failures:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=NodeInstallerDegraded.*1+nodes+are+failing+on+revision&maxAge=336h&context=1&type=junit&name=4%5C.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'kube-apiserver.*1 nodes are failing on revision' | wc -l
30

In the past 7 days, after both PRs were merged, no such failure can be found:
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=NodeInstallerDegraded.*1+nodes+are+failing+on+revision&maxAge=168h&context=1&type=junit&name=4%5C.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'kube-apiserver.*1 nodes are failing on revision'
No results found.

Based on the above, the PRs work as expected, so I am moving the bug to VERIFIED.
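The two searches above differ only in the maxAge window (336h for 14 days, 168h for 7 days). A hedged sketch of how the same before/after comparison could be parameterized is below; the function names ci_search_url and count_failures are hypothetical, and count_failures needs network access and w3m, exactly like the original commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the search.ci.openshift.org query used above
# for an arbitrary lookback window, given in hours as the first argument.
ci_search_url() {
  printf 'https://search.ci.openshift.org/?search=%s&maxAge=%sh&context=1&type=junit&name=%s&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job\n' \
    'NodeInstallerDegraded.*1+nodes+are+failing+on+revision' "$1" '4%5C.9'
}

# Hypothetical helper: count kube-apiserver matches in that window.
# grep -c prints 0 when nothing matches, equivalent to the original "| wc -l".
count_failures() {
  w3m -dump -cols 200 "$(ci_search_url "$1")" \
    | grep -c 'kube-apiserver.*1 nodes are failing on revision'
}

# Usage: compare the 14-day window with the 7-day window after the merge.
# count_failures 336
# count_failures 168
```

Keeping the query in one place ensures both windows are searched with identical filters, so any difference in the counts comes only from the lookback period.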

Comment 5 Ke Wang 2021-08-24 03:35:41 UTC
In addition, no such issue has been found in recent installations.

Comment 8 errata-xmlrpc 2021-10-18 17:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

