Bug 1768262 - node failed to upgrade - master node not ready
Summary: node failed to upgrade - master node not ready
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Ryan Phillips
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-03 18:21 UTC by Ben Parees
Modified: 2020-01-23 11:10 UTC
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:10:26 UTC
Target Upstream Version:




Links
System ID                               Private  Priority  Status  Summary                                 Last Updated
Github openshift origin pull 24129      0        None      closed  Bug 1768262: Bump nodes ready timeout   2020-11-06 09:55:01 UTC
Red Hat Product Errata RHBA-2020:0062   0        None      None    None                                    2020-01-23 11:10:40 UTC

Description Ben Parees 2019-11-03 18:21:59 UTC
Description of problem:
Nov  1 04:09:50.155: INFO: cluster upgrade is Progressing: Unable to apply 4.3.0-0.nightly-2019-10-31-223009: the cluster operator kube-apiserver is degraded
Nov  1 04:09:50.155: INFO: cluster upgrade is Failing: Cluster operator kube-apiserver is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-146-62.ec2.internal" not ready


Nov  1 04:17:10.494: INFO: Condition Ready of node ip-10-0-146-62.ec2.internal is false, but Node is tainted by NodeController with [{node-role.kubernetes.io/master  NoSchedule <nil>} {node.kubernetes.io/unschedulable  NoSchedule 2019-11-01 03:26:21 +0000 UTC} {node.kubernetes.io/unreachable  NoSchedule 2019-11-01 03:27:14 +0000 UTC} {node.kubernetes.io/unreachable  NoExecute 2019-11-01 03:27:20 +0000 UTC}]. Failure


in:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/10319
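The NodeControllerDegraded message above comes down to the node's Ready condition not being True: an explicit False, or an Unknown status once the node controller loses contact with the kubelet and applies the unreachable taints, both count as not ready. A minimal Go sketch of that check, using a simplified stand-in type rather than the real k8s.io/api/core/v1 types:

```go
package main

import "fmt"

// NodeCondition is a simplified stand-in for the Kubernetes node
// condition type (the real one lives in k8s.io/api/core/v1).
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False", or "Unknown"
}

// isNodeReady reports whether the Ready condition is explicitly True.
// A missing Ready condition, an explicit False, or an Unknown status
// (node controller lost contact with the kubelet) all count as not ready.
func isNodeReady(conds []NodeCondition) bool {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	// The failing master above reported Ready=false plus unreachable taints.
	conds := []NodeCondition{{Type: "Ready", Status: "False"}}
	fmt.Println(isNodeReady(conds)) // false
}
```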

Comment 1 Ben Parees 2019-11-03 18:24:38 UTC
Possibly related, though probably not, since it's Azure rather than AWS: a worker node failed to upgrade and become ready:

Nov  1 04:48:22.871: INFO: Pool worker is still reporting (Updated: false, Updating: true, Degraded: false)
Nov  1 04:48:22.871: INFO: Unexpected error occurred: Pools did not complete upgrade: timed out waiting for the condition

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.2/200

Feel free to split it out as a separate bug after investigation.

Comment 2 Ben Parees 2019-11-03 18:25:52 UTC
Recurrence of the AWS upgrade failure from the initial BZ description:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/10333

Comment 3 Ryan Phillips 2019-11-07 18:25:40 UTC
In the 10333 upgrade test, the ip-10-0-137-155.ec2.internal node did not come back. It is hard to tell why, since the master logs for that node are missing.

Comment 4 Ryan Phillips 2019-11-11 19:46:55 UTC
Build: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2648/pull-ci-openshift-installer-master-e2e-aws-upgrade/3264

Log: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/2648/pull-ci-openshift-installer-master-e2e-aws-upgrade/3264/build-log.txt

1. At 11:36:29, the ip-10-0-151-180 node reboots
2. Upon reboot, a number of pods exit with 255 (or other) error codes

I suspect a timeout needs to be bumped within the unit tests.

Comment 6 MinLi 2019-11-19 09:02:27 UTC
Tested several times; could not reproduce. Verified.

Comment 8 errata-xmlrpc 2020-01-23 11:10:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

