Bug 1873590

Summary: [4.6] rhcos nightly builds contain unstable etcd
Product: OpenShift Container Platform
Reporter: Sam Batschelet <sbatsche>
Component: Release
Assignee: Justin Pierce <jupierce>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.6
CC: aos-bugs, bbreard, geliu, imcleod, jligon, jokerman, jupierce, mifiedle, mkrutov, nstielau, walters, wking, wzheng
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:35:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sam Batschelet 2020-08-28 17:19:47 UTC
Description of problem: The build process for rhcos diverged in the golang versions used to compile etcd: the CI images were compiled with golang 1.12, while the rhcos builds were compiled with golang 1.14.

The problem is that the bbolt (KVS) version used in etcd 3.4.9 is not compatible with golang 1.14. Although the binary compiles, unit tests panic[1].

It appears that the issue can manifest as very high etcd CPU utilization, which can result in OOM. Other issues may also exist.

For this reason, any cluster built with these images should be considered tainted until rhcos builds contain the correct etcd binary image. CI nightly images would perhaps be the next choice.

[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/11302/rehearse-11302-pull-ci-openshift-etcd-openshift-4.6-unit/1298751792531640320


Version-Release number of selected component (if applicable):


How reproducible: unclear, assumption is 100%


Steps to Reproduce:
1. oc logs -n openshift-etcd $POD -c etcd | grep 'Go Version:'

Actual results: etcd is compiled with golang 1.14


Expected results: etcd is compiled with golang 1.12
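
The check in step 1 can be scripted. What follows is only a minimal sketch, assuming the etcd log contains a line of the form "etcdmain: Go Version: go1.12.12" as in the outputs quoted in this bug. The function names etcd_go_version and is_go_1_12 are hypothetical helpers made up for illustration; they are not part of oc or etcd.

```shell
#!/bin/sh
# Hypothetical helper: read etcd log text on stdin and print the Go
# toolchain version from the first "Go Version:" line (e.g. "go1.12.12").
etcd_go_version() {
    sed -n 's/.*Go Version: \(go[0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the given version string belongs
# to the go1.12 series, which is the expected result in this bug.
is_go_1_12() {
    case "$1" in
        go1.12|go1.12.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage, with $POD set as in step 1: ver=$(oc logs -n openshift-etcd "$POD" -c etcd | etcd_go_version); is_go_1_12 "$ver" || echo "unexpected toolchain: $ver"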


Additional info:

Comment 2 Sam Batschelet 2020-08-28 17:26:36 UTC
*** Bug 1873412 has been marked as a duplicate of this bug. ***

Comment 3 Sam Batschelet 2020-08-28 17:32:49 UTC
*** Bug 1872598 has been marked as a duplicate of this bug. ***

Comment 4 Colin Walters 2020-08-28 17:46:06 UTC
Also, RHCOS doesn't include etcd; it comes as a container image, right? So Build seems more likely, yes.

Comment 5 Justin Pierce 2020-08-28 19:11:33 UTC
ART / production builds changed to golang 1.12 for etcd on 8/26 with this commit: https://github.com/openshift/ocp-build-data/commit/2c8eb9dd53cf69b010c58b12837a93f3b3ac4a8f

Confirmed in build logs that etcd is being compiled with 1.12 in recent nightlies with timestamps >= 20200826.195121.

From the latest nightly:
[jupierce@localhost Downloads]$ oc adm release info registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-28-102309 --pullspecs | grep etcd
  cluster-etcd-operator                          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7673f35971fac8fe36eddd732a4ef62469bdbb7db22a777c90524f888889b5fa
  etcd                                           quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52d6a7a9e653589e4ba8a21c189d99ec31c265c310b5f0f14949536366cb4d84
[jupierce@localhost Downloads]$ docker run -it --rm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52d6a7a9e653589e4ba8a21c189d99ec31c265c310b5f0f14949536366cb4d84
Unable to find image 'quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52d6a7a9e653589e4ba8a21c189d99ec31c265c310b5f0f14949536366cb4d84' locally
sha256:52d6a7a9e653589e4ba8a21c189d99ec31c265c310b5f0f14949536366cb4d84: Pulling from openshift-release-dev/ocp-v4.0-art-dev
c9fa7d57b902: Already exists 
74cbb6607642: Already exists 
c676df4ac84e: Already exists 
ad956945835b: Already exists 
0fc958457837: Pull complete 
Digest: sha256:52d6a7a9e653589e4ba8a21c189d99ec31c265c310b5f0f14949536366cb4d84
Status: Downloaded newer image for quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52d6a7a9e653589e4ba8a21c189d99ec31c265c310b5f0f14949536366cb4d84
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2020-08-28 19:08:57.824441 I | etcdmain: etcd Version: 3.4.9
2020-08-28 19:08:57.825333 I | etcdmain: Git SHA: f3fdb32
2020-08-28 19:08:57.825341 I | etcdmain: Go Version: go1.12.12     <== HERE


Nightlies are not presently passing CI, however, due to other disruptions caused during the migration.
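
Note that "grep etcd" in the transcript above matches both cluster-etcd-operator and etcd. As a rough sketch, assuming the --pullspecs output keeps the "name, whitespace, pullspec" layout shown above, an exact match on the first column selects only the etcd image. The function name etcd_pullspec is a hypothetical helper, not an oc subcommand.

```shell
#!/bin/sh
# Hypothetical helper: read `oc adm release info ... --pullspecs` output on
# stdin and print the pullspec of the component named exactly "etcd".
etcd_pullspec() {
    awk '$1 == "etcd" { print $2; exit }'
}
```

Possible usage, with $RELEASE holding the release pullspec: oc adm release info "$RELEASE" --pullspecs | etcd_pullspec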

Comment 9 Stefan Schimanski 2020-08-31 07:01:48 UTC
*** Bug 1868025 has been marked as a duplicate of this bug. ***

Comment 10 Mike Gahagan 2020-08-31 20:51:04 UTC
I can confirm that 4.6.0-0.nightly-2020-08-27-005538 has an etcd that was built with Go 1.12.12:

[m@localhost mgahagan-133108]$ oc logs -n openshift-etcd etcd-mgahagan-133108-x98t6-master-0 -c etcd | grep 'Go Version:'
2020-08-31 17:19:46.803283 I | etcdmain: Go Version: go1.12.12
[m@localhost mgahagan-133108]$ oc logs -n openshift-etcd etcd-mgahagan-133108-x98t6-master-1 -c etcd | grep 'Go Version:'
2020-08-31 17:21:03.646744 I | etcdmain: Go Version: go1.12.12
[m@localhost mgahagan-133108]$ oc logs -n openshift-etcd etcd-mgahagan-133108-x98t6-master-2 -c etcd | grep 'Go Version:'
2020-08-31 17:19:03.160948 I | etcdmain: Go Version: go1.12.12
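
The per-master checks above could be folded into one loop. This is a sketch only, assuming the same "oc logs -n openshift-etcd <pod> -c etcd" invocation and "Go Version:" log line shown above; check_etcd_pods is a hypothetical helper name, and the pod names are passed in by the caller rather than discovered.

```shell
#!/bin/sh
# Hypothetical helper: for each etcd pod name given as an argument, print
# "<pod> <go version>" and return non-zero if any pod is not on go1.12.
check_etcd_pods() {
    rc=0
    for pod in "$@"; do
        # Take the version from the first "Go Version:" line in the pod log.
        ver=$(oc logs -n openshift-etcd "$pod" -c etcd |
              sed -n 's/.*Go Version: \(go[0-9][0-9.]*\).*/\1/p' | head -n 1)
        echo "$pod ${ver:-unknown}"
        case "$ver" in
            go1.12|go1.12.*) ;;
            *) rc=1 ;;
        esac
    done
    return "$rc"
}
```

Possible usage with the pods from this comment: check_etcd_pods etcd-mgahagan-133108-x98t6-master-0 etcd-mgahagan-133108-x98t6-master-1 etcd-mgahagan-133108-x98t6-master-2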

Comment 11 ge liu 2020-09-21 12:24:38 UTC
Yes, verified 4.6.0-0.nightly-2020-09-20-184226

2020-09-21 12:23:17.560850 I | etcdmain: etcd Version: 3.4.9
2020-09-21 12:23:17.560856 I | etcdmain: Git SHA: 4aa8f02
2020-09-21 12:23:17.560860 I | etcdmain: Go Version: go1.12.12
2020-09-21 12:23:17.560870 I | etcdmain: Go OS/Arch: linux/amd64

Comment 13 errata-xmlrpc 2020-10-27 16:35:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196