Description of problem:
When installing IPI on OSP, the following appears in the bootkube log on the bootstrap instance:

Sep 09 08:05:09 share-0909c-dgk97-bootstrap bootkube.sh[1730]: Starting etcd certificate signer...
Sep 09 08:05:12 share-0909c-dgk97-bootstrap bootkube.sh[1730]: 9dcecdffc78b278959a534e54d1dd6cc5d9e0327bd7616b93ffacd6a85e7252f
Sep 09 08:05:12 share-0909c-dgk97-bootstrap bootkube.sh[1730]: Waiting for etcd cluster...
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: https://etcd-0.share-0909c.qe.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.share-0909c.qe.rhcloud.com on 192.168.0.12:53: no such host
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: https://etcd-1.share-0909c.qe.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-1.share-0909c.qe.rhcloud.com on 192.168.0.12:53: no such host
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: https://etcd-2.share-0909c.qe.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-2.share-0909c.qe.rhcloud.com on 192.168.0.12:53: no such host
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: Error: unhealthy cluster
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: etcd cluster up. Killing etcd certificate signer...

Version-Release number of the following components:
./openshift-install v4.2.0
built from commit b5dbb46b7e97d2c63333048f055dd518aa01eb10
release image registry.svc.ci.openshift.org/ocp/release@sha256:0ef8b927112149e6eaee60074992cff97a16f386079de1d332c202eff766f55b

How reproducible:
Not sure

Steps to Reproduce:
1. Try to install IPI on OSP
2. Check the bootkube service log on the bootstrap instance

Actual results:
> journalctl -u bootkube|grep bootkube.sh|less
...
Sep 09 08:05:09 share-0909c-dgk97-bootstrap bootkube.sh[1730]: Starting etcd certificate signer...
Sep 09 08:05:12 share-0909c-dgk97-bootstrap bootkube.sh[1730]: 9dcecdffc78b278959a534e54d1dd6cc5d9e0327bd7616b93ffacd6a85e7252f
Sep 09 08:05:12 share-0909c-dgk97-bootstrap bootkube.sh[1730]: Waiting for etcd cluster...
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: https://etcd-0.share-0909c.qe.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.share-0909c.qe.rhcloud.com on 192.168.0.12:53: no such host
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: https://etcd-1.share-0909c.qe.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-1.share-0909c.qe.rhcloud.com on 192.168.0.12:53: no such host
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: https://etcd-2.share-0909c.qe.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-2.share-0909c.qe.rhcloud.com on 192.168.0.12:53: no such host
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: Error: unhealthy cluster
Sep 09 08:15:16 share-0909c-dgk97-bootstrap bootkube.sh[1730]: etcd cluster up. Killing etcd certificate signer...

Expected results:
After "Error: unhealthy cluster", bootkube.sh should wait for another round instead of reporting "etcd cluster up".

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
The issue seems to be that the etcd image reports success via its exit code even though the etcd cluster did not come up. We're running the `etcdctl endpoint health` command [1]. This may be an issue with etcd itself.

[1] https://github.com/openshift/installer/blob/b5dbb46b7e97d2c63333048f055dd518aa01eb10/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L256-L273
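To illustrate why the exit code matters here, below is a minimal sketch of a retry loop that only proceeds once the health-check command exits 0. It is not the code from bootkube.sh [1]; the function name `wait_for_healthy` and the variables `RETRY_DELAY` and `MAX_ATTEMPTS` are hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the installer's code: rerun a health-check
# command until it exits 0. If the command wrongly exits 0 on failure,
# this loop ends immediately, which matches the symptom reported above.
RETRY_DELAY="${RETRY_DELAY:-5}"     # seconds between attempts (illustrative)
MAX_ATTEMPTS="${MAX_ATTEMPTS:-0}"   # 0 means retry forever (illustrative)

wait_for_healthy() {
    local attempt=0
    # "$@" is the health-check command; only its exit status is inspected.
    until "$@"; do
        attempt=$((attempt + 1))
        if [ "$MAX_ATTEMPTS" -gt 0 ] && [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            echo "health check still failing after ${attempt} attempts" >&2
            return 1
        fi
        echo "health check failed. Retrying in ${RETRY_DELAY} seconds..."
        sleep "$RETRY_DELAY"
    done
    echo "health check passed"
}
```

A caller would pass the real check as arguments, for example the container invocation that wraps `etcdctl endpoint health`. The loop is only as reliable as the exit status it receives, so a wrapper that swallows a non-zero status defeats it.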
*** This bug has been marked as a duplicate of bug 1741157 ***
Reopening this because it sometimes blocks our installations on bare metal and OpenStack, and we need to track when the patch will land in OpenShift.
Since you reopened this issue, can you provide details on how it is not a duplicate of the already fixed bz https://bugzilla.redhat.com/show_bug.cgi?id=1741157? What are we tracking? Is this issue not fixed by bz 1741157?
Verified on podman 1.4.2-stable2

Sep 25 06:02:54 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: Starting etcd certificate signer...
Sep 25 06:02:59 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: 38be6350ff30b8d0edc3cb3f154bbb092449af33f91ebc91be07cb7783353e6f
Sep 25 06:02:59 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: Waiting for etcd cluster...
Sep 25 06:13:05 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-1.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-1.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:13:05 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-0.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:13:05 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-2.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-2.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:13:05 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: Error: unhealthy cluster
Sep 25 06:13:05 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: etcdctl failed. Retrying in 5 seconds...
Sep 25 06:23:11 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-0.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:23:11 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-1.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-1.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:23:11 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-2.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-2.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:23:11 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: Error: unhealthy cluster
Sep 25 06:23:11 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: etcdctl failed. Retrying in 5 seconds...
Sep 25 06:33:17 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-1.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-1.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:33:17 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-2.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-2.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:33:17 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-0.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:33:17 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: Error: unhealthy cluster
Sep 25 06:33:17 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: etcdctl failed. Retrying in 5 seconds...
Sep 25 06:43:22 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-1.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-1.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:43:22 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-2.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-2.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:43:22 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: https://etcd-0.wjpurebm.qe.devcluster.openshift.com:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.wjpurebm.qe.devcluster.openshift.com on 147.75.207.208:53: server misbehaving
Sep 25 06:43:22 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: Error: unhealthy cluster
Sep 25 06:43:23 bootstrap.wjpurebm.qe.devcluster.openshift.com bootkube.sh[1829]: etcdctl failed. Retrying in 5 seconds...

[core@bootstrap ~]$ podman version
Version:            1.4.2-stable2
RemoteAPI Version:  1
Go Version:         go1.12.8
OS/Arch:            linux/amd64

[core@bootstrap ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1506a17d6f21d253cff1b3f84da3a8d9f49e76b8f23bd4eead2487ed003bd63f
    CustomOrigin: Image generated via coreos-assembler
    Version: 42.80.20190923.1 (2019-09-23T19:53:33Z)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922