+++ This bug was initially created as a clone of Bug #1886134 +++

This bug was initially created as a copy of Bug #1882191

I am copying this bug because:

The root cause of this bug is that they had a registry with a certificate that fails due to this change. The bug that this was copied from resolves part of that problem by setting the GODEBUG environment variable system-wide via systemd. However, that still leaves one gap: Ignition, which runs in the initrd and may similarly interface with external endpoints whose certificates fail this validation. Therefore, we should set the environment variable in the initrd as well.

This seems like a relatively small gap to close, so I don't believe that this should be a 4.6 GA blocker, but it'd be nice to get it fixed in early 4.6.z.

Description of problem:
Performing an OCP 4.6 installation in a restricted network on zVM fails.

The Version-Release number of selected component (if applicable):
RHCOS 4.6.0-0.nightly-s390x-2020-09-10-112115
OCP 4.6.0-0.nightly-s390x-2020-09-22-223822

How reproducible:
Consistently

Steps to Reproduce:
1. Follow steps to configure the mirror host on bastion: https://docs.openshift.com/container-platform/4.5/installing/install_config/installing-restricted-networks-preparations.html
2. Install cluster using restricted network steps: https://docs.openshift.com/container-platform/4.5/installing/installing_bare_metal/installing-restricted-networks-bare-metal.html#installing-restricted-networks-bare-metal
3. IPL the bootstrap and cluster nodes.

Actual results:
Bootstrap, master and worker nodes all start. However, the master nodes never become Ready:

[root@OSPAMGR2 ~]# oc get nodes
NAME                                   STATUS     ROLES    AGE     VERSION
master-0.ospamgr2-sep22.zvmocp.notld   NotReady   master   4h1m    v1.19.0+8a39924
master-1.ospamgr2-sep22.zvmocp.notld   NotReady   master   3h56m   v1.19.0+8a39924
master-2.ospamgr2-sep22.zvmocp.notld   NotReady   master   3h48m   v1.19.0+8a39924

This prevents the worker nodes from starting.
The bootkube.service reports this:

Sep 23 23:02:41 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:41.319432 1 reflector.go:251] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to watch *v1.Pod: Get "https://localhost:6443/api/v1/pods?watch=true": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:42 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:42.325119 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:43 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:43.327963 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:44 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:44.332599 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused

Expected results:
Master and worker nodes start successfully

Additional info:

--- Additional comment from Scott Dodson on 2020-10-07 17:44:28 UTC ---

Sorry, I forgot to copy/paste what "this change" is. I'm referring to https://golang.google.cn/doc/go1.15#commonname

--- Additional comment from Eric Paris on 2020-10-07 18:00:20 UTC ---

This bug has set a target release without specifying a severity. As part of triage when determining the importance of bugs a severity should be specified. Since these bugs have not been properly triaged we are removing the target release. Teams will need to add a severity before setting the target release again.
--- Additional comment from Scott Dodson on 2020-10-07 19:19:25 UTC ---

https://github.com/openshift/machine-config-operator/pull/2141#issuecomment-704989651 is where the discussion as to this need arose

--- Additional comment from Steve Milner on 2020-10-07 19:36:02 UTC ---

See the note on zipl in https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kernel-command-line-parameters_managing-monitoring-and-updating-the-kernel#understanding-kernel-command-line-parameters_configuring-kernel-command-line-parameters

It _may_ make sense to set this directly in the unit or Ignition code rather than setting it in initrd in general.

--- Additional comment from Micah Abbott on 2020-10-21 14:00:32 UTC ---

@Benjamin do you think it is reasonable to set the GODEBUG variable for just Ignition in the initrd?

--- Additional comment from Micah Abbott on 2020-10-21 14:00:52 UTC ---

Setting UpcomingSprint keyword as there are other higher priority tasks and issues being worked on.

--- Additional comment from Benjamin Gilbert on 2020-10-21 19:42:23 UTC ---

Yes, I do.

--- Additional comment from Colin Walters on 2020-11-11 22:52:55 UTC ---

xref https://github.com/openshift/oc/pull/628#issuecomment-725698791

Note this requires a bootimage update; we already have a request for one to pull in the fix for https://github.com/coreos/fedora-coreos-config/pull/733 too.
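For anyone checking whether a mirror registry is affected by the Go 1.15 CommonName change referenced earlier in this thread: a certificate is affected when it has no Subject Alternative Name extension and relies on the CN alone. The following is a minimal sketch, assuming `openssl` is available on the bastion. `has_san` is a hypothetical helper name and `registry.crt` is a placeholder path; neither comes from this bug.

```shell
# has_san: hypothetical helper, not part of the fix tracked in this bug.
# Reads the text dump of a certificate (as printed by
# `openssl x509 -noout -text`) on stdin and succeeds only if the dump
# contains a Subject Alternative Name extension.
has_san() {
    grep -q 'X509v3 Subject Alternative Name'
}

# Example usage against a PEM file (placeholder path):
#   if openssl x509 -in registry.crt -noout -text | has_san; then
#       echo "SAN present"
#   else
#       echo "CN only: affected by the Go 1.15 change"
#   fi
```

A certificate that fails this check is better reissued with SANs; the GODEBUG setting discussed in this bug only restores the old CN matching as a stopgap.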
I don't have access to z systems but I verified that the dracut module is in 4.6.0-0.nightly-2020-11-22-160856 and RHCOS 46.82.202011210620-0 has the environment variable set.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-11-22-160856   True        False         67s     Cluster version is 4.6.0-0.nightly-2020-11-22-160856

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-130-200.us-west-2.compute.internal   Ready    master   26m   v1.19.0+43983cd
ip-10-0-137-216.us-west-2.compute.internal   Ready    worker   16m   v1.19.0+43983cd
ip-10-0-176-34.us-west-2.compute.internal    Ready    master   25m   v1.19.0+43983cd
ip-10-0-189-58.us-west-2.compute.internal    Ready    worker   16m   v1.19.0+43983cd
ip-10-0-196-76.us-west-2.compute.internal    Ready    master   26m   v1.19.0+43983cd
ip-10-0-209-11.us-west-2.compute.internal    Ready    worker   17m   v1.19.0+43983cd

$ oc debug node/ip-10-0-130-200.us-west-2.compute.internal
Starting pod/ip-10-0-130-200us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /usr/lib/dracut/modules.d/10ignition-godebug/*
# https://bugzilla.redhat.com/show_bug.cgi?id=1886134
# Because Ignition which runs in the initrd may interface with external endpoints,
# we should set the environment variable in the initrd
[Manager]
DefaultEnvironment=GODEBUG=x509ignoreCN=0
#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

depends() {
    echo systemd
}

install() {
    inst_simple "$moddir/10-default-env-godebug.conf" \
        "/etc/systemd/system.conf.d/10-default-env-godebug.conf"
}
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
:/#
:/#
:/# env
DRACUT_SYSTEMD=1
rflags=
INVOCATION_ID=1df0a0730de5400dbc5a297437e483eb
hook=emergency
PWD=/
root=
fstype=auto
HOME=/
JOURNAL_STREAM=9:13527
UDEVVERSION=239
hookdir=/lib/dracut/hooks
NEWROOT=/sysroot
DEBUG_MEM_LEVEL=0
action=Boot
TERM=vt220
GODEBUG=x509ignoreCN=0
SHLVL=1
RD_DEBUG=no
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
PS1=:${PWD}# 
_rdshell_name=dracut
_=/usr/bin/env
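The `env` output above is the key evidence: `GODEBUG=x509ignoreCN=0` is present in the initrd environment. The same check can be sketched as a small script for use from the emergency shell or a debug pod. `godebug_has` is a hypothetical helper name, and the sketch assumes GODEBUG holds a comma-separated list of name=value settings, as the Go runtime documents.

```shell
# godebug_has SETTING: hypothetical helper. Succeeds if SETTING
# (for example x509ignoreCN=0) is one of the comma-separated entries
# in $GODEBUG. Wrapping both sides in commas avoids partial matches
# such as x509ignoreCN=01.
godebug_has() {
    case ",${GODEBUG:-}," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if godebug_has x509ignoreCN=0; then
    echo "x509ignoreCN=0 is set"
else
    echo "x509ignoreCN=0 is NOT set"
fi
```

The function uses only POSIX shell constructs, so it should also work in a minimal initrd shell.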
Fixed by https://github.com/openshift/installer/pull/4422.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.6 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5115