
Bug 1886134

Summary: Need to set GODEBUG=x509ignoreCN=0 in initrd
Product: OpenShift Container Platform Reporter: Scott Dodson <sdodson>
Component: RHCOS Assignee: Nikita Dubrovskii (IBM) <ndubrovs>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: medium Docs Contact:
Priority: high    
Version: 4.6 CC: bbreard, bgilbert, danili, hhei, imcleod, jligon, jnordell, miabbott, nstielau, slowrie, smilner, sreber, tmicheli, walters, wvoesch
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: non-multi-arch, bootimage
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1899289 Environment:
Last Closed: 2021-02-24 15:23:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1899289    

Description Scott Dodson 2020-10-07 17:39:25 UTC
This bug was initially created as a copy of Bug #1882191

I am copying this bug because: 

The root cause of this bug is that they had a registry with a certificate that fails validation due to this change. The bug that this was copied from resolves part of that problem by setting the GODEBUG environment variable system-wide via systemd. However, that still leaves one gap: Ignition, which runs in the initrd and may similarly interface with external endpoints that present certificates that fail this validation.

Therefore, we should set the environment variable in the initrd as well.

This seems like a relatively small gap to close, so I don't believe that this should be a 4.6 GA blocker, but it'd be nice to get it fixed in early 4.6.z.



Description of problem:
Performing an OCP 4.6 installation in a restricted network on zVM fails.  The 

Version-Release number of selected component (if applicable):
RHCOS 4.6.0-0.nightly-s390x-2020-09-10-112115
OCP 4.6.0-0.nightly-s390x-2020-09-22-223822

How reproducible:
Consistently

Steps to Reproduce:
1. Follow steps to configure the mirror host on bastion:
https://docs.openshift.com/container-platform/4.5/installing/install_config/installing-restricted-networks-preparations.html
2. Install cluster using restricted network steps:
https://docs.openshift.com/container-platform/4.5/installing/installing_bare_metal/installing-restricted-networks-bare-metal.html#installing-restricted-networks-bare-metal
3. IPL the bootstrap and cluster nodes.

Actual results: Bootstrap, master and worker nodes all start.  However, the master nodes never become Ready:

[root@OSPAMGR2 ~]# oc get nodes
NAME                                   STATUS     ROLES    AGE     VERSION
master-0.ospamgr2-sep22.zvmocp.notld   NotReady   master   4h1m    v1.19.0+8a39924
master-1.ospamgr2-sep22.zvmocp.notld   NotReady   master   3h56m   v1.19.0+8a39924
master-2.ospamgr2-sep22.zvmocp.notld   NotReady   master   3h48m   v1.19.0+8a39924

This prevents the worker nodes from starting.  The bootkube.service log reports the following:

Sep 23 23:02:41 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:41.319432       1 reflector.go:251] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to watch *v1.Pod: Get "https://localhost:6443/api/v1/pods?watch=true": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:42 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:42.325119       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:43 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:43.327963       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:44 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:44.332599       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused


Expected results: Master and worker nodes start successfully


Additional info:

Comment 1 Scott Dodson 2020-10-07 17:44:28 UTC
Sorry, I forgot to copy/paste what "this change" is. I'm referring to https://golang.google.cn/doc/go1.15#commonname
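
For triage, here is a minimal sketch of a check for certificates affected by that change. Per the Go 1.15 release notes, the CommonName field is no longer treated as a host name when no Subject Alternative Names are present, unless GODEBUG=x509ignoreCN=0 is set. The helper names `cert_has_san` and `classify_cert` and the file name `registry.crt` are hypothetical and not part of any fix discussed here. The functions only look for the SAN extension in the text that `openssl x509 -noout -text` prints; they do not check that any SAN entry matches the host being contacted.

```shell
# Reads the text dump of a certificate (as printed by
# `openssl x509 -noout -text`) on stdin and succeeds only if the dump
# mentions a Subject Alternative Name extension.
cert_has_san() {
    grep -q 'X509v3 Subject Alternative Name'
}

# Prints a one-line verdict for a certificate text dump on stdin.
# "CN only" means the certificate depends on the legacy CommonName
# fallback and would need GODEBUG=x509ignoreCN=0 with Go 1.15.
classify_cert() {
    if cert_has_san; then
        echo "has SAN"
    else
        echo "CN only"
    fi
}
```

Assumed usage against a locally saved copy of the registry certificate: `openssl x509 -in registry.crt -noout -text | classify_cert`.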

Comment 3 Scott Dodson 2020-10-07 19:19:25 UTC
https://github.com/openshift/machine-config-operator/pull/2141#issuecomment-704989651 is where the discussion as to this need arose

Comment 5 Micah Abbott 2020-10-21 14:00:32 UTC
@Benjamin do you think it is reasonable to set the GODEBUG variable for just Ignition in the initrd?

Comment 6 Micah Abbott 2020-10-21 14:00:52 UTC
Setting UpcomingSprint keyword as there are other higher priority tasks and issues being worked on.

Comment 7 Benjamin Gilbert 2020-10-21 19:42:23 UTC
Yes, I do.

Comment 8 Colin Walters 2020-11-11 22:52:55 UTC
xref https://github.com/openshift/oc/pull/628#issuecomment-725698791

Note this requires a bootimage update; we already have a request for one to pull in the fix for https://github.com/coreos/fedora-coreos-config/pull/733 too.

Comment 12 Michael Nguyen 2020-11-24 20:12:59 UTC
I do not have access to a z system.  I verified that OCP 4.7.0-0.nightly-2020-11-24-113830 has the dracut module and RHCOS 47.83.202011240323-0 has the environment variable set in the initramfs.


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-24-113830   True        False         51m     Cluster version is 4.7.0-0.nightly-2020-11-24-113830
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-48.us-west-2.compute.internal    Ready    worker   67m   v1.19.2+13d6aa9
ip-10-0-146-93.us-west-2.compute.internal    Ready    master   76m   v1.19.2+13d6aa9
ip-10-0-169-22.us-west-2.compute.internal    Ready    worker   67m   v1.19.2+13d6aa9
ip-10-0-177-164.us-west-2.compute.internal   Ready    master   75m   v1.19.2+13d6aa9
ip-10-0-214-17.us-west-2.compute.internal    Ready    worker   68m   v1.19.2+13d6aa9
ip-10-0-221-212.us-west-2.compute.internal   Ready    master   76m   v1.19.2+13d6aa9
$ oc debug node/ip-10-0-134-48.us-west-2.compute.internal 
Starting pod/ip-10-0-134-48us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ls
bin   dev  home  lib64	mnt  ostree  root  sbin  sys	  tmp  var
boot  etc  lib	 media	opt  proc    run   srv	 sysroot  usr
sh-4.4# cat /usr/lib/dracut/modules.d/10
10coreos-sysctl/    10i18n/             10ignition-godebug/ 
sh-4.4# cat /usr/lib/dracut/modules.d/10ignition-godebug/*
# https://bugzilla.redhat.com/show_bug.cgi?id=1886134
# Because Ignition which runs in the initrd may interface with external endpoints,
# we should set the environment variable in the initrd
[Manager]
DefaultEnvironment=GODEBUG=x509ignoreCN=0
#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

depends() {
    echo systemd
}

install() {
    inst_simple "$moddir/10-default-env-godebug.conf" \
        "/etc/systemd/system.conf.d/10-default-env-godebug.conf"
}
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc debug node/ip-10-0-146-93.us-west-2.compute.internal 
Starting pod/ip-10-0-146-93us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /usr/lib/dracut/modules.d/10ignition-godebug/*
# https://bugzilla.redhat.com/show_bug.cgi?id=1886134
# Because Ignition which runs in the initrd may interface with external endpoints,
# we should set the environment variable in the initrd
[Manager]
DefaultEnvironment=GODEBUG=x509ignoreCN=0
#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

depends() {
    echo systemd
}

install() {
    inst_simple "$moddir/10-default-env-godebug.conf" \
        "/etc/systemd/system.conf.d/10-default-env-godebug.conf"
}
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
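
The manual `cat` checks above could be scripted roughly as follows. This is only a sketch: the function name `default_environment` is hypothetical, and the path in the usage note is assembled from the module directory and the file name that appear in the output above.

```shell
# Prints the value of the first DefaultEnvironment= line found in the
# systemd drop-in file given as $1. Commented-out lines do not match
# because the pattern is anchored at the start of the line.
# Prints nothing if the file has no such line.
default_environment() {
    sed -n 's/^DefaultEnvironment=//p' "$1" | head -n 1
}
```

Assumed usage from a `chroot /host` shell on a node: `default_environment /usr/lib/dracut/modules.d/10ignition-godebug/10-default-env-godebug.conf`, which for the file contents shown above would be expected to print `GODEBUG=x509ignoreCN=0`.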


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


:/# 
:/# 
:/# env
DRACUT_SYSTEMD=1
rflags=
INVOCATION_ID=1ab6d4613bad44678bcc88fba29c164f
hook=emergency
PWD=/
root=
fstype=auto
HOME=/
JOURNAL_STREAM=9:13127
UDEVVERSION=239
hookdir=/lib/dracut/hooks
NEWROOT=/sysroot
DEBUG_MEM_LEVEL=0
action=Boot
TERM=vt220
GODEBUG=x509ignoreCN=0
SHLVL=1
RD_DEBUG=no
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
PS1=:${PWD}# 
_rdshell_name=dracut
_=/usr/bin/env
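
Rather than reading the `env` dump by eye, the same check can be sketched as a small shell function. GODEBUG holds a comma-separated list of name=value settings, so the function looks for the exact `x509ignoreCN=0` entry anywhere in that list. The function name `godebug_allows_cn` is hypothetical.

```shell
# Succeeds if the GODEBUG value passed as $1 contains the exact
# setting x509ignoreCN=0 among its comma-separated entries.
# Wrapping the value in commas lets one pattern match the entry at
# the start, middle, or end of the list.
godebug_allows_cn() {
    case ",$1," in
        *,x509ignoreCN=0,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Assumed usage in the emergency shell: `godebug_allows_cn "$GODEBUG" && echo "CommonName fallback enabled"`.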

Comment 15 errata-xmlrpc 2021-02-24 15:23:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633