Bug 1886134 - Need to set GODEBUG=x509ignoreCN=0 in initrd
Summary: Need to set GODEBUG=x509ignoreCN=0 in initrd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Nikita Dubrovskii (IBM)
QA Contact: Michael Nguyen
URL:
Whiteboard: non-multi-arch, bootimage
Depends On:
Blocks: 1899289
Reported: 2020-10-07 17:39 UTC by Scott Dodson
Modified: 2024-03-25 16:39 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1899289 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:23:52 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift os pull 441 0 None closed overlay.d: Add GODEBUG=x509ignoreCN=0 to systemd DefaultEnvironment 2021-02-18 07:26:53 UTC
Github openshift os pull 747 0 None open Add test: verify environment variable `GODEBUG=x509ignoreCN=0` in initrd 2022-03-17 03:32:52 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:24:25 UTC

Description Scott Dodson 2020-10-07 17:39:25 UTC
This bug was initially created as a copy of Bug #1882191

I am copying this bug because: 

The root cause of this bug is that they had a registry with a certificate that fails validation due to this change. The bug that this was copied from resolves part of that problem by setting the GODEBUG environment variable system-wide via systemd. However, that still leaves one gap: Ignition, which runs in the initrd and may similarly interface with external endpoints that present certificates that fail this validation.

Therefore, we should set the environment variable in the initrd as well.

This seems like a relatively small gap to close so I don't believe that this should be a 4.6 GA blocker but it'd be nice to get it fixed in early 4.6.z.



Description of problem:
Performing an OCP 4.6 installation in a restricted network on zVM fails.  The 

Version-Release number of selected component (if applicable):
RHCOS 4.6.0-0.nightly-s390x-2020-09-10-112115
OCP 4.6.0-0.nightly-s390x-2020-09-22-223822

How reproducible:
Consistently

Steps to Reproduce:
1. Follow steps to configure the mirror host on bastion:
https://docs.openshift.com/container-platform/4.5/installing/install_config/installing-restricted-networks-preparations.html
2. Install cluster using restricted network steps:
https://docs.openshift.com/container-platform/4.5/installing/installing_bare_metal/installing-restricted-networks-bare-metal.html#installing-restricted-networks-bare-metal
3. IPL the bootstrap and cluster nodes.

Actual results: Bootstrap, master and worker nodes all start.  However, the master nodes never become Ready:

[root@OSPAMGR2 ~]# oc get nodes
NAME                                   STATUS     ROLES    AGE     VERSION
master-0.ospamgr2-sep22.zvmocp.notld   NotReady   master   4h1m    v1.19.0+8a39924
master-1.ospamgr2-sep22.zvmocp.notld   NotReady   master   3h56m   v1.19.0+8a39924
master-2.ospamgr2-sep22.zvmocp.notld   NotReady   master   3h48m   v1.19.0+8a39924

This prevents the worker nodes from starting.  The bootkube.service log reports:

Sep 23 23:02:41 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:41.319432       1 reflector.go:251] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to watch *v1.Pod: Get "https://localhost:6443/api/v1/pods?watch=true": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:42 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:42.325119       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:43 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:43.327963       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused
Sep 23 23:02:44 bootstrap-0.ospamgr2-sep22.zvmocp.notld bootkube.sh[19435]: E0923 23:02:44.332599       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get "https://localhost:6443/api/v1/pods": dial tcp [::1]:6443: connect: connection refused


Expected results: Master and worker nodes start successfully


Additional info:

Comment 1 Scott Dodson 2020-10-07 17:44:28 UTC
Sorry, I forgot to copy/paste what "this change" is. I'm referring to https://golang.google.cn/doc/go1.15#commonname
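
For context, Go 1.15 stopped treating a certificate's CommonName as a host name when no Subject Alternative Names are present, and x509ignoreCN=0 temporarily restores the old behavior. A rough way to tell whether an endpoint's certificate is affected is to look for the SAN extension in its text dump. The helper name and the registry host/port below are placeholders for illustration, not taken from this bug:

```shell
#!/bin/bash
# Hypothetical helper (not part of this bug or of RHCOS): reads the text
# dump of a certificate on stdin and reports whether it carries a
# Subject Alternative Name extension.
cert_has_san() {
    if grep -q 'X509v3 Subject Alternative Name'; then
        echo "has SAN"
    else
        # No SAN extension: Go 1.15+ rejects this unless x509ignoreCN=0 is set.
        echo "CN only"
        return 1
    fi
}

# Example use against a mirror registry (host and port are placeholders):
#   openssl s_client -connect registry.example.com:5000 </dev/null 2>/dev/null \
#       | openssl x509 -noout -text | cert_has_san
```

A "CN only" result suggests the certificate should be reissued with SANs; the GODEBUG setting is only a temporary workaround.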

Comment 3 Scott Dodson 2020-10-07 19:19:25 UTC
https://github.com/openshift/machine-config-operator/pull/2141#issuecomment-704989651 is where the discussion as to this need arose

Comment 5 Micah Abbott 2020-10-21 14:00:32 UTC
@Benjamin do you think it is reasonable to set the GODEBUG variable for just Ignition in the initrd?

Comment 6 Micah Abbott 2020-10-21 14:00:52 UTC
Setting UpcomingSprint keyword as there are other higher priority tasks and issues being worked on.

Comment 7 Benjamin Gilbert 2020-10-21 19:42:23 UTC
Yes, I do.

Comment 8 Colin Walters 2020-11-11 22:52:55 UTC
xref https://github.com/openshift/oc/pull/628#issuecomment-725698791

Note this requires a bootimage update; we already have a request for one to pull in the fix for https://github.com/coreos/fedora-coreos-config/pull/733 too.

Comment 12 Michael Nguyen 2020-11-24 20:12:59 UTC
I do not have access to a z system.  I verified that OCP 4.7.0-0.nightly-2020-11-24-113830 has the dracut module and RHCOS 47.83.202011240323-0 has the environment variable set in the initramfs.


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-24-113830   True        False         51m     Cluster version is 4.7.0-0.nightly-2020-11-24-113830
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-48.us-west-2.compute.internal    Ready    worker   67m   v1.19.2+13d6aa9
ip-10-0-146-93.us-west-2.compute.internal    Ready    master   76m   v1.19.2+13d6aa9
ip-10-0-169-22.us-west-2.compute.internal    Ready    worker   67m   v1.19.2+13d6aa9
ip-10-0-177-164.us-west-2.compute.internal   Ready    master   75m   v1.19.2+13d6aa9
ip-10-0-214-17.us-west-2.compute.internal    Ready    worker   68m   v1.19.2+13d6aa9
ip-10-0-221-212.us-west-2.compute.internal   Ready    master   76m   v1.19.2+13d6aa9
$ oc debug node/ip-10-0-134-48.us-west-2.compute.internal 
Starting pod/ip-10-0-134-48us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ls
bin   dev  home  lib64	mnt  ostree  root  sbin  sys	  tmp  var
boot  etc  lib	 media	opt  proc    run   srv	 sysroot  usr
sh-4.4# cat /usr/lib/dracut/modules.d/10
10coreos-sysctl/    10i18n/             10ignition-godebug/ 
sh-4.4# cat /usr/lib/dracut/modules.d/10ignition-godebug/*
# https://bugzilla.redhat.com/show_bug.cgi?id=1886134
# Because Ignition which runs in the initrd may interface with external endpoints,
# we should set the environment variable in the initrd
[Manager]
DefaultEnvironment=GODEBUG=x509ignoreCN=0
#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

depends() {
    echo systemd
}

install() {
    inst_simple "$moddir/10-default-env-godebug.conf" \
        "/etc/systemd/system.conf.d/10-default-env-godebug.conf"
}
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc debug node/ip-10-0-146-93.us-west-2.compute.internal 
Starting pod/ip-10-0-146-93us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /usr/lib/dracut/modules.d/10ignition-godebug/*
# https://bugzilla.redhat.com/show_bug.cgi?id=1886134
# Because Ignition which runs in the initrd may interface with external endpoints,
# we should set the environment variable in the initrd
[Manager]
DefaultEnvironment=GODEBUG=x509ignoreCN=0
#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

depends() {
    echo systemd
}

install() {
    inst_simple "$moddir/10-default-env-godebug.conf" \
        "/etc/systemd/system.conf.d/10-default-env-godebug.conf"
}
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
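
The manual check above could be scripted roughly as follows. This is only a sketch: the function name is made up for illustration, and it assumes the drop-in has the same one-line DefaultEnvironment form shown in the cat output.

```shell
#!/bin/bash
# Hypothetical check: succeeds if the given systemd drop-in sets
# GODEBUG=x509ignoreCN=0 through an uncommented DefaultEnvironment= line.
dropin_sets_godebug() {
    local conf=$1
    grep -Eq '^DefaultEnvironment=.*GODEBUG=x509ignoreCN=0' "$conf"
}

# The path below is the destination used by the module-setup.sh shown above;
# inside the dracut module directory the file is 10-default-env-godebug.conf.
#   dropin_sets_godebug /etc/systemd/system.conf.d/10-default-env-godebug.conf \
#       && echo "GODEBUG default is configured"
```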


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


:/# 
:/# 
:/# env
DRACUT_SYSTEMD=1
rflags=
INVOCATION_ID=1ab6d4613bad44678bcc88fba29c164f
hook=emergency
PWD=/
root=
fstype=auto
HOME=/
JOURNAL_STREAM=9:13127
UDEVVERSION=239
hookdir=/lib/dracut/hooks
NEWROOT=/sysroot
DEBUG_MEM_LEVEL=0
action=Boot
TERM=vt220
GODEBUG=x509ignoreCN=0
SHLVL=1
RD_DEBUG=no
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
PS1=:${PWD}# 
_rdshell_name=dracut
_=/usr/bin/env
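
To check the same thing without reading the env output by eye, something like the sketch below could be used. The helper name is hypothetical. GODEBUG holds a comma-separated list of name=value settings, so the match is done per setting rather than on the whole value.

```shell
#!/bin/bash
# Hypothetical helper: reads `env` output on stdin and succeeds if the
# GODEBUG variable contains the setting passed as $1.
env_godebug_has() {
    local want=$1 line value
    while IFS= read -r line; do
        case $line in
            GODEBUG=*)
                value=${line#GODEBUG=}
                # Wrap in commas so each setting is matched as a whole item.
                case ",$value," in
                    *",$want,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# In the emergency shell:
#   env | env_godebug_has x509ignoreCN=0 && echo "x509ignoreCN=0 is set"
```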

Comment 15 errata-xmlrpc 2021-02-24 15:23:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

