1828112 – [OSP] Got "invalid argument \"\\\"\\\"\" for \"--address\" flag: \"\\\"\\\"\" is not a valid IP address" for kubelet service during rhel scaleup scenario

Bug 1828112 - [OSP] Got "invalid argument \"\\\"\\\"\" for \"--address\" flag: \"\\\"\\\"\" is not a valid IP address" for kubelet service during rhel scaleup scenario

Summary: [OSP] Got "invalid argument \"\\\"\\\"\" for \"--address\" flag: \"\\\"\\\"\"...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Machine Config Operator
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Kirsten Garrison
QA Contact:	weiwei jiang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-04-27 03:05 UTC by weiwei jiang
Modified:	2020-07-13 17:31 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-13 17:31:39 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
OSP rhel scaleup failed log (576.63 KB, text/plain) 2020-05-15 01:57 UTC, weiwei jiang	no flags	Details
OSP rhel scaleup failed log with vvv log level (1.20 MB, text/plain) 2020-05-15 03:05 UTC, weiwei jiang	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 1736	0	None	closed	Bug 1828112: remove quotes around kubelet_node_ip to account for rhel7	2020-11-11 17:44:37 UTC
Red Hat Product Errata	RHBA-2020:2409	0	None	None	None	2020-07-13 17:31:56 UTC

Description weiwei jiang 2020-04-27 03:05:45 UTC

Description of problem: 
During rhel scaleup testing with UPI on OSP13 cluster, got "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: F0426 06:07:41.766079    9153 server.go:157] invalid argument \"\\\"\\\"\" for \"--address\" flag: \"\\\"\\\"\" is not a valid IP address" error message. 
After run `systemctl daemon-reload`, got "Apr 26 06:59:46 wj45uos426b-v48vz-rhel-0 hyperkube[16729]: F0426 06:59:46.522585   16729 server.go:157] invalid argument "\"192.168.0.38\"" for "--address" flag: "\"192.168.0.38\"" is not a valid IP address". 
 
 
 
Version-Release number of the following components: 
4.5.0-0.nightly-2020-04-25-170442 
openshift-ansible-4.5.0-202004250717.git.1.9372220.el7.noarch.rpm 
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. Setup a UPI on OSP cluster 
2. Scaleup a rhel 7 via openshift-ansible 
3. Check the playbook log 
 
Actual results: 
TASK [openshift_node : Gather Debug - Get service status] ********************** 
Sunday 26 April 2020  18:07:43 +0800 (0:03:09.960)       0:09:16.080 **********  
ok: [10.0.97.130] => (item=cri-o) => {"ansible_loop_var": "item", "changed": false, "cmd": ["systemctl", "status", "cri-o"], "delta": "0:00:00.024929", "end": "2020-04-26 06:07:43.679356", "item": "cri-o", "rc": 0, "start": "2020-04-26 06:07:43.654427", "stderr": "Warning: crio.service changed on disk. Run 'systemctl daemon-reload' to reload units.", "stderr_lines": ["Warning: crio.service changed on disk. Run 'systemctl daemon-reload' to reload units."], "stdout": "● crio.service - Open Container Initiative Daemon\n   Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; vendor preset: disabled)\n  Drop-In: /etc/systemd/system/crio.service.d\n           └─10-default-env.conf\n   Active: active (running) since Sun 2020-04-26 06:04:36 EDT; 3min 6s ago\n     Docs: https://github.com/cri-o/cri-o\n Main PID: 3403 (crio)\n   Memory: 45.6M\n   CGroup: /system.slice/crio.service\n           └─3403 /usr/bin/crio --enable-metrics=true --metrics-port=9537\n\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 systemd[1]: Starting Open Container Initiative Daemon...\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.634070089-04:00\" level=info msg=\"Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_NET_RAW, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_SYS_CHROOT, CAP_KILL\"\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.636022551-04:00\" level=info msg=\"Using conmon executable: /usr/libexec/crio/conmon\"\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.636122265-04:00\" level=info msg=\"No seccomp profile specified, using the internal default\"\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.636131967-04:00\" level=info msg=\"AppArmor is disabled by the system or at CRI-O build-time\"\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.638359673-04:00\" level=info msg=\
"Update default CNI network name to \"\nApr 26 06:04:36 wj45uos426b-v48vz-rhel-0 systemd[1]: Started Open Container Initiative Daemon.", "stdout_lines": ["● crio.service - Open Container Initiative Daemon", "   Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; vendor preset: disabled)", "  Drop-In: /etc/systemd/system/crio.service.d", "           └─10-default-env.conf", "   Active: active (running) since Sun 2020-04-26 06:04:36 EDT; 3min 6s ago", "     Docs: https://github.com/cri-o/cri-o", " Main PID: 3403 (crio)", "   Memory: 45.6M", "   CGroup: /system.slice/crio.service", "           └─3403 /usr/bin/crio --enable-metrics=true --metrics-port=9537", "", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 systemd[1]: Starting Open Container Initiative Daemon...", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.634070089-04:00\" level=info msg=\"Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_NET_RAW, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_SYS_CHROOT, CAP_KILL\"", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.636022551-04:00\" level=info msg=\"Using conmon executable: /usr/libexec/crio/conmon\"", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.636122265-04:00\" level=info msg=\"No seccomp profile specified, using the internal default\"", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.636131967-04:00\" level=info msg=\"AppArmor is disabled by the system or at CRI-O build-time\"", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time=\"2020-04-26 06:04:36.638359673-04:00\" level=info msg=\"Update default CNI network name to \"", "Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 systemd[1]: Started Open Container Initiative Daemon."]} 
failed: [10.0.97.130] (item=kubelet) => {"ansible_loop_var": "item", "changed": false, "cmd": ["systemctl", "status", "kubelet"], "delta": "0:00:00.011062", "end": "2020-04-26 06:07:43.866880", "item": "kubelet", "msg": "non-zero return code", "rc": 3, "start": "2020-04-26 06:07:43.855818", "stderr": "Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.", "stderr_lines": ["Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units."], "stdout": "● kubelet.service - Kubernetes Kubelet\n   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)\n  Drop-In: /etc/systemd/system/kubelet.service.d\n           └─10-default-env.conf\n   Active: activating (auto-restart) (Result: exit-code) since Sun 2020-04-26 06:07:41 EDT; 2s ago\n  Process: 9153 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} --node-ip=\"${KUBELET_NODE_IP}\" --address=\"${KUBELET_NODE_IP}\" --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=openstack --cloud-config=/etc/kubernetes/cloud.conf --v=${KUBELET_LOG_LEVEL} (code=exited, status=255)\n  Process: 9150 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)\n  Process: 9148 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)\n Main PID: 9153 (code=exited, status=255)\n\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --tls-cipher-suites strings                                                                                 Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be used. Possibl
e values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --tls-min-version string                                                                                    Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12, VersionTLS13 (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --tls-private-key-file string                                                                               File containing x509 private key matching --tls-cert-file. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --topology-manager-policy string                            
                                                Topology Manager policy to use. Possible values: 'none', 'best-effort', 'restricted', 'single-numa-node'. (default \"none\") (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: -v, --v Level                                                                                                   number for the log level verbosity\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --version version[=true]                                                                                    Print version information and quit\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --vmodule moduleSpec                                                                                        comma-separated list of pattern=N settings for file-filtered logging\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --volume-plugin-dir string                                                                                  The full path of the directory in which to search for additional third party volume plugins (default \"/usr/libexec/kubernetes/kubelet-plugins/volume/exec/\")\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --volume-stats-agg-period duration                                                                          Specifies interval for kubelet to calculate and cache the volume disk usage for all pods and volumes.  To disable volume calculations, set to 0. (default 1m0s) (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)\nApr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: F0426 06:07:41.766079    9153 server.go:157] invalid argument \
"\\\"\\\"\" for \"--address\" flag: \"\\\"\\\"\" is not a valid IP address", "stdout_lines": ["● kubelet.service - Kubernetes Kubelet", "   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)", "  Drop-In: /etc/systemd/system/kubelet.service.d", "           └─10-default-env.conf", "   Active: activating (auto-restart) (Result: exit-code) since Sun 2020-04-26 06:07:41 EDT; 2s ago", "  Process: 9153 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} --node-ip=\"${KUBELET_NODE_IP}\" --address=\"${KUBELET_NODE_IP}\" --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=openstack --cloud-config=/etc/kubernetes/cloud.conf --v=${KUBELET_LOG_LEVEL} (code=exited, status=255)", "  Process: 9150 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)", "  Process: 9148 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)", " Main PID: 9153 (code=exited, status=255)", "", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --tls-cipher-suites strings                                                                                 Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be used. Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_W
ITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --tls-min-version string                                                                                    Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12, VersionTLS13 (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --tls-private-key-file string                                                                               File containing x509 private key matching --tls-cert-file. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --topology-manager-policy string                                                                            Topology Manager policy to use. Possible values: 'none', 'best-effort', 'restricted', 'single-numa-node'. (default \"none\") (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)", "Apr 26 06:07:4
1 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: -v, --v Level                                                                                                   number for the log level verbosity", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --version version[=true]                                                                                    Print version information and quit", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --vmodule moduleSpec                                                                                        comma-separated list of pattern=N settings for file-filtered logging", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --volume-plugin-dir string                                                                                  The full path of the directory in which to search for additional third party volume plugins (default \"/usr/libexec/kubernetes/kubelet-plugins/volume/exec/\")", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: --volume-stats-agg-period duration                                                                          Specifies interval for kubelet to calculate and cache the volume disk usage for all pods and volumes.  To disable volume calculations, set to 0. (default 1m0s) (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)", "Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: F0426 06:07:41.766079    9153 server.go:157] invalid argument \"\\\"\\\"\" for \"--address\" flag: \"\\\"\\\"\" is not a valid IP address"]} 
...ignoring 
 
TASK [openshift_node : Gather Debug - Get complete node objects] *************** 
Sunday 26 April 2020  18:07:44 +0800 (0:00:00.504)       0:09:16.584 **********  
fatal: [10.0.97.130 -> localhost]: FAILED! => {"changed": false, "cmd": ["oc", "get", "node", "wj45uos426b-v48vz-rhel-0", "--kubeconfig=/tmp/installer-rjVwTF/auth/kubeconfig", "--output=json"], "delta": "0:00:00.127936", "end": "2020-04-26 18:07:44.765525", "msg": "non-zero return code", "rc": 1, "start": "2020-04-26 18:07:44.637589", "stderr": "Error from server (NotFound): nodes \"wj45uos426b-v48vz-rhel-0\" not found", "stderr_lines": ["Error from server (NotFound): nodes \"wj45uos426b-v48vz-rhel-0\" not found"], "stdout": "", "stdout_lines": []} 
...ignoring 
 
TASK [openshift_node : DEBUG - Failed to approve node CSRs] ******************** 
Sunday 26 April 2020  18:07:44 +0800 (0:00:00.337)       0:09:16.921 **********  
fatal: [10.0.97.130]: FAILED! => {"changed": false, "msg": "Failed to approve node-bootstrapper CSR"} 
 
PLAY RECAP ********************************************************************* 
10.0.97.130                : ok=40   changed=27   unreachable=0    failed=2    skipped=5    rescued=0    ignored=2    
localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0    
 
 
Expected results: 
Should succeed 
 
Additional info: 
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Russell Teague 2020-04-27 13:18:44 UTC

A few questionable items from the log in the description.

Warning: crio.service changed on disk. Run 'systemctl daemon-reload' to reload units.

crio journal:
Apr 26 06:04:36 wj45uos426b-v48vz-rhel-0 crio[3403]: time="2020-04-26 06:04:36.638359673-04:00" level=info msg="Update default CNI network name to "

hyperkube journal:
Apr 26 06:07:41 wj45uos426b-v48vz-rhel-0 hyperkube[9153]: F0426 06:07:41.766079    9153 server.go:157] invalid argument \
"\\"\\"" for "--address" flag: "\\"\\"" is not a valid IP address

This appears to be a network configuration issue.

Comment 2 Russell Teague 2020-04-27 14:08:23 UTC

If this is not an issue with crio, we need more details about what is going on with CNI.  Requesting the network team take a look.

Comment 3 Dan Williams 2020-05-01 18:32:44 UTC

@russel this isn't a CNI issue. It's the kubelet systemd service not getting the right IP address arguments on startup. Not sure where it's trying to read the IP from, but whatever its doing, it's doing it wrong. Over to node team.

Comment 4 Ryan Phillips 2020-05-11 16:00:43 UTC

Looks like the configuration has an extra set of quotations around the node IP addresses. Can you try and remove them and report back?

Comment 8 Harshal Patil 2020-05-14 14:22:41 UTC

Please attach logs from ansible-playbook with the -vvv flag

Comment 9 weiwei jiang 2020-05-15 01:57:19 UTC

Created attachment 1688707 [details]
OSP rhel scaleup failed log

Comment 10 weiwei jiang 2020-05-15 03:05:01 UTC

Created attachment 1688723 [details]
OSP rhel scaleup failed log with vvv log level

Comment 11 Harshal Patil 2020-05-18 07:11:24 UTC

It seems like the variable KUBELET_NODE_IP is not handled properly in the ansible playbook.

Comment 12 Russell Teague 2020-05-18 13:41:14 UTC

The openshift-ansible playbooks obtain the config from MCO and run the MCD to lay down the configuration.  The extra quotes are in the config [1] provided by MCO.  This is the same for baremetal, openstack, and vsphere.

https://github.com/openshift/machine-config-operator/blob/master/templates/worker/01-worker-kubelet/openstack/units/kubelet.yaml#L26-L27

Comment 13 Russell Teague 2020-05-18 17:43:08 UTC

This is potentially due to a difference between how systemd on RHEL8 and RHEL7 interpret unit files.

https://github.com/openshift/machine-config-operator/pull/1439#issuecomment-582093297

Comment 17 Kirsten Garrison 2020-05-19 17:50:28 UTC

Hi @Russell & @weiwei, PR merged please let me know if you need anything further.

Comment 18 weiwei jiang 2020-05-20 02:29:20 UTC

Checked with 4.5.0-0.nightly-2020-05-19-041951, and it's fixed. so moved to verified.

TASK [openshift_node : Reboot the host and wait for it to come back] ***********
Wednesday 20 May 2020  10:26:00 +0800 (0:00:00.534)       0:05:47.456 ********* 

changed: [10.0.98.71] => {"changed": true, "elapsed": 16, "rebooted": true}

TASK [openshift_node : Approve node CSRs] **************************************
Wednesday 20 May 2020  10:26:18 +0800 (0:00:18.137)       0:06:05.594 ********* 

changed: [10.0.98.71 -> localhost] => {"changed": true, "client_approve_results": ["Attempt: 1, Node wj45uos520a-vzjzr-rhel-0 not present or CSR not yet available", "Attempt: 2, Node wj45uos520a-vzjzr-rhel-0 not present or CSR not yet available", "wj45uos520a-vzjzr-rhel-0: certificatesigningrequest.certificates.k8s.io/csr-wmvj8 approved\n", "Attempt: 3, Node wj45uos520a-vzjzr-rhel-0 not present or CSR not yet available", "Attempt: 4, Node wj45uos520a-vzjzr-rhel-0 not present or CSR not yet available", "Attempt: 5, Node wj45uos520a-vzjzr-rhel-0 not present or CSR not yet available", "Node wj45uos520a-vzjzr-rhel-0 is present in node list"], "rc": 0, "server_approve_results": ["wj45uos520a-vzjzr-rhel-0: certificatesigningrequest.certificates.k8s.io/csr-9sklb approved\n", "Node wj45uos520a-vzjzr-rhel-0 API is ready"]}

TASK [openshift_node : Wait for node to report ready] **************************
Wednesday 20 May 2020  10:26:47 +0800 (0:00:28.179)       0:06:33.774 ********* 
FAILED - RETRYING: Wait for node to report ready (36 retries left).

FAILED - RETRYING: Wait for node to report ready (35 retries left).

FAILED - RETRYING: Wait for node to report ready (34 retries left).

FAILED - RETRYING: Wait for node to report ready (33 retries left).

FAILED - RETRYING: Wait for node to report ready (32 retries left).

FAILED - RETRYING: Wait for node to report ready (31 retries left).

FAILED - RETRYING: Wait for node to report ready (30 retries left).

ok: [10.0.98.71 -> localhost] => {"attempts": 8, "changed": false, "cmd": ["oc", "get", "node", "wj45uos520a-vzjzr-rhel-0", "--kubeconfig=/tmp/installer-lvuvga/auth/kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.133743", "end": "2020-05-20 10:27:24.284389", "rc": 0, "start": "2020-05-20 10:27:24.150646", "stderr": "", "stderr_lines": [], "stdout": "True", "stdout_lines": ["True"]}

PLAY RECAP *********************************************************************
10.0.98.71                 : ok=40   changed=28   unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   
localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0   

Wednesday 20 May 2020  10:27:24 +0800 (0:00:37.249)       0:07:11.023 ********* 
=============================================================================== 
openshift_node : Install openshift support packages ------------------- 242.90s
openshift_node : Install openshift packages ---------------------------- 60.99s
openshift_node : Wait for node to report ready ------------------------- 37.25s
openshift_node : Approve node CSRs ------------------------------------- 28.18s
openshift_node : Reboot the host and wait for it to come back ---------- 18.14s
openshift_node : Pull release image ------------------------------------- 9.28s
openshift_node : Pull MCD image ----------------------------------------- 8.10s
openshift_node : Wait for bootstrap endpoint to show up ----------------- 3.73s
openshift_node : Fetch bootstrap ignition file locally ------------------ 3.17s
openshift_node : Apply ignition manifest -------------------------------- 2.22s
openshift_node : Get machine controller daemon image from release image --- 1.81s
openshift_node : Get cluster nodes -------------------------------------- 1.20s
openshift_node : Write /etc/containers/registries.conf ------------------ 1.00s
openshift_node : Update CA trust ---------------------------------------- 0.98s
openshift_node : Setting sebool container_manage_cgroup ----------------- 0.95s
openshift_node : Enable the CRI-O service ------------------------------- 0.86s
openshift_node : Restart the CRI-O service ------------------------------ 0.85s
openshift_node : Check for cluster http proxy --------------------------- 0.84s
openshift_node : Check for cluster no proxy ----------------------------- 0.82s
openshift_node : Check for cluster https proxy -------------------------- 0.75s

Comment 19 errata-xmlrpc 2020-07-13 17:31:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Note You need to log in before you can comment on or make changes to this bug.