Bug 1854160 - Ignition file download failing when configuring Windows nodes
Summary: Ignition file download failing when configuring Windows nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-06 15:38 UTC by ravig
Modified: 2020-10-27 16:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:12:24 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1904 | 0 | None | closed | Bug 1854160: server: serve v2 if no user-agent is defined | 2020-07-30 16:14:42 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:12:46 UTC

Description ravig 2020-07-06 15:38:11 UTC
Description of problem:

WMCO is unable to configure Windows nodes because the ignition file download fails with a 400 Bad Request error. This is blocking our CI.

Local run output:

2020-07-06T15:12:46.560Z	DEBUG	windows	ignition file download	{"cmd": "C:\\Temp\\wget-ignore-cert.ps1 -server https://api-int.ravig126.devcluster.openshift.com:22623/config/worker -output C:\\Windows\\Temp\\worker.ign", "output": "wget : The remote server returned an error: (400) Bad Request.\r\nAt C:\\Temp\\wget-ignore-cert.ps1:31 char:9\r\n+ $null | wget $server -o $output > $null\r\n+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebExc \r\n   eption\r\n    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand\r\n \r\n"}
2020-07-06T15:12:47.829Z	DEBUG	windows	output from wmcb	{"output": ""}
2020-07-06T15:12:47.829Z	DEBUG	controller_wmc	destroying the Windows VM	{"ID": "i-0ce63d14fb481b9af"}
2020-07-06T15:14:03.624Z	INFO	controller_wmc	Windows worker has been removed from the cluster	{"ID": "i-0ce63d14fb481b9af"}
2020-07-06T15:14:03.624Z	ERROR	controller_wmc	error adding a Windows worker node	{"error": "VMConfigurationFailure: failed to configure Windows VM: configuring the Windows VM failed: error running bootstrapper: Process exited with status 1"}
github.com/go-logr/zapr.(*zapLogger).Error
	/home/ravig/go/src/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
github.com/openshift/windows-machine-config-operator/pkg/controller/windowsmachineconfig.(*ReconcileWindowsMachineConfig).addWorkerNodes
	windows-machine-config-operator/pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go:358
github.com/openshift/windows-machine-config-operator/pkg/controller/windowsmachineconfig.(*ReconcileWindowsMachineConfig).reconcileWindowsNodes
	windows-machine-config-operator/pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go:250
github.com/openshift/windows-machine-config-operator/pkg/controller/windowsmachineconfig.(*ReconcileWindowsMachineConfig).Reconcile
	windows-machine-config-operator/pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go:211
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/ravig/go/src/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/ravig/go/src/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/home/ravig/go/src/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/home/ravig/go/src/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/home/ravig/go/src/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/home/ravig/go/src/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/home/ravig/go/src/pkg/mod/k8s.io/apimachinery@v0.18.2/pkg/util/wait/wait.go:90


CI Run:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_windows-machine-config-operator/75/pull-ci-openshift-windows-machine-config-operator-master-e2e-operator/1279959578544443392/artifacts/e2e-operator/gather-extra/pods/windows-machine-config-operator_windows-machine-config-operator-57cf95787c-qcn7k_windows-machine-config-operator.log

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
WMCO Windows VM configuration failure.

Expected results:

WMCO configures the Windows node properly and the Windows node joins the existing OpenShift cluster.

Additional info:

Comment 3 Michael Nguyen 2020-07-14 22:01:35 UTC
Verified on 4.6.0-0.nightly-2020-07-14-092216 


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-14-092216   True        False         9h      Cluster version is 4.6.0-0.nightly-2020-07-14-092216

1. Remove the iptables rule that is blocking requests to the MCS from pods on a master node

$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-131-12.us-west-2.compute.internal    Ready    master   9h     v1.18.3+b9ac23f
ip-10-0-133-198.us-west-2.compute.internal   Ready    worker   141m   v1.18.3+b9ac23f
ip-10-0-136-229.us-west-2.compute.internal   Ready    worker   9h     v1.18.3+b9ac23f
ip-10-0-182-165.us-west-2.compute.internal   Ready    master   9h     v1.18.3+b9ac23f
ip-10-0-186-178.us-west-2.compute.internal   Ready    worker   9h     v1.18.3+b9ac23f
ip-10-0-192-116.us-west-2.compute.internal   Ready    worker   9h     v1.18.3+b9ac23f
ip-10-0-210-115.us-west-2.compute.internal   Ready    master   9h     v1.18.3+b9ac23f
$ oc debug node/ip-10-0-182-165.us-west-2.compute.internal
Starting pod/ip-10-0-182-165us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# /sbin/iptables -L -v -n --line-numbers    
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1    2430K 4865M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
2    2430K 4865M KUBE-EXTERNAL-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
3      37M   18G KUBE-NODEPORT-NON-LOCAL  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* Ensure that non-local NodePort traffic can flow */
4      37M   18G OPENSHIFT-FIREWALL-ALLOW  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */
5      33M   13G KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1    3855K 3053M KUBE-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */
2    28782 3194K KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
3    10901  743K OPENSHIFT-ADMIN-OUTPUT-RULES  all  --  tun0   !tun0   0.0.0.0/0            0.0.0.0/0            /* administrator overrides */
4    28244 3161K OPENSHIFT-FIREWALL-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */
5        0     0 OPENSHIFT-BLOCK-OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1    1986K  444M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
2      37M   21G OPENSHIFT-BLOCK-OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */
3      37M   21G KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain KUBE-FIREWALL (2 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
2        0     0 DROP       all  --  *      *      !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-KUBELET-CANARY (0 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain OPENSHIFT-BLOCK-OUTPUT (2 references)
num   pkts bytes target     prot opt in     out     source               destination         
1       31  1860 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22623 reject-with icmp-port-unreachable
2       31  1860 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22624 reject-with icmp-port-unreachable

Chain OPENSHIFT-FIREWALL-FORWARD (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 DROP       all  --  *      *       10.128.0.0/14        0.0.0.0/0            /* attempted resend after connection close */ ctstate INVALID
2    17343 2418K ACCEPT     all  --  *      *       0.0.0.0/0            10.128.0.0/14        /* forward traffic from SDN */
3    10901  743K ACCEPT     all  --  *      *       10.128.0.0/14        0.0.0.0/0            /* forward traffic to SDN */

Chain OPENSHIFT-ADMIN-OUTPUT-RULES (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain OPENSHIFT-FIREWALL-ALLOW (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1    1931K 4835M ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:4789 /* VXLAN incoming */
2    1585K  338M ACCEPT     all  --  tun0   *       0.0.0.0/0            0.0.0.0/0            /* from SDN to localhost */
3        0     0 ACCEPT     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0            /* from docker to localhost */

Chain KUBE-PROXY-CANARY (0 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-NODEPORT-NON-LOCAL (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-EXTERNAL-SERVICES (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-SERVICES (3 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-FORWARD (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        6   240 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
2        5   570 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */ mark match 0x1/0x1
3     175K  127M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
4        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain OPENSHIFT-SDN-CANARY (0 references)
num   pkts bytes target     prot opt in     out     source               destination         
sh-4.4# /sbin/iptables -D OPENSHIFT-BLOCK-OUTPUT 1
sh-4.4# /sbin/iptables -L -v -n --line-numbers
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1    2431K 4865M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
2    2431K 4865M KUBE-EXTERNAL-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
3      37M   18G KUBE-NODEPORT-NON-LOCAL  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* Ensure that non-local NodePort traffic can flow */
4      37M   18G OPENSHIFT-FIREWALL-ALLOW  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */
5      33M   13G KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1    3859K 3055M KUBE-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */
2    28795 3195K KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
3    10906  744K OPENSHIFT-ADMIN-OUTPUT-RULES  all  --  tun0   !tun0   0.0.0.0/0            0.0.0.0/0            /* administrator overrides */
4    28257 3163K OPENSHIFT-FIREWALL-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */
5        0     0 OPENSHIFT-BLOCK-OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1    1987K  444M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
2      37M   21G OPENSHIFT-BLOCK-OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* firewall overrides */
3      37M   21G KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain KUBE-FIREWALL (2 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
2        0     0 DROP       all  --  *      *      !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-KUBELET-CANARY (0 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain OPENSHIFT-BLOCK-OUTPUT (2 references)
num   pkts bytes target     prot opt in     out     source               destination         
1       37  2220 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22624 reject-with icmp-port-unreachable

Chain OPENSHIFT-FIREWALL-FORWARD (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 DROP       all  --  *      *       10.128.0.0/14        0.0.0.0/0            /* attempted resend after connection close */ ctstate INVALID
2    17351 2419K ACCEPT     all  --  *      *       0.0.0.0/0            10.128.0.0/14        /* forward traffic from SDN */
3    10906  744K ACCEPT     all  --  *      *       10.128.0.0/14        0.0.0.0/0            /* forward traffic to SDN */

Chain OPENSHIFT-ADMIN-OUTPUT-RULES (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain OPENSHIFT-FIREWALL-ALLOW (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1    1932K 4835M ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:4789 /* VXLAN incoming */
2    1586K  338M ACCEPT     all  --  tun0   *       0.0.0.0/0            0.0.0.0/0            /* from SDN to localhost */
3        0     0 ACCEPT     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0            /* from docker to localhost */

Chain KUBE-PROXY-CANARY (0 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-NODEPORT-NON-LOCAL (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-EXTERNAL-SERVICES (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-SERVICES (3 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-FORWARD (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        6   240 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
2        5   570 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */ mark match 0x1/0x1
3     179K  129M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
4        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain OPENSHIFT-SDN-CANARY (0 references)
num   pkts bytes target     prot opt in     out     source               destination         
sh-4.4# exit
sh-4.2# exit
exit

Removing debug pod ...
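
Step 1 deletes the rule by position (`-D OPENSHIFT-BLOCK-OUTPUT 1`), which only works if the port 22623 rule really is rule 1. As a hedged illustration, the rule number could be looked up from the listing first. The sketch below is not part of the verification run: the function name `find_reject_rule` and the embedded `SAMPLE` text (copied from the listing above) are assumptions made for the example.

```python
import re

# Excerpt of `iptables -L -v -n --line-numbers` output, copied from the listing above.
SAMPLE = """\
Chain OPENSHIFT-BLOCK-OUTPUT (2 references)
num   pkts bytes target     prot opt in     out     source               destination
1       31  1860 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22623 reject-with icmp-port-unreachable
2       31  1860 REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22624 reject-with icmp-port-unreachable
"""


def find_reject_rule(listing, chain, port):
    """Return the rule number in `chain` that REJECTs traffic to `port`, or None."""
    current_chain = None
    for line in listing.splitlines():
        header = re.match(r"Chain (\S+) ", line)
        if header:
            current_chain = header.group(1)
            continue
        if current_chain != chain:
            continue
        fields = line.split()
        # Rule rows start with a numeric rule number; skip column headers and blanks.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        if fields[3] == "REJECT" and f"dpt:{port}" in fields:
            return int(fields[0])
    return None


if __name__ == "__main__":
    # The returned number is what would be passed to `iptables -D <chain> <num>`.
    print(find_reject_rule(SAMPLE, "OPENSHIFT-BLOCK-OUTPUT", 22623))
```
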

2. Quickly open a debug shell on a worker node and run the curl commands below. You can get a token by logging into the web console. Verify that 2.2.0 ignition is served when no User-Agent header is sent, and that 3.1.0 ignition is served when the User-Agent is Ignition/2.3.0.

$ oc debug node/ip-10-0-136-229.us-west-2.compute.internal
Starting pod/ip-10-0-136-229us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host

sh-4.4# curl   -kH "Authorization: Bearer <TOKEN>" "https://api.mnguyen46.devcluster.openshift.com:22623/config/worker"
{"ignition":{"version":"2.2.0"},"passwd":{"users":[{"name":"core","sshAuthorizedKeys":
..snip..

sh-4.4# curl   -kH "Authorization: Bearer <TOKEN>" -H "User-agent:Ignition/2.3.0" "https://api.mnguyen46.devcluster.openshift.com:22623/config/worker"
{"ignition":{"version":"3.1.0"},"passwd":{"users":[{"name":"core","sshAuthorizedKeys"
..snip..
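
The two curl checks above can be summarised as a small selection rule. This is only a sketch of the observed behaviour, not the Machine Config Server code: the function name `select_spec_version` is hypothetical, and the choices to map Ignition binary major version 2 or later to spec 3.1.0 and to fall back to 2.2.0 for any other User-Agent value are assumptions that merely agree with the two cases verified here.

```python
import re
from typing import Optional

SPEC_V2 = "2.2.0"  # served above when no User-Agent header was sent
SPEC_V3 = "3.1.0"  # served above for "User-agent: Ignition/2.3.0"


def select_spec_version(user_agent: Optional[str]) -> str:
    """Pick the Ignition spec version to serve for a given User-Agent value."""
    if not user_agent:
        # Missing or empty header: serve the v2 spec, as in the first curl check.
        return SPEC_V2
    match = re.search(r"Ignition/(\d+)\.\d+\.\d+", user_agent)
    # Assumption: Ignition binaries with major version >= 2 get the v3 spec.
    if match and int(match.group(1)) >= 2:
        return SPEC_V3
    # Assumption: anything else falls back to the v2 spec.
    return SPEC_V2


if __name__ == "__main__":
    for ua in (None, "Ignition/2.3.0"):
        print(repr(ua), "->", select_spec_version(ua))
```
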

Comment 5 errata-xmlrpc 2020-10-27 16:12:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

