Description of problem:
WMCO does not correctly recognize error 1060 when it is returned while checking the status of Windows services:
```
2021-08-31T21:48:31.631Z INFO wc 19.70.72.80 configuring
2021-08-31T21:48:31.772Z ERROR wc 19.70.72.80 error running {"cmd": "sc.exe qc windows_exporter", "out": "[SC] OpenService FAILED 1060:\r\n\r\nThe specified service does not exist as an installed service.\r\n\r\n", "error": "Process exited with status 1"}
github.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).serviceExists
	/build/windows-machine-config-operator/pkg/windows/windows.go:671
github.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).ensureServiceNotRunning
	/build/windows-machine-config-operator/pkg/windows/windows.go:591
github.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).EnsureRequiredServicesStopped
	/build/windows-machine-config-operator/pkg/windows/windows.go:268
github.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).Configure
	/build/windows-machine-config-operator/pkg/windows/windows.go:314
github.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure
	/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:159
github.com/openshift/windows-machine-config-operator/controllers.(*instanceReconciler).ensureInstanceIsUpToDate
	/build/windows-machine-config-operator/controllers/controllers.go:81
github.com/openshift/windows-machine-config-operator/controllers.(*ConfigMapReconciler).ensureInstancesAreUpToDate
	/build/windows-machine-config-operator/controllers/configmap_controller.go:170
github.com/openshift/windows-machine-config-operator/controllers.(*ConfigMapReconciler).reconcileNodes
	/build/windows-machine-config-operator/controllers/configmap_controller.go:138
github.com/openshift/windows-machine-config-operator/controllers.(*ConfigMapReconciler).Reconcile
	/build/windows-machine-config-operator/controllers/configmap_controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-08-31T21:48:31.772Z DEBUG controller-runtime.manager.events Warning {"object": {"kind":"ConfigMap","namespace":"openshift-windows-machine-config-operator","name":"windows-instances","uid":"386a717b-1c56-49d1-ac38-b3098d714b77","apiVersion":"v1","resourceVersion":"3047542"}, "reason": "InstanceSetupFailure", "message": "error configuring host with address 19.70.72.80: configuring the Windows VM failed: unable to stop required services: could not stop service %!d(MISSING): error checking if service exists: error running sc.exe qc windows_exporter: Process exited with status 1"}
2021-08-31T21:48:31.772Z ERROR controller-runtime.manager.controller.configmap Reconciler error {"reconciler group": "", "reconciler kind": "ConfigMap", "name": "windows-instances", "namespace": "openshift-windows-machine-config-operator", "error": "error configuring host with address 19.70.72.80: configuring the Windows VM failed: unable to stop required services: could not stop service %d: error checking if service exists: error running sc.exe qc windows_exporter: Process exited with status 1", "errorVerbose": "Process exited with status 1\nerror running sc.exe qc windows_exporter\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).Run\n\t/build/windows-machine-config-operator/pkg/windows/windows.go:252\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).serviceExists\n\t/build/windows-machine-config-operator/pkg/windows/windows.go
```

Version-Release number of selected component (if applicable):

How reproducible:
I am not able to reproduce this at the moment.

Steps to Reproduce:
1. Configure a Windows node.

Actual results:
WMCO errors out while configuring the node because a service does not exist.

Expected results:
WMCO successfully configures the node.

Additional info:
The string WMCO currently checks for is not present in this new log output: https://github.com/openshift/windows-machine-config-operator/blob/2d57276975536a12bec210bf3cdca53462fdc5a6/pkg/windows/windows.go#L73-L76
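Note that in the log above the process exit status is reported only as `Process exited with status 1`, while the 1060 code appears in sc.exe's own output. A minimal sketch of a more tolerant check, assuming the fix keys on the Windows error code (1060 is `ERROR_SERVICE_DOES_NOT_EXIST`) in sc.exe's output rather than on an exact message string or exit status. The names `serviceNotFound` and `serviceNotFoundMarker` are hypothetical and are not WMCO's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// errorServiceDoesNotExist is the Windows system error code
// ERROR_SERVICE_DOES_NOT_EXIST.
const errorServiceDoesNotExist = 1060

// serviceNotFoundMarker is the "FAILED 1060:" prefix that sc.exe prints when
// OpenService fails because the service is not installed. Matching only this
// marker avoids depending on line endings or on the rest of the message text.
var serviceNotFoundMarker = fmt.Sprintf("FAILED %d:", errorServiceDoesNotExist)

// serviceNotFound (hypothetical helper) reports whether the output of
// `sc.exe qc <service>` indicates that the service does not exist.
func serviceNotFound(scOutput string) bool {
	return strings.Contains(scOutput, serviceNotFoundMarker)
}

func main() {
	// Output copied from the "out" field of the log above.
	out := "[SC] OpenService FAILED 1060:\r\n\r\nThe specified service does not exist as an installed service.\r\n\r\n"
	fmt.Println(serviceNotFound(out)) // true

	// A different failure (access denied) must not be treated as "not found".
	fmt.Println(serviceNotFound("[SC] OpenService FAILED 5:\r\n\r\nAccess is denied.\r\n")) // false
}
```

Because the marker includes the trailing colon, an unrelated code such as `FAILED 10600:` would not match.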
I was able to reproduce this with the following steps:
1) Before adding the VM to the windows-instances ConfigMap, SSH into it and run:
```
New-ItemProperty -Path "HKLM:\SOFTWARE\OpenSSH" -Name DefaultShell -Value "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -PropertyType String -Force
```
This sets the default SSH shell to PowerShell 5.x.
2) Add the VM to the windows-instances ConfigMap so that it is configured as a BYOH node.

Actual results: The configuration fails with the sc.exe error.
Expected results: The configuration succeeds and the VM is added as a BYOH node.
I've updated the PR associated with this bug and have manually validated that it fixes the issue when a VM has PowerShell as its default SSH shell. The tests still have a problem: the command used to validate that all Windows services are in the correct state cannot be parsed properly. Once I have time to correct that, I will open the PR for review.
I've received feedback that the scope of the changes in the PR should be scaled back if possible, so I'm taking another look at whether better character escaping can fix this issue. I think adding the `-Command` argument to the existing `powershell.exe -NonInteractive -ExecutionPolicy Bypass` command prefix might help. After changing the operator code to include it, I saw a strange error on a VM with PowerShell as the default shell: escaped quotes (\") are present around the `-Command` argument's value in the command that is sent, but they are missing when the command is actually executed:
```
2021-09-30T16:09:49.758Z ERROR wc ip-10-0-154-150.ec2.internal error running {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command \"if(Test-Path -Path C:\\k\\){rm -r C:\\k\\}\"", "out": "powershell.exe : ScriptBlock should only be specified as a value of the Command parameter.\r\nAt line:1 char:1\r\n+ powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command if(Te ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : InvalidArgument: (:) [], ParameterBindingException\r\n + FullyQualifiedErrorId : IncorrectValueForCommandParameter\r\n \r\n", "error": "Process exited with status 1"}
```
I tried this in an interactive SSH session to check whether missing quotes would cause this error, and to make sure it was not just a formatting bug in the returned error:
```
PS C:\Users\Administrator> powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command if(Test-Path -Path C:\\k\\){rm -r C:\\k\test\\}
powershell.exe : ScriptBlock should only be specified as a value of the Command parameter.
At line:1 char:1
+ powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command if(Te ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidArgument: (:) [], ParameterBindingException
    + FullyQualifiedErrorId : IncorrectValueForCommandParameter

PS C:\Users\Administrator> powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command "if(Test-Path -Path C:\\k\\){rm -r C:\\k\test\\}"
rm : Cannot find path 'C:\k\test\' because it does not exist.
At line:1 char:29
+ if(Test-Path -Path C:\\k\\){rm -r C:\\k\test\\}
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo : ObjectNotFound: (C:\k\test\:String) [Remove-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.RemoveItemCommand
```
This confirms that the same error occurs when the quotes are missing, so it is safe to assume that the quotes are being dropped somewhere when the command is executed. Next, I wanted to check whether this is an issue with Go's SSH implementation or whether it happens with SSH in general, so I tested it with the Linux `ssh` client:
```
sh-4.4# ssh -i key.pem Administrator.internal "powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command \"if(Test-Path -Path C:\\k\\){rm -r C:\\k\test\\}\""
powershell.exe : ScriptBlock should only be specified as a value of the Command parameter.
At line:1 char:1
+ powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command if(Te ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidArgument: (:) [], ParameterBindingException
    + FullyQualifiedErrorId : IncorrectValueForCommandParameter
```
The same error occurs here. I also ran the command against a VM with cmd as the default shell, which shows no parsing issue there:
```
sh-4.4# ssh -i key.pem Administrator.internal "powershell.exe -NonInteractive -ExecutionPolicy Bypass -Command \"if(Test-Path -Path C:\\k\\){rm -r C:\\k\test\\}\""
rm : Cannot find path 'C:\k\test\' because it does not exist.
At line:1 char:27
+ if(Test-Path -Path C:\k\){rm -r C:\k\test\}
+ ~~~~~~~~~~~~~~~~
    + CategoryInfo : ObjectNotFound: (C:\k\test\:String) [Remove-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.RemoveItemCommand
```
I then wanted to see whether I could get quotes parsed in PowerShell at all, so I tried a few commands:
```
sh-4.4# ssh -i key.pem Administrator.internal "echo \"hello\" "
hello
sh-4.4# ssh -i key.pem Administrator.internal "echo \\"hello\\" "
\hello\
sh-4.4# ssh -i key.pem Administrator.internal "echo \\\"hello\\\" "
hello
sh-4.4# ssh -i key.pem Administrator.internal "echo \"\"\"hello\"\"\" "
hello
sh-4.4# ssh -i key.pem Administrator.internal "echo 'hello' "
hello
sh-4.4# ssh -i key.pem Administrator.internal 'echo "hello" '
hello
```
Against a VM with cmd as the default shell, the quotes came through without issue:
```
sh-4.4# ssh -i key.pem Administrator.internal 'echo "hello"'
"hello"
```
At this point I'm trying to figure out how to get quotes processed properly, in the hope that this leads to a less invasive fix for this issue.
Verified on "5.0.0+a88772f":
```
ssh -i key.pem Administrator.compute.internal 'echo "hello"'
load pubkey "key.pem": invalid format
"hello"
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Windows Container Support for Red Hat OpenShift 5.0.0 [security update]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0577