Bug 2165895
| Summary: | Cannot SSH into VM over NodePort and Console's FQDN when using OVNKubernetes networking | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Petr Horáček <phoracek> |
| Component: | User Experience | Assignee: | Tal Nisan <tnisan> |
| Status: | CLOSED MIGRATED | QA Contact: | Guohua Ouyang <gouyang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13.0 | CC: | dholler, dollierp, gouyang, joherr, nrozen, shaselde, ycui |
| Target Milestone: | --- | | |
| Target Release: | 4.14.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-22 11:14:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Petr Horáček
2023-01-31 12:06:53 UTC
Hello Nir, could QE please try to reproduce this downstream? It sounds like a serious UX issue. I would like to see whether it can be reproduced using the UI and, if so, whether we have the same issue on the CLI.

This issue is not limited to the UI. If you create a VM and create a service from the command line, you will encounter it as well when using OVNKubernetes for networking. It is not limited to SSH either: it is a problem for any service you create when using OVNKubernetes. By default, OVNKubernetes does not route egress traffic from VMs. The routingViaHost patch enables it.

If we provide a checkbox to enable SSH access, the user should at least get a warning when using OVNKubernetes. The best experience would be to apply the patch automatically, but then the same would be needed under the VM's Console tab for a Windows system using "Desktop viewer (RDP)", since it just has a button to create the RDP service there. Another option would be to have the virtualization operator apply the patch when it is installed, if needed.

Thank you, John, for this additional information and help.
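The warning suggested above could key off the cluster's network type. The following is only a sketch of that check: the helper names are hypothetical, and it assumes the network type can be read from the status of the network.config cluster resource.

```shell
# Sketch only: helper names are hypothetical, not part of any product.

# Return 0 (true) when the given network type is OVNKubernetes,
# i.e. when a NodePort/SSH warning would be appropriate.
needs_nodeport_warning() {
    [ "$1" = "OVNKubernetes" ]
}

# Print the cluster's network type. Requires a logged-in oc session.
# Assumption: the type is exposed at .status.networkType.
cluster_network_type() {
    oc get network.config cluster -o jsonpath='{.status.networkType}'
}

# Usage against a live cluster:
#   if needs_nodeport_warning "$(cluster_network_type)"; then
#       echo "Warning: NodePort access may need the routingViaHost patch" >&2
#   fi
```

Keeping the comparison separate from the cluster call makes the decision easy to exercise without cluster access.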
@phoracek I managed to recreate this scenario:
1. Deploy a bare-metal (BM) cluster (bm04-cnvqe-rdu2 in my case) with v4.13.
2. Create a VM (vm.yaml).
3. Expose it using a service; I chose a NodePort service (service.yaml).
4. Connect to one of the nodes:
$ oc debug node/cnv-qe-17.cnvqe.lab.eng.rdu2.redhat.com
5. Try to SSH to the VM:
sh-4.4# ssh fedora.bm04-cnvqe-rdu2.cnvqe.lab.eng.rdu2.redhat.com -p 31621
I got the error message indicated in the bug:
ssh: connect to host console-openshift-console.apps.bm04-cnvqe-rdu2.cnvqe.lab.eng.rdu2.redhat.com port 31621: Connection timed out
To solve it:
6. Create a public/private key pair:
$ ssh-keygen -t rsa -b 2048 -f for_vms
7. Add the public key to the VM as part of its creation, or add it to ~/.ssh/authorized_keys on the existing VM (you can connect to it using virtctl console).
Make sure to add it as a single line (in vi, you can use the 'J' command to join lines).
8. Apply the patch and ensure that the "routingViaHost" field of the network operator changes from 'false' to 'true':
$ oc patch network.operator cluster -p '{"spec": {"defaultNetwork": {"ovnKubernetesConfig": {"gatewayConfig": {"routingViaHost": true}}}}}' --type merge
9. Go back to the node and SSH again, using the '-i' option with the name of the private key:
$ ssh fedora.bm04-cnvqe-rdu2.cnvqe.lab.eng.rdu2.redhat.com -p 31621 -i for_vms
You should be able to connect to the VM using ssh.
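To confirm that step 8 took effect, the field can be read back after patching. This is a minimal sketch, not an official procedure: the helper names are hypothetical, and the field path is assumed to match the one used in the patch.

```shell
# Sketch only: helper names are hypothetical; the oc invocations mirror step 8.

# Print the merge patch that sets routingViaHost to the given value (true or false).
routing_via_host_patch() {
    printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":%s}}}}}\n' "$1"
}

# Print the current routingViaHost value; prints nothing if the field is unset.
# Requires a logged-in oc session.
current_routing_via_host() {
    oc get network.operator cluster \
        -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost}'
}

# Usage against a live cluster:
#   oc patch network.operator cluster --type merge -p "$(routing_via_host_patch true)"
#   current_routing_via_host
```

Generating the payload from a function avoids retyping the nested JSON by hand, which is where a typo is most likely to creep in.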
Since this affects a basic UI flow, we should cover this regression as a "known issue".

Just a quick comment: the command in the initial comment has a typo. This is the workaround (W/A):
oc patch network.operator cluster -p '{"spec": {"defaultNetwork": {"ovnKubernetesConfig": {"gatewayConfig": {"routingViaHost": true}}}}}' --type merge
It seems that if I use the Console FQDN with NodePort (this is what the UI suggests), the connection gets stuck. However, when I replace it with the domain name of one of the nodes, it works fine. I suspect this is a UI bug (or a bug that can be easily resolved by adjusting the UI). When "SSH service type" is set to "SSH over NodePort", the copy-paste ssh snippet should use the domain name or IP of one of the nodes instead of the console FQDN.

This is not VM-specific; the same issue affects pods.

*** Bug 2152551 has been marked as a duplicate of this bug. ***

I opened bug 2152551 but did not find the root cause at the time. I checked again, and the issue still exists when the network type is 'OVNKubernetes'.

I have opened https://issues.redhat.com/browse/OCPBUGS-12710 on OVN Kubernetes to see if this can be solved directly in the CNI. If that is not possible, we should start offering LoadBalancer as the primary choice and remove the option to create a NodePort SSH service from the UI, since it cannot work with the default OCP configuration.

*** Bug 2186641 has been marked as a duplicate of this bug. ***

It is not clear whether this will ever be fixed in OVN Kubernetes. I think we should remove the NodePort option from our UI and leave only LoadBalancer. Moving to UX.

Is this only an issue with NodePort, or is MetalLB on the first NIC affected the same way?

IIUIC, this is not related to the index of the NIC, only to the routing done to the Console FQDN. It only affects NodePort when you try to access it through the console FQDN. Accessing NodePort directly through a cluster node IP or cluster node FQDN works fine. LoadBalancer (MetalLB) services get their own unique IP, so you would not be able to access them through a node IP/FQDN.

After discussing this with Dan and Ronen, we would like to resolve this bug in the following way:
1) The "SSH over NodePort" option should be removed from the VM->Details "SSH service type" dropdown.
2) A new configuration option, "SSH NodePort service address", should be added to CNV's UI configuration. It would be disabled/empty by default.
3) If the admin sets an IP or FQDN in this new option, "SSH over NodePort" should become available again.
4) When "SSH over NodePort" is used, the generated copy-paste command should be: `ssh centos@<address from configuration> -p <generated port>`
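The behaviour proposed in points 2) to 4) can be sketched as a small helper. This is an illustration only, not the console's implementation: the function name and its argument order are hypothetical, and an empty address stands in for the disabled/empty default.

```shell
# Sketch only: nodeport_ssh_command is a hypothetical helper.

# Print the copy-paste command for "SSH over NodePort".
# Arguments: user, configured address (IP or FQDN), generated port.
# Returns 1 and prints nothing when no address is configured,
# which corresponds to the option being unavailable.
nodeport_ssh_command() {
    user=$1
    address=$2
    port=$3
    if [ -z "$address" ]; then
        return 1
    fi
    printf 'ssh %s@%s -p %s\n' "$user" "$address" "$port"
}

# Usage (node1.example.com is a placeholder address):
#   nodeport_ssh_command centos node1.example.com 31621
```

Treating the missing address as a failure rather than falling back to the console FQDN keeps the helper from emitting a command that is known to time out.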