This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2165895 - Cannot SSH into VM over NodePort and Console's FQDN when using OVNKubernetes networking
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: User Experience
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: Tal Nisan
QA Contact: Guohua Ouyang
URL:
Whiteboard:
Duplicates: 2152551 2186641
Depends On:
Blocks:
Reported: 2023-01-31 12:06 UTC by Petr Horáček
Modified: 2023-08-22 11:14 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-22 11:14:18 UTC
Target Upstream Version:
Embargoed:


Links
System                 ID             Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  CNV-24889      0        None      None    None     2023-08-22 11:14:17 UTC
Red Hat Issue Tracker  OCPBUGS-12710  0        None      None    None     2023-04-25 12:59:23 UTC

Description Petr Horáček 2023-01-31 12:06:53 UTC
Description of problem:
You cannot ssh into a VM when using "networkType: OVNKubernetes"

Copied from https://issues.redhat.com/browse/CNV-21779

Version-Release number of selected component (if applicable):


How reproducible:
always


Steps to Reproduce:
1. Perform an IPI install on bare metal and set the following in install-config.yaml:
   networking:
     machineNetwork:
     - cidr: 172.22.0.0/16
     networkType: OVNKubernetes
2. Install OpenShift Virtualization.
3. Create a VM using the GUI and select the box to enable SSH access to the VM.
4. Attach an SSH public key.
5. Start the VM.
6. Try to SSH into the VM.

Actual results:
The following message is returned:
ssh: connect to host console-openshift-console.apps.cluster.example.org port 30378: Connection timed out

Expected results:
a successful connection to the VM




Additional info:
Applying the following command allows SSH access into the VM.

oc patch network.operator cluster -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":"routingViaHost":true}}}}}' --type=merge

It would be useful to have this listed in the documentation on creating a VM.

Table 4.6 in the following documentation mentions this option.
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10/html-single/networking/index

Without a good understanding of OpenShift networking, however, this may not be enough for a reader to realize that the patch needs to be applied.

Comment 1 Petr Horáček 2023-01-31 12:12:16 UTC
Hello Nir, could QE please try to reproduce it on downstream? This sounds like a serious UX issue. I would like to see if it can be reproduced using the UI, and if so, if we have the same issue on CLI.

Comment 2 joherr 2023-03-23 20:44:34 UTC
This issue is not limited to the UI. If you create a VM and then create a service from the command line, you will encounter it as well when using OVNKubernetes for networking.

It is not limited to SSH either; it affects any service you create when using OVNKubernetes. By default, OVNKubernetes does not route egress traffic from VMs. The patch enables it.

If we provide a checkbox to enable SSH access, the user should at least get a warning when the cluster uses OVNKubernetes. The best experience would be to apply the patch automatically.

The same would then be needed under the VM's Console tab for a Windows system using "Desktop viewer (RDP)", since that tab just has a button to create the RDP service.

Another option would be to have the virtualization operator apply the patch when it is installed, if needed.

Comment 5 awax 2023-03-23 21:56:03 UTC
Thank you, John, for this additional information and help.
@phoracek I managed to recreate this scenario:
1. Deployed a BM cluster (bm04-cnvqe-rdu2 in my case) with v4.13.
2. Create a vm (vm.yaml).
3. Expose it using a service; I chose a NodePort service (service.yaml).
4. Connect to one of the nodes:
$ oc debug node/cnv-qe-17.cnvqe.lab.eng.rdu2.redhat.com
5. Try to ssh to the VM:
sh-4.4# ssh fedora.bm04-cnvqe-rdu2.cnvqe.lab.eng.rdu2.redhat.com -p 31621
I got the error message indicated in the bug:
ssh: connect to host console-openshift-console.apps.bm04-cnvqe-rdu2.cnvqe.lab.eng.rdu2.redhat.com port 31621: Connection timed out
To solve it:
6. Create a public/private key:
$ ssh-keygen -t rsa -b 2048 -f for_vms
7. Add the public key to the VM as part of its creation, OR add it to ~/.ssh/authorized_keys on the existing VM (you can connect to it using virtctl console).
Make sure the key is on a single line (in vi, you can use the 'J' command to join lines).
8. Apply the patch and ensure that the "routingViaHost" field of the network operator changes from 'false' to 'true':
$ oc patch network.operator cluster -p '{"spec": {"defaultNetwork": {"ovnKubernetesConfig": {"gatewayConfig": {"routingViaHost": true}}}}}' --type merge
9. Move back to the node and ssh again using the '-i' option and the name of the private key:
$ ssh fedora.bm04-cnvqe-rdu2.cnvqe.lab.eng.rdu2.redhat.com -p 31621 -i for_vms
You should be able to connect to the VM using ssh.
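
Step 8 asks you to ensure that the field actually changed. A minimal sketch of that check, assuming `oc` is logged in to the cluster, that the field can be read back at the same path the patch writes to, and that an unset field corresponds to the default of false. `routing_via_host` is a hypothetical helper name, not part of any tool:

```shell
# Print the current routingViaHost value of the cluster network operator.
routing_via_host() {
    value=$(oc get network.operator cluster \
        -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost}')
    # Assumption: an unset field yields empty output; report that as "false".
    printf '%s\n' "${value:-false}"
}
```

Calling `routing_via_host` before and after the patch should show the value moving from false to true.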

Comment 6 Petr Horáček 2023-04-13 12:14:36 UTC
Since this affects a basic UI flow, we should cover this regression as a "known issue".

Comment 7 awax 2023-04-19 15:18:57 UTC
Just a quick comment: the command in the initial comment has a typo. This is the correct workaround:
oc patch network.operator cluster -p '{"spec": {"defaultNetwork": {"ovnKubernetesConfig": {"gatewayConfig": {"routingViaHost": true}}}}}' --type merge
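
The typo in the initial comment is a missing `{` after `"gatewayConfig":`, which is easy to introduce when typing five levels of nesting by hand. A minimal sketch that generates the patch body from a single argument and runs a crude brace-count check on it. `gateway_patch` and `braces_balanced` are hypothetical helper names, and the check is only a sanity test, not a JSON validator:

```shell
# Print the merge patch that sets routingViaHost to $1 ("true" or "false").
gateway_patch() {
    printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":%s}}}}}\n' "$1"
}

# Succeed only if $1 contains as many "{" as "}".
braces_balanced() {
    open=$(printf '%s' "$1" | tr -cd '{' | wc -c)
    close=$(printf '%s' "$1" | tr -cd '}' | wc -c)
    [ "$open" -eq "$close" ]
}
```

The generated body can then be passed to the same command as above: `oc patch network.operator cluster --type merge -p "$(gateway_patch true)"`.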

Comment 8 Petr Horáček 2023-04-21 14:10:01 UTC
It seems that if I use the FQDN of the Console with the NodePort (this is what the UI suggests), the connection gets stuck.

However, when I replace it with the domain name of one of the nodes, it works fine.

I suspect this is a UI bug (or a bug that can be easily resolved by adjusting the UI). When "SSH service type" is set to "SSH over NodePort", we should use domain name or IP of one of the nodes in the copy-paste ssh snippet instead of the console FQDN.

This is not VM-specific - the same issue affects pods.
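
To try the node-address variant by hand, the node addresses can be read from the cluster. A sketch, assuming `oc` is logged in, that the nodes are single-stack and report an `InternalIP` address, and that this address is reachable from wherever `ssh` is run. `node_internal_ips` and `nodeport_ssh_command` are hypothetical helper names:

```shell
# Print the InternalIP of every cluster node, one node per line.
node_internal_ips() {
    oc get nodes \
        -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
}

# Print an ssh command that targets the first node instead of the console FQDN.
# $1 = user inside the VM, $2 = NodePort assigned to the SSH service.
nodeport_ssh_command() {
    node=$(node_internal_ips | head -n 1)
    # Fail rather than print a command with an empty host.
    [ -n "$node" ] || return 1
    printf 'ssh %s@%s -p %s\n' "$1" "$node" "$2"
}
```

For example, `nodeport_ssh_command fedora 31621` prints a command of the form `ssh fedora@<node IP> -p 31621`.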

Comment 9 Guohua Ouyang 2023-04-24 02:44:29 UTC
*** Bug 2152551 has been marked as a duplicate of this bug. ***

Comment 10 Guohua Ouyang 2023-04-24 03:33:56 UTC
I opened bug 2152551 but did not find the root cause at the time. I checked again, and the issue still exists when the network type is 'OVNKubernetes'.

Comment 11 Petr Horáček 2023-04-25 12:59:23 UTC
I have opened https://issues.redhat.com/browse/OCPBUGS-12710 on OVN Kubernetes, to see if this can be solved directly on the CNI. If that won't be possible, we should start offering LoadBalancer as the primary choice and removing the option to create a NodePort SSH service from the UI, since it cannot be achieved with the default OCP configuration.

Comment 12 Petr Horáček 2023-04-25 16:34:42 UTC
*** Bug 2186641 has been marked as a duplicate of this bug. ***

Comment 13 Petr Horáček 2023-06-26 14:40:16 UTC
It is not clear whether this will ever be fixed on OVN Kubernetes. I think we should remove the NodePort option from our UI and leave only LoadBalancer.

Moving to UX.

Comment 14 Dominik Holler 2023-07-27 09:55:50 UTC
Is this only an issue with NodePort, or is MetalLB on the first NIC affected in the same way?

Comment 15 Petr Horáček 2023-07-27 10:16:01 UTC
IIUIC this is not related to the index of the NIC, but just to the routing done to the Console FQDN:

This only affects NodePort when you try to access it through the console FQDN. Accessing NodePort through cluster node IP or cluster node FQDN directly works ok.

LoadBalancer (MetalLB) services get their unique IP, so you would not be able to access them through node IP/FQDN.

Comment 16 Petr Horáček 2023-08-22 07:25:18 UTC
After discussing this with Dan and Ronen, we would like to resolve this bug in the following way:

1) The "SSH over NodePort" option should be removed from the VM->Details "SSH service type" dropdown.
2) A new configuration option "SSH NodePort service address" should be added to CNV's UI configuration. This would be disabled/empty by default.
3) If the admin sets an IP or FQDN in this new option, the "SSH over NodePort" option should become available again.
4) When "SSH over NodePort" is used, the generated copy-paste command should be: `ssh centos@<address from configuration> -p <generated port>`
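
The four points above could be sketched roughly as follows. This only illustrates the proposed behaviour and is not the console's implementation; the function name and its arguments are hypothetical:

```shell
# Print the copy-paste command for "SSH over NodePort", or fail when the admin
# has not set an "SSH NodePort service address" (the option stays unavailable).
# $1 = configured address (IP or FQDN, may be empty), $2 = user, $3 = NodePort.
nodeport_ssh_snippet() {
    if [ -z "$1" ]; then
        return 1
    fi
    printf 'ssh %s@%s -p %s\n' "$2" "$1" "$3"
}
```

With an empty address the function returns a non-zero status, which corresponds to points 1 and 2; with an address set it prints the command described in point 4.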

