Bug 1662687 - Can't ssh a VM via NodePort service after VM restart
Summary: Can't ssh a VM via NodePort service after VM restart
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dan Kenigsberg
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-31 16:07 UTC by Yossi Segev
Modified: 2019-01-02 19:22 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-01 15:52:14 UTC
Target Upstream Version:
Embargoed:



Description Yossi Segev 2018-12-31 16:07:36 UTC
Description of problem:
After exposing a NodePort service for a VM's pod, the VM is accessible via ssh on the exposed port. After restarting the VM, the service and port are no longer accessible.


Version-Release number of selected component (if applicable):
Client/server version: v0.12.0-alpha.2


How reproducible:
Always


Steps to Reproduce:
1. Create a cirros VM.
 # oc create -f cluster/example/vm-cirros.yaml
2. Start the VM:
 # virtctl start vm-cirros
3. Verify VMI is running.
 # oc get VMI
NAME            AGE       PHASE     IP            NODENAME
vm-cirros       10m       Running   10.130.0.46   cnv-executor-ysegev-node1.example.com
4. Get the pod name.
 # oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
virt-launcher-vm-cirros-bdf7w       2/2       Running   0          11m
5. Expose a NodePort service via the pod, to be used for ssh (i.e. the target port is 22).
 # oc expose pod virt-launcher-vm-cirros-bdf7w --name=testnp --port=27017 --target-port=22 --type=NodePort
6. View the service and the exposed port.
 # oc get svc testnp
NAME      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
testnp    NodePort   172.30.134.72   <none>        27017:31854/TCP   17m
7. Open an ssh connection to the VM using the IP address of the node that hosts the VM and the exposed port:
 # ssh cirros@10.8.243.240 -p 31854
8. Make sure you are logged in to the VM's ssh console.
9. Exit the ssh console.
10. Restart the VM.
 # virtctl stop vm-cirros
 # virtctl start vm-cirros
11. After the VM is up again, try to ssh to it again via the same node IP and port.
 # ssh cirros@10.8.243.240 -p 31854


Actual results:
ssh: connect to host 10.8.243.240 port 31854: Connection refused


Expected results:
The ssh connection to the VM should be available via the exposed NodePort service, just as before the restart.

Additional info:
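A quick way to double-check where the refusal comes from (assuming the same node IP and node port as in the steps above) is a raw TCP probe against the node port; a REJECT from the node's iptables rules shows up here as well, independently of ssh:
 # nc -vz 10.8.243.240 31854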

Comment 1 Yossi Segev 2019-01-01 11:36:50 UTC
Some more investigation:
After restarting the VM:
1. The VM's pod is recreated with a different name than the one before the restart.
2. The exposed service's iptables rules are not restored:
 These are the exposed service rules before restarting the VM:
-A KUBE-NODEPORTS -p tcp -m comment --comment "kubevirt/testnp:" -m tcp --dport 31221 -j KUBE-MARK-MASQ                                                                                                            
-A KUBE-NODEPORTS -p tcp -m comment --comment "kubevirt/testnp:" -m tcp --dport 31221 -j KUBE-SVC-4GYGD62LGYM2BRMM
-A KUBE-SEP-HYW5LKWDE7SML36T -s 10.130.0.46/32 -m comment --comment "kubevirt/testnp:" -j KUBE-MARK-MASQ
-A KUBE-SEP-HYW5LKWDE7SML36T -p tcp -m comment --comment "kubevirt/testnp:" -m tcp -j DNAT --to-destination 10.130.0.46:22
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.251.47/32 -p tcp -m comment --comment "kubevirt/testnp: cluster IP" -m tcp --dport 27017 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.251.47/32 -p tcp -m comment --comment "kubevirt/testnp: cluster IP" -m tcp --dport 27017 -j KUBE-SVC-4GYGD62LGYM2BRMM
-A KUBE-SVC-4GYGD62LGYM2BRMM -m comment --comment "kubevirt/testnp:" -j KUBE-SEP-HYW5LKWDE7SML36T

And these are the rules after the restart:
-A KUBE-EXTERNAL-SERVICES -p tcp -m comment --comment "kubevirt/testnp: has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31221 -j REJECT --reject-with icmp-port-unreachable                          
-A KUBE-SERVICES -d 172.30.251.47/32 -p tcp -m comment --comment "kubevirt/testnp: has no endpoints" -m tcp --dport 27017 -j REJECT --reject-with icmp-port-unreachable
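The "has no endpoints" comments in the rules above suggest the service selector no longer matches any pod. One way to confirm this (assuming the service created in the steps above) is to list the service endpoints after the restart; if the new virt-launcher pod's labels no longer match the selector, the ENDPOINTS column shows <none>:
 # oc get endpoints testnp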

Comment 2 Yossi Segev 2019-01-01 15:41:20 UTC
After consulting with Dan and Sebastian: the correct way to expose a service on a VM is via the "virtctl expose" command rather than "oc expose".
However, you still cannot ssh into the VM via the exposed NodePort service after restarting the VM.



Steps to Reproduce:
1. Create a cirros VM.
 # oc create -f cluster/example/vm-cirros.yaml
2. Start the VM:
 # virtctl start vm-cirros
3. Verify VMI is running.
 # oc get VMI
NAME            AGE       PHASE     IP            NODENAME
vm-cirros       10m       Running   10.130.0.46   cnv-executor-ysegev-node1.example.com
4. Expose a NodePort service, to be used for ssh (i.e. target port is 22).
 # virtctl expose vmi vm-cirros --name=testnp --port=27017 --target-port=22 --type=NodePort
5. View the service and the exposed port.
 # oc get svc testnp
NAME      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)           AGE
testnp    NodePort   172.30.127.7   <none>        27017:31682/TCP   13m
6. Open an ssh connection to the VM using the IP address of the node that hosts the VM and the exposed port:
 # ssh cirros@10.8.243.240 -p 31682
7. Make sure you are logged in to the VM's ssh console.
8. Exit the ssh console.
9. Restart the VM.
 # virtctl stop vm-cirros
 # virtctl start vm-cirros
10. After the VM is up again, try to ssh to it again via the same node IP and port.
 # ssh cirros@10.8.243.240 -p 31682


Actual results:
The ssh connection is rejected by the client because the host key changed:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:TvnAFvlwkTUT7zo3p8+bjXc2HTxhM28wjHg0bzAwVm4.
Please contact your system administrator.
Add correct host key in /home/ysegev/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/ysegev/.ssh/known_hosts:35
ECDSA host key for [10.8.243.240]:31682 has changed and you have requested strict checking.
Host key verification failed.

Comment 3 Yossi Segev 2019-01-01 15:52:14 UTC
The way to resolve this is to delete the previous entry for this connection in ~/.ssh/known_hosts.
The exact entry that should be deleted is referenced in the error message:
Offending ECDSA key in /home/ysegev/.ssh/known_hosts:35
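For example, the stale entry can be removed non-interactively with ssh-keygen (assuming the node IP and node port from the steps above):
 # ssh-keygen -R "[10.8.243.240]:31682"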

Comment 5 Yossi Segev 2019-01-02 09:22:56 UTC
Extra info:
The scenario in comment 2 happens when restarting the VM using the virtctl commands, i.e.:
 # virtctl stop vm-cirros
 # virtctl start vm-cirros

When restarting from within the VM, on the other hand, nothing prevents you from ssh-ing to the VM after startup completes.
Try the following scenario:
1.a. Open an ssh connection to the VM:
 # ssh cirros@10.8.243.240 -p 31682
or
1.b. Open a console to the VM using virtctl:
 # virtctl console vm-cirros
2. Once inside the VM console, reboot the VM using the reboot command:
 # sudo reboot
3. After the reboot is done and the VM is up again (it takes ~30 seconds), try to ssh to it again via the NodePort:
 # ssh cirros@10.8.243.240 -p 31682
You should be able to connect with no failure or error.
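To verify that the host key really is unchanged after an in-guest reboot (and does change after a virtctl stop/start), one option, assuming the same node IP and node port as above, is to print the key the host presents without opening a session:
 # ssh-keyscan -p 31682 10.8.243.240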

Comment 9 Sebastian Scheinkman 2019-01-02 19:22:51 UTC
Hi Yossi,

This is not a problem.

This happens because vm-cirros is using a containerDisk.

That means that every time you reboot the VM it will start from a new disk (hence the ssh host key change).

If you don't want this, you need to deploy a VM with a persistent volume so that all of the VM configuration survives a reboot.
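
For comparison, one way to check which disk type a given VM uses (assuming the vm-cirros example above) is to inspect its volumes; a containerDisk entry there means the disk is ephemeral, while a dataVolume or persistentVolumeClaim entry is backed by storage that survives restarts:
 # oc get vm vm-cirros -o jsonpath='{.spec.template.spec.volumes}'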

