Bug 1662687 - Can't ssh a VM via NodePort service after VM restart
Summary: Can't ssh a VM via NodePort service after VM restart
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dan Kenigsberg
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-31 16:07 UTC by Yossi Segev
Modified: 2019-01-02 19:22 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-01 15:52:14 UTC
Target Upstream Version:
Embargoed:



Description Yossi Segev 2018-12-31 16:07:36 UTC
Description of problem:
After exposing a NodePort service for a VM's pod, the VM is accessible via ssh on the exposed port. After restarting the VM, the service and port are no longer accessible.


Version-Release number of selected component (if applicable):
Client/server version: v0.12.0-alpha.2


How reproducible:
Always


Steps to Reproduce:
1. Create a cirros VM.
 # oc create -f cluster/example/vm-cirros.yaml
2. Start the VM:
 # virtctl start vm-cirros
3. Verify VMI is running.
 # oc get VMI
NAME            AGE       PHASE     IP            NODENAME
vm-cirros       10m       Running   10.130.0.46   cnv-executor-ysegev-node1.example.com
4. Get the pod name.
 # oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
virt-launcher-vm-cirros-bdf7w       2/2       Running   0          11m
5. Expose a NodePort service via the pod, to be used for ssh (i.e. the target port is 22).
 # oc expose pod virt-launcher-vm-cirros-bdf7w --name=testnp --port=27017 --target-port=22 --type=NodePort
6. View the service and the exposed port.
 # oc get svc testnp
NAME      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
testnp    NodePort   172.30.134.72   <none>        27017:31854/TCP   17m
7. Open an ssh connection to the VM using the IP address of the node that hosts the VM and the exposed port:
 # ssh cirros@10.8.243.240 -p 31854
8. Make sure you are logged in to the VM's ssh console.
9. Exit the ssh console.
10. Restart the VM.
 # virtctl stop vm-cirros
 # virtctl start vm-cirros
11. After the VM is up again, try to ssh to it again via the same node IP and port.
 # ssh cirros@10.8.243.240 -p 31854


Actual results:
ssh: connect to host 10.8.243.240 port 31854: Connection refused


Expected results:
The ssh connection to the VM should be available via the exposed NodePort service, just as before the restart.

Additional info:
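A quick way to double-check where the refusal comes from (assuming the same node IP and node port as in the steps above) is a raw TCP probe against the node port; a REJECT from the node's iptables rules shows up here as well, independently of ssh:
 # nc -vz 10.8.243.240 31854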

Comment 1 Yossi Segev 2019-01-01 11:36:50 UTC
Some more investigation:
After restarting the VM:
1. The VM's pod is recreated with a different name than the one before the restart.
2. The exposed service's iptables rules are not restored:
 These are the exposed service rules before restarting the VM:
-A KUBE-NODEPORTS -p tcp -m comment --comment "kubevirt/testnp:" -m tcp --dport 31221 -j KUBE-MARK-MASQ                                                                                                            
-A KUBE-NODEPORTS -p tcp -m comment --comment "kubevirt/testnp:" -m tcp --dport 31221 -j KUBE-SVC-4GYGD62LGYM2BRMM
-A KUBE-SEP-HYW5LKWDE7SML36T -s 10.130.0.46/32 -m comment --comment "kubevirt/testnp:" -j KUBE-MARK-MASQ
-A KUBE-SEP-HYW5LKWDE7SML36T -p tcp -m comment --comment "kubevirt/testnp:" -m tcp -j DNAT --to-destination 10.130.0.46:22
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.251.47/32 -p tcp -m comment --comment "kubevirt/testnp: cluster IP" -m tcp --dport 27017 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.251.47/32 -p tcp -m comment --comment "kubevirt/testnp: cluster IP" -m tcp --dport 27017 -j KUBE-SVC-4GYGD62LGYM2BRMM
-A KUBE-SVC-4GYGD62LGYM2BRMM -m comment --comment "kubevirt/testnp:" -j KUBE-SEP-HYW5LKWDE7SML36T

And these are the rules after the restart:
-A KUBE-EXTERNAL-SERVICES -p tcp -m comment --comment "kubevirt/testnp: has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31221 -j REJECT --reject-with icmp-port-unreachable                          
-A KUBE-SERVICES -d 172.30.251.47/32 -p tcp -m comment --comment "kubevirt/testnp: has no endpoints" -m tcp --dport 27017 -j REJECT --reject-with icmp-port-unreachable
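The "has no endpoints" comments in the rules above suggest the service selector no longer matches any pod. One way to confirm this (assuming the service created in the steps above) is to list the service endpoints after the restart; if the new virt-launcher pod's labels no longer match the selector, the ENDPOINTS column shows <none>:
 # oc get endpoints testnp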

Comment 2 Yossi Segev 2019-01-01 15:41:20 UTC
After consulting with Dan and Sebastian: the correct way to expose a service on a VM is via the "virtctl expose" command rather than "oc expose".
However, you still cannot ssh into the VM via the exposed NodePort service after restarting the VM.



Steps to Reproduce:
1. Create a cirros VM.
 # oc create -f cluster/example/vm-cirros.yaml
2. Start the VM:
 # virtctl start vm-cirros
3. Verify VMI is running.
 # oc get VMI
NAME            AGE       PHASE     IP            NODENAME
vm-cirros       10m       Running   10.130.0.46   cnv-executor-ysegev-node1.example.com
4. Expose a NodePort service, to be used for ssh (i.e. target port is 22).
 # virtctl expose vmi vm-cirros --name=testnp --port=27017 --target-port=22 --type=NodePort
5. View the service and the exposed port.
 # oc get svc testnp
NAME      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)           AGE
testnp    NodePort   172.30.127.7   <none>        27017:31682/TCP   13m
6. Open an ssh connection to the VM using the IP address of the node that hosts the VM and the exposed port:
 # ssh cirros@10.8.243.240 -p 31682
7. Make sure you are logged in to the VM's ssh console.
8. Exit the ssh console.
9. Restart the VM.
 # virtctl stop vm-cirros
 # virtctl start vm-cirros
10. After the VM is up again, try to ssh to it again via the same node IP and port.
 # ssh cirros@10.8.243.240 -p 31682


Actual results:
The ssh connection is rejected by the client because the host key changed:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:TvnAFvlwkTUT7zo3p8+bjXc2HTxhM28wjHg0bzAwVm4.
Please contact your system administrator.
Add correct host key in /home/ysegev/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/ysegev/.ssh/known_hosts:35
ECDSA host key for [10.8.243.240]:31682 has changed and you have requested strict checking.
Host key verification failed.

Comment 3 Yossi Segev 2019-01-01 15:52:14 UTC
The way to resolve this is to delete the previous entry for this connection in ~/.ssh/known_hosts.
The exact entry that should be deleted is referenced in the error message:
Offending ECDSA key in /home/ysegev/.ssh/known_hosts:35
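For example, the stale entry can be removed non-interactively with ssh-keygen (assuming the node IP and node port from the steps above):
 # ssh-keygen -R "[10.8.243.240]:31682"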

Comment 5 Yossi Segev 2019-01-02 09:22:56 UTC
Extra info:
The scenario in comment 2 happens when restarting the VM using the virtctl commands, i.e.:
 # virtctl stop vm-cirros
 # virtctl start vm-cirros

When restarting from within the VM, on the other hand, nothing prevents you from ssh-ing to the VM after startup completes.
Try the following scenario:
1.a. Open an ssh connection to the VM:
 # ssh cirros@10.8.243.240 -p 31682
or
1.b. Open a console to the VM using virtctl:
 # virtctl console vm-cirros
2. Once inside the VM console, reboot the VM using the reboot command:
 # sudo reboot
3. After the reboot is done and the VM is up again (it takes ~30 seconds), try to ssh to it again via the NodePort:
 # ssh cirros@10.8.243.240 -p 31682
You should be able to connect with no failure or error.
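To verify that the host key really is unchanged after an in-guest reboot (and does change after a virtctl stop/start), one option, assuming the same node IP and node port as above, is to print the key the host presents without opening a session:
 # ssh-keyscan -p 31682 10.8.243.240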

Comment 9 Sebastian Scheinkman 2019-01-02 19:22:51 UTC
Hi Yossi,

This is not a problem.

This happens because vm-cirros is using a containerDisk.

That means that every time you reboot the VM it will start from a new disk (hence the ssh host key change).

If you don't want this, you need to deploy a VM with a persistent volume so that all of the VM configuration survives a reboot.
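
For comparison, one way to check which disk type a given VM uses (assuming the vm-cirros example above) is to inspect its volumes; a containerDisk entry there means the disk is ephemeral, while a dataVolume or persistentVolumeClaim entry is backed by storage that survives restarts:
 # oc get vm vm-cirros -o jsonpath='{.spec.template.spec.volumes}'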

