1608344 – Can't connect to windows vm

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	1.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Sebastian Scheinkman
QA Contact:	Yan Du
Docs Contact:
URL:
Whiteboard:	network
Depends On:
Blocks:

Reported:	2018-07-25 10:45 UTC by zhe peng
Modified:	2018-10-11 08:57 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-08-16 21:01:07 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Comment 1 Nelly Credi 2018-08-02 12:25:26 UTC

@Zhe can you please update?

Comment 3 Fabian Deutsch 2018-08-07 12:07:23 UTC

This could be because downstream is not supporting SLIRP.

It might be that for CNV we need to use the delegateIP option by default.
This would mean that we need to make the default network approach configurable.

Comment 9 Ihar Hrachyshka 2018-08-10 07:04:23 UTC

(In reply to Fabian Deutsch from comment #3)
> This could be because downstream is not supporting SLIRP.
> 
> It might be that for CNV we need to use the delegateIP option by default.
> This would mean that we need to make the default network approach
> configurable.

Could you please explain why SLIRP is relevant here? The example file used (vmi-windows.yaml) doesn't use slirp but bridge (with implicit delegateIp = true). Even if it did not specify bridge explicitly, kubevirt would add a default pod interface of bridge type.

Comment 10 Ihar Hrachyshka 2018-08-10 07:13:19 UTC

(In reply to Vladik Romanovsky from comment #6)
> It's probably somehow related to dhcp. cirros uses a udhcp, while rhel and
> windows have more strict clients.
> 

I am not sure you can link it to DHCP with the limited info captured in the bug. You later say that you don't see any traffic from the Windows VMI at all. Have we explored what a Windows VMI does, in terms of DHCP, when it works? How many discovery frames does it send, and how frequently? I can imagine that the e1000 model (used in the yaml example) takes longer to initialize the link (as we observed in the past with cirros), but I would not expect it to take minutes.

What if we swap e1000 to something else? (virtio? does the image have virtio drivers?) 

> unfortunately, we don't install tcpdump in the pods, so it's harder to debug.
> I'll try to debug with a RHEL vm, when Yan will start one on the environment
> I'm using.
> 

It's not the first time the lack of tcpdump in the images has made debugging more complicated. Maybe we should consider adding it, at least in dev builds. BTW, later in the comments you said that you ran tcpdump on pod interfaces. Should we maybe capture somewhere how you did it? (I assume you did it without rebuilding the virt-handler image?)

Comment 11 Nelly Credi 2018-08-14 12:29:07 UTC

@Yan, do you have an env devs can look at?
is it reproduced on 0.6.3?

Comment 12 Fabian Deutsch 2018-08-14 13:51:01 UTC

(In reply to Ihar Hrachyshka from comment #9)
> (In reply to Fabian Deutsch from comment #3)
> > This could be because downstream is not supporting SLIRP.
> > 
> > It might be that for CNV we need to use the delegateIP option by default.
> > This would mean that we need to make the default network approach
> > configurable.
> 
> Could you please explain why SLIRP is relevant here? The example file used
> (vmi-windows.yaml) doesn't use slirp but bridge (with implicit delegateIp =
> true). Even if it did not specify bridge explicitly, kubevirt would add a
> default pod interface of bridge type.

So, back then - without details - I actually thought (needs to be confirmed) that SLIRP is not supported in qemu-kvm-rhev, which is used by the downstream builds.
This would mean that any VM connected to the network using SLIRP would not have connectivity.

What I missed is that vmi-windows.yaml does not use slirp.

Comment 13 Yan Du 2018-08-15 01:50:05 UTC

(In reply to Nelly Credi from comment #11)
> @Yan, do you have an env devs can look at?
> is it reproduced on 0.6.3?

Yes, it can be reproduced on ds v0.6.3. I have preserved an env for debugging on our OpenStack. Ping me if you need it.

Comment 14 Vladik Romanovsky 2018-08-15 15:31:42 UTC

(In reply to Yan Du from comment #13)
> (In reply to Nelly Credi from comment #11)
> > @Yan, do you have an env devs can look at?
> > is it reproduced on 0.6.3?
> 
> Yes, it can be reproduced on ds v0.6.3. I have preserved an env for
> debugging on our OpenStack. Ping me if you need it.

@Yan, thanks.
I think we need an env where we can get to the VMs' VNC console, to better understand what's happening with the Windows OS.
Is there such an environment?

Thanks!

Comment 16 zhe peng 2018-08-16 07:25:57 UTC

I found another way to get the VM status.
We can take a screenshot of the VM:

# oc exec -p virt-launcher-vmi-windows-9pwk6 -- virsh screenshot 1 --file /tmp/t.ppm
Then copy the ppm file out and check the status. In my env, Windows is not up; it has crashed. I will check with another, healthy Windows image.
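
For anyone repeating the screenshot check, here is a minimal sketch in Python, not something that was run in this bug. It assumes the screenshot is a binary (P6) PPM with 8-bit samples, that `oc cp` works against the virt-launcher pod (it needs `tar` inside the container), and that the pod name is passed positionally to `oc exec`. The helper names `screenshot_commands`, `parse_ppm` and `mean_brightness` are made up for illustration. A mostly black frame is only a hint that the guest display shows nothing yet; looking at the image is still the reliable check.

```python
def screenshot_commands(pod, domain="1", remote="/tmp/t.ppm", local="t.ppm"):
    """Build (but do not run) the two oc invocations:
    take a screenshot inside the pod, then copy it to the local machine."""
    return [
        ["oc", "exec", pod, "--", "virsh", "screenshot", domain, "--file", remote],
        ["oc", "cp", f"{pod}:{remote}", local],
    ]


def parse_ppm(data):
    """Parse a binary (P6) PPM. Returns (width, height, maxval, pixel_bytes)."""
    if data[:2] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    pos = 2
    fields = []
    while len(fields) < 3:
        # Skip whitespace between header tokens.
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        # Skip a '#' comment up to the end of its line.
        if data[pos:pos + 1] == b"#":
            while pos < len(data) and data[pos:pos + 1] != b"\n":
                pos += 1
            continue
        start = pos
        while pos < len(data) and data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("malformed PPM header")
        fields.append(int(data[start:pos]))
    pos += 1  # exactly one whitespace byte separates header and pixels
    width, height, maxval = fields
    return width, height, maxval, data[pos:]


def mean_brightness(data):
    """Average sample value scaled to 0.0 (black) .. 1.0 (white)."""
    width, height, maxval, pixels = parse_ppm(data)
    if maxval > 255:
        raise ValueError("16-bit samples are not handled in this sketch")
    expected = width * height * 3  # three samples (R, G, B) per pixel
    if expected == 0 or len(pixels) < expected:
        raise ValueError("empty or truncated pixel data")
    return sum(pixels[:expected]) / (expected * maxval)
```

To use it, run the two command lists from `screenshot_commands("virt-launcher-vmi-windows-9pwk6")` (for example with `subprocess.run(cmd, check=True)`), then call `mean_brightness(open("t.ppm", "rb").read())`. A value close to 0.0 suggests a blank screen.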

Comment 17 zhe peng 2018-08-16 11:27:17 UTC

Hi,
After changing my Windows image, I re-tested this.
The Windows VM came up after 30 minutes and I can ping it now, but I still can't use VNC to connect to the VM; it always shows "Waiting for display 1".
When I use virsh screenshot, I get a view of the VM, and it shows the Windows interface.
I left my environment up (see comment 15).

Comment 18 Vladik Romanovsky 2018-08-16 18:39:36 UTC

(In reply to zhe peng from comment #17)
> Hi,
> After changing my Windows image, I re-tested this.
> The Windows VM came up after 30 minutes and I can ping it now, but I still
> can't use VNC to connect to the VM; it always shows "Waiting for display 1".
> When I use virsh screenshot, I get a view of the VM, and it shows the
> Windows interface.
> I left my environment up (see comment 15).

Hi,

Thanks a lot for reproducing it.
This is good news: it is not a networking problem after all.

I'll take a look at the VNC problem, but overall I'd suggest closing this bug (perhaps we'll need another one for the VNC issue).

Thanks,
Vladik

Comment 19 Dan Kenigsberg 2018-08-16 21:01:07 UTC

Zhe, please file an independent bug about VNC.

I am closing this bug per Vladik's recommendation - it seems to be very slow storage, not a networking error.

Comment 20 zhe peng 2018-08-21 02:25:08 UTC

Dan, thanks. I have already filed a bug about the VNC connection: https://bugzilla.redhat.com/show_bug.cgi?id=1619218
