Bug 1532472
| Summary: | [virtio-win][netkvm] it takes a long time to get ip after "set_link on" with "status=off" on windows 2008-32/64 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yu Wang <wyu> |
| Component: | virtio-win | Assignee: | Yvugenfi <yvugenfi> |
| virtio-win sub component: | virtio-win-prewhql | QA Contact: | Virtualization Bugs <virt-bugs> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | jherrman, lijin, phou, wyu, xiagao, yvugenfi |
| Version: | 7.5 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Prior to this update, Windows Server 2008 guests in some cases took a very long time to acquire an IP address after rebooting. The virtio driver has been fixed to properly follow its configuration for querying link states. As a result, acquiring an IP address now takes significantly less time on Windows Server 2008 guests. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-30 16:21:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Is it a regression?

(In reply to Yan Vugenfirer from comment #2)
> Is it a regression?

Not a regression, RHEL7.4 release qemu and driver can also reproduce this issue

Thanks
Yu Wang

Hi Yan,

Additional:
If I add a step "ipconfig /renew" after step 5 in comment#0, the guest will get ip immediately.

1 When we test this case, need we renew ip after "set_link on"?
2 And if not use "ipconfig /renew", how many seconds for ip recover are acceptable for us?
3 Win8 and win8+ guest recover time is shorter than win7 and win2008.

Thanks
Yu Wang

(In reply to Yu Wang from comment #4)
> 1 When we test this case, need we renew ip after "set_link on"?

No need to include this step. "ipconfig /renew" forces the Windows DHCP client to renew the IP.

> 2 And if not use "ipconfig /renew", how many seconds for ip recover are
> acceptable for us?

I don't think we have a definition for now.

> 3 Win8 and win8+ guest recover time is shorter than win7 and win2008.

Might be related to the way we get interrupts on those OSes. Do we get legacy interrupts or MSI interrupts on older OSes?
To see it - go to device manager, open Network device and go to "Resources" tab.

(In reply to Yan Vugenfirer from comment #5)
> > 2 And if not use "ipconfig /renew", how many seconds for ip recover are
> > acceptable for us?
>
> I don't think we have a definition for now.

So, for now, if we can get the ip after "set_link on", the case can be counted as pass, right?

> > 3 Win8 and win8+ guest recover time is shorter than win7 and win2008.
>
> Might be related to the way we get interrupts on those OSes. Do we get
> legacy interrupts or MSI interrupts on older OSes?
> To see it - go to device manager, open Network device and go to "Resources"
> tab.

I checked the irq for win2008-32 and win10-32, they both have 4 MSIXs in device manager--> resources, and both show "MSIX message table available, count = 4" in DebugView.

my command line for network is :
-device virtio-net-pci,mac=9a:89:8a:8b:8c:8d,status=off,id=hostnet0,vectors=4,netdev=net0,bus=pci.0,addr=0x9 -netdev tap,id=net0,vhost=on

Thanks
Yu Wang

(In reply to Yu Wang from comment #6)
> So, for now, if we can get the ip after "set_link on", the case can be
> counted as pass, right?

Yes.

> I checked the irq for win2008-32 and win10-32, they both have 4 MSIXs in
> device manager--> resources,
> and shows "MSIX message table available, count = 4" in DebugView.
>
> my command line for network is :
> -device
> virtio-net-pci,mac=9a:89:8a:8b:8c:8d,status=off,id=hostnet0,vectors=4,
> netdev=net0,bus=pci.0,addr=0x9 -netdev tap,id=net0,vhost=on

So it looks like they should behave the same regarding interrupt handling. This is not the case of having legacy PCI interrupt.

(In reply to Yan Vugenfirer from comment #8)
> So it looks like they should behave the same regarding interrupt handling.
> This is not the case of having legacy PCI interrupt.

Above all, win2008-32/64 should behave the same as other guests. But actually, win2008-32/64 may recover network slower than other guests (sometimes it needs 20s, sometimes 3-5min). win7 performs better, but is still slower than win10 guests. Can we accept this?
If no, could you pls help to fix this bug?
If yes, please feel free to close this bug.

Thanks a lot
Yu Wang

Let's keep this bug for further investigation

So it looks like we have some bug here. Even if status=off is in the command line, we will try to read the status in some cases. And it will cause code misbehavior.

On the other hand - if "status=off", we will always set the link as on in Windows. What it means is that setting link on in QEMU, while triggering an interrupt to the guest, will not cause the driver to re-read the status, and the driver will not notify OS, and OS will not run DHCP client. So this scenario is kind of broken, to begin with.
To trigger DHCP client run "ipconfig /renew"

Hi Yan,

I still can reproduce this problem with build153 (test on win2008-64). It will take 2-3 min to recover ip without "ipconfig /renew"

And what do you mean by comment#12, need I use "ipconfig /renew" ? In comment#5, you said not using it.

Thanks
Yu Wang

(In reply to Yu Wang from comment #13)
> I still can reproduce this problem with build153 (test on win2008-64). It
> will take 2-3 min to recover ip without "ipconfig /renew"

The changes are not in the build yet.
The task is in "Post" state - the code is committed upstream but there is still no downstream build with the change as far as I know.

Oh, sorry, I thought following commit solve this issue.
commit 9fca6c0ee81f7ad29ed3ba20dbd7358b20f75245
Author: Bishara AbuHattoum <bishara>
Date: Wed May 16 12:41:39 2018 +0300
NetKVM: Fixing behaviour when VIRTIO_NET_F_STATUS is off
VIRTIO_NET_F_STATUS off misbehaviours fixed:
1. Reading the current link state,
fix: ReadLinkState is not to be called in this case.
2. Link state is not initialized,
fix: pContext->bConnected initialized to TRUE.
We will verify it when the changes are merged in the downstream build.
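
The two fixes named in the commit message above can be illustrated with a small model. This is only a sketch of the decision logic in Python, not the NetKVM code: `ModelAdapter`, `on_config_interrupt` and the two counters are hypothetical names made up for illustration. The only values taken from outside this report are the feature bit number of VIRTIO_NET_F_STATUS (16) and VIRTIO_NET_S_LINK_UP (1), as defined in the virtio specification.

```python
from dataclasses import dataclass

VIRTIO_NET_F_STATUS = 1 << 16   # feature bit 16 in the virtio specification
VIRTIO_NET_S_LINK_UP = 1        # bit 0 of the config-space status field


@dataclass
class ModelAdapter:
    """Hypothetical stand-in for the driver context (not the NetKVM struct)."""
    host_features: int
    config_status: int = 0      # what the device would report in config space
    connected: bool = True      # mirrors "bConnected initialized to TRUE"
    config_reads: int = 0       # how often the model read the status field
    os_notifications: int = 0   # how often a link change was reported upward

    def on_config_interrupt(self) -> None:
        # Without VIRTIO_NET_F_STATUS the status field is not valid, so the
        # model never reads it and the link simply stays up.
        if not self.host_features & VIRTIO_NET_F_STATUS:
            return
        self.config_reads += 1
        link_up = bool(self.config_status & VIRTIO_NET_S_LINK_UP)
        if link_up != self.connected:
            self.connected = link_up
            self.os_notifications += 1   # a link change is reported to the OS
```

With `host_features=0` (the "status=off" case) an interrupt changes nothing and no link change is reported, which matches the explanation in this thread that the OS has no trigger to run the DHCP client again until "ipconfig /renew" or its own timeout.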

(In reply to lijin from comment #15)
> Oh, sorry, I thought following commit solve this issue.

Yes, this is the right commit.

> commit 9fca6c0ee81f7ad29ed3ba20dbd7358b20f75245
> NetKVM: Fixing behaviour when VIRTIO_NET_F_STATUS is off

I don't think it is in the downstream build yet:
http://git.engineering.redhat.com/git/users/vrozenfe/internal-kvm-guest-drivers-windows/.git/refs/

(In reply to Yan Vugenfirer from comment #16)
> I don't think it is in the downstream build yet:
> http://git.engineering.redhat.com/git/users/vrozenfe/internal-kvm-guest-drivers-windows/.git/refs/

I checked the stable branch:
http://git.engineering.redhat.com/git/users/vrozenfe/internal-kvm-guest-drivers-windows/.git/commit/?h=stable

+New Features:
+
+NetKVM: Fixing behaviour when VIRTIO_NET_F_STATUS is off
...

Is it right to check the stable branch's commit?

(In reply to lijin from comment #17)
> Is it right to check the stable branch's commit?

Sorry, my mistake. I looked at the wrong branch.

(In reply to Yu Wang from comment #13)
> And what do you mean by comment#12, need I use "ipconfig /renew" ? In
> comment#5, you said not using it.

Hi Yu,

As I mentioned in comment #11 and comment #12:

The bug is a little bit different from the description.

If you set "status=off" in QEMU command line the driver should not read virtio_net_configuration when Link up\down interrupt arrives and always set link as UP.

So actually the fix is not to touch the configuration space (this is something that we did and then had a bug in notification).
And the behavior will be:
1. Run VM with link down and "status=off"
2. No IP on the guest
3. Set link up in QEMU - must run "ipconfig /renew", or Windows will renew the IP by itself after a timeout that we don't have control over.

When running without "status=off": Windows should re-acquire IP by itself, because the driver will notify link change.

(In reply to Yan Vugenfirer from comment #19)
> When running without "status=off": Windows should re-acquire IP by itself,
> because the driver will notify link change.

Hi Yan,

if not reboot the guests (step4), can get ip successfully in 5 seconds in comment#0

This issue only happened after reboot. Is that normal?

Thanks
Yu Wang

(In reply to Yu Wang from comment #20)
> This issue only happened after reboot. Is that normal?

Yes. From the guest perspective the link is up. Because if "status=off" is set, the guest cannot query link status.

According to comment#19 and comment#21, this bug has been fixed, so change status to verified

Thanks all
Yu Wang

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3413 |
Description of problem:
it takes a long time to get ip after reboot and "set_link on" with "status=off" on windows 2008-32/64

Version-Release number of selected component (if applicable):
virtio-win-prewhql-144
qemu-kvm-rhev-2.10.0-13.el7.x86_64
kernel-3.10.0-825.el7.x86_64
seabios-1.11.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a win2008-32/64 guest of virtio-net-pci with "status=off" and vhost=on:
/usr/libexec/qemu-kvm -name 145SCS2012R2O9T -enable-kvm -m 3G -smp 2 \
 -uuid 10c15b2d-fb88-4712-973e-81ae4bde3157 -nodefconfig -nodefaults \
 -M pc-i440fx-rhel7.4.0 -cpu Nehalem,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
 -chardev socket,id=charmonitor,path=/tmp/145SCS2012R2O9T,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control \
 -rtc base=localtime,driftfix=slew -boot order=cd,menu=on \
 -device piix3-usb-uhci,id=usb \
 -drive file=143BLN200832CXD,if=none,id=drive-ide0-0-0,format=raw,cache=none \
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 \
 -device usb-tablet,id=input0 -vnc 0.0.0.0:2 -vga std \
 -qmp tcp:0:4445,server,nowait -monitor stdio \
 -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0,vhost=on,queues=4 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:52:68:26:31:03,mq=on,vectors=10,status=off
2. ping out from guest
3. Under qemu monitor, set network link down.
{ "execute": "set_link", "arguments": { "name": "net0", "up": false } }
4. reboot the guests (system_reset or shutdown -t 0 -r)
5. Under qemu monitor, set network link up.
{ "execute": "set_link", "arguments": { "name": "net0", "up": true } }
6. check the ip and ping out from guest

Actual results:
in step 6, it takes 4-5min to get ip in guests.

Expected results:
get ip in a short time, eg 5s-10s

Additional info:
1 try on win7/win10/win2012R2 guests, can get ip successfully in 5 seconds
2 if replace with "status=on", can get ip successfully in 5 seconds
3 if not reboot the guests (step 4), can get ip successfully in 5 seconds
4 RHEL7.4 release qemu and driver can reproduce this issue
5 with/without mq can reproduce this issue
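
Steps 3 and 5 above can be scripted instead of typed by hand. The following is a minimal sketch in Python, assuming the QMP socket from the command line in step 1 (`-qmp tcp:0:4445,server,nowait`) is reachable on 127.0.0.1 and the device id is `net0`. The helper names `set_link_command` and `run_qmp` are hypothetical; only `set_link` and `qmp_capabilities` are QMP commands.

```python
import json
import socket


def set_link_command(name, up):
    """Build the QMP set_link request used in steps 3 and 5."""
    return json.dumps({"execute": "set_link",
                       "arguments": {"name": name, "up": up}})


def run_qmp(host, port, commands):
    """Send commands over a QMP TCP socket; return one reply per command."""
    replies = []
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())        # greeting banner sent by QEMU
        # QMP requires qmp_capabilities before any other command.
        handshake = json.dumps({"execute": "qmp_capabilities"})
        for cmd in [handshake] + list(commands):
            stream.write(cmd + "\n")
            stream.flush()
            reply = json.loads(stream.readline())
            while "event" in reply:          # skip asynchronous events
                reply = json.loads(stream.readline())
            replies.append(reply)
    return replies[1:]                       # drop the handshake reply
```

For example, `run_qmp("127.0.0.1", 4445, [set_link_command("net0", False)])` would correspond to step 3, and the same call with `True` to step 5. The guest reboot in step 4 and the IP check in step 6 still have to be done separately.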