Bug 874406
| Summary: | windows server 2012 with e1000 stuck at 100% cpu during reboot | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mike Cao <bcao> |
| Component: | qemu-kvm | Assignee: | Andrew Jones <drjones> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.4 | CC: | acathrow, areis, bcao, bsarathy, ddumas, dyasny, jasowang, juzhang, knoel, michen, mkenneth, mst, perfbz, qzhang, rhod, virt-maint, xfu, yunzheng, yvugenfi |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: | Microsoft Windows Server 2012 guests using the e1000 driver can become unresponsive, consuming 100% CPU, during reboot/boot. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-01-23 16:41:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Mike Cao
2012-11-08 06:06:51 UTC
CLI:
/usr/libexec/qemu-kvm -name ss4-1 -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads=1 -uuid eb28b9ee-894f-79c4-a64d-a883c435fd1f -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ss4-1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/svvp/windows_server_2012_1,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:48:34:80,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

Reproduced this issue even when booting only 3 win2012-64 VMs (1 GB memory and 1 vCPU each) on a host with 8 GB memory and 4 CPUs.
1) Guests get stuck when multiple guests are rebooted at the same time.
2) Re-tested win2k8-r2 guests with the same command line; the issue did not reproduce in 5 attempts.

CLI:
# /usr/libexec/qemu-kvm -name ss4-1 -S -M rhel6.4.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid eb28b9ee-894f-79c4-a64d-a883c435fd1f -nodefconfig -nodefaults -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/win2012-64.raw,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:48:34:80,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc :0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -monitor stdio

(In reply to comment #3)
> Reproduced this issue even when booting only 3 win2012-64 VMs (1 GB memory
> and 1 vCPU each) on a host with 8 GB memory and 4 CPUs.
> 1) Guests get stuck when multiple guests are rebooted at the same time.
> 2) Re-tested win2k8-r2 guests with the same command line; the issue did not
> reproduce in 5 attempts.
>
> CLI:
> # /usr/libexec/qemu-kvm -name ss4-1 -S -M rhel6.4.0 -enable-kvm -m 1024 -smp
> 1,sockets=1,cores=1,threads=1 -uuid eb28b9ee-894f-79c4-a64d-a883c435fd1f
> -nodefconfig -nodefaults -rtc base=utc -no-shutdown -device
> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> file=/home/win2012-64.raw,if=none,id=drive-ide0-0-0,format=raw,cache=none
> -device
> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
> -netdev tap,id=hostnet0 -device
> e1000,netdev=hostnet0,id=net0,mac=52:54:00:48:34:80,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -vnc :0 -vga cirrus -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -monitor stdio

Host packages:
kernel-2.6.32-339.el6.x86_64
qemu-kvm-0.12.1.2-2.334.el6.x86_64

Top output while the guests are stuck:

  PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM   TIME+  COMMAND
32420 root  20  0 1355m 1.0g 4536 S 100.1 13.4 3:22.07 qemu-kvm
32405 root  20  0 1355m 1.0g 4536 S  99.7 13.4 3:37.39 qemu-kvm
32436 root  20  0 1355m 1.0g 4536 S  97.7 13.4 2:41.26 qemu-kvm

Can you please provide a perf trace?

Created attachment 646184 [details]
ftrace log for this bug.
Created attachment 646186 [details]
The log is a little large, so a tar-compressed file is uploaded instead.
(In reply to comment #3)
> Reproduced this issue even when booting only 3 win2012-64 VMs (1 GB memory
> and 1 vCPU each) on a host with 8 GB memory and 4 CPUs.
> 1) Guests get stuck when multiple guests are rebooted at the same time.

So the boot hang only occurs when more than one win2012-64 VM is booted simultaneously, i.e. when booted individually they run fine? Does the problem also reproduce with a single VM that has more than one vCPU? And does it still reproduce with multiple VMs even when each VM's vCPUs are pinned to unique pCPUs?

> 2) Re-tested win2k8-r2 guests with the same command line; the issue did not
> reproduce in 5 attempts.

So we've only seen this boot problem with one guest type (win2012-64) so far. Is this a regression, or has win2012-64 never been tested before?

Can somebody give me access to a system with these Windows guests ready for testing, or point me to where I can install them from? If so, I can try the above experiments and whatever else comes to mind myself.

Please grab /proc/cpuinfo from the machine this reproduced on.

(In reply to comment #13)
> Please grab /proc/cpuinfo from the machine this reproduced on.

Tried both AMD and Intel hosts; both reproduce this issue. E.g.:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping : 7
cpu MHz : 1600.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6185.40
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9600B Quad-Core Processor
stepping : 3
cpu MHz : 2293.756
cache size : 512 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips : 4587.51
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

I am sure qzhang tested it on a different host from mine, but I am not sure of the configuration of her host.

My host CPU configuration seems to be the same as bcao's Intel host.
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping : 7
cpu MHz : 1600.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6185.62
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

From Gleb: So QE should reproduce, kill all other VMs, and run ftrace. Looking at a Windows dump may provide us with a clue.

(In reply to comment #16)
> From Gleb: So QE should reproduce, kill all other VMs, and run ftrace.
> Looking at a Windows dump may provide us with a clue.

Retested again, collecting the trace log after killing the other VMs, so that only 1 VM is running and it hangs.

Steps:
1. Boot 5 VMs (each with 1 GB memory and 1 vCPU) on a host with 8 GB memory and 4 CPUs. The command line is the same as comment 3 (of course the uuid, MAC address, and VNC port need to be replaced for each VM).
2. (qemu) cont (do this at nearly the same time for the 5 VMs).
3. Watch the guests; 3 of the 5 guests hang and cannot boot up successfully.
4. Kill the other 4 VMs, leaving only 1 VM running (a hung guest).
5. On the host:
   # mount -t debugfs nodev /sys/kernel/debug
   # cd /sys/kernel/debug/tracing
   # echo function > current_tracer
   # echo 1 > tracing_enabled
   # cp trace /tmp/trace-qzhang.txt

Created attachment 650223 [details]
ftrace log (kill other vm and only leave 1)
Reproduced with the original environment and saw in kvm_stat that tons of interrupts were being injected. Gleb suggested using ftrace to figure out which interrupt, and then the qemu monitor to figure out which device that interrupt is mapped to.

ftrace showed thousands of kvm_set_irq calls for gsi 11.

The qemu monitor showed

  Bus 0, device 3, function 0:
    Ethernet controller: PCI device 8086:100e
      IRQ 11.
      BAR0: 32 bit memory at 0xf2020000 [0xf203ffff].
      BAR1: I/O at 0xc040 [0xc07f].
      id "net0"

So it's the nic, which I neglected to look closely enough at before, and now I see that it's the e1000 model for the failing configs. Switching my VM configs on my machine to it allows me to easily reproduce the issue there too. Removing the <model type='e1000'/> again fixes it again. So the win2012 e1000 driver or the e1000 emulation seems to be the culprit. Considering win2k8-r2 guests work on the same host with the same command line, I guess it's most likely the 2012 driver.

I was thinking the next step should be to get MS involved (and maybe it still is), but I guess there's a chance that the driver is behaving correctly and is just trying to use more of the e1000 spec than is currently emulated (and more than is used by win2k8). I'll look at the traces some more to try to see if that's the case. I still wouldn't mind looking at the Windows dmp either, if I can get access to it.

To verify that this is not the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=869095 (which we decided not to fix) I created a new build fixing the above issue:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=5132461
Can you please try with the above build?

(In reply to comment #24)
> Can you please try with the above build?

I sort of already tried it. I built the latest qemu-kvm (which has the revert of the revert for the interrupt disable bit emulation) and tried that, but the issue still reproduced.

A couple of new data points:

1) Yesterday the bug stopped reproducing for me while I was messing around with it (turning on e1000 debug output, etc.). After putting everything back the way it was before, it still didn't reproduce. I tried many times. Finally, I just tried booting the VMs on the realtek driver, then shutting them down, switching the config to the e1000 and booting again. It reproduced on the first try.

2) mst asked me to check for pending interrupts on a VM exhibiting the 100% CPU load. I just checked now and don't see any. I have all zeros for 'info pic'.

Update:

Re-tested with the following package versions; all of them reproduce the issue.

1) RHEL 6.3 released kernel and qemu-kvm:
kernel-2.6.32-279.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6.x86_64

2) mst's build in comment 24:
qemu-kvm-0.12.1.2-2.337.el6.mstfc5c7506b65b9c0beecc1cdc6c1c317185615c90.x86_64

3) Jason's build in comment 27:
qemu-kvm-0.12.1.2-2.337.el6bz874406.x86_64

CLI:
/usr/libexec/qemu-kvm -name ss4-1 -S -M rhel6.4.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid eb28b9ee-894f-79c4-a64d-a883c435fd1f -nodefconfig -nodefaults -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/win2012-64-new.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:48:34:80,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc :0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -monitor stdio

(In reply to comment #19)
> Reproduced with the original environment and saw in kvm_stat that tons of
> interrupts were being injected. Gleb suggested using ftrace to figure out
> which interrupt, and then the qemu monitor to figure out which device that
> interrupt is mapped to.
>
> ftrace showed thousands of kvm_set_irq calls for gsi 11.
>
> The qemu monitor showed
>
>   Bus 0, device 3, function 0:
>     Ethernet controller: PCI device 8086:100e
>       IRQ 11.
>       BAR0: 32 bit memory at 0xf2020000 [0xf203ffff].
>       BAR1: I/O at 0xc040 [0xc07f].
>       id "net0"
>
> So it's the nic, which I neglected to look closely enough at before, and
> now I see that it's the e1000 model for the failing configs. Switching my
> VM configs on my machine to it allows me to easily reproduce the issue
> there too. Removing the <model type='e1000'/> again fixes it again. So the
> win2012 e1000 driver or the e1000 emulation seems to be the culprit.
> Considering win2k8-r2 guests work on the same host with the same command
> line, I guess it's most likely the 2012 driver.

Andrew, did you see other devices sharing the same IRQ? On Windows, virtio-serial and virtio-balloon use legacy interrupts, not MSI.

(In reply to comment #32)
> Andrew, did you see other devices sharing the same IRQ? On Windows,
> virtio-serial and virtio-balloon use legacy interrupts, not MSI.

I just looked again and see that all my VMs now use IRQ 10 for the nic. I double-checked kvm_stat and trace-cmd and see that it's still the source of the issue, i.e. thousands of injected interrupts to gsi 10 instead of 11 now. So it's the same problem, just a different IRQ. And no other devices are sharing IRQ 10.

While poking around in the trace I looked closer at the mmio activity, which kvm_stat also reports as super high. I see almost exactly the same number (many thousands) of the message below as I see interrupt injections to the nic.

mmio unsatisfied-read len 4 gpa 0xf20200c0 val 0x0

I turned more debug output on in the e1000 model

DBGBIT(INTERRUPT) | DBGBIT(MMIO) | DBGBIT(UNKNOWN) | DBGBIT(GENERAL)

but I don't see anything interesting leading up to the interrupt storm.

(In reply to comment #33)
> mmio unsatisfied-read len 4 gpa 0xf20200c0 val 0x0

The qemu logs say the base address is 0xf2020000, and the e1000 manual says offset 0xc0 is the interrupt cause read register (ICR).

I tried looking at the dump, but it was too slow to do anything over the VPN with windbg running in a Windows VM. I did see it was in E1G6032E!E1000ISR+101 at the time, which isn't a big surprise.

Note that comment 25 says the issue reproduced on upstream, so cherry-picking upstream commits should not help.
Reproduces on a guest using the e1000 model even when all other VMs are using the realtek model. I still haven't seen it without [re]booting multiple VMs simultaneously, though. So, maybe a scheduling-sensitive issue during the e1000 init? The next thing to try is to just create a few network-busy VMs and then see if the e1000 VM still hits the issue on boot.

I would see if anything appears by enabling debug in the code; it looks like debugflags = 0xffff should do it. Also maybe add a printf in e1000_mmio_writel and readl and see what the guest does.

Created attachment 659288 [details]
e1000 logs, one normal set and one set with the problem
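For reference, the kind of printf instrumentation suggested above might look roughly like this in hw/e1000.c (a sketch only, with added lines marked '+'; the function signatures shown are the upstream ones of that era and are assumed, not verified, to match the RHEL 6 qemu-kvm tree):

 static void
 e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
+    /* debug: dump every 32-bit register write the guest makes */
+    fprintf(stderr, "e1000 mmio writel addr=0x%08lx val=0x%08x\n",
+            (unsigned long)addr, val);
     ...

 static uint32_t
 e1000_mmio_readl(void *opaque, target_phys_addr_t addr)
 {
+    /* debug: dump every 32-bit register read the guest makes */
+    fprintf(stderr, "e1000 mmio readl addr=0x%08lx\n", (unsigned long)addr);
     ...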
(In reply to comment #48)
> I would see if anything appears by enabling debug in the code;
> it looks like debugflags = 0xffff should do it.

Yeah, I probably should have just started with 0xffff rather than trying to be selective with the debug output (comment 33). I've now attached logs, one for a normal boot and one for the same VM when it hits the problem. Looking through them, it appears that most of the differences are simply the ordering of events rather than different events. What stands out, though, is that in the failing case the device appears to initialize twice (the eeprom gets read twice).

I've been able to reproduce this without other VMs involved. I just booted a single win2012 VM that had a single vCPU, went into device manager, and did a series of e1000 driver uninstalls followed by scans for hardware changes - which auto-installs the driver back again. Once I was able to reproduce the issue in only two rounds. I captured all logs in all cases, but I still don't see anything too suspicious in them.

Looking at it - I kind of expected to see the guest doing something in a loop again and again, but this does not appear to be the case?

(In reply to comment #52)
> Looking at it - I kind of expected to see the guest doing something in a
> loop again and again, but this does not appear to be the case?

Nope, doesn't appear to. The only "loop" is the repeated sequence of interrupt-related logs. I decided to go back to analysing the ftrace logs instead, but unfortunately haven't found anything suspicious in those either.

I was able to reproduce the 100% vcpu issue while uninstalling the driver with device manager now as well. So I guess the issue isn't on the init path.

> The only "loop" is the repeated sequence of interrupt-related logs.
but it does not repeat forever right?
(In reply to comment #54)
> > The only "loop" is the repeated sequence of interrupt-related logs.
> but it does not repeat forever right?

It does. The log just fills with it.

Ah, one other note: disabling/enabling the nic over and over didn't reproduce the issue (at least not in the 30 or so times I tried). So if the bug appears on the uninstall path, then maybe it's occurring after the nic is already disabled? I'll try to test some more to confirm.

Got it to reproduce with a powershell script doing disable/enable

while (1) {
    netsh interface set interface "Ethernet 2" disabled
    netsh interface set interface "Ethernet 2" enabled
    sleep 1
}

It reproduced fairly quickly once, but then another time it ran until I stopped waiting and killed it. A more reliable way to reproduce is by removing the device and then rescanning for it with devcon. I wrote a script that does that, and it also synchronizes the starting/stopping of tracing - in order to try to reduce the traces we have to look at. I'll attach the traces and scripts now. The traces are much smaller, but I still don't see anything glaring in them. I haven't looked much at them yet, though.

Created attachment 662579 [details]
reproducer script with trace sync
Created attachment 662580 [details]
traces (ftrace and qemu log) made with trace sync
Created attachment 662581 [details]
setupapi.dev.log for the same failure as traces.tgz
Some clues:

The qemu logs show that the driver attempts to generate a TXQE interrupt after it has set the interrupt mask to 0. This should be OK; the TXQE bit just stays set in the ICR until an allowed interrupt type occurs, then the ICR is read, clearing the whole thing. So it's not clear yet how this might cause a problem, but the one-line hack below appears to stop the interrupt storm.

diff --git a/hw/e1000.c b/hw/e1000.c
index ec3a7c4..70b8b57 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -237,6 +237,7 @@ set_ics(E1000State *s, int index, uint32_t val)
 {
     DBGOUT(INTERRUPT, "set_ics %x, ICR %x, IMR %x\n", val, s->mac_reg[ICR],
         s->mac_reg[IMS]);
+    val &= s->mac_reg[IMS];
     set_interrupt_cause(s, 0, val | s->mac_reg[ICR]);
 }

The other clue we still have is that the interrupt storm seems to only reproduce on device enable/disable or uninstall/rescan. This seems consistent with the known issue where interrupts can arrive before the handler is registered, keeping the cpu too busy to ever finish registering the handler. That issue should be addressed for the device install case by Jason Wang's "b9d03e35: e1000: link auto-negotiation emulation"; however, it may still exist for the other three cases.

Actually, in the end the TXQE type interrupt was a red herring, but one that seems to have led me in the right direction. I've changed the hack to be more specific and have now narrowed it down to a set_ics that comes from the driver (i.e. comes through macreg_writeops, not from a model function), and to only change val if IMS == 0 (it sets val = 0). This still "fixes" things.

Watching the trace I see that all set_ics calls from the driver are for interrupt type 0x80, which is RXT0 (rx timer intr). During controller init we usually only get these ICS writes with IMS non-zero, so the hack allows them, and we still don't break. However, during normal operation (e.g. running iexplorer) we get tons more of the ICS writes, and IMS is never non-zero, so the hack "disallows" them all - probably unnecessarily, as the problem seems to only manifest itself during enable/disable. Thus, it's still a hack.

So it seems the driver has an RX timer that wants to periodically generate interrupts, but it has all controller interrupt types masked at the time, likely on purpose, wanting the interrupt to come later for the normal interrupt handler to manage. However, on enable/disable that interrupt may come before/after the handler is ready. I wonder what I'd see looking for the same thing in a driver for e.g. win2k8?

The bug easily reproduces with win2k8, using whatever version of the e1000 driver comes with it. I simply booted a win2k8r2x64 server and did a few enable/disables with netsh, and then bam, interrupt storm. The trace of the RXT0 set_ics calls matches win2012.

Next I watched the trace for a Linux guest (3.3.4-5.fc17.x86_64). There were no RXT0 set_ics calls. In place of those is a constant flow of RXDMT0 (rx desc min. threshold) set_ics calls [1]. All of them are made with a non-zero IMS, which means the hack would always allow them, i.e. it does nothing, so there's no need for the hack.

I think the next thing to do is to get Intel and MS involved in order to see whether my hypothesis of a race between the receive timer and the [un]registering of the E1000 interrupt handler holds water. If so, then they should really fix that. I don't know of any clean way to hack the model for this. The hack I have, which I'll attach for completeness now, could possibly break other drivers - although it doesn't look like it would break Linux.
[1] Footnote: Maybe the Linux driver should bump up the RCTL.RDMTS value it uses, to reduce the frequency at which RXDMT0 gets generated?

Created attachment 666693 [details]
hack to deal with RXT0 set_ics calls that have IMS=0
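The attachment body is not inlined in this report; going by the description in comment 65, the narrowed-down hack presumably amounts to something like the following (a reconstruction for illustration only - it assumes guest-initiated ICS writes reach set_ics through macreg_writeops with a non-zero index argument, while the model's internal callers pass 0; the real attachment may distinguish the two differently):

 static void
 set_ics(E1000State *s, int index, uint32_t val)
 {
     DBGOUT(INTERRUPT, "set_ics %x, ICR %x, IMR %x\n", val, s->mac_reg[ICR],
         s->mac_reg[IMS]);
+    /* hack: if the ICS write came from the driver while every interrupt
+     * cause is masked (IMS == 0), drop the new cause bits instead of
+     * latching them into ICR */
+    if (index != 0 && s->mac_reg[IMS] == 0) {
+        val = 0;
+    }
     set_interrupt_cause(s, 0, val | s->mac_reg[ICR]);
 }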
To comment 65: I agree, let's get Intel involved. Maybe a cleaner workaround is to ignore (mask) RXT0 interrupts, even if the bit in IMS is set, as long as the receive descriptor ring has not yet been set up. Let's try?

Created attachment 668921 [details]
this is a simple hack: after reset, disable rxt0 interrupts until RDT is set
Basically, real hardware doesn't send receive interrupts until RDH/RDT are programmed, so maybe this also applies to the receive timer interrupt, even if it was set by the driver?
Strange, but let's see:
this hack is untested and needs some polish anyway,
but I wonder whether it changes anything.
Could you try?
I think we should talk to Intel anyway.
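The attached hack is likewise not inlined in the report; the idea described above would look roughly like the fragments below (a sketch only: the flag name rxt0_allowed is invented for illustration, the functions to hook may differ in the real patch, and E1000_ICS_RXT0 is assumed to be the defined constant for the 0x80 receive-timer cause bit):

/* in E1000State: a flag that is cleared on device reset */
uint32_t rxt0_allowed;

/* on device reset: */
s->rxt0_allowed = 0;

/* wherever the guest's write to RDT (the receive ring tail) is handled: */
s->rxt0_allowed = 1;

/* in set_ics() / set_interrupt_cause(): suppress the receive timer cause
 * until the receive ring has been programmed */
if (!s->rxt0_allowed) {
    val &= ~E1000_ICS_RXT0;
}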
(In reply to comment #58)
> Got it to reproduce with a powershell script doing disable/enable
>
> while (1) {
>     netsh interface set interface "Ethernet 2" disabled
>     netsh interface set interface "Ethernet 2" enabled
>     sleep 1
> }
>
> It reproduced fairly quickly once, but then another time it ran until I
> stopped waiting and killed it. A more reliable way to reproduce is by
> removing the device and then rescanning for it with devcon. I wrote a
> script that does that, and it also synchronizes the starting/stopping of
> tracing - in order to try to reduce the traces we have to look at. I'll
> attach the traces and scripts now. The traces are much smaller, but I
> still don't see anything glaring in them. I haven't looked much at them
> yet, though.

I ran an additional script twice during the enable/disable script, just to keep the CPU busy. 100% reproduction rate in less than a minute using one VM on the host.

while (1) {
    print "something"
}

(In reply to comment #62)
> diff --git a/hw/e1000.c b/hw/e1000.c
> index ec3a7c4..70b8b57 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -237,6 +237,7 @@ set_ics(E1000State *s, int index, uint32_t val)
>  {
>      DBGOUT(INTERRUPT, "set_ics %x, ICR %x, IMR %x\n", val, s->mac_reg[ICR],
>          s->mac_reg[IMS]);
> +    val &= s->mac_reg[IMS];
>      set_interrupt_cause(s, 0, val | s->mac_reg[ICR]);
>  }

I end up with a blue screen (an exception in the e1000 driver) using this patch after 5-10 minutes of disable/enable cycles.

Yan, what about the one from comment 70?

(In reply to comment #73)
> Yan, what about the one from comment 70?

Tried that, and a variation checking "(s->mac_reg[RCTL] & E1000_RCTL_EN)" as well, and it didn't help.

Created attachment 675644 [details]
Possible fix
I started writing a note to Intel. While summarizing the situation it occurred to me that it doesn't make much sense to save writes to ICS (=> ICR) that are masked by IMS. The manual doesn't spell out whether to do so or not, though. The attached patch drops those writes and also appears to fix the issue (enable/disable loop running now). Assuming real hardware works this way as well, which I'll try to test, this patch would be a real fix rather than another hack.
(In reply to comment #78)
> Created attachment 675644 [details]
> Possible fix
>
> I started writing a note to Intel. While summarizing the situation it
> occurred to me that it doesn't make much sense to save writes to ICS (=>
> ICR) that are masked by IMS. The manual doesn't spell out whether to do so
> or not, though. The attached patch drops those writes and also appears to
> fix the issue (enable/disable loop running now). Assuming real hardware
> works this way as well, which I'll try to test, this patch would be a real
> fix rather than another hack.

The TXQE that was set from QEMU just before the interrupt storm, although there was no mask for it, looked very suspicious. In any case, on my setup the guest with this patch stays alive considerably longer than the time that was needed to reproduce the hang. Also, there is no BSOD side effect from the e1000 driver after a while, as there was with the patch from comment #62.

I left my enable/disable loop running all night and Windows eventually crashed.

bugcheck 7e - system thread exception not handled

stack:
E1G6032E!AllowSmbusAccess+0x10
E1G6032E!PeriodicWorkItem+0x80
nt!IopProcessWorkItem+0x5f
nt!ExpWorkerThread+0x142
nt!PspSystemThreadStartup+0x59
nt!KiStartSystemThread+0x16

FAULTING_IP:
E1G6032E!AllowSmbusAccess+10
mov ecx,dword ptr [rax+5820h]

rax is null; it comes from offset 0 of rcx, which appears to be some structure handed to E1G6032E!PeriodicWorkItem. The byte at offset d70 of this structure is used to determine whether the pointer at offset 0 should be dereferenced (see the disassembly below). If it's non-zero, as in this case (rcx+d70 == 1), the code attempts to use rax and crashes.

E1G6032E!AllowSmbusAccess:
fffff880`04420d04 32d2            xor dl,dl
fffff880`04420d06 3891700d0000    cmp byte ptr [rcx+0D70h],dl
fffff880`04420d0c 7503            jne E1G6032E!AllowSmbusAccess+0xd (fffff880`04420d11)
E1G6032E!AllowSmbusAccess+0xa:
fffff880`04420d0e 32c0            xor al,al
fffff880`04420d10 c3              ret
E1G6032E!AllowSmbusAccess+0xd:
fffff880`04420d11 488b01          mov rax,qword ptr [rcx]
fffff880`04420d14 8b8820580000    mov ecx,dword ptr [rax+5820h]

There's some circumstantial evidence connecting interrupt (not-)enabling to this as well. The disassembly of E1G6032E!PeriodicWorkItem shows that just above the call to E1G6032E!AllowSmbusAccess there's a potential call to E1G6032E!E1000EnableInterrupt. However, there are several conditions that must be met, otherwise that call is skipped. In this case it was skipped. I tried to further reverse engineer this driver, but without a good disassembler it's driving me crazy. In any case, my fix in comment 78 doesn't appear to be the fix.

(In reply to comment #78)
> Created attachment 675644 [details]
> Possible fix
>
> I started writing a note to Intel. While summarizing the situation it
> occurred to me that it doesn't make much sense to save writes to ICS (=>
> ICR) that are masked by IMS. The manual doesn't spell out whether to do so
> or not, though. The attached patch drops those writes and also appears to
> fix the issue (enable/disable loop running now). Assuming real hardware
> works this way as well, which I'll try to test, this patch would be a real
> fix rather than another hack.

Hi Andrew,

Could you make a scratch build for QE to test? I don't understand the relationship between the guest getting stuck at 100% CPU during reboot and the disable/enable-in-a-loop issue. Do you mean they are exactly the same issue?

Thanks,
Mike

(In reply to comment #85)
> Could you make a scratch build for QE to test?
See comment 83; I'm not happy with the fix from comment 78 any more.

> I don't understand the relationship between the guest getting stuck at 100%
> CPU during reboot and the disable/enable-in-a-loop issue. Do you mean they
> are exactly the same issue?

Yes, it's the same issue. I was able to reduce the reproducer to something much simpler and faster.

(In reply to comment #86)
> (In reply to comment #85)
> > Could you make a scratch build for QE to test?
>
> See comment 83; I'm not happy with the fix from comment 78 any more.
>
> > I don't understand the relationship between the guest getting stuck at
> > 100% CPU during reboot and the disable/enable-in-a-loop issue. Do you
> > mean they are exactly the same issue?
>
> Yes, it's the same issue. I was able to reduce the reproducer to something
> much simpler and faster.

Could you provide me a scratch build first? I still want to check whether the guest will hang during reboot or not.

Thanks,
Mike

FYI, I can reproduce this issue by:
1) Flood pinging the guest from the local host
2) Disabling the driver in the guest

It looks like the bug can be reproduced just with that. Maybe we can ask for help from qemu upstream?

(In reply to comment #87)
> Could you provide me a scratch build first? I still want to check whether
> the guest will hang during reboot or not.

It wouldn't be worth it. The reboot test would likely be fine, and maybe we'd even pass cert, but we know there's still an issue that we haven't completely understood yet. So the patch is still in progress.

mst pointed out a pattern he saw in the qemu traces, indicating the guest is just trying to preserve the RXT0 bits it sees in ICR by setting them in ICS. I attempted to confirm/deny that yesterday, and think I denied it. I see ICS writes initiated by the guest (i.e. initiated through mmio) for RXT0 at different times, whether ICR has the bit set or not. I'm still improving the tracing output to try to better understand what happens, though.

I like Jason's suggestion about getting a thread started upstream. We can try to get some Intel people on CC as well. I'll send a mail soon.

*** Bug 956702 has been marked as a duplicate of this bug. ***

*** Bug 990053 has been marked as a duplicate of this bug. ***

This bug may not be e1000 specific. See bug 999296. And then again, it may... we don't know the root cause of either bug yet.