| Summary: | Virtio-win: XP32 SP3 guests BSOD/crash in afd.sys | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | acrow |
| Component: | virtio-win | Assignee: | Yvugenfi <yvugenfi> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.6 | CC: | an.skovorodkin, bcao, juzhang, qzhang, rhod |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Windows | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-02-01 11:24:02 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

Description
acrow
2011-06-02 10:21:21 UTC
Can you upload or attach the minidump to BZ? Can you also take a snapshot of the registry with regedit32?

Created attachment 503523 [details]
Mini dump file

Minidump file from crash

Created attachment 503526 [details]
Registry dump of crashing XP guest

Created attachment 503528 [details]
Registry dump of crashing XP guest (bzip2)

1. Is the crash always reproducible?
2. Did you try to run the same scenario without changes to the registry parameters?

(In reply to comment #6)
> 1. Is the crash always reproducible?

I haven't found a way to reproduce it interactively. Unfortunately it always happens when I'm not looking at the box; however, both VMs have since crashed several more times with the same error.

> 2. Did you try to run the same scenario without changes to the registry parameters?

Yes, I ran for a while (about a week) with no changes. We did not have a crash, but network performance was very poor (iperf was showing around 150Mbps max; with the tweaks I can get ~600-700Mbps). It was not deemed acceptable for production use until we made the registry changes. In one instance on the machine running the Delphi code (a batch job manager), one of the jobs actually failed because it could not write logs to a Samba server fast enough (so one of the developers told me, anyway).

Alex

The change in the registry is very radical: it sets the default TCP window to 1M. First of all, this is much more than the usual default in more recent OSes, and you can try setting it to a lower value, for example 256K.

Regarding the usage of TCP windows: it depends on the application. Some may set it, some may use the default. For example, you can also instruct iperf to use a specific TCP window (-w <size> parameter).

Crash dump analysis:

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 8052b600, The address that the exception occurred at
Arg3: b8fae9d0, Trap Frame
Arg4: 00000000

Debugging Details:
------------------

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP:
nt!DbgBreakPoint+0
8052b600 cc int 3

TRAP_FRAME: b8fae9d0 -- (.trap 0xffffffffb8fae9d0)
ErrCode = 00000000
eax=00000001 ebx=89ce6d00 ecx=8052b734 edx=00000031 esi=00000000 edi=00000000
eip=8052b601 esp=b8faea44 ebp=b8faea70 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
nt!DbgBreakPoint+0x1:
8052b601 c3 ret
Resetting default scope

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR: 0x8E

PROCESS_NAME: Transact_IASJob

LAST_CONTROL_TRANSFER: from b9af8293 to 8052b601

STACK_TEXT:
b8faea40 b9af8293 00000000 b8faeac0 89d2cc80 nt!DbgBreakPoint+0x1
b8faea70 b9af4de9 89d4a3a0 b8faea90 0000000c afd!AfdIssueDeviceControl+0x134
b8faea9c b9af66f3 89d4a3a0 00000002 b9b0a6fb afd!AfdSetEventHandler+0x2e
b8faec30 b9afd2d7 8a0d22e0 8a651030 b8faec64 afd!AfdBind+0x40c
b8faec40 804ef19f 8a6d3280 89c91660 806e7410 afd!AfdDispatchDeviceControl+0x53
b8faec50 8057f98e 89c9173c 8a0d22e0 89c91660 nt!IopfCallDriver+0x31
b8faec64 8058081d 8a6d3280 89c91660 8a0d22e0 nt!IopSynchronousServiceTail+0x70
b8faed00 80579298 000004d8 00000104 00000000 nt!IopXxxControlFile+0x5c5
b8faed34 8054167c 000004d8 00000104 00000000 nt!NtDeviceIoControlFile+0x2a
b8faed34 7c90e514 000004d8 00000104 00000000 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f3bc 00000000 00000000 00000000 00000000 0x7c90e514

STACK_COMMAND: kb

FOLLOWUP_IP:
afd!AfdIssueDeviceControl+134
b9af8293 e90ecbffff jmp afd!AfdIssueDeviceControl+0x134 (b9af4da6)

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: afd!AfdIssueDeviceControl+134

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: afd

IMAGE_NAME: afd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 48a40333

FAILURE_BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

Followup: MachineOwner
---------

(In reply to comment #8)
> The change in the registry is very radical: it sets the default TCP window to
> 1M. First of all, this is much more than the usual default in more recent OSes,
> and you can try setting it to a lower value, for example 256K.
>
> Regarding the usage of TCP windows: it depends on the application. Some may
> set it, some may use the default. For example, you can also instruct iperf to
> use a specific TCP window (-w <size> parameter).

I did try iperf with different window sizes; with the default registry settings it was pretty hopeless. I only got an improvement with those changes (although not specifically 1M). The settings I used came from the linux-kvm.org website page about tuning the virtio drivers on XP, so I assumed they were reasonable for that OS. Is that not the case?

I will try changing to 256k and see if we get an improvement.

One of the VMs (the IBReplicator) has been removed from production due to this and an application crash that only happens under KVM (not on a physical box or an ESXi VM), which I think is unrelated to the virtio drivers (kernel32!InterlockedDecrement).

Alex

The 1M setting was first used for performance benchmarking, but as we didn't see any issue with it and it was extensively tested in performance tests, it stayed as the recommendation. On the other hand, MS recommends increasing the TCP window with caution.

By the way, is it possible to provide access to a kernel memory dump (please archive it; I'm not sure you will be able to attach it to BZ)?

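For readers who want to try the per-application approach instead of a registry-wide default, here is a minimal sketch of an application requesting its own socket buffer sizes, which is roughly what iperf's -w option does for its own connections. The function name `open_tcp_socket` and the 256K figure are illustrative assumptions, not something taken from the reporter's applications, and the OS is free to round or cap the requested size.

```python
import socket


def open_tcp_socket(window_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of window_bytes.

    Hypothetical helper for illustration only. The operating system may
    adjust the requested value, so the effective size should be read back.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the buffer sizes before connect()/listen() so they can be
    # taken into account when the connection's window is negotiated.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window_bytes)
    return sock


if __name__ == "__main__":
    s = open_tcp_socket(256 * 1024)  # 256K, the lower value suggested above
    # Print what the OS actually granted; it may differ from the request.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    s.close()
```

Setting the size per socket leaves the system-wide default untouched, so only the applications that need a large window get one.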
(In reply to comment #11)
> The 1M setting was first used for performance benchmarking, but as we didn't
> see any issue with it and it was extensively tested in performance tests, it
> stayed as the recommendation. On the other hand, MS recommends increasing the
> TCP window with caution.
>
> By the way, is it possible to provide access to a kernel memory dump (please
> archive it; I'm not sure you will be able to attach it to BZ)?

I did not have full dumps enabled. I will do so and wait for a crash.

Thanks
Alex

Any updates?

(In reply to comment #13)
> Any updates?

I have some full dumps. They are quite large, so I don't think the list will accept them. Is there a preferred place for me to drop the files?

Thanks
Alex

Hi,

I have these files, but they obviously contain data that I should not be passing over public channels, and even over encrypted/hidden channels I would probably need some kind of NDA from Red Hat. Even if I remove the obvious nasties from the dump files, like database connection creds, it would be very difficult to get rid of all the corporate data embedded in there.

Any ideas?

Thanks
Alex

(In reply to comment #13)
> Any updates?

Can anyone from RH comment on how I should proceed given my last posting?

(In reply to comment #16)
> (In reply to comment #13)
> > Any updates?
>
> Can anyone from RH comment on how I should proceed given my last posting?

First, this is not the right component for you to report the issue. :)
If you cannot upload the dump publicly, please send it to the assignee directly.

Mike

Not RHEL, so anyhow, we will not get to handle it in RHEL5.9.

Hi Guys

I have the same problem with your driver.

System: Microsoft Windows Server 2003 R2 x64

We have not changed registry keys. All settings are at their defaults. But our system crashes several times per day.

We have developed a small program for testing. Basically, we have used this example:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms739168(v=vs.85).aspx

The call stack is:

fffffadf`c48b12e8 fffff800`010413b4 nt!KeBugCheckEx
fffffadf`c48b12f0 fffff800`01040e3b nt!KiBugCheckDispatch+0x74
fffffadf`c48b1470 fffff800`0105876d nt!KiSystemServiceHandler+0x7b
fffffadf`c48b14b0 fffff800`010307f7 nt!RtlpExecuteHandlerForException+0xd
fffffadf`c48b14e0 fffff800`010328f3 nt!RtlDispatchException+0x2bf
fffffadf`c48b1ba0 fffff800`010414af nt!KiDispatchException+0xd9
fffffadf`c48b21a0 fffff800`0103f7f7 nt!KiExceptionExit
fffffadf`c48b2320 fffff800`01039eb1 nt!KiBreakpointTrap+0xb7
fffffadf`c48b24b8 fffffadf`c5f080bd nt!DbgBreakPoint+0x1
fffffadf`c48b24c0 fffffadf`c5f04973 afd!AfdIssueDeviceControl+0x1a8
fffffadf`c48b2580 fffffadf`c5f063ca afd!AfdCreateConnection+0x2ab
fffffadf`c48b26c0 fffffadf`c5eed274 afd!AfdAddFreeConnection+0x4a
fffffadf`c48b2700 fffffadf`c5f03d05 afd!AfdStartListen+0x220
fffffadf`c48b2790 fffff800`012def6a afd!AfdFastIoDeviceControl+0x10de
fffffadf`c48b2a70 fffff800`012df046 nt!IopXxxControlFile+0x5a3
fffffadf`c48b2b90 fffff800`0104113d nt!NtDeviceIoControlFile+0x56
fffffadf`c48b2c00 00000000`78b83e48 nt!KiSystemServiceCopyEnd+0x3
00000000`0012ed88 00000000`6b006a5a wow64cpu!DeviceIoctlFileFault+0x35
00000000`0012ee70 00000000`6b005e0d wow64!RunCpuSimulation+0xa
00000000`0012eea0 00000000`77ed8060 wow64!Wow64LdrpInitialize+0x2ed

Before it died, afd.sys said:

kd> da fffffadf`c8e27b40
fffffadf`c8e27b40 "*AFD: IoCallDriver returned STAT"
fffffadf`c8e27b60 "US_SUCCESS, but event in the IRP"
fffffadf`c8e27b80 " (%p) is NOT signalled!!!."

Do you know how we can fix it? Maybe you have some suggestions...

We have used the latest driver from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/

If you need them, I have a full memory dump and the test program. Please let me know if you have ideas.

Thanks,
Anykey Skovorodkin

(In reply to comment #20)
> Hi Guys
>
> I have the same problem with your driver.
>
> System: Microsoft Windows Server 2003 R2 x64
>
> We have not changed registry keys. All settings are at their defaults.
> But our system crashes several times per day.
>
> We have developed a small program for testing. Basically, we have used this
> example:
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms739168(v=vs.85).aspx
>

Hello,

1. Can you describe how you use the sample to reproduce the crash (exact steps for reproduction)?
2. Can you please upload the memory dump for review?
3. Please provide the command line that you used to run the guest VM.

Thanks,
Vasya Kryachkin.
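The reporter's test program is not included in this thread, so the following is only a hypothetical illustration of the socket calls that show up in the two call stacks (bind in the XP trace, listen in the Server 2003 trace). It is a sketch, assuming a plain TCP socket on the loopback address; the function name `bind_listen_cycle` is made up here, and nothing in the thread indicates that this loop reproduces the crash.

```python
import socket


def bind_listen_cycle(iterations: int, host: str = "127.0.0.1") -> int:
    """Repeatedly create, bind, and listen on a TCP socket, then close it.

    Hypothetical exerciser for illustration; returns the number of
    completed bind/listen cycles.
    """
    completed = 0
    for _ in range(iterations):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            # Port 0 asks the OS to pick any free port for this cycle.
            srv.bind((host, 0))
            # listen() is the call that appears as afd!AfdStartListen in
            # the Server 2003 stack above.
            srv.listen(5)
            completed += 1
    return completed


if __name__ == "__main__":
    print(bind_listen_cycle(1000))
```

When posting reproduction steps, stating the iteration count, the guest OS, and the exact guest command line alongside a program like this would cover the three items requested above.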