Description of problem:
XP32 SP3 guests BSOD/crash in afd.sys under network load with the virtio-net drivers.

Version-Release number of selected component (if applicable):
Virtio 1.1.16 ISO, available from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/

How reproducible:
Two different guests on the same host experienced the issue.

Steps to Reproduce:
1. Install XP32 on the guest.
2. Install the latest virtio-net driver.
3. Tune the registry parameters as recommended at http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry:
   HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
     DefaultReceiveWindow 0x00100000
     DefaultSendWindow 0x0010000
     FastSendDataGramThreshold 0x00004000
   HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
     Tcp1323Opts 0x00000001
     TcpWindowSize 0x00100000
4. Run a production load (it occurred with a Firebird database server and with company-internal software developed in Delphi).

Actual results:
XP BSODs. WinDBG output follows.

Internal software (Transact_IASJob):

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\WINDOWS\Minidump\Mini052611-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (8 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp_sp3_gdr.101209-1647
Machine Name:
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055d720
Debug session time: Thu May 26 02:13:19.673 2011 (GMT+1)
System Uptime: 2 days 11:59:18.101
Loading Kernel Symbols
...............................................................
...............................
Loading User Symbols
Loading unloaded module list
..........
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 1000008E, {80000003, 8052b600, b8fae9d0, 0}

Probably caused by : afd.sys ( afd!AfdIssueDeviceControl+134 )

Followup: MachineOwner
---------

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 8052b600, The address that the exception occurred at
Arg3: b8fae9d0, Trap Frame
Arg4: 00000000

Debugging Details:
------------------

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP:
nt!DbgBreakPoint+0
8052b600 cc int 3

TRAP_FRAME: b8fae9d0 -- (.trap 0xffffffffb8fae9d0)
ErrCode = 00000000
eax=00000001 ebx=89ce6d00 ecx=8052b734 edx=00000031 esi=00000000 edi=00000000
eip=8052b601 esp=b8faea44 ebp=b8faea70 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
nt!DbgBreakPoint+0x1:
8052b601 c3 ret
Resetting default scope

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR: 0x8E

PROCESS_NAME: Transact_IASJob

LAST_CONTROL_TRANSFER: from b9af8293 to 8052b601

STACK_TEXT:
b8faea40 b9af8293 00000000 b8faeac0 89d2cc80 nt!DbgBreakPoint+0x1
b8faea70 b9af4de9 89d4a3a0 b8faea90 0000000c afd!AfdIssueDeviceControl+0x134
b8faea9c b9af66f3 89d4a3a0 00000002 b9b0a6fb afd!AfdSetEventHandler+0x2e
b8faec30 b9afd2d7 8a0d22e0 8a651030 b8faec64 afd!AfdBind+0x40c
b8faec40 804ef19f 8a6d3280 89c91660 806e7410 afd!AfdDispatchDeviceControl+0x53
b8faec50 8057f98e 89c9173c 8a0d22e0 89c91660 nt!IopfCallDriver+0x31
b8faec64 8058081d 8a6d3280 89c91660 8a0d22e0 nt!IopSynchronousServiceTail+0x70
b8faed00 80579298 000004d8 00000104 00000000 nt!IopXxxControlFile+0x5c5
b8faed34 8054167c 000004d8 00000104 00000000 nt!NtDeviceIoControlFile+0x2a
b8faed34 7c90e514 000004d8 00000104 00000000 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f3bc 00000000 00000000 00000000 00000000 0x7c90e514


STACK_COMMAND: kb

FOLLOWUP_IP:
afd!AfdIssueDeviceControl+134
b9af8293 e90ecbffff jmp afd!AfdIssueDeviceControl+0x134 (b9af4da6)

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: afd!AfdIssueDeviceControl+134

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: afd

IMAGE_NAME: afd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 48a40333

FAILURE_BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

Followup: MachineOwner
---------

Firebird database server (Classic):

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [X:\Transfer\ajc\Mini053111-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*C:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp_sp3_gdr.101209-1647
Machine Name:
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055d720
Debug session time: Tue May 31 00:40:03.678 2011 (GMT+1)
System Uptime: 5 days 6:26:01.024
Loading Kernel Symbols
...............................................................
................................
Loading User Symbols
Loading unloaded module list
........
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.
BugCheck 1000008E, {80000003, 8052b600, f0602b48, 0}

Probably caused by : afd.sys ( afd!AfdIssueDeviceControl+134 )

Followup: MachineOwner
---------

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 8052b600, The address that the exception occurred at
Arg3: f0602b48, Trap Frame
Arg4: 00000000

Debugging Details:
------------------

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP:
nt!DbgBreakPoint+0
8052b600 cc int 3

TRAP_FRAME: f0602b48 -- (.trap 0xfffffffff0602b48)
ErrCode = 00000000
eax=00000001 ebx=85f2ca78 ecx=8052b734 edx=00000031 esi=00000000 edi=00000000
eip=8052b601 esp=f0602bbc ebp=f0602be8 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
nt!DbgBreakPoint+0x1:
8052b601 c3 ret
Resetting default scope

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR: 0x8E

PROCESS_NAME: fb_inet_server.
LAST_CONTROL_TRANSFER: from f12ad293 to 8052b601

STACK_TEXT:
f0602bb8 f12ad293 862f97e8 00000000 862ebc40 nt!DbgBreakPoint+0x1
f0602be8 f12a9de9 86175500 f0602c08 0000000c afd!AfdIssueDeviceControl+0x134
f0602c14 f12b83c7 86175500 00000000 00000000 afd!AfdSetEventHandler+0x2e
f0602c50 f12b2fe4 85ffe5e0 862c6b10 f0602ca0 afd!AfdCleanup+0x606
f0602c60 804ef19f 862e57c0 862f97d8 862f97d8 afd!AfdDispatch+0xbb
f0602c70 80583979 85ffe5c8 00000038 865e7e70 nt!IopfCallDriver+0x31
f0602ca0 805bca4e 86383460 862e57c0 001f01ff nt!IopCloseFile+0x26b
f0602cd4 805bc377 86383460 00000001 865e7e70 nt!ObpDecrementHandleCount+0xd8
f0602cfc 805bc415 e1212148 85ffe5e0 00000690 nt!ObpCloseHandleTableEntry+0x14d
f0602d44 805bc54d 00000690 00000001 00000000 nt!ObpCloseHandle+0x87
f0602d58 8054167c 00000690 0022f110 7c90e526 nt!NtClose+0x1d
f0602d58 7c90e526 00000690 0022f110 7c90e526 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0022f110 00000000 00000000 00000000 00000000 0x7c90e526


STACK_COMMAND: kb

FOLLOWUP_IP:
afd!AfdIssueDeviceControl+134
f12ad293 e90ecbffff jmp afd!AfdIssueDeviceControl+0x134 (f12a9da6)

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: afd!AfdIssueDeviceControl+134

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: afd

IMAGE_NAME: afd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 48f752f4

FAILURE_BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

Followup: MachineOwner
---------

Expected results:
Guest does not crash.

Additional info:
The host OS is not RHEL; however, this was the only place I could find the virtio-win component, which I believe is developed by Red Hat.
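Both bugcheck summary lines can be decoded mechanically. The Python sketch below (function names are hypothetical, not part of WinDBG) parses the `BugCheck` line, clears the 0x10000000 bit that WinDBG shows on the `_M` variant so the code matches the `BUGCHECK_STR: 0x8E` field, and splits Arg1 using the NTSTATUS bit layout (2-bit severity, customer bit, reserved bit, 12-bit facility, 16-bit code). Read as an NTSTATUS, 0x80000003 is STATUS_BREAKPOINT, which agrees with the faulting `int 3` in `nt!DbgBreakPoint`; the "(HRESULT) ... One or more arguments are invalid" text is only WinDBG's alternative interpretation of the same number.

```python
import re
from typing import List, NamedTuple, Tuple

# NTSTATUS raised by an int 3 instruction (e.g. nt!DbgBreakPoint).
STATUS_BREAKPOINT = 0x80000003
SEVERITY_NAMES = ("success", "informational", "warning", "error")

BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


class NtStatus(NamedTuple):
    severity: str
    customer: bool
    facility: int
    code: int


def parse_bugcheck(line: str) -> Tuple[int, List[int]]:
    """Parse a WinDBG 'BugCheck XXXXXXXX, {a1, a2, a3, a4}' summary line."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("no BugCheck summary found")
    code = int(match.group(1), 16)
    args = [int(arg.strip(), 16) for arg in match.group(2).split(",")]
    return code, args


def base_bugcheck_code(code: int) -> int:
    """Clear the 0x10000000 bit WinDBG shows on '_M' bugcheck variants."""
    return code & ~0x10000000


def decode_ntstatus(value: int) -> NtStatus:
    """Split a 32-bit NTSTATUS (Sev:2, C:1, N:1, Facility:12, Code:16)."""
    return NtStatus(
        severity=SEVERITY_NAMES[(value >> 30) & 0x3],
        customer=bool((value >> 29) & 0x1),
        facility=(value >> 16) & 0xFFF,
        code=value & 0xFFFF,
    )


if __name__ == "__main__":
    for line in (
        "BugCheck 1000008E, {80000003, 8052b600, b8fae9d0, 0}",  # Transact_IASJob
        "BugCheck 1000008E, {80000003, 8052b600, f0602b48, 0}",  # fb_inet_server
    ):
        code, args = parse_bugcheck(line)
        exception = args[0]
        print(f"bugcheck 0x{base_bugcheck_code(code):X}, exception 0x{exception:08X}, "
              f"breakpoint={exception == STATUS_BREAKPOINT}, {decode_ntstatus(exception)}")
```

Both dumps decode identically apart from the trap-frame address (Arg3), which is what one would expect from the same breakpoint being hit in two processes.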
Can you upload the minidump or attach it to the BZ?
Can you also take a snapshot of the registry with regedt32?
Created attachment 503523
Mini dump file

Minidump file from crash
Created attachment 503526
Registry dump of crashing XP guest
Created attachment 503528
Registry dump of crashing XP guest (bzip2)
1. Is the crash always reproducible?
2. Did you try to run the same scenario without changes to registry parameters?
(In reply to comment #6)
> 1. Is the crash always reproducible?

I haven't found a way to reproduce it interactively. Unfortunately, it always happens when I'm not looking at the box; however, both VMs have since crashed several more times with the same error.

> 2. Did you try to run the same scenario without changes to registry parameters?

Yes, I ran for a while (about a week) with no changes. We did not have a crash, but network performance was very poor (iperf showed around 150 Mbps max; with the tweaks I can get ~600-700 Mbps). It was not deemed acceptable for production use until we made the registry changes. In one instance, on the machine running the Delphi code (a batch job manager), one of the jobs actually failed because it could not write logs to a Samba server fast enough (or so one of the developers told me).

Alex
The registry change is very radical: it sets the default TCP window to 1M. First of all, this is much more than the usual default in more recent OSes, and you can try setting it to a lower value, for example 256K.

Regarding TCP window usage: it depends on the application. Some applications set the window explicitly and some use the default. For example, you can also instruct iperf to use a specific TCP window (the -w <size> parameter).
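To put the 1M and 256K figures in context: a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window / RTT, and any window above 65535 bytes needs the RFC 1323 window-scale option (which `Tcp1323Opts = 1` enables). The Python sketch below does that arithmetic. The helper names are hypothetical, and the RTT values are illustrative assumptions, not measurements from this setup.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value the 16-bit TCP window field can hold
MAX_WINDOW_SHIFT = 14        # RFC 1323 caps the window-scale shift at 14


def registry_dword(text: str) -> int:
    """Parse a registry DWORD written in hex, e.g. '0x00100000'."""
    return int(text, 16)


def window_scale_shift(window_bytes: int) -> int:
    """Smallest window-scale shift that lets the 16-bit field advertise window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes:
        shift += 1
        if shift > MAX_WINDOW_SHIFT:
            raise ValueError("window larger than RFC 1323 allows")
    return shift


def max_throughput_bps(window_bytes: int, rtt_us: int) -> int:
    """Upper bound for one connection: one full window per round trip."""
    return window_bytes * 8 * 1_000_000 // rtt_us


def required_window_bytes(target_bps: int, rtt_us: int) -> int:
    """Bandwidth-delay product, rounded up to whole bytes."""
    return -(-target_bps * rtt_us // 8_000_000)


if __name__ == "__main__":
    for name, value in (("TcpWindowSize", "0x00100000"), ("256K alternative", "0x00040000")):
        window = registry_dword(value)
        print(f"{name}: {window} bytes, window-scale shift {window_scale_shift(window)}")
    # Assumed 1 ms RTT: an unscaled 64 KiB window caps one connection near 524 Mbit/s.
    print(max_throughput_bps(MAX_UNSCALED_WINDOW, 1_000))
    print(required_window_bytes(700_000_000, 1_000))
```

Note that exactly 1 MiB needs a shift of 5, because 65535 << 4 falls 16 bytes short of 1048576. Also, as written in the original step 3, `DefaultSendWindow 0x0010000` parses to 65536 bytes (64 KiB), while the other window values parse to 1 MiB.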
Crash dump analysis:

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 8052b600, The address that the exception occurred at
Arg3: b8fae9d0, Trap Frame
Arg4: 00000000

Debugging Details:
------------------

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP:
nt!DbgBreakPoint+0
8052b600 cc int 3

TRAP_FRAME: b8fae9d0 -- (.trap 0xffffffffb8fae9d0)
ErrCode = 00000000
eax=00000001 ebx=89ce6d00 ecx=8052b734 edx=00000031 esi=00000000 edi=00000000
eip=8052b601 esp=b8faea44 ebp=b8faea70 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
nt!DbgBreakPoint+0x1:
8052b601 c3 ret
Resetting default scope

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR: 0x8E

PROCESS_NAME: Transact_IASJob

LAST_CONTROL_TRANSFER: from b9af8293 to 8052b601

STACK_TEXT:
b8faea40 b9af8293 00000000 b8faeac0 89d2cc80 nt!DbgBreakPoint+0x1
b8faea70 b9af4de9 89d4a3a0 b8faea90 0000000c afd!AfdIssueDeviceControl+0x134
b8faea9c b9af66f3 89d4a3a0 00000002 b9b0a6fb afd!AfdSetEventHandler+0x2e
b8faec30 b9afd2d7 8a0d22e0 8a651030 b8faec64 afd!AfdBind+0x40c
b8faec40 804ef19f 8a6d3280 89c91660 806e7410 afd!AfdDispatchDeviceControl+0x53
b8faec50 8057f98e 89c9173c 8a0d22e0 89c91660 nt!IopfCallDriver+0x31
b8faec64 8058081d 8a6d3280 89c91660 8a0d22e0 nt!IopSynchronousServiceTail+0x70
b8faed00 80579298 000004d8 00000104 00000000 nt!IopXxxControlFile+0x5c5
b8faed34 8054167c 000004d8 00000104 00000000 nt!NtDeviceIoControlFile+0x2a
b8faed34 7c90e514 000004d8 00000104 00000000 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f3bc 00000000 00000000 00000000 00000000 0x7c90e514


STACK_COMMAND: kb

FOLLOWUP_IP:
afd!AfdIssueDeviceControl+134
b9af8293 e90ecbffff jmp afd!AfdIssueDeviceControl+0x134 (b9af4da6)

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: afd!AfdIssueDeviceControl+134

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: afd

IMAGE_NAME: afd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 48a40333

FAILURE_BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

BUCKET_ID: 0x8E_afd!AfdIssueDeviceControl+134

Followup: MachineOwner
---------
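The "Probably caused by : afd.sys" verdict is consistent with a simple rule: skip the kernel (`nt`) frames at the top of STACK_TEXT and take the first remaining frame. The Python sketch below applies that simplified rule to pasted stack text; it is not WinDBG's actual triage logic (which is more involved), and the function names are hypothetical. Applied to the x86 stacks here and to the x64 stack posted later in this thread, it lands on `afd!AfdIssueDeviceControl` every time.

```python
import re
from typing import List, NamedTuple, Optional, Sequence

# Matches a trailing "module!symbol" or "module!symbol+0xoffset" on a stack line.
FRAME_RE = re.compile(r"(?P<module>\w+)!(?P<symbol>\w+)(?:\+0x(?P<offset>[0-9a-fA-F]+))?\s*$")


class Frame(NamedTuple):
    module: str
    symbol: str
    offset: int


def parse_stack(text: str) -> List[Frame]:
    """Extract symbolized frames; lines without 'module!symbol' are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line.strip())
        if match:
            frames.append(Frame(match["module"], match["symbol"], int(match["offset"] or "0", 16)))
    return frames


def first_frame_outside(frames: Sequence[Frame], skip_modules: Sequence[str] = ("nt",)) -> Optional[Frame]:
    """Return the first frame whose module is not in skip_modules."""
    for frame in frames:
        if frame.module not in skip_modules:
            return frame
    return None


if __name__ == "__main__":
    sample = """b8faea40 b9af8293 00000000 b8faeac0 89d2cc80 nt!DbgBreakPoint+0x1
b8faea70 b9af4de9 89d4a3a0 b8faea90 0000000c afd!AfdIssueDeviceControl+0x134"""
    print(first_frame_outside(parse_stack(sample)))
```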
(In reply to comment #8)
> The change in registry is very radical - it is setting default TCP window to
> 1M.
> First of all this much more than usual default in more recent OSes and you can
> try to set it to lower value, for example 256K.
>
> Regarding the usage of TCP windows. It depends on the application - some may
> set it some may use default. For example you can instruct iperf to use specific
> TCP window also (-w <size> parameter).

I did try iperf with different window sizes; with the default registry settings it was pretty hopeless. I only got an improvement with those changes (although not specifically 1M). The settings I used came from the linux-kvm.org page about tuning the virtio drivers on XP, so I assumed they were reasonable for that OS. Is that not the case?

I will try changing to 256K and see if we get an improvement. One of the VMs (the IBReplicator) has been removed from production because of this and because of an application crash that only happens under KVM (not on a physical box or an ESXi VM); I think that crash is unrelated to the virtio drivers (kernel32!InterlockedDecrement).

Alex
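A window-size sweep like the one described is easiest to compare when each run differs only in `-w`. The Python sketch below just builds iperf (version 2) command lines as argument lists; the server address is a placeholder from the documentation range, and the window list and duration are assumptions. iperf's `-w` sets the socket buffer size on the side where it is given, and in the default client-to-server direction the server's receive window is what limits the client, so each run passes the same `-w` to both ends.

```python
from typing import List, Sequence, Tuple


def iperf_window_sweep(server: str, windows: Sequence[str] = ("64K", "256K", "1M"),
                       seconds: int = 10) -> List[Tuple[List[str], List[str]]]:
    """Return (server_cmd, client_cmd) argument lists, one pair per window size."""
    runs = []
    for window in windows:
        server_cmd = ["iperf", "-s", "-w", window]
        client_cmd = ["iperf", "-c", server, "-w", window, "-t", str(seconds)]
        runs.append((server_cmd, client_cmd))
    return runs


if __name__ == "__main__":
    for server_cmd, client_cmd in iperf_window_sweep("192.0.2.10"):
        print("server:", " ".join(server_cmd), "| client:", " ".join(client_cmd))
```

Each command can be launched with `subprocess.run(cmd, check=True)`, server on one machine and client on the other; the server has to be restarted between window sizes so that the new `-w` takes effect.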
The 1M setting was first used for performance benchmarking, but since we didn't see any issues with it and it was extensively tested in performance tests, it stayed as the recommendation. On the other hand, MS recommends increasing the TCP window with caution.

By the way, is it possible to provide access to a kernel memory dump (please archive it; I'm not sure you will be able to attach it to the BZ)?
(In reply to comment #11)
> 1M setting was first used for performance benchmarking, but as we didn't see
> any issue with it and it was extensively tested with performance test it stayed
> as recommendation. On other hand MS recommend to increase TCP window with
> caution.
>
> By the way, is it possible to provide access to kernel memory dump (please
> archive it, not sure you will be able to attach it to BZ)?

I did not have full dumps enabled. I will do so and wait for a crash.

Thanks
Alex
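Switching from minidumps to a larger dump type comes down to one registry value. As a sketch, the Python helper below renders a `.reg` file: the `CrashDumpEnabled` codes (1 = complete, 2 = kernel, 3 = small/minidump) under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl` are the standard Windows values, while `render_reg` itself is a hypothetical helper. The same function can express the AFD and Tcpip parameters from step 3 of the original report.

```python
from typing import Dict

CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
DUMP_TYPES = {"complete": 1, "kernel": 2, "small": 3}


def render_reg(key: str, dwords: Dict[str, int]) -> str:
    """Render a regedit 'Version 5.00' file that sets REG_DWORD values under key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dwords.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {value} does not fit in a DWORD")
        lines.append(f'"{name}"=dword:{value:08x}')
    # regedit's own exports use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg(CRASH_CONTROL_KEY, {"CrashDumpEnabled": DUMP_TYPES["kernel"]}))
```

To save the result, `open("enable-kernel-dump.reg", "w", encoding="utf-16", newline="")` matches the UTF-16-with-BOM encoding regedit uses for its own Version 5.00 exports, and `newline=""` keeps the CRLF endings from being doubled on Windows.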
Any updates?
(In reply to comment #13)
> Any updates?

I have some full dumps. They are quite large, so I don't think the list will accept them. Is there somewhere preferred for me to drop the files?

Thanks
Alex
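Full memory dumps usually compress well, and splitting the archive keeps each piece under an upload limit. The Python sketch below uses bzip2 (the same format as the earlier registry attachment) and returns a SHA-256 of the whole archive so the reassembled file can be verified on the receiving side. The part size, file names and `compress_and_split` helper are assumptions for illustration; compression alone does not make the data confidential.

```python
import bz2
import hashlib
import shutil
from pathlib import Path
from typing import List, Tuple

BLOCK = 1 << 20  # 1 MiB read size, so large dumps are streamed rather than loaded whole


def sha256_of(path: Path) -> str:
    """Hash a file in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(BLOCK), b""):
            digest.update(block)
    return digest.hexdigest()


def compress_and_split(dump: Path, part_bytes: int) -> Tuple[List[Path], str]:
    """bzip2-compress dump to <dump>.bz2, split it into numbered parts of at most
    part_bytes each, and return the part paths plus the SHA-256 of the .bz2 file."""
    archive = dump.with_name(dump.name + ".bz2")
    with open(dump, "rb") as src, bz2.open(archive, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst, BLOCK)
    parts = []
    with open(archive, "rb") as src:
        for index, chunk in enumerate(iter(lambda: src.read(part_bytes), b"")):
            part = archive.with_name(f"{archive.name}.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
    return parts, sha256_of(archive)
```

On the receiving side, concatenating the parts in order (`cat MEMORY.DMP.bz2.* > MEMORY.DMP.bz2` on Linux, or `copy /b` with the parts joined by `+` on Windows) and comparing the SHA-256 restores the archive.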
Hi,

I have these files, but they obviously contain data that I should not be passing over public channels, and even over encrypted/hidden channels I would probably need some kind of NDA from Red Hat. Even if I remove the obvious nasties from the dump files, like database connection creds, it would be very difficult to get rid of all the corporate data embedded in there. Any ideas?

Thanks
Alex
(In reply to comment #13)
> Any updates?

Can anyone from RH comment on how I should proceed, given my last posting?
(In reply to comment #16)
> (In reply to comment #13)
> > Any updates?
>
> Can anyone from RH comment on how I should proceed given my last posting?

First, this is not the right component for you to report the issue. :) If you cannot upload the dump publicly, please send it to the assignee directly.

Mike
This is not RHEL, so in any case we will not get to handle it in RHEL 5.9.
Hi guys,

I have the same problem with your driver.

System: Microsoft Windows Server 2003 R2 x64

We have not changed any registry keys; all settings are at their defaults. But our system crashes several times per day.

We have developed a small program for testing. Basically, we used this example: http://msdn.microsoft.com/en-us/library/windows/desktop/ms739168(v=vs.85).aspx

The call stack is:

fffffadf`c48b12e8 fffff800`010413b4 nt!KeBugCheckEx
fffffadf`c48b12f0 fffff800`01040e3b nt!KiBugCheckDispatch+0x74
fffffadf`c48b1470 fffff800`0105876d nt!KiSystemServiceHandler+0x7b
fffffadf`c48b14b0 fffff800`010307f7 nt!RtlpExecuteHandlerForException+0xd
fffffadf`c48b14e0 fffff800`010328f3 nt!RtlDispatchException+0x2bf
fffffadf`c48b1ba0 fffff800`010414af nt!KiDispatchException+0xd9
fffffadf`c48b21a0 fffff800`0103f7f7 nt!KiExceptionExit
fffffadf`c48b2320 fffff800`01039eb1 nt!KiBreakpointTrap+0xb7
fffffadf`c48b24b8 fffffadf`c5f080bd nt!DbgBreakPoint+0x1
fffffadf`c48b24c0 fffffadf`c5f04973 afd!AfdIssueDeviceControl+0x1a8
fffffadf`c48b2580 fffffadf`c5f063ca afd!AfdCreateConnection+0x2ab
fffffadf`c48b26c0 fffffadf`c5eed274 afd!AfdAddFreeConnection+0x4a
fffffadf`c48b2700 fffffadf`c5f03d05 afd!AfdStartListen+0x220
fffffadf`c48b2790 fffff800`012def6a afd!AfdFastIoDeviceControl+0x10de
fffffadf`c48b2a70 fffff800`012df046 nt!IopXxxControlFile+0x5a3
fffffadf`c48b2b90 fffff800`0104113d nt!NtDeviceIoControlFile+0x56
fffffadf`c48b2c00 00000000`78b83e48 nt!KiSystemServiceCopyEnd+0x3
00000000`0012ed88 00000000`6b006a5a wow64cpu!DeviceIoctlFileFault+0x35
00000000`0012ee70 00000000`6b005e0d wow64!RunCpuSimulation+0xa
00000000`0012eea0 00000000`77ed8060 wow64!Wow64LdrpInitialize+0x2ed

Before it died, afd.sys printed:

kd> da fffffadf`c8e27b40
fffffadf`c8e27b40 "*AFD: IoCallDriver returned STAT"
fffffadf`c8e27b60 "US_SUCCESS, but event in the IRP"
fffffadf`c8e27b80 " (%p) is NOT signalled!!!."

Do you know how we can fix it? Maybe you have some suggestions...
We used the latest driver from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/

If needed, I have a full memory dump and the test program. Please let me know if you have any ideas.

Thanks,
Anykey Skovorodkin
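The afd.sys message points at the synchronous-IRP pattern: the caller attaches an event to the IRP, calls `IoCallDriver`, and waits on the event only if the call returns STATUS_PENDING. The message suggests afd.sys expects that once the lower driver returns STATUS_SUCCESS, the request has already completed and the event is signalled. The user-mode Python model below illustrates that contract, with `threading.Event` standing in for the kernel event; every name in it is hypothetical, and it is not afd's code.

```python
import threading
from typing import Callable

STATUS_SUCCESS = 0x00000000
STATUS_PENDING = 0x00000103

# A "lower driver" receives the completion event and returns a status code.
LowerDriver = Callable[[threading.Event], int]


class UnsignalledEventError(RuntimeError):
    """Raised when success is reported but the completion event is not set."""


def call_driver_synchronously(lower_driver: LowerDriver, timeout: float = 5.0) -> int:
    """Caller side of the model: wait only on STATUS_PENDING, otherwise expect
    the request to be finished already. Returns the lower driver's status."""
    event = threading.Event()
    status = lower_driver(event)
    if status == STATUS_PENDING:
        if not event.wait(timeout):
            raise TimeoutError("pending request never completed")
    elif status == STATUS_SUCCESS and not event.is_set():
        # The condition afd.sys reports just before hitting its DbgBreakPoint.
        raise UnsignalledEventError(
            "IoCallDriver returned STATUS_SUCCESS, but event in the IRP is NOT signalled")
    return status


def completes_inline(event: threading.Event) -> int:
    event.set()  # work finished before returning
    return STATUS_SUCCESS


def completes_later(event: threading.Event) -> int:
    threading.Timer(0.01, event.set).start()  # work finishes asynchronously
    return STATUS_PENDING


def claims_success_without_completing(event: threading.Event) -> int:
    return STATUS_SUCCESS  # contract violation: nothing will ever set the event
```

In this model only the third driver trips the check; the first two are the two legal ways to finish a request.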
(In reply to comment #20)
> Hi Guys
>
> I have same problem with your driver.
>
> System: Microsoft Windows Server 2003 R2 x64
>
> We have not changed registry keys. All settings by default.
> But our system crashes several times per day.
>
> We have developed small program for testing. Basically, we have used this
> example
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms739168(v=vs.85).aspx.
>

Hello,

1. Can you describe how you use the sample to reproduce the crash (exact steps for reproduction)?
2. Can you please upload the memory dump for review?
3. Please provide the command line that you used to run the guest VM.

Thanks,
Vasya Kryachkin.
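The x64 stack in comment #20 runs through `afd!AfdStartListen`, `afd!AfdAddFreeConnection` and `afd!AfdCreateConnection`, i.e. the listen/connection-setup path. The test program itself was not posted, so as a hypothetical starting point for exact reproduction steps, the Python loop below repeatedly creates a listening socket, connects to it, exchanges a few bytes, and closes everything. Winsock TCP sockets on Windows are serviced by afd.sys, but connections to the guest's own address normally never reach the NIC driver, so this exercises the afd/tcpip listen path rather than virtio-net itself. Whether it reproduces the crash is untested; the address, payload and iteration count are assumptions.

```python
import socket

PAYLOAD = b"ping"


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly size bytes (recv may return fewer)."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed early")
        data += chunk
    return data


def listen_churn(iterations: int, host: str = "127.0.0.1", timeout: float = 5.0) -> int:
    """Repeat listen -> connect -> accept -> send/receive -> close; return completed rounds."""
    completed = 0
    for _ in range(iterations):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.settimeout(timeout)
            server.bind((host, 0))  # port 0: let the OS choose a free port
            server.listen(5)
            with socket.create_connection(server.getsockname(), timeout=timeout) as client:
                conn, _ = server.accept()
                with conn:
                    conn.settimeout(timeout)
                    client.sendall(PAYLOAD)
                    if recv_exact(conn, len(PAYLOAD)) == PAYLOAD:
                        completed += 1
    return completed
```

Running `listen_churn(100000)` inside the guest, optionally from several processes at once, would be one way to put sustained load on the listen path while the crash is being chased.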