Description of problem:
Driver Verifier is enabled when running the SVVP job "System-Common Scenario Stress with IO". After Driver Verifier is enabled, the guest reboots automatically. The reboot takes about 40 minutes, and the test cannot continue after the reboot.

Version-Release number of selected component (if applicable):
kvm-83-105.el5_4.9
kernel-2.6.18-164.2.1.el5

How reproducible:
Always.

Steps to Reproduce:
1. Start a Windows 2008 R2 Datacenter guest on a local disk:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -name win08-test -m 120G -smp 8 -vnc :1 -net nic,vlan=1,macaddr=00:4a:fe:45:32:11,model=virtio -net tap,vlan=1,script=/data/images/qemu-ifup -drive file=/data/win08-test.qcow2,if=virtio,boot=on -smbios file=/data/smbios/smbios_type_0.bin -smbios type=1,manufacturer=redhat,product=RHEV,version=5.4.2.1,serial=11111-55555-dddd-3f45f55,sku=SKU,uuid=4567ea23-1d34-3456-fe34-e32145ac567c -smbios file=/data/smbios/smbios_type_4.bin -smbios file=/data/smbios/smbios_type_4_2.bin -smbios file=/data/smbios/smbios_type_4_3.bin -smbios file=/data/smbios/smbios_type_4_4.bin -smbios file=/data/smbios/smbios_type_4_5.bin -smbios file=/data/smbios/smbios_type_4_6.bin -smbios file=/data/smbios/smbios_type_4_7.bin -smbios file=/data/smbios/smbios_type_4_8.bin -smbios file=/data/smbios_type_3.bin -smbios file=/data/smbios_type_16.bin
2. Run the test job "System-Common Scenario Stress with IO". Driver Verifier is enabled and the guest then reboots; the reboot takes about 40 minutes and the test cannot continue.
IBM x3950-M2 configuration: 24 cores and 256 GB RAM in total
CPU: 2 x Intel Xeon six-core E7450 / 2.4 GHz / 9 MB L2 cache / 16 MB L3 cache / 1066 MHz; 0 GB 2.5" HS SAS HDD; DVD combo; 4x memory card; 2x Gigabit Ethernet; Light Path; 2x 1440 W; RSA II; Intel Xeon E7450 2.40 GHz / 9 MB L2 cache / 1066 MHz six-core processor upgrade x2
RAM: 16 GB (2x8 GB kit) PC2-5300 CL5 ECC DDR2 SDRAM RDIMM x16

Additional info:
This issue does not happen on Dell R900 or IBM 3755 hosts, where a guest with Driver Verifier enabled needs only 10-15 minutes to reboot. To troubleshoot, the following was tried:
1. Changed the IBM x3950-M2 processors from E7450 to E7420 (same as the Dell R900): the problem still exists.
2. Changed the IBM x3950-M2 memory from 8 GB/module to 4 GB/module (same as the Dell R900 and IBM 3755): the problem still exists.
3. Installed Windows 2008 R2 Datacenter directly on the IBM x3950-M2 (not in a VM): the issue disappears; the reboot needs only about 10 minutes.
4. Changed the guest block device from virtio to IDE: still reproduces.
5. Changed the guest image format from qcow2 to raw: still reproduces.
Created attachment 374145 [details] kvm-stat info
Created attachment 374146 [details] top info
Created attachment 374147 [details] vmstat 1 info
Please retry with the cache=off flag (it shouldn't change a lot, but this should be our baseline). After that, please pin each vcpu to a different pcpu and let's see whether there is an effect; I can see many IPIs being sent. Also, a kvmtrace output of 1 second should be attached.
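For reference, cache=off goes on the -drive option; with the command line from the bug description, the disk option would look like this sketch (only the trailing cache= part is new, everything else is copied from the reported command line):

```shell
# Sketch: the -drive option from the bug description with the host page
# cache disabled via cache=off. Only the last parameter is new.
DRIVE="-drive file=/data/win08-test.qcow2,if=virtio,boot=on,cache=off"
echo "$DRIVE"
```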
Does it work with -smp 1? With less memory?
(In reply to comment #5)
> Does it work with -smp 1? With less memory?

It does work with -smp 1 and with less memory, e.g. 32 GB.
Can you please narrow it down. Does it work with smp 8 and 32G? Does it work with smp 1 and 120G?
(In reply to comment #4)
> Please retry with the cache=off flag (it shouldn't change a lot, but this
> should be our baseline).
>
> After that, please pin each vcpu to a different pcpu and let's see whether
> there is an effect; I can see many IPIs being sent.
>
> Also, a kvmtrace output of 1 second should be attached.

With the cache=off option, and using "taskset -pc 8-15 11428" (11428 is the pid of qemu-kvm) to pin the 8 vcpus, the reboot lasts about 30 minutes and the test can continue afterwards. kvmtrace.tar.gz with the kvmtrace output is attached.
Created attachment 374720 [details] kvmtrace
(In reply to comment #8)
> With the cache=off option, and using "taskset -pc 8-15 11428" (11428 is the
> pid of qemu-kvm) to pin the 8 vcpus, the reboot lasts about 30 minutes and
> the test can continue afterwards.

Please change only one thing during each test. "taskset -pc 8-15" does not bind each vcpu to a different pcpu; it binds all vcpus to the same set of pcpus.

When you say the test cannot continue after the 40-minute reboot, what do you mean? What error do you get? How many times have you tried and failed?
(In reply to comment #7)
> Can you please narrow it down. Does it work with smp 8 and 32G? Does it work
> with smp 1 and 120G?

With smp 8 and 32G, the reboot lasts about 22 minutes and the test can continue afterwards.
With smp 1 and 120G, it lasts about 8 minutes and the test can continue afterwards.
(In reply to comment #10)
> Please change only one thing during each test. "taskset -pc 8-15" does not
> bind each vcpu to a different pcpu; it binds all vcpus to the same set of
> pcpus.

Would you please guide me on how to bind each vcpu to a different pcpu? Thanks.

> When you say the test cannot continue after the 40-minute reboot, what do
> you mean? What error do you get? How many times have you tried and failed?

No error happens. The DTM controller does not detect that the guest rebooted OK, so it will not run the test script on the guest.
(In reply to comment #12)
> Would you please guide me on how to bind each vcpu to a different pcpu?
> Thanks.

You should run "taskset -pc $i $pid" for each vcpu thread separately, with a different $i for each thread.

> No error happens. The DTM controller does not detect that the guest rebooted
> OK, so it will not run the test script on the guest.

Aha, so is this a timeout issue? Can the timeout be enlarged on the DTM controller?
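The per-thread pinning described above can be sketched as a small loop. The thread ids below are placeholders; in practice they come from "info cpus" in the QEMU monitor:

```shell
# Sketch: pin each qemu-kvm vcpu thread to its own physical CPU.
# The thread ids are assumptions; read the real ones from the QEMU
# monitor with "info cpus" before running this.
THREADS="14507 14508 14509 14510 14511 14512 14513 14514"
CPU=8                                  # first physical CPU to use
for TID in $THREADS; do
    echo taskset -pc "$CPU" "$TID"     # drop 'echo' to actually pin
    CPU=$((CPU + 1))
done
```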
(In reply to comment #10)
> Please change only one thing during each test. "taskset -pc 8-15" does not
> bind each vcpu to a different pcpu; it binds all vcpus to the same set of
> pcpus. When you say the test cannot continue after the 40-minute reboot,
> what do you mean? What error do you get? How many times have you tried and
> failed?

1. cache=off: needs 38 minutes to reboot, and the test can continue afterwards. kvmtrace output:

CPU 0: 12 KiB data
CPU 1: 17 KiB data
CPU 2: 1 KiB data
CPU 3: 1 KiB data
CPU 4: 1 KiB data
CPU 5: 1124 KiB data
CPU 6: 503 KiB data
CPU 7: 1 KiB data
CPU 8: 1041 KiB data
CPU 9: 1 KiB data
CPU 10: 1039 KiB data
CPU 11: 1667 KiB data
CPU 12: 802 KiB data
CPU 13: 1 KiB data
CPU 14: 1 KiB data
CPU 15: 1 KiB data
CPU 16: 1315 KiB data
CPU 17: 1906 KiB data
CPU 18: 452 KiB data
CPU 19: 54 KiB data
CPU 20: 14 KiB data
CPU 21: 1 KiB data
CPU 22: 1021 KiB data
CPU 23: 1607 KiB data
Total: lost 0, 12566 KiB data

2. vcpus pinned one per pcpu: needs 41 minutes to reboot, and the test can continue after reboot.

2.1 "info cpus" for the guest:

[root@intel-XE7450-512-1 task]# /usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -name RHEV2 -m 120G -smp 8 -vnc :1 -net nic,vlan=1,macaddr=00:27:3e:45:f2:22,model=virtio -net tap,vlan=1,script=/data/qemu-ifup -drive file=/data/RHEV2.qcow2,if=virtio,boot=on -smbios file=/data/smbios/smbios_type_0.bin -smbios type=1,manufacturer=redhat,product=RHEVH,version=2.1,serial=11111-55555-dddd-3f45e55,sku=SKU,uuid=45f7fa23-1d34-3456-1e34-e32745ac5f7c -smbios file=/data/smbios/smbios_type_4.bin -smbios file=/data/smbios/smbios_type_4_2.bin -smbios file=/data/smbios/smbios_type_4_3.bin -smbios file=/data/smbios/smbios_type_4_4.bin -smbios file=/data/smbios/smbios_type_4_5.bin -smbios file=/data/smbios/smbios_type_4_6.bin -smbios file=/data/smbios/smbios_type_4_7.bin -smbios file=/data/smbios/smbios_type_4_8.bin -smbios file=/data/smbios_type_3.bin -smbios file=/data/smbios_type_16.bin -monitor stdio
QEMU 0.9.1 monitor - type 'help' for more information
(qemu) info cpus
* CPU #0: pc=0x000000000002082b thread_id=14507
  CPU #1: pc=0x000000000009f02c thread_id=14508
  CPU #2: pc=0x000000000009f02c thread_id=14509
  CPU #3: pc=0x000000000009f02c thread_id=14510
  CPU #4: pc=0x000000000009f02c thread_id=14511
  CPU #5: pc=0x000000000009f02c thread_id=14512
  CPU #6: pc=0x000000000009f02c thread_id=14513
  CPU #7: pc=0x000000000009f02c thread_id=14514

2.2 taskset:

[root@intel-XE7450-512-1 data]# taskset -pc 8 14507
pid 14507's current affinity list: 0-23
pid 14507's new affinity list: 8
[root@intel-XE7450-512-1 data]# taskset -pc 9 14508
pid 14508's current affinity list: 0-23
pid 14508's new affinity list: 9
[root@intel-XE7450-512-1 data]# taskset -pc 10 14509
pid 14509's current affinity list: 0-23
pid 14509's new affinity list: 10
[root@intel-XE7450-512-1 data]# taskset -pc 11 14510
pid 14510's current affinity list: 0-23
pid 14510's new affinity list: 11
[root@intel-XE7450-512-1 data]# taskset -pc 12 14511
pid 14511's current affinity list: 0-23
pid 14511's new affinity list: 12
[root@intel-XE7450-512-1 data]# taskset -pc 13 14512
pid 14512's current affinity list: 0-23
pid 14512's new affinity list: 13
[root@intel-XE7450-512-1 data]# taskset -pc 14 14513
pid 14513's current affinity list: 0-23
pid 14513's new affinity list: 14
[root@intel-XE7450-512-1 data]# taskset -pc 15 14514
pid 14514's current affinity list: 0-23
pid 14514's new affinity list: 15

2.3 kvmtrace output:

[root@intel-XE7450-512-1 data]# kvmtrace -D /data/kvmtrace/ -o svvp -w 1
CPU 0: 1 KiB data
CPU 1: 1 KiB data
CPU 2: 1 KiB data
CPU 3: 1 KiB data
CPU 4: 1 KiB data
CPU 5: 1 KiB data
CPU 6: 1 KiB data
CPU 7: 1 KiB data
CPU 8: 1602 KiB data
CPU 9: 1573 KiB data
CPU 10: 1511 KiB data
CPU 11: 1474 KiB data
CPU 12: 1551 KiB data
CPU 13: 1440 KiB data
CPU 14: 1412 KiB data
CPU 15: 1439 KiB data
CPU 16: 1 KiB data
CPU 17: 1 KiB data
CPU 18: 1 KiB data
CPU 19: 1 KiB data
CPU 20: 1 KiB data
CPU 21: 1 KiB data
CPU 22: 1 KiB data
CPU 23: 1 KiB data
Total: lost 0, 11998 KiB data
So what timeout does the DTM controller set? Is there an option to change it? How much time does the reboot take on a Nehalem CPU?
(In reply to comment #15)
> So what timeout does the DTM controller set? Is there an option to change
> it? How much time does the reboot take on a Nehalem CPU?

There is no timeout limit while the test job is running; for example, if I shut down a rebooting guest (e.g. smp 4, 4G) for a long time and then reboot it, the test can still continue. So I do not think it is a timeout issue. And we cannot set a timeout from the DTM controller.
(In reply to comment #16)
> So I do not think it is a timeout issue. And we cannot set a timeout from
> the DTM controller.

In comment #12 you wrote:
> No error happens. The DTM controller does not detect that the guest rebooted
> OK, so it will not run the test script on the guest.

So does the guest reboot OK or not? If it reboots OK, why can't the DTM controller detect this? If it does not reboot OK, what is the error?
Gleb, can you check the trace to see if there is something for us to do? Maybe increasing the shadow mmu caches might help?
All attached traces are from working runs as far as I can see. Furthermore, taking a kvmtrace of a ~40-minute reboot is not feasible, _and_ the bug reporter claims that the reboot succeeds, so what should we look for in the trace anyway?

We first need to understand whether this is a timeout issue or not. If it is not, then what is the error? Does Windows not run properly after the reboot? Is there no network connectivity? Why doesn't DTM continue the test? Is it possible to force it to continue? What info does DTM provide?

The attached kvm_stat output shows mmu_recycled is zero, so this is not a shadow mmu cache problem.
(In reply to comment #17)
> So does the guest reboot OK or not? If it reboots OK, why can't the DTM
> controller detect this? If it does not reboot OK, what is the error?

The guest reboots OK and I can operate inside it, although operation is very slow. The network is OK: the guest gets a DHCP IP address and can ping the DTM controller.
(In reply to comment #17)
> So does the guest reboot OK or not? If it reboots OK, why can't the DTM
> controller detect this? If it does not reboot OK, what is the error?

The DTM controller keeps waiting for the reboot to end; no error happens.
(In reply to comment #21)
> The DTM controller keeps waiting for the reboot to end; no error happens.

And what happens if you shut down the VM and boot it again at this stage?
(In reply to comment #22)
> And what happens if you shut down the VM and boot it again at this stage?

It needs another 40 minutes to reboot, and the test cannot continue. I tried.
(In reply to comment #23)
> It needs another 40 minutes to reboot, and the test cannot continue. I tried.

Not reboot; shutdown. You can even do it by running "quit" in the monitor, then start the VM again.
Note the number of mmu pages created is almost the same as the number destroyed:

mmu_cache_miss     25160802  524
mmu_shadow_zapped  25381361  521

And in the trace (1 second) the same addresses fault several times:

faults for same address    count
10                         429
9                          645
8                          1279
5                          1039

The vcpus change CR4.PGE constantly. The problem seems to be the constant zapping/creation of the same pagetables.

szhou, can you please collect "sar -B 1 100" output (paging statistics) while the long-running shutdown is executing?

Also, can you collect kvm_stat info with the "oos_shadow=0" parameter set on the kvm.ko module? (And how long does it take with that parameter?)

Thanks
(In reply to comment #25)
> Also, can you collect kvm_stat info with the "oos_shadow=0" parameter set on
> the kvm.ko module? (And how long does it take with that parameter?)

With the "oos_shadow=0" parameter set on the kvm.ko module, the reboot lasts more than 1 hour. kvm_stat info is attached.
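For reference, changing oos_shadow requires reloading the kvm modules while no guest is running. A dry-run sketch (an Intel host is assumed, hence kvm-intel; the run wrapper only prints each step):

```shell
# Dry-run sketch: reload kvm with out-of-sync shadow paging disabled.
# All guests must be shut down first; an Intel host is assumed.
run() { echo "$@"; }           # replace 'echo "$@"' with "$@" to execute
run modprobe -r kvm-intel
run modprobe -r kvm
run modprobe kvm oos_shadow=0
run modprobe kvm-intel
# The active value can be checked in /sys/module/kvm/parameters/oos_shadow
```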
Created attachment 378458 [details] kvm_stat with oos_shadow=0
Created attachment 378459 [details] sar -B 1 100
Please collect the following information (with default parameters, oos_shadow=1) during the long-running shutdown:

1) strace -p pid_of_qemu
2) sar -R -r 1 100
3) kvm_stat

For one second (all for the same second). Thanks
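The three collectors have to cover the same second; one way to arrange that is to start strace in the background, take one 1-second sar sample, and stop everything together. A dry-run sketch (pid 14507 is an assumed qemu-kvm pid; the run wrapper only prints each step):

```shell
# Dry-run sketch: collect strace, sar and kvm_stat for the same second.
# QEMU_PID is a placeholder; substitute the real qemu-kvm pid.
QEMU_PID=14507
run() { echo "$@"; }             # replace 'echo "$@"' with "$@" to execute
run strace -f -o strace.out -p "$QEMU_PID"   # start in the background (&)
run sar -R -r 1 1                            # one 1-second memory sample
run kvm_stat                                 # redirect and stop after ~1 s
```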
Created attachment 380756 [details] strace -p pid-qemu-kvm
Created attachment 380757 [details] sar -R -r 1 100
Created attachment 380758 [details] kvm-stat
Could you please help us check this bug? SVVP needs the big-memory test.

Thank you very much,
jiabo
wang, szhou,

Can you please provide a recipe (or a URL to instructions) for how to install the "System-Common Scenario Stress with IO" Driver Verifier test on Windows 2008?
Unfortunately there is no way to avoid trapping cr4.pge on AMD without NPT, so this will be slow no matter what we do.

As far as I can tell there is no way to fix this for RHEL 5 without risking destabilization. I recommend testing this on an NPT-capable machine (which is needed anyway for these memory sizes).
(In reply to comment #37)
> Unfortunately there is no way to avoid trapping cr4.pge on AMD without NPT,
> so this will be slow no matter what we do.

Hi Avi,
The issue happens on an Intel E7450 (IBM x3950-M2) host, and it does not happen on an AMD 8356 (IBM 3755) host. So this bug blocks SVVP big-memory certification on the Intel host, not the AMD host.