Bug 1031987
Summary: Randrw kvm exits is higher on virtio_blk driver with native gluster backend

Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.6
Hardware: x86_64
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---

Reporter: Xiaomei Gao <xigao>
Assignee: Jeff Cody <jcody>
QA Contact: Virtualization Bugs <virt-bugs>
Docs Contact:
CC: ailan, chayang, jcody, juzhang, michen, mkenneth, qzhang, rbalakri, rpacheco, virt-maint, wquan, yama

Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-26 21:18:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1359965
Attachments:
Comment 2
Qunfang Zhang
2013-11-19 11:09:23 UTC
Xiaomei, what is the old version, and do we have numbers from the test on it, so that I can compare them?

Thanks,
Fam

Created attachment 827005 [details]
qemu_kvm_415.el6_exit_trace
Created attachment 827006 [details]
qemu_kvm_415.el6_5.3_exit_trace
(In reply to Fam Zheng from comment #8)
> It will be helpful if we can also compare the exit reasons. Xigao, could you
> run both tests again and collect the two traces with:
>
> 1. Before starting the test, run as root on the host:
>
> # mount -t debugfs none /sys/kernel/debug
> # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
> # cat /sys/kernel/debug/tracing/trace_pipe | grep kvm_exit > /tmp/kvm_exit_trace
>
> 2. Leave the above command running, and run the fio test in the guest.
>
> 3. When the test finishes, copy out /tmp/kvm_exit_trace.

- Run the following fio command in the guest:

# fio --rw=randrw --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/randrw_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline

- Results on qemu-kvm-tools-0.12.1.2-2.415.el6.x86_64:
  BW: 2.825MB/s  IOPS: 723  Kvm_Exits: 201574
  * Kvm_exit trace is in comment #10, "qemu_kvm_415.el6_exit_trace"

- Results on qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64:
  BW: 3.281MB/s  IOPS: 839  Kvm_Exits: 663718
  * Kvm_exit trace is in comment #11, "qemu_kvm_415.el6_5.3_exit_trace"

(In reply to Fam Zheng from comment #14)
> Xiaomei, please see if using the no-op scheduler + setting nomerges [*] can
> reproduce this. We expect the numbers to be linear in that case.
>
> (Or alternatively, setting the "use_bio=true" virtio_blk module parameter
> should have a similar effect.)

We could still reproduce the issue when using the noop scheduler and nomerges.

In the guest:

# echo noop > /sys/block/vdb/queue/scheduler
# echo 2 > /sys/block/vdb/queue/nomerges
# fio --rw=randrw --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/randrw_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=noop

Results:
  qemu-img-0.12.1.2-2.415.el6.x86_64      BW=2868.3KB/s  IOPS=717  KVM_Exits=205470
  qemu-img-0.12.1.2-2.415.el6_5.3.x86_64  BW=3440.4KB/s  IOPS=860  KVM_Exits=705056

Please check the attachments for the KVM_Exit traces.

Created attachment 827662 [details]
noop + nomerges + kvm_exit_trace_415.el6
Created attachment 827663 [details]
noop + nomerges + kvm_exit_trace_415.el6_5.3
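The per-exit-reason breakdown in the next comment looks like a post-processed summary of the raw kvm_exit trace collected with the commands from comment #8. For reference, a minimal sketch of one way to produce such a summary, assuming each trace line contains a literal "reason <NAME> rip <ADDRESS>" sequence (the tracepoint output format varies by kernel version, and resolving each address to a guest kernel symbol would additionally require the guest's /proc/kallsyms, which is not shown here):

# awk '/kvm_exit/ { for (i = 1; i < NF - 2; i++) if ($i == "reason") print $(i+1), $(i+3) }' /tmp/kvm_exit_trace | sort | uniq -c | sort -rn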
260175 MSR_WRITE 0xffffffff8103ec08 native_write_msr_safe
107396 IO_INSTRUCTION 0xffffffff81293431 iowrite16
101080 HLT 0xffffffff8103eaca native_safe_halt
88341 EXCEPTION_NMI 0xffffffff8100c644 __math_state_restore
88341 CR_ACCESS 0xffffffff81009777 __switch_to
24427 EXCEPTION_NMI 0x37dba845fb unknown
9298 EXCEPTION_NMI 0xffffffff8128d39e copy_user_generic_unrolled
3462 PENDING_INTERRUPT 0xffffffff8103eacb native_safe_halt
3207 EXCEPTION_NMI 0xffffffff8103f6e7 native_set_pte_at
2401 EXCEPTION_NMI 0x37de28020b unknown
2011 EXCEPTION_NMI 0x418cf6 unknown
1508 INVLPG 0xffffffff8104fc2d flush_tlb_page
1486 CR_ACCESS 0xffffffff8103efbc native_flush_tlb
1486 CR_ACCESS 0xffffffff8103efb9 native_flush_tlb
1479 EXCEPTION_NMI 0x37dba79ddd unknown

No difference from the previous results.

Jeff, do you think there's anything special about the gluster backend?

Fam

(In reply to Fam Zheng from comment #18)
> 260175 MSR_WRITE 0xffffffff8103ec08 native_write_msr_safe
> 107396 IO_INSTRUCTION 0xffffffff81293431 iowrite16
> 101080 HLT 0xffffffff8103eaca native_safe_halt
> 88341 EXCEPTION_NMI 0xffffffff8100c644 __math_state_restore
> 88341 CR_ACCESS 0xffffffff81009777 __switch_to
> 24427 EXCEPTION_NMI 0x37dba845fb unknown
> 9298 EXCEPTION_NMI 0xffffffff8128d39e copy_user_generic_unrolled
> 3462 PENDING_INTERRUPT 0xffffffff8103eacb native_safe_halt
> 3207 EXCEPTION_NMI 0xffffffff8103f6e7 native_set_pte_at
> 2401 EXCEPTION_NMI 0x37de28020b unknown
> 2011 EXCEPTION_NMI 0x418cf6 unknown
> 1508 INVLPG 0xffffffff8104fc2d flush_tlb_page
> 1486 CR_ACCESS 0xffffffff8103efbc native_flush_tlb
> 1486 CR_ACCESS 0xffffffff8103efb9 native_flush_tlb
> 1479 EXCEPTION_NMI 0x37dba79ddd unknown
>
> No difference with previous results.
>
> Jeff, do you think there's anything special about gluster backend?
>
> Fam

This is really odd - Fam noticed that my previous comment in this bug has disappeared. Here is what it said:

--

That is a good question. I am honestly not sure if there is something specific to gluster that would cause this. Do we see increased exit events on other network block drivers?

I do wonder if this is somehow similar or related to BZ 1010638. In that bug, when running fio continuously in a guest that has a data drive using the QEMU native gluster driver, memory usage continues to increase until we hit the kernel OOM killer. Do you want to reassign this BZ to me, Fam?

(In reply to Jeff Cody from comment #19)
> This is really odd - Fam noticed that my previous comment in this bug has
> disappeared. Here is what it said:
> That is a good question. I am honestly not sure if there is something
> specific to gluster that would cause this. Do we see increased exit events
> on other network block drivers?

On the Netapp NFS backend, we could not see increased exit events.

(In reply to Jeff Cody from comment #20)
> I do wonder if this is somehow similar or related to BZ 1010638. In that
> bug, when running fio continuously in a guest that has a data drive that is
> using the qemu native gluster driver, memory usage continues to increase
> until we hit the kernel OOM killer. Do you want to reassign this BZ to me,
> Fam?

OK. I don't know whether this is closely related to BZ 1010638, but since this BZ has only been reproduced with QEMU's gluster driver so far, I'm reassigning it to Jeff.

Fam

Removing the regression keyword and the z-stream flag, as this is not a general problem (there was no gluster in RHEL 6.4, so it can't be a regression).
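For readers not familiar with the setup: the "native gluster backend" in this bug means the guest's data disk is served through QEMU's built-in gluster block driver (libgfapi), rather than through an image file on a locally mounted filesystem. The exact command lines used in these tests are not recorded here; the following -drive options are only a hypothetical illustration of the two configurations being compared (server, volume, and image names are made up):

Native gluster driver (the configuration showing the higher exit count):
  -drive file=gluster://gluster-server/testvol/data.raw,format=raw,if=virtio,cache=none

Plain file on an NFS mount (as in the Netapp NFS comparison, which did not show it):
  -drive file=/mnt/netapp/data.raw,format=raw,if=virtio,cache=none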