Bug 1031987 - Randrw KVM exits are higher on virtio_blk driver with native gluster backend
Summary: Randrw KVM exits are higher on virtio_blk driver with native gluster backend
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jeff Cody
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1359965
 
Reported: 2013-11-19 10:14 UTC by Xiaomei Gao
Modified: 2016-10-26 21:18 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-26 21:18:18 UTC
Target Upstream Version:
Embargoed:


Attachments
qemu_kvm_415.el6_exit_trace (1.39 MB, application/x-gzip), 2013-11-21 05:54 UTC, Xiaomei Gao
qemu_kvm_415.el6_5.3_exit_trace (3.50 MB, application/x-gzip), 2013-11-21 05:56 UTC, Xiaomei Gao
noop + nomerges + kvm_exit_trace_415.el6 (1.39 MB, application/x-gzip), 2013-11-22 09:56 UTC, Xiaomei Gao
noop + nomerges + kvm_exit_trace_415.el6_5.3 (3.72 MB, application/x-gzip), 2013-11-22 09:59 UTC, Xiaomei Gao

Comment 2 Qunfang Zhang 2013-11-19 11:09:23 UTC
Hi, Xiaomei

Since you said "KVM_Exits is much higher (~70%-95%) on latest version compared to old version", could you tell us which older version you tested was good? Thanks.

Comment 5 Fam Zheng 2013-11-20 07:47:16 UTC
Xiaomei,

What is the old version, and do you have any test numbers for it so that I can compare them?

Thanks,
Fam

Comment 10 Xiaomei Gao 2013-11-21 05:54:12 UTC
Created attachment 827005 [details]
qemu_kvm_415.el6_exit_trace

Comment 11 Xiaomei Gao 2013-11-21 05:56:17 UTC
Created attachment 827006 [details]
qemu_kvm_415.el6_5.3_exit_trace

Comment 12 Xiaomei Gao 2013-11-21 05:59:41 UTC
(In reply to Fam Zheng from comment #8)
> It will be helpful if we can also compare the exit reasons, Xigao, could you
> run both tests again and collect the two traces with:
> 
> 1. Before starting test, run as root on host:
> 
> # mount -t debugfs none /sys/kernel/debug
> # echo 1 >/sys/kernel/debug/tracing/events/kvm/enable
> # cat /sys/kernel/debug/tracing/trace_pipe | grep kvm_exit >
> /tmp/kvm_exit_trace
> 
> 2. Leave the above command running, and run the fio test in the guest
> 
> 3. When the test finishes, copy out /tmp/kvm_exit_trace.

- Run the following fio command in the guest:
  # fio --rw=randrw --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/randrw_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=deadline 

- Results on qemu-kvm-tools-0.12.1.2-2.415.el6.x86_64
  BW:2.825MB/s      IOPS:723    Kvm_Exits: 201574
  * Kvm_exit trace is in the comment #10 "qemu_kvm_415.el6_exit_trace"

- Results on qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64
  BW:3.281MB/s      IOPS:839    Kvm_Exits: 663718
  * Kvm_exit trace is in the comment #11 "qemu_kvm_415.el6_5.3_exit_trace"
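
A minimal sketch (assuming the usual trace_pipe format, where every kvm_exit line carries "reason <REASON> rip 0x<ADDR>") of how the raw trace can be reduced to per-reason exit counts; the /tmp/kvm_exit_trace path follows comment 8:

  # Summarize exits per reason, most frequent first (run on the host after the test):
  # awk '/kvm_exit/ { for (i = 1; i <= NF; i++) if ($i == "reason") count[$(i+1)]++ }
         END { for (r in count) print count[r], r }' /tmp/kvm_exit_trace | sort -rn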

Comment 15 Xiaomei Gao 2013-11-22 09:47:52 UTC
(In reply to Fam Zheng from comment #14)

> Xiaomei, please see if using no-op scheduler + setting nomerges [*] can
> reproduce this. We expect the numbers to be linear in that case.
> 
> (Or alternatively, setting option "use_bio=true" to virtio_blk module
> parameter should have similar effects.)

We could still reproduce the issue when using the noop scheduler and nomerges.

In the guest:
# echo noop > /sys/block/vdb/queue/scheduler
# echo 2 > /sys/block/vdb/queue/nomerges
# fio --rw=randrw --bs=4k --iodepth=8 --runtime=1m --direct=1 --filename=/mnt/randrw_4k_8 --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --ioscheduler=noop

Results:
Qemu-img-0.12.1.2-2.415.el6.x86_64
BW=2868.3KB/s    IOPS=717    KVM_Exits=205470

Qemu-img-0.12.1.2-2.415.el6_5.3.x86_64
BW=3440.4KB/s    IOPS=860    KVM_Exits=705056

Please check the attachments for the KVM_Exit traces.
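
A minimal sketch of the alternative mentioned in comment 14, assuming the guest kernel's virtio_blk module supports the use_bio parameter (use_bio=1 is the boolean form of the "use_bio=true" option quoted there) and that no virtio_blk disk, including the root disk, is in use while the module is reloaded:

In the guest:
  # modprobe -r virtio_blk
  # modprobe virtio_blk use_bio=1
Or, if the module cannot be unloaded, boot the guest with virtio_blk.use_bio=1 on its kernel command line.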

Comment 16 Xiaomei Gao 2013-11-22 09:56:57 UTC
Created attachment 827662 [details]
noop + nomerges + kvm_exit_trace_415.el6

Comment 17 Xiaomei Gao 2013-11-22 09:59:19 UTC
Created attachment 827663 [details]
noop + nomerges + kvm_exit_trace_415.el6_5.3

Comment 18 Fam Zheng 2013-11-22 10:23:55 UTC
260175  MSR_WRITE          0xffffffff8103ec08  native_write_msr_safe
107396  IO_INSTRUCTION     0xffffffff81293431  iowrite16
101080  HLT                0xffffffff8103eaca  native_safe_halt
88341   EXCEPTION_NMI      0xffffffff8100c644  __math_state_restore
88341   CR_ACCESS          0xffffffff81009777  __switch_to
24427   EXCEPTION_NMI      0x37dba845fb        unknown
9298    EXCEPTION_NMI      0xffffffff8128d39e  copy_user_generic_unrolled
3462    PENDING_INTERRUPT  0xffffffff8103eacb  native_safe_halt
3207    EXCEPTION_NMI      0xffffffff8103f6e7  native_set_pte_at
2401    EXCEPTION_NMI      0x37de28020b        unknown
2011    EXCEPTION_NMI      0x418cf6            unknown
1508    INVLPG             0xffffffff8104fc2d  flush_tlb_page
1486    CR_ACCESS          0xffffffff8103efbc  native_flush_tlb
1486    CR_ACCESS          0xffffffff8103efb9  native_flush_tlb
1479    EXCEPTION_NMI      0x37dba79ddd        unknown

No difference with previous results.

Jeff, do you think there's anything special about gluster backend?

Fam

Comment 19 Jeff Cody 2013-12-10 05:36:19 UTC
(In reply to Fam Zheng from comment #18)
> 260175  MSR_WRITE          0xffffffff8103ec08  native_write_msr_safe
> 107396  IO_INSTRUCTION     0xffffffff81293431  iowrite16
> 101080  HLT                0xffffffff8103eaca  native_safe_halt
> 88341   EXCEPTION_NMI      0xffffffff8100c644  __math_state_restore
> 88341   CR_ACCESS          0xffffffff81009777  __switch_to
> 24427   EXCEPTION_NMI      0x37dba845fb        unknown
> 9298    EXCEPTION_NMI      0xffffffff8128d39e  copy_user_generic_unrolled
> 3462    PENDING_INTERRUPT  0xffffffff8103eacb  native_safe_halt
> 3207    EXCEPTION_NMI      0xffffffff8103f6e7  native_set_pte_at
> 2401    EXCEPTION_NMI      0x37de28020b        unknown
> 2011    EXCEPTION_NMI      0x418cf6            unknown
> 1508    INVLPG             0xffffffff8104fc2d  flush_tlb_page
> 1486    CR_ACCESS          0xffffffff8103efbc  native_flush_tlb
> 1486    CR_ACCESS          0xffffffff8103efb9  native_flush_tlb
> 1479    EXCEPTION_NMI      0x37dba79ddd        unknown
> 
> No difference with previous results.
> 
> Jeff, do you think there's anything special about gluster backend?
> 
> Fam

This is really odd - Fam noticed that my previous comment in this bug has disappeared.  Here is what it said:

--

That is a good question.  I am honestly not sure if there is something specific to gluster that would cause this.  Do we see increased exit events on other network block drivers?

Comment 20 Jeff Cody 2013-12-10 05:39:54 UTC
I do wonder if this is somehow similar or related to BZ 1010638.  In that bug, when running fio continuously in a guest that has a data drive that is using the qemu native gluster driver, memory usage continues to increase until we hit the kernel OOM killer.  Do you want to reassign this BZ to me, Fam?

Comment 21 Xiaomei Gao 2013-12-10 06:30:07 UTC
(In reply to Jeff Cody from comment #19)

> This is really odd - Fam noticed that my previous comment in this bug has
> disappeared.  Here is what it said:
> That is a good question.  I am honestly not sure if there is something
> specific to gluster that would cause this.  Do we see increased exit events
> on other network block drivers?

With a Netapp NFS backend, we did not see increased exit events.

Comment 22 Fam Zheng 2013-12-10 09:40:43 UTC
(In reply to Jeff Cody from comment #20)
> I do wonder if this is somehow similar or related to BZ 1010638.  In that
> bug, when running fio continuously in a guest that has a data drive that is
> using the qemu native gluster driver, memory usage continues to increase
> until we hit the kernel OOM killer.  Do you want to reassign this BZ to me,
> Fam?

OK. I don't know whether this is closely related to BZ 1010638, but since this BZ has only been reproduced with QEMU's gluster driver so far, I'm reassigning it to Jeff.

Fam

Comment 23 Ademar Reis 2014-01-15 20:11:39 UTC
Removing the regression keyword and the z-stream flag, as this is not a general problem (there was no gluster in RHEL 6.4, so it cannot be a regression).

