Bug 613907

Summary: Device mapper multipath devices are breaking up I/O requests into page-size chunks.
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Severity: high
Priority: high
Status: CLOSED NOTABUG
Reporter: Lachlan McIlroy <lmcilroy>
Assignee: Lachlan McIlroy <lmcilroy>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: bmarzins, tao, vgaikwad
Target Milestone: rc
Doc Type: Bug Fix
Last Closed: 2011-06-21 01:04:33 UTC

Description Lachlan McIlroy 2010-07-13 08:05:52 UTC
Description of problem:
Device mapper multipath devices are breaking up I/O requests into page-size chunks.

I'm using this dd command to test performance:

# dd if=/dev/zero of=/dev/mapper/mpath0 bs=256K count=1000000
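The extended per-device stats below come from iostat; a typical invocation while the dd is running would be something like this (the 1-second interval is an assumption, not from the original report):

# iostat -x 1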

Here we can see that the avgrq-sz for the dm-0 device is 8 sectors (8 x 512 bytes = 4 KB):

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.00     0.00  0.00  1.00     0.00     8.00     8.00     0.00    4.00   4.00   0.40
hda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda3              0.00     0.00  0.00  1.00     0.00     8.00     8.00     0.00    4.00   4.00   0.40
sda               0.00 19513.00  0.00 208.00     0.00 159536.00   767.00   108.34  514.15   4.81 100.10
dm-0              0.00     0.00  0.00 19748.00     0.00 157984.00     8.00  9688.11  491.68   0.05 100.10

The avgrq-sz for the sda device looks okay until we switch to the noop scheduler and get this:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.00     0.00  2.70  0.90    21.62     7.21     8.00     0.01    3.25   3.25   1.17
hda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda3              0.00     0.00  2.70  0.90    21.62     7.21     8.00     0.01    3.25   3.25   1.17
sda               0.00  3912.61  0.00 3743.24     0.00 63272.07    16.90   132.68   36.73   0.27  99.82
dm-0              0.00     0.00  0.00 7613.51     0.00 60908.11     8.00   297.13   42.58   0.13  99.82

which tells us that the elevator was recombining the broken-up I/Os into the larger I/Os they started out as.
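For reference, the scheduler switch above can be done at runtime through sysfs, along these lines (a sketch; assumes sda is the underlying path device):

# echo noop > /sys/block/sda/queue/scheduler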

If we use direct I/O (oflag=direct) then the requests don't get broken up:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.00     0.00  4.00  0.00    32.00     0.00     8.00     0.00    0.25   0.25   0.10
hda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda3              0.00     0.00  4.00  0.00    32.00     0.00     8.00     0.00    0.25   0.25   0.10
sda               0.00     0.00  0.00 341.00     0.00 174592.00   512.00     0.98    2.89   2.89  98.40
dm-0              0.00     0.00  0.00 341.00     0.00 174592.00   512.00     0.98    2.88   2.88  98.30

And of course we get better performance.
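For completeness, the direct I/O variant is the same test command with oflag=direct added:

# dd if=/dev/zero of=/dev/mapper/mpath0 bs=256K count=1000000 oflag=direct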

Version-Release number of selected component (if applicable):
Reported on kernel-2.6.18-164.2.1; also reproduced on kernel-2.6.18-206.

How reproducible:
Set up a device-mapper multipath device, use dd to issue I/O directly to the /dev/dm-N or /dev/mapper/mpathN device, and monitor the stats with iostat.

Comment 2 Issue Tracker 2010-07-13 13:57:04 UTC
Event posted on 13-07-2010 02:57pm BST by breeves

> Now it gets stranger.  I changed the test to issue I/O through 
> the /dev/dm-N device and I'm seeing the same avgrq-sz as the 
> customer - 4KB.  There's definitely something strange going on 
> with device-mapper. 

That seems bizarre and a little hard to believe; the only difference between the two device nodes should be the path name - the two should otherwise be identical.

I'll read back over the history and see if anything stands out.

Is the system set up for testing still available somewhere?

Thanks,



This event sent from IssueTracker by breeves 
 issue 1075963

Comment 3 Lachlan McIlroy 2010-07-14 05:29:25 UTC
On later kernels (-206) the two devices (/dev/dm-0 and /dev/mapper/mpath0) behave the same way and both report all I/Os as page-sized, so that discrepancy between the two devices must have been fixed somehow.

Comment 4 Lachlan McIlroy 2010-07-14 05:53:08 UTC
This problem is not caused by device mapper splitting up I/Os.  The dd writes go into the page cache for the block device and are later pushed out to disk by a writepage operation (kswapd, pdflush or a flush on file close).  Kswapd writes out one dirty page at a time and relies on the elevator to merge the pages into larger I/Os.  Device mapper sits above the elevator, so it sees all of the unmerged requests from kswapd, and that is why they are all one page in size.  The iostat output above bears this out: sda shows thousands of merged write requests per second (wrqm/s) while dm-0 shows none.
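A quick way to watch merging at both layers side by side while the test runs (a sketch; assumes sda and dm-0 as in the output above):

# iostat -x 1 | egrep 'Device|sda|dm-0'

The wrqm/s column for sda counts write requests merged by the elevator; dm-0 stays at zero because no merging happens above the elevator.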

Comment 6 RHEL Program Management 2011-06-20 21:59:53 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.