Bug 613907 - Device mapper multipath devices are breaking up I/O requests into page size chunks.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Lachlan McIlroy
QA Contact: Red Hat Kernel QE team
Docs Contact:
Depends On:
Blocks:
Reported: 2010-07-13 04:05 EDT by Lachlan McIlroy
Modified: 2015-04-12 19:14 EDT
3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-20 21:04:33 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Lachlan McIlroy 2010-07-13 04:05:52 EDT
Description of problem:
Device mapper multipath devices are breaking up I/O requests into page size chunks.

I'm using this dd command to test performance:

# dd if=/dev/zero of=/dev/mapper/mpath0 bs=256K count=1000000
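The same write pattern can be exercised harmlessly by pointing dd at /dev/null instead of the multipath device (the /dev/mapper/mpath0 path above is specific to the test system):

```shell
# 100 buffered writes of 256 KiB sourced from /dev/zero; safe to run
# anywhere since nothing is written to a real device.
dd if=/dev/zero of=/dev/null bs=256K count=100
```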

Here we can see the avgrq-sz for the dm-0 device is 8 sectors (4KB):

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.00     0.00  0.00  1.00     0.00     8.00     8.00     0.00    4.00   4.00   0.40
hda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda3              0.00     0.00  0.00  1.00     0.00     8.00     8.00     0.00    4.00   4.00   0.40
sda               0.00 19513.00  0.00 208.00     0.00 159536.00   767.00   108.34  514.15   4.81 100.10
dm-0              0.00     0.00  0.00 19748.00     0.00 157984.00     8.00  9688.11  491.68   0.05 100.10

The avgrq-sz for the sda device looks okay until we switch to the noop scheduler and get this:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.00     0.00  2.70  0.90    21.62     7.21     8.00     0.01    3.25   3.25   1.17
hda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda3              0.00     0.00  2.70  0.90    21.62     7.21     8.00     0.01    3.25   3.25   1.17
sda               0.00  3912.61  0.00 3743.24     0.00 63272.07    16.90   132.68   36.73   0.27  99.82
dm-0              0.00     0.00  0.00 7613.51     0.00 60908.11     8.00   297.13   42.58   0.13  99.82

which tells us that the elevator is recombining the broken up I/Os back into the larger I/Os they started out as.
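The scheduler switch described above goes through sysfs; the active elevator is the one shown in brackets (device names vary per system, so this sketch just lists every block device):

```shell
# Show the I/O scheduler for each block device; the active one is in
# brackets.  Writing "noop" into the file (as root) selects the noop
# elevator used in the test above, e.g.:
#   echo noop > /sys/block/sda/queue/scheduler
for f in /sys/block/*/queue/scheduler; do
    if [ -e "$f" ]; then
        printf '%s: %s\n' "$f" "$(cat "$f")"
    fi
done
```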

If we use direct I/O (oflag=direct) then the requests don't get broken up:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.00     0.00  4.00  0.00    32.00     0.00     8.00     0.00    0.25   0.25   0.10
hda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hda3              0.00     0.00  4.00  0.00    32.00     0.00     8.00     0.00    0.25   0.25   0.10
sda               0.00     0.00  0.00 341.00     0.00 174592.00   512.00     0.98    2.89   2.89  98.40
dm-0              0.00     0.00  0.00 341.00     0.00 174592.00   512.00     0.98    2.88   2.88  98.30

And of course we get better performance.
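A minimal sketch of the direct I/O variant, assuming GNU dd; the device path is the one from this report, so the real command is left commented out to avoid destroying data:

```shell
# oflag=direct makes dd open its output O_DIRECT, bypassing the page
# cache, so requests reach device-mapper at their original 256 KiB size.
# The destructive command from the report (do NOT run against a device
# holding data):
#   dd if=/dev/zero of=/dev/mapper/mpath0 bs=256K count=1000000 oflag=direct
# GNU dd documents the flag in its help text:
dd --help | grep -c 'oflag'
```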

Version-Release number of selected component (if applicable):
Reported on kernel-2.6.18-164.2.1; also reproduced on kernel-2.6.18-206.

How reproducible:
Set up a device-mapper multipath device, use dd to issue I/O directly to the /dev/dm-N or /dev/mapper/mpathN device, and monitor statistics with iostat.
Comment 2 Issue Tracker 2010-07-13 09:57:04 EDT
Event posted on 13-07-2010 02:57pm BST by breeves

> Now it gets stranger.  I changed the test to issue I/O through 
> the /dev/dm-N device and I'm seeing the same avgrq-sz as the 
> customer - 4KB.  There's definitely something strange going on 
> with device-mapper. 

That seems bizarre and a little hard to believe; the only difference
between the two nodes should be the path name - the two should otherwise
be identical.

Will read back over the history and see if there's anything I spot.

Is the system set up for testing still available somewhere?

Thanks,



This event sent from IssueTracker by breeves 
 issue 1075963
Comment 3 Lachlan McIlroy 2010-07-14 01:29:25 EDT
On later kernels (-206) the two devices (/dev/dm-0 and /dev/mapper/mpath0) both behave the same way and now report all I/Os as page-sized, so that discrepancy between the two devices must have been fixed somehow.
Comment 4 Lachlan McIlroy 2010-07-14 01:53:08 EDT
This problem is not caused by device mapper splitting up I/Os.  The dd writes are going into the device cache and later pushed out to disk by a writepage operation (kswapd, pdflush or a flush on file close).  Kswapd writes out one dirty page at a time and relies on the elevator to merge them into larger I/Os.  Device mapper sits above the elevator so it is showing all the unmerged requests from kswapd and that's why they are all one page in size.
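The writeback path described above can be observed directly: buffered writes first show up as dirty pages, which the kernel exposes in /proc/meminfo before kswapd/pdflush pushes them out:

```shell
# The amount of dirty page-cache data waiting for writeback, and the
# amount currently being written back, on any Linux system:
grep -E '^(Dirty|Writeback):' /proc/meminfo
```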
Comment 6 RHEL Product and Program Management 2011-06-20 17:59:53 EDT
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
