Bug 520655 - I/O to DASD partitions appears to be forced sync/direct
Summary: I/O to DASD partitions appears to be forced sync/direct
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: s390x
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.6
Assignee: Hendrik Brueckner
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 690968
 
Reported: 2009-09-01 16:26 UTC by Bryn M. Reeves
Modified: 2018-10-27 15:56 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-17 00:25:47 UTC
Target Upstream Version:
Embargoed:



Description Bryn M. Reeves 2009-09-01 16:26:15 UTC
Description of problem:
When performing I/O against a whole-disk DASD device, I/O shows the expected page-cache effects. When the same I/O is performed against a partition of that device, all I/O appears to be synchronous, causing a dramatic drop in perceived performance.

Version-Release number of selected component (if applicable):
2.6.18-128.el5

How reproducible:
100%

Steps to Reproduce:
1. Issue read or write I/O to a DASD whole-disk device
2. Issue read or write I/O to a DASD partition device
3. Compare read/write throughput and physical I/O for the two cases (a sketch for observing the physical I/O follows below)
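
A minimal way to observe the physical I/O in step 3 (a sketch; assumes the test device is dasdc, as in the results below, and that the kernel exposes the usual /proc/diskstats counters):

# awk '$3 ~ /^dasdc/' /proc/diskstats   # snapshot the per-device sector counters
# dd if=/dev/dasdc1 of=/dev/null bs=4k count=10000
# awk '$3 ~ /^dasdc/' /proc/diskstats   # the delta in the read counters is the physical I/O issued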
  
Actual results:
# uname -a
Linux z15 2.6.18-128.el5 #1 SMP Wed Dec 17 11:45:02 EST 2008 s390x s390x s390x GNU/Linux

Whole-disk:
# dd if=/dev/zero of=/dev/dasdc bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.078468 seconds, 522 MB/s
# dd of=/dev/zero if=/dev/dasdc bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.061346 seconds, 668 MB/s

Partition:
# dd if=/dev/zero of=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.28048 seconds, 18.0 MB/s
# dd of=/dev/zero if=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.57545 seconds, 15.9 MB/s
# dd if=/dev/zero of=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.85153 seconds, 14.4 MB/s

Expected results:
I/O to the partitions should see the same caching effects as I/O to the whole-disk device. Here, writes appear to be synchronous: the process is blocked until the data is on the physical device. Reads via the partition device node also appear to bypass the page cache.
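
One way to confirm whether the page cache is involved (a sketch, using the same dasdc1 partition as above) is to watch the Cached figure in /proc/meminfo across a read:

# grep ^Cached /proc/meminfo   # note the value
# dd if=/dev/dasdc1 of=/dev/null bs=4k count=10000
# grep ^Cached /proc/meminfo   # should grow by roughly 40 MB if the read populated the page cache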

Dropping the page cache via /proc/sys/vm/drop_caches before reading from the whole-disk device yields almost identical (uncached) performance on the next whole-disk read, as expected:

# echo 3 > /proc/sys/vm/drop_caches 
# dd of=/dev/zero if=/dev/dasdc bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 2.70517 seconds, 15.1 MB/s
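
Note that drop_caches only discards clean pages, so for benchmarking it is worth flushing dirty data first (a sketch; values per Documentation/sysctl/vm.txt):

# sync                                # write out dirty pages so they become droppable
# echo 3 > /proc/sys/vm/drop_caches   # 1 = pagecache, 2 = dentries+inodes, 3 = both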


Additional info:

Comment 2 Bryn M. Reeves 2009-09-02 17:08:41 UTC
I noticed a difference between the DASD device I was using for testing and others on the guest:

dasd_devmap: turning on fixed buffer mode
dasd(eckd): 0.0.0100: 3390/0A(CU:3990/01) Cyl:3338 Head:15 Sec:224
dasd(eckd): 0.0.0100: (4kB blks): 2403360kB at 48kB/trk compatible disk layout
 dasda:VOL1/  0X0100: dasda1 dasda2 dasda3
dasd(eckd): 0.0.0101: 3390/0A(CU:3990/01) Cyl:3338 Head:15 Sec:224
dasd(eckd): 0.0.0101: (4kB blks): 2403360kB at 48kB/trk compatible disk layout
 dasdb:VOL1/  0X0101: dasdb1
dasd(eckd): 0.0.0150: 3390/0A(CU:3990/01) Cyl:3338 Head:15 Sec:224
dasd(eckd): 0.0.0150: (4kB blks): 2403360kB at 48kB/trk compatible disk layout
 dasdc:(nonl)/        : dasdc1

dasdc doesn't have a valid volume label. The dasd driver seems to fake a partition spanning the whole device in this case. Running fdasd on the device confirms there's no label:

# fdasd /dev/dasdc
reading volume label ..: no known label
Should I create a new one? (y/n): n
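
For reference, one non-interactive way to label the device is fdasd's auto mode, which writes a volume label and a single partition spanning the disk (a sketch; destructive to any data on dasdc):

# fdasd -a /dev/dasdc   # auto-create a volume label plus one whole-disk partition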

I'm not able to reproduce the large performance difference for reads on dasda or dasdb on this system:

# dd if=/dev/dasda1 of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.063291 seconds, 647 MB/s
# dd if=/dev/dasda of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.06125 seconds, 669 MB/s

# dd if=/dev/dasdb of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.0614 seconds, 667 MB/s
# dd if=/dev/dasdb1 of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.063253 seconds, 648 MB/s

Just adding a label to dasdc doesn't change the situation; I/O via the partition device node doesn't appear to be cached for either reads or writes.

Both dasda and dasdb are in use as LVM2 physical volumes on the system and provide segments to the root file system.
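
This can be confirmed with the LVM reporting tools (a sketch):

# pvs -o pv_name,vg_name   # lists each physical volume and the volume group it belongs to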

Creating and mounting a file system on dasdc1 appears to cause dd's I/O to be cached as with the other devices:

# dd if=/dev/dasdc of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.062391 seconds, 657 MB/s
# dd if=/dev/dasdc1 of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.061607 seconds, 665 MB/s

Writes to the partition device node also appear to be cached in this case (even if not very useful ;):

# dd if=/dev/zero of=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.073299 seconds, 559 MB/s
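
For completeness, the filesystem creation and mount preceding these runs were along these lines (a sketch; the exact mkfs options used were not recorded):

# mkfs.ext3 /dev/dasdc1
# mount /dev/dasdc1 /mnt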

Comment 3 Hans-Joachim Picht 2009-09-12 16:44:10 UTC
Bryn, could you please re-run your tests using the dd oflag=sync option?

Regards,

Hans

Comment 4 Bryn M. Reeves 2009-09-16 09:36:00 UTC
Hans,

Adding oflag=sync further reduces throughput, although it does appear to make performance consistent between the partition and the whole-disk device nodes:

# dd oflag=sync if=/dev/zero of=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 8.83633 seconds, 4.6 MB/s
# dd oflag=sync if=/dev/zero of=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 8.83221 seconds, 4.6 MB/s
# dd oflag=sync if=/dev/zero of=/dev/dasdc1 bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 8.36947 seconds, 4.9 MB/s

# dd oflag=sync if=/dev/zero of=/dev/dasdc bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 8.74207 seconds, 4.7 MB/s
# dd oflag=sync if=/dev/zero of=/dev/dasdc bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 8.41363 seconds, 4.9 MB/s
# dd oflag=sync if=/dev/zero of=/dev/dasdc bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 8.42589 seconds, 4.9 MB/s
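
This is consistent with oflag=sync making dd open the output with O_SYNC, so every write is synchronous regardless of which device node is used; the flag can be confirmed with strace (a sketch):

# strace -e trace=open dd oflag=sync if=/dev/zero of=/dev/dasdc1 bs=4k count=1
  (the open() of /dev/dasdc1 should include O_SYNC in its flags)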

Regards,
Bryn

Comment 6 Peter Martuccelli 2009-12-03 15:23:34 UTC
Still working on this issue upstream, moving out to R5.6.

