Bug 689040

Summary: Disk writes are significantly slower with the RHEL kernel than with a kernel built from kernel.org.

Product: Red Hat Enterprise Linux 6
Version: 6.0
Component: kernel
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED NOTABUG
Reporter: dcardimi
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: esandeen, jmoyer, rwheeler
Target Milestone: rc
Doc Type: Bug Fix
Last Closed: 2011-03-25 16:45:48 UTC

Attachments:
  Kernel config from 2.6.32.27 build
  Kernel config from 2.6.38 build

Description dcardimi 2011-03-18 21:40:50 UTC
Description of problem:
Using a binary compiled from the test source referenced below, run the program on RHEL 6 kernel-2.6.32-71.18.2.el6.x86_64.rpm or kernel-2.6.32-71.el6.x86_64.rpm.  Notice that the runtime reported by the program is significantly longer than with a kernel built from kernel.org (I tried versions 2.6.32.27 and 2.6.38).  In my case, the kernels built from kernel.org averaged 2-2.5 seconds for this test program, while the RHEL kernels averaged 40-41 seconds.  I'm not sure if it matters, but I was using an ext3 filesystem; I also mounted the same ext3 filesystem as ext4, with the same results.

There were some changes related to this issue checked into fs/buffer.c at kernel.org on 2/18/2009 and 4/6/2009.

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.18.2.el6.x86_64.rpm
kernel-2.6.32-71.el6.x86_64.rpm

How reproducible:
Every time.

Steps to Reproduce:
Use a testcase similar to the following (adapted from kernel.org commit http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78f707bfc723552e8309b7c38a8d0cc51012e813):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* fsync() */
#include <sys/time.h>   /* gettimeofday(), timersub() */

#define ROWS 2000

int main(void)
{
    int fdes, i;
    FILE *fp;
    struct timeval start, end, res;

    gettimeofday(&start, NULL);
    /* Append one line at a time, forcing each to stable storage with fsync(). */
    for (i = 0; i < ROWS; i++) {
        fp = fopen("test_file", "a");
        if (!fp) {
            perror("fopen");
            exit(1);
        }
        fprintf(fp, "Some Text Data\n");
        fdes = fileno(fp);
        fsync(fdes);
        fclose(fp);
    }
    gettimeofday(&end, NULL);

    timersub(&end, &start, &res);
    fprintf(stdout, "time to write %d lines is %ld (msec)\n", ROWS,
            (long)(res.tv_sec * 1000000 + res.tv_usec) / 1000);

    return 0;
}
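
(For reference, the testcase builds with a plain C compiler invocation, e.g. "gcc -o fsync_test fsync_test.c" with arbitrary source/binary names, and should be run from the filesystem being tested.)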
  
Actual results:
The program runtime is significantly longer with the RHEL kernel (~40 seconds compared to ~2 seconds).

Expected results:
The program runtime should be the same using the RHEL kernel or the stock kernel from kernel.org.

Additional info:

Comment 2 dcardimi 2011-03-18 21:59:04 UTC
This affects applications using sqlite in synchronous write mode.
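
To illustrate the pattern (a rough sketch only, not our actual application; the database file and table names are made up): with SQLite's default synchronous=FULL setting, every autocommitted INSERT ends in an fsync() on the database, so a loop of single-row INSERTs exercises the same path as the fopen/fsync testcase in the description above.

/* Rough sketch, not the real application: each autocommitted INSERT
 * is a synchronous commit and ends with an fsync().
 * Build with: gcc -o sqlite_sync sqlite_sync.c -lsqlite3
 */
#include <stdio.h>
#include <sqlite3.h>

#define ROWS 2000

int main(void)
{
    sqlite3 *db;
    char *err = NULL;
    int i;

    if (sqlite3_open("test.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* FULL is the default; set it explicitly for clarity. */
    sqlite3_exec(db, "PRAGMA synchronous = FULL;", NULL, NULL, &err);
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS t (msg TEXT);", NULL, NULL, &err);

    /* One autocommitted INSERT per iteration => one synchronous commit per row. */
    for (i = 0; i < ROWS; i++)
        sqlite3_exec(db, "INSERT INTO t VALUES ('Some Text Data');",
                     NULL, NULL, &err);

    sqlite3_close(db);
    return 0;
}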

Comment 3 dcardimi 2011-03-18 22:35:49 UTC
I also tried the Fedora 14 kernel (kernel-2.6.35.11-83.fc14.x86_64.rpm), with the same result (slow writes).

Comment 4 Ric Wheeler 2011-03-19 12:00:29 UTC
Can you add to this some description of your storage hardware and your server?

If you are using an array (i.e. something with a non-volatile write cache), can you retry with "mount -o nobarrier"?

Thanks!

Comment 5 dcardimi 2011-03-20 16:12:36 UTC
This was tested on a Pentium 4 2.8GHz, w/ 2G RAM, with a single 250GB serial ATA Western Digital drive (no RAID setup).

Comment 6 Jeff Moyer 2011-03-21 13:47:11 UTC
Were you using the same mount options between kernel revisions?  Specifically, if barriers are enabled, it will definitely take longer to fsync.  What I/O scheduler was used in testing?

Comment 7 dcardimi 2011-03-21 19:27:34 UTC
Yes, the same mount options were used in all tests.  In all cases, the CFQ IO scheduler was used.

Comment 8 Eric Sandeen 2011-03-21 20:46:29 UTC
Can you please provide the kernel config used for the upstream kernel build & test?

Thanks,
-Eric

Comment 9 dcardimi 2011-03-21 21:18:44 UTC
Created attachment 486697 [details]
Kernel config from 2.6.32.27 build

I've attached the .config from my 2.6.32.27 build.

Comment 10 Eric Sandeen 2011-03-21 21:34:18 UTC
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set

is likely relevant, especially in light of the fsync() call in your test.

data=writeback is faster, but can lead to data exposure after a crash, so in Fedora and RHEL we have data=ordered as the default.  (Upstream later changed its default back to ordered as well.)  data=ordered does, however, have issues with fsyncs getting tangled up with all of the buffered data for the filesystem.

If you have the time, could you re-compare with CONFIG_EXT3_DEFAULTS_TO_ORDERED=y on your upstream kernel?

Thanks,

-Eric

Comment 11 dcardimi 2011-03-21 22:46:47 UTC
Created attachment 486701 [details]
Kernel config from 2.6.38 build

Here is a .config from my 2.6.38 kernel, which did have CONFIG_EXT3_DEFAULTS_TO_ORDERED=y.  This kernel also showed the faster runtimes.

Comment 12 Eric Sandeen 2011-03-21 22:55:51 UTC
Yes, the default was changed again by 2.6.38 (it changed back in 2.6.36).

That one config option really could make all the difference.

See also:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=aa32a796389bedbcf1c7714385b18714a0743810

-Eric

Comment 13 dcardimi 2011-03-21 23:11:26 UTC
Should I try CONFIG_EXT3_DEFAULTS_TO_ORDERED=y with the 2.6.32.27 kernel to see if the runtime of the test case is slower?  My 2.6.38 config had it set to "y", and the runtime of the test case was still much faster than the 2.6.32-71.18.2 RH kernel.

Comment 14 Eric Sandeen 2011-03-21 23:47:33 UTC
So 2.6.32/RHEL6 and 2.6.35/F14 were "slow" runs (RHEL6 was ~40s, F14 unspecified); these both default to ordered mode.

2.6.32/vanilla was "fast" (~2s) with writeback mode.
2.6.38/vanilla was "fast" (?s) with ordered mode.

2.6.38 does seem to be a bit of an anomaly; I'm not quite sure what to make of that yet, so I'll give it a whirl myself.

FWIW, you could test your vanilla 2.6.32 without rebuilding, just mount the filesystem with -o data=ordered to override the writeback default.

Comment 15 Eric Sandeen 2011-03-21 23:49:45 UTC
One other question: are you testing on a dedicated filesystem, or on the root fs?

Comment 16 dcardimi 2011-03-21 23:53:55 UTC
Yes, 2.6.32/RHEL6 & 2.6.35/F14 both took around 40s.  2.6.32 & 2.6.38 vanilla were around 2s.  In all cases, tests were done on the root filesystem.

I'll try the 2.6.32 test tomorrow with the different mount option.

Comment 17 Eric Sandeen 2011-03-22 00:13:01 UTC
Ok, for starters, testing on the root filesystem may skew results, because in data=ordered mode, the fsyncs in your testcase may end up flushing out system data.  This is a known, but unfortunate, behavior of the ext3 journal design.

In any case, I was able to do some testing here.  I should have gone with Ric's first hunch about the barriers in comment #4, because that's making all the difference, not the journal mode.

With:

# mount -o data=writeback,barrier=0 /dev/sdb5 /mnt/test

(upstream defaults in 2.6.32)

I got < 1.5s.

With:

# mount -o data=ordered,barrier=1 /dev/sdb5 /mnt/test

(RHEL6 defaults - data integrity over speed, not vice versa)

I got 55s.

For the full ordered/writeback and barrier/nobarrier matrix, it looks like this:

           barrier    nobarrier
ordered      55s         1.5s
writeback    55s         1.5s

With barriers enabled, an fsync() causes a disk cache flush.  Without this, data is not persistent on disk; it is only in the disk's write cache, and will be lost if power is lost.

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/writebarr.html

I also see the fast time on upstream .38 (well, the F15 kernel, close enough) but again that's mounting without barriers.

With barriers on, on 2.6.38, I see 56s.

Sorry for the wild goose chase on the journal config; this is entirely a write barrier issue.
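
For what it's worth, a variant of the testcase that keeps the file open and issues a single fsync() after all the appends pays the flush cost only once; just a sketch (same file name as the original testcase) to show where the time goes:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

#define ROWS 2000

int main(void)
{
    FILE *fp;
    struct timeval start, end, res;
    int i;

    gettimeofday(&start, NULL);
    fp = fopen("test_file", "a");
    if (!fp) {
        perror("fopen");
        exit(1);
    }
    /* Buffer all the appends, then flush and fsync once; with barriers
     * enabled this issues roughly one cache flush instead of one per line. */
    for (i = 0; i < ROWS; i++)
        fprintf(fp, "Some Text Data\n");
    fflush(fp);
    fsync(fileno(fp));
    fclose(fp);
    gettimeofday(&end, NULL);

    timersub(&end, &start, &res);
    fprintf(stdout, "time to write %d lines is %ld (msec)\n", ROWS,
            (long)(res.tv_sec * 1000000 + res.tv_usec) / 1000);
    return 0;
}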

Comment 18 dcardimi 2011-03-25 16:33:14 UTC
I've retested this with a number of kernel versions with different mount options, and confirmed that this is simply a barrier issue.

Thanks for investigating.

Comment 19 Eric Sandeen 2011-03-25 16:45:48 UTC
Ok, thanks for testing!