Bug 174182 - journal commit starvation
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Depends On:
Blocks: 170417
Reported: 2005-11-25 09:38 EST by Bastien Nocera
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-12-05 17:51:58 EST

Attachments
test.pl (197 bytes, text/plain)
2005-11-25 09:38 EST, Bastien Nocera
messages file (109.38 KB, text/plain)
2005-11-25 09:40 EST, Bastien Nocera

Description Bastien Nocera 2005-11-25 09:38:56 EST
When the attached test case is running, other applications using the disk
"freeze" for long periods of time (tens of seconds).

The same problem doesn't occur when ext2 is the filesystem used.

Kernel: 2.4.21-37.ELsmp

The full Alt+SysRq+T output is attached below. Selected samples follow (the dd
in these traces is the attached test program):

Nov 23 11:55:05 maroon kernel: dd            R current   3776  1275   1274 (NOTLB)
Nov 23 11:55:05 maroon kernel: Call Trace:   [<c016787a>] create_buffers [kernel] 0x6a (0xf6917e24)
Nov 23 11:55:05 maroon kernel: [<f8886532>] ext3_get_block [ext3] 0x52 (0xf6917e38)
Nov 23 11:55:05 maroon kernel: [<c016814b>] __block_prepare_write [kernel] 0x1ab (0xf6917e5c)
Nov 23 11:55:05 maroon kernel: [<c0168b09>] block_prepare_write [kernel] 0x39 (0xf6917ea0)
Nov 23 11:55:05 maroon kernel: [<f88864e0>] ext3_get_block [ext3] 0x0 (0xf6917eb4)
Nov 23 11:55:05 maroon kernel: [<f8886bb9>] ext3_prepare_write [ext3] 0xc9 (0xf6917ec0)
Nov 23 11:55:05 maroon kernel: [<f88864e0>] ext3_get_block [ext3] 0x0 (0xf6917ed0)
Nov 23 11:55:05 maroon kernel: [<c014c053>] do_generic_file_write [kernel] 0x1e3 (0xf6917ef4)
Nov 23 11:55:05 maroon kernel: [<c014c5bf>] generic_file_write [kernel] 0x13f (0xf6917f48)
Nov 23 11:55:05 maroon kernel: [<f8883e99>] ext3_file_write [ext3] 0x39 (0xf6917f74)
Nov 23 11:55:05 maroon kernel: [<c0164b27>] sys_write [kernel] 0x97 (0xf6917f94)

Nov 23 11:55:05 maroon kernel: vi            D 00000001  3764  1326   1276 (NOTLB)
Nov 23 11:55:05 maroon kernel: Call Trace:   [<c0124a52>] sleep_on [kernel] 0x52 (0xf2679ee8)
Nov 23 11:55:05 maroon kernel: [<f8879be8>] log_wait_commit_Rsmp_4dfe4007 [jbd] 0x68 (0xf2679f18)
Nov 23 11:55:05 maroon kernel: [<f8874bd3>] journal_stop_Rsmp_74af6844 [jbd] 0x193 (0xf2679f34)
Nov 23 11:55:05 maroon kernel: [<f8873445>] journal_start_Rsmp_25661df5 [jbd] 0xa5 (0xf2679f40)
Nov 23 11:55:05 maroon kernel: [<f8874ccc>] journal_force_commit_Rsmp_2a9443c3 [jbd] 0x7c (0xf2679f64)
Nov 23 11:55:05 maroon kernel: [<f888f091>] ext3_force_commit [ext3] 0x51 (0xf2679f70)
Nov 23 11:55:05 maroon kernel: [<f8883fb4>] ext3_sync_file [ext3] 0x84 (0xf2679f7c)
Nov 23 11:55:05 maroon kernel: [<f8887270>] ext3_writepage [ext3] 0x0 (0xf2679f84)
Nov 23 11:55:05 maroon kernel: [<c01667f8>] sys_fsync [kernel] 0x98 (0xf2679f9c)
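
The two traces show a writer streaming data through sys_write while vi sleeps in sys_fsync waiting for a journal commit. The sketch below is one way to observe that interaction from user space. It is not the attached test.pl, which is not reproduced in this report. The function names, the FSYNC_TEST_DIR environment variable, and the size limits are all assumptions made for illustration.

```python
import os
import tempfile
import threading
import time


def background_writer(path, stop, max_bytes, chunk_size=1 << 20):
    """Stream chunks to `path` until `stop` is set (stands in for the dd load).

    The file is rewritten from the start once it reaches `max_bytes`, so the
    sketch keeps generating dirty data without filling the disk.
    """
    chunk = b"\0" * chunk_size
    with open(path, "wb") as f:
        while not stop.is_set():
            f.write(chunk)
            if f.tell() >= max_bytes:
                f.seek(0)


def measure_fsync_latency(path, samples=5, pause=0.1):
    """Append a small record and time fsync(), roughly what an editor save does."""
    latencies = []
    with open(path, "ab") as f:
        for _ in range(samples):
            f.write(b"x\n")
            f.flush()
            start = time.monotonic()
            os.fsync(f.fileno())
            latencies.append(time.monotonic() - start)
            time.sleep(pause)
    return latencies


def run(directory, samples=5, max_bytes=64 << 20):
    """Time small fsync() calls in `directory` while a writer thread is active."""
    big = os.path.join(directory, "big.dat")
    small = os.path.join(directory, "small.txt")
    stop = threading.Event()
    writer = threading.Thread(target=background_writer, args=(big, stop, max_bytes))
    writer.start()
    try:
        return measure_fsync_latency(small, samples)
    finally:
        stop.set()
        writer.join()
        for p in (big, small):
            if os.path.exists(p):
                os.remove(p)


if __name__ == "__main__":
    # FSYNC_TEST_DIR is a hypothetical variable naming a directory on the
    # filesystem under test; without it a temporary directory is used, which
    # may live on a different filesystem (often tmpfs).
    target = os.environ.get("FSYNC_TEST_DIR")
    with tempfile.TemporaryDirectory(dir=target) as d:
        for i, seconds in enumerate(run(d)):
            print(f"fsync {i}: {seconds:.3f} s")
```

Running it once against an ext3 mount and once against an ext2 mount would allow the comparison made in the description. The latencies depend heavily on hardware, kernel version, and mount options, so no expected figures are given here.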
Comment 1 Bastien Nocera 2005-11-25 09:38:56 EST
Created attachment 121488
test.pl
Comment 2 Bastien Nocera 2005-11-25 09:40:34 EST
Created attachment 121489
messages file
Comment 3 Stephen Tweedie 2005-11-29 15:28:23 EST
ext3 is a journaled filesystem.  The journal is a limited resource.  If you
completely fill the journal, then no more writes can be scheduled at all.  ext2
has no such bottleneck simply because it has no journal.

And your test case is the worst-case scenario because you're forcing ext3 to
flush out large amounts of data for each transaction, bottlenecking the
transaction itself on the data queue.  This may not be particularly pleasant but
it's largely as expected in this case.

It is unlikely we're going to do much work to significantly rebalance the
interactions between ext3 and the VM for RHEL-3 at this stage.  Is there a major
problem being caused here?
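
As a rough illustration of the bottleneck described in this comment: if a commit cannot complete until the data queued ahead of it has reached the disk, a small fsync waits about as long as that flush takes. The sketch below is only a back-of-envelope model under that assumption. The function name and all figures are hypothetical and are not measurements from this report.

```python
MIB = 1 << 20


def estimated_stall_seconds(queued_bytes, disk_bytes_per_second):
    """Crude estimate: time to flush the queued data before a commit can finish."""
    if disk_bytes_per_second <= 0:
        raise ValueError("disk throughput must be positive")
    return queued_bytes / disk_bytes_per_second


if __name__ == "__main__":
    # Hypothetical queue sizes and a hypothetical 40 MiB/s sustained write rate.
    for queued_mib in (100, 500, 1000):
        stall = estimated_stall_seconds(queued_mib * MIB, 40 * MIB)
        print(f"{queued_mib} MiB queued at 40 MiB/s: about {stall:.1f} s")
```

With these made-up numbers the estimate lands in the range of seconds to tens of seconds, the same order of magnitude as the freezes reported. The model ignores seeks, caching, and competing I/O.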
Comment 4 Bastien Nocera 2005-11-30 04:14:07 EST
About 200 users are logged in to this machine via ssh, running text editors, and
it hangs for 10 to 30 seconds whenever snapshots of the database are taken.
Comment 5 Bastien Nocera 2005-11-30 04:15:08 EST
Would increasing the size of the journal help?
Comment 6 Thomas Uebermeier 2005-11-30 04:23:44 EST
The original information that ext2 solves the problem is probably wrong.
Although most tests (including this one) were done on RHEL4, the customer
also saw a freeze on an ext2 partition; it just occurred later.
Comment 10 Ernie Petrides 2006-12-05 17:51:58 EST
Closing based on last comment.