Bug 174182 - journal commit starvation
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel   
Version: 3.0
Hardware: All Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
Blocks: 170417
Reported: 2005-11-25 14:38 UTC by Bastien Nocera
Modified: 2007-11-30 22:07 UTC (History)

Doc Type: Bug Fix
Last Closed: 2006-12-05 22:51:58 UTC


Attachments
test.pl (197 bytes, text/plain)
2005-11-25 14:38 UTC, Bastien Nocera
messages file (109.38 KB, text/plain)
2005-11-25 14:40 UTC, Bastien Nocera

Description Bastien Nocera 2005-11-25 14:38:56 UTC
With the attached testcase running, applications using the disk "freeze" for
long periods (tens of seconds).

The same problem doesn't occur when ext2 is the filesystem used.

kernel 2.4.21-37.ELsmp

The full Alt+SysRq+T output is attached. Selected samples follow (the dd is
replaced in this case by the attached test program):

Nov 23 11:55:05 maroon kernel: dd            R current   3776  1275   1274                (NOTLB)
Nov 23 11:55:05 maroon kernel: Call Trace:   [<c016787a>] create_buffers [kernel] 0x6a (0xf6917e24)
Nov 23 11:55:05 maroon kernel: [<f8886532>] ext3_get_block [ext3] 0x52 (0xf6917e38)
Nov 23 11:55:05 maroon kernel: [<c016814b>] __block_prepare_write [kernel] 0x1ab (0xf6917e5c)
Nov 23 11:55:05 maroon kernel: [<c0168b09>] block_prepare_write [kernel] 0x39 (0xf6917ea0)
Nov 23 11:55:05 maroon kernel: [<f88864e0>] ext3_get_block [ext3] 0x0 (0xf6917eb4)
Nov 23 11:55:05 maroon kernel: [<f8886bb9>] ext3_prepare_write [ext3] 0xc9 (0xf6917ec0)
Nov 23 11:55:05 maroon kernel: [<f88864e0>] ext3_get_block [ext3] 0x0 (0xf6917ed0)
Nov 23 11:55:05 maroon kernel: [<c014c053>] do_generic_file_write [kernel] 0x1e3 (0xf6917ef4)
Nov 23 11:55:05 maroon kernel: [<c014c5bf>] generic_file_write [kernel] 0x13f (0xf6917f48)
Nov 23 11:55:05 maroon kernel: [<f8883e99>] ext3_file_write [ext3] 0x39 (0xf6917f74)
Nov 23 11:55:05 maroon kernel: [<c0164b27>] sys_write [kernel] 0x97 (0xf6917f94)

Nov 23 11:55:05 maroon kernel: vi            D 00000001  3764  1326   1276                (NOTLB)
Nov 23 11:55:05 maroon kernel: Call Trace:   [<c0124a52>] sleep_on [kernel] 0x52 (0xf2679ee8)
Nov 23 11:55:05 maroon kernel: [<f8879be8>] log_wait_commit_Rsmp_4dfe4007 [jbd] 0x68 (0xf2679f18)
Nov 23 11:55:05 maroon kernel: [<f8874bd3>] journal_stop_Rsmp_74af6844 [jbd] 0x193 (0xf2679f34)
Nov 23 11:55:05 maroon kernel: [<f8873445>] journal_start_Rsmp_25661df5 [jbd] 0xa5 (0xf2679f40)
Nov 23 11:55:05 maroon kernel: [<f8874ccc>] journal_force_commit_Rsmp_2a9443c3 [jbd] 0x7c (0xf2679f64)
Nov 23 11:55:05 maroon kernel: [<f888f091>] ext3_force_commit [ext3] 0x51 (0xf2679f70)
Nov 23 11:55:05 maroon kernel: [<f8883fb4>] ext3_sync_file [ext3] 0x84 (0xf2679f7c)
Nov 23 11:55:05 maroon kernel: [<f8887270>] ext3_writepage [ext3] 0x0 (0xf2679f84)
Nov 23 11:55:05 maroon kernel: [<c01667f8>] sys_fsync [kernel] 0x98 (0xf2679f9c)
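
In the traces above, the writer is inside sys_write while vi sits in D state under sys_fsync, waiting in log_wait_commit. The attached test.pl is not reproduced in this report, so the following is only a rough sketch of how that pattern could be measured, not the attached testcase: one thread writes a large amount of buffered data while the main thread times small write+fsync calls in the same directory. The function names, file names, and sizes are all hypothetical, and the latencies observed will depend on the filesystem, journal size, and hardware.

```python
import os
import tempfile
import threading
import time


def bulk_writer(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of buffered data to path without syncing.

    Hypothetical stand-in for the heavy writer (dd / test.pl in the report).
    """
    block = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(block[:n])
            written += n


def measure_fsync_latency(directory, total_bytes, samples=5):
    """Time small write+fsync calls while a bulk writer runs in directory.

    Returns a list of fsync durations in seconds, one per sample.
    """
    bulk_path = os.path.join(directory, "bulk.dat")    # hypothetical name
    small_path = os.path.join(directory, "small.txt")  # hypothetical name
    writer = threading.Thread(target=bulk_writer, args=(bulk_path, total_bytes))
    writer.start()
    latencies = []
    try:
        with open(small_path, "wb") as f:
            for _ in range(samples):
                f.write(b"x")
                f.flush()  # push Python's buffer to the kernel first
                start = time.monotonic()
                os.fsync(f.fileno())  # the call vi is blocked in above
                latencies.append(time.monotonic() - start)
    finally:
        writer.join()
    return latencies


if __name__ == "__main__":
    # Assumption: the temporary directory lives on the filesystem under test.
    with tempfile.TemporaryDirectory() as d:
        lat = measure_fsync_latency(d, total_bytes=64 << 20)
        print("max fsync latency: %.3f s" % max(lat))
```

To compare ext3 against ext2 with this sketch, the directory passed to measure_fsync_latency would need to sit on a mount of the filesystem being tested, and total_bytes would likely need to be much larger than the 64 MiB used here.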

Comment 1 Bastien Nocera 2005-11-25 14:38:56 UTC
Created attachment 121488
test.pl

Comment 2 Bastien Nocera 2005-11-25 14:40:34 UTC
Created attachment 121489
messages file

Comment 3 Stephen Tweedie 2005-11-29 20:28:23 UTC
ext3 is a journaled filesystem.  The journal is a limited resource.  If you
completely fill the journal, then no more writes can be scheduled, at all; ext2
has no such bottleneck simply because it has no journal.  

And your test case is the worst-case scenario because you're forcing ext3 to
flush out large amounts of data for each transaction, bottlenecking the
transaction itself on the data queue.  This may not be particularly pleasant but
it's largely as expected in this case.

It is unlikely we're going to do much work to significantly rebalance the
interactions between ext3 and the VM for RHEL-3 at this stage.  Is there a major
problem being caused here?


Comment 4 Bastien Nocera 2005-11-30 09:14:07 UTC
About 200 users are logged in via ssh to this machine, running text editors, and
it hangs for between 10 and 30 seconds whenever snapshots of the database are taken.

Comment 5 Bastien Nocera 2005-11-30 09:15:08 UTC
Would increasing the size of the journal help?

Comment 6 Thomas Uebermeier 2005-11-30 09:23:44 UTC
The original information that ext2 solves the problem is probably wrong.
Although most tests (including this one) were done on RHEL4, the customer
also saw a freeze on an ext2 partition; it just appeared later.

Comment 10 Ernie Petrides 2006-12-05 22:51:58 UTC
Closing based on last comment.

