Bug 773377

Summary: Long EXT4 lockups and errors
Product: Red Hat Enterprise Linux 6 Reporter: ilya m. <imusayev>
Component: kernelAssignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.2CC: dchinner, esandeen, jbacik, lczerner, rwheeler
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-15 13:09:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description ilya m. 2012-01-11 17:26:40 UTC
Description of problem:

While in process of bench-marking various file systems, we came across an issue where EXT4 partition would lock up under specific load. In current tests it was under iozone testing.

A repetitive message i'd see in messages file is attached:

Jan  9 23:38:51 minigiant5 kernel: INFO: task jbd2/md0-8:16169 blocked for more than 120 seconds.
Jan  9 23:38:51 minigiant5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  9 23:38:51 minigiant5 kernel: jbd2/md0-8    D 0000000000000005     0 16169      2 0x00000000
Jan  9 23:38:51 minigiant5 kernel: ffff88010153bc20 0000000000000046 0000000000000000 ffff88010153bbe4
Jan  9 23:38:51 minigiant5 kernel: 0000000000000000 ffff88012dc24c00 ffff880028215f80 0000000000000400
Jan  9 23:38:51 minigiant5 kernel: ffff880108f230f8 ffff88010153bfd8 000000000000f4e8 ffff880108f230f8
Jan  9 23:38:51 minigiant5 kernel: Call Trace:
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811a9280>] sync_buffer+0x40/0x50
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff814edc5f>] __wait_on_bit+0x5f/0x90
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff814edd08>] out_of_line_wait_on_bit+0x78/0x90
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090a50>] ? wake_bit_function+0x0/0x50
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811a9236>] __wait_on_buffer+0x26/0x30
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa022a0e6>] jbd2_journal_commit_transaction+0xa76/0x14b0 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8107c99b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa022f928>] kjournald2+0xb8/0x220 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa022f870>] ? kjournald2+0x0/0x220 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan  9 23:38:51 minigiant5 kernel: INFO: task iozone:17280 blocked for more than 120 seconds.
Jan  9 23:38:51 minigiant5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  9 23:38:51 minigiant5 kernel: iozone        D 0000000000000000     0 17280  16178 0x00000000
Jan  9 23:38:51 minigiant5 kernel: ffff880127e779a8 0000000000000086 ffff880127e77998 00000000ffffffff
Jan  9 23:38:51 minigiant5 kernel: ffffffff81ea1100 0000000000000286 ffff880127e77958 ffffffff8107c99b
Jan  9 23:38:51 minigiant5 kernel: ffff8800b9071b38 ffff880127e77fd8 000000000000f4e8 ffff8800b9071b38
Jan  9 23:38:51 minigiant5 kernel: Call Trace:
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8107c99b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090cfe>] ? prepare_to_wait+0x4e/0x80
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa022909d>] do_get_write_access+0x29d/0x520 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090a50>] ? wake_bit_function+0x0/0x50
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa0229471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa027db78>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa0259253>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa02592cc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa02595c0>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8119fc6b>] __mark_inode_dirty+0x3b/0x160
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811901e2>] file_update_time+0xf2/0x170
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81112b50>] __generic_file_aio_write+0x220/0x480
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8105e770>] ? default_wake_function+0x0/0x20
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81112e1f>] generic_file_aio_write+0x6f/0xe0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa0252de1>] ext4_file_write+0x61/0x1e0 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811b4ccd>] ? fsnotify_add_notify_event+0x12d/0x280
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811760fa>] do_sync_write+0xfa/0x140
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811b4613>] ? fsnotify+0x113/0x160
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81218e2b>] ? selinux_file_permission+0xfb/0x150
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8120c1d6>] ? security_file_permission+0x16/0x20
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811763f8>] vfs_write+0xb8/0x1a0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81176e01>] sys_write+0x51/0x90
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81176753>] ? sys_lseek+0x53/0x80
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Jan  9 23:38:51 minigiant5 kernel: INFO: task iozone:17282 blocked for more than 120 seconds.
Jan  9 23:38:51 minigiant5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  9 23:38:51 minigiant5 kernel: iozone        D 0000000000000000     0 17282  16178 0x00000000
Jan  9 23:38:51 minigiant5 kernel: ffff880108fe59a8 0000000000000082 0000000000000000 00000000ffffffff
Jan  9 23:38:51 minigiant5 kernel: ffffffff81ea1100 0000000000000286 ffff880108fe5958 ffffffff8107c99b
Jan  9 23:38:51 minigiant5 kernel: ffff880128e45078 ffff880108fe5fd8 000000000000f4e8 ffff880128e45078
Jan  9 23:38:51 minigiant5 kernel: Call Trace:
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8107c99b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa022909d>] do_get_write_access+0x29d/0x520 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090a50>] ? wake_bit_function+0x0/0x50
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa0229471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa027db78>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa0259253>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa02592cc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa02595c0>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8119fc6b>] __mark_inode_dirty+0x3b/0x160
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811901e2>] file_update_time+0xf2/0x170
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81112b50>] __generic_file_aio_write+0x220/0x480
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8105e770>] ? default_wake_function+0x0/0x20
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81112e1f>] generic_file_aio_write+0x6f/0xe0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffffa0252de1>] ext4_file_write+0x61/0x1e0 [ext4]
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811b4ccd>] ? fsnotify_add_notify_event+0x12d/0x280
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811760fa>] do_sync_write+0xfa/0x140
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811b4613>] ? fsnotify+0x113/0x160
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81218e2b>] ? selinux_file_permission+0xfb/0x150
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8120c1d6>] ? security_file_permission+0x16/0x20
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff811763f8>] vfs_write+0xb8/0x1a0
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff81176e01>] sys_write+0x51/0x90
Jan  9 23:38:51 minigiant5 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Jan 10 00:06:51 minigiant5 kernel: INFO: task jbd2/md0-8:16169 blocked for more than 120 seconds.
Jan 10 00:06:51 minigiant5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 00:06:51 minigiant5 kernel: jbd2/md0-8    D 0000000000000005     0 16169      2 0x00000000
Jan 10 00:06:51 minigiant5 kernel: ffff88010153bc20 0000000000000046 ffff88010153bbe8 ffff88010153bbe4
Jan 10 00:06:51 minigiant5 kernel: 0000000000000001 ffff88012dc24c00 ffff8800282f5f80 0000000000000400
Jan 10 00:06:51 minigiant5 kernel: ffff880108f230f8 ffff88010153bfd8 000000000000f4e8 ffff880108f230f8
Jan 10 00:06:51 minigiant5 kernel: Call Trace:
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9280>] sync_buffer+0x40/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff814edc5f>] __wait_on_bit+0x5f/0x90
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff814edd08>] out_of_line_wait_on_bit+0x78/0x90
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff81090a50>] ? wake_bit_function+0x0/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9236>] __wait_on_buffer+0x26/0x30
Jan 10 00:06:51 minigiant5 kernel: [<ffffffffa022a7a1>] jbd2_journal_commit_transaction+0x1131/0x14b0 [jbd2]
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff8107c99b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffffa022f928>] kjournald2+0xb8/0x220 [jbd2]
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Jan 10 00:06:51 minigiant5 kernel: [<ffffffffa022f870>] ? kjournald2+0x0/0x220 [jbd2]
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan 10 00:16:51 minigiant5 kernel: INFO: task jbd2/md0-8:16169 blocked for more than 120 seconds.
Jan 10 00:16:51 minigiant5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 00:16:51 minigiant5 kernel: jbd2/md0-8    D 0000000000000005     0 16169      2 0x00000000
Jan 10 00:16:51 minigiant5 kernel: ffff88010153bc20 0000000000000046 ffff88010153bbe8 ffff88010153bbe4
Jan 10 00:16:51 minigiant5 kernel: 0000000000000001 ffff88012dc24c00 ffff880028235f80 0000000000000400
Jan 10 00:16:51 minigiant5 kernel: ffff880108f230f8 ffff88010153bfd8 000000000000f4e8 ffff880108f230f8
Jan 10 00:16:51 minigiant5 kernel: Call Trace:
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9280>] sync_buffer+0x40/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff814edc5f>] __wait_on_bit+0x5f/0x90
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff814edd08>] out_of_line_wait_on_bit+0x78/0x90
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff81090a50>] ? wake_bit_function+0x0/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9236>] __wait_on_buffer+0x26/0x30
Jan 10 00:16:51 minigiant5 kernel: [<ffffffffa022a7a1>] jbd2_journal_commit_transaction+0x1131/0x14b0 [jbd2]
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff8107c99b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffffa022f928>] kjournald2+0xb8/0x220 [jbd2]
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Jan 10 00:16:51 minigiant5 kernel: [<ffffffffa022f870>] ? kjournald2+0x0/0x220 [jbd2]
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
 


Version-Release number of selected component (if applicable):

[root@minigiant5 ~]# uname -a
Linux minigiant5 2.6.32-220.2.1.el6.x86_64 #1 SMP Tue Dec 13 16:21:34 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

[root@minigiant5 ~]# dmsetup version
Library version:   1.02.66-RHEL6 (2011-10-12)
Driver version:    4.22.6

2 x 1 TB SATA drives with software RAID 1.
scsi-SATA_SAMSUNG_HD103SJS246J9KB510183
scsi-SATA_SAMSUNG_HD103SJS246J9KB510187

System Information
        Manufacturer: Dell Inc.
        Product Name: OptiPlex 990

RAM: 4GB
CPU: Quad Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz with HT

How reproducible:
Always

Steps to Reproduce:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0
mount /dev/md0 /storage/ -o noatime	
iozone -R -l 5 -u 5 -r 4k -s 5000m -F /storage/f1 /storage/f2 /storage/f3 /storage/f4 /storage/f5

  
Actual results:

Process takes much longer to complete due to EXT4 FS lockups

Expected results:

The partition along with the process should not lock due to FS issues


Additional info:

It takes a to run while but eventually locks up the partition.

Comment 1 ilya m. 2012-01-11 17:32:50 UTC
Forced FSCK had no errors.

Comment 3 Ondrej Vasik 2012-01-11 18:31:58 UTC
Package filesystem contains just basic system directories layout. It has nothing to do with ext4... reassigning to kernel.

Comment 4 Ric Wheeler 2012-01-12 07:01:11 UTC
Ilya, please open a support ticket with Red Hat support - they can help you gather information and debug the issue.

Best regards,

Ric

Comment 5 ilya m. 2012-01-12 23:08:27 UTC
FYI: This issue does not occur on XFS fs. The IO Scheduler is "noop".

Comment 7 RHEL Program Management 2012-05-03 05:27:27 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.