Bug 819864

Summary: Frequent hang in file system
Product: Red Hat Enterprise Linux 6 Reporter: Simon Holm <sho>
Component: kernelAssignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED CURRENTRELEASE QA Contact: Filesystem QE <fs-qe>
Severity: urgent Docs Contact:
Priority: medium    
Version: 6.2CC: asolanas, esandeen, gea, kzhang, rwheeler
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.32-279.5.2.el6.x86_64 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-18 16:41:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 846704    

Description Simon Holm 2012-05-08 12:15:18 UTC
Description of problem:
Kernel hangs in file system, see stack dump of hung processes below. It doesn't crash the machine but hard reboot is necessary since anything that needs disk access hangs.

Version-Release number of selected component (if applicable):

RHEL 6.2

How reproducible:
Not on demand but happens a few times every week. The machine runs a Netflow collector (nfsen - http://nfsen.sourceforge.net/) that most likely triggers the hang. A backup client also run each night but the crash below is not within the backup window.

May  8 10:38:43 nflow.aau.dk INFO: task scsi_eh_0:403 blocked for more than 120 seconds.
May  8 10:38:43 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:43 nflow.aau.dk  0000000000000000 
May  8 10:38:43 nflow.aau.dk     0   403      2 0x00000000
May  8 10:38:43 nflow.aau.dk  ffff88019561fbb0
May  8 10:38:43 nflow.aau.dk  0000000000000046
May  8 10:38:43 nflow.aau.dk  ffffffff81e2ecc8
May  8 10:38:43 nflow.aau.dk  0000000000000000
May  8 10:38:43 nflow.aau.dk 
May  8 10:38:43 nflow.aau.dk  0000000000000001
May  8 10:38:43 nflow.aau.dk  ffff88000002db00
May  8 10:38:43 nflow.aau.dk  ffff880196718080
May  8 10:38:43 nflow.aau.dk  0000000000008024
May  8 10:38:43 nflow.aau.dk 
May  8 10:38:43 nflow.aau.dk  ffff880196718638
May  8 10:38:43 nflow.aau.dk  ffff88019561ffd8
May  8 10:38:43 nflow.aau.dk  000000000000f4e8
May  8 10:38:43 nflow.aau.dk  ffff880196718638
May  8 10:38:43 nflow.aau.dk 
May  8 10:38:43 nflow.aau.dk  [<ffffffff81069bb5>] ? __call_console_drivers+0x75/0x90
May  8 10:38:43 nflow.aau.dk  [<ffffffff814ed975>] schedule_timeout+0x215/0x2e0
May  8 10:38:43 nflow.aau.dk  [<ffffffff8106a2af>] ? release_console_sem+0x1cf/0x220
May  8 10:38:43 nflow.aau.dk  [<ffffffff814ed5f3>] wait_for_common+0x123/0x180
May  8 10:38:43 nflow.aau.dk  [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
May  8 10:38:43 nflow.aau.dk  [<ffffffffa008faab>] ? enqueue_cmd_and_start_io+0x11b/0x180 [hpsa]
May  8 10:38:43 nflow.aau.dk  [<ffffffff814ed70d>] wait_for_completion+0x1d/0x20
May  8 10:38:43 nflow.aau.dk  [<ffffffffa0091f46>] hpsa_eh_device_reset_handler+0x116/0x3c0 [hpsa]
May  8 10:38:43 nflow.aau.dk  [<ffffffff8135ec91>] scsi_eh_ready_devs+0x231/0x850
May  8 10:38:43 nflow.aau.dk  [<ffffffff8135f943>] scsi_error_handler+0x483/0x660
May  8 10:38:43 nflow.aau.dk  [<ffffffff8135f4c0>] ? scsi_error_handler+0x0/0x660
May  8 10:38:43 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:38:43 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:38:43 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:38:43 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20
May  8 10:38:43 nflow.aau.dk INFO: task jbd2/dm-0-8:519 blocked for more than 120 seconds.
May  8 10:38:43 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:43 nflow.aau.dk jbd2/dm-0-8   D
May  8 10:38:43 nflow.aau.dk  0000000000000007 
May  8 10:38:43 nflow.aau.dk     0   519      2 0x00000000
May  8 10:38:43 nflow.aau.dk  ffff8801956b9c20
May  8 10:38:43 nflow.aau.dk  0000000000000046
May  8 10:38:43 nflow.aau.dk  ffff8801956b9be0
May  8 10:38:43 nflow.aau.dk  ffffffffa00041fc
May  8 10:38:43 nflow.aau.dk 
May  8 10:38:43 nflow.aau.dk  0000000000000001
May  8 10:38:43 nflow.aau.dk  ffff8800d15a41c0
May  8 10:38:43 nflow.aau.dk  ffff8801956b9bc0
May  8 10:38:43 nflow.aau.dk  ffff8800d166b680
May  8 10:38:43 nflow.aau.dk 
May  8 10:38:43 nflow.aau.dk  ffff880195695ab8
May  8 10:38:43 nflow.aau.dk  ffff8801956b9fd8
May  8 10:38:43 nflow.aau.dk  000000000000f4e8
May  8 10:38:43 nflow.aau.dk  ffff880195695ab8
May  8 10:38:43 nflow.aau.dk 
May  8 10:38:43 nflow.aau.dk  [<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
May  8 10:38:43 nflow.aau.dk  [<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
May  8 10:38:43 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:38:43 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:38:43 nflow.aau.dk  [<ffffffff811a9410>] sync_buffer+0x40/0x50
May  8 10:38:43 nflow.aau.dk  [<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
May  8 10:38:43 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:38:43 nflow.aau.dk  [<ffffffff814edcf8>] out_of_line_wait_on_bit+0x78/0x90
May  8 10:38:43 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a93c6>] __wait_on_buffer+0x26/0x30
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dd0e6>] jbd2_journal_commit_transaction+0xa76/0x14b0 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
May  8 10:38:44 nflow.aau.dk  [<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01e2958>] kjournald2+0xb8/0x220 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01e28a0>] ? kjournald2+0x0/0x220 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20
May  8 10:38:44 nflow.aau.dk INFO: task jbd2/dm-2-8:1339 blocked for more than 120 seconds.
May  8 10:38:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:44 nflow.aau.dk jbd2/dm-2-8   D
May  8 10:38:44 nflow.aau.dk  0000000000000007 
May  8 10:38:44 nflow.aau.dk     0  1339      2 0x00000080
May  8 10:38:44 nflow.aau.dk  ffff8801970dfa80
May  8 10:38:44 nflow.aau.dk  0000000000000046
May  8 10:38:44 nflow.aau.dk  0000000000000000
May  8 10:38:44 nflow.aau.dk  ffff8801970dfa70
May  8 10:38:44 nflow.aau.dk 
May  8 10:38:44 nflow.aau.dk  ffff8801970df9f0
May  8 10:38:44 nflow.aau.dk  ffffffff81012b59
May  8 10:38:44 nflow.aau.dk  ffff8801970dfa30
May  8 10:38:44 nflow.aau.dk  0000000000000282
May  8 10:38:44 nflow.aau.dk 
May  8 10:38:44 nflow.aau.dk  ffff880194dafb38
May  8 10:38:44 nflow.aau.dk  ffff8801970dffd8
May  8 10:38:44 nflow.aau.dk  000000000000f4e8
May  8 10:38:44 nflow.aau.dk  ffff880194dafb38
May  8 10:38:44 nflow.aau.dk 
May  8 10:38:44 nflow.aau.dk  [<ffffffff81012b59>] ? read_tsc+0x9/0x20
May  8 10:38:44 nflow.aau.dk  [<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110ac0>] ? sync_page+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110afd>] sync_page+0x3d/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110cb3>] wait_on_page_bit+0x73/0x80
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff81127155>] ? pagevec_lookup_tag+0x25/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffff81125f8b>] write_cache_pages+0x17b/0x4a0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81124c30>] ? __writepage+0x0/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffff811262d4>] generic_writepages+0x24/0x30
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dc4d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dc9e5>] jbd2_journal_commit_transaction+0x375/0x14b0 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
May  8 10:38:44 nflow.aau.dk  [<ffffffff8107bf8c>] ? lock_timer_base+0x3c/0x70
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01e2958>] kjournald2+0xb8/0x220 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01e28a0>] ? kjournald2+0x0/0x220 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20
May  8 10:38:44 nflow.aau.dk INFO: task flush-253:2:1669 blocked for more than 120 seconds.
May  8 10:38:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:44 nflow.aau.dk flush-253: 2   D 0000000000000000     0  1669      2 0x00000080
May  8 10:38:44 nflow.aau.dk  ffff8801998178d0 0000000000000046 0000000000000000 ffff880199817880
May  8 10:38:44 nflow.aau.dk  0000000000000000 ffff88019ac123f0 ffff8801996a7050 0000000000000286
May  8 10:38:44 nflow.aau.dk  ffff880195b70638 ffff880199817fd8 000000000000f4e8 ffff880195b70638
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110ac0>] ? sync_page+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110afd>] sync_page+0x3d/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edafa>] __wait_on_bit_lock+0x5a/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110a97>] __lock_page+0x67/0x70
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff81127155>] ? pagevec_lookup_tag+0x25/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01f8a16>] ext4_num_dirty_pages+0x256/0x260 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fd1d4>] ext4_da_writepages+0x594/0x660 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffff8126a509>] ? cpumask_next_and+0x29/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff810550b4>] ? find_busiest_group+0x244/0xb20
May  8 10:38:44 nflow.aau.dk  [<ffffffff81126301>] do_writepages+0x21/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a041d>] writeback_single_inode+0xdd/0x2c0
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a085e>] writeback_sb_inodes+0xce/0x180
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a09bb>] writeback_inodes_wb+0xab/0x1b0
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a0d5b>] wb_writeback+0x29b/0x3f0
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ecb0e>] ? thread_return+0x4e/0x760
May  8 10:38:44 nflow.aau.dk  [<ffffffff8107caa2>] ? del_timer_sync+0x22/0x30
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a1049>] wb_do_writeback+0x199/0x240
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a1153>] bdi_writeback_task+0x63/0x1b0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090957>] ? bit_waitqueue+0x17/0xd0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81134cf0>] ? bdi_start_fn+0x0/0x100
May  8 10:38:44 nflow.aau.dk  [<ffffffff81134d76>] bdi_start_fn+0x86/0x100
May  8 10:38:44 nflow.aau.dk  [<ffffffff81134cf0>] ? bdi_start_fn+0x0/0x100
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20
May  8 10:38:44 nflow.aau.dk INFO: task rsyslogd:1782 blocked for more than 120 seconds.
May  8 10:38:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:44 nflow.aau.dk  ffff8801975bdac8 0000000000000082 0000000000000000 ffffffff811b2007
May  8 10:38:44 nflow.aau.dk  ffff8801975bdb18 ffffffff811b2905 ffffffffa01fbcf0 0000000000000282
May  8 10:38:44 nflow.aau.dk  ffff880195583b38 ffff8801975bdfd8 000000000000f4e8 ffff880195583b38
May  8 10:38:44 nflow.aau.dk  [<ffffffff811b2007>] ? mpage_bio_submit+0x27/0x30
May  8 10:38:44 nflow.aau.dk  [<ffffffff811b2905>] ? mpage_readpages+0x115/0x130
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fbcf0>] ? ext4_get_block+0x0/0x120 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110ac0>] ? sync_page+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110afd>] sync_page+0x3d/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110cb3>] wait_on_page_bit+0x73/0x80
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff81110d5a>] __lock_page_or_retry+0x3a/0x60
May  8 10:38:44 nflow.aau.dk  [<ffffffff81111eef>] filemap_fault+0x2df/0x500
May  8 10:38:44 nflow.aau.dk  [<ffffffff8113b274>] __do_fault+0x54/0x510
May  8 10:38:44 nflow.aau.dk  [<ffffffff810566a3>] ? perf_event_task_sched_out+0x33/0x80
May  8 10:38:44 nflow.aau.dk  [<ffffffff8113b827>] handle_pte_fault+0xf7/0xb50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ecb0e>] ? thread_return+0x4e/0x760
May  8 10:38:44 nflow.aau.dk  [<ffffffff8113c464>] handle_mm_fault+0x1e4/0x2b0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81042b79>] __do_page_fault+0x139/0x480
May  8 10:38:44 nflow.aau.dk  [<ffffffff811daa35>] ? pde_users_dec+0x25/0x60
May  8 10:38:44 nflow.aau.dk  [<ffffffff814f253e>] do_page_fault+0x3e/0xa0
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ef8f5>] page_fault+0x25/0x30
May  8 10:38:44 nflow.aau.dk INFO: task master:2246 blocked for more than 120 seconds.
May  8 10:38:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:44 nflow.aau.dk  ffff880197b33958 0000000000000086 0000000000000000 ffffffffa00041fc
May  8 10:38:44 nflow.aau.dk  0000000000000000 ffff88002dd15fe8 ffff88002dd15f80 0000000000000000
May  8 10:38:44 nflow.aau.dk  ffff880194b186b8 ffff880197b33fd8 000000000000f4e8 ffff880194b186b8
May  8 10:38:44 nflow.aau.dk  [<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a9410>] sync_buffer+0x40/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edafa>] __wait_on_bit_lock+0x5a/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edbd8>] out_of_line_wait_on_bit_lock+0x78/0x90
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a95b6>] __lock_buffer+0x36/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dc293>] do_get_write_access+0x493/0x520 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dc471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa021eb78>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fa253>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fa2cc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fa5c0>] ext4_dirty_inode+0x40/0x60 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffff8119fdfb>] __mark_inode_dirty+0x3b/0x160
May  8 10:38:44 nflow.aau.dk  [<ffffffff81190372>] file_update_time+0xf2/0x170
May  8 10:38:44 nflow.aau.dk  [<ffffffff811800b2>] pipe_write+0x2d2/0x650
May  8 10:38:44 nflow.aau.dk  [<ffffffff8117628a>] do_sync_write+0xfa/0x140
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffff8117b664>] ? cp_new_stat+0xe4/0x100
May  8 10:38:44 nflow.aau.dk  [<ffffffff81218e3b>] ? selinux_file_permission+0xfb/0x150
May  8 10:38:44 nflow.aau.dk  [<ffffffff8120c1e6>] ? security_file_permission+0x16/0x20
May  8 10:38:44 nflow.aau.dk  [<ffffffff81176588>] vfs_write+0xb8/0x1a0
May  8 10:38:44 nflow.aau.dk  [<ffffffff810d4692>] ? audit_syscall_entry+0x272/0x2a0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81176f91>] sys_write+0x51/0x90
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
May  8 10:38:44 nflow.aau.dk INFO: task pickup:11985 blocked for more than 120 seconds.
May  8 10:38:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:38:44 nflow.aau.dk  ffff880199bdf968 0000000000000082 0000000000000000 ffffffffa00041fc
May  8 10:38:44 nflow.aau.dk  0000000000000002 ffff88000002db00 ffff880196718ac0 0000000000000450
May  8 10:38:44 nflow.aau.dk  ffff880196719078 ffff880199bdffd8 000000000000f4e8 ffff880196719078
May  8 10:38:44 nflow.aau.dk  [<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a9410>] sync_buffer+0x40/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edafa>] __wait_on_bit_lock+0x5a/0xc0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81061c95>] ? enqueue_entity+0x125/0x420
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff814edbd8>] out_of_line_wait_on_bit_lock+0x78/0x90
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:38:44 nflow.aau.dk  [<ffffffff811a95b6>] __lock_buffer+0x36/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dc293>] do_get_write_access+0x493/0x520 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffff8115f80b>] ? __kmalloc_node+0x7b/0x100
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01dc471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa021eb78>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fa253>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fa2cc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffff811b9f6b>] ? ep_poll_callback+0xbb/0xf0
May  8 10:38:44 nflow.aau.dk  [<ffffffffa01fa5c0>] ext4_dirty_inode+0x40/0x60 [ext4]
May  8 10:38:44 nflow.aau.dk  [<ffffffff8119fdfb>] __mark_inode_dirty+0x3b/0x160
May  8 10:38:44 nflow.aau.dk  [<ffffffff8119051d>] touch_atime+0x12d/0x170
May  8 10:38:44 nflow.aau.dk  [<ffffffff81180705>] pipe_read+0x2d5/0x4e0
May  8 10:38:44 nflow.aau.dk  [<ffffffff811763ca>] do_sync_read+0xfa/0x140
May  8 10:38:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:38:44 nflow.aau.dk  [<ffffffff81218dff>] ? selinux_file_permission+0xbf/0x150
May  8 10:38:44 nflow.aau.dk  [<ffffffff8120c1e6>] ? security_file_permission+0x16/0x20
May  8 10:38:44 nflow.aau.dk  [<ffffffff81176dc5>] vfs_read+0xb5/0x1a0
May  8 10:38:44 nflow.aau.dk  [<ffffffff810d4692>] ? audit_syscall_entry+0x272/0x2a0
May  8 10:38:44 nflow.aau.dk  [<ffffffff81176f01>] sys_read+0x51/0x90
May  8 10:38:44 nflow.aau.dk  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
May  8 10:40:44 nflow.aau.dk INFO: task scsi_eh_0:403 blocked for more than 120 seconds.
May  8 10:40:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:40:44 nflow.aau.dk  0000000000000000 
May  8 10:40:44 nflow.aau.dk     0   403      2 0x00000000
May  8 10:40:44 nflow.aau.dk  ffff88019561fbb0
May  8 10:40:44 nflow.aau.dk  0000000000000046
May  8 10:40:44 nflow.aau.dk  ffffffff81e2ecc8
May  8 10:40:44 nflow.aau.dk  0000000000000000
May  8 10:40:44 nflow.aau.dk 
May  8 10:40:44 nflow.aau.dk  0000000000000001
May  8 10:40:44 nflow.aau.dk  ffff88000002db00
May  8 10:40:44 nflow.aau.dk  ffff880196718080
May  8 10:40:44 nflow.aau.dk  0000000000008024
May  8 10:40:44 nflow.aau.dk 
May  8 10:40:44 nflow.aau.dk  ffff880196718638
May  8 10:40:44 nflow.aau.dk  ffff88019561ffd8
May  8 10:40:44 nflow.aau.dk  000000000000f4e8
May  8 10:40:44 nflow.aau.dk  ffff880196718638
May  8 10:40:44 nflow.aau.dk 
May  8 10:40:44 nflow.aau.dk  [<ffffffff81069bb5>] ? __call_console_drivers+0x75/0x90
May  8 10:40:44 nflow.aau.dk  [<ffffffff814ed975>] schedule_timeout+0x215/0x2e0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8106a2af>] ? release_console_sem+0x1cf/0x220
May  8 10:40:44 nflow.aau.dk  [<ffffffff814ed5f3>] wait_for_common+0x123/0x180
May  8 10:40:44 nflow.aau.dk  [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
May  8 10:40:44 nflow.aau.dk  [<ffffffffa008faab>] ? enqueue_cmd_and_start_io+0x11b/0x180 [hpsa]
May  8 10:40:44 nflow.aau.dk  [<ffffffff814ed70d>] wait_for_completion+0x1d/0x20
May  8 10:40:44 nflow.aau.dk  [<ffffffffa0091f46>] hpsa_eh_device_reset_handler+0x116/0x3c0 [hpsa]
May  8 10:40:44 nflow.aau.dk  [<ffffffff8135ec91>] scsi_eh_ready_devs+0x231/0x850
May  8 10:40:44 nflow.aau.dk  [<ffffffff8135f943>] scsi_error_handler+0x483/0x660
May  8 10:40:44 nflow.aau.dk  [<ffffffff8135f4c0>] ? scsi_error_handler+0x0/0x660
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20
May  8 10:40:44 nflow.aau.dk INFO: task jbd2/dm-0-8:519 blocked for more than 120 seconds.
May  8 10:40:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:40:44 nflow.aau.dk jbd2/dm-0-8   D 0000000000000007     0   519      2 0x00000000
May  8 10:40:44 nflow.aau.dk  ffff8801956b9c20 0000000000000046 ffff8801956b9be0 ffffffffa00041fc
May  8 10:40:44 nflow.aau.dk  0000000000000001 ffff8800d15a41c0 ffff8801956b9bc0 ffff8800d166b680
May  8 10:40:44 nflow.aau.dk  ffff880195695ab8 ffff8801956b9fd8 000000000000f4e8 ffff880195695ab8
May  8 10:40:44 nflow.aau.dk  [<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
May  8 10:40:44 nflow.aau.dk  [<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
May  8 10:40:44 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:40:44 nflow.aau.dk  [<ffffffff811a9410>] sync_buffer+0x40/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
May  8 10:40:44 nflow.aau.dk  [<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff814edcf8>] out_of_line_wait_on_bit+0x78/0x90
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff811a93c6>] __wait_on_buffer+0x26/0x30
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01dd0e6>] jbd2_journal_commit_transaction+0xa76/0x14b0 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
May  8 10:40:44 nflow.aau.dk  [<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01e2958>] kjournald2+0xb8/0x220 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01e28a0>] ? kjournald2+0x0/0x220 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20
May  8 10:40:44 nflow.aau.dk INFO: task jbd2/dm-2-8:1339 blocked for more than 120 seconds.
May  8 10:40:44 nflow.aau.dk "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:40:44 nflow.aau.dk jbd2/dm-2-8   D 0000000000000007     0  1339      2 0x00000080
May  8 10:40:44 nflow.aau.dk  ffff8801970dfa80 0000000000000046 0000000000000000 ffff8801970dfa70
May  8 10:40:44 nflow.aau.dk  ffff8801970df9f0 ffffffff81012b59 ffff8801970dfa30 0000000000000282
May  8 10:40:44 nflow.aau.dk  ffff880194dafb38 ffff8801970dffd8 000000000000f4e8 ffff880194dafb38
May  8 10:40:44 nflow.aau.dk  [<ffffffff81012b59>] ? read_tsc+0x9/0x20
May  8 10:40:44 nflow.aau.dk  [<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
May  8 10:40:44 nflow.aau.dk  [<ffffffff81110ac0>] ? sync_page+0x0/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff814ed293>] io_schedule+0x73/0xc0
May  8 10:40:44 nflow.aau.dk  [<ffffffff81110afd>] sync_page+0x3d/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
May  8 10:40:44 nflow.aau.dk  [<ffffffff81110cb3>] wait_on_page_bit+0x73/0x80
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
May  8 10:40:44 nflow.aau.dk  [<ffffffff81127155>] ? pagevec_lookup_tag+0x25/0x40
May  8 10:40:44 nflow.aau.dk  [<ffffffff81125f8b>] write_cache_pages+0x17b/0x4a0
May  8 10:40:44 nflow.aau.dk  [<ffffffff81124c30>] ? __writepage+0x0/0x40
May  8 10:40:44 nflow.aau.dk  [<ffffffff811262d4>] generic_writepages+0x24/0x30
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01dc4d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01dc9e5>] jbd2_journal_commit_transaction+0x375/0x14b0 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
May  8 10:40:44 nflow.aau.dk  [<ffffffff8107bf8c>] ? lock_timer_base+0x3c/0x70
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01e2958>] kjournald2+0xb8/0x220 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
May  8 10:40:44 nflow.aau.dk  [<ffffffffa01e28a0>] ? kjournald2+0x0/0x220 [jbd2]
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090726>] kthread+0x96/0xa0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8100c14a>] child_rip+0xa/0x20
May  8 10:40:44 nflow.aau.dk  [<ffffffff81090690>] ? kthread+0x0/0xa0
May  8 10:40:44 nflow.aau.dk  [<ffffffff8100c140>] ? child_rip+0x0/0x20

Comment 2 Ric Wheeler 2012-05-08 15:17:37 UTC
Hi Simon,

Can you please open a support call with Red Hat support? They will help you debug and gather information.

Best regards,

Ric

Comment 3 Simon Holm 2012-05-08 16:03:36 UTC
Hi Ric

I'm unsure which kind of support contract we have. We have the OS subscription of course, so if that qualifies I'll call right away tomorrow, but otherwise I'll have to check our contracts.


Simon

Comment 4 RHEL Program Management 2012-05-12 04:13:50 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 Eric Sandeen 2012-05-18 17:00:27 UTC
From a quick look, everything is stuck in io_schedule.

The first bit pasted above:

[<ffffffffa0091f46>] hpsa_eh_device_reset_handler+0x116/0x3c0 [hpsa]
[<ffffffff8135ec91>] scsi_eh_ready_devs+0x231/0x850
[<ffffffff8135f943>] scsi_error_handler+0x483/0x660

makes me wonder if you have storage problems.  Any scsi errors in the logs from around this time?

Comment 7 Simon Holm 2012-05-21 09:15:41 UTC
Nothing in dmesg at all between boot up and the hang(s). Do you suggest adding som e kind of SCSI debug?

The disk consists of two physical disks locally attached in HW RAID-1.

Comment 8 Gerardo Arceri 2012-11-08 18:09:37 UTC
Could this be related to the following HP Advisory ?
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c03555882

"HP ProLiant Servers: HP Smart Array Controllers - DRIVER UPGRADE REQUIRED to Prevent a Read Only Linux File System or an Unresponsive Server When a Target Reset Is Issued by the Smart Array Controller Driver for Linux (hpsa)"

Comment 9 RHEL Program Management 2012-12-14 07:12:35 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 10 Ric Wheeler 2012-12-14 22:35:45 UTC
Is this still an issue? Can you reproduce this on RHEL6.3 (or 6.4 when it comes out)?

Comment 11 Simon Holm 2012-12-18 14:45:06 UTC
I just reviewed status and actually the problem seems to be fixed somewhere between kernel 2.6.32-220.7.1 and 2.6.32-279.5.2.el6.x86_64 (now on RHEL 6.3). We've used the latter for 103 days without a crash.

We consider it as fixed.

Comment 12 Ric Wheeler 2012-12-18 16:41:29 UTC
Great, thanks for testing and letting us know!