Hide Forgot
Description of problem: We have a workload only set/get extended attribute to inode as inline, all file size is zero. After running the test workload for around 1 hour, we observe kernel oops. Version-Release number of selected component (if applicable): RHEL6.1 GA kernel: 2.6.32-131.0.15 How reproducible: We reproduced the error in 2.6.32-71.29.1, 2.6.32-131.0.15, both in Ext3 and Ext4 with data=ordered journal. We suspect this may be a per-bdi writeback related issue. Steps to Reproduce: 1. checkout the test code: svn checkout http://code.taobao.org/svn/dirbench 2. compile the test code cd dirbench/trunk/meta_test/press/set_vs_get make 3. format Ext4 or Ext3 file system with default options 4. mount Ext4 or Ext3 with data=ordered mode journal. 5. run dirbench/trunk/meta_test/press/set_vs_get/run.sh 6, check dmesg, around 1 hour, oops can be observed. Actual results: The following oops info is observed: May 25 18:50:48 core097002 kernel: [ 4680.778040] INFO: task kjournald:172 blocked for more than 120 seconds. May 25 18:51:47 core097002 snmpd[2666]: Connection from UDP: [172.23.14.66]:33299 May 25 18:57:45 core097002 kernel: [ 4680.778181] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 25 18:57:45 core097002 kernel: [ 4680.778297] kjournald D 0000000000000000 0 172 2 0x00000000 May 25 18:57:45 core097002 kernel: [ 4680.778304] ffff880224efdd40 0000000000000046 0000000000000000 0000000000016980 May 25 18:57:45 core097002 kernel: [ 4680.778454] ffff880224e84738 ffff880224e84240 ffff880224e84180 ffff880226de8240 May 25 18:57:45 core097002 snmpd[2666]: Received SNMP packet(s) from UDP: [172.23.14.66]:33299 May 25 18:57:45 core097002 kernel: [ 4680.778603] ffff880224e84738 00000001003fda2c 00000000249100e0 0000000000000000 May 25 18:57:45 core097002 kernel: [ 4680.778753] Call Trace: May 25 18:57:45 core097002 kernel: [ 4680.778818] [<ffffffffa000a49c>] journal_commit_transaction+0x17c/0x1070 [jbd] May 25 18:57:45 core097002 kernel: [ 4680.778938] [<ffffffff81059742>] ? finish_task_switch+0x42/0xe0 May 25 18:57:46 core097002 kernel: [ 4680.779037] [<ffffffff8108d520>] ? autoremove_wake_function+0x0/0x40 May 25 18:57:46 core097002 kernel: [ 4680.779145] [<ffffffff810788b9>] ? try_to_del_timer_sync+0x49/0xe0 May 25 18:57:46 core097002 kernel: [ 4680.779255] [<ffffffffa000e0b2>] kjournald+0xe2/0x250 [jbd] May 25 18:57:46 core097002 kernel: [ 4680.779353] [<ffffffff8108d520>] ? autoremove_wake_function+0x0/0x40 May 25 18:57:47 core097002 kernel: [ 4680.779451] [<ffffffffa000dfd0>] ? kjournald+0x0/0x250 [jbd] May 25 18:57:49 core097002 kernel: [ 4680.779535] [<ffffffff8108d3a6>] kthread+0x96/0xa0 May 25 18:57:49 core097002 kernel: [ 4680.779621] [<ffffffff8101408a>] child_rip+0xa/0x20 May 25 18:57:49 core097002 kernel: [ 4680.779710] [<ffffffff8108d310>] ? kthread+0x0/0xa0 May 25 18:57:49 core097002 kernel: [ 4680.779797] [<ffffffff81014080>] ? child_rip+0x0/0x20 May 25 18:57:49 core097002 kernel: [ 4680.779881] INFO: task syslogd:2430 blocked for more than 120 seconds. May 25 18:57:49 core097002 kernel: [ 4680.779980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 25 18:57:49 core097002 kernel: [ 4680.780105] syslogd D 0000000000000000 0 2430 1 0x00000004 May 25 18:57:49 core097002 kernel: [ 4680.780110] ffff880221ae98c8 0000000000000082 0000000000000000 0000000000016980 May 25 18:57:50 core097002 kernel: [ 4680.780260] ffff88022505ec38 ffff88022505e740 ffff88022505e680 ffff880226e1e340 May 25 18:57:50 core097002 kernel: [ 4680.780408] ffff88022505ec38 00000001003fd7ce 0000000023d91340 0000000000000000 May 25 18:57:50 core097002 kernel: [ 4680.780557] Call Trace: May 25 18:57:50 core097002 kernel: [ 4680.780600] [<ffffffff814ae453>] io_schedule+0x73/0xc0 May 25 18:57:50 core097002 kernel: [ 4680.780685] [<ffffffff8118f925>] sync_buffer+0x45/0x50 May 25 18:57:50 core097002 kernel: [ 4680.780769] [<ffffffff814aeb3a>] __wait_on_bit_lock+0x5a/0xb0 May 25 18:57:50 core097002 kernel: [ 4680.780872] [<ffffffff8118f8e0>] ? sync_buffer+0x0/0x50 May 25 18:57:50 core097002 kernel: [ 4680.780960] [<ffffffff8118f8e0>] ? sync_buffer+0x0/0x50 May 25 18:57:50 core097002 kernel: [ 4680.781046] [<ffffffff814aec0d>] out_of_line_wait_on_bit_lock+0x7d/0x90 May 25 18:57:51 core097002 kernel: [ 4680.781161] [<ffffffff8108d560>] ? wake_bit_function+0x0/0x40 May 25 18:57:51 core097002 kernel: [ 4680.781252] [<ffffffff8118fb1c>] __lock_buffer+0x2c/0x30 May 25 18:57:51 core097002 kernel: [ 4680.781345] [<ffffffffa0009c2f>] do_get_write_access+0x3bf/0x480 [jbd] May 25 18:57:51 core097002 kernel: [ 4680.781449] [<ffffffff811904ce>] ? __getblk+0x2e/0x240 May 25 18:57:51 core097002 kernel: [ 4680.781535] [<ffffffffa0009e81>] journal_get_write_access+0x31/0x50 [jbd] May 25 18:57:51 core097002 kernel: [ 4680.781656] [<ffffffffa0031cd4>] __ext3_journal_get_write_access+0x34/0x70 [ext3] May 25 18:57:51 core097002 kernel: [ 4680.781784] [<ffffffffa0023ab3>] ext3_reserve_inode_write+0x83/0xb0 [ext3] May 25 18:57:51 core097002 kernel: [ 4680.781904] [<ffffffffa0023b16>] ext3_mark_inode_dirty+0x36/0x60 [ext3] May 25 18:57:51 core097002 snmpd[2666]: Connection from UDP: [172.23.14.66]:33299 May 25 18:57:51 core097002 kernel: [ 4680.782014] [<ffffffffa0023cf3>] ext3_dirty_inode+0x93/0xa0 [ext3] May 25 18:57:52 core097002 kernel: [ 4680.782151] [<ffffffff811893da>] __mark_inode_dirty+0x3a/0x170 May 25 18:57:52 core097002 kernel: [ 4680.782246] [<ffffffff8117d492>] ? mnt_clone_write+0x12/0x40 May 25 18:57:52 core097002 snmpd[2666]: Connection from UDP: [172.23.14.66]:35344 May 25 18:57:52 core097002 kernel: [ 4680.782330] [<ffffffff8117a39f>] file_update_time+0xef/0x190 May 25 18:57:52 core097002 snmpd[2666]: Received SNMP packet(s) from UDP: [172.23.14.66]:35344 May 25 18:57:52 core097002 kernel: [ 4680.782416] [<ffffffff81105f3b>] __generic_file_aio_write+0x23b/0x420 May 25 18:57:52 core097002 snmpd[2666]: Connection from UDP: [172.23.14.66]:35344 May 25 18:57:52 core097002 kernel: [ 4680.782515] [<ffffffff8110617f>] generic_file_aio_write+0x5f/0xc0 Expected results: No oops info.
Not an oops, but a hung task. kjournald is stuck... Coly, can you please retest with http://patchwork.ozlabs.org/patch/94270/ and/or http://patchwork.ozlabs.org/patch/94254/ ? Thanks, -Eric
Hi Eric, We have merged these two patches in our kernel package for days, current reported issue is reported with kernel merged these 2 patches. The current workload is not common IMHO, all I/O are set/get extended attribute caused, the application stores key-value info in ext4 inline EA eare, with journal enabled. Thanks for the reply :-) Coly
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/