Bug 735768 - kernel BUG at fs/jbd2/commit.c:353 or fs/jbd/commit.c:319 hitting J_ASSERT(journal->j_running_transaction != NULL) in journal_commit_transaction
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64, OS: Linux
Priority: unspecified, Severity: unspecified
: rc
: ---
Assigned To: Dave Wysochanski
Eryu Guan
: ZStream
Depends On:
Blocks: 1066323 1066325
Reported: 2011-09-05 06:22 EDT by Stefan Sakalik
Modified: 2014-08-29 00:19 EDT
CC: 15 users

See Also:
Fixed In Version: kernel-2.6.32-304.el6
Doc Type: Bug Fix
Doc Text:
A bug in the journaling block device (jbd and jbd2) code could, under certain circumstances, trigger a BUG_ON() assertion and result in a kernel oops. This happened when an application performed an extensive number of commits to the journal of the ext3 file system and there was no currently active transaction while synchronizing the file's in-core state. This problem has been resolved by correcting respective test conditions in the jbd and jbd2 code.
Story Points: ---
Clone Of:
: 980268
Environment:
Last Closed: 2013-02-21 00:54:27 EST


Attachments
3.0.1 lsmod, lspci and kernel .config (178.38 KB, application/octet-stream)
2011-09-05 06:22 EDT, Stefan Sakalik


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 184473 None None None 2012-08-10 10:33:15 EDT

Description Stefan Sakalik 2011-09-05 06:22:00 EDT
Created attachment 521471
3.0.1 lsmod, lspci and kernel .config

Description of problem:

We have users' directories on an external IBM DS4000 disk array, mounted as ext4 on /home (2.3 TB of 2.7 TB used). There is
some concurrent activity: users read mail with mutt and over IMAP. The server also runs Postfix, some spam filters, and NFS
and Samba servers.

After some time (1-3 days, depending on load), the kernel panics with the following message:

kernel:------------[ cut here ]------------

Message from syslogd@anxur at Aug 29 09:24:14 ...
 kernel:invalid opcode: 0000 [#1] SMP

Message from syslogd@anxur at Aug 29 09:24:14 ...
 kernel:last sysfs file: /sys/module/lockd/initstate

Message from syslogd@anxur at Aug 29 09:24:14 ...
 kernel:Stack:

Message from syslogd@anxur at Aug 29 09:24:14 ...
 kernel:Call Trace:

Message from syslogd@anxur at Aug 29 09:24:14 ...
 kernel:Code: 89 ef e8 da 53 00 00 e9 79 f8 ff ff 4c 89 ef e8 cd 53 00 00 e9 1b f2 ff ff be 01 00 00 00 4c 89 ef e8 0b 52 00 00 e9 4e ee ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00

Message from syslogd@anxur at Aug 29 09:24:14 ... 
 kernel:Kernel panic - not syncing: Fatal exception


------------[ cut here ]------------
kernel BUG at fs/jbd2/commit.c:353! 
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/lockd/initstate
CPU 0
Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 bnx2 ipmi_si ipmi_msghandler hpilo hpwdt sg serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt 
sr_mod cdrom ata_generic pata_acpi pata_amd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 bnx2 ipmi_si ipmi_msghandler hpilo hpwdt sg serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt 
sr_mod cdrom ata_generic pata_acpi pata_amd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
Pid: 1881, comm: jbd2/dm-8-8 Tainted: G           ---------------- T 2.6.32-131.12.1.el6.x86_64 #1 ProLiant DL585 G6
RIP: 0010:[<ffffffffa022a8aa>]  [<ffffffffa022a8aa>] jbd2_journal_commit_transaction+0x11fa/0x1490 [jbd2]
RSP: 0000:ffff88085bafdd20  EFLAGS: 00010246
RAX: 0000000000000008 RBX: ffff88105ce63000 RCX: 0000000000003c99
RDX: ffff88105a96d000 RSI: 0000000000000286 RDI: ffff88105ce63000
RBP: ffff88085bafde60 R08: ffffffff8160d000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88105ce63000 R14: ffff88085baf0080 R15: ffff88105ce63098
FS:  00007f576e0a7700(0000) GS:ffff880028200000(0000) knlGS:00000000f77ae700
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f9313492000 CR3: 00000015022b7000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process jbd2/dm-8-8 (pid: 1881, threadinfo ffff88085bafc000, task ffff88085baf0080)
Stack:
 0000000000000000 0000000000000000 ffff8801bb3470b8 ffff880052e12b10
<0> ffff88101b2303c0 ffff88105ce6339c 0000d2d803a3fbfd ffff88101b230420
<0> ffff88105ce633b8 ffff88107fca6000 0000000800000000 ffff88101b230420
Call Trace:
 [<ffffffff8107966c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa022f948>] kjournald2+0xb8/0x220 [jbd2]
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa022f890>] ? kjournald2+0x0/0x220 [jbd2]
 [<ffffffff8108de16>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd80>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 89 ef e8 da 53 00 00 e9 79 f8 ff ff 4c 89 ef e8 cd 53 00 00 e9 1b f2 ff ff be 01 00 00 00 4c 89 ef e8 0b 52 00 00 e9 4e ee ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b 0f 1f 84 00 00 00 00
RIP  [<ffffffffa022a8aa>] jbd2_journal_commit_transaction+0x11fa/0x1490 [jbd2]
 RSP <ffff88085bafdd20>
---[ end trace 2ed71d84eb70b07c ]---
Kernel panic - not syncing: Fatal exception
Pid: 1881, comm: jbd2/dm-8-8 Tainted: G      D    ---------------- T 2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da648>] ? panic+0x78/0x143
 [<ffffffff814de694>] ? oops_end+0xe4/0x100
 [<ffffffff8100f2eb>] ? die+0x5b/0x90
 [<ffffffff814ddf64>] ? do_trap+0xc4/0x160
 [<ffffffff8100ceb5>] ? do_invalid_op+0x95/0xb0
 [<ffffffffa022a8aa>] ? jbd2_journal_commit_transaction+0x11fa/0x1490 [jbd2]
 [<ffffffff8100bf5b>] ? invalid_op+0x1b/0x20
 [<ffffffffa022a8aa>] ? jbd2_journal_commit_transaction+0x11fa/0x1490 [jbd2]
 [<ffffffff8107966c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa022f948>] ? kjournald2+0xb8/0x220 [jbd2]
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa022f890>] ? kjournald2+0x0/0x220 [jbd2]
 [<ffffffff8108de16>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd80>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
panic occurred, switching back to text console
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule+0x5c/0x60() (Tainted: G      D    ---------------- T)
Hardware name: ProLiant DL585 G6
Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 bnx2 ipmi_si ipmi_msghandler hpilo hpwdt sg serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt
sr_mod cdrom ata_generic pata_acpi pata_amd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
Pid: 55, comm: migration/13 Tainted: G      D    ---------------- T 2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
 [<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106714a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff810294fc>] ? native_smp_send_reschedule+0x5c/0x60
 [<ffffffff8104b828>] ? resched_task+0x68/0x80
 [<ffffffff8104b855>] ? check_preempt_curr_idle+0x15/0x20
 [<ffffffff8105cc78>] ? pull_task+0x58/0x80
 [<ffffffff8105cd57>] ? move_one_task_fair+0xb7/0x110
 [<ffffffff8105f98d>] ? migration_thread+0x21d/0x2e0
 [<ffffffff8105f770>] ? migration_thread+0x0/0x2e0
 [<ffffffff8108de16>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd80>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
---[ end trace 2ed71d84eb70b07d ]---

We are currently running the vanilla kernel (lsmod output and kernel config are attached) and are not experiencing any problems.
I manually removed the fscache module from 3.0.1, but I don't know whether fscache is related to the issue (I removed it because
I had problems with fscache on FC13 systems).

This partition was migrated from ext3 to ext4 because we had similar issues when running RHEL and vanilla 3.0.1 kernels on ext3:
------------[ cut here ]------------
kernel BUG at fs/jbd/commit.c:319!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/lockd/initstate
CPU 4
Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs
autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 ext3 jbd bnx2
netxen_nic ipmi_si ipmi_msghandler hpwdt hpilo sg serio_raw k10temp
amd64_edac_mod edac_core edac_mce_amd shpchp ext4 mbcache jbd2 sd_mod  
crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt sr_mod cdrom ata_generic
pata_acpi pata_amd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core
dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs
autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 ext3 jbd bnx2
netxen_nic ipmi_si ipmi_msghandler hpwdt hpilo sg serio_raw k10temp
amd64_edac_mod edac_core edac_mce_amd shpchp ext4 mbcache jbd2 sd_mod
crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt sr_mod cdrom ata_generic
pata_acpi pata_amd radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core
dm_mod [last unloaded: scsi_wait_scan]
Pid: 1829, comm: kjournald Tainted: G           ---------------- T
2.6.32-131.6.1.el6.x86_64 #1 ProLiant DL585 G6
RIP: 0010:[<ffffffffa03423d7>]  [<ffffffffa03423d7>]
journal_commit_transaction+0xde7/0x1140 [jbd]
RSP: 0018:ffff88085a82dd50  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88205ac84400 RCX: 0000000000004cd2
RDX: ffff88205ca3f000 RSI: 0000000000000286 RDI: ffff88205ac84400
RBP: ffff88085a82de60 R08: ffffffff8160c800 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88205ac84400 R14: ffff88085a2a0b00 R15: ffff88205ac84498
FS:  00007f6bc0a8b700(0000) GS:ffff880028240000(0000) knlGS:00000000f77d96c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fff514b16d8 CR3: 000000185b31b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kjournald (pid: 1829, threadinfo ffff88085a82c000, task
ffff88085a2a0b00)
Stack:
 0000000000000000 ffff880028253b40 ffff880000000000 ffff88205ac84400
<0> 000045d78d0d008f ffff88205ac84568 ffff88205ca3f000 0000000000000000
<0> ffff880800000fdc 000004c15b35e980 0000000000000000 ffff88205ac84424
Call Trace:
 [<ffffffff8107966c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa03450f8>] kjournald+0xe8/0x250 [jbd]
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0345010>] ? kjournald+0x0/0x250 [jbd]
 [<ffffffff8108dd96>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 02 bf 4c 89 f7 e8 6a 45 00 00 f0 41 ff 4e 60 e9 ae f6 ff ff be fb ff ff
ff 4c 89 ef e8 43 31 00 00 4d 8b 74 24 28 e9 5d f7 ff ff <0f> 0b eb fe be c1
1c 00 00 4c 89 f7 e8 a8 28 e6 e0 e9 5f ff ff
RIP  [<ffffffffa03423d7>] journal_commit_transaction+0xde7/0x1140 [jbd]
 RSP <ffff88085a82dd50>
---[ end trace 84b75b55b6cd8510 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1829, comm: kjournald Tainted: G      D    ---------------- T
2.6.32-131.6.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da518>] ? panic+0x78/0x143
 [<ffffffff814de564>] ? oops_end+0xe4/0x100
 [<ffffffff8100f2eb>] ? die+0x5b/0x90
 [<ffffffff814dde34>] ? do_trap+0xc4/0x160
 [<ffffffff8100ceb5>] ? do_invalid_op+0x95/0xb0
 [<ffffffffa03423d7>] ? journal_commit_transaction+0xde7/0x1140 [jbd]
 [<ffffffff8100bf5b>] ? invalid_op+0x1b/0x20
 [<ffffffffa03423d7>] ? journal_commit_transaction+0xde7/0x1140 [jbd]
 [<ffffffff8107966c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa03450f8>] ? kjournald+0xe8/0x250 [jbd]
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0345010>] ? kjournald+0x0/0x250 [jbd]
 [<ffffffff8108dd96>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
panic occurred, switching back to text console


Migration to ext4 seems to have fixed the issue for the vanilla kernel but not for the RHEL kernel. Everything worked without issues
on RHEL5 (the problem started right after the upgrade to RHEL6). Since this is our production server, I haven't experimented
with it much, and I don't know how to reproduce the issue.
Comment 2 Ric Wheeler 2011-09-05 11:36:57 EDT
When you say "migration to ext4", are you creating a new ext4 file system or mounting an old ext3 file system with the ext4 code?

Can you provide precise version numbers for the kernels you tested both in upstream and in RHEL6/RHEL5?

Thanks!
Comment 3 Stefan Sakalik 2011-09-05 12:08:37 EDT
(In reply to comment #2)
> When you say "migration to ext4", are you creating a new ext4 file system or
> mounting an old ext3 file system with the ext4 code?
I just mounted ext3 as ext4.

> 
> Can you provide precise version numbers for the kernels you tested both in
> upstream and in RHEL6/RHEL5?
upstream: 3.0.1 stable from kernel.org
RHEL6: Red Hat Enterprise Linux Server (2.6.32-131.12.1.el6.x86_64)

I'm not sure about this but I think all RHEL5 kernels are OK.
RHEL5: Red Hat Enterprise Linux Server (2.6.18-194.26.1.el5)

> 
> Thanks!
Comment 4 Ric Wheeler 2011-09-05 12:42:52 EDT
Just to be clear, we do not support/test migration in place from ext3 to ext4. 

To get the benefits of ext4, you need to create a fresh file system so that the data will be laid out properly.

Still worth understanding the issue though, thanks for the report!
Comment 5 Stefan Sakalik 2011-09-05 13:27:46 EDT
The kernel panicked when this filesystem was mounted as ext3, both on the newest RHEL6 kernel and on the
vanilla 3.0.1 kernel. We needed this server to be fully functional, so we tried mounting the
partition as ext4 to use jbd2 instead of jbd. So I guess the original bug is an upstream bug:
------------[ cut here ]------------
kernel BUG at fs/jbd/commit.c:319

I should note that I put the fscache module in /etc/modprobe.d/blacklist.conf. I discovered only
later that this doesn't work, since the nfs module depends on fscache (you need to recompile the kernel
to get working NFS without fscache). I'm not sure whether fscache was enabled when running 3.0.1 with ext3,
so it might still be an fscache issue.

After the migration, vanilla (without fscache) works fine, but RHEL6 (with fscache) gives this message:
------------[ cut here ]------------
kernel BUG at fs/jbd2/commit.c:353! 
...

I think those two are related. I should try the newest vanilla kernel with fscache and report the bug
upstream if the issue recurs, but I'm quite reluctant to experiment on this particular server :(.
Comment 6 RHEL Product and Program Management 2011-10-07 11:47:24 EDT
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 7 Dave Wysochanski 2012-08-10 09:28:12 EDT
The first oops in the description:
kernel BUG at fs/jbd2/commit.c:353! 
...
Pid: 1881, comm: jbd2/dm-8-8 Tainted: G           ---------------- T 2.6.32-131.12.1.el6.x86_64 #1 ProLiant DL585 G6

corresponds to the following code:
fs/jbd2/commit.c
310 void jbd2_journal_commit_transaction(journal_t *journal)
...
344 
345 	/* Do we need to erase the effects of a prior jbd2_journal_flush? */
346 	if (journal->j_flags & JBD2_FLUSHED) {
347 		jbd_debug(3, "super block updated\n");
348 		jbd2_journal_update_superblock(journal, 1);
349 	} else {
350 		jbd_debug(3, "superblock not updated\n");
351 	}
352 
353-->	J_ASSERT(journal->j_running_transaction != NULL);
354 	J_ASSERT(journal->j_committing_transaction == NULL);


This last oops in the description:
------------[ cut here ]------------
kernel BUG at fs/jbd/commit.c:319!
...
Pid: 1829, comm: kjournald Tainted: G           ---------------- T
2.6.32-131.6.1.el6.x86_64 #1 ProLiant DL585 G6
RIP: 0010:[<ffffffffa03423d7>]  [<ffffffffa03423d7>]

corresponds to this code:
fs/jbd/commit.c
280 void journal_commit_transaction(journal_t *journal)
281 {
...
311 	/* Do we need to erase the effects of a prior journal_flush? */
312 	if (journal->j_flags & JFS_FLUSHED) {
313 		jbd_debug(3, "super block updated\n");
314 		journal_update_superblock(journal, 1);
315 	} else {
316 		jbd_debug(3, "superblock not updated\n");
317 	}
318 
319-->	J_ASSERT(journal->j_running_transaction != NULL);
320 	J_ASSERT(journal->j_committing_transaction == NULL);
321 

I'm going to rename this bug based on the above information.
Comment 20 Dave Wysochanski 2012-08-14 09:33:43 EDT
Patches have been identified that are believed to fix this problem, and a test kernel has been built.  If you are still seeing it, let us know and we can provide the test kernel.
Comment 25 RHEL Product and Program Management 2012-08-24 09:40:35 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 26 Stefan Sakalik 2012-08-24 09:50:22 EDT
(In reply to comment #20)
> Patches have been identified that are believed to fix this problem, and a
> test kernel has been built.  If you are still seeing it, let us know and we
> can provide the test kernel.

Unfortunately we are not able to test this new kernel. We have already replaced our disk array and recreated the filesystem.
Comment 30 Jarod Wilson 2012-08-31 14:44:39 EDT
Patch(es) available on kernel-2.6.32-304.el6
Comment 33 Bjoern Engels 2012-11-28 02:26:29 EST
I'm running into the same problem on a production server running a MySQL master node with high disk I/O. The server started panicking a few hours ago and
is now crashing again and again within a couple of minutes after mysqld is started. Can you please provide kernel-2.6.32-304.el6 so we can test whether it fixes the issue?
Comment 34 Bjoern Engels 2012-11-29 04:34:47 EST
Update: I tested against 3.7.0-rc1; the server still crashes with that kernel version.
Comment 37 errata-xmlrpc 2013-02-21 00:54:27 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html
