Red Hat Bugzilla – Bug 51644
"Journal_write_metadata_buffer" error running cerberus on IA64
Last modified: 2007-04-18 12:35:48 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2-2 i686; en-US; rv:0.9.1)
Description of problem:
On a pe7150 w/ beta3 kernel 2.4.7-0.12.0smp, onboard scsi booting, 2GB ram,
2GB swap, bios X15.
Cerberus 9-20 running for over 24 hours. Console reports:
Assertion failure in journal_write_metadata_buffer() at journal.c: 365:
After Ctrl-C out of cerberus, the system is unresponsive. Text could be typed
at the prompt, but commands were not executed. Could not log in on another console.
Steps to Reproduce:
1. Install above configuration or equivalent.
2. Run cerberus for at least 24 hours.
3. Look for error at console.
Actual Results: Error
Expected Results: No error
This defect is considered MUST-FIX for Fairfax.
I'll send you a more recent kernel build with the current ext3, which has a few
bugs fixed. If you can still reproduce with that kernel, then I'll build you a
debugging version which produces extensive buffer tracing information when it
detects such errors. Do you have a serial console set up so that you can trap
such debug output?
Testing 2.4.7-2.9 w/ cerberus. Yes I can capture serial console output.
Reproduced in 2.4.7-2.9. I have serial console redirection set up.
has kernels with extensive ext3 debugging enabled. Could you try with this new
Looks like I need mkinitrd 3.2.2 to install the kernel rpms.
Any update on this? I've put the new mkinitrd into the same url directory as
the kernel images, in case you missed Arjan's email about where to get it.
Problem reproduced with diagnostic kernel, but the only thing in the serial
console is the error at the console.
There _should_ have been a [long] debugging history trace of the buffer
concerned printed before the assert failure log message, if you're using a
kernel dated later than August 28th. Is there nothing at all of that type in
the console output?
I was using kernel 2.4.7-6smp downloaded from
There was no trace in the serial console log.
Hmm. I've had a look through the code and I can't see any way that the debug
code in question would be inactive on a modular ext3 (as we configure it), nor
any way in which it would be omitted from a display of that particular oops. Is
the total log for that boot (from initial kernel version to final oops) short
enough for you to mail to me so that I can try to pick out why it's being missed?
Created attachment 31920
Serial log of bootup and error
The only thing I can think of here is that the debug buffer tracing is being
done at a log level which is not going to the console. The tracing is done at
log level KERN_WARNING (level 4) by default. Could you try setting the console
loglevel higher than that and see if you can capture the trace? Or see if
/var/log/messages has captured the info (hidden console traffic may still show
up in the /var/log files if the root filesystem is still working)?
To raise the log level, try
dmesg -n 7
or edit the LOGLEVEL in /etc/sysconfig/init. You can check the current loglevel
in /proc/sys/kernel/printk: the first value in that pseudo-file is the current
filter log level used to determine which messages get sent to console and which
just get logged to /var/log/.
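As a rough illustration of the filter described above, here is a minimal Python sketch, not part of the original exchange. It assumes the standard layout of /proc/sys/kernel/printk (four integers, the first being the console loglevel) and the usual kernel rule that a message reaches the console only when its level is numerically lower than that value. The helper names parse_printk and reaches_console are hypothetical.

```python
import os

PRINTK_PATH = "/proc/sys/kernel/printk"
KERN_WARNING = 4  # level the buffer trace is said to use by default


def parse_printk(text):
    """Return the integers found in the contents of /proc/sys/kernel/printk."""
    return tuple(int(field) for field in text.split())


def reaches_console(message_level, console_loglevel):
    """True if a message at message_level passes the console filter.

    A message is shown on the console only when its level is numerically
    lower (more severe) than the console loglevel.
    """
    return message_level < console_loglevel


if __name__ == "__main__":
    if os.path.exists(PRINTK_PATH):
        with open(PRINTK_PATH) as f:
            console_loglevel = parse_printk(f.read())[0]
        print("console loglevel:", console_loglevel)
        print("KERN_WARNING shown on console:",
              reaches_console(KERN_WARNING, console_loglevel))
```

With a console loglevel of 4 or lower, a KERN_WARNING trace would be logged but not shown on the console; after dmesg -n 7 it should pass the filter.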
If the trace has indeed been captured in /var/log/messages successfully then we
won't need to worry about the serial console, but filesystem crashes may
obviously prevent the /var/log spools from operating correctly.
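To check a saved log for the trace, something like the following sketch could be used. It is only an illustration: it assumes the log is a plain text file (for example a copy of /var/log/messages or the captured serial output), it uses the assertion text reported at the console as the marker, and the function name context_before_assert and the default of 200 lines are arbitrary choices.

```python
import sys
from collections import deque

# Marker taken from the console message in this report.
ASSERT_MARKER = "Assertion failure in journal_write_metadata_buffer()"


def context_before_assert(lines, marker=ASSERT_MARKER, keep=200):
    """Return up to `keep` lines preceding the first line containing marker.

    Returns None if the marker never appears.
    """
    recent = deque(maxlen=keep)  # sliding window of the most recent lines
    for line in lines:
        if marker in line:
            return list(recent)
        recent.append(line.rstrip("\n"))
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py <path-to-log-file>
    with open(sys.argv[1], errors="replace") as log:
        context = context_before_assert(log)
    if context is None:
        print("assertion message not found")
    else:
        print("\n".join(context))
```

If the lines printed before the assertion contain no buffer history, the trace was not captured in that log either.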
Changed log level to 7 with dmesg -n 7 and verified in /proc/sys/kernel/printk,
but nothing further in serial console.
You mean you reproduced the fault after setting dmesg? Was there anything in
/var/log/messages after the bug report, or was that not accessible?
I've double-checked, and the debugging code is most definitely present in the
kernel image I sent you, so if we really can't get it to trigger then I'll try
doing fault-injection to trigger the assertion manually on a test box to make
sure that the debug code is behaving as expected.
Did not occur with kernel 2.4.9-0.18smp and 2GB ram after 24 hours of newburn.
Has it ever survived that long before?
It had been occurring within a few hours, usually within an hour, so this looks
promising. I am letting it run, and will continue to monitor it.