Red Hat Bugzilla – Bug 247205
System hung with Ext-fs error (device dm-5) in start_transaction: Journal has aborted
Last modified: 2008-12-04 16:57:58 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:220.127.116.11) Gecko/20070515 Firefox/18.104.22.168
Description of problem:
I have no idea what caused this, nor could I run any forensics before rebooting.
A database server, which was quietly sitting there, serving databases, hung.
It was still pingable, and my existing ssh connections didn't die, but any attempted command failed to run.
When I got to the console, the error in the subject was scrolling as fast as it could.
A power-cycle was necessary to restart the machine, which booted fine.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
This is a production system and downtime is bad. I hope to never see this failure again. As no one was doing anything on the machine at the time, reproducibility is not likely.
We have a fair number of RHEL4u4 systems and this is the 1st time we've seen this.
This is only our 2nd Dell PowerEdge 2950. The other one is running u5 (which, oddly, wasn't a valid choice in the version picker earlier), and this one will be soon. But as Google shows this bug to have been around for at least 4 years, I'm not wildly hopeful that it's already been fixed.
After reboot, was there anything in the system logs prior to the "journal has aborted" message? Or was it the root filesystem which had the problem, and so I suppose no further messages were logged to disk...?
This has been in needinfo for over a year; closing.