Bug 719112 - [RHEL5.7][Kernel] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal Remounting filesystem read-only
Summary: [RHEL5.7][Kernel] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detecte...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-07-05 19:49 UTC by Jeff Burke
Modified: 2018-11-30 20:33 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-23 16:08:27 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Jeff Burke 2011-07-05 19:49:57 UTC
Description of problem:
 While running kernel testing an AMD host was selected. This system gets into a bad state while running several of the kernel tier tests.

Version-Release number of selected component (if applicable):
 2.6.18-273.el5 x86_64

How reproducible:
 Often with x86_64
  
Actual results:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:00:55:31:3b/00:00:16:00:00/40 tag 0 ncq 4096 in 
         res 41/40:08:56:31:3b/61:00:16:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:00:55:31:3b/00:00:16:00:00/40 tag 0 ncq 4096 in 
         res 41/40:08:56:31:3b/61:00:16:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x18 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:18:55:31:3b/00:00:16:00:00/40 tag 3 ncq 4096 in 
         res 41/40:08:56:31:3b/61:00:16:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:08:55:31:3b/00:00:16:00:00/40 tag 1 ncq 4096 in 
         res 41/40:08:56:31:3b/61:00:16:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0xe SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:08:55:31:3b/00:00:16:00:00/40 tag 1 ncq 4096 in 
         res 41/40:08:56:31:3b/61:00:16:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:00:55:31:cb/00:00:15:00:00/40 tag 0 ncq 4096 in 
         res 41/40:08:57:31:cb/61:00:15:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:00:55:31:cb/00:00:15:00:00/40 tag 0 ncq 4096 in 
         res 41/40:08:57:31:cb/61:00:15:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:10:55:31:cb/00:00:15:00:00/40 tag 2 ncq 4096 in 
         res 41/40:08:57:31:cb/61:00:15:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:e0:55:31:cb/00:00:15:00:00/40 tag 28 ncq 4096 in 
         res 41/40:08:57:31:cb/00:00:15:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:10:55:31:cb/00:00:15:00:00/40 tag 2 ncq 4096 in 
         res 41/40:08:57:31:cb/00:00:15:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 
ata1.00: irq_stat 0x40000008 
ata1.00: cmd 60/08:00:55:31:cb/00:00:15:00:00/40 tag 0 ncq 4096 in 
         res 41/40:08:57:31:cb/61:00:15:00:00/40 Emask 0x409 (media error) <F> 
ata1.00: status: { DRDY ERR } 
ata1.00: error: { UNC } 
 
end_request: I/O error, dev sda, sector 365637975 
[-- MARK -- Tue Jul  5 13:00:00 2011] 
EXT3-fs error (device dm-0): read_inode_bitmap: Cannot read inode bitmap - block_group = 1394, inode_bitmap = 45678593<5>SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) 
sda: Write Protect is off 
SCSI device sda: drive cache: write back 
 
Aborting journal on device dm-0. 
EXT3-fs error (device dm-0) in ext3_new_inode: IO failure 
EXT3-fs error (device dm-0) in ext3_create: IO failure 
ext3_abort called. 
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal 
Remounting filesystem read-only 
__journal_remove_journal_head: freeing b_committed_data 
__journal_remove_journal_head: freeing b_committed_data

Comment 2 Eric Sandeen 2011-07-05 21:23:03 UTC
nb: This is a storage error, not a filesystem error.

The storage blew up:

> ata1.00: cmd 60/08:00:55:31:cb/00:00:15:00:00/40 tag 0 ncq 4096 in 
         res 41/40:08:57:31:cb/61:00:15:00:00/40 Emask 0x409 (media error) <F> 
> ata1.00: status: { DRDY ERR } 
> ata1.00: error: { UNC } 

IO failed:

> end_request: I/O error, dev sda, sector 365637975 

ext3 noticed:

> EXT3-fs error (device dm-0): read_inode_bitmap: Cannot read inode bitmap -

and shut down as it should have:

> Aborting journal on device dm-0. 

-Eric

Comment 3 Eric Sandeen 2011-07-05 21:27:39 UTC
May just be a hardware problem with that host.

Comment 4 Bryn M. Reeves 2011-07-05 21:58:42 UTC
Maybe re-run the i386 one and see if blows up in the same place this time? The x86_64 runs seem to have been later..

Comment 5 Eric Sandeen 2011-07-05 22:03:06 UTC
I'll log in and look at SMART data as a start, ok?

Comment 6 Jeff Burke 2011-07-06 18:43:32 UTC
(In reply to comment #4)
> Maybe re-run the i386 one and see if blows up in the same place this time? The
> x86_64 runs seem to have been later..
A second i386 Beaker job was successful:
 https://beaker.engineering.redhat.com/jobs/107543

Thanks,
Jeff

Comment 7 Eric Sandeen 2011-07-06 18:49:45 UTC
If I can get a few minutes on the machine, I'd run smartctl to query the drive and see if it is reporting media failures...

Comment 8 Jeff Burke 2011-07-06 19:05:00 UTC
(In reply to comment #7)
> If I can get a few minutes on the machine, I'd run smartctl to query the drive
> and see if it is reporting media failures...

Eric,
 I just reinstalled the system with RHEL5.7-Server-20110630.0 x86_64. I was setting it up for you to grab the data you need. But once the installation finished. On first boot it put the system back into "read only" mode

This system just finished running a complete kerneltier1 test suite with i386. Once I reinstalled it to be x86_64 I hit the issue again.


----<snip>----
ata1.00: irq_stat 0x40000008
ata1.00: cmd 60/08:18:5d:31:07/00:00:16:00:00/40 tag 3 ncq 4096 in
         res 41/40:08:61:31:07/c3:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: cmd 60/08:d8:5d:31:07/00:00:16:00:00/40 tag 27 ncq 4096 in
         res 41/40:08:61:31:07/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
sda: Write Protect is off
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0xb SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: cmd 60/08:18:5d:31:07/00:00:16:00:00/40 tag 3 ncq 4096 in
         res 41/40:08:61:31:07/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: cmd 60/08:00:5d:31:07/00:00:16:00:00/40 tag 0 ncq 4096 in
         res 41/40:08:61:31:07/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0x2c001 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: cmd 60/08:88:5d:31:07/00:00:16:00:00/40 tag 17 ncq 4096 in
         res 41/40:08:61:31:07/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: cmd 60/08:00:5d:31:07/00:00:16:00:00/40 tag 0 ncq 4096 in
         res 41/40:08:61:31:07/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }

end_request: I/O error, dev sda, sector 369570145
EXT3-fs error (device dm-0): ext3_get_inode_loc: <6>ata1: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
unable to read inode block - inode=46170113, block=46170114
Aborting journal on device dm-0.
journal commit I/O error
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
Traceback (most recent call last):
  File "/usr/lib64/python2.4/logging/__init__.py", line 740, in emit
    self.flush()
  File "/usr/lib64/python2.4/logging/__init__.py", line 718, in flush
    self.stream.flush()
IOError: [Errno 30] Read-only file system
[  OK  ]
touch: cannot touch `/var/lock/subsys/haldaemon': Read-only file system
touch: cannot touch `/var/lock/subsys/hidd': Read-only file system

touch: cannot touch `/var/lock/subsys/lvm2-monitor': Read-only file system
Starting autofs:  Loading autofs4: [  OK  ]
Starting automount: automount: test mount forbidden or incorrect kernel protocol version, kernel protocol version 5.00 or above required.
[FAILED]
[FAILED]
Starting hpiod: can't open or create /var/run/hpiod.pid: Read-only file system io/hpiod/hpiod.cpp 195
[FAILED]
Starting hpssd: can't open pidfile /var/run/hpssd.pid: (30) Read-only file system
[FAILED]
Generating SSH1 RSA host key: [FAILED]
Starting cups: [  OK  ]
touch: cannot touch `/var/lock/subsys/cups': Read-only file system
Starting xinetd: [  OK  ]
touch: cannot touch `/var/lock/subsys/xinetd': Read-only file system
ntpd: Synchronizing with time server: [  OK  ]
Starting ntpd: [  OK  ]
touch: cannot touch `/var/lock/subsys/ntpd': Read-only file system
Starting sendmail: [  OK  ]
touch: cannot touch `/var/lock/subsys/sendmail': Read-only file system
Starting sm-client: touch: cannot touch `/var/run/sm-client.pid': Read-only file system
chown: cannot access `/var/run/sm-client.pid': No such file or directory
lstat(/var/run/sm-client.pid) failed: No such file or directory
[  OK  ]
touch: cannot touch `/var/lock/subsys/sm-client': Read-only file system
Starting console mouse services: O0o.oops(): [startup.c(126)]: audit(1309978830.275:7): audit_pid=0 old=3012 by auid=4294967295 subj=system_u:system_r:auditd_t:s0
[  OK  ]
touch: cannot touch `/var/lock/subsys/gpm': Read-only file system
Starting crond: crond: can't open or create /var/run/crond.pid: Read-only file system
[FAILED]
Starting xfs: mkdir: cannot create directory `/tmp/.font-unix': Read-only file system
lstat(/tmp/.font-unix) failed: No such file or directory
touch: cannot touch `/var/lock/subsys/xfs': Read-only file system

Starting anacron: [  OK  ]
touch: cannot touch `/var/lock/subsys/atd': Read-only file system

Starting background readahead: [  OK  ]
Starting rhsmcertd 240[FAILED]
Starting yum-updatesd: [  OK  ]
touch: cannot touch `/var/lock/subsys/yum-updatesd': Read-only file system
Starting Avahi daemon... [FAILED]
touch: cannot touch `/var/lock/subsys/anamon': Read-only file system

failed to make symlink /etc/rc2.d/K95anamon: File exists
failed to make symlink /etc/rc3.d/K95anamon: Read-only file system
failed to make symlink /etc/rc4.d/K95anamon: File exists
failed to make symlink /etc/rc5.d/K95anamon: Read-only file system
touch: cannot touch `/var/lock/subsys/local': Read-only file system
Starting smartd: type=1400 audit(1309978833.585:8): avc:  denied  { read } for  pid=3672 comm="sh" name="tmp" dev=dm-0 ino=47809723 scontext=system_u:system_r:fsdaemon_t:s0 tcontext=syste
type=1300 audit(1309978833.585:8): arch=c000003e syscall=4 success=no exit=-13 a0=49d0ff a1=7fff8e202a70 a2=7fff8e202a70 a3=0 items=0 ppid=3671 pid=3672 auid=4294967295 uid=0 gid=0 euid=)
type=1400 audit(1309978833.641:9): avc:  denied  { read } for  pid=3674 comm="sh" name="tmp" dev=dm-0 ino=47809723 scontext=system_u:system_r:fsdaemon_t:s0 tcontext=system_u:object_r:usre
type=1300 audit(1309978833.641:9): arch=c000003e syscall=4 success=no exit=-13 a0=49d0ff a1=7fff62735840 a2=7fff62735840 a3=0 items=0 ppid=3673 pid=3674 auid=4294967295 uid=0 gid=0 euid=)
[  OK  ]
touch: cannot touch `/var/lock/subsys/smartd': Read-only file system

Red Hat Enterprise Linux Server release 5.7 Beta (Tikanga)
Kernel 2.6.18-269.el5 on an x86_64

amd-snook-01.lab.bos.redhat.com login: 

----<\snip>----

Comment 9 Eric Sandeen 2011-07-06 20:12:28 UTC
Disk hardware errors:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       20%      9646         348344579

Comment 10 Majoosh 2013-07-23 00:16:25 UTC
Hi,

We are seeing exact same issue.

===================================
EXT3-fs error (device sdd): htree_dirblock_to_tree: bad entry in directory #5833454: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Aborting journal on device sdd.
__journal_remove_journal_head: freeing b_committed_data
ext3_abort called.
EXT3-fs error (device sdd): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device sdd): htree_dirblock_to_tree: bad entry in directory #5833458: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
EXT3-fs error (device sdd): htree_dirblock_to_tree: bad entry in directory #5833461: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0

========================================

Is there any work around other than upgrading the kernel/ files system to ext4 ?


barrier=1 option in fstab seems to help as per our test.  IS there any other recommendations form Redhat?


Regards
Majoosh

Comment 11 Eric Sandeen 2013-07-23 13:58:14 UTC
I believe your problem is di(In reply to Majoosh from comment #10)
> Hi,
> 
> We are seeing exact same issue.

Perhaps, but by a different vector; powering off a virtual host, and guests not using barriers, if I recall correctly.  Please engage your support contacts to help you troubleshoot that issue.

Comment 12 Eric Sandeen 2013-07-23 14:00:57 UTC
Jeff, I think this one should be closed, it sure look like a drive error, no?

> ata1.00: cmd 60/08:18:5d:31:07/00:00:16:00:00/40 tag 3 ncq 4096 in
>          res 41/40:08:61:31:07/00:00:16:00:00/40 Emask 0x409 (media error) <F>
> ata1.00: status: { DRDY ERR }
> ata1.00: error: { UNC }

Comment 14 Eric Sandeen 2013-07-23 16:08:27 UTC
Closing NOTABUG, this looks like a hardware error causing the shutdown.


Note You need to log in before you can comment on or make changes to this bug.