Description of problem:
Hidehiro Kawai discovered some problems with jbd's error handling that can, in rare failure situations, lead to file system corruption during journal recovery:

http://lkml.org/lkml/2008/4/18/154

Although upstream is planning to move past the current jbd implementation in a way that may make these changes irrelevant, users of the current code are still vulnerable to these problems. Environments where a very large number of disks are in use increase the probability that one of these problems will occur.

Version-Release number of selected component (if applicable):
2.6.18-*

How reproducible:
Very difficult; may require hardware with fault injection capabilities. Problems discovered via code inspection.

Steps to Reproduce:
1. n/a

Additional info:
[PATCH 1/4] jbd: strictly check for write errors on data buffers
[PATCH 2/4] jbd: ordered data integrity fix
[PATCH 3/4] jbd: abort when failed to log metadata buffers
[PATCH 4/4] jbd/ext3: fix error handling for checkpoint io
Updating PM score.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Josef,

What is the status of this BZ? Will it be going into 5.4? It is unclear from the comments, and I have a customer who needs this. I attached the IT to this.

Debbie

Errors they are seeing...

Mar 4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 24588 on sdb3
Mar 4 15:08:47 jb-601 kernel: JBD: bad block at offset 24588
Mar 4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 24588 on sdb3
Mar 4 15:08:47 jb-601 kernel: JBD: bad block at offset 24588
Mar 4 15:08:47 jb-601 kernel: JBD: Failed to read block at offset 24585
Mar 4 15:08:47 jb-601 kernel: JBD: recovery failed
Mar 4 15:08:47 jb-601 kernel: EXT3-fs: error loading journal.
Mar 4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 21516 on sdb4
Mar 4 15:08:47 jb-601 kernel: JBD: bad block at offset 21516
Mar 4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 21685 on sdb4
Mar 4 15:08:47 jb-601 kernel: JBD: bad block at offset 21685
Mar 4 15:08:47 jb-601 kernel: JBD: recovery failed
Mar 4 15:08:47 jb-601 kernel: EXT3-fs: error loading journal.
I'm pretty sure Hitachi has already posted these, but it looks like they will not fix the problem your customer is having. These patches simply make sure we always abort the transaction when we are supposed to; your customer appears to be suffering from data corruption.
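
To illustrate what "always abort the transaction when we are supposed to" means, here is a minimal userspace sketch of the pattern. It is not the actual jbd code or the posted patches: the types and functions (toy_buffer, toy_journal, toy_journal_abort, toy_commit) are hypothetical stand-ins, and the use of -EIO as the recorded error is an assumption made for the example. The idea is only that a failed write on any buffer must abort the journal before a commit record could be written, and that an aborted journal refuses later commits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; NOT the real jbd types. */
struct toy_buffer {
    bool write_failed;   /* set by the (simulated) I/O completion path */
};

struct toy_journal {
    bool aborted;        /* once set, no further commits are allowed */
    int  err;            /* first error recorded for the journal */
};

/* Mark the journal aborted, keeping only the first error seen. */
void toy_journal_abort(struct toy_journal *j, int err)
{
    assert(j != NULL);
    if (!j->aborted) {
        j->aborted = true;
        j->err = err;
    }
}

/*
 * Check every buffer of a transaction before committing.
 * Returns 0 if the commit may proceed, or a negative errno value
 * after aborting the journal (or if it was already aborted).
 */
int toy_commit(struct toy_journal *j, const struct toy_buffer *bufs, size_t n)
{
    assert(j != NULL);

    if (j->aborted)
        return j->err;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].write_failed) {
            /* Assumption for this sketch: report write failures as -EIO. */
            toy_journal_abort(j, -EIO);
            return -EIO;
        }
    }
    return 0;
}
```

In this sketch a silent write failure can no longer be followed by a commit that looks valid, which is the failure mode the patches are meant to close. It does nothing for a journal whose blocks are already unreadable on disk.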
The recovery patches referenced in c1 have largely been accepted already via other BZs. I'm closing this BZ.