Bug 453809 - Problems with jbd error handling
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Josef Bacik
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2008-07-02 13:41 EDT by Bryn M. Reeves
Modified: 2010-10-22 22:35 EDT (History)
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2009-04-21 11:37:20 EDT
Attachments: None
Description Bryn M. Reeves 2008-07-02 13:41:10 EDT
Description of problem:
Hidehiro Kawai discovered some problems with jbd's error handling that can in
rare failure situations lead to file system corruption during journal recovery:

http://lkml.org/lkml/2008/4/18/154

Although upstream is planning to move past the current jbd implementation in a
way that may make these changes irrelevant, users of the current code are still
vulnerable to these problems. Environments with a very large number of disks
are more likely to hit one of them.

Version-Release number of selected component (if applicable):
2.6.18-*

How reproducible:
Very difficult; may require hardware with fault injection capabilities. Problems
discovered via code inspection.

Steps to Reproduce:
1. n/a
  
Additional info:
[PATCH 1/4] jbd: strictly check for write errors on data buffers
[PATCH 2/4] jbd: ordered data integrity fix
[PATCH 3/4] jbd: abort when failed to log metadata buffers
[PATCH 4/4] jbd/ext3: fix error handling for checkpoint io
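
Going only by the patch titles above, the common idea appears to be that a failed write should abort the journal rather than be silently ignored. The following is a minimal toy model of that idea in Python. It is not jbd code: `ToyJournal`, `JournalAborted` and the boolean `write_results` list are all hypothetical names chosen for illustration.

```python
class JournalAborted(Exception):
    """Raised when the toy journal refuses further work after an I/O error."""


class ToyJournal:
    """Hypothetical stand-in for a journal; not the jbd data structures."""

    def __init__(self):
        self.aborted = False
        self.committed = []

    def commit(self, transaction_id, write_results):
        """Commit a transaction only if every buffer write succeeded.

        write_results holds one boolean per buffer write; False models
        a failed write.
        """
        if self.aborted:
            raise JournalAborted("journal already aborted")
        if not all(write_results):
            # Strict check: one failed write aborts the journal, so the
            # incomplete transaction is never recorded as committed.
            self.aborted = True
            raise JournalAborted(
                f"write error in transaction {transaction_id}"
            )
        self.committed.append(transaction_id)


if __name__ == "__main__":
    journal = ToyJournal()
    journal.commit(1, [True, True])
    try:
        journal.commit(2, [True, False])
    except JournalAborted as exc:
        print("aborted:", exc)
    print("committed:", journal.committed)
```

In this sketch, once the journal is aborted every later commit is refused as well, so a transaction with a failed write can never be treated as valid during a later recovery.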
Comment 2 RHEL Product and Program Management 2009-02-16 10:27:14 EST
Updating PM score.
Comment 3 RHEL Product and Program Management 2009-02-24 12:32:43 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Debbie Johnson 2009-04-09 12:32:18 EDT
Josef,

What is the status of this BZ?  Will it be going into 5.4?  It is unclear from the comments, and I have a customer who needs this.  I attached the IT to this.

Debbie

Errors they are seeing...

Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 24588 on sdb3
Mar  4 15:08:47 jb-601 kernel: JBD: bad block at offset 24588
Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 24588 on sdb3
Mar  4 15:08:47 jb-601 kernel: JBD: bad block at offset 24588
Mar  4 15:08:47 jb-601 kernel: JBD: Failed to read block at offset 24585
Mar  4 15:08:47 jb-601 kernel: JBD: recovery failed
Mar  4 15:08:47 jb-601 kernel: EXT3-fs: error loading journal.
Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 21516 on sdb4
Mar  4 15:08:47 jb-601 kernel: JBD: bad block at offset 21516
Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 21685 on sdb4
Mar  4 15:08:47 jb-601 kernel: JBD: bad block at offset 21685
Mar  4 15:08:47 jb-601 kernel: JBD: recovery failed
Mar  4 15:08:47 jb-601 kernel: EXT3-fs: error loading journal.
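
For triage, messages like the ones above can be grouped by device to see which journal offsets are reported missing. This is a rough sketch in Python; the regular expression only assumes the `journal_bmap` wording quoted in this comment, and the function name `missing_blocks_by_device` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the journal_bmap lines quoted above; other kernel versions
# may word the message differently.
BMAP_RE = re.compile(
    r"journal_bmap: journal block not found at offset (\d+) on (\S+)"
)


def missing_blocks_by_device(lines):
    """Return {device: sorted unique offsets} for journal_bmap errors."""
    found = defaultdict(set)
    for line in lines:
        match = BMAP_RE.search(line)
        if match:
            found[match.group(2)].add(int(match.group(1)))
    return {dev: sorted(offsets) for dev, offsets in found.items()}


if __name__ == "__main__":
    sample = [
        "Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 24588 on sdb3",
        "Mar  4 15:08:47 jb-601 kernel: JBD: bad block at offset 24588",
        "Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 24588 on sdb3",
        "Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 21516 on sdb4",
        "Mar  4 15:08:47 jb-601 kernel: journal_bmap: journal block not found at offset 21685 on sdb4",
    ]
    print(missing_blocks_by_device(sample))
    # prints {'sdb3': [24588], 'sdb4': [21516, 21685]}
```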
Comment 5 Josef Bacik 2009-04-09 12:39:49 EDT
I'm pretty sure Hitachi has already posted these, but it looks like they will not fix the problem your customer is having.  These patches simply make sure we always abort the transaction when we are supposed to; your customer seems to be suffering from data corruption.
Comment 6 Josef Bacik 2009-04-21 11:37:20 EDT
The recovery patches referenced in c1 have largely been accepted already via other BZs.  I'm closing this BZ.
