Bug 173350

Summary:

jbd I/O errors after ext3 orphan processing on readonly device

Product:

Red Hat Enterprise Linux 3

Reporter:

Scott D. Anderson <scott.anderson>

Component:

kernel

Assignee:

Eric Sandeen <esandeen>

Status:

CLOSED ERRATA

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Docs Contact:

Priority:

high

Version:

3.0

CC:

agk, dwysocha, esandeen, heinzm, jbacik, jtillots, mazurekd, mbroz, petrides, sct

Target Milestone:

---

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

RHSA-2007-0436

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2007-06-11 17:53:05 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

174649, 217926, 217928

Attachments:

Description	Flags
nightly cron job	none
the shell functions that do the real work of creating the snapshots	none
Entries in the messages log associated with this bug	none
kernel panic	none
patch that resolves the problem.	none

Description Scott D. Anderson 2005-11-16 15:22:12 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.7.12-1.1.3.2

Description of problem:
I'm running RedHat Enterprise Linux AS Release 3 (Taroon Update 6) on my
main server. Since August, the /home1 and /home2 partitions have been
using LVM on RAID5 with snapshot partitions being used for tape dumps.
The actual filesystems are ext3, which has journaling.

Twice now, I've gotten the following message on the console:

Message from syslogd@puma at Tue Nov 15 04:02:26 2005 ...
puma kernel: journal commit I/O error

Message from syslogd@puma at Tue Nov 15 04:02:26 2005 ...
puma kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"

Message from syslogd@puma at Tue Nov 15 04:02:30 2005 ...
puma kernel: Kernel panic: Fatal exception

FYI, at 4:02am, the current snapshot is removed, a new one is created, and
a tape dump is begun. The effects of this error are that (1) I can no
longer "lvremove" the snapshot partitions, because the system says they
are in use, but they're really not; and, (2) I can't do tape dumps because
the "sync" system call hangs. The hung processes cannot even be killed with
kill -9. A reboot is necessary to fix all these problems.

Version-Release number of selected component (if applicable):
kernel 2.4.21-37.ELsmp

How reproducible:
Didn't try

Steps to Reproduce:
I don't know how to reproduce it; it just happens. I believe it happens due to the nightly cron job that removes the old snapshots and kicks off a tape dump, but that cron job runs every night, so it's run over 100 times since August and the kernel panic has only happened twice. I'll submit the cron job, in case you care.

Additional info:

Comment 1 Scott D. Anderson 2005-11-16 15:34:49 UTC

Created attachment 121131 [details]
nightly cron job

runs at 4:02 am daily.	Creating the new snapshots may be what triggers the
kernel panic.

Comment 2 Scott D. Anderson 2005-11-16 15:36:15 UTC

Created attachment 121132 [details]
the shell functions that do the real work of creating the snapshots

This file is sourced by the main cron job.

Comment 3 Stephen Tweedie 2005-11-16 20:27:12 UTC

Are there any other errors in the logs just prior to this ext3 journal error?

Comment 4 Scott D. Anderson 2005-11-16 21:02:50 UTC

(In reply to comment #3)
> Are there any other errors in the logs just prior to this ext3 journal error?

Yes.  Actually, this error doesn't appear in the logs; it seems to go only to
the console, which is odd.  Anyhow, below are the relevant log entries.  I have
filtered out the "raid5: switching cache buffer size" entries, which appear
several times a second continuously.  

Nov 14 04:02:14 puma kernel: Can't write to read-only device 3a:07
Nov 14 04:02:15 puma kernel: Can't write to read-only device 3a:08
Nov 14 04:02:16 puma modprobe: modprobe: Can't locate module block-major-43
Nov 14 04:02:17 puma kernel: Can't write to read-only device 3a:07
Nov 14 04:02:17 puma kernel: kjournald starting.  Commit interval 5 seconds
Nov 14 04:02:17 puma kernel: Can't write to read-only device 3a:07
Nov 14 04:02:17 puma kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov 14 04:02:17 puma modprobe: modprobe: Can't locate module block-major-43
Nov 14 04:02:19 puma kernel: Can't write to read-only device 3a:08
Nov 14 04:02:19 puma kernel: kjournald starting.  Commit interval 5 seconds
Nov 14 04:02:19 puma kernel: Can't write to read-only device 3a:08
Nov 14 04:02:19 puma kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov 14 04:02:19 puma kernel: sym53c1010-33-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
(25.0 ns, offset 31)
Nov 14 04:02:19 puma kernel: sym53c1010-33-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
(25.0 ns, offset 31)
Nov 14 05:20:03 puma kernel: sym53c1010-33-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
(25.0 ns, offset 31)
Nov 14 05:20:05 puma kernel: sym53c1010-33-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s
(25.0 ns, offset 31)

In order to interpret the message about being unable to write to a read-only
device, you'll need the following information:

# ls -ln /dev/vghome
total 0
crw-r-----    1 0        6        109,   0 Oct 29 13:17 group
brw-rw----    1 0        6         58,   7 Nov 15 13:32 homesnap
brw-rw----    1 0        6         58,   1 Oct 29 13:17 lvhome
brw-rw----    1 0        6         58,   3 Oct 29 13:17 lvstud
brw-rw----    1 0        6         58,   8 Nov 15 04:02 studsnap

Just FYI, /dev/vghome/lvhome is the home filesystem and /dev/vghome/homesnap is
the snapshot of it.  Similarly for lvstud (a real filesystem which has student
home directories) and studsnap (a snapshot of lvstud).

Thanks for your effort.  Let me know if I can supply any more information.

Comment 5 Stephen Tweedie 2005-11-16 21:37:18 UTC

OK, the errors are occurring on the snapshot filesystems themselves.  It looks
like there are a number of things happening here: first, ext3 is not completely
getting flushed out and quiesced properly when the snapshot is created, leaving
behind a snapshot that needs recovery when mounted; secondly, the snapshot
device is not being marked as a true readonly device by LVM, so ext3 tries to
recover it and fails; finally, ext3 is getting confused enough by that failure
that it panics.

The second problem appears to be an LVM bug.  The final problem is ext3's: no
matter how badly the underlying storage is behaving, ext3 shouldn't be panicing. 

These are being triggered by the first problem, though, and it's not clear
whether it's LVM or ext3 failing to do the quiescing properly in this case. 
I'll have to have a think about how to narrow that down.

Comment 8 Ernie Petrides 2005-11-30 08:02:06 UTC

Reassigning to "lvm" component based on comment #7.

Comment 9 Jenett Tillotson 2006-01-09 14:11:27 UTC

We are seeing the same thing.  We are using the 2.4.21-37 kernel with lvm
version 1.0.8-14.  I'll attach our log messages that occur when this happens.

Comment 10 Jenett Tillotson 2006-01-09 14:13:11 UTC

Created attachment 122947 [details]
Entries in the messages log associated with this bug

These are the error messages we're seeing associated with this problem.

Comment 16 Debbie Mazurek 2006-11-02 21:33:59 UTC

Created attachment 140190 [details]
kernel panic

Comment 17 Debbie Mazurek 2006-11-02 21:37:10 UTC

see "kernel panic" attachment.

Any news on this?  We are seeing this happen often, say, once a week or so.
Running 2.4.21-47.ELsmp kernel, and lvm-1.0.8-14.

Comment 18 Josef Bacik 2006-11-02 21:40:00 UTC

Created attachment 140191 [details]
patch that resolves the problem.

This is the patch that appears to resolve the problem.	If anybody out there
wants to test this and verify it that would be helpfull, as the person I had
testing it has not come back with a definite "yes" yet.

Comment 19 Debbie Mazurek 2006-11-02 21:53:34 UTC

see "kernel panic" attachment.

Any news on this?  We are seeing this happen often, say, once a week or so.
Running 2.4.21-47.ELsmp kernel, and lvm-1.0.8-14.

Comment 20 Josef Bacik 2006-11-02 21:56:47 UTC

Test my patch out and see if it helps.  I need to get confirmation if it 
works.

Comment 21 Debbie Mazurek 2006-11-02 22:30:04 UTC

I'll give it a whirl & let you know.

Thanks.

Comment 22 Debbie Mazurek 2006-11-21 17:12:59 UTC

I applied the patch on Tuesday Nov 7.  The problem re-occurred on Sat Nov 11.  I
rebooted the machine and removed the offending snapshots on Monday Nov 13.  It
has been ok since then - 8 days.  The patch was applied to the latest kernel - 
2.4.21-47.0.1

I'm still seeing messages in the logs as above:

kjournald starting.  Commit interval 5 seconds
Can't write to read-only device 3a:02
EXT3-fs: mounted filesystem with ordered data mode.
Can't write to read-only device 3a:03
kjournald starting.  Commit interval 5 seconds
Can't write to read-only device 3a:03
EXT3-fs: mounted filesystem with ordered data mode.
lv_access for /dev/ug_vg/ug0 is 5
Can't write to read-only device 3a:01
kjournald starting.  Commit interval 5 seconds
Can't write to read-only device 3a:01
EXT3-fs: mounted filesystem with ordered data mode.

/dev/ug_vg/ug0 is the snapshot from which I take a backup.
It is (generally) this snapshot that corrupts and sends ext3 into a panic.

Not sure if this info helps you or not - I'll keep running the patched kernel on
this machine (I have others without the patch too) and advise if anything is
different.  The unpatched machines, by the way, have continued the panic
behaviour as before.

Comment 23 Debbie Mazurek 2006-11-28 18:39:27 UTC

I can confirm that the patch does not fix this error.

deb

Comment 24 Josef Bacik 2006-11-28 19:00:52 UTC

Grep through your logs, there should be a message like this

lv_access for <device> is <some number>

please put those messages in here, or attach your logs, either one works for 
me.

Comment 25 Debbie Mazurek 2006-11-28 20:03:55 UTC

root@nsb<8>grep -i lv_access /var/log/messages

Nov 26 23:55:05 nsb kernel: lv_access for /dev/ug_vg/ug0 is 5
Nov 27 23:55:04 nsb kernel: lv_access for /dev/ug_vg/ug0 is 5

Comment 26 Josef Bacik 2006-11-28 20:10:33 UTC

sorry, may have the panic text as well in its entirety.  in the meantime I 
will re-evaluate this fix, its been a few months since I've looked at this 
issue.

Comment 27 Debbie Mazurek 2006-11-28 20:31:07 UTC

sure, here it is:

Nov 27 23:55:03 nsb kernel: kjournald starting.  Commit interval 5 seconds
Nov 27 23:55:03 nsb kernel: Can't write to read-only device 3a:03
Nov 27 23:55:03 nsb kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov 27 23:55:03 nsb modprobe: modprobe: Can't locate module block-major-43
Nov 27 23:55:03 nsb last message repeated 31 times
Nov 27 23:55:04 nsb kernel: lv_access for /dev/ug_vg/ug0 is 5
Nov 27 23:55:04 nsb kernel: Can't write to read-only device 3a:01
Nov 27 23:55:04 nsb kernel: kjournald starting.  Commit interval 5 seconds
Nov 27 23:55:04 nsb kernel: EXT3-fs: lvm(58,1): orphan cleanup on readonly fs
Nov 27 23:55:04 nsb kernel: EXT3-fs: lvm(58,1): 1 orphan inode deleted
Nov 27 23:55:04 nsb kernel: EXT3-fs: recovery complete.
Nov 27 23:55:04 nsb kernel: lvm - lvm_map: ll_rw_blk write for readonly LV
/dev/ug_vg/ug0
Nov 27 23:55:04 nsb last message repeated 5 times
Nov 27 23:55:04 nsb kernel: Error (-5) on journal on device 3a:01
Nov 27 23:55:04 nsb kernel: Aborting journal on device lvm(58,1).
Nov 27 23:55:05 nsb kernel: journal commit I/O error
Nov 27 23:55:05 nsb kernel: Can't write to read-only device 3a:01
Nov 27 23:55:07 nsb kernel: Assertion failure in journal_flush_Rsmp_e2f189ce()
at journal.c:1356: "!journal->j_checkpoint_transactions"
Nov 27 23:55:07 nsb kernel: ------------[ cut here ]------------
Nov 27 23:55:07 nsb kernel: kernel BUG at journal.c:1356!
Nov 27 23:55:07 nsb kernel: invalid operand: 0000
Nov 27 23:55:07 nsb kernel: parport_pc lp parport floppy sr_mod ide-cd cdrom sg
nfsd lockd sunrpc dcdipm dcdbas autofs4 iptable_filter ip_tables e1000 microcode
st loop keybdev mousedev
Nov 27 23:55:07 nsb kernel: CPU:    2
Nov 27 23:55:07 nsb kernel: EIP:    0060:[<f8884d5c>]    Tainted: PF
Nov 27 23:55:07 nsb kernel: EFLAGS: 00010282
Nov 27 23:55:07 nsb kernel:
Nov 27 23:55:07 nsb kernel: EIP is at journal_flush_Rsmp_e2f189ce [jbd] 0x1bc
(2.4.21-47.0.1.ELcustom-lvm-patch/i686)
Nov 27 23:55:07 nsb kernel: eax: 0000006f   ebx: f61f4680   ecx: 00000001   edx:
c0388e98
Nov 27 23:55:07 nsb kernel: esi: f5d39400   edi: 00000000   ebp: fffffffb   esp:
e7eede18
Nov 27 23:55:07 nsb kernel: ds: 0068   es: 0068   ss: 0068
Nov 27 23:55:07 nsb kernel: Process mount (pid: 21518, stackpage=e7eed000)
Nov 27 23:55:07 nsb kernel: Stack: f88872b0 f88861e5 f88860ee 0000054c f88875c8
e2632400 e2632400 c97ca400
Nov 27 23:55:07 nsb kernel:        00000001 f8898f29 f5d39400 e2632448 e2632400
e2632448 f889867c e2632400
Nov 27 23:55:07 nsb kernel:        c97ca400 00000001 00000000 e7eede9c 00000000
0000000d 00000000 00000000
Nov 27 23:55:07 nsb kernel: Call Trace:   [<f88872b0>] .rodata.str1.4 [jbd]
0xf70 (0xe7eede18)
Nov 27 23:55:07 nsb kernel: [<f88861e5>] .rodata.str1.1 [jbd] 0x675 (0xe7eede1c)
Nov 27 23:55:07 nsb kernel: [<f88860ee>] .rodata.str1.1 [jbd] 0x57e (0xe7eede20)
Nov 27 23:55:07 nsb kernel: [<f88875c8>] .rodata.str1.4 [jbd] 0x1288 (0xe7eede28)
Nov 27 23:55:07 nsb kernel: [<f8898f29>] ext3_mark_recovery_complete [ext3] 0x19
(0xe7eede3c)
Nov 27 23:55:07 nsb kernel: [<f889867c>] ext3_read_super [ext3] 0x73c (0xe7eede50)
Nov 27 23:55:07 nsb kernel: [<c016ce62>] get_sb_bdev [kernel] 0x1b2 (0xe7eedeb0)
Nov 27 23:55:07 nsb kernel: [<f88a1a34>] ext3_fs_type [ext3] 0x0 (0xe7eedef4)
Nov 27 23:55:07 nsb kernel: [<c016d211>] do_kern_mount [kernel] 0x121 (0xe7eedefc)
Nov 27 23:55:07 nsb kernel: [<f88a1a34>] ext3_fs_type [ext3] 0x0 (0xe7eedf00)
Nov 27 23:55:07 nsb kernel: [<c0185b23>] do_add_mount [kernel] 0x93 (0xe7eedf20)
Nov 27 23:55:07 nsb kernel: [<c0185e50>] do_mount [kernel] 0x160 (0xe7eedf40)
Nov 27 23:55:07 nsb kernel: [<c0185ca1>] copy_mount_options [kernel] 0x81
(0xe7eedf70)
Nov 27 23:55:07 nsb kernel: [<c018633f>] sys_mount [kernel] 0xdf (0xe7eedf90)
Nov 27 23:55:07 nsb kernel: [<c02b006f>] no_timing [kernel] 0x7 (0xe7eedfc0)
Nov 27 23:55:07 nsb kernel:
Nov 27 23:55:07 nsb kernel: Code: 0f 0b 4c 05 ee 60 88 f8 e9 0d ff ff ff 8d b4
26 00 00 00 00
Nov 27 23:55:07 nsb kernel:
Nov 27 23:55:07 nsb kernel: Kernel panic: Fatal exception
Nov 27 23:55:07 nsb kernel: lvm - lvm_map: ll_rw_blk write for readonly LV
/dev/ug_vg/ug0
Nov 27 23:55:07 nsb last message repeated 3 times

Let me know if you need anything else.

deb

Comment 28 Josef Bacik 2006-11-28 20:36:31 UTC

well that patch I wrote was just straight up wrong.  I don't beleive this to 
be a problem with the LVM package itself, at least not in the sense that the 
problem will never occur with the previous package, as there is nothing 
between the two packages that changed that would cause different behavior in 
this case.  I'm going to try and reproduce again and hopefully get something 
viable today or tomorrow.

Comment 31 Josef Bacik 2006-11-29 14:01:19 UTC

ok so here's the sequence of events as far as i understand it

1) main FS deletes a file, and that inode gets added to the orphan list but the
file is not yet deleted.  ( the inode is added to the orphan list on the disk,
so in case of a crash the orphan recovery stuff will delete the inode before
completely mounting the FS)
2) mount the snapshot LV
3) the FS is ext3 so we see the inode on the orphan list on the disk, and go to
do recover
4) you see the

EXT3-fs: lvm(58,1): orphan cleanup on readonly fs

message, at this point we automatically set the superblock to RW and do the
orphan cleanup.
5) the inode deletion is a journaled operation, so before mounting completely we
do a journal_recover_completed or something like that, which in turn does a
journal_flush(), which of course fails b/c the disk is RO.

6) we assert that the journal has been completed, which of course it wasn't, and
we panic.

Now this is a series of events that are completely seperate from eachother so
I'm still a little puzzled on how to fix it.  In the meantime, I beleive a
workaround for this problem would be to mount the snapshot as ext2, which takes
journalling completely out of the equation and so we should avoid this problem.

I have yet to reproduce, but I think i've got a good enough idea of whats going
on to try and come up with a fix.  Debbie will you change your scripts to mount
the snapshot volume as ext2 to make sure the problem doesn't reproduce.  It
shouldn't so at least you will be more stable until a solution is found.

Comment 32 Eric Sandeen 2006-11-29 21:21:41 UTC

What happens if you open a file, delete it while holding it open, then create a
snapshot, and try to mount that snapshot.

Something like:

# touch foo
# tail -f foo &
# rm foo
# <snapshot>
# <mount snapshot>

Does that recreate the problem?
-Eric

Comment 33 Josef Bacik 2006-11-29 21:27:29 UTC

eric you rule, reproduced first shot.

Comment 34 Debbie Mazurek 2006-11-30 00:13:00 UTC

I'll be mounting the snapshot as ext2 tonight on 3 servers, I'll let you know
how it goes.

Thanks guys.

Comment 35 Josef Bacik 2006-11-30 15:08:28 UTC

ok looks like the correct behavior is to check if the underlying block device 
is read only when we go to do the orphan recovery during mount, and if it is 
just print a message and return without doing the orphan recovery.  I tested 
your path this morning eric and it worked well, and foo was not visible when I 
mounted the snapshot.

Comment 36 Eric Sandeen 2006-11-30 15:29:27 UTC

I'll get the patch to fix this submitted to the relevant places.

Thanks for the analysis and testing, Josef!  This probably needs to be fixed in
rhel4 and 5 too.

Comment 37 Eric Sandeen 2006-12-05 21:49:17 UTC

For what it's worth, this does not seem to be a problem on RHEL4/5, but it is
possibly a different problem - because something is allowing ext3 orphan inode
recovery to proceed on the readonly lvm snapshot, which writes to it, which is
probably not good.

Comment 39 Ernie Petrides 2006-12-06 02:12:52 UTC

Correcting component (from "lvm" to "kernel").

Comment 40 Eric Sandeen 2006-12-07 18:24:20 UTC

Update description to something more descriptive.

Comment 41 Jay Turner 2006-12-18 14:19:23 UTC

QE ack for RHEL3.9.

Comment 42 Eric Sandeen 2006-12-21 19:08:33 UTC

update - the first component of this (skip orphan processing if underlying
device is readonly) has made it into the stable upstream kernel tree.  The 2nd
part, processing the orphan list if the underlying device & the filesystem both
transition to readonly via remount, is still languishing upstream.  Pinged
Andrew on that today.

Comment 43 Eric Sandeen 2006-12-21 19:11:54 UTC

Taking out of POST, will re-post when the 2nd part that Stephen asked for has
been approved for upstream inclusion.

Comment 45 Ernie Petrides 2007-02-08 04:07:51 UTC

A fix for this problem has just been committed to the RHEL3 U9
patch pool this evening (in kernel version 2.4.21-47.5.EL).

Comment 50 Red Hat Bugzilla 2007-06-11 17:53:05 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0436.html