Bug 167916

Summary: GFS: Assertion failed on line 480 of file lops.c
Product: [Retired] Red Hat Cluster Suite
Component: gfs
Version: 3
Hardware: i686
OS: Linux
Severity: high
Priority: medium
Status: CLOSED WORKSFORME
Reporter: Nate Straz <nstraz>
Assignee: Ben Marzinski <bmarzins>
QA Contact: GFS Bugs <gfs-bugs>
Doc Type: Bug Fix
Last Closed: 2006-11-27 15:27:22 UTC

Description Nate Straz 2005-09-09 15:11:24 UTC

Description of problem:
While running the regression load on the latest 6.0 build for RHEL3 U6:

GFS 6.0.2.27 (built Wed 07 Sep 2005 02:00:40 PM CDT)
Kernel 2.4.21-37.EL on i686

Systems: tank-02 through tank-05.

 - tank-02 hung
 - tank-03 took over as Master
 - tank-03 fenced tank-02
 - tank-04 performed journal recovery of tank-02
 - tank-04 panicked:
Bad metadata at 2395840
  mh_magic = 0x0116193A
  mh_type = 7
  mh_generation = 416611827715
  mh_format = 700
  mh_incarn = 0
e59bbc14 f8b19372 00000010 00000000 c0121942 0000000a 00000400 f8b38c11 
       e59bbc60 00000000 ce8f94d8 f8b2848d e59bbc7c 00000000 ce8f94d8 00000000 
       f8b26274 f8b38889 f8b38807 000001e0 00000013 f8b5b000 e59bbc78 00000001 
Call Trace:   [<f8b19372>] gfs_asserti [gfs] 0x32 (0xe59bbc18)
[<c0121942>] printk [kernel] 0x122 (0xe59bbc24)
[<f8b38c11>] .rodata.str1.1 [gfs] 0x14c5 (0xe59bbc30)
[<f8b2848d>] gfs_meta_header_print [gfs] 0x5d (0xe59bbc40)
[<f8b26274>] replay_block [gfs] 0x274 (0xe59bbc54)
[<f8b38889>] .rodata.str1.1 [gfs] 0x113d (0xe59bbc58)
[<f8b38807>] .rodata.str1.1 [gfs] 0x10bb (0xe59bbc5c)
[<f8b2649c>] buf_scan_elements [gfs] 0x1bc (0xe59bbcec)
[<f8b2f75f>] foreach_descriptor [gfs] 0x27f (0xe59bbd44)
[<c010d61b>] do_IRQ [kernel] 0xfb (0xe59bbd8c)
[<f8b300d6>] gfs_recover_journal [gfs] 0x2d6 (0xe59bbe98)
[<f8b30298>] gfs_check_journals [gfs] 0x68 (0xe59bbfb4)
[<f8b0f0a8>] gfs_recoverd [gfs] 0x48 (0xe59bbfd4)
[<f8b097d0>] gfs_recoverd_bounce [gfs] 0x0 (0xe59bbfdc)
[<f8b097df>] gfs_recoverd_bounce [gfs] 0xf (0xe59bbfe8)
[<c010945d>] kernel_thread_helper [kernel] 0x5 (0xe59bbff0)

Kernel panic: GFS: Assertion failed on line 480 of file lops.c
GFS: assertion: "meta_check_magic == GFS_MAGIC"
GFS: time = 1126227785
GFS: fsid=tank-cluster:stripe-512K.2

 - tank-03 fenced tank-04
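
The failed assertion corresponds to the magic-number check that journal replay
applies to each metadata header it encounters. The sketch below shows the shape
of that check; it is not the actual lops.c code, and the GFS_MAGIC value is an
assumption (the field names and order follow the header dump above).

  #include <stdint.h>

  #define GFS_MAGIC 0x01161970    /* assumed on-disk magic value */

  /* Field names and order as printed by gfs_meta_header_print() above. */
  struct gfs_meta_header {
          uint32_t mh_magic;      /* first field of every metadata block */
          uint32_t mh_type;
          uint64_t mh_generation;
          uint32_t mh_format;
          uint32_t mh_incarn;
  };

  /* Journal replay reads each logged block and sanity-checks its header.
     On-disk fields are big-endian, so a real caller would byte-swap
     mh_magic before this comparison. */
  static int meta_check_magic(const struct gfs_meta_header *mh)
  {
          return mh->mh_magic == GFS_MAGIC;
  }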

The loads on each system were:

tank-02:
 # Sequential Read/Write large files - buffered
 rwsbuflarge     (iogen -f buffered -i 30s -m sequential -s read,write,readv,writev -t 1b -T 400000b  400000b:rwbuflarge | doio -av) && rm -f rwbuflarge

 # Random Read/Write large files - buffered
 rwranbuflarge   (iogen -f buffered -i 30s -m random -s read,write,readv,writev -t 1b -T 400000b  400000b:rwranbuflarge | doio -av) && rm -f rwranbuflarge

tank-03:
 # Random Read/Write large files - buffered
 rwranbuflarge   (iogen -f buffered -i 30s -m random -s read,write,readv,writev -t 1b -T 400000b  400000b:rwranbuflarge | doio -av) && rm -f rwranbuflarge

 # Reverse Read/Write large files - buffered
 rwrevbuflarge   (iogen -f buffered -i 30s -m reverse -s read,write,readv,writev -t 1b -T 400000b  400000b:rwrevbuflarge | doio -av) && rm -f rwrevbuflarge

tank-04:
 # Random Read/Write large files - buffered
 rwranbuflarge   (iogen -f buffered -i 30s -m random -s read,write,readv,writev -t 1b -T 400000b  400000b:rwranbuflarge | doio -av) && rm -f rwranbuflarge

 # Reverse Read/Write large files - buffered
 rwrevbuflarge   (iogen -f buffered -i 30s -m reverse -s read,write,readv,writev -t 1b -T 400000b  400000b:rwrevbuflarge | doio -av) && rm -f rwrevbuflarge

tank-05:
 # Random Read/Write large files - buffered
 rwranbuflarge   (iogen -f buffered -i 30s -m random -s read,write,readv,writev -t 1b -T 400000b  400000b:rwranbuflarge | doio -av) && rm -f rwranbuflarge

 # Reverse Read/Write large files - buffered
 rwrevbuflarge   (iogen -f buffered -i 30s -m reverse -s read,write,readv,writev -t 1b -T 400000b  400000b:rwrevbuflarge | doio -av) && rm -f rwrevbuflarge


Version-Release number of selected component (if applicable):
GFS-6.0.2.27-0

How reproducible:
Didn't try


Additional info:

Comment 1 Ben Marzinski 2005-09-13 17:07:56 UTC
Huh. The metaheader looks fine except for the first byte of the magic number,
which is wrong.  It looks like it was overwritten, but I/O to the journal is
block-based, and the magic number is the first thing on the block, so I don't
see how that could happen.
Ideally, if someone could recreate this and save the busted journal so I could
look through it by hand, that might shed some light on this... or it might not.

Comment 2 AJ Lewis 2005-09-13 17:25:56 UTC
Just a note: bad magic usually means bad disks, or possibly bad memory, I suppose
- it's pretty hard to get GFS to write out a block with corrupted magic.

Comment 3 Ben Marzinski 2005-09-19 19:05:34 UTC
If someone can reproduce this and get me the journal (or just give me access to
the block device that the journal was on, and I'll get it myself), I'll look into
it.
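
For reference, a minimal sketch of one way the journal region could be saved
from the block device for offline inspection; the device path, offset, and
length below are placeholders rather than the real values for this filesystem
(the actual journal location would come from the filesystem's journal index),
and dd over the same byte range would work just as well.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  #define JOURNAL_OFFSET  0ULL            /* placeholder: journal start (bytes) */
  #define JOURNAL_LENGTH  (128ULL << 20)  /* placeholder: journal size (bytes) */

  /* Copy the journal's byte range from the block device into a regular
     file so it can be examined offline. */
  int main(void)
  {
          int in = open("/dev/sdX", O_RDONLY);  /* placeholder device */
          int out = open("journal.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
          char buf[4096];
          unsigned long long done = 0;

          if (in < 0 || out < 0) {
                  perror("open");
                  return 1;
          }
          while (done < JOURNAL_LENGTH) {
                  ssize_t n = pread(in, buf, sizeof(buf), JOURNAL_OFFSET + done);
                  if (n <= 0)
                          break;
                  if (write(out, buf, n) != n)
                          break;
                  done += n;
          }
          close(in);
          close(out);
          return 0;
  }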

Comment 4 Nate Straz 2006-11-27 15:27:22 UTC
I haven't seen this while running loads on any of our errata releases, so I'm
going to close it.  RHEL3 is such a low priority now that I'm not going to take
the time to reproduce these hard-to-hit bugs on it.