Bug 140361 - Bad metadata assertion in trans.c: "meta_check_magic == GFS_MAGIC"
Status: CLOSED WORKSFORME
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 3
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Wendy Cheng
QA Contact: GFS Bugs
Reported: 2004-11-22 11:34 EST by Corey Marthaler
Modified: 2010-01-11 22:01 EST

Doc Type: Bug Fix
Last Closed: 2005-10-11 18:18:57 EDT


Attachments: None
Description Corey Marthaler 2004-11-22 11:34:27 EST
Description of problem:
I was running a regression load this weekend on the latest 6.0 RPMs,

GFS v6.0.0 (built Nov 18 2004 14:10:18) installed
Kernel 2.4.21-26.ELsmp on an i686

on morph-01 through morph-06, and morph-03 hit this panic/assertion.

I/O load at the time:
iogen -f sync -i 30s -m random -s read,write,readv,writev
iogen -f sync -i 30s -m sequential -s read,write,readv,writev
iogen -f buffered -i 30s -m sequential -s read,write,readv,writev

Nov 20 04:42:15 morph-03 kernel: Bad metadata at 65130
Nov 20 04:42:15 morph-03 kernel:   mh_magic = 0x35303030
Nov 20 04:42:15 morph-03 kernel:   mh_type = 980250482
Nov 20 04:42:15 morph-03 kernel:   mh_generation = 8099773614866981999
Nov 20 04:42:15 morph-03 kernel:   mh_format = 1768892995
Nov 20 04:42:15 morph-03 kernel:   mh_incarn = 976303408
Nov 20 04:42:15 morph-03 kernel: Kernel panic: GFS: Assertion failed on line 318 of file trans.c
Nov 20 04:42:15 morph-03 kernel: GFS: assertion: "meta_check_magic == GFS_MAGIC"
Nov 20 04:42:15 morph-03 kernel: GFS: time = 1100947335
Nov 20 04:42:15 morph-03 kernel: GFS: fsid=morph-cluster:stripe-504K.0
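The bogus header values above are easier to interpret as raw bytes. The sketch below re-serializes the five logged fields, assuming each is stored big-endian and that they sit back to back on disk in the order the kernel printed them (mh_magic, mh_type, mh_generation, mh_format, mh_incarn). The GFS_MAGIC value 0x01161970 is an assumption taken from our understanding of gfs_ondisk.h, not from this report, and magic_ok() is a hypothetical stand-in for the check that failed in trans.c.

```python
"""Decode the bogus GFS metadata header printed in the morph-03 panic.

Assumptions: the five fields are big-endian and contiguous, in the order
the kernel printed them; GFS_MAGIC below is an assumed value.
"""

GFS_MAGIC = 0x01161970  # assumed value of GFS_MAGIC (gfs_ondisk.h)

# (field name, value exactly as printed in the log, field width in bytes)
LOGGED_HEADER = [
    ("mh_magic", 0x35303030, 4),
    ("mh_type", 980250482, 4),
    ("mh_generation", 8099773614866981999, 8),
    ("mh_format", 1768892995, 4),
    ("mh_incarn", 976303408, 4),
]


def header_bytes(fields):
    """Re-serialize each logged field as big-endian bytes and concatenate them."""
    return b"".join(value.to_bytes(width, "big") for _, value, width in fields)


def magic_ok(fields):
    """Hypothetical mimic of the failed assertion: does mh_magic equal GFS_MAGIC?"""
    values = {name: value for name, value, _ in fields}
    return values["mh_magic"] == GFS_MAGIC


if __name__ == "__main__":
    raw = header_bytes(LOGGED_HEADER)
    print(raw)                      # b'5000:morph-03:doio*C:150'
    print(magic_ok(LOGGED_HEADER))  # False
```

Every field decodes to printable ASCII, and the 24 bytes read as a single string containing the hostname morph-03 and the name doio. That suggests, but does not prove, that test data written by the doio load ended up in block 65130, where GFS expected a metadata header.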



Shortly afterwards, morph-01 fences morph-03 and then hits the exact
same panic/assertion:

GFS: fsid=morph-cluster:stripe-504K.1: Joined cluster. Now mounting FS...
GFS: fsid=morph-cluster:stripe-504K.1: jid=1: Trying to acquire journal lock...
GFS: fsid=morph-cluster:stripe-504K.1: jid=1: Looking at journal...
GFS: fsid=morph-cluster:stripe-504K.1: jid=1: Done
lock_gulm: Checking for journals for node "morph-03.lab.msp.redhat.com"
GFS: fsid=morph-cluster:stripe-504K.1: jid=0: Trying to acquire journal lock...
GFS: fsid=morph-cluster:stripe-504K.1: jid=0: Busy
Bad metadata at 65130
  mh_magic = 0x35303030
  mh_type = 980250482
  mh_generation = 8099773614866981999
  mh_format = 1768892995
  mh_incarn = 976303408
Kernel panic: GFS: Assertion failed on line 318 of file trans.c
GFS: assertion: "meta_check_magic == GFS_MAGIC"
GFS: time = 1100947822
GFS: fsid=morph-cluster:stripe-504K.1

How reproducible:
Sometimes
Comment 1 Corey Marthaler 2004-11-22 11:36:11 EST
FWIW, morph-01 was the Gulm server
Comment 2 Corey Marthaler 2004-11-24 17:25:27 EST
complete cmdlines:

iogen -f sync -i 30s -m random -s read,write,readv,writev -t 1b -T 40000b 40000b:rwransynclarge | doio -av

iogen -f sync -i 30s -m sequential -s read,write,readv,writev -t 1b -T 40000b 40000b:rwransynclarge | doio -av

iogen -f buffered -i 30s -m sequential -s read,write,readv,writev -t 1b -T 40000b 40000b:rwbuflarge | doio -av

You can reduce or increase the 40000b file size depending on the size of your GFS file system.
Comment 3 michael conrad tadpol tilstra 2004-11-29 11:39:05 EST
For what it's worth, I've been playing with this setup and I don't get asserts,
but the qla2200 driver sometimes dumps the following line to syslog:

Nov 29 10:34:08 va13 kernel: Invalid packet 21 count! 15

Comment 4 michael conrad tadpol tilstra 2004-11-29 15:01:23 EST
Which FC card do the morph machines have?
Comment 5 Corey Marthaler 2004-11-29 15:05:20 EST
qla2300: morph-01, morph-02, morph-03, morph-06 
 
lpfc: morph-04, morph-05 
Comment 6 michael conrad tadpol tilstra 2004-11-29 15:35:10 EST
Looking at the qla2x00.c file, that message is printed when there are
more items in an array than the array is defined to hold. That's
nice. And yet I'm not panicking? Weird.

I don't suppose you know if there were any messages like the one I
posted above on your machines? (syslog might have caught them.)
Comment 7 Corey Marthaler 2004-11-29 15:46:05 EST
It's possible, but the machines have been reimaged a few times since
then, so all data in /var/log/messages has long been lost. :(
 
Hopefully we'll reproduce this in our RHEL3 rpm testing coming up. 
Comment 8 michael conrad tadpol tilstra 2004-11-29 16:01:57 EST
Reformatted my FC RAID to ext2, ran the iogen load, and got the same Invalid packet
message. I am very much wondering whether this is actually a driver issue.

Will wait to see what your results are.
Comment 9 michael conrad tadpol tilstra 2004-11-29 16:49:57 EST
For good measure, I took pool out of the path as well.  Still getting the
Invalid packet counts.
Comment 10 michael conrad tadpol tilstra 2004-11-29 17:04:35 EST
Oh, for the tests I did with ext2, I was only using one node, so no cluster was needed.
I used three file systems, each 1/3 TB, and all three got the iogen load above.
Comment 11 michael conrad tadpol tilstra 2005-01-05 09:33:39 EST
Waiting to see whether the QLogic driver on the morph nodes prints warnings or
errors under the given load.
Comment 12 Cory Ranschau 2005-03-01 16:47:48 EST
I have a qla2200 module loaded on a 2.4.21-27.0.2.ELsmp kernel that is
also giving me the message "Invalid packet 21 count! 15" when I write
to the attached disk device. There don't appear to be any errors
when writing, and performance is as expected. Any ideas?
Comment 14 Corey Marthaler 2005-10-11 18:18:57 EDT
Have not seen this bug in almost a year; will reopen if seen again.
