Description of problem:

I was running a regression load this weekend on the latest 6.0 rpms, GFS v6.0.0 (built Nov 18 2004 14:10:18) installed on kernel 2.4.21-26.ELsmp, i686, on morph-01 - morph-06, and morph-03 hit this panic/assertion.

I/O load at the time:

iogen -f sync -i 30s -m random -s read,write,readv,writev
iogen -f sync -i 30s -m sequential -s read,write,readv,writev
iogen -f buffered -i 30s -m sequential -s read,write,readv,writev

Nov 20 04:42:15 morph-03 kernel: Bad metadata at 65130
Nov 20 04:42:15 morph-03 kernel: mh_magic = 0x35303030
Nov 20 04:42:15 morph-03 kernel: mh_type = 980250482
Nov 20 04:42:15 morph-03 kernel: mh_generation = 8099773614866981999
Nov 20 04:42:15 morph-03 kernel: mh_format = 1768892995
Nov 20 04:42:15 morph-03 kernel: mh_incarn = 976303408
Nov 20 04:42:15 morph-03 kernel: Kernel panic: GFS: Assertion failed on line 318 of file trans.c
Nov 20 04:42:15 morph-03 kernel: GFS: assertion: "meta_check_magic == GFS_MAGIC"
Nov 20 04:42:15 morph-03 kernel: GFS: time = 1100947335
Nov 20 04:42:15 morph-03 kernel: GFS: fsid=morph-cluster:stripe-504K.0

Shortly afterwards, morph-01 fences morph-03 and then hits the exact same panic/assertion:

GFS: fsid=morph-cluster:stripe-504K.1: Joined cluster. Now mounting FS...
GFS: fsid=morph-cluster:stripe-504K.1: jid=1: Trying to acquire journal lock...
GFS: fsid=morph-cluster:stripe-504K.1: jid=1: Looking at journal...
GFS: fsid=morph-cluster:stripe-504K.1: jid=1: Done
lock_gulm: Checking for journals for node "morph-03.lab.msp.redhat.com"
GFS: fsid=morph-cluster:stripe-504K.1: jid=0: Trying to acquire journal lock...
GFS: fsid=morph-cluster:stripe-504K.1: jid=0: Busy
Bad metadata at 65130
mh_magic = 0x35303030
mh_type = 980250482
mh_generation = 8099773614866981999
mh_format = 1768892995
mh_incarn = 976303408
Kernel panic: GFS: Assertion failed on line 318 of file trans.c
GFS: assertion: "meta_check_magic == GFS_MAGIC"
GFS: time = 1100947822
GFS: fsid=morph-cluster:stripe-504K.1

How reproducible:
Sometimes
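For context, the assertion is GFS's metadata-header sanity check: every metadata block must begin with the GFS magic number, and here the block holds what looks like ASCII file data instead (0x35303030 read big-endian is the ASCII string "5000"). Below is a minimal userspace sketch of that style of check. The field names are taken from the panic output, but the struct layout, the GFS_MAGIC value, and check_meta() are illustrative assumptions, not the actual trans.c source.

/*
 * Hedged sketch of a metadata-magic sanity check like the one that
 * fires in trans.c. Field names come from the panic output; the
 * struct layout, GFS_MAGIC value, and check_meta() are illustrative
 * assumptions, not the actual GFS source.
 */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#define GFS_MAGIC 0x01161970u  /* assumed magic; the bad block held 0x35303030 instead */

struct gfs_meta_header {
    uint32_t mh_magic;       /* must be GFS_MAGIC for any valid metadata block */
    uint32_t mh_type;        /* metadata block type */
    uint64_t mh_generation;
    uint32_t mh_format;
    uint32_t mh_incarn;
};

/* Mirrors the "Bad metadata at ..." dump followed by the assertion panic. */
static void check_meta(const struct gfs_meta_header *mh, uint64_t blkno)
{
    if (mh->mh_magic != GFS_MAGIC) {
        fprintf(stderr, "Bad metadata at %" PRIu64 "\n", blkno);
        fprintf(stderr, "  mh_magic = 0x%" PRIx32 "\n", mh->mh_magic);
        fprintf(stderr, "  mh_type = %" PRIu32 "\n", mh->mh_type);
        fprintf(stderr, "  mh_generation = %" PRIu64 "\n", mh->mh_generation);
        fprintf(stderr, "  mh_format = %" PRIu32 "\n", mh->mh_format);
        fprintf(stderr, "  mh_incarn = %" PRIu32 "\n", mh->mh_incarn);
        abort();  /* the kernel equivalent is the GFS assertion panic */
    }
}

int main(void)
{
    /* Example using the values from the logged panic: a block whose
     * first bytes are ASCII "5000" -- file data where metadata was
     * expected. */
    struct gfs_meta_header bad = {
        .mh_magic = 0x35303030u,
        .mh_type = 980250482u,
        .mh_generation = 8099773614866981999ULL,
        .mh_format = 1768892995u,
        .mh_incarn = 976303408u,
    };
    check_meta(&bad, 65130);
    return 0;
}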
FWIW, morph-01 was the Gulm server
Complete cmdlines:

iogen -f sync -i 30s -m random -s read,write,readv,writev -t 1b -T 40000b 40000b:rwransynclarge | doio -av
iogen -f sync -i 30s -m sequential -s read,write,readv,writev -t 1b -T 40000b 40000b:rwransynclarge | doio -av
iogen -f buffered -i 30s -m sequential -s read,write,readv,writev -t 1b -T 40000b 40000b:rwbuflarge | doio -av

You can reduce/increase the 40000b filesize depending on your GFS size.
For what it's worth, I've been playing with this setup and I don't get asserts, but the qla2200 driver sometimes dumps the following line to syslog:

Nov 29 10:34:08 va13 kernel: Invalid packet 21 count! 15
Which FC card do the morph machines have?
qla2300: morph-01, morph-02, morph-03, morph-06
lpfc: morph-04, morph-05
Looking at the qla2x00.c file, that message is printed when there are more items in an array than the array is defined to hold. That's nice. And yet I'm not panicking? Weird. I don't suppose you know if there were any messages like the one I posted above on your machines? (syslog might have caught them.)
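To illustrate the guard just described: a minimal sketch of a count-versus-capacity check that warns and clamps, under the assumption that this is roughly what the driver is doing. All names, sizes, and message text here are made up for illustration; this is not the qla2x00.c code, and which of the two numbers in the real message is the request and which is the limit isn't clear from the message alone.

/*
 * Hedged sketch of a bounds guard: warn when a request carries more
 * entries than the receiving array can hold, then clamp. Illustrative
 * only -- not the qla2x00.c source.
 */
#include <stdio.h>

#define MAX_ENTRIES 15  /* assumed array capacity */

static int copy_entries(int dst[MAX_ENTRIES], const int *src, int count)
{
    if (count > MAX_ENTRIES) {
        /* analogous to the driver's "Invalid packet ... count!" line:
         * more items offered than the array is defined to hold */
        printf("invalid count %d! clamping to %d\n", count, MAX_ENTRIES);
        count = MAX_ENTRIES;
    }
    for (int i = 0; i < count; i++)
        dst[i] = src[i];
    return count;  /* number of entries actually copied */
}

int main(void)
{
    int src[21], dst[MAX_ENTRIES];
    for (int i = 0; i < 21; i++)
        src[i] = i;
    int copied = copy_entries(dst, src, 21);  /* triggers the warning */
    printf("copied %d entries\n", copied);
    return 0;
}

Under that reading the warning would be benign (the excess entries are simply not processed), which would fit the later observation that writes complete without errors and performance is normal.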
It's possible, but the machines have been reimaged a few times since then so all data in /var/log/messages has long been lost. :( Hopefully we'll reproduce this in our RHEL3 rpm testing coming up.
Reformatted my FC raid to ext2, ran the iogen load, and got the same Invalid packet message. I'm very much wondering if this is actually a driver issue. Will wait to see what your results are.
For good measure, I took pool out of the path as well. Still getting the Invalid packet counts.
Oh, for the tests I did with ext2, I was only using one node, so no cluster was needed. There were three filesystems, each 1/3 TB; all three got the iogen load above.
Waiting to see if the QLogic driver on the morph nodes is printing out warnings or errors under the given load.
I have a qla2200 module loaded on a 2.4.21-27.0.2.ELsmp kernel that is also giving me the message "Invalid packet 21 count! 15" when I write to the attached disk device. There don't appear to be any errors when writing, and performance is as expected. Any ideas?
Have not seen this bug in almost a year; will reopen if seen again.