Description of problem:
This assertion tripped while revolver happened to be building the sistina-test tree on each of the nodes in the morph cluster; I'm sure that was not the cause, however. It had checked that all the filesystems were fine on each of the nodes and then started the build of sistina-test, and when it was time to start shooting nodes, morph-06 had already panicked.

Aug 2 16:46:19 morph-06 kernel: dlm: gfs2: ignore master reply 101f0 4
Aug 2 16:46:49 morph-06 kernel: Bad metadata at 65812, should be 5
Aug 2 16:46:49 morph-06 kernel: mh_magic = 0x01161970
Aug 2 16:46:49 morph-06 kernel: mh_type = 0
Aug 2 16:46:49 morph-06 kernel: mh_generation = 0
Aug 2 16:46:49 morph-06 kernel: mh_format = 0
Aug 2 16:46:49 morph-06 kernel: mh_incarn = 0
Aug 2 16:46:49 morph-06 kernel:
Aug 2 16:46:49 morph-06 kernel: GFS: Assertion failed on line 1181 of file /usr/src/cluster/gfs-kernel/src/gfs/dio.c
Aug 2 16:46:49 morph-06 kernel: GFS: assertion: "metatype_check_magic == GFS_MAGIC && metatype_check_type == ((height) ? (5) : (4))"
Aug 2 16:46:49 morph-06 kernel: GFS: time = 1091483209
Aug 2 16:46:49 morph-06 kernel: GFS: fsid=morph-cluster:gfs0new.0
Aug 2 16:46:49 morph-06 kernel:
Aug 2 16:46:49 morph-06 kernel: Kernel panic: GFS: Record message above and reboot.

Aug 2 16:46:00 morph-05 kernel: CMAN: no HELLO from morph-06, removing from the cluster
Aug 2 16:46:05 morph-05 kernel: dlm: gfs4: recover event 565
Aug 2 16:46:05 morph-05 kernel: dlm: gfs4: remove node 1

How reproducible:
Didn't try
I reproduced this panic on four of six nodes last night after about 3-4 hours of I/O load. The load was: genesis, accordion, growfiles, iogen/doio.
Reproduced this again last night while running the above I/O load.
Reassign
This appears to be the same assertion with a stack trace this time. I've been seeing this quite a bit while running revolver lately.

GFS: fsid=morph-cluster:corey0.3: jid=2: Busy
dlm: corey0: resent 0 requests
dlm: corey0: recover event 78 finished
Info fld=0x0, Current sda: sense key No Sense
Bad metadata at 67395942, should be 4
mh_magic = 0x05004400
mh_type = 3305182976
mh_generation = 288230380446679040
mh_format = 16777216
mh_incarn = 0
[<f8a4de72>] gfs_assert_i+0x32/0xc0 [gfs]
[<c01230c1>] vprintk+0x111/0x160
[<c0122fa7>] printk+0x17/0x20
[<f8a3837e>] gfs_meta_header_print+0x6e/0x80 [gfs]
[<f8a1f579>] gfs_get_meta_buffer+0x1e9/0x360 [gfs]
[<f8a2de5d>] gfs_copyin_dinode+0x2d/0x1b0 [gfs]
[<c011f2d0>] default_wake_function+0x0/0x10
[<c011f2d0>] default_wake_function+0x0/0x10
[<f8a2d52d>] inode_go_lock+0x4d/0x60 [gfs]
[<f8a2a6e5>] glock_wait_internal+0x105/0x220 [gfs]
[<f8a2aa9f>] gfs_glock_nq+0x6f/0x100 [gfs]
[<f8a2b1ae>] gfs_glock_nq_init+0x1e/0x40 [gfs]
[<f8a4286a>] gfs_permission+0x4a/0x80 [gfs]
[<f8a42820>] gfs_permission+0x0/0x80 [gfs]
[<c0169708>] permission+0x68/0x70
[<c016b1d7>] may_open+0x47/0x260
[<c016b4a1>] open_namei+0xb1/0x650
[<c015bb9d>] filp_open+0x2d/0x60
[<c015be08>] get_unused_fd+0x78/0xd0
[<c015bf4c>] sys_open+0x3c/0xa0
[<c0105f5d>] sysenter_past_esp+0x52/0x71
Kernel panic - not syncing: GFS: fsid=morph-cluster:corey0.3: assertion "(metatype_check_magic == (0x01161970) && metatype_check_type == ((height) ? (5) : (4)))" failed
GFS: fsid=morph-cluster:corey0.3: function = gfs_get_meta_buffer
GFS: fsid=morph-cluster:corey0.3: file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 1214
GFS: fsid=morph-cluster:corey0.3: time = 1103049397
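For anyone comparing the two panics, here is a rough sketch of the check the assertion text describes, applied to the header values from both logs. This is not the GFS kernel code: the function name `diagnose` and its layout are made up for illustration, and the only values used are the ones printed in the messages above (the magic constant 0x01161970 and the "should be" type from each log).

```python
# Illustrative only: mirrors the logged assertion
#   metatype_check_magic == GFS_MAGIC && metatype_check_type == (height ? 5 : 4)
# "diagnose" is a hypothetical helper, not a function from the GFS source.

GFS_MAGIC = 0x01161970  # constant as expanded in the second panic message


def diagnose(mh_magic, mh_type, expected_type):
    """Return which halves of the assertion fail for a logged header."""
    problems = []
    if mh_magic != GFS_MAGIC:
        problems.append("magic mismatch")
    if mh_type != expected_type:
        problems.append("type mismatch")
    return problems


# (label, mh_magic, mh_type, "should be" value), copied from the two logs
logged_headers = [
    ("morph-06, block 65812", 0x01161970, 0, 5),
    ("corey0.3, block 67395942", 0x05004400, 3305182976, 4),
]

for label, magic, mtype, expected in logged_headers:
    print(label, "->", diagnose(magic, mtype, expected))
```

By this reading, the first panic had a valid magic number with a zeroed type field, while in the second panic neither the magic number nor the type matched, so the two blocks may not have been damaged in the same way.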
I've been running the revolver load set to HEAVY on four filesystems in my cluster for over two days now, with no sign of this bug. Corey ran his tests again last night and did not see the bug. Either it got fixed as a side effect of fixing something else, or it will rear its ugly head later, in which case I'll deal with it then.
This has not been seen in almost a year; closing.