Bug 167962
Summary: | GFS causes kernel panic when moving 8G file | |
---|---|---|---
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Joseph Salisbury <jts> |
Component: | gfs | Assignee: | Wendy Cheng <nobody+wcheng> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Mike McLean <mikem> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | bmarzins, ccaulfie, dshaks, jbrassow, kanderso, rkenna |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2006-08-30 20:19:09 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Joseph Salisbury
2005-09-09 21:12:58 UTC
We need the system logs and the console messages that got printed out when the panic happened.

Let me add some background on the configuration and the errors Joe and I are seeing:

System - 4-node x86_64 dl585, 4 CPUs and 16 GB of memory per node (amd64).
SW - RHEL3 U6 with GFS 6.0, with 2 dual-port QLA2300s per node. We used the QLogic driver for failover, which reduced the paths from 4 down to 2. Presented 4 luns, 2 per HBA, to POOL and enabled round-robin striping.

The Oracle 10G R1 database ran fine with sync IO or async IO. We filled up one lun with "space", so Joe was attempting to move the file (via "mv") or copy it (via "cp"). Either operation failed. We get an Oracle error (reported to Oracle) if/when we attempt to use Direct IO (Oracle simply attempts to open the file with the O_DIRECT flag). We are in the process of rebuilding without round-robin striping and/or attempting to reproduce on the 4-node at Westford (rather than at HP).

The system panics during boot now. Below is the console output while trying to boot the system. It seems to be metadata corruption:

GFS: fsid=gfsrac:oraindex.2: Done
GFS: fsid=gfsrac:oraundo.2: Joined cluster. Now mounting FS...
GFS: fsid=gfsrac:oraundo.2: jid=2: Trying to acquire journal lock...
GFS: fsid=gfsrac:oraundo.2: jid=2: Looking at journal...
GFS: fsid=gfsrac:oraundo.2: jid=2: Done
GFS: fsid=gfsrac:oraundo.2: Scanning for log elements...
GFS: fsid=gfsrac:oraundo.2: Found 0 unlinked inodes
GFS: fsid=gfsrac:oraundo.2: Found quota changes for 0 IDs
GFS: fsid=gfsrac:oraundo.2: Done
[ OK ]
Starting automount:
Bad metadata at 7039232, should be 5
mh_magic = 0x00C50453
mh_type = 223219726
mh_generation = 878477087085035717
mh_format = 69010723
mh_incarn = 537791556
This architecture does not implement dump_stack()
Kernel panic: GFS: Assertion failed on line 1253 of file linux_dio.c
GFS: assertion: "metatype_check_magic == GFS_MAGIC && metatype_check_type == ((height) ? (5) : (4))"
GFS: time = 1126539848
GFS: fsid=gfsrac:oraindex.2

We set up a RHEL3 U6 3-node (used to be 4-node) cluster here in Westford; no problem copying eight 15-GB files between mount points. Running Oracle now with DIO (O_DIRECT) to a gfs volume with a TPC-H 10G size. Results will be available by tomorrow AM.

Direct IO runs completed; a comparison of performance is below.

Not using Direct IO, TPC-H fills up the file caches and we can only get 11 MB/sec. Timing is in seconds:
m1timing: Execution Time Throughput Test: 89020.63
m2timing: Execution Time Throughput Test: 75675.87

With Direct IO, TPC-H can run at one FC adapter's bandwidth of 180 MB/sec (of a 200 MB/sec peak):
m1timing: Execution Time Throughput Test: 6550.27
m2timing: Execution Time Throughput Test: 6616.08

A nice 12X improvement. Back to re-creating the larger cluster file systems with round-robin for pool.
We cannot reproduce this on the x86 cluster; DIO works fine to create and run
TPC-H on 3 nodes (1 node died on me). This is running RHEL3 U6 with GFS 6.0.
Additional file corruptions have occurred any time we try to create databases
with Direct IO ... the file system gets corrupted:
Sep 14 21:57:02 spa65 su(pam_unix)[3756]: session opened for user oracle by
(uid=0)
Sep 14 21:57:02 spa65 su(pam_unix)[3596]: session opened for user oracle by
(uid=0)
Sep 14 21:57:03 spa65 kernel: This architecture does not implement dump_stack()
>> Sep 14 21:57:03 spa65 kernel: Kernel panic: GFS: Assertion failed on line
1227 of file rgrp.c
Sep 14 22:16:03 spa65 syslogd 1.4.1: restart.
Sep 14 22:16:03 spa65 syslog: syslogd startup succeeded
Sep 14 22:16:03 spa65 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Sep 14 22:16:03 spa65 kernel: ok
We then re-ran gfs_fsck on each file system:
gfs_fsck -v -y /dev/pool/oraundo
gfs_fsck -v -y /dev/pool/oraindex
gfs_fsck -v -y /dev/pool/oradata
[root@spa65 log]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/cciss/c0d0p6 159738020 4735588 146888132 4% /
/dev/cciss/c0d0p1 518000 47196 444492 10% /boot
none 8005308 0 8005308 0% /dev/shm
/dev/cciss/c0d0p5 8256808 39928 7797460 1% /tmp
/dev/cciss/c0d0p3 8256824 222572 7614828 3% /var
/dev/pool/oracle_base
38572512 4938704 33633808 13% /opt/oracle
/dev/pool/oradata 720029760 320741308 399288452 45% /oradata
/dev/pool/oraindex 38572512 540 38571972 1% /oraindex
/dev/pool/oraundo 38572512 23599260 14973252 62% /oraundo
We have ALSO reproduced this on a second x86_64 system with the same 10G R1
setup, running on AMD64 systems using iSCSI under RHEL4 U2 with GFS 6.1.
The original estimation for this issue was to:
* take the VFS layer fix from bz 147870 (rhel 3 oracle pwrite issue)
* add a similar GFS patch from bz 169154 (rhel 4 direct io)

Unfortunately, after really going through the code today, the gfs direct io implementations are different between rhel 4 (where most of the work is done by the base kernel) and rhel 3 (gfs does its own thing). In short, this problem is different from bz 169154. I don't have a good understanding of the root cause of this rhel 3 problem at this moment.

Hi, I have to re-examine the issue ... Is this problem that "cp of an 8G file between different GFS filesystems" can cause a panic? It is fuzzy to me why this is Direct IO related. Could you explain more?

I need to be more specific - when we do "cp /mnt/gfs1/myfile /mnt/gfs2/myfile", there should *not* be any Direct IO code involved. How does this command crash the kernel in DIO code? Did you set the inherit_direct_io flag on the filesystem?

The current thinking is that the file got corrupted when it was written with DIO by Oracle; previously Oracle had already bailed out. After a reboot, Joe's attempt to copy the corrupt file caused GFS to panic as it detected the corruption during the read.

Could I have one of these files to do experiments?

Phone call with dshak; he said this couldn't get recreated on the perf5-7 (i686) cluster (that I'm currently working on), but I forgot to ask:
1. Did this *only* happen with x86_64 nodes? Running 32-bit or 64-bit? Was the copy between i686 and x86_64 nodes?
2. Did the crashes happen on the "dest" node or the "src" node (when doing "cp")?

Sorry, typo in 2) - it should be: did the crash happen on the "dest" file or the "src" file? However, since we never captured the vmcore and/or the stack dump, it is unknown at this moment. So don't worry about it... I'll try to get a pair of x86_64 nodes to try this out.

This does not happen when moving basic 8GB files around ...
We have completed full IOzone testing with 1 MB to 32 GB files (doubling each time). This only occurred AFTER a database was built with DIO. Then, while trying to isolate it, we were moving files around when GFS would panic. We are running up2date on the cluster to fix a different EXT3 issue: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171931. After that we will try to reproduce, and we will give you access to the system if we can reproduce.

Able to crash the RHEL3 U6 cluster again running the TPC-H part of the Oracle 10G Stress test - OAST. Repeated 9 times, trying different nodes and cleaning up with gfs_fsck.

# DISK SUMMARY (/sec)
#Reads R-Merged R-KBytes Writes W-Merged W-KBytes
5 0 17 9818 71128 88661

Ouch!

[root@spa65 GFS]# vmstat 3
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
0 18 0 13200292 29992 1543228 0 0 112 3927 188 919 24 6 66 4
0 18 0 13198076 30064 1546904 0 0 305 88885 6163 1429 2 7 1 91
2 17 0 13197820 30064 1547880 0 0 31 89352 6030 641 2 5 0 93
0 18 0 13197404 30072 1548316 0 0 18 89529 6066 719 2 6 0 93
0 16 0 13197168 30080 1548896 0 0 29 89062 6218 915 2 6 0 93
0 16 0 13196744 30080 1549408 0 0 28 88999 6310 913 1 5 0 94
3 1 0 13685528 30080 1053740 0 0 23 58755 12405 64286 2 18 17 62
2 3 0 13685536 30080 1055000 0 0 36 77750 24667 222595 7 41 20 32
1 16 0 13683084 30088 1059720 0 0 383 91229 13249 78496 3 16 2 79
2 0 0 13684024 30096 1055052 0 0 134 51498 15714 163853 3 28 21 48
0 14 0 13676516 30096 1067272 0 0 3475 95926 8216 16311 2 8 1 89
7 0 0 13675676 30096 1066504 0 0 143 83475 7261 14529 1 7 4 88

Message from syslogd@spa65 at Wed Nov 16 15:10:42 2005 ...
spa65 kernel: Kernel panic: GFS: Assertion failed on line 550 of file rgrp.c
Nov 13 04:03:01 spa65 syslogd 1.4.1: restart.
Nov 14 14:01:47 spa65 sshd(pam_unix)[17741]: session opened for user root by (uid=0)
Nov 14 14:01:51 spa65 sshd(pam_unix)[17741]: session closed for user root
Nov 14 14:02:13 spa65 sshd(pam_unix)[17794]: session opened for user oracle by (uid=0)
Nov 14 19:04:11 spa65 kernel: Bad metadata at 72654208, should be 5
Nov 14 19:04:11 spa65 kernel: mh_magic = 0x00020000
Nov 14 19:04:11 spa65 kernel: mh_type = 2097152
Nov 14 19:04:11 spa65 kernel: mh_generation = 8444250867653466
Nov 14 19:04:11 spa65 kernel: mh_format = 2692808704
Nov 14 19:04:11 spa65 kernel: mh_incarn = 2797273088
Nov 14 19:04:11 spa65 kernel: This architecture does not implement dump_stack()
Nov 14 19:08:28 spa65 syslogd 1.4.1: restart.
Nov 14 19:08:28 spa65 syslog: syslogd startup succeeded
Nov 14 19:08:28 spa65 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 14 19:08:28 spa65 kernel: ok
Nov 14 19:08:28 spa65 kernel: Bootdata ok (command line is ro root=LABEL=/ )
Nov 14 19:08:28 spa65 kernel: Linux version 2.4.21-37.ELsmp (bhcompile.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-53)) #1 SMP Wed Sep 7 13:32:18 EDT 2005

Was this the same as the email you sent earlier, which happened with and without Direct IO on x86_64-only nodes?

Yes, this is with or without DIO. Shak

Building a debug kernel now... let me know if the machine slot is available.

Cut and paste part of the report from dshak for reference purposes:
>
> In September 2005, issues with RHEL3 x86_64 were identified running
> heavy I/O workloads against the 4-node amd64 system w/ 16 GB of
> memory, 2 FC adapters per node, and 4 200-GB luns (RAID 0+1 in HW);
> Pool striping was used on GFS initially. Pool striping appears to be
> the cause of the kernel panics:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167962
> We are working around the current striped pool problem, using a
> re-layout to separate luns to match the previous 9i CERT and Westford x86
> cluster. We keep the striped luns for debugging later.

comment #17 and #18 refer to DIO. Is it possible to create this problem entirely without using DIO?

Answer to comment #21: YES (according to shak). We can request time on the HP system again if it is still available. Remember the symptom - POOL was configured using POOL striping with 4 FC paths to the same LUN. Does anyone have a machine with that many paths here at Red Hat?

Shak - I've thrown this to you for "need info" given that Salisbury is "splitsville". - Rob

We no longer have a system to try to reproduce this problem. Since we haven't seen a field report for this issue, I'll close this as non-reproducible. Reopen it if necessary.
comment #17 and #18 refer to DIO. Is it possible to create this problem entirely without using DIO? answer to comment #21: YES (according to shak). We can request time on the HP system again if still available? Remember the symptom - POOL was configured using POOL striping with 4 FC paths to the same LUN Does anyone have a machine with that many PATHs here at Red Hat? Shak - I've thrown this to you for "need info" given that Salisbury is "splitsville". - Rob We no longer have a system to try to reproduce this problem. Since we havn't seen field report for this issue, I'll close this as non-reproducible. Reopen it if necessary. |