Bug 220762
Summary: | GFS2 kernel BUG | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Michael Smotritsky <mike.smey>
Component: | kernel | Assignee: | Steve Whitehouse <swhiteho>
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 6 | CC: | swhiteho, wtogami
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-02-27 16:58:20 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Michael Smotritsky
2006-12-26 01:36:16 UTC
*** Bug 220761 has been marked as a duplicate of this bug. ***

The same problem occurs on blade2 every time I modify file1 on blade1 and try to touch it on blade2...

I suspect that this might already be fixed in the upstream kernel, as some changes were made in that area since FC6. Are you able to try an upstream kernel, or even the kernel from rawhide?

Sorry, can't go to upstream or rawhide at the moment... these blades are getting ready to be in a prod pilot... Which kernel should be more stable for GFS2? We really would like to use it... When do you think a more stable (in the GFS2 area) kernel will get into FC6? Thanks!

Are you planning on using GFS2 in production? I would advise against that at this time. There are a couple of issues being worked on at the moment that prevent GFS2 from being used in a production environment. I've just sent a bunch of GFS2 updates to go into the next FC6 kernel. It's in testing at the moment and should be released next week if all goes to plan. Please try that one when it's released and see if that fixes your problem. I suspect it will, but do let us know if it doesn't.

Great! Will try the new kernel as soon as it's out and let you know about the results... Thank you very much!

Please test kernel 2.6.19-1.2895 and let me know if that fixes the problem for you.

Thanks for the heads-up... As soon as 2.6.19-1.2895 shows up in the updates repos I'll test it.

I've installed the new kernel; unfortunately I can't boot the blade - kernel panic in initrd - https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=224289 As soon as it's resolved I'll continue with testing GFS2.

I've fixed my problem by rebuilding the initrd image with LVM support... Will be testing GFS2 over the weekend...
Also I've got all the new updates - lvm2, cman and gfs2-tools. After installing 2.6.19-1.2895 the second blade is failing to start fenced:

```
Feb 12 22:07:44 vvsorms2 openais[10801]: [CLM  ] Members Left:
Feb 12 22:07:44 vvsorms2 openais[10801]: [CLM  ] Members Joined:
Feb 12 22:07:44 vvsorms2 openais[10801]: [CLM  ] 	r(0) ip(192.168.1.163)
Feb 12 22:07:44 vvsorms2 openais[10801]: [SYNC ] This node is within the primary component and will provide service.
Feb 12 22:07:44 vvsorms2 openais[10801]: [TOTEM] entering OPERATIONAL state.
Feb 12 22:07:44 vvsorms2 openais[10801]: [CMAN ] quorum regained, resuming activity
Feb 12 22:07:44 vvsorms2 openais[10801]: [CLM  ] got nodejoin message 192.168.1.163
Feb 12 22:07:51 vvsorms2 fenced[10817]: 192.168.1.160 not a cluster member after 6 sec post_join_delay
Feb 12 22:07:51 vvsorms2 fenced[10817]: 192.168.1.163 not a cluster member after 6 sec post_join_delay
Feb 12 22:07:51 vvsorms2 fenced[10817]: fencing node "192.168.1.160"
Feb 12 22:07:51 vvsorms2 fenced[10817]: fence "192.168.1.160" failed
Feb 12 22:07:56 vvsorms2 fenced[10817]: fencing node "192.168.1.160"
Feb 12 22:07:56 vvsorms2 fenced[10817]: fence "192.168.1.160" failed
Feb 12 22:08:01 vvsorms2 fenced[10817]: fencing node "192.168.1.160"
Feb 12 22:08:01 vvsorms2 fenced[10817]: fence "192.168.1.160" failed
```

The first blade has no problem. I have tried a clean start and changing post_join_delay - nothing seems different in the log - 6 secs...
cluster.conf:

```xml
<?xml version="1.0"?>
<cluster name="vvso" config_version="2">
  <cman expected_votes="1" two_node="1">
  </cman>
  <clusternodes>
    <clusternode name="vvsorms1" nodeid="1">
      <fence>
        <method>
          <device name="blade_center" blade="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vvsorms2" nodeid="2">
      <fence>
        <method>
          <device name="blade_center" blade="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fence_daemon clean_start="0">
  </fence_daemon>
  <fencedevices>
    <fencedevice name="blade_center" agent="fence_bladecenter" ipaddr="xxx.xxx.xxx.xxx" login="xxxxxxxx" passwd="xxxxxxxx"/>
  </fencedevices>
</cluster>
```

I wonder what the problem could be here? It was OK on the previous kernel... Thanks! Mike

Patrick, any ideas on this one?

That's just a fencing failure. The way to find out why fencing is failing is to try running it manually: `fence_node 192.168.1.160`. Actually it looks slightly odd that fenced is referring to nodes by IP address rather than name - which might explain why it's failing, as they have names in the config file. I don't know where fenced gets its node names from, so I can't really add much beyond that.

Thanks for the suggestion... After replacing the IPs with names in cluster.conf (I had IPs originally - that is why the log shows them) the services started up fine and the original kernel BUG is gone! The new kernel looks like it fixed it... I have another question though: when I stop the gfs2 service I get

```
[vvsorms2:/capsit]# service gfs2 stop
Unmounting GFS2 filesystems:  [  OK  ]
FATAL: Module lock_dlm is in use.
FATAL: Module gfs2 is in use.
```

Is that OK? If I try to start it up again, it starts up just fine... Thank you very much for your help... I will be continuing testing now with the real application.
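As an aside on Patrick's point above - fenced should be reporting the node *names* defined in cluster.conf, not IP addresses - the names a given config defines can be checked with a few lines of standard-library Python. This is purely an illustrative sketch against a trimmed copy of the config above, not a tool from the cluster suite:

```python
# Illustrative only: list the clusternode names a cluster.conf defines,
# i.e. the identifiers fenced is expected to report for cluster members.
import xml.etree.ElementTree as ET

CLUSTER_CONF = """\
<?xml version="1.0"?>
<cluster name="vvso" config_version="2">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="vvsorms1" nodeid="1">
      <fence><method><device name="blade_center" blade="1"/></method></fence>
    </clusternode>
    <clusternode name="vvsorms2" nodeid="2">
      <fence><method><device name="blade_center" blade="2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="blade_center" agent="fence_bladecenter"/>
  </fencedevices>
</cluster>
"""

def node_names(conf_xml):
    """Return the clusternode names defined in a cluster.conf string."""
    root = ET.fromstring(conf_xml)
    return [n.get("name") for n in root.iter("clusternode")]

print(node_names(CLUSTER_CONF))  # ['vvsorms1', 'vvsorms2']
```

If the log shows raw IP addresses where these names should appear, that is a hint the node entries themselves were configured with IPs rather than hostnames - which matches what the reporter found.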
Our application is very demanding of the file system - it stores tens of millions of small files (10-30k) and some very large ones (1-2G), so it will be a good test for GFS2... Steve, do you want me to post my notes here or on some other forum? I'm sure, given the nature of our app, I'll have some info to share... This particular bug has been resolved... Thanks again, Mike

Note that since the FC-6 kernel got upgraded to 2.6.20, it seems that some of the GFS2 patches got lost along the way. I've just sent a new patch to bring it up to date, so hopefully that will make it in shortly. It's probably best to post your notes to the cluster-devel mailing list unless there is a confidential aspect to them. If you are happy with the current state of this bug, then I'll mark it closed.

Thanks for the warning. Will 2.6.20, after the update, have anything new in GFS2 since 2.6.19? I'll post updates on my testing to cluster-devel... and this bug is closed, thanks.

Yes, there will be a number of new patches and they are well worth having. Not really sure which resolution is best; none of them seem to fit the situation really, so I've picked what I think is the closest.
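A workload like the one described - huge numbers of small files plus a few very large ones - is straightforward to approximate in miniature for a smoke test. The sketch below is a hypothetical Python generator, not the reporter's actual application; the fan-out layout, counts, and sizes are illustrative assumptions scaled far down from "tens of millions":

```python
# Minimal sketch of a small-file workload generator for exercising a
# file system such as GFS2. All paths, counts, and sizes are assumed
# for illustration, scaled down from the workload described above.
import os
import tempfile

def write_small_files(root, count=100, size=10 * 1024):
    """Write `count` files of `size` bytes under `root`, fanned out
    into 16 subdirectories so no single directory grows too large."""
    for i in range(count):
        subdir = os.path.join(root, f"{i % 16:02x}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"file{i:08d}.dat"), "wb") as f:
            f.write(b"x" * size)
    return count

with tempfile.TemporaryDirectory() as root:
    written = write_small_files(root, count=100)
    found = sum(len(files) for _, _, files in os.walk(root))
    print(written, found)  # 100 100
```

On a real cluster test, `root` would point at the GFS2 mount, and the interesting part is running this concurrently from both blades - which is exactly the modify-on-blade1 / touch-on-blade2 pattern that triggered the original BUG.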