Description of problem:
When a cluster starts up initially, the first node's cman/openais gets killed when the second node joins.

Version-Release number of selected component (if applicable):
This behaviour can be seen with 5.2 and 5.3.

How reproducible:

Steps to Reproduce:
1. Shut down cman on both nodes.
2. Start the cman initscript on node1; wait for the cluster to be fully running (i.e. the fence domain must be joined; validate with cman_tool services).
3. Start the cman initscript on node2; node1 will be killed by cman. No fencing takes place, but all cluster-dependent services crash.

Actual results:
cman/openais etc. crash on the first node of the cluster.

Expected results:
Both nodes should stay in the cluster.

Additional info:
This seems to be independent of node count and of qdisk and token timeouts. We've seen it with 2-node clusters with or without qdisk, and with a 3-node cluster with or without qdisk. This is reproducible.
Is this a crash (i.e. a segfault, and if so can we have a traceback, please)? Or are there messages in syslog saying why the node left the cluster?
Created attachment 331540 [details] cluster.conf and syslog of both nodes
We are experiencing the same issue on our two-node RHEL 5.3 cluster with a quorum disk. The messages and cluster.conf are attached. This happened at 18:12 on 10-02-2009. Greetings, Daniel
There is no traceback yet, but the syslogs look as described by Daniel. Should we set ulimit -c and then start up the cluster again?
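For what it's worth, a minimal sketch of enabling core dumps (assuming bash, and that the limit is raised in the shell that launches the cman initscript; where the core actually lands depends on the daemon's working directory and core_pattern):

```shell
# Raise the core-dump limit for this shell and its children, so that if
# aisexec segfaults it leaves a core file we can pull a traceback from.
ulimit -c unlimited
echo "core limit: $(ulimit -c)"
# A core file would then appear in the daemon's working directory,
# subject to /proc/sys/kernel/core_pattern.
```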
Feb 10 18:11:32 dmse1-vm openais[1880]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart

That's the important message. It indicates a cluster split and rejoin. Or it can be caused by Cisco switches being very slow to adjust to the multicast addresses. See bugzillas passim.
Note:
1. We already set the token timeout to 54000 as described in the cluster FAQ.
2. This also happens on virtual machines connected to a locally terminated bridge.
3. It also happens on non-Cisco switches.
The FAQ only recommends that token timeout for huge clusters. You shouldn't need it for anything under 30 nodes.
Hello, we are using Foundry core switches, type RX16.
Yes, we only set it because of timeouts related to qdisk and multipathing. I don't think it's related to the switches, as we see this on nearly all clusters with different switches, and even on locally terminated virtual guest clusters with no switch involved at all. Marc.
Looking closely at the syslogs, my guess is that the two nodes have come up as independent single-node clusters and then joined together AFTER services have started on each. A couple of suggestions: get cman_tool to wait for quorum before services get started, using 'cman_tool join -wq', or (horrible hack!) put a sleep before the service start to allow the cluster to stabilise before they get going.
We already set CMAN_QUORUM_TIMEOUT in the initscript to 60 seconds (totem timeout 54000 < 60000), and the cluster came up quorate before joining the fence domain. But the problem still persists.

Side note: when we set the timeout for the fence join very low (FENCE_START_TIMEOUT=30) and then boot the first node, the join of the fence domain fails. Then we boot the second node, and this time it works. Then we can join the first node to the fence domain (fence_tool join), but we have to wait for the second node. In other words, if we do it the other way round (which implies a BIG sleep):

1. Boot the first node (it fails to join the fence domain because of the low timeout)
2. fence_tool join on the first node
3. Boot the second node => the first node gets killed

I read this as meaning that manipulating the timeouts does not really help.

Marc.
(In reply to comment #9) > Yes we only set it because of timeouts related to qdisk and multipathing. > > I don't think it's related to the switches as we have it on nearly all clusters > with different switches and even on locally terminated virtual guest clusters > without any switch being involved. > > Marc.
Can you enable full CMAN logging and post the output from all nodes please ?
Oops... :) Sorry about that quote; I checked a bad box. Here's a silly question: can you try switching the multicast address to 225.0.0.11 or something? We used to use this address years ago, and still have people using it on RHEL3 clusters. Some soft-switch vendors don't correctly forward site-local mcast addresses, despite the IANA specs. Now, it's in the 'reserved' address range, but it has seemed to work in years past, and has worked at least once where the default multicast address was being ignored by an "intelligent" switch.
http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_cisco_switches Solution #2 in the above link -- how to make cman give a different multicast address to openais.
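If I read that FAQ entry right, the override is a multicast tag inside the cman section of cluster.conf; a sketch of the fragment (the address below is just an example, pick one suitable for your network):

```xml
<cman>
    <multicast addr="225.0.0.12"/>
</cman>
```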
(In reply to comment #13) > Can you enable full CMAN logging and post the output from all nodes please ? How can we do this?
I'm not sure what I noted in comment #15 will have any effect on a Xen cluster. Hrm.
Created attachment 331581 [details] cluster.conf cman status with multicast
Hi, we tried the multicast tag in a real-server cluster, and still get the same error.
<logging to_file="yes" logfile="/tmp/ais.log" fileline="on">
    <logger ident="CMAN" debug="on"/>
</logging>
That's strange; your logs were claiming a different multicast address:

Feb 11 16:56:35 xenlx08 openais[8957]: [MAIN ] Using default multicast address of 239.192.111.201

... netstat -an | grep 5405 on my machine says the following:

udp 0 0 192.168.122.4:5405 0.0.0.0:*
udp 0 0 225.0.0.12:5405 0.0.0.0:*

(5405 is what openais uses for multicast). It should say the same on your machine... (well, except the local IP will be different).
Wait, openais isn't chkconfig'd on, is it?
It's saying the same on this machine. We checked it with cman_tool status and it showed 225.0.0.12. You need to look at the logs at about 17:04 (our time) and later. BTW: we don't have openais chkconfig'd on, because everything is started via the cman initscript.
Thanks, Marc. So, we start:

Feb 11 17:01:08 xenlx07 openais[5043]: [CMAN ] quorum device registered
Feb 11 17:01:08 xenlx07 qdiskd[5094]: <notice> Score sufficient for master operation (20/20; required=10); upgrading
Feb 11 17:01:23 xenlx07 qdiskd[5094]: <info> Assuming master role
Feb 11 17:01:26 xenlx07 qdiskd[5094]: <notice> Writing eviction notice for node 2
Feb 11 17:01:26 xenlx07 openais[5043]: [CMAN ] quorum regained, resuming activity
Feb 11 17:01:29 xenlx07 qdiskd[5094]: <notice> Node 2 evicted

We're running fine, and the other node boots:

Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] CLM CONFIGURATION CHANGE
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] New Configuration:
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] r(0) ip(192.168.0.1)
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] Members Left:
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] Members Joined:
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] CLM CONFIGURATION CHANGE
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] New Configuration:
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] r(0) ip(192.168.0.1)
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] r(0) ip(192.168.0.2)
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] Members Left:
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] Members Joined:
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] r(0) ip(192.168.0.2)
Feb 11 17:04:44 xenlx07 openais[5043]: [SYNC ] This node is within the primary component and will provide service.
Feb 11 17:04:44 xenlx07 openais[5043]: [TOTEM] entering OPERATIONAL state.
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] got nodejoin message 192.168.0.1
Feb 11 17:04:44 xenlx07 openais[5043]: [CLM ] got nodejoin message 192.168.0.2
Feb 11 17:04:44 xenlx07 openais[5043]: [CPG ] got joinlist message from node 1
Feb 11 17:04:44 xenlx07 openais[5043]: [CPG ] got joinlist message from node 2
Feb 11 17:04:46 xenlx07 ntpd[5512]: synchronized to LOCAL(0), stratum 10
Feb 11 17:04:46 xenlx07 ntpd[5512]: kernel time sync enabled 0001

So, quorate, we are running fine, another node joins our quorate cluster, and CMAN gets an MDK message:

Feb 11 17:05:00 xenlx07 openais[5043]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart

Consequently, when the other node boots, everything looks good:

Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] CLM CONFIGURATION CHANGE
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] New Configuration:
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] r(0) ip(192.168.0.2)
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] Members Left:
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] Members Joined:
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] CLM CONFIGURATION CHANGE
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] New Configuration:
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] r(0) ip(192.168.0.1)
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] r(0) ip(192.168.0.2)
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] Members Left:
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] Members Joined:
Feb 11 17:04:50 xenlx08 openais[4914]: [CLM ] r(0) ip(192.168.0.1)
Feb 11 17:04:50 xenlx08 openais[4914]: [SYNC ] This node is within the primary component and will provide service.
Feb 11 17:04:50 xenlx08 openais[4914]: [TOTEM] entering OPERATIONAL state.

Except this little thing:

Feb 11 17:04:50 xenlx08 openais[4914]: [MAIN ] Node xenlx07-ic not joined to cman because it has existing state
Ok, but what do you read from "Node xenlx07-ic not joined to cman because it has existing state"?

And note that if we:

0. Set FENCED_TIMEOUT=30 (for example)
1. Boot the first node (it fails to join the fence domain because of the low timeout)
2. Boot the second node (it successfully joins the fence domain)
3. fence_tool join on the first node

it just works. What do you think about that?

Thanks
Marc.
I can't say why it doesn't join the fence domain. But the rest makes perfect sense: if only one node has state, then the node without state (i.e. with no fence or other daemons running) can join it quite happily. If both nodes have state before they see each other, then you get that "cannot join" message. I wonder if qdisk is causing single nodes to have quorum even though their aisexec processes are not talking to each other?
I don't think it's related only to qdisk, because it also happens without qdisk.
I'm attaching new logs (including openais logs). The logs only show the cluster join, as I cleared them beforehand. Note: this is a cluster with an older openais, kernel, etc. (RHEL5.2), but the problem looks the same. Perhaps I can provide new logs for 5.3 today.
Created attachment 331680 [details] syslog, cluster.conf and aislog for node1 (axqa03-1)
Created attachment 331681 [details] syslog, cluster.conf and aislog for node2 (axqa03-2)
Here are new logs with a RHEL5.3 cluster. The same problems again, but afterwards the fence domain is stuck in JOIN_WAIT status, which leads to an unusable cluster. That's different from the other clusters, but the first node still gets killed. Marc.
Created attachment 331691 [details] syslog, aislog of RHEL5.3 cluster
syslog output of the first node started:

Feb 12 17:22:50 dmse1-vm openais[21072]: [logging.c:0042] cman killed by node 2 because we rejoined the cluster without a full restart
Feb 12 17:22:50 dmse1-vm gfs_controld[21131]: cluster is down, exiting
Feb 12 17:22:50 dmse1-vm dlm_controld[21125]: cluster is down, exiting
Feb 12 17:22:50 dmse1-vm fenced[21119]: cluster is down, exiting
Feb 12 17:22:50 dmse1-vm kernel: dlm: closing connection to node 2
Feb 12 17:22:50 dmse1-vm kernel: dlm: closing connection to node 1
Feb 12 17:22:50 dmse1-vm qdiskd[21102]: <err> cman_dispatch: Host is down
Feb 12 17:22:50 dmse1-vm qdiskd[21102]: <err> Halting qdisk operations
Feb 12 17:23:16 dmse1-vm ccsd[21066]: Unable to connect to cluster infrastructure after 30 seconds.
Feb 12 17:23:46 dmse1-vm ccsd[21066]: Unable to connect to cluster infrastructure after 60 seconds.

ais.log:

[totemsrp.c:1842] aru a high delivered a received flag 1
[totemsrp.c:1950] Did not need to originate any messages in recovery.
[totemsrp.c:4080] Sending initial ORF token
[clm.c:0513] CLM CONFIGURATION CHANGE
[clm.c:0514] New Configuration:
[clm.c:0516] r(0) ip(192.168.0.21)
[clm.c:0518] Members Left:
[clm.c:0523] Members Joined:
[clm.c:0513] CLM CONFIGURATION CHANGE
[clm.c:0514] New Configuration:
[clm.c:0516] r(0) ip(192.168.0.21)
[clm.c:0516] r(0) ip(192.168.0.22)
[clm.c:0518] Members Left:
[clm.c:0523] Members Joined:
[clm.c:0525] r(0) ip(192.168.0.22)
[sync.c:0321] This node is within the primary component and will provide service.
[totemsrp.c:1678] entering OPERATIONAL state.
[clm.c:0595] got nodejoin message 192.168.0.21
[clm.c:0595] got nodejoin message 192.168.0.22
[cpg.c:0884] got joinlist message from node 1
[logging.c:0042] cman killed by node 2 because we rejoined the cluster without a full restart

I will attach the ais.log of the second node.
Created attachment 331710 [details] ais.log of node which started later
Chrissi, Lon, have you already been able to reproduce the issue? -Mark
Christine, do we have everything we need to move forward on this issue? There are many different business projects on hold as a result of this undocumented feature.
(In reply to comment #36) > Chrissi, Lon, > have you already been able to reproduce the issue ? > -Mark I haven't been able to reproduce this at all on my 2-node KVM or my 5-node baremetal cluster.
Lon, are you using RHEL5.[23] with all rpms provided from this channel? And did you:

1. Boot up node1.
2. Wait until it has joined the fence domain successfully (cman_tool services shows everything OK).
3. Boot up node2.
4. Let it join the cluster.
5. node1 will be thrown out of the cluster.

I have two KVM nodes installed with RHEL5.2/3, and if I boot them this way the problem happens. What do you think?

Regards
Marc.
I have tried all sorts of things in attempts to reproduce the problem, and the only thing that works for me is to start the two nodes up disconnected and with quorum forced at 1. Which is a highly unusual (and pathological) setup!
Hello, I have the same situation with a two-node physical cluster. It was previously on 5U2 with qdisk and fence_ilo as the fencing method. I had to use a workaround to get it working on 5U2: start qdisk at runlevel position 20, before cman (21), instead of the default 22. Otherwise there were problems starting a node alone. This workaround had to change in U3, since the qdisk start is now included in the cman script itself. (Note: I didn't apply the November patches related to cluster.)

After updating to U3 I encounter the same problems as you. I'm going to attach my situation with logs from both nodes; the file is "boot_order_first_oracs1_then_oracs2.txt".

The problem is symmetrical: if I change the start sequence of the nodes, I get the same result, with the first node's cman killed by the other.

What is very bad in my opinion is that on the first node the services (in my case some filesystems, an Oracle DB and an IP) remain active, but for example clvmd has problems and it is impossible to manage services...

Hope this helps with debugging; I'm available to help with testing if necessary. We would like to go to production with the U3 packages before the 28th of February, if possible.

Gianluca
Created attachment 332018 [details] both nodes powered off. start of oracs1, then oracs2. logs
A note on my own posts #41 and #42: as I have a case open, I was in the meantime asked to try downgrading openais, without success. So in my logs the openais version is openais-0.80.3-15.el5.x86_64 instead of openais-0.80.3-22.el5.x86_64 as provided by default with U3. Gianluca
Aha! I managed to reproduce it at last.

The problem seems to be related to qdisk, though it might need to be solved elsewhere. What seems to happen is that the first node starts up on its own and becomes quorate thanks to qdisk. So it starts fencing, and thus the "dirty" bit gets set. When that node is stable, the second node is started. This becomes quorate on its own (thanks to qdisk) and starts fencing, THEN it sees the other node. They try to merge and BANG, a fencing race happens.

I think this happens partly because of the way aisexec always joins a single-node cluster on its own before seeing any other nodes. As far as ais is concerned, all joins are really merges. But for cman's sake we can only allow that where there is no state (in this case fencing).

So the questions are:
- Is fencing 'state'? I appreciate that it's more than just an order to kill a node, but does it really need to preclude merges?
- Can we 'fix' qdiskd startup so that it doesn't let fenced get its state before the second node joins?
- Is it possible to set the dirty bit later? Is just starting a daemon really enough to preclude a join, or does it have to have done something first?
- What help do we need from cman to accomplish any of this? Bearing in mind that cman in RHEL5/6 already holds some fencing state of its own that might be useful.
Gianluca: Downgrading openais will make no difference. Downgrading cman might; it depends on whether what you are seeing is what I described in comment #44.
Michael Waite: Just to be clear, this isn't an undocumented feature. It's documented in the man pages and in my white paper on the subject. That doesn't change the fact that it's almost universally misunderstood though! What has changed in RHEL5.3 is that a bug has been fixed in cman that tightens the rules up regarding nodes merging with state which is, I suspect, why this seems to happen more frequently than it did.
Thanks, Christine, for your comment #45. I already knew that: I specified my version of openais to avoid anyone thinking that I had a misaligned 5U3, due perhaps to an update mistake or so...

Coming back to the problem: as per comment #44, if we start the two nodes at the same instant, and supposing they arrive at the cman start phase together, should we get a correct join of both of them? Can I go with this? I ask because when I'm in the situation where a node has cman killed but services still active, I'm not able to stop rgmanager cleanly and I have to force a power-off...

Thanks,
Gianluca
(In reply to comment #46)
> Michael Waite: Just to be clear, this isn't an undocumented feature. It's
> documented in the man pages and in my white paper on the subject. That doesn't
> change the fact that it's almost universally misunderstood though!
>
> What has changed in RHEL5.3 is that a bug has been fixed in cman that tightens
> the rules up regarding nodes merging with state which is, I suspect, why this
> seems to happen more frequently than it did.

Can we agree that we have found a (critical) bug here?
Good info posted above; it looks like the root cause is now understood. What is needed to resolve this issue in short order?
Yes, qdisk operation is totally broken in 5.3. It's a cman bug.
Chrissi, are you sure it is only broken with qdisk? I have the same problems without qdisk in a two-node cluster. Marc.
What happens is that if a single node has quorum, then another node joining that cluster will cause it to be killed. Now, it is possible to get into that situation without qdisk, for example by using two_node=1, or by forcing expected_votes to 1 when it really shouldn't be. So yes, it can happen without qdisk; it's not a bug in qdisk itself, but qdisk is the component most affected by it.
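For illustration only (these are not from this cluster's config), the two cluster.conf patterns just mentioned would look something like this; either one lets a lone node become quorate by itself:

```xml
<!-- two-node special mode: each node can be quorate on its own -->
<cman two_node="1" expected_votes="1"/>

<!-- or expected_votes forced low enough for one node to reach quorum -->
<cman expected_votes="1"/>
```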
Can I also ask for some quality control in the rgmanager stop script? With this problem, when you have the two nodes disconnected, you are unable to cleanly shut down the server. In my case: first start oracs1, then oracs2, then BOOM. Now I want to restart. A shutdown of oracs2 hangs at the rgmanager stop phase. Inside the code there is this:

stop_cluster()
{
        kill -TERM `pidof $RGMGRD`
        while [ 0 ]; do
                if [ -n "`pidof $RGMGRD`" ]; then
                        echo -n $"Waiting for services to stop: "
                        while [ -n "`pidof $RGMGRD`" ]; do
                                sleep 1
                        done
                        echo_success
                        echo
                else
                        echo $"Services are stopped."
                fi

So if a soft kill of the clurgmgrd process is not successful, it stays indefinitely at this point inside the while loop... Can this be the stop procedure for a cluster resource manager?

Also, passing the -d flag to rgmanager, you only get this in messages:

Feb 16 17:22:53 oracs2 shutdown[24147]: shutting down for system halt
Feb 16 17:22:53 oracs2 init: Switching to runlevel: 0
Feb 16 17:22:55 oracs2 modclusterd: shutdown succeeded
Feb 16 17:22:55 oracs2 rgmanager: [24238]: <notice> Shutting down Cluster Service Manager...

I would expect something more robust... Sorry, but I'm a little demotivated.
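For comparison, a bounded-wait stop could look roughly like the sketch below (my illustration, not the shipped initscript; "sleep 300" stands in for the clurgmgrd daemon so the example is self-contained):

```shell
# Start a stand-in "daemon" so this sketch can run anywhere.
sleep 300 &
pid=$!

# Ask it to stop, but only wait up to 30 seconds for it to exit.
kill -TERM "$pid"
timeout=30
while kill -0 "$pid" 2>/dev/null && [ "$timeout" -gt 0 ]; do
    sleep 1
    timeout=$((timeout - 1))
done

# If it is still alive after the grace period, escalate to SIGKILL
# instead of looping forever like the quoted stop_cluster() does.
kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"
echo "stop complete"
```

The point of the sketch is simply that a shutdown path should always terminate, even when the daemon ignores SIGTERM.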
I am sending this to Subhendu Ghosh outside of this thread. I really want to make sure that this issue has enough visibility.
I confirm that when starting both nodes at the same instant, they reach quorum and all is OK in this particular scenario.
I reproduced this in testing. My logs are here:
Created attachment 332231 [details] Patch to fix cman bug This patch has been tested by me and lon and should fix the cman problem. If you can test it on your system please do and let me know the results.
So I have to rebuild the cman src.rpm with this patch applied in order to test? Or do you have an RPM to try?
I don't have an rpm, sorry
This fix has been committed to the STABLE2 and STABLE3 trees. I'll wait until we get more feedback before adding it to RHEL5.
If you can wait until tomorrow-ish and no-one else steps in, I'll have a go at building a test RPM. I've never done it before though so don't expect too much...
Packages for x86_64 are here. http://people.redhat.com/ccaulfie/packages/ This is the first time I've built packages for other people's use so install them at your own risk. I won't be doing i686 packages because I don't have a RHEL5 system to do that on, sorry. If you try them please let me know how you get on. Thanks.
It seems to be the same behaviour. I'm going to attach three files:

first_node_start.log
second_node_start.log
first_node_killcman.log

Steps I've done:

1. Both nodes up (because started together).
2. Shutdown of the node without the service.
3. Shutdown of the node with the service.
4. Start up the nodes in single-user mode and update (with --force) to cman-2.0.98-1chrissie and openais-0.80.3-22.el5 (I had 0.80.3-15, as initially openais seemed to be the cause of the problem in another thread).
5. Start up both nodes at the same instant, to be sure it is still OK now that I have updated cman and openais. And in fact it is OK.
6. Shutdown of the node without the service (call it first_node).
7. Shutdown of the node with the service (call it second_node).
8. Start up first_node (it shouldn't matter which one I power on first, should it?).
9. Start up second_node. Boom.

In both nodes you can see in the logs:

Feb 18 12:27:55 oracs2 openais[5750]: [CMAN ] CMAN 2.0.98 (built Feb 18 2009 08:49:44) started
Feb 18 12:35:35 oracs1 openais[5743]: [CMAN ] CMAN 2.0.98 (built Feb 18 2009 08:49:44) started

so I think it really is the correct cman from chrissie...

Hope this helps,
Gianluca
Created attachment 332362 [details] first node start logs
Created attachment 332363 [details] second node start logs
Created attachment 332364 [details] first node cman killed by second node
Can you post the full logs please? There seems to be quite a large time gap missing from those. In particular, there is no message on the first node saying that the second one joined with state, just one on the second node saying that it was killed. Also, if the order you give is correct, that's a different symptom from the original one I was investigating, where the first node was killed by the second. Can you post the cluster.conf, and also enable qdiskd debugging messages (log_level="7") too, please?
I can attach the full logs; I sent them truncated for easier reading. But it IS the same problem: the first node to start is oracs2 in this test. At 12:29 it finishes starting the cluster infrastructure and the Oracle service on cluster logical volumes; that is the file first_node_start.log. Then I wait for some minutes, and at 12:35 I power on the second node, which in this test case is oracs1. Up to 12:35 there are no further messages in the logs of oracs2 (aka the first node of this test). At 12:35 you get the logs of the second node starting (oracs1, see file second_node_start.log) and the messages on the first node (oracs2, see file first_node_killcman.log).

In this last log file, which is from the first node (oracs2), you can see in particular an initial join and then suddenly a kill, exactly as before:

Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] CLM CONFIGURATION CHANGE
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] New Configuration:
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] r(0) ip(192.168.16.8)
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] Members Left:
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] Members Joined:
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] CLM CONFIGURATION CHANGE
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] New Configuration:
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] r(0) ip(192.168.16.1)
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] r(0) ip(192.168.16.8)
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] Members Left:
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] Members Joined:
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] r(0) ip(192.168.16.1)
Feb 18 12:35:34 oracs2 openais[5750]: [SYNC ] This node is within the primary component and will provide service.
Feb 18 12:35:34 oracs2 openais[5750]: [TOTEM] entering OPERATIONAL state.
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] got nodejoin message 192.168.16.1
Feb 18 12:35:34 oracs2 openais[5750]: [CLM ] got nodejoin message 192.168.16.8
Feb 18 12:35:34 oracs2 openais[5750]: [CPG ] got joinlist message from node 2
Feb 18 12:35:50 oracs2 openais[5750]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

Let me know if that clarifies things or if I still need to send the full logs.

Gianluca

BTW: do I have any way to cleanly shut down the nodes when I'm in this situation? For example with manual options of cman_tool or other commands? I'm powering the nodes off and on... and I don't like it very much.

Thanks,
Gianluca
Created attachment 332370 [details] cluster.conf of oracs1 and oracs2 version 39
Please, please *always* attach full logs. I'd much rather have 2GB of log files to wade through than 1K of truncated logs that don't show what I'm looking for. I'm very good at filtering log files, it's my job and I've been doing it for a very long time now! And it's quite possible that I might spot something important that looks insignificant to you.
OK, I'm going to provide full logs. Sorry, I didn't mean to offend... But you do agree that it IS the same situation, don't you?

In the meantime I changed the configuration:

[root@oracs1 ~]# ccs_tool update /etc/cluster/cluster.conf
Config file updated from version 39 to 40
Update complete.
[root@oracs1 ~]# cman_tool version -r 40
[root@oracs1 ~]# echo $?
0

diff between cluster.conf 39 and 40:

< <cluster alias="oracs" config_version="39" name="oracs">
---
> <cluster alias="oracs" config_version="40" name="oracs">
< <quorumd device="/dev/mapper/mpath3" interval="3" label="acsquorum" log_level="4" tko="5" votes="1">
---
> <quorumd device="/dev/mapper/mpath3" interval="3" label="acsquorum" log_level="7" tko="5" votes="1">

Do I have to restart the cluster again for the qdiskd logging changes to be applied?
Created attachment 332383 [details] full messages for oracs1
Created attachment 332384 [details] full messages for oracs2
Actually, no, it isn't the same problem, for the reasons I posted in comment #69. It looks similar on the face of it, but it's subtly different.
Sorry, typo in the last comment. You also need to do this in cluster.conf:

<quorumd ... log_facility="local4" ... >

Then add this to syslog.conf:

local4.* /var/log/qdiskd.log

That will send qdiskd's log messages to /var/log/qdiskd.log. After doing this, restart qdiskd + syslogd.
Gianluca - there's a problem with my RPM, can you try this one: http://people.redhat.com/ccaulfie/packages/cman-2.0.98-1subodh.i386.rpm [disclaimer in comments 61 & 62 are relevant here!] Sorry.
No problem; I will test the new cman RPM.

OK, I set up the logging and file for qdiskd. At the moment I have this on the nodes in qdiskd.log:

Feb 18 16:50:40 oracs1 openais[5754]: [CMAN ] lost contact with quorum device
Feb 18 17:12:28 oracs1 qdiskd[8502]: <debug> Heuristic: 'ping -c1 -w1 10.4.5.250' score=1 interval=2 tko=3
Feb 18 17:12:28 oracs1 qdiskd[8502]: <debug> 1 heuristics loaded
Feb 18 17:12:28 oracs1 qdiskd[8502]: <debug> Quorum Daemon: 1 heuristics, 3 interval, 5 tko, 1 votes
Feb 18 17:12:28 oracs1 qdiskd[8502]: <debug> Run Flags: 00000031
Feb 18 17:12:28 oracs1 qdiskd[8502]: <info> Quorum Partition: /dev/dm-5 Label: acsquorum
Feb 18 17:12:28 oracs1 qdiskd[8503]: <info> Quorum Daemon Initializing
Feb 18 17:12:28 oracs1 qdiskd[8503]: <debug> I/O Size: 512 Page Size: 4096
Feb 18 17:12:29 oracs1 qdiskd[8503]: <info> Heuristic: 'ping -c1 -w1 10.4.5.250' UP
Feb 18 17:12:43 oracs1 qdiskd[8503]: <info> Initial score 1/1
Feb 18 17:12:43 oracs1 qdiskd[8503]: <info> Initialization complete
Feb 18 17:12:43 oracs1 qdiskd[8503]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
Feb 18 17:12:49 oracs1 qdiskd[8503]: <debug> Making bid for master
Feb 18 17:12:52 oracs1 qdiskd[8503]: <debug> Node 2 is UP
Feb 18 17:13:01 oracs1 qdiskd[8503]: <info> Assuming master role

Feb 18 16:50:41 oracs2 openais[5777]: [CMAN ] lost contact with quorum device
Feb 18 17:12:44 oracs2 qdiskd[538]: <debug> Heuristic: 'ping -c1 -w1 10.4.5.250' score=1 interval=2 tko=3
Feb 18 17:12:44 oracs2 qdiskd[538]: <debug> 1 heuristics loaded
Feb 18 17:12:44 oracs2 qdiskd[538]: <debug> Quorum Daemon: 1 heuristics, 3 interval, 5 tko, 1 votes
Feb 18 17:12:44 oracs2 qdiskd[538]: <debug> Run Flags: 00000031
Feb 18 17:12:44 oracs2 qdiskd[538]: <info> Quorum Partition: /dev/dm-5 Label: acsquorum
Feb 18 17:12:44 oracs2 qdiskd[539]: <info> Quorum Daemon Initializing
Feb 18 17:12:44 oracs2 qdiskd[539]: <debug> I/O Size: 512 Page Size: 4096
Feb 18 17:12:45 oracs2 qdiskd[539]: <info> Heuristic: 'ping -c1 -w1 10.4.5.250' UP
Feb 18 17:12:50 oracs2 qdiskd[539]: <debug> Node 1 is UP
Feb 18 17:12:59 oracs2 qdiskd[539]: <info> Initial score 1/1
Feb 18 17:12:59 oracs2 qdiskd[539]: <info> Initialization complete
Feb 18 17:12:59 oracs2 qdiskd[539]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
Feb 18 17:13:02 oracs2 qdiskd[539]: <info> Node 1 is the master

The time gap (16:50 vs 17:12) is because the restart command for qdiskd reported success but actually failed to restart the daemon:

[root@oracs2 ~]# service qdiskd restart
Stopping the Quorum Disk Daemon: [ OK ]
Starting the Quorum Disk Daemon: [ OK ]
[root@oracs2 ~]# service qdiskd status
qdiskd is stopped
[root@oracs2 ~]# service qdiskd start
Starting the Quorum Disk Daemon: [ OK ]

But I don't understand why you say it is a different situation. In the logs from two days ago I had this scenario: start oracs1, then after 2 minutes oracs2, and boom. Part of the logs of the second node (oracs2) were:

Feb 16 10:17:10 oracs2 openais[5896]: [SYNC ] This node is within the primary component and will provide service.
Feb 16 10:17:10 oracs2 ccsd[5885]: Error while processing connect: Connection refused
Feb 16 10:17:10 oracs2 openais[5896]: [TOTEM] entering OPERATIONAL state.
Feb 16 10:17:10 oracs2 openais[5896]: [MAIN ] Node node01 not joined to cman because it has existing state
Feb 16 10:17:10 oracs2 openais[5896]: [CLM ] got nodejoin message 192.168.16.1
Feb 16 10:17:10 oracs2 openais[5896]: [CLM ] got nodejoin message 192.168.16.8
Feb 16 10:17:10 oracs2 openais[5896]: [CPG ] got joinlist message from node 1
Feb 16 10:17:11 oracs2 ccsd[5885]: Cluster is not quorate. Refusing connection.

Today the start-order scenario was inverted: start oracs2, then after 6 minutes start oracs1, and boom. Part of the logs of the second node (now oracs1) are:

Feb 18 12:35:37 oracs1 openais[5743]: [SYNC ] This node is within the primary component and will provide service.
Feb 18 12:35:37 oracs1 openais[5743]: [TOTEM] entering OPERATIONAL state.
Feb 18 12:35:37 oracs1 openais[5743]: [MAIN ] Node node02 not joined to cman because it has existing state
Feb 18 12:35:37 oracs1 openais[5743]: [CLM ] got nodejoin message 192.168.16.1
Feb 18 12:35:37 oracs1 openais[5743]: [CLM ] got nodejoin message 192.168.16.8
Feb 18 12:35:37 oracs1 openais[5743]: [CPG ] got joinlist message from node 2
Feb 18 12:35:37 oracs1 ccsd[5732]: Cluster is not quorate. Refusing connection.

So the hosts are inverted but the messages are the same, in my opinion. Or have I misunderstood your comment #69?
Adding Subhendu Ghosh
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
The qdiskd initscript 'restart' doesn't work; there's a bug open about this. "service qdiskd stop; sleep 3; service qdiskd start" ...should work fine.
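The stop/sleep/start sequence above can be wrapped in a small helper. This is only a sketch of the workaround (the service name and the 3-second pause come from the comment above); the final 'status' call is there because the broken 'restart' action reports OK even when the daemon is not actually running:

```shell
#!/bin/sh
# Sketch of the suggested workaround for the broken 'restart' action:
# stop, wait, start, then confirm the daemon actually came up.
restart_qdiskd() {
    service qdiskd stop
    sleep 3
    service qdiskd start
    # 'service qdiskd restart' can report OK while qdiskd is stopped,
    # so check the real status before trusting it:
    service qdiskd status
}
```

Run it as root on each node; if the final status check fails, the start did not actually succeed.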
I put Christine's fixed source RPM through our build system and tested it on my cluster; here are the resulting packages:

http://people.redhat.com/lhh/cman-2.0.98-2chrissie.src.rpm
http://people.redhat.com/lhh/cman-2.0.98-2chrissie.i386.rpm
http://people.redhat.com/lhh/cman-devel-2.0.98-2chrissie.i386.rpm
http://people.redhat.com/lhh/cman-2.0.98-2chrissie.x86_64.rpm
http://people.redhat.com/lhh/cman-devel-2.0.98-2chrissie.x86_64.rpm

The first pass was a bad build, as previously noted.
I confirm that with the cman-2.0.98-2 RPM the nodes are able to join the cluster correctly, even when one starts several minutes after the other. Christine, in comment #46 you referred to a white paper of yours: do you have a link for it? Since RHEL Product and Program Management is also watching this, can we hope for better rgmanager shutdown logic (see comment #53) and overall quality (see the qdiskd restart problem, comment #85)? And also some sort of document giving a clear workflow of the several pieces involved (cman, ccsd, fence, qdiskd, rgmanager, openais, etc.), so that in case of failure of one submodule one has a way to know the dependencies and the correct restart path? Thanks for the help. Gianluca
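On the workflow question: lacking such a document, here is a rough sketch of the manual start order that worked in this thread. The service names are the stock RHEL 5 ones; the ordering is an assumption based on the comments here (note the cman initscript also starts qdiskd itself when qdiskd is chkconfig'd on), not an official dependency chart:

```shell
#!/bin/sh
# Hedged sketch of a manual cluster-stack start order on RHEL 5,
# based on the steps reported in this thread. Adjust to taste.
start_cluster_stack() {
    for svc in cman qdiskd rgmanager ricci; do
        service "$svc" start || {
            echo "failed to start $svc, stopping here" >&2
            return 1
        }
    done
}
```

Stopping is done in the reverse order; a failure of one piece aborts the sequence so you can inspect the logs before continuing.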
Sorry for the confusion over the RPMs. I have had my wrist gently slapped and I have renounced the building of them on my systems ;-) My white papers are all here: http://people.redhat.com/ccaulfie/docs/ The one I referred to is aiscman.pdf but CSNetworking.pdf and ClusterPic.pdf might also be useful to you.
About the other bugs: can you please report them separately? If you leave them as a single comment in an unrelated bugzilla they will unfortunately get lost.
I've committed the fix for 5.4. Still waiting for ACKs for 5.3.z.

commit 0bc0fbcf186c76eae15c17aef1e09ce90d34a771
Author: Christine Caulfield <ccaulfie>
Date:   Tue Feb 17 17:14:00 2009 +0000

    cman: Don't eject nodes that were quorate on their own if we have no state
I can also confirm that this bug is fixed. Thanks, that's really great news! I also backported the cman RPMs to fix this bug in RHEL 5.2 (cman 2.0.84). They can be downloaded here:

http://download.atix.de/yum/comoonics/testrpms/cman-2.0.84-2.4.i386.rpm
http://download.atix.de/yum/comoonics/testrpms/cman-2.0.84-2.4.x86_64.rpm
http://download.atix.de/yum/comoonics/testrpms/cman-devel-2.0.84-2.4.i386.rpm
http://download.atix.de/yum/comoonics/testrpms/cman-devel-2.0.84-2.4.x86_64.rpm
http://download.atix.de/yum/comoonics/testrpms/cman-2.0.84-2.4.src.rpm

They are neither signed nor verified and I don't want to be responsible for any harm they cause, but in my tests they worked perfectly well. Marc.
Could you please tell us if there will be an appropriate errata for this bug? Marc.
Marking as modified, fix has been committed for 5.4.
Hi. I have the same problem. I applied the RPM http://people.redhat.com/lhh/cman-2.0.98-2chrissie.x86_64.rpm and that solved it, but I have another problem: when I shut down one of the nodes, the services do not relocate to the remaining node. Can you help me, please?
I have just done this and for me it is OK. My cluster.conf is among the attachments. If I have the ACSSRV service active on a node (IP + 4 filesystems + Oracle script) and I run shutdown -r on it, the service is started on the other node. messages on the other node contains:

Feb 26 19:24:53 oracs1 clurgmgrd[6784]: <notice> Member 2 shutting down
Feb 26 19:24:59 oracs1 clurgmgrd[6784]: <notice> Starting stopped service service:ACSSRV
Feb 26 19:24:59 oracs1 qdiskd[5932]: <info> Node 2 shutdown
Feb 26 19:25:05 oracs1 qdiskd[5932]: <info> Assuming master role
Feb 26 19:25:08 oracs1 openais[5889]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 26 19:25:08 oracs1 openais[5889]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Feb 26 19:25:08 oracs1 openais[5889]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Feb 26 19:25:08 oracs1 openais[5889]: [TOTEM] entering GATHER state from 2.
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] entering GATHER state from 0.
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] Creating commit token because I am the rep.
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] Saving state aru 50 high seq received 50
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] Storing new sequence id for ring 4d9ac
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] entering COMMIT state.
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] entering RECOVERY state.
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] position [0] member 192.168.16.1:
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] previous ring seq 317864 rep 192.168.16.1
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] aru 50 high delivered 50 received flag 1
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] Did not need to originate any messages in recovery.
Feb 26 19:25:13 oracs1 openais[5889]: [TOTEM] Sending initial ORF token
Feb 26 19:25:13 oracs1 openais[5889]: [CLM ] CLM CONFIGURATION CHANGE
Feb 26 19:25:13 oracs1 openais[5889]: [CLM ] New Configuration:
Feb 26 19:25:13 oracs1 kernel: dlm: closing connection to node 2
Feb 26 19:25:13 oracs1 openais[5889]: [CLM ] r(0) ip(192.168.16.1)
Feb 26 19:25:13 oracs1 kernel: kjournald starting. Commit interval 5 seconds
Feb 26 19:25:13 oracs1 openais[5889]: [CLM ] Members Left:
Feb 26 19:25:13 oracs1 kernel: EXT3 FS on dm-18, internal journal
Feb 26 19:25:13 oracs1 openais[5889]: [CLM ] r(0) ip(192.168.16.8)
Feb 26 19:25:14 oracs1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 26 19:25:14 oracs1 openais[5889]: [CLM ] Members Joined:

You can see the mount of the first filesystem, and then the other resources of the service start OK. It takes about 30 seconds in total. HTH, Gianluca
Sorry, I didn't explain my test correctly. I have a 2-node cluster. When I power off a node (cutting its power completely), the services don't relocate. Is this the expected behaviour? I saw that the remaining node keeps trying to fence the powered-off node. Thanks,
Which fencing devices are you using? Normally the surviving node should query the status of the other, get an answer that it is already powered off, and finish without fencing at all. The problem is that if you detach all the power supplies from the server, the iLO is unpowered too (if, for example, you have one), and so the fencing node is not able to connect and get the status of the other. I simulated this behaviour by putting an iptables rule on one node and then panicking the other. What you get is:

Feb 26 18:52:18 oracs2 fenced[5968]: fencing node "node01"
Feb 26 18:52:28 oracs2 fenced[5968]: agent "fence_ilo" reports: Unable to connect/login to fencing device
Feb 26 18:52:28 oracs2 fenced[5968]: fence "node01" failed

and again every 5 seconds, but I don't think there is an easy solution to this. The important thing is to avoid corruption of data. So my scenario to simulate a problem between the production LAN and the iLO LAN was this:

- node1 with active service
- block the connection to the iLO of node2
- no message about this in the logs (it seems the fence LAN is not monitored until it is needed; is that true?)
- panic of node2
- node1 tries to fence it indefinitely
- then manual restart of node2 in single user mode
- iptables rule so that it is not able to see node1 (I assume a problem between the LANs, so that each node doesn't see the other node's iLO LAN... it might not be so in a real situation)
- init 3 on node2
- node2 starts the cluster infrastructure correctly and gets joined
- node1 doesn't try to fence node2 any more

You are still in a critical situation, because in case of a problem on node1, node2 will not be able to fence it and start the service... but I think this is probably more a matter of monitoring, perhaps. HIH, and for the others, if I'm wrong in my considerations, please let me know. Gianluca
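For anyone who wants to repeat that simulation, a minimal sketch of the iptables part. The iLO address here is a placeholder; simply dropping all outbound traffic to the address is enough to make fence_ilo unable to connect:

```shell
#!/bin/sh
# Sketch: make the peer's iLO unreachable so fence_ilo fails, as in
# the test above. ILO_IP is a hypothetical management-LAN address.
ILO_IP=${ILO_IP:-192.168.100.12}

block_ilo()   { iptables -I OUTPUT -d "$1" -j DROP; }
unblock_ilo() { iptables -D OUTPUT -d "$1" -j DROP; }

# block_ilo "$ILO_IP"     # run on the surviving node, then panic the peer
# unblock_ilo "$ILO_IP"   # restore connectivity after the test
```

Remember to remove the rule afterwards, or the node will keep failing to fence its peer in any later real incident.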
Hi Gianluca. My test was to detach the power supplies of node1 (which owns the services), simulating a hardware crash. node2 keeps trying to connect to the fence device of node1. My question is: is this the normal behaviour of the software, or an error in my configuration? My fence device is IPMI and the servers are Dell PowerEdge 1950. Thanks
Just giving my .02 cents here. Applied Christine's patch and all seems well. I have a 2-node cluster (RHEL5U3 x86_64) with qdisk on Dell 2950's using DRAC5 fencing (I will be trying this on a 3-node cluster without qdisk shortly). Bringing down the cluster and bringing up 1 node (which eventually fences node 2), the cluster forms and stays. Also, the issue stated in comment #53 is no longer an issue for me with this patch (rgmanager stop doesn't hang indefinitely any more). Thanks, Arwin
So it seems I have the chance to install another cluster from scratch. Systems are HP blades with updated RHEL 5 U3 x86_64 and the cman provided by Chrissie. Names are orastud1 and orastud2.

Packages installed on both:
rgmanager-2.0.46-1.el5
ricci-0.12.1-7.3.el5_3
device-mapper-1.02.28-2.el5
openais-0.80.3-22.el5
luci-0.12.1-7.3.el5_3
cman-2.0.98-2chrissie
device-mapper-multipath-0.4.7-23.el5_3.1

Running kernel on both is 2.6.18-128.1.6.el5. Multipath is set up and both nodes see the quorum disk LUN as mpath6. No cluster services started at all.

Steps done for the very initial startup of the cluster:

1) Format of the quorum disk:

# mkqdisk -c /dev/mapper/mpath6 -l orastudquorum
mkqdisk v0.6.0
Writing new quorum disk label 'orastudquorum' to /dev/mapper/mpath6.
WARNING: About to destroy all data on /dev/mapper/mpath6; proceed [N/y] ? y
Initializing status block for node 1...
..

2) Copy of the same cluster.conf file to both nodes (see attachment). At this moment it contains only two IP services (STUDSRV and DWHSRV) and is based on quorum.

3) On the first node:

[root@orastud1 cluster]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing...

It remains here because I forgot to start qdisk... ;-( In messages (I'm going to attach full logs too) I have:

Apr 8 18:06:37 orastud1 ccsd[8311]: Cluster is not quorate. Refusing connection.
Apr 8 18:06:37 orastud1 ccsd[8311]: Error while processing connect: Connection refused

and

[root@orastud1 ~]# cman_tool status
Version: 6.1.0
Config Version: 1
Cluster Name: orastud
Cluster Id: 14022
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: intraclstud1
Node ID: 1
Multicast addresses: 239.192.54.252
Node addresses: 10.10.16.201

Even if, actually, I expected the qdiskd startup incorporated in cman to come into play.... clarification? Anyway, in another console:

[root@orastud1 ~]# service qdiskd start
Starting the Quorum Disk Daemon: [ OK ]

After some seconds:

Apr 8 18:06:50 orastud1 qdiskd[8419]: <info> Initial score 1/1
Apr 8 18:06:50 orastud1 qdiskd[8419]: <info> Initialization complete
Apr 8 18:06:50 orastud1 openais[8322]: [CMAN ] quorum device registered
Apr 8 18:06:50 orastud1 qdiskd[8419]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
...
Apr 8 18:07:05 orastud1 ccsd[8311]: Error while processing connect: Connection refused
Apr 8 18:07:05 orastud1 openais[8322]: [CMAN ] quorum regained, resuming activity

and now

[root@orastud1 ~]# cman_tool status
Version: 6.1.0
Config Version: 1
Cluster Name: orastud
Cluster Id: 14022
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Quorum device votes: 1
Total votes: 2
Quorum: 2
Active subsystems: 8
Flags: Dirty
Ports Bound: 0
Node name: intraclstud1
Node ID: 1
Multicast addresses: 239.192.54.252
Node addresses: 10.10.16.201

and cman start on the first console completes with:

Starting fencing...
done
[ OK ]

4)

[root@orastud1 cluster]# service modclusterd start
Starting Cluster Module - cluster monitor:

In messages:

Apr 8 18:12:06 orastud1 modclusterd: startup succeeded
Apr 8 18:12:38 orastud1 kernel: dlm: Using TCP for communications
Apr 8 18:12:39 orastud1 clurgmgrd[8733]: <notice> Resource Group Manager Starting
Apr 8 18:12:45 orastud1 clurgmgrd[8733]: <notice> Starting stopped service service:STUDSRV
Apr 8 18:12:45 orastud1 clurgmgrd[8733]: <notice> Starting stopped service service:DWHSRV
Apr 8 18:12:46 orastud1 in.rdiscd[9304]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Apr 8 18:12:46 orastud1 in.rdiscd[9304]: Failed joining addresses
Apr 8 18:12:46 orastud1 in.rdiscd[9307]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Apr 8 18:12:46 orastud1 in.rdiscd[9307]: Failed joining addresses
Apr 8 18:12:47 orastud1 clurgmgrd[8733]: <notice> Service service:STUDSRV started
Apr 8 18:12:47 orastud1 clurgmgrd[8733]: <notice> Service service:DWHSRV started

I am now able to ping the two virtual IPs of the cluster from an external PC.

5)

[root@orastud1 cluster]# service ricci start
Starting ricci:

In messages:

Apr 8 18:13:32 orastud1 ricci: startup succeeded

and

[root@orastud1 ~]# clustat
Cluster Status for orastud @ Wed Apr 8 18:14:17 2009
Member Status: Quorate

Member Name          ID   Status
------ ----          ---- ------
intraclstud1         1    Online, Local, rgmanager
intraclstud2         2    Offline
/dev/dm-6            0    Online, Quorum Disk

Service Name         Owner (Last)   State
------- ----         ----- ------   -----
service:DWHSRV       intraclstud1   started
service:STUDSRV      intraclstud1   started

[root@orastud1 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-04-08 18:06:50  /dev/dm-6
   1   M      4   2009-04-08 18:05:50  intraclstud1
   2   X      0                        intraclstud2

[root@orastud1 ~]# cman_tool services
type   level name       id       state
fence  0     default    00010001 none
[1]
dlm    1     rgmanager  00020001 none
[1]

Now the tricky part: I want node2 to join this first started-up cluster.....
On node2:

6)

[root@orastud2 cluster]# service qdiskd start
Starting the Quorum Disk Daemon: [ OK ]

In messages:

Apr 8 18:17:04 orastud2 qdiskd[14574]: <info> Waiting for CMAN to start

why?

7)

[root@orastud2 cluster]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing...

BUM... it remains here... In messages, 4 times:

Apr 8 18:17:31 orastud2 ccsd[14660]: cluster.conf (cluster name = orastud, version = 1) found.
Apr 8 18:17:31 orastud2 ccsd[14660]: Remote copy of cluster.conf is from quorate node.
Apr 8 18:17:31 orastud2 ccsd[14660]: Local version # : 1
Apr 8 18:17:31 orastud2 ccsd[14660]: Remote version #: 1

then:

Apr 8 18:17:32 orastud2 openais[14671]: [CLM ] CLM CONFIGURATION CHANGE
Apr 8 18:17:32 orastud2 ccsd[14660]: Cluster is not quorate. Refusing connection.
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] New Configuration:
Apr 8 18:17:33 orastud2 ccsd[14660]: Error while processing connect: Connection refused
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] Members Left:
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] Members Joined:
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] CLM CONFIGURATION CHANGE
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] New Configuration:
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] r(0) ip(10.10.16.202)
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] Members Left:
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] Members Joined:
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] r(0) ip(10.10.16.202)
Apr 8 18:17:33 orastud2 openais[14671]: [SYNC ] This node is within the primary component and will provide service.
Apr 8 18:17:33 orastud2 openais[14671]: [TOTEM] entering OPERATIONAL state.
Apr 8 18:17:33 orastud2 openais[14671]: [CLM ] got nodejoin message 10.10.16.202
Apr 8 18:17:33 orastud2 ccsd[14660]: Cluster is not quorate. Refusing connection.
Apr 8 18:17:33 orastud2 ccsd[14660]: Error while processing connect: Connection refused
...
Apr 8 18:17:34 orastud2 qdiskd[14574]: <info> Quorum Partition: /dev/dm-12 Label: orastudquorum
Apr 8 18:17:34 orastud2 qdiskd[14574]: <info> Quorum Daemon Initializing
Apr 8 18:17:34 orastud2 ccsd[14660]: Cluster is not quorate. Refusing connection.
Apr 8 18:17:34 orastud2 ccsd[14660]: Error while processing connect: Connection refused
Apr 8 18:17:34 orastud2 ccsd[14660]: Cluster is not quorate. Refusing connection.
Apr 8 18:17:34 orastud2 ccsd[14660]: Error while processing connect: Connection refused
Apr 8 18:17:35 orastud2 qdiskd[14574]: <info> Heuristic: 'ping -c1 -w1 10.4.5.250' UP
...
Apr 8 18:17:43 orastud2 qdiskd[14574]: <info> Node 1 is the master
Apr 8 18:17:43 orastud2 dlm_controld[14700]: connect to ccs error -111, check ccsd or cluster status
Apr 8 18:17:43 orastud2 ccsd[14660]: Cluster is not quorate. Refusing connection.
Apr 8 18:17:43 orastud2 ccsd[14660]: Error while processing connect: Connection refused
...
Apr 8 18:17:49 orastud2 qdiskd[14574]: <info> Initial score 1/1
Apr 8 18:17:49 orastud2 qdiskd[14574]: <info> Initialization complete
Apr 8 18:17:49 orastud2 openais[14671]: [CMAN ] quorum device registered
Apr 8 18:17:49 orastud2 qdiskd[14574]: <notice> Score sufficient for master operation (1/1; required=1); upgrading

No messages at all on node1.....

8) Ctrl+C on node2 on the cman start command

9) service cman stop on node2:

Apr 8 18:22:05 orastud2 ccsd[14660]: Stopping ccsd, SIGTERM received.
Apr 8 18:22:05 orastud2 fenced[14694]: cluster is down, exiting
Apr 8 18:22:13 orastud2 dlm_controld[14700]: connect to ccs error -111, check ccsd or cluster status
Apr 8 18:22:13 orastud2 gfs_controld[14706]: connect to ccs error -111, check ccsd or cluster status

What am I doing wrong? Do I have to use a particular cman command the first time on node2? Or is this again the bug? Thanks, Gianluca
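The quorum arithmetic that ccsd keeps complaining about above can be checked mechanically. This is only a sketch that parses the 'Total votes' and 'Quorum' lines of `cman_tool status` output (field positions taken from the output pasted in this thread):

```shell
#!/bin/sh
# Sketch: decide whether a node is quorate from 'cman_tool status'
# output, i.e. whether Total votes >= Quorum.
is_quorate() {
    total=$(printf '%s\n' "$1" | awk '/^Total votes:/ {print $3}')
    quorum=$(printf '%s\n' "$1" | awk '/^Quorum:/ {print $2}')
    [ -n "$total" ] && [ -n "$quorum" ] && [ "$total" -ge "$quorum" ]
}

# Usage on a live node:
#   if is_quorate "$(cman_tool status)"; then echo quorate; fi
```

In the first cman_tool status above this gives 1 >= 2 (false, hence "Activity blocked"); after qdiskd registers, 2 >= 2 and activity resumes.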
Created attachment 338880 [details] cluster.conf as of comment#109
Created attachment 338881 [details] messages file of orastud1, first node to start the cluster the first time
Created attachment 338882 [details] messages file of orastud2, second node that should join the cluster the first time
As per my note in comment #109 ("Even if, actually, I expected the qdiskd startup incorporated in cman to come into play.... clarification?"): I was wrong. It depends on the fact that at that moment the qdiskd service was not enabled via chkconfig on that node. In the cman script the function start_qdisk() contains:

    /sbin/chkconfig --levels "$current_runlevel" qdiskd 2>/dev/null
    if [ $? -ne 0 ]; then
        # qdiskd doesn't start at this runlevel.
        return 0
    fi

so it is correct that I didn't see qdiskd starting. Anyway, I started it manually from another console session.
Just in case my issues are due to multicast propagation problems, how can I check for this? I found a utility named iperf in EPEL but I'm not sure if and how to use it for this test.... To which network is multicast bound in Red Hat Cluster? The intracluster one, or both? Thanks, Gianluca
The problems described starting from comment #109 were indeed due to multicast issues, and not to the cluster software. It would be good to have some tools to test and verify multicast shipped along with the cluster itself.
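Lacking a bundled tool, here is a rough sketch of what such a check could look like. The omping utility (if installed) can exercise the actual multicast path between the nodes; the helper below only sanity-checks that an address sits in the administratively-scoped 239.0.0.0/8 range that cman's defaults come from. The multicast address is the one from the cman_tool status output above; omping availability is an assumption:

```shell
#!/bin/sh
# Sketch of a multicast sanity check for the intracluster network.
# is_admin_scoped: true if the address is in 239.0.0.0/8 (RFC 2365),
# the administratively-scoped range used by cman's default addresses.
is_admin_scoped() {
    case "$1" in
        239.*) return 0 ;;
        *)     return 1 ;;
    esac
}

MCAST=239.192.54.252    # from 'cman_tool status' on this cluster

is_admin_scoped "$MCAST" && echo "$MCAST is admin-scoped"

# If omping is available, run it on all nodes at the same time to
# verify that multicast traffic really propagates between them:
#   omping -m "$MCAST" intraclstud1 intraclstud2
```

Run the omping step on the intracluster interfaces, since that is the network cman binds its multicast traffic to.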
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Cause: a bug in the 'openais-only' detection code inside cman. If a node became quorate on its own, say because two_node=1 was set or a quorum disk was present, then other services were allowed to start. This set the "Dirty" flag in CMAN to indicate that it has state that cannot be reproduced with a full restart. The check for rejecting node joins only checked that the remote node did not have state.

Consequence: if another node then joined the cluster, it would be rejected even if it didn't have any state. This meant that the normal configuration for a two-node cluster didn't work: because one node is supposed to be quorate at all times, it would not let the second node join the cluster.

Fix: the 'dirty' flag check now checks both the local and the remote state of the nodes, so that a node can join the cluster if one of them does not have state.

Result: two-node clusters now work correctly.
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
Upgrading to the cman package in the Beta channel did not work for me. The version went from cman-2.0.98-1.el5_3.1.x86_64 to cman-2.0.108-1.el5.x86_64. I updated only 1 of 3 nodes. No qdisk is present.

Mounting configfs... done
Starting ccsd... done
Starting cman... failed
/usr/sbin/cman_tool: aisexec daemon didn't start

The version of openais that I'm on is openais-0.80.3-22.el5_3.9. I was forced to roll back to cman-2.0.98-1.el5_3.1.x86_64, and then the node came up fine. I have my config posted in Bug 515615. Thanks, Arwin
Arwin, upgrade to the openais in the RHEL 5 Beta channel as well. I think the latest openais available in the Beta channel is openais-0.80.6-8.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html