Description of problem:
Two openais nodes are synchronized and operational. After the interface on one node is turned off, the cluster splits. After that interface is turned back on, both nodes start to synchronize, but they keep sending multicast traffic and never finish synchronizing. No service is using openais.

Version-Release number of selected component (if applicable):
Host:
Linux <HOSTNAME> 2.6.35.13-92.fc14.x86_64 #1 SMP Sat May 21 17:26:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Fedora release 14 (Laughlin)
Guests:
1. Linux rhel5x 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
   Red Hat Enterprise Linux Server release 5.7 (Tikanga)
2. Linux rhel5i 2.6.18-274.el5 #1 SMP Fri Jul 8 17:39:55 EDT 2011 i686 i686 i386 GNU/Linux
   Red Hat Enterprise Linux Server release 5.7 (Tikanga)
component: openais-0.80.6-30.el5

How reproducible:
98%

Steps to Reproduce:
0. The VMs are connected through the virtual bridge virbr0, which is isolated from the internet. The interface on each VM is eth1.
1. Configure openais for the same cluster (e.g. bindnetaddr: 192.168.5.0).
2. service iptables stop; newgrp ais; service openais start
   NOTE: the nodes synchronize successfully
3. rhel5i # ifdown eth1
   NOTE: the cluster is split; rhel5i is connected to localhost
4. rhel5i # ifup eth1
   NOTE: the nodes start to synchronize

Actual results:
Multicast traffic repeats, along with log records like these:
Aug 10 15:06:01.890461 [TOTEM] Sending initial ORF token
Aug 10 15:06:01.891348 [TOTEM] entering OPERATIONAL state.
Aug 10 15:06:01.906107 [TOTEM] entering GATHER state from 11.
Aug 10 15:06:02.717460 [TOTEM] entering GATHER state from 0.
Aug 10 15:06:02.717528 [TOTEM] Creating commit token because I am the rep.
Aug 10 15:06:02.717557 [TOTEM] Storing new sequence id for ring 1bbdc
Aug 10 15:06:02.717604 [TOTEM] entering COMMIT state.
Aug 10 15:06:02.717627 [TOTEM] entering RECOVERY state.
Aug 10 15:06:02.718393 [TOTEM] position [0] member 192.168.5.2:
Aug 10 15:06:02.718406 [TOTEM] previous ring seq 113620 rep 192.168.5.2
Aug 10 15:06:02.718412 [TOTEM] aru 0 high delivered 0 received flag 1
Aug 10 15:06:02.718418 [TOTEM] Did not need to originate any messages in recovery.

Expected results:
The nodes synchronize after a few moments, with log records like these:
Aug 10 15:07:01.914060 [TOTEM] entering GATHER state from 11.
Aug 10 15:07:01.916320 [TOTEM] Storing new sequence id for ring 1bdfc
Aug 10 15:07:01.916479 [TOTEM] entering COMMIT state.
Aug 10 15:07:01.917316 [TOTEM] entering RECOVERY state.
Aug 10 15:07:01.918193 [TOTEM] position [0] member 192.168.5.1:
Aug 10 15:07:01.918207 [TOTEM] previous ring seq 114168 rep 192.168.5.1
Aug 10 15:07:01.918214 [TOTEM] aru 0 high delivered 0 received flag 1
Aug 10 15:07:01.918221 [TOTEM] position [1] member 192.168.5.2:
Aug 10 15:07:01.918227 [TOTEM] previous ring seq 114168 rep 192.168.5.2
Aug 10 15:07:01.918254 [TOTEM] aru c high delivered c received flag 1
Aug 10 15:07:01.918261 [TOTEM] Did not need to originate any messages in recovery.
Aug 10 15:07:01.920487 [CLM ] CLM CONFIGURATION CHANGE
Aug 10 15:07:01.920504 [CLM ] New Configuration:
Aug 10 15:07:01.920515 [CLM ] r(0) ip(192.168.5.2)
Aug 10 15:07:01.920521 [CLM ] Members Left:
Aug 10 15:07:01.920526 [CLM ] Members Joined:
Aug 10 15:07:01.920540 [CLM ] CLM CONFIGURATION CHANGE
Aug 10 15:07:01.920547 [CLM ] New Configuration:
Aug 10 15:07:01.920554 [CLM ] r(0) ip(192.168.5.1)
Aug 10 15:07:01.920561 [CLM ] r(0) ip(192.168.5.2)
Aug 10 15:07:01.920566 [CLM ] Members Left:
Aug 10 15:07:01.920572 [CLM ] Members Joined:
Aug 10 15:07:01.920578 [CLM ] r(0) ip(192.168.5.1)
Aug 10 15:07:01.920595 [SYNC ] This node is within the primary component and will provide service.
Aug 10 15:07:01.921395 [TOTEM] entering OPERATIONAL state.
Aug 10 15:07:01.923995 [CLM ] got nodejoin message 192.168.5.1
Aug 10 15:07:01.924281 [CLM ] got nodejoin message 192.168.5.2

Additional info:
With the interface down, openais even exited silently twice.
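
The difference between the two logs above can be checked mechanically. The following is a minimal sketch, not part of openais: the function names `totem_states` and `count_regathers` are hypothetical, and the only assumption is that state changes are logged as "[TOTEM] entering <STATE> state", as in the excerpts in this report. It counts how often a node goes from OPERATIONAL straight back to GATHER, which is the pattern seen in the failing case.

```python
import re

# Matches state-transition records such as
# "Aug 10 15:06:01.906107 [TOTEM] entering GATHER state from 11."
STATE_RE = re.compile(r"\[TOTEM\] entering (\w+) state")

def totem_states(lines):
    """Return the TOTEM states named in the log lines, in order."""
    states = []
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            states.append(match.group(1))
    return states

def count_regathers(lines):
    """Count OPERATIONAL -> GATHER transitions (membership re-forming)."""
    states = totem_states(lines)
    return sum(
        1
        for prev, cur in zip(states, states[1:])
        if prev == "OPERATIONAL" and cur == "GATHER"
    )

# Excerpt from "Actual results" in this report.
FAILING_SAMPLE = """
Aug 10 15:06:01.890461 [TOTEM] Sending initial ORF token
Aug 10 15:06:01.891348 [TOTEM] entering OPERATIONAL state.
Aug 10 15:06:01.906107 [TOTEM] entering GATHER state from 11.
Aug 10 15:06:02.717460 [TOTEM] entering GATHER state from 0.
Aug 10 15:06:02.717604 [TOTEM] entering COMMIT state.
Aug 10 15:06:02.717627 [TOTEM] entering RECOVERY state.
""".strip().splitlines()

# Excerpt from "Expected results" in this report.
HEALTHY_SAMPLE = """
Aug 10 15:07:01.914060 [TOTEM] entering GATHER state from 11.
Aug 10 15:07:01.916479 [TOTEM] entering COMMIT state.
Aug 10 15:07:01.917316 [TOTEM] entering RECOVERY state.
Aug 10 15:07:01.921395 [TOTEM] entering OPERATIONAL state.
""".strip().splitlines()

if __name__ == "__main__":
    print("failing:", count_regathers(FAILING_SAMPLE))
    print("healthy:", count_regathers(HEALTHY_SAMPLE))
```

Run against a full log, a count that keeps growing over time would suggest the loop described under "Actual results"; a single re-gather after a real membership change is normal, so any threshold is a judgment call.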

Do not take an interface out of service when using openais. This can't be fixed.

Regards
-steve