Bug 729650 - Toggling an interface down/up leaves the cluster unable to synchronize
Summary: Toggling an interface down/up leaves the cluster unable to synchronize
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.7
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-10 13:11 UTC by Zdenek Kraus
Modified: 2016-04-26 14:29 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-10 14:59:42 UTC



Description Zdenek Kraus 2011-08-10 13:11:17 UTC
Description of problem:
Two openais nodes are synchronized and operational. After the interface on one node is taken down, the cluster splits. When that interface is brought back up, both nodes start to synchronize, but they keep sending multicast traffic and never finish synchronizing. No service is using openais.


Version-Release number of selected component (if applicable):
Host: 
  Linux <HOSTNAME> 2.6.35.13-92.fc14.x86_64 #1 SMP Sat May 21 17:26:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
  Fedora release 14 (Laughlin)

Guests: 
  1. Linux rhel5x 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
  Red Hat Enterprise Linux Server release 5.7 (Tikanga)
  2. Linux rhel5i 2.6.18-274.el5 #1 SMP Fri Jul 8 17:39:55 EDT 2011 i686 i686 i386 GNU/Linux 
  Red Hat Enterprise Linux Server release 5.7 (Tikanga)

component:
  openais-0.80.6-30.el5


How reproducible:
98%

Steps to Reproduce:
0. The VMs are connected through the virtual bridge virbr0, which is isolated from the internet. The interface on each VM is eth1.
1. Configure openais on both nodes for the same cluster (e.g. bindnetaddr: 192.168.5.0).
2. service iptables stop; newgrp ais; service openais start
NOTE: the nodes synchronize successfully
3. rhel5i # ifdown eth1
NOTE: the cluster is split; rhel5i is connected to localhost
4. rhel5i # ifup eth1
NOTE: the nodes start to synchronize
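
Steps 2 to 4 can be sketched as a small script for rhel5i. This is only a rough sketch: it assumes openais is already configured, that the interface is eth1, and that it is run as root. The names DRY_RUN, SETTLE, run and reproduce, and the 5-second pauses, are illustrative and not part of the original report. "newgrp ais" is left out because it starts a new shell and would interrupt a script.

```shell
#!/bin/sh
# Sketch of reproduction steps 2-4 on rhel5i.
# DRY_RUN, SETTLE, run() and reproduce() are hypothetical helpers,
# not something shipped with openais.
IFACE="${IFACE:-eth1}"      # interface to toggle
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = execute them
SETTLE="${SETTLE:-5}"       # assumed pause in seconds between steps

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    run service iptables stop
    run service openais start
    run sleep "$SETTLE"     # give the nodes time to synchronize
    run ifdown "$IFACE"     # step 3: the cluster splits
    run sleep "$SETTLE"
    run ifup "$IFACE"       # step 4: the nodes start to synchronize
}
```

With the default DRY_RUN=1, calling reproduce only prints the commands; setting DRY_RUN=0 before calling it would execute them.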
  
Actual results:
Repeated multicast traffic and log records like these:
Aug 10 15:06:01.890461 [TOTEM] Sending initial ORF token
Aug 10 15:06:01.891348 [TOTEM] entering OPERATIONAL state.
Aug 10 15:06:01.906107 [TOTEM] entering GATHER state from 11.
Aug 10 15:06:02.717460 [TOTEM] entering GATHER state from 0.
Aug 10 15:06:02.717528 [TOTEM] Creating commit token because I am the rep.
Aug 10 15:06:02.717557 [TOTEM] Storing new sequence id for ring 1bbdc
Aug 10 15:06:02.717604 [TOTEM] entering COMMIT state.
Aug 10 15:06:02.717627 [TOTEM] entering RECOVERY state.
Aug 10 15:06:02.718393 [TOTEM] position [0] member 192.168.5.2:
Aug 10 15:06:02.718406 [TOTEM] previous ring seq 113620 rep 192.168.5.2
Aug 10 15:06:02.718412 [TOTEM] aru 0 high delivered 0 received flag 1
Aug 10 15:06:02.718418 [TOTEM] Did not need to originate any messages in recovery.
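
One rough way to tell this looping behaviour from a one-off membership change is to count TOTEM state transitions in a saved log. The sketch below assumes the log has been written to a file whose lines look like the excerpt above; the function name count_state and the file name in the usage note are illustrative only.

```shell
# count_state FILE STATE
# Prints how many "[TOTEM] entering STATE state" lines FILE contains.
# STATE is a name as it appears in the log, e.g. GATHER or OPERATIONAL.
count_state() {
    # grep -c prints the count (0 when nothing matches); "|| true" hides
    # grep's non-zero exit status in the no-match case.
    grep -c "\[TOTEM\] entering $2 state" "$1" || true
}
```

For example, "count_state openais.log GATHER" run twice a few seconds apart: a GATHER count that keeps climbing while the node also keeps re-entering OPERATIONAL would be consistent with the loop described in this report.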


Expected results:
The nodes synchronize after a few moments, with log records like these:
Aug 10 15:07:01.914060 [TOTEM] entering GATHER state from 11.
Aug 10 15:07:01.916320 [TOTEM] Storing new sequence id for ring 1bdfc
Aug 10 15:07:01.916479 [TOTEM] entering COMMIT state.
Aug 10 15:07:01.917316 [TOTEM] entering RECOVERY state.
Aug 10 15:07:01.918193 [TOTEM] position [0] member 192.168.5.1:
Aug 10 15:07:01.918207 [TOTEM] previous ring seq 114168 rep 192.168.5.1
Aug 10 15:07:01.918214 [TOTEM] aru 0 high delivered 0 received flag 1
Aug 10 15:07:01.918221 [TOTEM] position [1] member 192.168.5.2:
Aug 10 15:07:01.918227 [TOTEM] previous ring seq 114168 rep 192.168.5.2
Aug 10 15:07:01.918254 [TOTEM] aru c high delivered c received flag 1
Aug 10 15:07:01.918261 [TOTEM] Did not need to originate any messages in recovery.
Aug 10 15:07:01.920487 [CLM  ] CLM CONFIGURATION CHANGE
Aug 10 15:07:01.920504 [CLM  ] New Configuration:
Aug 10 15:07:01.920515 [CLM  ]  r(0) ip(192.168.5.2) 
Aug 10 15:07:01.920521 [CLM  ] Members Left:
Aug 10 15:07:01.920526 [CLM  ] Members Joined:
Aug 10 15:07:01.920540 [CLM  ] CLM CONFIGURATION CHANGE
Aug 10 15:07:01.920547 [CLM  ] New Configuration:
Aug 10 15:07:01.920554 [CLM  ]  r(0) ip(192.168.5.1) 
Aug 10 15:07:01.920561 [CLM  ]  r(0) ip(192.168.5.2) 
Aug 10 15:07:01.920566 [CLM  ] Members Left:
Aug 10 15:07:01.920572 [CLM  ] Members Joined:
Aug 10 15:07:01.920578 [CLM  ]  r(0) ip(192.168.5.1) 
Aug 10 15:07:01.920595 [SYNC ] This node is within the primary component and will provide service.
Aug 10 15:07:01.921395 [TOTEM] entering OPERATIONAL state.
Aug 10 15:07:01.923995 [CLM  ] got nodejoin message 192.168.5.1
Aug 10 15:07:01.924281 [CLM  ] got nodejoin message 192.168.5.2
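
To check a saved log for the successful outcome shown above, one option is to list the nodes named in "got nodejoin message" records. This is a minimal sketch that assumes the log lines have the same layout as the excerpt; the function name joined_nodes is illustrative, not an openais tool.

```shell
# joined_nodes FILE
# Prints the unique addresses found in "[CLM  ] got nodejoin message"
# lines of FILE, one per line, sorted.
joined_nodes() {
    sed -n 's/.*\[CLM *\] got nodejoin message \([0-9.][0-9.]*\).*/\1/p' "$1" |
        sort -u
}
```

In the expected case for this two-node setup, the output would list both 192.168.5.1 and 192.168.5.2.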


Additional info:
With the interface down, openais even exited silently twice.

Comment 1 Steven Dake 2011-08-10 14:59:42 UTC
Do not take an interface out of service when using openais.  This can't be fixed.

Regards
-steve

