Description of problem:

Version-Release number of selected component (if applicable):
1.4.6

How reproducible:

Steps to Reproduce:
1. configure corosync in udpu mode
2. service corosync start
3. ifdown eth0 (or unplug the network cable)
4. ifup eth0 (or plug in the network cable)

Actual results:
corosync crashes.

Expected results:
corosync comes back online.

Additional info:

--corosync.conf--------------------------------
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                member {
                        memberaddr: 172.16.75.1
                }
                member {
                        memberaddr: 172.16.75.128
                }
                member {
                        memberaddr: 172.16.75.131
                }
                ringnumber: 0
                bindnetaddr: 172.16.75.128
                mcastport: 5495
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/corosync.log
        debug: on
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: on
        }
}

amf {
        mode: disabled
}
-----------------------------------------------

--gdb stack---------------------------------------
#0  0x00000033bdc32885 in raise () from /lib64/libc.so.6
#1  0x00000033bdc34065 in abort () from /lib64/libc.so.6
#2  0x00000033bdc2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000033bdc2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f5e8102aa6c in memb_consensus_agreed (instance=0x7f5e7f39d010) at totemsrp.c:1244
#5  0x00007f5e8102ea1f in memb_join_process (instance=0x7f5e7f39d010, memb_join=0x172c220) at totemsrp.c:4066
#6  0x00007f5e8102edc9 in message_handler_memb_join (instance=0x7f5e7f39d010, msg=<value optimized out>, msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:4311
#7  0x00007f5e810287e8 in rrp_deliver_fn (context=<value optimized out>, msg=0x172c220, msg_len=244) at totemrrp.c:1747
#8  0x00007f5e81025b3a in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0x172bb90) at totemudpu.c:1152
#9  0x00007f5e8101e482 in poll_run (handle=2697991128409440256) at coropoll.c:513
#10 0x00000000004072be in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at main.c:1927
-----------------------------------------------
Corosync 1.4.6 is not part of RHEL-6; RHEL-6 ships version 1.4.1. Does this happen on a supported version?
I built corosync-1.4.6 from the latest corosync source code and the corosync src rpm package from RHEL 6. I will run another test on corosync-1.4.1 to check whether the bug exists in 1.4.1.
I am so sorry. This bug is caused by a service I wrote myself. After removing my service from corosync, corosync works correctly again.
No problem. Can you close this BZ then please :)
Saying the bug was gone was a mistake. It is still there. Because my service had turned on the core-file flag, I noticed the corosync crash by the existence of a core file. After I removed my service, no core file was generated when corosync crashed.

-----------------------------------------------------------------
Aug 01 14:09:45 corosync [TOTEM ] The network interface [172.20.0.128] is now up.
Aug 01 14:09:45 corosync [TOTEM ] adding new UDPU member {172.20.0.128}
my_failed_list 1 my_proc_list 2 token_memb_entries 1
Aug 01 14:09:45 corosync [TOTEM ] entering GATHER state from 15.
my_failed_list 1 my_proc_list 2 token_memb_entries 1
my_failed_list 1 my_proc_list 2 token_memb_entries 1
... ...
my_failed_list 1 my_proc_list 2 token_memb_entries 1
my_failed_list 2 my_proc_list 2 token_memb_entries 0
corosync: totemsrp.c:1258: memb_consensus_agreed: Assertion `token_memb_entries >= 1' failed.
Aug 01 14:09:46 corosync [TOTEM ] entering GATHER state from 0.
./myrun: line 3: 2003 Aborted (core dumped) ./corosync -f "$@"
-----------------------------------------------------------------

my_failed_list 1: 172.20.0.128
my_proc_list 2: 172.20.0.128 127.0.0.1

At the point of the crash:

my_failed_list 2: 172.20.0.128 127.0.0.1
my_proc_list 2: 172.20.0.128 127.0.0.1

Do my_failed_list and my_proc_list need to be reinitialized after the network interface comes up?
---------------------
my_failed_list 1: 172.20.0.128
my_proc_list 2: 172.20.0.128 127.0.0.1
---------------------

should be

---------------------
my_failed_list 2: 172.20.0.128 127.0.0.1
my_proc_list 1: 172.20.0.128
---------------------
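
The counters logged just before the assertion are consistent with token_memb_entries being the number of addresses in my_proc_list that are not also in my_failed_list. The following is a minimal Python model of that reading, not corosync's C code; the function name token_memb is hypothetical, and the set-difference interpretation is an assumption drawn only from the logged values.

```python
def token_memb(my_proc_list, my_failed_list):
    """Return the addresses in my_proc_list that are not in my_failed_list.

    Assumed model: the result's length plays the role of token_memb_entries.
    """
    failed = set(my_failed_list)
    return [addr for addr in my_proc_list if addr not in failed]


# Lists reported at the point of the crash.
proc_list = ["172.20.0.128", "127.0.0.1"]
failed_list = ["172.20.0.128", "127.0.0.1"]

remaining = token_memb(proc_list, failed_list)
# Every known processor is also marked failed, so nothing remains and a
# check equivalent to `token_memb_entries >= 1` cannot hold.
print(len(remaining))
```

Under this model, the assertion fires whenever every entry of my_proc_list, including the node's own address, has ended up in my_failed_list, which matches the "my_failed_list 2 my_proc_list 2 token_memb_entries 0" line in the log.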
ifdown is unsupported. The only supported ways to simulate a failure are an iptables drop (of both unicast and multicast traffic) or unplugging the cable WITHOUT NetworkManager (NM does an ifdown on cable unplug). Also, this is a clone of bug 881694.

*** This bug has been marked as a duplicate of bug 881694 ***