Bug 1352355

Summary: Corosync fails to start after upgrading package to 2.3.4-7.el7_2.3
Product: Red Hat Enterprise Linux 7 Reporter: Sam McLeod <mailinglists>
Component: corosync Assignee: Jan Friesse <jfriesse>
Status: CLOSED UPSTREAM QA Contact: cluster-qe <cluster-qe>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 7.2 CC: akarlsso, ccaulfie, cfeist, cluster-maint, mailinglists
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: corosync-2.4.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-04 08:00:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
corosync strace none

Description Sam McLeod 2016-07-04 01:22:13 UTC
Created attachment 1175737 [details]
corosync strace

Description of problem:

After installing the package updates corosync-2.3.4-7.el7_2.3.x86_64 and corosynclib-2.3.4-7.el7_2.3.x86_64, Corosync crashes on startup, which prevents clustering from working.

Ironically, this package update was meant to stop corosync from crashing on startup; see bug 1333397.

Version-Release number of selected component (if applicable):

2.3.4-7.el7_2.3

How reproducible:

Every time

Steps to Reproduce:
1. Upgrade the corosync package to 2.3.4-7.el7_2.3 as per bug #1333397
2. Start corosync
3. Corosync fails to start

Actual results:

The error logged is (entries appear newest first):

Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1250.
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Can't autogenerate multicast address
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.

Expected results:

Corosync started normally before the update.

Additional info:

- Strace of service failing to start attached.

Corosync.conf:

# cat /etc/corosync/corosync.conf
#compatibility: whitetank

totem {
  version:                             2
  token:                               5000
  token_retransmits_before_loss_const: 20
  join:                                1000
  consensus:                           3600
  vsftype:                             none
  max_messages:                        20
  clear_node_high_bit:                 yes
  rrp_mode:                            none
  secauth:                             off
  threads:                             12
  transport:                           udpu
  net_mtu:                             8982
  interface {
    member {
      memberaddr: 172.30.1.7
    }
    member {
      memberaddr: 172.30.1.8
    }
    ringnumber:  0
    bindnetaddr: 172.30.1.0
    mcastport:   5405
  }
}

logging {
  fileline:        off
  to_stderr:       yes
  to_logfile:      no
  to_syslog:       yes
  syslog_facility: daemon
  debug:           off
  timestamp:       on
  logger_subsys {
    subsys: AMF
    debug:  off
    tags:   enter|leave|trace1|trace2|trace3|trace4|trace6
  }
}

amf {
  mode: disabled
}

aisexec {
  user:  root
  group: root
}

quorum {
  provider: corosync_votequorum
}

  two_node: 1

nodelist {
  node {
    ring0_addr: 172.30.1.7
    nodeid: 1
  }
  node {
    ring0_addr: 172.30.1.8
    nodeid: 2
  }
}

Comment 1 Sam McLeod 2016-07-04 01:26:24 UTC
Note: downgrading corosync and corosynclib to the previous version fixes the issue, and corosync starts as expected:

root@s1-san8:~ # yum downgrade corosync corosynclib
...
corosync.x86_64 0:2.3.4-7.el7_2.1                                            corosynclib.x86_64 0:2.3.4-7.el7_2.1

root@s1-san8:~  # systemctl start corosync

Comment 3 Jan Friesse 2016-07-04 08:00:35 UTC
Already reported upstream as https://github.com/corosync/corosync/issues/137 (and fixed). This bug is not fatal because it has a simple workaround, and pcs-generated clusters are unaffected (pcs doesn't allow creating a corosync.conf without cluster_name), so there is no need for a Z-stream update. The upstream fix is https://github.com/corosync/corosync/commit/44df76a7ee6c10468d87f1e0888d6ce5b558d565.

Comment 4 Sam McLeod 2016-07-04 08:10:49 UTC
Interesting and thank you for the link, I'm sorry I didn't actually spot that myself.

We don't use PCS (yet) due to some limitations when automating cluster bootstrapping / configuration, hence why we noticed it.

Again, sorry for not spotting this upstream first.

Thanks,
Sam.

Comment 5 Jan Friesse 2016-07-05 09:25:09 UTC
(In reply to Sam McLeod from comment #4)
> Interesting and thank you for the link, I'm sorry I didn't actually spot
> that myself.

No problem. It's actually quite useful to also have a BZ for anybody who hits the same problem.

> 
> We don't use PCS (yet) due to some limitations when automating cluster
> bootstrapping / configuration hence why we noticed it.

Yep, so for a quick workaround, just follow the GitHub issue (setting cluster_name is generally recommended, and it's needed for the new qdevice feature). RHEL is going to get this BZ fixed in 7.3.
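
For anyone who hits this with a hand-written corosync.conf, here is a minimal sketch of that workaround, assuming it amounts to adding a cluster_name entry to the totem section as described above. The name san-cluster is a placeholder and is not taken from this report; the linked GitHub issue remains the authoritative description.

```
totem {
  version:      2
  # Hypothetical name, not from this report; presumably it should be
  # identical on every node of the same cluster.
  cluster_name: san-cluster
  transport:    udpu
  # ... remaining totem options and the interface block unchanged ...
}
```

After editing the file on each node, corosync can be started again with systemctl start corosync, as in comment 1.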

> 
> Again, sorry for not spotting this upstream first.

I would like to thank you for the high-quality report.

Regards,
  Honza

> 
> Thanks,
> Sam.