Bug 1352355

Summary:

Corosync fails to start after upgrading package to 2.3.4-7.el7_2.3

Product:

Red Hat Enterprise Linux 7

Reporter:

Sam McLeod <mailinglists>

Component:

corosync

Assignee:

Jan Friesse <jfriesse>

Status:

CLOSED UPSTREAM

QA Contact:

cluster-qe <cluster-qe>

Severity:

urgent

Docs Contact:

Priority:

unspecified

Version:

7.2

CC:

akarlsso, ccaulfie, cfeist, cluster-maint, mailinglists

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

corosync-2.4.0-1.el7

Last Closed:

2016-07-04 08:00:35 UTC

Type:

Bug

Attachments:

Description	Flags
corosync strace	none

Description Sam McLeod 2016-07-04 01:22:13 UTC

Created attachment 1175737
corosync strace

Description of problem:

After installing the package updates corosync-2.3.4-7.el7_2.3.x86_64 and corosynclib-2.3.4-7.el7_2.3.x86_64, Corosync crashes on startup, which prevents clustering from working.

Ironically, this package update was meant to stop corosync from crashing on startup; see bug 1333397.

Version-Release number of selected component (if applicable):

2.3.4-7.el7_2.3

How reproducible:

Every time

Steps to Reproduce:
1. Upgrade the corosync package to 2.3.4-7.el7_2.3, as per bug #1333397
2. Start corosync
3. Corosync fails to start

Actual results:

The error logged is:

Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1250.
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Can't autogenerate multicast address
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.

Expected results:

Corosync starts normally, as it did before the update.

Additional info:

- An strace of the service failing to start is attached.

Corosync.conf:

# cat /etc/corosync/corosync.conf
#compatibility: whitetank

totem {
  version:                             2
  token:                               5000
  token_retransmits_before_loss_const: 20
  join:                                1000
  consensus:                           3600
  vsftype:                             none
  max_messages:                        20
  clear_node_high_bit:                 yes
  rrp_mode:                            none
  secauth:                             off
  threads:                             12
  transport:                           udpu
  net_mtu:                             8982
  interface {
    member {
      memberaddr: 172.30.1.7
    }
    member {
      memberaddr: 172.30.1.8
    }
    ringnumber:  0
    bindnetaddr: 172.30.1.0
    mcastport:   5405
  }
}

logging {
  fileline:        off
  to_stderr:       yes
  to_logfile:      no
  to_syslog:       yes
  syslog_facility: daemon
  debug:           off
  timestamp:       on
  logger_subsys {
    subsys: AMF
    debug:  off
    tags:   enter|leave|trace1|trace2|trace3|trace4|trace6
  }
}

amf {
  mode: disabled
}

aisexec {
  user:  root
  group: root
}

quorum {
  provider: corosync_votequorum
}

  two_node: 1

nodelist {
  node {
    ring0_addr: 172.30.1.7
    nodeid: 1
  }
  node {
    ring0_addr: 172.30.1.8
    nodeid: 2
  }
}

Comment 1 Sam McLeod 2016-07-04 01:26:24 UTC

Note: downgrading corosync and corosynclib to the previous version fixes the issue, and Corosync starts as expected:

root@s1-san8:~ # yum downgrade corosync corosynclib
...
corosync.x86_64 0:2.3.4-7.el7_2.1                                            corosynclib.x86_64 0:2.3.4-7.el7_2.1

root@s1-san8:~  # systemctl start corosync

Comment 3 Jan Friesse 2016-07-04 08:00:35 UTC

Already reported upstream as https://github.com/corosync/corosync/issues/137 and fixed there (https://github.com/corosync/corosync/commit/44df76a7ee6c10468d87f1e0888d6ce5b558d565). This bug is not fatal: it has a simple workaround, and pcs-generated clusters are unaffected (pcs doesn't allow creating a corosync.conf without cluster_name), so there is no need for a Z-stream release.

Comment 4 Sam McLeod 2016-07-04 08:10:49 UTC

Interesting and thank you for the link, I'm sorry I didn't actually spot that myself.

We don't use PCS (yet) due to some limitations when automating cluster bootstrapping / configuration, hence why we noticed it.

Again, sorry for not spotting this upstream first.

Thanks,
Sam.

Comment 5 Jan Friesse 2016-07-05 09:25:09 UTC

(In reply to Sam McLeod from comment #4)
> Interesting and thank you for the link, I'm sorry I didn't actually spot
> that myself.

No problem. It's actually quite useful to also have a BZ for anybody who hits the same problem.

> 
> We don't use PCS (yet) due to some limitations when automating cluster
> bootstrapping / configuration hence why we noticed it.

Yep, so for a quick workaround just follow the GitHub issue (setting cluster_name is generally recommended, and it's needed for the new qdevice feature). RHEL is going to get this BZ fixed in 7.3.

> 
> Again, sorry for not spotting this upstream first.

I would like to thank you for the high-quality report.

Regards,
  Honza

> 
> Thanks,
> Sam.
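
Workaround sketch:

For anyone landing here with a hand-written corosync.conf, the workaround referred to in comments 3 and 5 appears to come down to giving the cluster a name in the totem section. The fragment below is a minimal sketch of that change against the configuration from the description, not a tested fix. The name "examplecluster" is a placeholder chosen for illustration, and all other totem options are assumed to stay exactly as they were.

```
totem {
  version:      2
  # Added line. "examplecluster" is a hypothetical name; use the same
  # value on every node in the cluster.
  cluster_name: examplecluster
  transport:    udpu
  # ... remaining totem options and the interface block unchanged ...
}
```

Comments are kept on their own lines because corosync.conf treats only lines beginning with # as comments. After editing the file on both nodes, starting corosync again (systemctl start corosync, as in comment 1) should show whether the "Can't autogenerate multicast address" error is gone.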