Bug 1352355 - Corosync fails to start after upgrading package to 2.3.4-7.el7_2.3
Summary: Corosync fails to start after upgrading package to 2.3.4-7.el7_2.3
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-04 01:22 UTC by Sam McLeod
Modified: 2019-11-14 08:35 UTC
CC List: 5 users

Fixed In Version: corosync-2.4.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-04 08:00:35 UTC
Target Upstream Version:
Embargoed:


Attachments
corosync strace (201.22 KB, text/plain)
2016-07-04 01:22 UTC, Sam McLeod

Description Sam McLeod 2016-07-04 01:22:13 UTC
Created attachment 1175737 [details]
corosync strace

Description of problem:

After installing the package updates corosync-2.3.4-7.el7_2.3.x86_64 and corosynclib-2.3.4-7.el7_2.3.x86_64, Corosync crashes on startup, preventing clustering from working.

Ironically, this package update was intended to stop corosync from crashing on startup; see bug 1333397.

Version-Release number of selected component (if applicable):

2.3.4-7.el7_2.3

How reproducible:

Every time

Steps to Reproduce:
1. Upgrade the corosync package to 2.3.4-7.el7_2.3 as per bug #1333397
2. Start corosync
3. Corosync fails to start
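
For example, a command sketch of these steps (the exact yum invocation and the use of journalctl are assumptions based on the package NVRs and log lines in this report):

# yum update corosync-2.3.4-7.el7_2.3 corosynclib-2.3.4-7.el7_2.3
# systemctl start corosync          <- fails
# journalctl -u corosync | tail     <- shows the errors listed under "Actual results"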

Actual results:

The error logged is:

Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1250.
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Can't autogenerate multicast address
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jul 04 11:09:49 s1-san8 corosync[15865]:   [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.

Expected results:

Corosync starts normally, as it did before the update.

Additional info:

- Strace of service failing to start attached.

Corosync.conf:

# cat /etc/corosync/corosync.conf
#compatibility: whitetank

totem {
  version:                             2
  token:                               5000
  token_retransmits_before_loss_const: 20
  join:                                1000
  consensus:                           3600
  vsftype:                             none
  max_messages:                        20
  clear_node_high_bit:                 yes
  rrp_mode:                            none
  secauth:                             off
  threads:                             12
  transport:                           udpu
  net_mtu:                             8982
  interface {
    member {
      memberaddr: 172.30.1.7
    }
    member {
      memberaddr: 172.30.1.8
    }
    ringnumber:  0
    bindnetaddr: 172.30.1.0
    mcastport:   5405
  }
}

logging {
  fileline:        off
  to_stderr:       yes
  to_logfile:      no
  to_syslog:       yes
  syslog_facility: daemon
  debug:           off
  timestamp:       on
  logger_subsys {
    subsys: AMF
    debug:  off
    tags:   enter|leave|trace1|trace2|trace3|trace4|trace6
  }
}

amf {
  mode: disabled
}

aisexec {
  user:  root
  group: root
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

nodelist {
  node {
    ring0_addr: 172.30.1.7
    nodeid: 1
  }
  node {
    ring0_addr: 172.30.1.8
    nodeid: 2
  }
}

Comment 1 Sam McLeod 2016-07-04 01:26:24 UTC
Note: downgrading corosync and corosynclib to the previous version fixes the issue, and corosync starts as expected:

root@s1-san8:~ # yum downgrade corosync corosynclib
...
corosync.x86_64 0:2.3.4-7.el7_2.1                                            corosynclib.x86_64 0:2.3.4-7.el7_2.1

root@s1-san8:~  # systemctl start corosync

Comment 3 Jan Friesse 2016-07-04 08:00:35 UTC
Already reported upstream as https://github.com/corosync/corosync/issues/137 (and fixed). This bug is not fatal because it has a simple workaround, and pcs-generated clusters are unaffected (pcs doesn't allow creating a corosync.conf without cluster_name), so there is no need for a Z-stream fix. The bug is already fixed upstream (https://github.com/corosync/corosync/commit/44df76a7ee6c10468d87f1e0888d6ce5b558d565).
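
For reference, a minimal sketch of that workaround against the totem section from this report (the cluster name value is illustrative, not taken from the bug):

totem {
  version:      2
  cluster_name: sancluster   # illustrative name; with cluster_name set, corosync no longer fails with "Can't autogenerate multicast address"
  # ... remaining totem options unchanged from the corosync.conf above
}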

Comment 4 Sam McLeod 2016-07-04 08:10:49 UTC
Interesting, and thank you for the link; I'm sorry I didn't spot that myself.

We don't use pcs (yet) due to some limitations when automating cluster bootstrapping/configuration, which is why we noticed it.

Again, sorry for not spotting this upstream first.

Thanks,
Sam.

Comment 5 Jan Friesse 2016-07-05 09:25:09 UTC
(In reply to Sam McLeod from comment #4)
> Interesting, and thank you for the link; I'm sorry I didn't spot that
> myself.

No problem; it's actually quite useful to also have a BZ for anybody who hits the same problem.

> 
> We don't use pcs (yet) due to some limitations when automating cluster
> bootstrapping/configuration, which is why we noticed it.

Yep, so for a quick workaround just follow the GitHub issue (setting cluster_name is generally recommended, and it's needed for the new qdevice feature). RHEL is going to get this BZ fixed in 7.3.

> 
> Again, sorry for not spotting this upstream first.

I would like to thank you for high quality report.

Regards,
  Honza

> 
> Thanks,
> Sam.

