Bug 816200

Summary: Delayed corosync initialization breaks other systemd units
Product: [Fedora] Fedora
Reporter: Milan Broz <mbroz>
Component: dlm
Assignee: David Teigland <teigland>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: agk, fdinitto, jfriesse, prajnoha, pvrabec, sdake, teigland
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-02-13 23:14:50 UTC
Type: Bug

Description Milan Broz 2012-04-25 13:33:43 UTC
Description of problem:

If a service depends on corosync, it requires corosync to be fully initialized
before it starts. With systemd this is usually expressed with
After=corosync.service

Unfortunately, this doesn't work properly: systemd considers corosync ready
to provide service while it is still initializing.
This causes the dependent services to fail.

See e.g. dlm here: it has the proper After= line but fails with
"corosync cfg init error 2".

If I add "sleep 5" to dlm initialization, it works.

Apr 25 15:12:29 (none) network[360]: Determining IP information for eth1... done.
Apr 25 15:12:29 (none) network[360]: [  OK  ]
Apr 25 15:12:29 (none) corosync[706]: notice  [MAIN  ] Corosync Cluster Engine ('2.0.0'): started and ready to provide service.
Apr 25 15:12:29 (none) corosync[706]: info    [MAIN  ] Corosync built-in features: dbus rdma systemd xmlconf snmp
Apr 25 15:12:29 (none) corosync[715]:   [TOTEM ] Initializing transport (UDP/IP Multicast).
Apr 25 15:12:29 (none) corosync[715]:   [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Apr 25 15:12:29 (none) rpcbind: Cannot open '/var/lib/rpcbind/rpcbind.xdr' file for reading, errno 2 (No such file or directory)
Apr 25 15:12:29 (none) rpcbind: Cannot open '/var/lib/rpcbind/portmap.xdr' file for reading, errno 2 (No such file or directory)
Apr 25 15:12:29 (none) kernel: [    9.418509] DLM installed
Apr 25 15:12:29 (none) kernel: [    9.471803] RPC: Registered named UNIX socket transport module.
Apr 25 15:12:29 (none) kernel: [    9.472167] RPC: Registered udp transport module.
Apr 25 15:12:29 (none) kernel: [    9.472435] RPC: Registered tcp transport module.
Apr 25 15:12:29 (none) kernel: [    9.472702] RPC: Registered tcp NFSv4.1 backchannel transport module.
Apr 25 15:12:29 (none) dlm_controld[732]: 9 dlm_controld 3.99.2 started
Apr 25 15:12:29 (none) dlm_controld[732]: 9 could not set SCHED_RR priority 99 err 1
Apr 25 15:12:29 (none) dlm_controld[732]: 9 corosync cfg init error 2
Apr 25 15:12:29 (none) corosync[715]:   [TOTEM ] The network interface [192.168.124.1] is now up.
Apr 25 15:12:29 (none) corosync[715]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Apr 25 15:12:29 (none) corosync[715]:   [QB    ] server name: cmap
Apr 25 15:12:29 (none) corosync[715]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Apr 25 15:12:29 (none) corosync[715]:   [QB    ] server name: cfg
Apr 25 15:12:29 (none) corosync[715]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 25 15:12:29 (none) corosync[715]:   [QB    ] server name: cpg
Apr 25 15:12:29 (none) corosync[715]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Apr 25 15:12:29 (none) corosync[715]:   [QUORUM] Using quorum provider corosync_votequorum
Apr 25 15:12:29 (none) corosync[715]:   [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 25 15:12:29 (none) corosync[715]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 25 15:12:29 (none) corosync[715]:   [QB    ] server name: votequorum
Apr 25 15:12:29 (none) corosync[715]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 25 15:12:29 (none) corosync[715]:   [QB    ] server name: quorum
Apr 25 15:12:29 (none) corosync[715]:   [QUORUM] Members[1]: 1
Apr 25 15:12:29 (none) corosync[715]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 25 15:12:29 (none) corosync[715]:   [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 25 15:12:29 (none) corosync[715]:   [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 25 15:12:29 (none) corosync[715]:   [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Apr 25 15:12:29 (none) corosync[715]:   [MAIN  ] Completed service synchronization, ready to provide service.


Version-Release number of selected component (if applicable):
corosync-2.0.0-2.1.fc17.x86_64

I am afraid you need to use something like sd_notify, described here:
http://0pointer.de/public/systemd-man/sd_notify.html
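
For illustration only, here is a minimal sketch of the readiness notification that the sd_notify page describes. It is not corosync code: corosync is written in C and would presumably call sd_notify(3) directly. The helper name sd_notify below is just a Python stand-in that sends the same "READY=1" datagram to the socket named in $NOTIFY_SOCKET. It also assumes the unit is declared with Type=notify, since that is the setting under which systemd waits for the message before starting units ordered After= it.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to the service manager.

    Returns False when NOTIFY_SOCKET is not set, i.e. when the process
    was not started by a manager that expects notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


# Hypothetical call site: invoke this only after the point where the
# daemon can really serve clients, not when the process merely starts.
# sd_notify("READY=1")
```

In terms of the log above, the notification would belong after "Completed service synchronization, ready to provide service", not after the first "started and ready to provide service" line.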

Comment 1 Fabio Massimo Di Nitto 2012-04-25 13:55:45 UTC
corosync is correctly replying to dlm that the IPC is not available yet (EAGAIN in errno) and that it should retry.
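
A rough sketch of the retry suggested here, as an alternative to a fixed "sleep 5". This is not dlm_controld code: init_with_retry, its parameters, and the init callable are hypothetical names, and the sketch assumes the initialization call reports "not available yet" as an OSError carrying EAGAIN.

```python
import errno
import time


def init_with_retry(init, attempts=10, delay=0.5, sleep=time.sleep):
    """Call init() until it succeeds, retrying only on EAGAIN.

    init is a hypothetical callable standing in for the client-side
    IPC initialization; any other error, or EAGAIN on the last
    attempt, is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return init()
        except OSError as exc:
            if exc.errno != errno.EAGAIN or attempt == attempts:
                raise
            # The server is up but its IPC is not ready yet: wait and retry.
            sleep(delay)
```

Bounding the number of attempts keeps a permanently broken setup from hanging startup forever, while a short delay avoids the arbitrary five-second wait.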

Comment 2 Fedora End Of Life 2013-01-16 19:22:19 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Fedora End Of Life 2013-02-13 23:14:55 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.