| Summary: | libqb spin in qb_loop with timerfd | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Fabio Massimo Di Nitto <fdinitto> |
| Component: | libqb | Assignee: | Angus Salkeld <asalkeld> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Priority: | urgent |
| Version: | rawhide | CC: | asalkeld, sdake |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | libqb-0.9.0-2.fc16 | Doc Type: | Bug Fix |
| Last Closed: | 2012-02-07 07:58:08 UTC | | |

Created attachment 559294
strace from node2

Tested with the new patches; so far I haven't been able to trigger the spinning.

Should be fixed in 0.9.0-2.

libqb-0.9.0-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/libqb-0.9.0-2.fc16

libqb-0.9.0-2.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.

As we discussed on IRC, this is always reproducible in rawhide.

corosync spins at 100% CPU in certain conditions when timerfd is used. After I built libqb without timerfd, the startup logging.debug spinning is gone as well.

This is a consistent reproducer, using dlm_controld.

Install the libqb master and corosync master rpms, and the dlm rpm from Fedora rawhide.

corosync.conf relevant bits:

    compatibility: whitetank

    quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 0
        last_man_standing: 0
        auto_tie_breaker: 0
    }

    nodelist {
        node {
            ring0_addr: 192.168.2.193
            nodeid: 1
        }
        node {
            ring0_addr: 192.168.2.194
            nodeid: 2
        }
    }

    logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to no. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: yes
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: on
        # Log messages with time stamps. When in doubt, set to on
        # (unless you are only logging to syslog, where double
        # timestamps can be annoying).
        timestamp: on
        logger_subsys {
            subsys: QUORUM
            debug: on
        }
        logger_subsys {
            subsys: VOTEQ
            debug: on
        }
    }

On both nodes:

    modprobe dlm
    cd /dev
    ln -sf . misc

Start corosync on both nodes. I use "corosync -f" to see logging on stderr.

On node1, start dlm_controld -f0 -D.

And now the spinning: allow dlm_controld to settle on node1, then start dlm_controld -f0 -D on node2. corosync on node2 will start spinning at 100% CPU. Killing dlm_controld on node2 will NOT solve the problem.