Bug 400941
| Summary: | openais reporting error continuously | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Mark Nielsen <mnielsen> |
| Component: | openais | Assignee: | Steven Dake <sdake> |
| Status: | CLOSED ERRATA | Keywords: | ZStream |
| Severity: | high | Priority: | urgent |
| Version: | 5.0 | Target Milestone: | rc |
| Hardware: | All | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2009-01-20 20:46:37 UTC |
| CC: | cluster-maint, cmarthal, gavinf, ghelleks, lhh, sghosh | Bug Blocks: | 509885 |
Description
Mark Nielsen
2007-11-27 13:32:25 UTC
I found some leftover domU configuration files in /etc/xen/. One user had been testing and copied an existing domU config to test with, without changing the "name =" or "uuid" line. Could this be what was causing the error? I've removed those errant files and rebooted, and am now waiting to see if the problem resurfaces.

I am seeing the error again after rebooting one of my nodes. I have verified that the /etc/xen/ directory is clean (no duplicates, etc.). This time the error showed up just after adding a new VM using Luci. We started getting the following error:

    clurgmgrd[21756]: <err> #37: Error receiving header from 1 sz=0 CTX 0xbe20a90
    clurgmgrd[21756]: <err> #37: Error receiving header from 1 sz=0 CTX 0xbe24b70
    clurgmgrd[21756]: <err> #37: Error receiving header from 1 sz=0 CTX 0xbe2d4f0

These errors continued until I rebooted node 1. Once node 1 was rebooted, they went away and I started getting the checkpoint_find error repeatedly.

This is an error in synchronization that is not yet understood. A clear definition of how to reproduce the issue would help, since in two years of development I have never seen this in our labs. Until we have a solid QE reproducer or method to reproduce it, I'm marking this NEEDINFO.

Regards
-steve

I can give you some debug options to add to the cluster configuration that may help gather more information for debugging. Try adding:

    <cluster config_version="3" name="brassow-xen">
      <logging debug="on" fileline="on" timestamp="on">
        <logger ident="CKPT" debug="on" tags="enter|leave">
        </logger>
      </logging>

Do not put these options inside the "cluster" tag itself; instead, put the logging and logger tags after <cluster .....>. Then reload the configuration with "ccs_tool update filename", where filename is the hand-modified cluster.conf file containing the logger settings above.

'Error receiving header' from clurgmgrd may be a problem that is already fixed in the current release, and it may or may not be related to the openais errors.
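The logging change suggested above can also be applied by script. The following is a minimal Python sketch, not a Red Hat tool: the function name add_ckpt_debug_logging is hypothetical, it assumes cluster.conf is well-formed XML whose root element is <cluster>, and it assumes the edited file must carry a higher config_version than the running configuration before ccs_tool update will accept it. It places the logging element directly after the opening <cluster> tag, as described in the comment.

```python
import xml.etree.ElementTree as ET


def add_ckpt_debug_logging(conf_xml: str) -> str:
    """Hypothetical helper: enable CKPT debug logging in cluster.conf text.

    Mirrors the suggested snippet: a <logging> element placed right after the
    opening <cluster> tag, containing a <logger ident="CKPT"> child. Also bumps
    config_version so the edited file is newer than the running config.
    """
    root = ET.fromstring(conf_xml)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> as the root element")

    # Reuse an existing <logging> element, or create one as the first child
    # of <cluster> (i.e. directly after the opening <cluster ...> tag).
    log_elem = root.find("logging")
    if log_elem is None:
        log_elem = ET.Element("logging")
        root.insert(0, log_elem)
    for attr in ("debug", "fileline", "timestamp"):
        log_elem.set(attr, "on")

    # Reuse or create the CKPT logger and turn on enter/leave tracing.
    logger = log_elem.find("logger[@ident='CKPT']")
    if logger is None:
        logger = ET.SubElement(log_elem, "logger", ident="CKPT")
    logger.set("debug", "on")
    logger.set("tags", "enter|leave")

    # Assumption: the reloaded file must carry a higher config_version
    # than the one currently running, so increment it by one.
    version = int(root.get("config_version", "0"))
    root.set("config_version", str(version + 1))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = '<cluster config_version="3" name="brassow-xen"><clusternodes/></cluster>'
    print(add_ckpt_debug_logging(original))
```

The returned text would then be written back to a file and loaded with ccs_tool update as described. Note that ElementTree discards XML comments by default and does not indent the inserted elements, so review the result before deploying it.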
Rgmanager doesn't use checkpointing (though I wish it did :) ), but it does use cman (openais) messaging to communicate.

*** Bug 436507 has been marked as a duplicate of this bug. ***

*** Bug 430296 has been marked as a duplicate of this bug. ***

Fixed in openais-0.80-3.17.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0074.html