Description of problem:
The lssyscfg command issued to the HMC by fence_lpar to get the status of the LPAR can take longer than SHELL_TIMEOUT, 3 seconds. This results in fencing failures which could be avoided if fence_lpar waited longer for the HMC to respond to the command. A quick test on squad1hmc showed lssyscfg took up to 7 seconds to complete.

Version-Release number of selected component (if applicable):
cman-2.0.104-1.el5

How reproducible:
~40% of the time, probably dependent on the HMC

Steps to Reproduce:
1. fence_lpar -o status <lpar dep opts>

Actual results:
Jun 8 15:52:32 basic-p1 fenced[1850]: fencing node "kent-p1"
Jun 8 15:52:38 basic-p1 fenced[1850]: agent "fence_lpar" reports: Connection timed out

Expected results:
fencing should succeed.

Additional info:
Created attachment 347028: proposed patch

I've been running something similar to this and it works much better.
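The attachment itself is not reproduced in this thread. As a rough illustration of the idea in the report (waiting longer than SHELL_TIMEOUT for a slow command), here is a minimal Python sketch. It is not the patch and not fence_lpar's code: run_with_timeout, HMC_TIMEOUT, and slow_cmd are hypothetical names, and a local child process that sleeps for one second stands in for a slow lssyscfg call on the HMC.

```python
import subprocess
import sys

SHELL_TIMEOUT = 3   # seconds; the limit named in the report
HMC_TIMEOUT = 15    # hypothetical longer limit, chosen only for illustration


def run_with_timeout(cmd, timeout):
    """Run cmd and return its stdout; raise TimeoutError if it runs too long."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        raise TimeoutError("Connection timed out after %s s" % timeout)
    return result.stdout


# Stand-in for a slow status query: sleeps for 1 second, then prints a state.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(1); print('Running')"]

if __name__ == "__main__":
    # A limit shorter than the command's run time fails.
    try:
        run_with_timeout(slow_cmd, 0.3)
    except TimeoutError as err:
        print("short limit:", err)

    # A more generous limit lets the same command finish.
    print("long limit:", run_with_timeout(slow_cmd, HMC_TIMEOUT).strip())
```

The same command fails under the short limit and succeeds under the long one, which matches the behaviour described above: lssyscfg taking up to 7 seconds cannot finish within a 3 second limit.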
This patch cannot cause any harm. I will add it as soon as I get enough flags.
Verified that patch is included in cman-2.0.108-1.el5.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1341.html