Description of problem:
After manually fencing the SPM host (via the "Confirm 'Host has been Rebooted'" button), the system won't start the SPM on the other host.

Version-Release number of selected component (if applicable):
- 3.4.0

How reproducible:
Always

Steps to Reproduce:
- 2-node cluster, SPM/HSM
- block the SPM host's network
- host becomes non-responsive
- click the "Confirm 'Host has been Rebooted'" button
- the other host is not selected as the SPM

Actual results:
- the other host is not selected as the SPM

Expected results:
- the SPM should be started on the other host
I'll add logs. Basically, the issue seems to be that when we "fence" in this scenario, the pool metadata is updated with spmId = -1 and lver = -1. The problem is that when the engine runs getSpmStatus, the stats are retrieved from sanlock, which was not updated and still contains the previous SPM id/lver. This bug and https://bugzilla.redhat.com/show_bug.cgi?id=1082365 are more or less blocking each other: 1082365 can't be completely verified without a successful "fence", while this bug can't be solved until the attribute errors from 1082365 are fixed. This bug was opened for sanity checking and testing of the complete scenario.
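To make the mismatch above concrete, here is a minimal, self-contained Python sketch of the effect. This is not actual VDSM or engine code: read_pool_metadata(), read_sanlock_status(), and the concrete id/lver values are illustrative stand-ins for the two sources of truth that end up disagreeing after the manual fence.

FREE = -1  # spmId/lver value written to the pool metadata on manual fence

def read_pool_metadata():
    # Hypothetical: after "Confirm 'Host has been Rebooted'" the pool
    # metadata is reset, marking the SPM role as free.
    return {"spmId": FREE, "lver": FREE}

def read_sanlock_status():
    # Hypothetical: sanlock still reports the lease owner it last knew
    # about, since the blocked host never released the lease cleanly.
    return {"spmId": 2, "lver": 5}  # stale previous SPM id / lease version

def get_spm_status():
    # The bug: the status handed back to the engine reflects sanlock's
    # stale view, not the freshly reset pool metadata.
    return read_sanlock_status()

def can_start_spm_on_other_host():
    status = get_spm_status()
    # The engine treats any non-free spmId as "SPM still held elsewhere"
    # and so refuses to start the SPM on the surviving host.
    return status["spmId"] == FREE

if __name__ == "__main__":
    print("pool metadata:", read_pool_metadata())   # {'spmId': -1, 'lver': -1}
    print("getSpmStatus:", get_spm_status())        # stale {'spmId': 2, 'lver': 5}
    print("new SPM can start:", can_start_spm_on_other_host())  # False

Under these assumptions, the surviving host never takes over the SPM role because the election logic keys off the stale sanlock-derived status rather than the reset metadata, which matches the observed behavior above.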
Verified on av9.

Steps taken:
1. Create a 2-node cluster, SPM/HSM (shared DC)
2. Block the SPM host's network
3. Wait for the host to become non-responsive
4. Stop the vdsmd service on the blocked SPM
5. Click the "Confirm 'Host has been Rebooted'" button

The HSM gains SPM as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html