Description of problem:
getSpmStatus is answered from a cache, and the cached value can be wrong. I hit this when rhevm came up without knowing which host was the SPM: it sends getSpmStatus to a random host in the cluster and, based on the spmId in the result, tries that host as SPM. Because the spmId is wrong, rhevm fails and goes looking for another host, which can take a long time in a large-scale environment.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Restart vdsm on the SPM.
2. Let another host take SPM.
3. Run getSpmStatus on the old SPM.
When rhevm comes up, it tries to take SPM using multiple hosts and gets a resource acquire timeout.
Running on host ID 89, vdsm reports that the SPM is host ID 89 and that its status is Free:
[root@dhcp151-128 ~]# vdsClient -s 0 getSpmStatus e505624e-c8af-11e0-96a0-03e8c0272e3a
spmId = 89
spmStatus = Free
spmLver = 4
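
The output above is self-contradictory: the host names itself as SPM (spmId = 89) while reporting the role as Free. A minimal sketch of how a caller could flag such an answer before trusting its spmId is shown here. The function names parse_spm_status and looks_stale are hypothetical, not part of vdsm or rhevm, and the check is an assumed heuristic based only on the output in this report, not the fix that was shipped.

```python
def parse_spm_status(output):
    """Parse the 'key = value' lines printed by vdsClient into a dict."""
    status = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            status[key.strip()] = value.strip()
    return status


def looks_stale(status, local_host_id):
    """Assumed heuristic: a host that names itself as SPM but reports
    the role as Free is probably answering from an outdated cache."""
    return (status.get("spmStatus") == "Free"
            and status.get("spmId") == str(local_host_id))


# Output copied from the report above (queried on host ID 89).
sample = """\
spmId = 89
spmStatus = Free
spmLver = 4
"""

status = parse_spm_status(sample)
print(looks_stale(status, 89))
```

A caller that sees such a flagged result could query a second host instead of trying the reported spmId straight away.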
Checked on 4.9-104.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.