cluster-api-provider-ovirt creates an API connection to oVirt and caches it. When that connection becomes invalid, for example after the oVirt engine is restarted and the session is cleared, the controller can no longer get the status of the VMs or create and delete VMs.

There is a Test() method to check the API connection, but it does not appear to actually test the connection or the authentication status:
https://github.com/openshift/cluster-api-provider-ovirt/blob/c29232e3f4ed98e65626e27e82084a499103165e/pkg/cloud/ovirt/machine/actuator.go#L66

The Test() method in the go-ovirt SDK was fixed in:
https://github.com/oVirt/ovirt-engine-sdk-go/pull/197

The vendored version of go-ovirt needs to be bumped.
Hi Alberto, I'm looking at this BZ and could use some help with the verification steps. My idea is this:
1. Install an OCP cluster.
2. Restart ovirt-engine.
3. Perform an operation that requires OCP to communicate with RHV, e.g. add one more worker to the OCP cluster.
Would this be sufficient for verification?
Jan, your steps are accurate. The pass criterion is that the machine-controller in the api-machine-controller pod is using a new token. You can validate that in the ovirt-engine 'Events' tab or under 'Administration > Active User Sessions'.
Verified with:
rhvm-4.3.9.3-0.1.el7.noarch
OCP clusterversion 4.4.0-0.nightly-2020-03-26-041820

Verification steps:
1. Have an OCP on RHV cluster.
2. Restart the ovirt-engine service on RHV.
3. Wait for the engine to come back up.
4. Make sure there is an event reporting a new user session from one of the master nodes.
5. Increase the number of Machines in the worker MachineSet.
6. Make sure the new worker VM has been successfully created in the RHV engine.