Created attachment 676665 [details]
engine logs

Description of problem:
---------------------------------------
When the installation of a host fails, and it is in the 'Install Failed' state, trying to move it to 'Maintenance' mode fails in the first attempt. After clicking on 'Maintenance' button, a confirmation prompt is seen, clicking on 'OK' results in the host going to 'Maintenance' state briefly and then immediately going to 'Non-responsive' state. Trying to move it to 'Maintenance' mode the second time works fine.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.2.0-4.el6ev

How reproducible:
Always

Steps to Reproduce:
1. For a host that is in the 'Install Failed' state, click on the 'Maintenance' button. On the confirmation dialog box that appears, click on 'OK'.

Actual results:
The host goes to 'Maintenance' briefly and then goes to 'Non-responsive' state.

Expected results:
The host should remain in the 'Maintenance' mode.

Additional info:
Please attach the full engine.log. I can't see the reason for the installation failure or the exact flow that got you to the non-responsive state.
Created attachment 684041 [details] full-engine-logs
The bug is caused by the way we handle exceptions in vdsManager.

When the status is set to maintenance while the vds status is 'install failed', we first set the vds status to PreparingForMaintenance. This causes vdsManager.isMonitoringNeeded to return true, which in turn triggers calls to VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo. When the vds status is PreparingForMaintenance, refreshVdsRunTimeInfo calls VdsUpdateRunTimeInfo.refreshVdsStats, which issues GetStatsVDSCommand. That command fails with an exception because vdsm is not installed. We assume that any exception raised in this section means a connection error and that the host must be moved to non-responsive, so we call vdsManager.handleNetworkException.

As I see it, isMonitoringNeeded should return true when the vds status is PreparingForMaintenance, because monitoring is needed to process the preparingForMaintenance flow. The problem could be handled by checking vdsStatus before calling handleNetworkException, but even there, if communication with the vds fails while we are in preparingForMaintenance, we should move the status to non-responsive. In any case, the preparingForMaintenance flow is redundant after a failed install.

So the only thing I can think of is to skip the preparingForMaintenance status when the previous status was 'install failed'. This would be done in MaintananceNumberOfVdssCommand.setVdsStatusToPrepareForMaintaice, which sets the status to preparingForMaintenance.

This is my suggestion: http://gerrit.ovirt.org/11272
Please correct me if I missed something.
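To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the engine code and not the content of the gerrit patch: HostStatus, MaintenanceTransitionSketch, statusOnMaintenanceRequest and monitorOnce are hypothetical names, and the isMonitoringNeeded rule is a simplified assumption that only covers the statuses discussed in this bug.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the engine's host status enum.
enum HostStatus { Up, InstallFailed, PreparingForMaintenance, Maintenance, NonResponsive }

final class MaintenanceTransitionSketch {

    // Assumption: only InstallFailed skips the preparation phase,
    // because vdsm is not installed and there is nothing to prepare.
    private static final Set<HostStatus> SKIP_PREPARE = EnumSet.of(HostStatus.InstallFailed);

    // Status to set when the user confirms the 'Maintenance' action.
    static HostStatus statusOnMaintenanceRequest(HostStatus current) {
        return SKIP_PREPARE.contains(current)
                ? HostStatus.Maintenance
                : HostStatus.PreparingForMaintenance;
    }

    // Simplified assumption of the monitoring check: hosts in Maintenance
    // or InstallFailed are not polled, everything else is.
    static boolean isMonitoringNeeded(HostStatus status) {
        return status != HostStatus.Maintenance && status != HostStatus.InstallFailed;
    }

    // One monitoring cycle. A failed stats call is treated as a network
    // error, which mirrors the handleNetworkException behaviour described above.
    static HostStatus monitorOnce(HostStatus status, boolean vdsmReachable) {
        if (!isMonitoringNeeded(status)) {
            return status; // host is not polled, so its status cannot change here
        }
        if (!vdsmReachable) {
            return HostStatus.NonResponsive;
        }
        return status == HostStatus.PreparingForMaintenance ? HostStatus.Maintenance : status;
    }

    public static void main(String[] args) {
        // Current behaviour: always go through PreparingForMaintenance first.
        System.out.println(monitorOnce(HostStatus.PreparingForMaintenance, false));

        // Proposed behaviour: InstallFailed goes straight to Maintenance.
        HostStatus next = statusOnMaintenanceRequest(HostStatus.InstallFailed);
        System.out.println(monitorOnce(next, false));
    }
}
```

In this model the first call ends in NonResponsive, which matches the reported behaviour, while the second stays in Maintenance because the host is never polled. Hosts in any other status still go through PreparingForMaintenance, so a real communication failure during preparation would still move them to non-responsive.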
Your suggestion of skipping the 'PreparingForMaintenance' state for a host that is in the 'InstallFailed' state looks fine to me.
Please see the comments on https://bugzilla.redhat.com/show_bug.cgi?id=702914 and decide if you want to fix this issue.