Description of problem: When monitoring the VM that provides RHEV-M in hosted mode, monitoring may take longer than the default configured timeout, sending the VM into a reboot loop even with just one host. A configurable value that the user could tune to adapt to lower-performance environments would make evaluations a better experience. It is possible to work around the issue by putting the engine into maintenance mode or by tweaking global variables.
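A minimal sketch of what a tunable timeout could look like on the HA-agent side. The config path, section, option name, and default below are illustrative assumptions, not the actual ovirt-hosted-engine-ha settings:

```python
# Hypothetical sketch: read the VM monitoring timeout from an agent
# config file instead of hard-coding it. Option and section names are
# assumptions for illustration only.
import configparser

DEFAULT_VM_MONITOR_TIMEOUT = 60  # seconds; illustrative default

def get_vm_monitor_timeout(path="/etc/ovirt-hosted-engine-ha/agent.conf"):
    """Return the tuned monitoring timeout, or the default if unset."""
    cfg = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files actually parsed,
    # so a missing file silently falls back to the default.
    if cfg.read(path) and cfg.has_option("monitoring", "vm_timeout"):
        return cfg.getint("monitoring", "vm_timeout")
    return DEFAULT_VM_MONITOR_TIMEOUT
```

On a slow evaluation host, an admin could then raise the value in the config file rather than patching the agent.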
Created attachment 829281 [details] Agent log
Created attachment 829282 [details] Broker log
Created attachment 829295 [details] vdsm log
This is the result of a vmGetStats call to VDSM. Note the 'status' field with value 'Running', which is not a valid status:

{'statsList': [{'acpiEnable': 'true', 'appsList': [u'rhevm-guest-agent-common-1.0.8-6.el6ev', u'kernel-2.6.32-431.el6'], 'balloonInfo': {}, 'clientIp': '', 'cpuSys': '1.79', 'cpuUser': '3.86', 'disks': {u'hdc': {'apparentsize': '0', 'flushLatency': '0', 'readLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeLatency': '0', 'writeRate': '0.00'}, u'vda': {'apparentsize': '32212254720', 'flushLatency': '95602', 'imageID': '59bc6b3e-9109-4bdc-8141-2e1a27149a05', 'readLatency': '14869881', 'readRate': '1375.47', 'truesize': '6963359744', 'writeLatency': '390780435', 'writeRate': '9824.76'}}, 'disksUsage': [{u'fs': u'ext4', u'path': u'/', u'total': '26958753792', u'used': '6088613888'}, {u'fs': u'ext4', u'path': u'/boot', u'total': '507744256', u'used': '40357888'}], 'displayIp': '0', 'displayPort': u'5900', 'displaySecurePort': u'5901', 'displayType': 'qxl', 'elapsedTime': '1998', 'guestFQDN': u'rhevm.example.com', 'guestIPs': u'192.168.2.115', 'guestName': u'rhevm.example.com', 'guestOs': u'2.6.32-431.el6.x86_64', 'hash': '-7610418427437194438', 'kvmEnable': 'true', 'lastLogin': 1385472572.382862, 'memUsage': '41', 'memoryStats': {u'majflt': '0', u'mem_free': '1851624', u'mem_total': '2956596', u'mem_unused': '1503868', u'pageflt': '19', u'swap_in': '0', u'swap_out': '0', u'swap_total': '4194296', u'swap_usage': '0'}, 'monitorResponse': '0', 'netIfaces': [{u'hw': u'00:16:3e:15:97:16', u'inet': [u'192.168.2.115'], u'inet6': [u'fe80::216:3eff:fe15:9716'], u'name': u'eth0'}], 'network': {u'vnet0': {'macAddr': '00:16:3e:15:97:16', 'name': u'vnet0', 'rxDropped': '0', 'rxErrors': '0', 'rxRate': '0.0', 'speed': '1000', 'state': 'unknown', 'txDropped': '0', 'txErrors': '0', 'txRate': '0.0'}}, 'pauseCode': 'NOERR', 'pid': '11507', 'session': 'Unknown', 'statsAge': '1.80', 'status': 'Running', 'timeOffset': '0', 'username': u'None', 'vmId': 'ebdc068e-a1b6-4403-a8f3-1a44db957e15', 'vmType': 'kvm'}], 'status': {'code': 0, 'message': 'Done'}}
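A minimal sketch of the validation gap this output exposes, assuming a consumer that checks the reported status against vdsm's documented enum (the helper name is hypothetical):

```python
# 'Running' is what the stats report once a guest agent is active, but
# it is absent from vdsm's documented VALID_STATES, so a strict consumer
# (such as the HA agent) rejects an otherwise healthy VM.
VALID_STATES = ('Down', 'Migration Destination', 'Migration Source',
                'Paused', 'Powering down', 'RebootInProgress',
                'Restoring state', 'Saving State', 'Up', 'WaitForLaunch')

def is_documented_state(vm_stats):
    """Return True only for statuses in the documented enum."""
    return vm_stats.get('status') in VALID_STATES
```

Feeding the stats dict above through such a check returns False, which is what drives the HA agent to restart a healthy VM.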
The 'Running' status is a result of having a guest agent installed, and it is valid. The HA agent should be able to handle it as well.
Here is the list of valid states in vm.py:

VALID_STATES = ('Down', 'Migration Destination', 'Migration Source',
                'Paused', 'Powering down', 'RebootInProgress',
                'Restoring state', 'Saving State', 'Up', 'WaitForLaunch')

and the API description in the JSON file:

##
# @VmStatus:
#
# An enumeration of possible virtual machine statuses.
#
# @Down: The VM is powered off
#
# @Migration Destination: The VM is migrating to this host
#
# @Migration Source: The VM is migrating away from this host
#
# @Paused: The VM is paused
#
# @Powering down: A shutdown command has been sent to the VM
#
# @RebootInProgress: The VM is currently rebooting
#
# @Restoring state: The VM is waking from hibernation
#
# @Saving State: The VM is preparing for hibernation
#
# @Up: The VM is running
#
# @WaitForLaunch: The VM is being created
#
# Since: 4.10.0
##
{'enum': 'VmStatus',
 'data': ['Down', 'Migration Destination', 'Migration Source', 'Paused',
          'Powering down', 'RebootInProgress', 'Restoring state',
          'Saving State', 'Up', 'WaitForLaunch']}

So if 'Running' is valid, it is undocumented. And here is the code that causes the bug, in vm.py:

    def _getStatsInternal(self):
        # used by API.Vm.getStats

        def _getGuestStatus():
            GUEST_WAIT_TIMEOUT = 60
            now = time.time()
            if now - self._guestEventTime < 5 * GUEST_WAIT_TIMEOUT and \
                    self._guestEvent == 'Powering down':
                return self._guestEvent
            if self.guestAgent and self.guestAgent.isResponsive() and \
                    self.guestAgent.getStatus():
                return self.guestAgent.getStatus()  # !! HERE !!
            if now - self._guestEventTime < GUEST_WAIT_TIMEOUT:
                return self._guestEvent
            return 'Up'
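A hedged sketch of one possible consumer-side fix, not the actual patch: normalize undocumented guest-agent statuses onto the documented enum before acting on them. The mapping and helper name are assumptions for illustration:

```python
# Map undocumented guest-agent-derived statuses onto documented ones,
# so that a healthy VM reporting 'Running' is treated like 'Up' instead
# of triggering a restart. Mapping contents are an illustrative assumption.
GUEST_AGENT_STATES = {'Running': 'Up'}

def normalize_vm_status(raw_status):
    """Return the documented equivalent of a status, or the status itself."""
    return GUEST_AGENT_STATES.get(raw_status, raw_status)
```

The alternative fix is on the vdsm side: document 'Running' in the VmStatus enum so consumers can treat it as valid.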
Please provide the correct information for reproducing the bug. Thanks
Also, I was unable to reach the "Running" status for the VM; via getVmStats I saw the VM first in "Powering Up" mode and afterwards in "Up" mode.
(In reply to Artyom from comment #9)
> Please provide correct information for bug reproducing
> Thanks

Artyom, install the oVirt guest agent inside the VM, and it will start behaving like that. I had to manually apply the patch on each hosted-engine host to make it run stably. Do you need anything else?
After rhevm-guest-agent installation, vmGetStats shows that the VM status is "Running" (in vdsm.log), and the hosted VM runs without any trouble or restarts. Verified on ovirt-hosted-engine-ha-0.1.0-0.8.rc.el6ev.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0080.html