Bug 1034787
Summary: [RFE] Hosted-HA should not start infinite reboot loop when VDSM reports the engine VM as "Running"

| Field | Value |
| --- | --- |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | ovirt-hosted-engine-ha |
| Version | 3.3.0 |
| Target Release | 3.3.0 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Keywords | FutureFeature |
| Whiteboard | sla |
| oVirt Team | SLA |
| Reporter | Pablo Iranzo Gómez <pablo.iranzo> |
| Assignee | Martin Sivák <msivak> |
| QA Contact | Artyom <alukiano> |
| CC | acathrow, alukiano, gpadgett, iheim, lyarwood, mavital, pablo.iranzo, sbonazzo, scohen |
| Type | Bug |
| Doc Type | Enhancement |
| Fixed In Version | ovirt-hosted-engine-ha-0.1.0-0.8.rc.el6ev |
| Bug Blocks | 1020228 |
| Last Closed | 2014-01-21 16:51:42 UTC |
Description (Pablo Iranzo Gómez, 2013-11-26 14:04:45 UTC)
Created attachment 829281 [details]: Agent log
Created attachment 829282 [details]: Broker log
Created attachment 829295 [details]: vdsm log
This is the result of a vmGetStats call to VDSM. Notice the status field with the value "Running"; that status does not appear in VDSM's list of valid states below.

```
{'statsList': [{'acpiEnable': 'true',
                'appsList': [u'rhevm-guest-agent-common-1.0.8-6.el6ev',
                             u'kernel-2.6.32-431.el6'],
                'balloonInfo': {},
                'clientIp': '',
                'cpuSys': '1.79',
                'cpuUser': '3.86',
                'disks': {u'hdc': {'apparentsize': '0', 'flushLatency': '0',
                                   'readLatency': '0', 'readRate': '0.00',
                                   'truesize': '0', 'writeLatency': '0',
                                   'writeRate': '0.00'},
                          u'vda': {'apparentsize': '32212254720',
                                   'flushLatency': '95602',
                                   'imageID': '59bc6b3e-9109-4bdc-8141-2e1a27149a05',
                                   'readLatency': '14869881',
                                   'readRate': '1375.47',
                                   'truesize': '6963359744',
                                   'writeLatency': '390780435',
                                   'writeRate': '9824.76'}},
                'disksUsage': [{u'fs': u'ext4', u'path': u'/',
                                u'total': '26958753792', u'used': '6088613888'},
                               {u'fs': u'ext4', u'path': u'/boot',
                                u'total': '507744256', u'used': '40357888'}],
                'displayIp': '0',
                'displayPort': u'5900',
                'displaySecurePort': u'5901',
                'displayType': 'qxl',
                'elapsedTime': '1998',
                'guestFQDN': u'rhevm.example.com',
                'guestIPs': u'192.168.2.115',
                'guestName': u'rhevm.example.com',
                'guestOs': u'2.6.32-431.el6.x86_64',
                'hash': '-7610418427437194438',
                'kvmEnable': 'true',
                'lastLogin': 1385472572.382862,
                'memUsage': '41',
                'memoryStats': {u'majflt': '0', u'mem_free': '1851624',
                                u'mem_total': '2956596', u'mem_unused': '1503868',
                                u'pageflt': '19', u'swap_in': '0',
                                u'swap_out': '0', u'swap_total': '4194296',
                                u'swap_usage': '0'},
                'monitorResponse': '0',
                'netIfaces': [{u'hw': u'00:16:3e:15:97:16',
                               u'inet': [u'192.168.2.115'],
                               u'inet6': [u'fe80::216:3eff:fe15:9716'],
                               u'name': u'eth0'}],
                'network': {u'vnet0': {'macAddr': '00:16:3e:15:97:16',
                                       'name': u'vnet0', 'rxDropped': '0',
                                       'rxErrors': '0', 'rxRate': '0.0',
                                       'speed': '1000', 'state': 'unknown',
                                       'txDropped': '0', 'txErrors': '0',
                                       'txRate': '0.0'}},
                'pauseCode': 'NOERR',
                'pid': '11507',
                'session': 'Unknown',
                'statsAge': '1.80',
                'status': 'Running',
                'timeOffset': '0',
                'username': u'None',
                'vmId': 'ebdc068e-a1b6-4403-a8f3-1a44db957e15',
                'vmType': 'kvm'}],
 'status': {'code': 0, 'message': 'Done'}}
```

The "Running" status is the result of having a guest agent installed, and it is in fact valid output; the HA agent should be able to handle it as well.

Here is the list of valid states in vm.py:

```python
VALID_STATES = ('Down', 'Migration Destination', 'Migration Source',
                'Paused', 'Powering down', 'RebootInProgress',
                'Restoring state', 'Saving State', 'Up', 'WaitForLaunch')
```

and the API description in the JSON schema file:

```
##
# @VmStatus:
#
# An enumeration of possible virtual machine statuses.
#
# @Down: The VM is powered off
#
# @Migration Destination: The VM is migrating to this host
#
# @Migration Source: The VM is migrating away from this host
#
# @Paused: The VM is paused
#
# @Powering down: A shutdown command has been sent to the VM
#
# @RebootInProgress: The VM is currently rebooting
#
# @Restoring state: The VM is waking from hibernation
#
# @Saving State: The VM is preparing for hibernation
#
# @Up: The VM is running
#
# @WaitForLaunch: The VM is being created
#
# Since: 4.10.0
##
{'enum': 'VmStatus',
 'data': ['Down', 'Migration Destination', 'Migration Source', 'Paused',
          'Powering down', 'RebootInProgress', 'Restoring state',
          'Saving State', 'Up', 'WaitForLaunch']}
```

So if "Running" is valid, it is undocumented.
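The fix, then, is for the HA agent to tolerate the guest-agent-derived status instead of treating it as unknown and restarting the engine VM. A minimal sketch of the idea follows; this is not the actual ovirt-hosted-engine-ha patch, and the set and helper names are hypothetical:

```python
# Hypothetical sketch: accept guest-agent-reported statuses such as
# 'Running' as healthy, instead of restarting the engine VM whenever
# the status falls outside the documented VmStatus enumeration.
ENGINE_VM_UP_STATES = frozenset(('Up', 'Running'))

def engine_vm_is_up(vm_stats):
    """Return True if a getVmStats entry describes a healthy engine VM."""
    return vm_stats.get('status') in ENGINE_VM_UP_STATES
```

Any whitelist like this has to track what VDSM actually reports rather than what the schema documents, since "Running" only shows up once the guest agent is responsive.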
And here is the code that causes the bug, in vm.py:

```python
def _getStatsInternal(self):
    # used by API.Vm.getStats

    def _getGuestStatus():
        GUEST_WAIT_TIMEOUT = 60
        now = time.time()
        if now - self._guestEventTime < 5 * GUEST_WAIT_TIMEOUT and \
                self._guestEvent == 'Powering down':
            return self._guestEvent

        if self.guestAgent and self.guestAgent.isResponsive() and \
                self.guestAgent.getStatus():
            return self.guestAgent.getStatus()  # !! HERE !!

        if now - self._guestEventTime < GUEST_WAIT_TIMEOUT:
            return self._guestEvent

        return 'Up'
```

When the guest agent is responsive, its own status string ("Running") is returned verbatim instead of one of the states from VALID_STATES.

Please provide correct information for reproducing the bug. Thanks.

Also, I was unable to reach the "Running" status for the VM; via getVmStats I saw the VM first in "Powering Up" mode and afterwards in "Up" mode.

(In reply to Artyom from comment #9)
> Please provide correct information for bug reproducing
> Thanks

Artyom, install the oVirt guest agent inside the VM, and it will start behaving like that. I had to manually apply the patch on each hosted-engine host to make it run stably. Do you need anything else?

After rhevm-guest-agent installation, vmGetStats shows that the VM status is "Running" (in vdsm.log), and the hosted engine VM runs without any problems or restarts.

Verified on ovirt-hosted-engine-ha-0.1.0-0.8.rc.el6ev.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0080.html
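As a side note for anyone verifying a build with this fix: the raw status can be queried from VDSM directly instead of grepping vdsm.log. A minimal sketch, assuming vdsm's bundled vdscli client is available on the host (the module path and the connect() call vary between vdsm versions, so treat this as illustrative):

```python
# Hypothetical verification helper, not part of the HA agent: ask the
# local vdsmd for the engine VM's stats, the same getVmStats output
# quoted in the description above.
from vdsm import vdscli

def engine_vm_status(vm_id):
    conn = vdscli.connect()  # connection to the local vdsmd
    resp = conn.getVmStats(vm_id)
    if resp['status']['code'] != 0:
        raise RuntimeError(resp['status']['message'])
    # 'Running' here means the guest agent is installed and responsive.
    return resp['statsList'][0]['status']

print(engine_vm_status('ebdc068e-a1b6-4403-a8f3-1a44db957e15'))
```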