Bug 1034787
| Summary: | [RFE] Hosted-HA should not start infinite reboot loop when VDSM reports the engine VM as "Running" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Pablo Iranzo Gómez <pablo.iranzo> |
| Component: | ovirt-hosted-engine-ha | Assignee: | Martin Sivák <msivak> |
| Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3.0 | CC: | acathrow, alukiano, gpadgett, iheim, lyarwood, mavital, pablo.iranzo, sbonazzo, scohen |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | sla | | |
| Fixed In Version: | ovirt-hosted-engine-ha-0.1.0-0.8.rc.el6ev | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-01-21 16:51:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1020228 | | |
| Attachments: | Agent log, Broker log, vdsm log (see description) | | |
Description
Pablo Iranzo Gómez
2013-11-26 14:04:45 UTC
Created attachment 829281: Agent log
Created attachment 829282: Broker log
Created attachment 829295: vdsm log
This is the result of a vmGetStats call to VDSM. Notice the 'status' field with the value 'Running'. That is not a valid status.
{'statsList': [{'acpiEnable': 'true',
'appsList': [u'rhevm-guest-agent-common-1.0.8-6.el6ev',
u'kernel-2.6.32-431.el6'],
'balloonInfo': {},
'clientIp': '',
'cpuSys': '1.79',
'cpuUser': '3.86',
'disks': {u'hdc': {'apparentsize': '0',
'flushLatency': '0',
'readLatency': '0',
'readRate': '0.00',
'truesize': '0',
'writeLatency': '0',
'writeRate': '0.00'},
u'vda': {'apparentsize': '32212254720',
'flushLatency': '95602',
'imageID': '59bc6b3e-9109-4bdc-8141-2e1a27149a05',
'readLatency': '14869881',
'readRate': '1375.47',
'truesize': '6963359744',
'writeLatency': '390780435',
'writeRate': '9824.76'}},
'disksUsage': [{u'fs': u'ext4',
u'path': u'/',
u'total': '26958753792',
u'used': '6088613888'},
{u'fs': u'ext4',
u'path': u'/boot',
u'total': '507744256',
u'used': '40357888'}],
'displayIp': '0',
'displayPort': u'5900',
'displaySecurePort': u'5901',
'displayType': 'qxl',
'elapsedTime': '1998',
'guestFQDN': u'rhevm.example.com',
'guestIPs': u'192.168.2.115',
'guestName': u'rhevm.example.com',
'guestOs': u'2.6.32-431.el6.x86_64',
'hash': '-7610418427437194438',
'kvmEnable': 'true',
'lastLogin': 1385472572.382862,
'memUsage': '41',
'memoryStats': {u'majflt': '0',
u'mem_free': '1851624',
u'mem_total': '2956596',
u'mem_unused': '1503868',
u'pageflt': '19',
u'swap_in': '0',
u'swap_out': '0',
u'swap_total': '4194296',
u'swap_usage': '0'},
'monitorResponse': '0',
'netIfaces': [{u'hw': u'00:16:3e:15:97:16',
u'inet': [u'192.168.2.115'],
u'inet6': [u'fe80::216:3eff:fe15:9716'],
u'name': u'eth0'}],
'network': {u'vnet0': {'macAddr': '00:16:3e:15:97:16',
'name': u'vnet0',
'rxDropped': '0',
'rxErrors': '0',
'rxRate': '0.0',
'speed': '1000',
'state': 'unknown',
'txDropped': '0',
'txErrors': '0',
'txRate': '0.0'}},
'pauseCode': 'NOERR',
'pid': '11507',
'session': 'Unknown',
'statsAge': '1.80',
'status': 'Running',
'timeOffset': '0',
'username': u'None',
'vmId': 'ebdc068e-a1b6-4403-a8f3-1a44db957e15',
'vmType': 'kvm'}],
'status': {'code': 0, 'message': 'Done'}}
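For illustration, here is a minimal sketch (the function name is mine, not from the agent code) of how a client consuming this response would typically pull the per-VM status out of the structure shown above:

```python
def vm_status_from_stats(response):
    """Extract the per-VM status string from a VDSM getVmStats response."""
    # The top-level 'status' describes the call itself; code 0 means success.
    if response['status']['code'] != 0:
        raise RuntimeError(response['status']['message'])
    # 'statsList' holds one entry per VM; the engine VM is the only one here.
    return response['statsList'][0]['status']

# With the response above this returns 'Running', not 'Up'.
```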
The 'Running' status is a result of having a guest agent installed, and it is valid. The hosted-engine HA agent should be able to handle it as well. Here is the list of valid states:
vm.py:
VALID_STATES = ('Down', 'Migration Destination', 'Migration Source',
'Paused', 'Powering down', 'RebootInProgress',
'Restoring state', 'Saving State',
'Up', 'WaitForLaunch')
and the API description in the JSON schema file:
##
# @VmStatus:
#
# An enumeration of possible virtual machine statuses.
#
# @Down: The VM is powered off
#
# @Migration Destination: The VM is migrating to this host
#
# @Migration Source: The VM is migrating away from this host
#
# @Paused: The VM is paused
#
# @Powering down: A shutdown command has been sent to the VM
#
# @RebootInProgress: The VM is currently rebooting
#
# @Restoring state: The VM is waking from hibernation
#
# @Saving State: The VM is preparing for hibernation
#
# @Up: The VM is running
#
# @WaitForLaunch: The VM is being created
#
# Since: 4.10.0
##
{'enum': 'VmStatus',
'data': ['Down', 'Migration Destination', 'Migration Source', 'Paused',
'Powering down', 'RebootInProgress', 'Restoring state',
'Saving State', 'Up', 'WaitForLaunch']}
So if Running is valid, it is undocumented.
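To make the mismatch concrete, here is a small check of a reported status against the documented VmStatus values (a sketch only; the constant and function names are assumptions), which is exactly the kind of strict comparison that 'Running' fails:

```python
# The VmStatus values documented in VDSM's API schema, as quoted above.
DOCUMENTED_VM_STATES = frozenset([
    'Down', 'Migration Destination', 'Migration Source', 'Paused',
    'Powering down', 'RebootInProgress', 'Restoring state',
    'Saving State', 'Up', 'WaitForLaunch',
])

def is_documented_state(status):
    """Return True only for statuses listed in the VmStatus enum."""
    return status in DOCUMENTED_VM_STATES

# is_documented_state('Up')      -> True
# is_documented_state('Running') -> False, even though VDSM reports it.
```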
And here is the code that causes the bug:
vm.py:
def _getStatsInternal(self):
    # used by API.Vm.getStats
    def _getGuestStatus():
        GUEST_WAIT_TIMEOUT = 60
        now = time.time()
        if now - self._guestEventTime < 5 * GUEST_WAIT_TIMEOUT and \
                self._guestEvent == 'Powering down':
            return self._guestEvent
        if self.guestAgent and self.guestAgent.isResponsive() and \
                self.guestAgent.getStatus():
            return self.guestAgent.getStatus()  # !! HERE !!
        if now - self._guestEventTime < GUEST_WAIT_TIMEOUT:
            return self._guestEvent
        return 'Up'
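One way for the HA agent to cope with this (a rough sketch under my own naming, not the actual ovirt-hosted-engine-ha patch) is to normalize guest-agent-reported statuses such as 'Running' onto the documented states before deciding whether the engine VM is healthy:

```python
# Hypothetical alias table: statuses the guest agent may report for a live VM
# are mapped to 'Up' instead of being rejected as unknown.
GUEST_AGENT_STATUS_ALIASES = {
    'Running': 'Up',
}

VALID_STATES = ('Down', 'Migration Destination', 'Migration Source',
                'Paused', 'Powering down', 'RebootInProgress',
                'Restoring state', 'Saving State',
                'Up', 'WaitForLaunch')

def normalize_vm_status(raw_status):
    """Map a status from getVmStats onto one of the documented states."""
    status = GUEST_AGENT_STATUS_ALIASES.get(raw_status, raw_status)
    if status not in VALID_STATES:
        raise ValueError('unknown VM status: %r' % raw_status)
    return status

# normalize_vm_status('Running') -> 'Up', so the HA agent no longer treats a
# healthy engine VM as failed and does not keep restarting it.
```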
Please provide correct steps for reproducing the bug. Thanks. Also, I was unable to reach the "Running" status for the VM; via getVmStats I saw the VM first in "Powering up" state and afterwards in "Up" state.

(In reply to Artyom from comment #9)
> Please provide correct steps for reproducing the bug. Thanks.

Artyom, install the oVirt guest agent inside the VM and it will start behaving like that. I had to manually apply the patch on each hosted-engine host to make it run stably. Do you need anything else?

After installing rhevm-guest-agent, vmGetStats shows that the VM status is "Running" (in vdsm.log), and the hosted engine VM runs without any problems or restarts. Verified on ovirt-hosted-engine-ha-0.1.0-0.8.rc.el6ev.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0080.html