Bug 1101299

Summary: Hosted engine upgrade from 3.3 to 3.4, ovirt-ha-agent dies after three errors
Product: Red Hat Enterprise Virtualization Manager
Reporter: Artyom <alukiano>
Component: ovirt-hosted-engine-ha
Assignee: Jiri Moskovcak <jmoskovc>
Status: CLOSED ERRATA
QA Contact: Artyom <alukiano>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.4.0
CC: acathrow, adahms, cpelland, dfediuck, gklein, iheim, jmoskovc, mavital, pstehlik, sbonazzo, sherold
Target Milestone: ---
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: sla
Fixed In Version: ovirt-hosted-engine-ha-1.1.2-5.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, upgrading an environment in a hosted engine configuration would fail under certain conditions. This was caused by an error in the code used to store the state of the engine during the upgrade process, whereby the state could be correctly parsed in a Red Hat Enterprise Virtualization 3.4 environment, but not in a Red Hat Enterprise Virtualization 3.3 environment. Now, this code has been updated so that the state of the engine can be correctly parsed by both versions.
Story Points: ---
Clone Of: 1092075
Environment:
Last Closed: 2014-06-09 14:26:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Artyom 2014-05-26 16:45:30 UTC
Description of problem:
An exception appeared while upgrading the whole environment from 3.3 to 3.4.
The setup is a hosted engine environment.
At the stage where the environment has one host (host_3.3) still running hosted-engine 3.3 (the engine VM currently runs on it)
and another host (host_3.4) already upgraded to 3.4, the agent on host_3.3 shuts down after three errors:
MainThread::WARNING::2014-05-26 19:12:00,466::hosted_engine::336::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 329, in start_monitoring
    self._collect_all_host_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 885, in _collect_all_host_stats
    in json.loads(md['engine-status']).iteritems()])
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
    raise ValueError(errmsg("Expecting property name", s, end))
ValueError: Expecting property name: line 1 column 1 (char 1)
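
Note: the "Expecting property name" failure at char 1 is what Python 2.6's json.loads() raises when the value stored under the metadata's engine-status key is not strict JSON (for example, a Python dict repr with single-quoted keys). The snippet below is a minimal, hypothetical sketch of that failure mode, not code taken from the agent; the field and variable names are illustrative only.

# Hypothetical reproduction, not the agent code: json.loads() rejects a
# Python dict repr because JSON requires double-quoted property names,
# which on Python 2.6 yields the same "Expecting property name" ValueError.
import json

engine_status = {'health': 'good', 'vm': 'up', 'detail': 'up'}  # illustrative values
md = {'engine-status': str(engine_status)}  # "{'health': 'good', ...}" -- not valid JSON

try:
    json.loads(md['engine-status'])
except ValueError as e:
    print(e)  # Expecting property name: line 1 column 1 (char 1)

# Storing the field with json.dumps() instead keeps it parseable:
md['engine-status'] = json.dumps(engine_status)
print(json.loads(md['engine-status'])['health'])  # good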

Version-Release number of selected component (if applicable):
Before upgrade:
2 hosts - ovirt-hosted-engine-ha.noarch 0:1.0.0-3.el6ev
engine vm - is36.4
After upgrade:
2 hosts - ovirt-hosted-engine-ha.noarch 0:1.0.0-3.el6ev
engine vm - av9.2

How reproducible:
Always

Steps to Reproduce:
Start with a hosted engine environment (hosts and engine VM on 3.3)
1. From one of the hosts run: hosted-engine --set-maintenance --mode=global
2. Put the host that does not run the engine VM into maintenance
3. Upgrade the engine VM to 3.4
4. Upgrade the host to 3.4 (it is now host_3.4), then run: service vdsmd restart && service ovirt-ha-broker restart && service ovirt-ha-agent restart
5. Upgrade the engine VM to 3.4
6. hosted-engine --set-maintenance --mode=none
7. Wait a few minutes until the error appears in agent.log on host_3.3

Actual results:
An error appears in agent.log and the HA agent fails to start

Expected results:
No error in agent.log and the HA agent starts successfully

Additional info:
If, after all the steps, I run hosted-engine --set-maintenance --mode=global
and also upgrade the second host (the one not in maintenance, running the VM), everything returns to normal and the HA agent runs successfully on host_3.4

Comment 1 Jiri Moskovcak 2014-05-29 14:07:01 UTC
*** Bug 1099395 has been marked as a duplicate of this bug. ***

Comment 2 Jiri Moskovcak 2014-05-30 09:05:41 UTC
I got this during the update, but not sure if it's connected: 

warning: /etc/vdsm/vdsm.conf created as /etc/vdsm/vdsm.conf.rpmnew

Checking configuration status...

Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 230, in configure
    service.service_stop(s)
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 370, in service_stop
    return _runAlts(_srvStopAlts, srvName)
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 351, in _runAlts
    "%s failed" % alt.func_name, out, err)
vdsm.tool.service.ServiceOperationError: ServiceOperationError: _serviceStop failed
Sending stop signal sanlock (7145): [  OK  ]
Waiting for sanlock (7145) to stop:[FAILED]

Comment 3 Jiri Moskovcak 2014-05-30 13:23:03 UTC
As it turned out, it is a bug in 3.4 which makes 3.4 generate malformed JSON that the 3.3 code is not able to parse -> moving to 3.4
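
A sketch of the compatibility constraint described here, assuming the shared metadata field in question is the JSON-decoded engine-status from the traceback in the description: whatever the 3.4 agent writes must stay decodable by the unchanged json.loads() call in the 3.3 agent. This is not the actual 1.1.2-5 patch, and the helper names are hypothetical.

import ast
import json

def write_engine_status(status):
    # Serialize as strict JSON (double-quoted keys) so the unchanged
    # json.loads() call in the 3.3 agent can still decode it.
    return json.dumps(status)

def read_engine_status(raw):
    # Defensive reader: fall back to ast.literal_eval() for a malformed
    # dict-repr payload, so a mixed 3.3/3.4 cluster does not stop the
    # monitoring loop after three errors.
    try:
        return json.loads(raw)
    except ValueError:
        return ast.literal_eval(raw)

print(read_engine_status(write_engine_status({'health': 'good', 'vm': 'up'})))
print(read_engine_status("{'health': 'good', 'vm': 'up'}"))  # legacy, non-JSON form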

Comment 5 Artyom 2014-06-01 10:40:54 UTC
Verified on ovirt-hosted-engine-ha-1.1.2-5.el6ev.noarch

Comment 6 errata-xmlrpc 2014-06-09 14:26:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0671.html