Bug 1432880

Summary: [downstream clone - 3.6.11] [HE - Logging] Improve logging of uptodate score
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-hosted-engine-ha
Assignee: Denis Chaplygin <dchaplyg>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: medium
Version: 3.6.9
CC: dfediuck, lsurette, mavital, msivak, ykaul, ylavi
Target Milestone: ovirt-3.6.11
Keywords: Improvement, Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1406622
Environment:
Last Closed: 2017-05-09 17:05:16 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1406622
Bug Blocks:

Description rhev-integ 2017-03-16 10:04:20 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1406622 +++
======================================================================

Description of problem:

I recently worked on a case in which one of the hosts kept a score of 2400 even after everything was upgraded correctly. All is working fine except this host, which stays at that score. Restarting the HA daemons makes no difference. The host extracts the OVFs just fine and reports no errors, yet 1000 is subtracted from the 3400 base score. This is possibly what was happening in BZ #1337960, which was closed due to insufficient data.

After following the logs (debug enabled), I suspect it was caused by the "if spuuid == constants.BLANK_UUID" check below not being satisfied, but since that path is not logged, it is hard to say.

So that we can clearly see why and troubleshoot further the next time this happens, could you please improve the logging in these two places:

ovirt_hosted_engine_ha/lib/upgrade.py

    def is_conf_file_uptodate(self):
        uptodate = False
        try:
            volume = self._config.get(config.ENGINE, config.CONF_VOLUME_UUID)
            self._log.debug('Conf volume: %s ' % volume)
            _image = self._config.get(config.ENGINE, config.CONF_IMAGE_UUID)
            self._log.debug('Conf image: %s ' % _image)
            spuuid = self._config.get(config.ENGINE, config.SP_UUID)
[1]         if spuuid == constants.BLANK_UUID:  
                uptodate = True
        except (KeyError, ValueError):
            uptodate = False
        return uptodate

ovirt_hosted_engine_ha/agent/states.py

        score = score_cfg['base-score']

        upgrademgr = upgrade.Upgrade()
        if not upgrademgr.is_conf_file_uptodate():
            score -= score_cfg['not-uptodate-config-penalty']
            [2]

These are the requests:
[1] Print a message when this condition is not met.
[2] Add a "Penalizing score ..." message, as we have for other cases.
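
For reference, a minimal sketch of what request [1] could look like. This is illustrative only: the dict-based `conf` argument, the key names, and the `BLANK_UUID` constant here are hypothetical stand-ins for the real `self._config` and `constants` objects in ovirt-hosted-engine-ha.

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Hypothetical stand-in for constants.BLANK_UUID
BLANK_UUID = "00000000-0000-0000-0000-000000000000"


def is_conf_file_uptodate(conf, log=None):
    """Simplified stand-in for Upgrade.is_conf_file_uptodate().

    `conf` is a plain dict standing in for self._config; the real
    method reads config.CONF_VOLUME_UUID, config.CONF_IMAGE_UUID
    and config.SP_UUID rather than these illustrative keys.
    """
    log = log or logging.getLogger(__name__)
    uptodate = False
    try:
        volume = conf["conf_volume_uuid"]
        log.debug("Conf volume: %s", volume)
        image = conf["conf_image_uuid"]
        log.debug("Conf image: %s", image)
        spuuid = conf["sp_uuid"]
        if spuuid == BLANK_UUID:
            uptodate = True
        else:
            # Request [1]: log when the condition is NOT met, so a
            # non-blank storage pool UUID shows up in the debug logs.
            log.debug(
                "Storage pool UUID %s is not blank: "
                "configuration file is not up-to-date", spuuid)
    except (KeyError, ValueError) as err:
        # Logging which key was missing/invalid also helps here.
        log.debug("Could not read configuration: %s", err)
        uptodate = False
    return uptodate
```

With debug logging enabled, the non-blank-UUID path would then leave a visible trace instead of silently returning False.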

Finally, please also consider backporting it to 4.0 and 3.6.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.3.8
master

(Originally by Germano Veit Michel)

Comment 1 rhev-integ 2017-03-16 10:04:25 UTC
It's actually:

ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch

(Originally by Germano Veit Michel)

Comment 2 rhev-integ 2017-03-16 10:04:29 UTC
This is more of a debug-level message than standard logging output, so when implementing, please ensure it gets the relevant log level.

(Originally by Doron Fediuck)

Comment 6 rhev-integ 2017-03-16 10:04:43 UTC
The 3.5->3.6 upgrade flow should provide the reproduction; see https://bugzilla.redhat.com/show_bug.cgi?id=1337960. If the issue is hit, this fix should report ['not-uptodate-config-penalty'] with a score of 2400 (a penalty of 1000). If the fix is not backported to 3.6, it can only be verified after a 3.6->4.0 upgrade.

(Originally by Nikolai Sednev)

Comment 9 Nikolai Sednev 2017-04-24 16:28:07 UTC
Logging was improved for latest 3.6.11:
Engine:
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.11.1-0.1.el6.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-setup-base-3.6.11.1-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.11.1-0.1.el6.noarch
rhevm-doc-3.6.10-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-backend-3.6.11.1-0.1.el6.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
rhevm-setup-3.6.11.1-0.1.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch
rhevm-restapi-3.6.11.1-0.1.el6.noarch
rhevm-dwh-3.6.8-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.11.1-0.1.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.11.1-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-tools-3.6.11.1-0.1.el6.noarch
rhevm-3.6.11.1-0.1.el6.noarch
rhevm-lib-3.6.11.1-0.1.el6.noarch
rhevm-dwh-setup-3.6.8-1.el6ev.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-websocket-proxy-3.6.11.1-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.11.1-0.1.el6.noarch
rhevm-webadmin-portal-3.6.11.1-0.1.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
ovirt-vmconsole-1.0.4-1.el6ev.noarch
rhevm-userportal-3.6.11.1-0.1.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.11.1-0.1.el6.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhevm-tools-backup-3.6.11.1-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-dbscripts-3.6.11.1-0.1.el6.noarch
Linux version 2.6.32-642.13.2.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Sat Feb 18 16:54:58 EST 2017
Linux 2.6.32-642.13.2.el6.x86_64 #1 SMP Sat Feb 18 16:54:58 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)

Hosts:
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
mom-0.5.6-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.17.39-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
Linux version 3.10.0-327.53.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Mar 14 10:49:09 EDT 2017
Linux 3.10.0-327.53.1.el7.x86_64 #1 SMP Tue Mar 14 10:49:09 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)


/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/states.py :
        upgrademgr = upgrade.Upgrade()
        if not upgrademgr.is_conf_file_uptodate():
            logger.info("Penalizing score by %d due to "
                        "not up-to-date VM configuration",
                        score_cfg['not-uptodate-config-penalty'])
            score -= score_cfg['not-uptodate-config-penalty']

/usr/lib/python2.7/site-packages/vdsm/tool/upgrade.py 

def _upgrade_seal(upgrade):
    seal_file = _upgrade_seal_path(upgrade)
    try:
        utils.touchFile(seal_file)
    except (OSError, IOError):
        _get_upgrade_log().exception("Failed to seal upgrade %s", upgrade.name)
    else:
        utils.persist(seal_file)
        _get_upgrade_log().debug("Upgrade %s successfully performed",
                                 upgrade.name)

/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py:
    def upgrade_35_36(self):
        uptodate = self.is_conf_file_uptodate()
        if uptodate:
            self._log.info('Host configuration is already up-to-date')
            return False
        else:
            self._log.info('Upgrading to current version')
            if not self._is_in_engine_maintenance():
                self._log.error(
                    'Unable to upgrade while not in maintenance mode: '
                    'please put this host into maintenance mode '
                    'from the engine, and manually restart this service '
                    'when ready'
                )
                return False

            self._startMonitoringDomain()
            content = self._get_conffile_content(
                constants.HEConfFiles.HECONFD_HECONF,
                self._update_conf_file_35_36,
            )
            self._wrote_updated_conf_file(content)
            self._log.info('Successfully upgraded')
            return True



Moving to VERIFIED, as I managed to upgrade from 3.5 to 3.6 without any score problems (the upgraded hosts got a score of 3400).
The logs show normal logging, and hosted-storage was successfully upgraded:
MainThread::INFO::2017-04-24 18:21:11,331::upgrade::823::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_remove_storage_pool) Successfully removed the shared pool
MainThread::INFO::2017-04-24 18:21:42,222::upgrade::1046::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Successfully upgraded

Comment 11 errata-xmlrpc 2017-05-09 17:05:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1213