Bug 1432880 - [downstream clone - 3.6.11] [HE - Logging] Improve logging of uptodate score
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.6.9
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-3.6.11
Target Release: ---
Assignee: Denis Chaplygin
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1406622
Blocks:
 
Reported: 2017-03-16 10:04 UTC by rhev-integ
Modified: 2019-04-28 13:42 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1406622
Environment:
Last Closed: 2017-05-09 17:05:16 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Links
System ID | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2017:1213 | SHIPPED_LIVE | ovirt-hosted-engine-ha bug fix update for 3.6.11 | 2017-05-09 21:03:43 UTC
oVirt gerrit 74154 (ovirt-hosted-engine-ha-1.3) | MERGED | he: Added additional logging to the score calculation | 2017-03-20 11:39:06 UTC

Description rhev-integ 2017-03-16 10:04:20 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1406622 +++
======================================================================

Description of problem:

I recently worked on a case where one of the hosts kept a score of 2400 even after everything was upgraded correctly. Everything works fine except this host, which stays at that score. Restarting the HA daemons makes no difference. The host extracts the OVFs just fine and reports no errors, yet 1000 is subtracted from the 3400 base score. This is possibly what was happening in BZ #1337960, which was closed due to insufficient data.

After following the logs (with debug enabled), I suspect the check at [1] below is failing (i.e. spuuid != constants.BLANK_UUID), but since that path is not logged, it's hard to say.

So that we can clearly see why, and troubleshoot this further the next time it happens, could you please improve the logging in these two places:

ovirt_hosted_engine_ha/lib/upgrade.py

    def is_conf_file_uptodate(self):
        uptodate = False
        try:
            volume = self._config.get(config.ENGINE, config.CONF_VOLUME_UUID)
            self._log.debug('Conf volume: %s ' % volume)
            _image = self._config.get(config.ENGINE, config.CONF_IMAGE_UUID)
            self._log.debug('Conf image: %s ' % _image)
            spuuid = self._config.get(config.ENGINE, config.SP_UUID)
[1]         if spuuid == constants.BLANK_UUID:  
                uptodate = True
        except (KeyError, ValueError):
            uptodate = False
        return uptodate

ovirt_hosted_engine_ha/agent/states.py

        score = score_cfg['base-score']

        upgrademgr = upgrade.Upgrade()
        if not upgrademgr.is_conf_file_uptodate():
            score -= score_cfg['not-uptodate-config-penalty']
            [2]

These are the requests:
[1] Print a message when this condition is not met.
[2] Add a "Penalizing score ..." message, as we have for other cases.
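
For illustration, the requested messages could look something like this (just a sketch; the exact wording, log handles, and levels are of course up to you):

ovirt_hosted_engine_ha/lib/upgrade.py:

    def is_conf_file_uptodate(self):
        uptodate = False
        try:
            volume = self._config.get(config.ENGINE, config.CONF_VOLUME_UUID)
            self._log.debug('Conf volume: %s ' % volume)
            _image = self._config.get(config.ENGINE, config.CONF_IMAGE_UUID)
            self._log.debug('Conf image: %s ' % _image)
            spuuid = self._config.get(config.ENGINE, config.SP_UUID)
            if spuuid == constants.BLANK_UUID:
                uptodate = True
            else:
                # [1] log the value that triggers the penalty
                self._log.debug('Conf file is not up-to-date: spuuid is %s, '
                                'expected %s', spuuid, constants.BLANK_UUID)
        except (KeyError, ValueError):
            uptodate = False
        return uptodate

ovirt_hosted_engine_ha/agent/states.py (assuming a module-level logger is available there):

        upgrademgr = upgrade.Upgrade()
        if not upgrademgr.is_conf_file_uptodate():
            # [2] same style as the other penalty messages
            logger.info('Penalizing score by %d due to '
                        'not up-to-date VM configuration',
                        score_cfg['not-uptodate-config-penalty'])
            score -= score_cfg['not-uptodate-config-penalty']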

Finally, please also consider backporting it to 4.0 and 3.6.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.3.8
master

(Originally by Germano Veit Michel)

Comment 1 rhev-integ 2017-03-16 10:04:25 UTC
It's actually:

ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch

(Originally by Germano Veit Michel)

Comment 2 rhev-integ 2017-03-16 10:04:29 UTC
This is more of a debugging-level detail than standard logging output, so when implementing, please make sure it gets the appropriate log level.
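In a sketch (hypothetical wording, reusing the handles from the description):

    # internal detail of the up-to-date check: debug level
    self._log.debug('Conf file is not up-to-date: spuuid is %s', spuuid)
    # score penalty, relevant to operators: info level
    logger.info('Penalizing score by %d due to not up-to-date VM '
                'configuration', score_cfg['not-uptodate-config-penalty'])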

(Originally by Doron Fediuck)

Comment 6 rhev-integ 2017-03-16 10:04:43 UTC
The 3.5->3.6 upgrade flow should provide the reproduction (see https://bugzilla.redhat.com/show_bug.cgi?id=1337960). If it is hit, this fix should report the ['not-uptodate-config-penalty'] message with a 2400 score (a penalty of 1000). But if the fix is not back-ported to 3.6, it can only be verified after a 3.6->4.0 upgrade.

(Originally by Nikolai Sednev)

Comment 9 Nikolai Sednev 2017-04-24 16:28:07 UTC
Logging was improved in the latest 3.6.11:
Engine:
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.11.1-0.1.el6.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-setup-base-3.6.11.1-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.11.1-0.1.el6.noarch
rhevm-doc-3.6.10-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-backend-3.6.11.1-0.1.el6.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
rhevm-setup-3.6.11.1-0.1.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch
rhevm-restapi-3.6.11.1-0.1.el6.noarch
rhevm-dwh-3.6.8-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.11.1-0.1.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.11.1-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-tools-3.6.11.1-0.1.el6.noarch
rhevm-3.6.11.1-0.1.el6.noarch
rhevm-lib-3.6.11.1-0.1.el6.noarch
rhevm-dwh-setup-3.6.8-1.el6ev.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-websocket-proxy-3.6.11.1-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.11.1-0.1.el6.noarch
rhevm-webadmin-portal-3.6.11.1-0.1.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
ovirt-vmconsole-1.0.4-1.el6ev.noarch
rhevm-userportal-3.6.11.1-0.1.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.11.1-0.1.el6.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhevm-tools-backup-3.6.11.1-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-dbscripts-3.6.11.1-0.1.el6.noarch
Linux version 2.6.32-642.13.2.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Sat Feb 18 16:54:58 EST 2017
Linux 2.6.32-642.13.2.el6.x86_64 #1 SMP Sat Feb 18 16:54:58 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)

Hosts:
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
mom-0.5.6-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.17.39-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
Linux version 3.10.0-327.53.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Mar 14 10:49:09 EDT 2017
Linux 3.10.0-327.53.1.el7.x86_64 #1 SMP Tue Mar 14 10:49:09 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)


/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/states.py:
        upgrademgr = upgrade.Upgrade()
        if not upgrademgr.is_conf_file_uptodate():
            logger.info("Penalizing score by %d due to "
                        "not up-to-date VM configuration",
                        score_cfg['not-uptodate-config-penalty'])
            score -= score_cfg['not-uptodate-config-penalty']

/usr/lib/python2.7/site-packages/vdsm/tool/upgrade.py 

def _upgrade_seal(upgrade):
    seal_file = _upgrade_seal_path(upgrade)
    try:
        utils.touchFile(seal_file)
    except (OSError, IOError):
        _get_upgrade_log().exception("Failed to seal upgrade %s", upgrade.name)
    else:
        utils.persist(seal_file)
        _get_upgrade_log().debug("Upgrade %s successfully performed",
                                 upgrade.name)

/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py:
    def upgrade_35_36(self):
        uptodate = self.is_conf_file_uptodate()
        if uptodate:
            self._log.info('Host configuration is already up-to-date')
            return False
        else:
            self._log.info('Upgrading to current version')
            if not self._is_in_engine_maintenance():
                self._log.error(
                    'Unable to upgrade while not in maintenance mode: '
                    'please put this host into maintenance mode '
                    'from the engine, and manually restart this service '
                    'when ready'
                )
                return False

            self._startMonitoringDomain()
            content = self._get_conffile_content(
                constants.HEConfFiles.HECONFD_HECONF,
                self._update_conf_file_35_36,
            )
            self._wrote_updated_conf_file(content)
            self._log.info('Successfully upgraded')
            return True



Moving to verified, as I managed to successfully upgrade from 3.5 to 3.6 without any score problems (the upgraded hosts got a 3400 score).
The logs show normal logging, and hosted-storage was successfully upgraded:
MainThread::INFO::2017-04-24 18:21:11,331::upgrade::823::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_remove_storage_pool) Successfully removed the shared pool
MainThread::INFO::2017-04-24 18:21:42,222::upgrade::1046::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Successfully upgraded

Comment 11 errata-xmlrpc 2017-05-09 17:05:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1213

