Bug 1287195 - ovirt-ha-agent should explicitly fail if the configuration volume is not valid
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Unspecified Unspecified
Priority: medium
Severity: low
Target Milestone: ovirt-3.6.1
Assigned To: Simone Tiraboschi
QA Contact: Nikolai Sednev
Keywords: EasyFix, Triaged
Depends On:
Blocks: ovirt-hosted-engine-ha-
Reported: 2015-12-01 12:50 EST by Simone Tiraboschi
Modified: 2016-02-18 05:53 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If for any reason (e.g. data corruption on disk) the configuration volume on the shared domain was not valid, ovirt-ha-agent logged the problem only at debug level and failed silently. The log level has been increased to make the issue clearly visible.
Story Points: ---
Clone Of:
Last Closed: 2016-02-18 05:53:26 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
bmcclain: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+

External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 49574 master MERGED log: increasing the log level on shared conf errors Never
oVirt gerrit 50237 ovirt-hosted-engine-ha-1.3 MERGED log: increasing the log level on shared conf errors 2015-12-17 08:18 EST

Description Simone Tiraboschi 2015-12-01 12:50:08 EST
Description of problem:
ovirt-ha-agent reports an invalid configuration volume with only a debug message and silently falls back to the local copy of the configuration files.
This can make issues on the configuration volume less evident.

Version-Release number of selected component (if applicable):

How reproducible:
Only as the result of a failed setup

Steps to Reproduce:
1. Manually destroy the configuration volume with dd and restart ha-agent

Actual results:
It silently falls back to the local copy of the configuration files, logging only a few debug messages

Expected results:
It should explicitly report the issue 

Additional info:
Comment 1 Nikolai Sednev 2015-12-20 07:03:16 EST
Hi Simone,
Is it possible to describe the reproduction steps in a bit more detail, please?
Could the reproduction flow from https://bugzilla.redhat.com/show_bug.cgi?id=1116469 match our case?
Comment 2 Simone Tiraboschi 2015-12-28 15:03:52 EST
Hi Nikolai,

Deploy hosted-engine as usual, then manually wipe the configuration volume (you can find its UUID in hosted-engine.conf) on the shared storage and restart the agent.
Previously it failed with just a single line at debug level (and normally we log at info level and above); now the error should be reported at error level.
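
The change discussed in this comment can be sketched as follows. This is an illustration only, not the actual ovirt-hosted-engine-ha code: the function name `validate_conf`, the `REQUIRED_KEYS` tuple and the logger name are hypothetical, and the message text is modeled on the broker.log line quoted in this bug.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("he_conf_sketch")

# Assumption: only the 'version' key is checked here, because that is the
# only key the log output in this report mentions.
REQUIRED_KEYS = ("version",)


def validate_conf(entries, log=logger):
    """Return True if every required key is present in `entries`.

    Missing keys are reported at ERROR level, so they remain visible when
    the logger threshold is INFO; a DEBUG message would be filtered out.
    """
    valid = True
    for key in REQUIRED_KEYS:
        if key not in entries:
            log.error("'%s' is not stored in the HE configuration image", key)
            valid = False
    return valid
```

With a logger set to INFO, `validate_conf({})` returns False and emits one ERROR record, while the same message sent through `log.debug` would not appear in the log at all.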
Comment 3 Nikolai Sednev 2016-02-01 09:05:01 EST
Executed dd against the volume with conf_volume_UUID=98d26505-2bcf-43e3-9425-a958efefad68 and received the following error message in /var/log/ovirt-hosted-engine-ha/broker.log:

Thread-24289::ERROR::2016-02-01 16:48:24,175::heconflib::111::ovirt_hosted_engine_ha.broker.notifications.Notifications.config::(validateConfImage) 'version' is not stored in the HE configuration image
Thread-24289::ERROR::2016-02-01 16:48:24,177::notifications::35::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 24, in send_email
    server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"])
  File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
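
To confirm during verification that the configuration error is now logged at ERROR level, the log excerpt above could be filtered with a small script along these lines. It is a sketch that assumes the `::`-separated field layout visible in the two ERROR lines above (thread, level, timestamp, module, line number, logger name, message); the names `LogEntry`, `parse_line` and `errors_from` are hypothetical helpers, not part of ovirt-hosted-engine-ha.

```python
from collections import namedtuple

# Field names are assumptions inferred from the log lines in this report.
LogEntry = namedtuple(
    "LogEntry", "thread level timestamp module lineno logger message"
)


def parse_line(line):
    """Parse one broker.log line; return None for lines that do not match,
    such as the continuation lines of a traceback."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, message = parts
    return LogEntry(thread, level, timestamp, module, int(lineno), logger, message)


def errors_from(lines, module="heconflib"):
    """Return the ERROR entries emitted by the given module."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e.level == "ERROR" and e.module == module]
```

Passing the lines of broker.log to `errors_from` returns only the ERROR entries from `heconflib`, which leaves out the unrelated `send_email` failure and its traceback.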

Components on host:
Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20160126.0.el7ev)

Linux version 2.6.32-573.12.1.el6.x86_64 (mockbuild@x86-031.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Mon Nov 23 12:55:32 EST 2015
