Bug 1343434

Summary: Too early timed out while waiting for the disk to be created during upgrade-appliance action
Product: [oVirt] ovirt-hosted-engine-setup
Reporter: Jiri Belka <jbelka>
Component: General
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: Jiri Belka <jbelka>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.0.0
CC: bugs, cshao, dfediuck, stirabos, ycui
Target Milestone: ovirt-4.0.0-rc
Flags: rule-engine: ovirt-4.0.0+, rule-engine: blocker+, rule-engine: planning_ack+, dfediuck: devel_ack+, mavital: testing_ack+
Target Release: 2.0.0.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-05 07:59:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Jiri Belka 2016-06-07 10:41:02 UTC
Description of problem:

~~~
[ INFO  ] Still waiting for new engine VM disk to be created. This may take several minutes...
[ ERROR ] Timed out while waiting for the disk to be created. Please check engine logs.
[ ERROR ] Timed out while waiting for the disk to be created. Please check engine logs.
[ ERROR ] Failed to execute stage 'Misc configuration': Failed creating the new engine VM disk
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, please check the issue, fix and try again
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160607121210-0uaink.log
~~~

~~~
# egrep -i 'executemethod.*(create_disk|exception)' /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160607121210-0uaink.log
2016-06-07 12:13:49 DEBUG otopi.context context._executeMethod:128 Stage misc METHOD otopi.plugins.gr_he_upgradeappliance.engine.add_vm_disk.Plugin._create_disk
2016-06-07 12:25:44 DEBUG otopi.context context._executeMethod:142 method exception
~~~

In fact, disk creation took about 13.5 minutes in my environment (by the engine log timestamps):

~~~
# grep 7ec0358f-6b0e-4f42-b6cb-7269a7766fe6 /var/log/ovirt-engine/engine.log 
2016-06-07 06:14:35,103 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-1) [18a9c009] Correlation ID: 53beabe3, Job ID: 7ec0358f-6b0e-4f42-b6cb-7269a7766fe6, Call Stack: null, Custom Event ID: -1, Message: Add-Disk operation of 'virtio-disk0' was initiated by admin@internal.
2016-06-07 06:28:02,457 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-21) [] Correlation ID: 53beabe3, Job ID: 7ec0358f-6b0e-4f42-b6cb-7269a7766fe6, Call Stack: null, Custom Event ID: -1, Message: The disk 'virtio-disk0' was successfully added.

# date
Tue Jun  7 06:31:31 EDT 2016
~~~
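
The two intervals quoted in the logs above can be checked with a few lines of Python. This is only a worked calculation on the timestamps already shown; the helper name elapsed_seconds is made up for illustration. Each interval is computed within a single log, so the different clock settings of the setup log and engine.log do not matter.

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt):
    """Return the number of seconds between two timestamps in the same format."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# engine.log: "Add-Disk operation ... was initiated" -> "was successfully added"
ENGINE_FMT = "%Y-%m-%d %H:%M:%S,%f"
disk_creation = elapsed_seconds(
    "2016-06-07 06:14:35,103", "2016-06-07 06:28:02,457", ENGINE_FMT
)

# setup log: _create_disk started -> "method exception"
SETUP_FMT = "%Y-%m-%d %H:%M:%S"
setup_wait = elapsed_seconds(
    "2016-06-07 12:13:49", "2016-06-07 12:25:44", SETUP_FMT
)

print(f"disk creation took   {disk_creation:.0f} s ({disk_creation / 60:.1f} min)")
print(f"setup gave up after  {setup_wait:.0f} s ({setup_wait / 60:.1f} min)")
```

The setup stage gave up after roughly 12 minutes, while the engine needed roughly 13.5 minutes to finish adding the disk.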

IMO the following message is a little strange as well:

~~~
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, please check the issue, fix and try again
~~~

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.0-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run hosted-engine --upgrade-appliance and arrange for the new disk to stay
   in the 'locked' state longer than the command expects.

Actual results:
hosted-engine --upgrade-appliance simply fails, although the disk creation finished successfully about 2.5 minutes later.

Expected results:
IMO it should not just fail; it should probably ask the user for confirmation.

Additional info:
It also does not clean up correctly; see BZ1343425.

Comment 1 Jiri Belka 2016-06-07 11:28:06 UTC
My workaround to make the setup pass:

# grep 1200 /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-upgradeappliance/engine/add_vm_disk.py
    API_RETRIES = 1200

Comment 2 Doron Fediuck 2016-06-08 07:41:04 UTC
Sometimes it's hard to tell whether we should keep waiting or not.
We should probably add a question in interactive mode asking the user whether they want to keep waiting. For automated installations your workaround is actually a valid approach and we only need to document it, since in most cases it should be sufficient.
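
A rough sketch of the interactive idea, for discussion only. Everything here is hypothetical: the function wait_until_done, its parameters, and the prompt text are invented for illustration and are not taken from ovirt-hosted-engine-setup.

```python
import time


def wait_until_done(is_done, retries, interactive, ask=input, delay=1.0,
                    sleep=time.sleep):
    """Poll is_done() in rounds of `retries` attempts.

    When a round times out, an interactive run asks the user whether to
    start another round; a non-interactive run gives up immediately.
    Returns True if is_done() eventually reported success, else False.
    """
    while True:
        for _ in range(retries):
            if is_done():
                return True
            sleep(delay)
        if not interactive:
            # Automated installs cannot answer questions; the only knob is
            # a larger retry count.
            return False
        answer = ask("The operation is still pending. Keep waiting? (yes/no) ")
        if answer.strip().lower() not in ("y", "yes"):
            return False
```

The ask and sleep arguments are injectable so the loop can be exercised without a terminal or real delays.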

Comment 3 Simone Tiraboschi 2016-06-08 09:26:05 UTC
We are also explicitly checking for failures in the API response, so we are just looping while the creation is pending.
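
The behaviour described in this comment (fail fast on an explicit failure, keep looping only while the creation is pending) can be illustrated with the minimal sketch below. It is not the code in add_vm_disk.py: the get_state callable, the DiskCreationFailed exception, and the "ok" state name are assumptions made for the example; only the 'locked' state and the API_RETRIES name come from this report.

```python
import time


class DiskCreationFailed(Exception):
    """Raised when the (hypothetical) API reports an explicit failure."""


def wait_for_disk(get_state, retries=3600, delay=1.0, sleep=time.sleep):
    """Poll the disk state until it is ready, fails, or retries run out.

    get_state is a caller-supplied callable returning a state string.
    Returns True when the disk is ready and False on timeout; raises
    DiskCreationFailed for any state other than "ok" or "locked".
    """
    for _ in range(retries):
        state = get_state()
        if state == "ok":        # assumed name for the "ready" state
            return True
        if state != "locked":    # anything else is treated as a hard failure
            raise DiskCreationFailed("unexpected disk state: %s" % state)
        sleep(delay)             # still pending, keep looping
    return False                 # timed out while the disk was still locked
```

With this split, a timeout only ever means "still pending", which is why increasing the retry count is safe.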

Comment 4 Red Hat Bugzilla Rules Engine 2016-06-09 07:54:46 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 5 Jiri Belka 2016-06-21 07:55:55 UTC
ok, ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch

[root@dell-r210ii-04 ~]# grep '^[[:blank:]]*API_RETRIES' /usr/share/ovirt-hosted-engine-setup/plugins/gr-he-upgradeappliance/engine/add_vm_disk.py
    API_RETRIES = 3600  # one hour
[root@dell-r210ii-04 ~]# rpm -q ovirt-hosted-engine-setup
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
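
For reference, the retry counts mentioned in this bug translate into waiting time as follows. The calculation assumes one second per retry, which is inferred only from the "# one hour" comment next to API_RETRIES = 3600; the helper name and the constant for the observed duration are made up for this example.

```python
def retry_budget_seconds(retries, delay_seconds=1.0):
    """Total time a polling loop can wait: number of retries times the delay."""
    return retries * delay_seconds


# Disk creation time seen in engine.log in the description (06:14:35 -> 06:28:02).
OBSERVED_DISK_CREATION = 807  # seconds

for retries in (1200, 3600):
    budget = retry_budget_seconds(retries)
    enough = budget > OBSERVED_DISK_CREATION
    print(f"API_RETRIES={retries}: {budget / 60:.0f} min, covers observed case: {enough}")
```

Under that assumption, both the workaround value (1200) and the fixed value (3600) leave room for the roughly 13.5 minutes observed here.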

Comment 6 Sandro Bonazzola 2016-07-05 07:59:40 UTC
oVirt 4.0.0 has been released, closing current release.