Bug 1311955

Summary: discrepancies in web ui for failed create cluster task
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Martin Bukatovic <mbukatov>
Component: core
Assignee: Darshan <dnarayan>
Status: CLOSED ERRATA
QA Contact: Martin Kudlej <mkudlej>
Severity: medium
Priority: unspecified
Version: 2
Target Release: 2
CC: mkudlej, nthomas, rnachimu, sankarshan
Hardware: Unspecified
OS: Unspecified
Fixed In Version: rhscon-core-0.0.28-1.el7scon.x86_64.rpm
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-08-23 19:47:19 UTC
Bug Blocks: 1344195
Attachments:
 * screenshot of progressbar and clusters page (no flags)
 * screenshot of progressbar and tasks page (no flags)

Description Martin Bukatovic 2016-02-25 11:47:27 UTC
Description of problem
======================

When "Create Cluster" task fail, I see discrepancy in it's status
reported by various components of usm web ui.

Version-Release number of selected component
============================================

rhscon-ceph-0.0.6-8.el7.x86_64
rhscon-core-0.0.8-7.el7.x86_64
rhscon-ui-0.0.16-1.el7.noarch

How reproducible
================

Hard to say, because this BZ assumes that cluster creation fails
in a specific way.

Steps to Reproduce
==================

1. Install skyring on a server and prepare a few hosts for cluster setup.
2. Accept all nodes.
3. Start the "Create Cluster" wizard and create a cluster using the default
   configuration.
4. The Create Cluster task fails.

Actual results
==============

State of the cluster and of the create cluster task is not aligned across the
various components of the usm web ui:

 * the task popup window still shows an unfinished "running" progressbar for
   this task
 * the clusters page shows this cluster in a failed state
   (a red cross-in-a-circle icon is displayed next to the cluster name)
 * the tasks page shows the task as *Failed*, but the failed icon is missing
   here

The other problem is that when I left the task progressbar running for about
an hour, it did not change in any way.
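
For reference, the progressbar state can be cross-checked against what the
backend reports by polling the task over the REST API instead of the web ui.
A minimal sketch follows; the /api/v1/tasks/<task-id> endpoint path, the port,
and the 'completed'/'status' field names are assumptions about my setup, not a
documented interface:

~~~
#!/usr/bin/env python
# Poll a skyring task until it leaves the running state, so the backend
# status can be compared with what the web ui progressbar shows.
# ASSUMPTIONS: skyring listens on <server>:8080 and GET
# /api/v1/tasks/<task-id> returns JSON with a boolean 'completed' and a
# 'status' field -- adjust to the actual deployment.
import sys
import time

import requests


def wait_for_task(server, task_id, timeout=3600, interval=10):
    url = "http://%s:8080/api/v1/tasks/%s" % (server, task_id)
    deadline = time.time() + timeout
    while time.time() < deadline:
        task = requests.get(url).json()
        if task.get("completed"):
            return task.get("status")
        time.sleep(interval)
    return None  # still reported as running after the timeout


if __name__ == "__main__":
    status = wait_for_task(sys.argv[1], sys.argv[2])
    print("final task status: %s" % status)
~~~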

Expected results
================

All usm web ui components should show the task as either still running or
failed; discrepancies between the task popup and the tasks page should not
happen.

The task progressbar should stop/finish immediately when the task fails.

Additional info
===============

Hopefully useful/related part of the logs; the full skyring.log file is
attached.

~~~
2016-02-25T11:03:26+0000 INFO     saltwrapper.py:50 saltwrapper.wrapper] rv={'mbukatov-usm1-node4.example.com': {'file_|-/etc/ceph/mbukatov-usm1-cluster1.conf_|-/etc/ceph/mbukatov-usm1-cluster1.conf_|-managed': {'comment': 'File /etc/ceph/mbukatov-usm1-cluster1.conf is in the correct state', 'name': '/etc/ceph/mbukatov-usm1-cluster1.conf', 'start_time': '11:03:26.342414', 'result': True, 'duration': 16.959, '__run_num__': 0, 'changes': {}}}}
2016-02-25T11:03:26+0000 ERROR    saltwrapper.py:498 saltwrapper.AddOSD] admin:4b937f55-f6b0-4962-b0be-c07a119cd6fc-add_osd failed. error={'mbukatov-usm1-node4.os1.phx2.redhat.com': {'pid': 31390, 'retcode': 1, 'stderr': "2016-02-25 11:03:25.653244 7fe2efe24780 -1 did not load config file, using default settings.\n2016-02-25 11:03:25.665761 7f1ef057d780 -1 did not load config file, using default settings.\nlibust[31401/31401]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\nceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'mbukatov-usm1-cluster1', 'start', 'osd.3']' returned non-zero exit status 1\nceph-disk: Error: One or more partitions failed to activate", 'stdout': '/etc/init.d/ceph: osd.3 not found (/etc/ceph/mbukatov-usm1-cluster1.conf defines osd.usm1-cluster1-3 mon.a, /var/lib/ceph defines osd.usm1-cluster1-3)'}}
2016-02-25T11:03:26.381+01:00 ERROR    utils.go:133 FailTask] admin:4b937f55-f6b0-4962-b0be-c07a119cd6fc-Failed adding all OSDs while create cluster mbukatov-usm1-cluster1: <nil>
2016-02-25T11:03:28.359+01:00 ERROR    cluster.go:227 func·001] admin:4b937f55-f6b0-4962-b0be-c07a119cd6fc- Failed to create the cluster mbukatov-usm1-cluster1
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:75 ReleaseLock] Currently Locked:%!(EXTRA map[uuid.UUID]*lock.LockInternal=map[251ae1f8-c9bd-43bf-a78c-10aa5afe7ec9:0xc20b4527c0 de342cd8-2535-4eb2-8583-ce03535c3fe7:0xc20b4529e0 462b1a36-c987-4a2b-806d-2b199e632630:0xc20b452c00 870d6ca1-06ff-45a8-a9e5-aaa8f06bcbb0:0xc20b452e20 fdff7bc2-1c27-4454-b775-f7ec48a52edb:0xc20b453040])
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:76 ReleaseLock] Releasing the locks for:%!(EXTRA map[uuid.UUID]string=map[251ae1f8-c9bd-43bf-a78c-10aa5afe7ec9:POST_Clusters : mbukatov-usm1-node3.example.com de342cd8-2535-4eb2-8583-ce03535c3fe7:POST_Clusters : mbukatov-usm1-node4.example.com 462b1a36-c987-4a2b-806d-2b199e632630:POST_Clusters : mbukatov-usm1-mon1.example.com 870d6ca1-06ff-45a8-a9e5-aaa8f06bcbb0:POST_Clusters : mbukatov-usm1-node1.os1.phx2.redhat.com fdff7bc2-1c27-4454-b775-f7ec48a52edb:POST_Clusters : mbukatov-usm1-node2.example.com])
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=fdff7bc2-1c27-4454-b775-f7ec48a52edb)
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=251ae1f8-c9bd-43bf-a78c-10aa5afe7ec9)
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=de342cd8-2535-4eb2-8583-ce03535c3fe7)
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=462b1a36-c987-4a2b-806d-2b199e632630)
2016-02-25T11:03:28.361+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=870d6ca1-06ff-45a8-a9e5-aaa8f06bcbb0)
2016-02-25T11:03:28.361+01:00 DEBUG    lockmanager.go:86 ReleaseLock] Currently Locked:%!(EXTRA map[uuid.UUID]*lock.LockInternal=map[])
~~~

Comment 2 Martin Bukatovic 2016-02-25 11:50:41 UTC
Created attachment 1130492 [details]
screenshot of progressbar and clusters page

Comment 3 Martin Bukatovic 2016-02-25 11:51:13 UTC
Created attachment 1130493 [details]
screenshot of progressbar and tasks page

Comment 5 Martin Bukatovic 2016-03-18 18:54:14 UTC
Updating the BZ so that this issue is easier to reproduce with the latest
builds.

ceph-0.94.5-9.el7cp.x86_64
ceph-ansible-1.0.1-1.20160307gitb354445.el7.noarch
ceph-common-0.94.5-9.el7cp.x86_64
redhat-ceph-installer-0.2.3-1.20160304gitb3e3c68.el7.noarch
rhscon-ceph-0.0.6-14.el7.x86_64
rhscon-core-0.0.8-14.el7.x86_64
rhscon-ui-0.0.23-1.el7.noarch

Steps to Reproduce
==================

1. Prepare node machines for the cluster. Make sure that all additional disks
   on OSD machines have a zero size (this will make the create cluster task
   fail later); see the sketch after these steps for one way to do this.
2. Install skyring on the usm server machine.
3. Accept all nodes (the machines you have prepared in step #1).
4. Start the "Create Cluster" wizard and create a cluster using the default
   configuration.
5. Wait until the Create Cluster task fails as expected.
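
For step 1, a minimal sketch of one way to end up with zero-size additional
disks, assuming the OSD nodes are libvirt/KVM guests (the image paths and node
names here are hypothetical):

~~~
#!/usr/bin/env python
# Create zero-size qcow2 images to attach as the additional disks of the
# OSD virtual machines. qemu-img accepts a virtual size of 0, which later
# makes the create cluster task fail while adding OSDs, as needed here.
# ASSUMPTIONS: the nodes are libvirt/KVM guests; paths and node names are
# hypothetical -- attach the images to the guests with your usual tooling.
import subprocess

NODES = ("node1", "node2", "node3", "node4")  # hypothetical OSD node names


def make_zero_disk(path):
    subprocess.check_call(["qemu-img", "create", "-f", "qcow2", path, "0"])


for node in NODES:
    make_zero_disk("/var/lib/libvirt/images/%s-osd-disk.qcow2" % node)
~~~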

Actual results
==============

State of the cluster and of the create cluster task is not aligned across the
various components of the usm web ui:

* Tasks page shows the task as *Failed* (which is a correct description here).
* Clusters page still shows the cluster in a 'Creating' state; the progressbar
  is unfinished and reads 'Creating'.
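
The mismatch can also be inspected outside the web ui by listing both
resources over the REST API and comparing the states they report side by
side. Again a sketch only; the /api/v1/clusters and /api/v1/tasks paths, the
port, and the field names are assumptions, adjust to the actual deployment:

~~~
#!/usr/bin/env python
# List clusters and tasks over the REST API and print the state each one
# reports, to compare the two outside the web ui.
# ASSUMPTIONS: /api/v1/clusters and /api/v1/tasks return JSON lists with
# 'name', 'status' and 'completed' fields -- adjust to the deployment.
import requests

SERVER = "usm-server.example.com"  # hypothetical usm server hostname


def get(path):
    return requests.get("http://%s:8080/api/v1/%s" % (SERVER, path)).json()


for cluster in get("clusters"):
    print("cluster %s: status=%s" % (cluster.get("name"), cluster.get("status")))

for task in get("tasks"):
    print("task %s: completed=%s status=%s"
          % (task.get("name"), task.get("completed"), task.get("status")))
~~~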

Comment 7 Nishanth Thomas 2016-06-21 13:21:18 UTC
*** Bug 1341504 has been marked as a duplicate of this bug. ***

Comment 8 Darshan 2016-06-23 05:37:06 UTC
Fix patch: https://review.gerrithub.io/#/c/281235/

Comment 9 Martin Kudlej 2016-08-09 06:28:40 UTC
Tested with
ceph-ansible-1.0.5-32.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-ui-0.0.52-1.el7scon.noarch
and:
1) the cluster creation task passed even though no OSD was added
2) the cluster is in a failed state because there is no OSD

For 1) there is already a BZ filed.
2) is OK.

--> VERIFIED

Comment 11 errata-xmlrpc 2016-08-23 19:47:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754