Bug 1335631 - Task Details page doesn't report status in a clear way
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: UI
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: sankarshan
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-12 17:43 UTC by Martin Bukatovic
Modified: 2017-03-23 04:11 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 04:11:59 UTC
Embargoed:


Attachments
screenshot of Task Details page (147.15 KB, image/png)
2016-05-12 17:43 UTC, Martin Bukatovic

Description Martin Bukatovic 2016-05-12 17:43:21 UTC
Created attachment 1156749 [details]
screenshot of Task Details page

Description of problem
======================

The current implementation of the *Task Details* page doesn't follow the
proposed design. Instead of the *Execution steps*, *Events* and *Logs* lists,
there are two *Execution steps* lists with completely different semantics
compared to the original design.

The most important difference (from a QE perspective) is that the list of
Events (with items clearly classified into Info/Warning/Error categories) is
missing. This feature is important for the UI to be able to clearly
communicate the overall status when a task finishes with success, but with a
few minor or major problems.
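
To make this concrete, here is a minimal Go sketch of the kind of Events list
described above; the type and function names are hypothetical (they are not
taken from the proposed design), but the idea is that each event carries an
Info/Warning/Error severity, so the page can show a per-severity breakdown
next to the overall task status:

~~~
package main

import (
	"fmt"
	"time"
)

// Severity classifies a task event the way the proposed Events list does.
type Severity int

const (
	Info Severity = iota
	Warning
	Error
)

// TaskEvent is one entry of the hypothetical Events list.
type TaskEvent struct {
	Severity  Severity
	Message   string
	Timestamp time.Time
}

// Summarize counts events per severity so the Task Details header could read
// e.g. "Success (1 error, 0 warnings)" instead of a bare green icon.
func Summarize(events []TaskEvent) (info, warnings, errs int) {
	for _, e := range events {
		switch e.Severity {
		case Info:
			info++
		case Warning:
			warnings++
		case Error:
			errs++
		}
	}
	return
}

func main() {
	events := []TaskEvent{
		{Info, "Installed packages on: dhcp-126-83.lab.eng.brq.redhat.com", time.Now()},
		{Error, "OSD addition failed for dhcp-126-82.lab.eng.brq.redhat.com", time.Now()},
	}
	i, w, e := Summarize(events)
	fmt.Printf("Task finished: %d info, %d warnings, %d errors\n", i, w, e)
}
~~~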

The other problem is that the idea behind the separation of *Execution Steps*
into the two groups *Status* and *Sub Tasks* is not clear from the page itself.

For example, while it makes sense to mark a *Cluster Create* task as "success"
when a cluster is created even though there is an OSD node which failed to
join it properly (because the cluster has been created after all and it
doesn't make sense to block a whole, e.g., 20 node cluster because of a
failure on a single node), it's important to communicate this fact to the user
in a clear way. And this is not possible with the current implementation.

Full example of such scenario is given in "Steps to Reproduce" section.

Version-Release
===============

rhscon-ui-0.0.29-1.el7scon.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Prepare some machines for a new cluster.
2. Start the *Create Cluster* wizard to create a new cluster, but break one
   future OSD host on purpose[1] (so that the installation or configuration of
   this host is going to fail).
3. Check the information on the *Task Details* page of this *Create Cluster* task.

[1] e.g. kill the machine (virsh destroy) right after the packages have been
installed while the task is still running (so that ceph-provider would
not be able to configure OSDs there later)
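
For completeness, a rough Go sketch of how step 2 can be timed, under the
assumption that the test runner has libvirt access to the future OSD host;
fetchTaskSteps() is only a stub for whatever is used to read the running
task's execution steps, and the libvirt domain name is purely hypothetical:

~~~
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// fetchTaskSteps is only a stub: plug in whatever reads the current
// "Execution Steps" messages of the running Create Cluster task
// (UI scraping, API polling, or log tailing).
func fetchTaskSteps() []string {
	return nil
}

func main() {
	// Host to break (matches the failed node in this report) and its
	// libvirt domain name (hypothetical).
	const victimHost = "dhcp-126-82.lab.eng.brq.redhat.com"
	const victimDomain = "osd-node-4"

	for {
		for _, step := range fetchTaskSteps() {
			if strings.Contains(step, "Installed packages on: "+victimHost) {
				// Kill the VM right after package installation, while the
				// Create Cluster task is still running, so that ceph-provider
				// later fails to configure OSDs on it.
				out, err := exec.Command("virsh", "destroy", victimDomain).CombinedOutput()
				fmt.Println(string(out), err)
				return
			}
		}
		time.Sleep(10 * time.Second)
	}
}
~~~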

Actual results
==============

The *Create Cluster* task finishes with success and there is a green ok icon
at the top of the Task Details page (which is ok), but the fact that one host
was lost during the task, and that the OSD setup therefore failed there, is
not easy to spot.

The first list of so-called *Execution Steps* (labeled *Status*) shows:

(see also attached screenshot)

~~~
1 	Started the task for cluster creation: e64a6d1d-9edc-4a55-891e-52a31e73b02e 	May 12 2016, 06:32:05 PM
2 	Installing packages 	May 12 2016, 06:32:05 PM
3 	Installed packages on: dhcp-126-79.lab.eng.brq.redhat.com 	May 12 2016, 06:39:18 PM
4 	Installed packages on: dhcp-126-83.lab.eng.brq.redhat.com 	May 12 2016, 06:43:29 PM
5 	Installed packages on: dhcp-126-85.lab.eng.brq.redhat.com 	May 12 2016, 06:43:29 PM
6 	Installed packages on: dhcp-126-82.lab.eng.brq.redhat.com 	May 12 2016, 06:46:19 PM
7 	Installed packages on: dhcp-126-84.lab.eng.brq.redhat.com 	May 12 2016, 06:49:40 PM
8 	Installing packages done. Starting Cluster Creation. 	May 12 2016, 06:49:40 PM
9 	Started provider task: 7311af53-e8e5-441f-8552-711700bcaf39 	May 12 2016, 06:50:01 PM
10 	Updating the monitoring configuration 	May 12 2016, 06:56:14 PM
11 	Starting disk sync 	May 12 2016, 06:56:14 PM
12 	Initializing Monitoring Schedules 	May 12 2016, 06:57:20 PM
13 	Success 	May 12 2016, 06:57:20 PM
~~~

The second list of *Execution Steps* (labeled *Sub Tasks*) shows:

~~~
1 	Started ceph provider task for cluster creation: 7311af53-e8e5-441f-8552-711700bcaf39 	May 12 2016, 06:50:01 PM
2 	Persisting cluster details 	May 12 2016, 06:50:01 PM
3 	Configuring the mons 	May 12 2016, 06:50:01 PM
4 	Added mon node: dhcp-126-79.lab.eng.brq.redhat.com 	May 12 2016, 06:50:41 PM
5 	Persisting mons 	May 12 2016, 06:50:41 PM
6 	Configuring the OSDs 	May 12 2016, 06:50:42 PM
7 	Added (dhcp-126-83.lab.eng.brq.redhat.com /dev/vdc) 	May 12 2016, 06:52:12 PM
8 	Added (dhcp-126-84.lab.eng.brq.redhat.com /dev/vdc) 	May 12 2016, 06:53:23 PM
9 	Added (dhcp-126-85.lab.eng.brq.redhat.com /dev/vdc) 	May 12 2016, 06:54:33 PM
10 	Syncing the OSD status 	May 12 2016, 06:54:33 PM
11 	OSD addition failed for [dhcp-126-82.lab.eng.brq.redhat.com:map[/dev/vdc:/dev/vdb]] 	May 12 2016, 06:56:03 PM
12 	Updating the status of the cluster 	May 12 2016, 06:56:03 PM
13 	Removing default created pool "rbd" 	May 12 2016, 06:56:04 PM
14 	Could not delete the default create pool "rbd" 	May 12 2016, 06:56:14 PM
15 	Creating default EC profiles 	May 12 2016, 06:56:14 PM
16 	Success 	May 12 2016, 06:56:14 PM
~~~

As you can see, the failure is actually listed among the other entries in the
second list. On the other hand, this is easy to miss, and one could be tricked
into thinking that everything went fine without any error, which is not the
case, especially when the last entries in both step lists state "Success" and
there is a green ok icon at the top.

Expected results
================

The Task Details page provides a quick way to check the status of the task,
which includes both the final overall status and the number and severity of
any failures which may have happened.
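
This is not a design proposal, just an illustration (in Go) that the data
needed for such a summary is already present in the step lists quoted above:
scanning the step messages for failure-like wording would be enough to put a
problem count next to the overall status. The marker strings below are guesses
based on the messages in this report:

~~~
package main

import (
	"fmt"
	"strings"
)

// failureMarkers are substrings that suggest a step reported a problem.
var failureMarkers = []string{"failed", "Could not", "Error", "lost contact"}

// countProblems returns how many step messages look like failures.
func countProblems(steps []string) int {
	n := 0
	for _, s := range steps {
		for _, m := range failureMarkers {
			if strings.Contains(s, m) {
				n++
				break
			}
		}
	}
	return n
}

func main() {
	subTasks := []string{
		"Added (dhcp-126-85.lab.eng.brq.redhat.com /dev/vdc)",
		"OSD addition failed for [dhcp-126-82.lab.eng.brq.redhat.com:map[/dev/vdc:/dev/vdb]]",
		"Could not delete the default create pool \"rbd\"",
		"Success",
	}
	if n := countProblems(subTasks); n > 0 {
		fmt.Printf("Success, but %d problem(s) during execution\n", n)
	} else {
		fmt.Println("Success")
	}
}
~~~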

Additional info
===============

And while there is no event or message reported in the top sidebar either to
alert the user about a failure which happened (as can be seen on the
screenshot), the fact that the host has been lost was processed and it's
possible to see it on the *Events* page with a red error icon:

~~~
major May 12, 2016 7:02:28 PM Host: dhcp-126-82.lab.eng.brq.redhat.com lost contact
~~~

But had the issue on the affected host been less severe, so that it wouldn't
have triggered a general event (like the "host lost contact" one in this
case), the failure from the Create Cluster task would not be easy to spot
anywhere except in the list of steps on the Task Details page.

Comment 1 Nishanth Thomas 2016-05-14 13:49:17 UTC
Ju, Matt, please have a look at the current implementation and provide feedback on this.

Comment 2 Ju Lim 2016-05-20 18:08:01 UTC
I've just reviewed and have made recommendations on what to do in https://docs.google.com/a/redhat.com/presentation/d/1-3HWjwCcpeeH9Tq2ip9GggQo0I4koWZtfNcmAfogNA8/edit?usp=sharing (Slide 23).

Comment 3 Nishanth Thomas 2016-06-23 14:00:59 UTC
Per discussion with JeffA, this is a nice-to-have and doesn't have any impact on the functionality, so moving to 3.0.

Comment 4 Martin Bukatovic 2016-07-26 13:46:10 UTC
When a task is killed by the console itself because of a timeout, a general
message is displayed (task FOOBAR has been stopped). But since the timeout
is enforced for each action of the task, it would make sense to also report
the name of the action which actually timed out in the error message, to
make it clearer what went wrong.

Suggested based on the following discussion:

[7/26/16 12:48] <nishanth> mbukatov, I had a look at -->http://up1-qa.itos.redhat.com/#7qZZ8vW-wnyFVp494cTHLA
[7/26/16 12:49] <nishanth> mbukatov, its hitting the timeout
[7/26/16 12:49] <nishanth> mbukatov, right now the timeout is set as 10 mins
[7/26/16 12:50] <nishanth> as you can see in the task, the it waited for 10 mins from the last update time and timed out
[7/26/16 12:50] <nishanth> I feel 10 mins is bit on the lower, I will increase that a bit probably 30 mins is optimal I guess
[7/26/16 12:50] <nishanth> mbukatov ^^^
[7/26/16 12:51] <mbukatov> nishanth: and is this timeout counted for each action of a task or is it applied just for the task as a whole?
[7/26/16 12:52] <nishanth> each action, check for last updated time
[7/26/16 12:52] <mbukatov> nishanth: good, that was what I though
[7/26/16 12:55] <mbukatov> nishanth: and did you find out why the task detail page didn't reported the root cause (what timeouted)?
[7/26/16 12:56] <mbukatov> I mean, if the timeout is enforced for each action, would it make sense to report the name of action which timeouted?
[7/26/16 12:56] <nishanth> mbukatov, yeag That is an enhancement I guess probably post 2.0
[7/26/16 12:57] <mbukatov> nishanth: ok
[7/26/16 12:58] <mbukatov> nishanth: so I just add a comment about this case under BZ 1335631 so that we don't forget about this
[7/26/16 12:59] <nishanth> mbukatov, alright
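
To illustrate the enhancement discussed in this comment, a small hedged sketch
in Go of a per-action timeout check that names the stalled action in the error
message; the 10 minute value comes from the chat log above, while the function
and message wording are purely illustrative:

~~~
package main

import (
	"fmt"
	"time"
)

// actionTimeout mirrors the current per-action timeout mentioned in the chat
// log above (10 minutes, measured from the action's last update time).
const actionTimeout = 10 * time.Minute

// checkTimeout returns an error naming the stalled action when its last
// update is older than the per-action timeout, so the Task Details page can
// show *which* action timed out, not only that the task was stopped.
func checkTimeout(actionName string, lastUpdate time.Time) error {
	if stale := time.Since(lastUpdate); stale > actionTimeout {
		return fmt.Errorf("task stopped: action %q timed out (no update for %s)",
			actionName, stale.Round(time.Minute))
	}
	return nil
}

func main() {
	if err := checkTimeout("Configuring the OSDs", time.Now().Add(-25*time.Minute)); err != nil {
		fmt.Println(err)
	}
}
~~~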

