Created attachment 1156749 [details]
screenshot of Task Details page

Description of problem
======================

The current implementation of the *Task Details* page doesn't follow the
proposed design. Instead of *Execution Steps*, *Events* and *Logs* lists,
there are two lists of *Execution Steps* with completely different semantics
compared to the original design.

The most important difference (from the QE perspective) is that the list of
Events (with items clearly classified into Info/Warning/Error categories) is
missing. This feature is important for the UI to be able to clearly
communicate the overall status when a task finishes with success, but with a
few minor or major problems.

The other problem is that the idea behind the separation of *Execution
Steps* into the two groups *Status* and *Sub Tasks* is not clear from the
page itself.

For example, while it makes sense to mark a *Create Cluster* task as
"success" when a cluster is created even though there is an OSD node which
failed to join it properly (the cluster has been created after all, and it
doesn't make sense to block a whole, e.g., 20-node cluster because of a
failure on a single node), it's important to communicate this fact to the
user in a clear way. And this is not possible with the current
implementation. A full example of such a scenario is given in the "Steps to
Reproduce" section.

Version-Release
===============

rhscon-ui-0.0.29-1.el7scon.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Prepare some machines for a new cluster.
2. Start the *Create Cluster* wizard to create a new cluster, but break one
   future OSD host on purpose[1] (so that the installation or configuration
   of this host is going to fail).
3. See the information on the *Task Details* page of this *Create Cluster*
   task.

[1] e.g.
kill the machine (virsh destroy) right after the packages have been
installed, while the task is still running (so that ceph-provider would not
be able to configure OSDs there later)

Actual results
==============

The *Create Cluster* task finishes with success and there is a green ok icon
at the top of the Task Details page (which is ok), but the fact that one
host was lost during the task, and so the OSD setup failed there, is not
easy to spot.

The first list of so-called *Execution Steps* (labeled *Status*) shows
(see also the attached screenshot):

~~~
1  Started the task for cluster creation: e64a6d1d-9edc-4a55-891e-52a31e73b02e  May 12 2016, 06:32:05 PM
2  Installing packages  May 12 2016, 06:32:05 PM
3  Installed packages on: dhcp-126-79.lab.eng.brq.redhat.com  May 12 2016, 06:39:18 PM
4  Installed packages on: dhcp-126-83.lab.eng.brq.redhat.com  May 12 2016, 06:43:29 PM
5  Installed packages on: dhcp-126-85.lab.eng.brq.redhat.com  May 12 2016, 06:43:29 PM
6  Installed packages on: dhcp-126-82.lab.eng.brq.redhat.com  May 12 2016, 06:46:19 PM
7  Installed packages on: dhcp-126-84.lab.eng.brq.redhat.com  May 12 2016, 06:49:40 PM
8  Installing packages done. Starting Cluster Creation.  May 12 2016, 06:49:40 PM
9  Started provider task: 7311af53-e8e5-441f-8552-711700bcaf39  May 12 2016, 06:50:01 PM
10 Updating the monitoring configuration  May 12 2016, 06:56:14 PM
11 Starting disk sync  May 12 2016, 06:56:14 PM
12 Initializing Monitoring Schedules  May 12 2016, 06:57:20 PM
13 Success  May 12 2016, 06:57:20 PM
~~~

The 2nd list of *Execution Steps* (labeled *Sub Tasks*) shows:

~~~
1  Started ceph provider task for cluster creation: 7311af53-e8e5-441f-8552-711700bcaf39  May 12 2016, 06:50:01 PM
2  Persisting cluster details  May 12 2016, 06:50:01 PM
3  Configuring the mons  May 12 2016, 06:50:01 PM
4  Added mon node: dhcp-126-79.lab.eng.brq.redhat.com  May 12 2016, 06:50:41 PM
5  Persisting mons  May 12 2016, 06:50:41 PM
6  Configuring the OSDs  May 12 2016, 06:50:42 PM
7  Added (dhcp-126-83.lab.eng.brq.redhat.com /dev/vdc)  May 12 2016, 06:52:12 PM
8  Added (dhcp-126-84.lab.eng.brq.redhat.com /dev/vdc)  May 12 2016, 06:53:23 PM
9  Added (dhcp-126-85.lab.eng.brq.redhat.com /dev/vdc)  May 12 2016, 06:54:33 PM
10 Syncing the OSD status  May 12 2016, 06:54:33 PM
11 OSD addition failed for [dhcp-126-82.lab.eng.brq.redhat.com:map[/dev/vdc:/dev/vdb]]  May 12 2016, 06:56:03 PM
12 Updating the status of the cluster  May 12 2016, 06:56:03 PM
13 Removing default created pool "rbd"  May 12 2016, 06:56:04 PM
14 Could not delete the default create pool "rbd"  May 12 2016, 06:56:14 PM
15 Creating default EC profiles  May 12 2016, 06:56:14 PM
16 Success  May 12 2016, 06:56:14 PM
~~~

As you can see, the failure is actually listed among the other entries in
the second list. On the other hand, it is easy to miss, and one could be
tricked into thinking that everything went fine without any error, which is
not the case. This is especially so when the last entry in both step lists
states "Success" and there is a green ok icon at the top.
Expected results
================

The Task Details page provides a quick way to check the status of the task,
which includes both the final overall status and the number and severity of
any failures which may have happened.

Additional info
===============

While there is no event or message reported in the top sidebar to alert the
user about the failure either (as can be seen on the screenshot), the fact
that the host has been lost was processed, and it's possible to see it on
the *Events* page with a red error icon:

~~~
major  May 12, 2016 7:02:28 PM  Host: dhcp-126-82.lab.eng.brq.redhat.com lost contact
~~~

But had the issue on the affected host been less severe, so that it wouldn't
have triggered a general event (like the "host lost contact" in this case),
the failure from the Create Cluster task would currently not be easy to spot
anywhere except in the list of steps on the Task Details page.
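The expected behavior described above — deriving a single overall status
plus per-severity failure counts from the task's events — could be sketched
roughly as follows. This is a minimal illustration only; the event
structure, severity names, and status strings are hypothetical and follow
the proposed Info/Warning/Error design, not any actual rhscon API:

```python
from collections import Counter

# Hypothetical event records, loosely modeled on the "Create Cluster"
# scenario from this report; the field names are assumptions.
EVENTS = [
    {"severity": "info", "message": "Installing packages done"},
    {"severity": "error", "message": "OSD addition failed for dhcp-126-82"},
    {"severity": "warning", "message": 'Could not delete the default pool "rbd"'},
    {"severity": "info", "message": "Creating default EC profiles"},
]

def summarize(events):
    """Return (overall status, per-severity counts) for a task's events."""
    counts = Counter(e["severity"] for e in events)
    if counts["error"]:
        status = "Completed with errors"
    elif counts["warning"]:
        status = "Completed with warnings"
    else:
        status = "Success"
    return status, dict(counts)

status, counts = summarize(EVENTS)
print(status)  # "Completed with errors"
print(counts)  # {'info': 2, 'error': 1, 'warning': 1}
```

With a summary like this, the green ok icon could be kept for "the task
finished", while the headline still makes one failed OSD host visible at a
glance instead of burying it in the middle of a step list.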
Ju, Matt, please have a look at the current implementation and provide feedback on this.
I've just reviewed it and made recommendations on what to do in https://docs.google.com/a/redhat.com/presentation/d/1-3HWjwCcpeeH9Tq2ip9GggQo0I4koWZtfNcmAfogNA8/edit?usp=sharing (Slide 23).
Per discussion with JeffA, this is a nice-to-have and doesn't have any impact on the functionality. So moving to 3.0.
When a task is killed by the console itself because of a timeout, a general message is displayed ("task FOOBAR has been stopped"). But since the timeout is enforced for each action of the task, it would make sense to also report the name of the action which actually timed out in the error message, to make it more clear what went wrong.

Suggested based on the following discussion:

[7/26/16 12:48] <nishanth> mbukatov, I had a look at --> http://up1-qa.itos.redhat.com/#7qZZ8vW-wnyFVp494cTHLA
[7/26/16 12:49] <nishanth> mbukatov, its hitting the timeout
[7/26/16 12:49] <nishanth> mbukatov, right now the timeout is set as 10 mins
[7/26/16 12:50] <nishanth> as you can see in the task, the it waited for 10 mins from the last update time and timed out
[7/26/16 12:50] <nishanth> I feel 10 mins is bit on the lower, I will increase that a bit probably 30 mins is optimal I guess
[7/26/16 12:50] <nishanth> mbukatov ^^^
[7/26/16 12:51] <mbukatov> nishanth: and is this timeout counted for each action of a task or is it applied just for the task as a whole?
[7/26/16 12:52] <nishanth> each action, check for last updated time
[7/26/16 12:52] <mbukatov> nishanth: good, that was what I though
[7/26/16 12:55] <mbukatov> nishanth: and did you find out why the task detail page didn't reported the root cause (what timeouted)?
[7/26/16 12:56] <mbukatov> I mean, if the timeout is enforced for each action, would it make sense to report the name of action which timeouted?
[7/26/16 12:56] <nishanth> mbukatov, yeag That is an enhancement I guess probably post 2.0
[7/26/16 12:57] <mbukatov> nishanth: ok
[7/26/16 12:58] <mbukatov> nishanth: so I just add a comment about this case under BZ 1335631 so that we don't forget about this
[7/26/16 12:59] <nishanth> mbukatov, alright
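The enhancement asked for above can be sketched in a few lines. This is a
hypothetical model only, not the console's actual task runner: each action
is just a named function here, and the "last updated time" check is reduced
to measuring each action's wall-clock duration. The point is simply that the
error carries the name of the action that exceeded the limit:

```python
import time

class ActionTimeoutError(Exception):
    """Raised when a single task action exceeds the per-action timeout."""

def run_actions(actions, timeout_seconds):
    """Run (name, callable) pairs, failing with the offending action's name.

    Hypothetical sketch: the duration check happens after each action
    returns, standing in for the real per-action last-update tracking.
    """
    for name, action in actions:
        start = time.monotonic()
        action()
        if time.monotonic() - start > timeout_seconds:
            # Name the action, so the Task Details page can say *what*
            # timed out instead of only "task FOOBAR has been stopped".
            raise ActionTimeoutError(
                f"task stopped: action '{name}' exceeded "
                f"the {timeout_seconds}s timeout"
            )
```

A message built this way ("task stopped: action 'Configuring the OSDs'
exceeded the 600s timeout") would have made the root cause in the linked
case obvious from the task detail page alone.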