Bug 783139 - Remove job using aviary isn't handled properly
Summary: Remove job using aviary isn't handled properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: cumin
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 2.3
Target Release: ---
Assignee: Trevor McKay
QA Contact: Peter Belanyi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-01-19 14:03 UTC by Stanislav Graf
Modified: 2014-11-18 02:24 UTC (History)

Fixed In Version: cumin-0.1.5251-1
Doc Type: Bug Fix
Doc Text:
Cause
    Whenever Cumin renders a page where an object is found to be missing, the user is shown a page that says "We can't find the object you requested..." and is told to manually return to the site root and try again.
Consequence
    This is less user friendly than it could be. Cumin should be able to make an intelligent choice for redirect based on the context of the error and inform the user of issues.
Change
    Handling of deleted submissions has been improved and errors in the job pages redirect to job lists or submission lists as appropriate. Hooks have been added to allow future redirects from specific points in the UI as needed. In general, when an object is found to be missing the user will be redirected to the Cumin main page with a banner notice that informs them an object could not be found.
Result
    Cumin has a structure for taking appropriate action in error cases and automatically redirecting users to pages that make sense instead of the catch-all "We can't find..." page.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:41:02 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 773680 | 0 | unspecified | CLOSED | Released job doesn't start | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 789416 | 0 | low | CLOSED | Handle better the case when cumin can't find the object user requested | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2013:0564 | 0 | normal | SHIPPED_LIVE | Low: Red Hat Enterprise MRG Grid 2.3 security update | 2013-03-06 23:37:09 UTC

Internal Links: 773680 789416

Description Stanislav Graf 2012-01-19 14:03:06 UTC
Description of problem:
Submit a job named JOB1, remove it, and then submit another job reusing the now unused name JOB1. On Grid::Submission::JOB1 both the old job (flagged as removed) and the new one (probably running) are shown. Remove the new one. After the removal, cumin should return to Grid::Submission. Instead, it stays on the same page and shows the new JOB1 as running, even though the job has been removed (verified with condor_q). Trying to manipulate the job again produces:

We can't find the object you requested
This often happens when a far-off agent stops or is disconnected.
  It may come back under a different database ID.  Try navigating anew
  from the site root.

The 'site root' link then leads to the cumin start page.

Version-Release number of selected component (if applicable):
condor-aviary-7.6.5-0.12.el5.i386
condor-qmf-7.6.5-0.12.el5.i386
cumin-0.1.5184-1.el5.noarch

How reproducible:
100%

Steps to Reproduce:
0. Enable aviary for cumin 
INFO Enabled Aviary interface for job submission and control.
INFO Enabled Aviary interface for query operations.
1. Cumin - submit JOB1
2. Cumin - remove JOB1 (using button on page Grid::Submission::JOB1)
3. Cumin - submit JOB1
4. Cumin - remove JOB1
5. Cumin - you are still on the page of the removed JOB1 instead of Grid::Submission
6. Cumin - try to navigate further and you get 'We can't find the object...'
  
Actual results:
Cumin stays on the same page when removing the second job with the same name

Expected results:
Cumin goes to Grid::Submission after removing

Additional info:

Comment 2 Trevor McKay 2012-02-01 21:15:02 UTC
Thinking out loud....

There is a filter currently in Cumin that does a redirect on a job removal if the number of items selected for removal matches the number of items in the list.  This may have predated the ability to show "removed" jobs in the list.  As it is, the logic is not adequate: "removed" jobs on the page break it.  Looking into repairing it.

There is another possible failure case, however, which is that some action outside of a Cumin session makes the QMF object go away: qpid-tool, condor_rm, aviary, another cumin session, etc.  The result is the same as with the failed logic above: the server generates a 500 response, and "last update failed" is displayed.  Trying to follow a link produces the "We can't find..." page.

The server should catch the error on a missing object and cause the browser to redirect.  Alternatively, if Aviary is in use, it may be possible to render around the missing QMF object anyway.  If we know enough to identify the submission through Aviary, the fact that we can't look it up in QMF space shouldn't matter.  We should still be able to draw the table from the Aviary query (which is based on the history files).

Comment 3 Trevor McKay 2012-02-03 14:23:01 UTC
Fixed the logic in the filter so that Cumin returns to the submission list when the set of running jobs in the submission is targeted for removal.

Fixed in revision 5202.
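
For illustration only, the filter change can be sketched as follows. This is not the actual Cumin code: the `Job` type, the status labels, and both function names are hypothetical, chosen just to show why comparing counts fails once "removed" jobs are displayed in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for one row in the job list."""
    job_id: str
    status: str  # assumed labels, e.g. "running" or "removed"


def should_redirect_old(selected, listed):
    # Old logic: redirect only when the selection is as large as the list.
    # "Removed" jobs still shown in the list make the counts differ.
    return len(selected) == len(listed)


def should_redirect_fixed(selected, listed):
    # Fixed logic: redirect when every job that is not already removed
    # is targeted for removal.
    active = {job.job_id for job in listed if job.status != "removed"}
    chosen = {job.job_id for job in selected}
    return bool(active) and active <= chosen
```

With one removed job and one running job in the list, selecting only the running job leaves the old check false (1 selected versus 2 listed), while the fixed check is true.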

Considering a redirect on missing QMF object found during an ajax update call as an enhancement.

Comment 4 Trevor McKay 2012-02-06 16:07:50 UTC
Note,

  QMF submission objects disappear when the last running job is removed and submissions are being published by the schedd (QMF_PUBLISH_SUBMISSIONS = True in the condor config).

  When submission objects are published by the jobserver (QMF_PUBLISH_SUBMISSIONS = False and a jobserver daemon running), submission objects persist when there are no running jobs.

  Therefore, the behavior of returning to the submission list when the last running job is removed does not really make sense in all condor configurations.  

  Additionally, a user may drill down into the same submission again after the return but before the object actually disappears in the "schedd" case described here, leaving themselves set up for an error when the update ultimately fails.

  Continuing to investigate automatic redirect on failed update as a solution for the "schedd" case, with the "return to submission list on remove" behavior eliminated.
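
  The two publishing cases can be summarized in a short sketch. The function name and its parameters are hypothetical; only the QMF_PUBLISH_SUBMISSIONS setting comes from the condor config, and the jobserver daemon is assumed to be running when that setting is False.

```python
def submission_object_persists(qmf_publish_submissions, running_jobs):
    """Return True if the QMF submission object is expected to remain.

    qmf_publish_submissions mirrors QMF_PUBLISH_SUBMISSIONS: True means
    the schedd publishes submissions, False means the jobserver does.
    """
    if qmf_publish_submissions:
        # "schedd" case: the object goes away with the last running job.
        return running_jobs > 0
    # "jobserver" case: the object persists with no running jobs.
    return True
```

  Only the "schedd" case with zero running jobs yields a missing object, which is the case where a redirect on a failed update is needed.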

Comment 5 Trevor McKay 2012-02-08 17:54:03 UTC
Fixed in revision 5207.

The following changes should handle errors in the "schedd" case while eliminating unnecessary redirects in the "jobserver" case.

* Add a mechanism that optionally allows a redirect on a failed ajax update.
Redirects at this point must be specified widget by widget.

* Redirect updates of the job summary page and the job details page
to the submission list when an update fails because the submission
object is missing.  

* Change redirect from "Remove job" link on the job details page
to the job list instead of the submission list.

* Remove redirect from job summary page when the last running job
in a submission is removed.
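
A rough sketch of the widget-by-widget opt-in described in the list above. None of these names are taken from the Cumin source: `ObjectMissing`, `Widget`, `update_redirect`, `process_update`, and the "submission-list" target are all assumptions made for illustration.

```python
class ObjectMissing(Exception):
    """Hypothetical error raised when a QMF object cannot be found."""


class Widget:
    # A widget opts in by naming a redirect target; None means no redirect.
    update_redirect = None

    def render_update(self):
        raise NotImplementedError

    def process_update(self):
        """Return (kind, payload) for one ajax update request."""
        try:
            return ("content", self.render_update())
        except ObjectMissing:
            if self.update_redirect is None:
                raise  # no redirect configured: the failure propagates
            return ("redirect", self.update_redirect)


class JobSummary(Widget):
    update_redirect = "submission-list"  # assumed target name

    def __init__(self, submissions, name):
        self.submissions = submissions  # dict standing in for QMF lookups
        self.name = name

    def render_update(self):
        if self.name not in self.submissions:
            raise ObjectMissing(self.name)
        return self.submissions[self.name]
```

A widget that leaves `update_redirect` unset keeps the previous behavior, so the redirect only applies where it has been requested explicitly.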

Expected behavior:

0) Set up for schedd publishing (default condor config)
1) Submit a job
2) Drill into the submission
3) Remove the job
4) Wait
5) When an update detects that the submission is gone, the page will be redirected to the submission list


0) Set up for jobserver publishing
1) Submit a job
2) Drill into the submission
3) Remove the job
4) Wait
5) There will be no redirect.  The status of the job should change as updates are processed.  Occasionally, jobs may disappear from this list as they are transitioning to "removed", but they should come back.  This is expected behavior and has to do with how Aviary scans job files according to pmackinn.

Comment 6 Trevor McKay 2012-02-09 20:01:35 UTC
Note, also added a "Notice: The submission being displayed became unavailable" message to the yellow banner on redirect.

Comment 7 Trevor McKay 2012-02-10 16:36:18 UTC
Revision 5211

Changed the redirect behavior when following a link finds a missing object.  In the following scenario, Cumin should go to index.html (i.e., "site root") with a notice instead of going to the "We can't find the object you requested" display.

0) Set up for schedd publishing (default condor config)
1) Submit a job
2) Drill into the submission
3) Remove the job
4) Refresh the page, again, again, again ....
5) When the submission is deleted, the browser should redirect to the main page and the yellow banner should contain "Notice:  An object being displayed became unavailable".

Note, this sequence almost always works to observe the redirect to the main page.  It is possible that an "update" is processed before the refresh is sent, in which case the redirect to the Submission list will happen instead.
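
As a sketch of the page-request behavior described in this comment (not the actual Cumin handler): `handle_page_request` and its return shape are hypothetical, and the notice wording is copied from the banner text quoted above.

```python
NOTICE = "Notice: An object being displayed became unavailable"


def handle_page_request(objects, object_id):
    """Return (location, notice, body) for a page request.

    objects is a dict standing in for the object lookup.
    """
    obj = objects.get(object_id)
    if obj is None:
        # Missing object: send the browser to the site root with a
        # banner notice instead of the "We can't find..." page.
        return ("index.html", NOTICE, None)
    # Object found: no redirect, render the page normally.
    return (None, None, "rendered page for %s" % obj)
```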

Comment 9 Trevor McKay 2012-03-06 16:05:01 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    Whenever Cumin renders a page where an object is found to be missing, the user is shown a page that says "We can't find the object you requested..." and is told to manually return to the site root and try again.

Consequence
    This is less user friendly than it could be. Cumin should be able to make an intelligent choice for redirect based on the context of the error and inform the user of issues.

Change
    Handling of deleted submissions has been improved and errors in the job pages redirect to job lists or submission lists as appropriate.  Hooks have been added to allow future redirects from specific points in the UI as needed.  In general, when an object is found to be missing the user will be redirected to the Cumin main page with a banner notice that informs them an object could not be found.

Result
    Cumin has a structure for taking appropriate action in error cases and automatically redirecting users to pages that make sense instead of the catch-all "We can't find..." page.

Comment 13 Peter Belanyi 2013-01-24 15:30:48 UTC
Reproduced on cumin-0.1.5184-1

Tested on RHEL 5/6 i386/x86_64
cumin-0.1.5648-1
according to comment 0, comment 5 and comment 7

--> VERIFIED

Comment 15 errata-xmlrpc 2013-03-06 18:41:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html

