Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 915210

Summary: Suspend and continue of parallel universe job
Product: Red Hat Enterprise MRG
Component: condor
Version: 2.2
Reporter: Daniel Horák <dahorak>
Assignee: grid-maint-list <grid-maint-list>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: matt, tstclair
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Hardware: All
OS: Linux
Type: Bug
Doc Type: Bug Fix
Last Closed: 2016-05-26 19:51:31 UTC
Attachments:
  Parallel universe job. (no flags)

Description Daniel Horák 2013-02-25 08:51:08 UTC
Created attachment 702243 [details]
Parallel universe job.

Description of problem:
  Suspending a job does not work for parallel universe jobs.

Version-Release number of selected component (if applicable):
  MRG 2.3: condor-7.8.8-0.4.1
  MRG 2.2: condor-7.6.5-0.22

How reproducible:
  100%

Steps to Reproduce:
1. Configure condor to run parallel universe jobs:
  DedicatedScheduler="DedicatedScheduler@HOSTNAME"
  STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

2. Prepare a parallel universe job (job.py is attached):
  cat /tmp/parallel.job 
  universe = parallel
  executable = /tmp/job.py
  args = 120
  log = /tmp/parallel_job.$(cluster).$(process).log
  output = /tmp/parallel_job.$(cluster).$(process)-$(NODE).out
  error = /tmp/parallel_job.$(cluster).$(process)-$(NODE).err
  machine_count = 1
  should_transfer_files = yes
  when_to_transfer_output = on_exit
  requirements = (FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
  +ParallelShutdownPolicy = "WAIT_FOR_ALL"
  queue
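
The attached job.py (attachment 702243) is not reproduced in this record; a minimal sketch of what such a test payload typically looks like, assuming it simply sleeps for the number of seconds passed as its first argument (matching "args = 120" in the submit file above), is:

```python
#!/usr/bin/env python
# Hypothetical reconstruction of the attached job.py: a long-running
# payload used only to give condor_suspend something to act on.
# The real attachment may differ.
import sys
import time


def main(argv):
    # Sleep for the number of seconds given as the first argument,
    # defaulting to 120 as in the submit file ("args = 120").
    seconds = int(argv[1]) if len(argv) > 1 else 120
    time.sleep(seconds)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```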

3. Submit the job, wait for it to start, then try to suspend it:
  $ condor_submit /tmp/parallel.job 
    Submitting job(s).
    1 job(s) submitted to cluster 1.

  $ condor_q 
    -- Submitter: HOSTNAME : <IP:47117> : HOSTNAME
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   test            2/25 09:43   0+00:00:00 I  0   0.0  job2.py           

    1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

  $ condor_q 
    -- Submitter: HOSTNAME : <IP:47117> : HOSTNAME
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   test            2/25 09:43   0+00:00:15 R  0   0.0  job2.py           

    1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

  $ condor_suspend 1.0
    Job 1.0 suspended

  $ condor_q
    -- Submitter: HOSTNAME : <IP:47117> : HOSTNAME
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   test            2/25 09:43   0+00:00:21 I  0   4.2  job2.py           

    1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

  
Actual results:
  The job is not suspended; instead it is moved back to the Idle state.

Expected results:
  Job is correctly suspended.

Additional info:
  Suspending Vanilla and Java universe jobs works correctly.
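
  For comparison, a hypothetical vanilla-universe submit file for the same payload, under which condor_suspend is expected to work, might look like (a sketch, not taken from the original report):

```
universe = vanilla
executable = /tmp/job.py
args = 120
log = /tmp/vanilla_job.$(cluster).$(process).log
output = /tmp/vanilla_job.$(cluster).$(process).out
error = /tmp/vanilla_job.$(cluster).$(process).err
should_transfer_files = yes
when_to_transfer_output = on_exit
queue
```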

Comment 1 Anne-Louise Tangring 2016-05-26 19:51:31 UTC
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.