Bug 915210
| Summary: | Suspend and continue of parallel universe job | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Daniel Horák <dahorak> |
| Component: | condor | Assignee: | grid-maint-list <grid-maint-list> |
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.2 | CC: | matt, tstclair |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-26 19:51:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.
Created attachment 702243 [details]
Parallel universe job.

Description of problem:
Suspending a job doesn't work with a parallel universe job.

Version-Release number of selected component (if applicable):
MRG 2.3: condor-7.8.8-0.4.1
MRG 2.2: condor-7.6.5-0.22

How reproducible:
100%

Steps to Reproduce:
1. Configure condor to run parallel universe jobs:

```
DedicatedScheduler="DedicatedScheduler@HOSTNAME"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```

2. Prepare a parallel universe job (job.py is attached):

```
$ cat /tmp/parallel.job
universe = parallel
executable = /tmp/job.py
args = 120
log = /tmp/parallel_job.$(cluster).$(process).log
output = /tmp/parallel_job.$(cluster).$(process)-$(NODE).out
error = /tmp/parallel_job.$(cluster).$(process)-$(NODE).err
machine_count = 1
should_transfer_files = yes
when_to_transfer_output = on_exit
requirements = (FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
+ParallelShutdownPolicy = "WAIT_FOR_ALL"
queue
```

3. Submit the job, wait for it to start, then try to suspend it:

```
$ condor_submit /tmp/parallel.job
Submitting job(s).
1 job(s) submitted to cluster 1.

$ condor_q
-- Submitter: HOSTNAME : <IP:47117> : HOSTNAME
 ID      OWNER        SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     test        2/25 09:43   0+00:00:00 I  0   0.0  job2.py
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

$ condor_q
-- Submitter: HOSTNAME : <IP:47117> : HOSTNAME
 ID      OWNER        SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     test        2/25 09:43   0+00:00:15 R  0   0.0  job2.py
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

$ condor_suspend 1.0
Job 1.0 suspended

$ condor_q
-- Submitter: HOSTNAME : <IP:47117> : HOSTNAME
 ID      OWNER        SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     test        2/25 09:43   0+00:00:21 I  0   4.2  job2.py
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
```

Actual results:
The job is not suspended; it is moved back to Idle instead.

Expected results:
The job is correctly suspended.

Additional info:
Suspending vanilla and Java universe jobs works correctly.
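The contents of the attached job.py (attachment 702243) are not reproduced in this report. A minimal sketch of what such a test payload might look like, assuming the script simply sleeps for the number of seconds passed as its first argument (consistent with `args = 120` in the submit file, which gives condor_suspend a running job to act on):

```python
#!/usr/bin/env python
# Hypothetical sketch of the attached job.py (actual attachment contents
# not shown in this report): sleep for the duration given on the command
# line so the job stays in the Running state long enough to be suspended.
import sys
import time


def run(seconds):
    """Sleep for the given number of seconds and return the duration slept."""
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    # e.g. "args = 120" in the submit file -> sleep for 120 seconds
    run(int(sys.argv[1]))
```

This is only an illustration of the reproduction setup; the real attachment may differ.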