Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 916612

Summary: Suspend and continue of VM universe job
Product: Red Hat Enterprise MRG
Reporter: Daniel Horák <dahorak>
Component: condor
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: low
Priority: low
Version: 2.2
CC: matt, tstclair
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2016-05-26 20:01:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments:
  Condor logs. (no flags)

Description Daniel Horák 2013-02-28 13:30:00 UTC
Created attachment 703899 [details]
Condor logs.

Description of problem:
  What is the expected behaviour when suspending a VM universe job?
  After suspending the VM job, it appears suspended (in condor_q), but after a while it goes to the held state.

Version-Release number of selected component (if applicable):
  MRG 2.3: condor-7.8.8-0.4.1

How reproducible:
  100%

Steps to Reproduce:
1. Configure Condor to run VM universe jobs (KVM in this case):
  VM_TYPE = kvm
  VM_MEMORY = 512
  VM_MAX_MEMORY = 1024
  VM_NETWORKING = TRUE
  VM_NETWORKING_TYPE = nat,bridge
  VM_NETWORKING_DEFAULT_TYPE = bridge
  VM_NETWORKING_BRIDGE_INTERFACE = br0

  MAX_VM_GAHP_LOG   = 1000000
  VM_GAHP_DEBUG = D_FULLDEBUG
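As a sanity check, the settings above can be confirmed to have been picked up by the running configuration, assuming a standard Condor install with condor_config_val on the PATH (this fragment is illustrative and needs a live Condor installation to run):

```shell
# Query the effective value of each VM universe knob set above;
# condor_config_val prints the value the daemons will actually use.
for knob in VM_TYPE VM_MEMORY VM_MAX_MEMORY VM_NETWORKING \
            VM_NETWORKING_DEFAULT_TYPE VM_NETWORKING_BRIDGE_INTERFACE; do
  echo "$knob = $(condor_config_val "$knob")"
done
```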

2. Prepare VM universe job:
  # cat /tmp/vmuniverse.job 
    Universe=vm
    Log=/tmp/vm_universe.log.$(cluster)
    Executable=testvm
    VM_TYPE=kvm
    VM_MEMORY=512
    VM_DISK=/var/lib/libvirt/images/vmuniverse-rhel6x.img:vda:w
    Queue

3. Submit the job, wait for it to start, then try to suspend it:
  $ condor_submit vmuniverse.job 
    Submitting job(s).
    1 job(s) submitted to cluster 1.
  
  $ condor_suspend 1.0
    Job 1.0 suspended
  
  # condor_q 
    -- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   test            2/28 14:04   0+00:00:34 R  0   0.0  testvm            
  
    1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

  # condor_q 
    -- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   test            2/28 14:04   0+00:01:14 S  0   0.0  testvm            

    1 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 1 suspended

  # condor_status 
    Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
    slot1@HOSTNAME LINUX      X86_64 Claimed   Suspende 1.120  1206  0+00:00:04
    slot2@HOSTNAME LINUX      X86_64 Unclaimed Idle     1.000  1206  0+00:00:54
    slot3@HOSTNAME LINUX      X86_64 Unclaimed Idle     1.000  1206  0+00:00:55
    slot4@HOSTNAME LINUX      X86_64 Unclaimed Idle     1.000  1206  0+00:00:56
                         Machines Owner Claimed Unclaimed Matched Preempting
            X86_64/LINUX        4     0       1         3       0          0
                   Total        4     0       1         3       0          0

  # condor_q
    -- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
       1.0   test            2/28 14:04   0+00:01:50 H  0   0.0  testvm            
    1 jobs; 0 completed, 0 removed, 0 idle, 0 running, 1 held, 0 suspended

  # condor_q -bet
    -- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
    ---
    001.000:  Request is held.
  
    Hold reason: Error from slot1@HOSTNAME: STARTER at IP failed to send file(s) to <IP:41383>: error reading from /var/lib/condor/execute/dir_16408/xen.mem.ckpt: (errno 13) Permission denied; SHADOW failed to receive file(s) from <IP:35593>
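  A minimal sketch of the failure mode in the hold reason above: the file transfer fails with errno 13 (EACCES) when the suspend checkpoint (xen.mem.ckpt) in the execute directory ends up with permissions the transferring user cannot read. The path below is illustrative only, not the job's actual sandbox; that the checkpoint is left unreadable by qemu/libvirt is an assumption about the mechanism, not confirmed by the logs.

```shell
# Create a stand-in for the checkpoint file and strip all permissions,
# mimicking a checkpoint written by another user with no read access.
ckpt=/tmp/xen.mem.ckpt.demo
: > "$ckpt"
chmod 000 "$ckpt"
# ls -l now shows a "----------" mode string; any read attempt by a
# non-owner (such as the condor shadow/starter user) fails with EACCES.
ls -l "$ckpt"
```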
  
  
Actual results:
  The VM job is not suspended; after a while it goes to the held state.

Expected results:
  The VM job is correctly suspended (and can later be continued).


Additional info:
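For reference, the job-state transitions seen above (R -> S -> H) are read from the ST column of the condor_q output; a tiny illustrative helper, with the sample lines from this report hard-coded rather than taken from a live queue:

```shell
# Hypothetical helper: extract the two-letter job state (6th whitespace-
# separated field, the ST column) from a condor_q job line.
job_state() {
  echo "$1" | awk '{print $6}'
}

job_state "1.0   test   2/28 14:04   0+00:00:34 R  0   0.0  testvm"   # R (running)
job_state "1.0   test   2/28 14:04   0+00:01:14 S  0   0.0  testvm"   # S (suspended)
job_state "1.0   test   2/28 14:04   0+00:01:50 H  0   0.0  testvm"   # H (held)
```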

Comment 1 Daniel Horák 2013-02-28 13:56:10 UTC
This also applies to MRG 2.2: condor-7.6.5-0.22.