Bug 916612
| Summary: | Suspend and continue of VM universe job | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Daniel Horák <dahorak> |
| Component: | condor | Assignee: | grid-maint-list <grid-maint-list> |
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 2.2 | CC: | matt, tstclair |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-26 20:01:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | Condor logs (attachment 703899) | | |
This also applies to MRG 2.2: condor-7.6.5-0.22.
Created attachment 703899 [details]: Condor logs.

Description of problem:
What is the expected behaviour when suspending a VM universe job? After suspending a VM job, it appears suspended (in condor_q), but after a while it goes to the held state.

Version-Release number of selected component (if applicable):
MRG 2.3: condor-7.8.8-0.4.1

How reproducible:
100%

Steps to Reproduce:
1. Configure condor to run VM universe jobs (KVM in this case):

```
VM_TYPE = kvm
VM_MEMORY = 512
VM_MAX_MEMORY = 1024
VM_NETWORKING = TRUE
VM_NETWORKING_TYPE = nat,bridge
VM_NETWORKING_DEFAULT_TYPE = bridge
VM_NETWORKING_BRIDGE_INTERFACE = br0
MAX_VM_GAHP_LOG = 1000000
VM_GAHP_DEBUG = D_FULLDEBUG
```

2. Prepare a VM universe job:

```
# cat /tmp/vmuniverse.job
Universe=vm
Log=/tmp/vm_universe.log.$(cluster)
Executable=testvm
VM_TYPE=kvm
VM_MEMORY=512
VM_DISK=/var/lib/libvirt/images/vmuniverse-rhel6x.img:vda:w
Queue
```

3. Submit the job, wait for it to start, then try to suspend it:

```
$ condor_submit vmuniverse.job
Submitting job(s).
1 job(s) submitted to cluster 1.

$ condor_suspend 1.0
Job 1.0 suspended

# condor_q
-- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     test    2/28 14:04   0+00:00:34 R  0   0.0  testvm

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

# condor_q
-- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     test    2/28 14:04   0+00:01:14 S  0   0.0  testvm

1 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 1 suspended

# condor_status
Name           OpSys  Arch   State     Activity LoadAv Mem  ActvtyTime
slot1@HOSTNAME LINUX  X86_64 Claimed   Suspende 1.120  1206 0+00:00:04
slot2@HOSTNAME LINUX  X86_64 Unclaimed Idle     1.000  1206 0+00:00:54
slot3@HOSTNAME LINUX  X86_64 Unclaimed Idle     1.000  1206 0+00:00:55
slot4@HOSTNAME LINUX  X86_64 Unclaimed Idle     1.000  1206 0+00:00:56

             Machines Owner Claimed Unclaimed Matched Preempting
X86_64/LINUX        4     0       1         3       0          0
       Total        4     0       1         3       0          0

# condor_q
-- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     test    2/28 14:04   0+00:01:50 H  0   0.0  testvm

1 jobs; 0 completed, 0 removed, 0 idle, 0 running, 1 held, 0 suspended

# condor_q -bet
-- Submitter: HOSTNAME : <192.168.199.1:56377> : HOSTNAME
---
001.000:  Request is held.

Hold reason: Error from slot1@HOSTNAME: STARTER at IP failed to send file(s) to <IP:41383>: error reading from /var/lib/condor/execute/dir_16408/xen.mem.ckpt: (errno 13) Permission denied; SHADOW failed to receive file(s) from <IP:35593>
```

Actual results:
The VM job is not suspended; after a while it goes held.

Expected results:
The VM job is correctly suspended.

Additional info:
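The hold reason above ends in "(errno 13) Permission denied" on the checkpoint file the starter tried to transfer back. As a small illustration (not part of the report), the reason string can be parsed mechanically and the errno mapped to its symbolic name; the regex and variable names here are hypothetical, and the string is copied verbatim from the hold reason:

```python
import errno
import os
import re

# Hold reason as reported by condor_q (HOSTNAME/IP placeholders as shown
# in the bug report).
hold_reason = (
    "Error from slot1@HOSTNAME: STARTER at IP failed to send file(s) to "
    "<IP:41383>: error reading from "
    "/var/lib/condor/execute/dir_16408/xen.mem.ckpt: "
    "(errno 13) Permission denied; SHADOW failed to receive file(s) from "
    "<IP:35593>"
)

# Pull out the failing path, the errno value, and the message text.
m = re.search(r"error reading from (\S+): \(errno (\d+)\) ([^;]+)", hold_reason)
path, err, msg = m.group(1), int(m.group(2)), m.group(3)

print(path)                 # /var/lib/condor/execute/dir_16408/xen.mem.ckpt
print(err == errno.EACCES)  # True: errno 13 is EACCES on Linux
print(os.strerror(err))     # Permission denied
```

That is, the starter process lacked read permission on the `xen.mem.ckpt` checkpoint file in the job's execute directory, which is why the suspend ends with the job held rather than suspended.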