Bug 723894

Summary: Documentation: VMware deployments to NFS datastore can error out
Product: [Retired] CloudForms Cloud Engine
Component: Documentation
Version: 0.3.1
Reporter: wes hayutin <whayutin>
Assignee: Justin Clift <jclift>
QA Contact: wes hayutin <whayutin>
CC: akarol, clalance, dajohnso, deltacloud-maint, kwade, morazi, ssachdev
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clone Of: 723351

Description wes hayutin 2011-07-21 13:37:21 UTC
+++ This bug was initially created as a clone of Bug #723351 +++

Description of problem:

When you bring up a VMware-based deployment with more than two instances, it takes quite some time and Condor places the jobs in "hold".

Going into Condor and releasing the jobs resolves the issue; however, I'm wondering whether there is a default Condor timeout that can be adjusted for VMware.
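
If a tunable timeout is the fix, HTCondor's periodic-release mechanism looks like the right knob to experiment with. A minimal sketch follows; the ten-minute threshold, the retry cap, and the file name are illustrative assumptions, not settings Conductor ships:

# /etc/condor/config.d/99-release-held-jobs.conf  (hypothetical path)
# Auto-release any job that has sat in the hold state for more than ten
# minutes, at most five times, to ride out slow VMware/NFS provisioning.
SYSTEM_PERIODIC_RELEASE = ((CurrentTime - EnteredCurrentStatus) > 600) && (NumSystemHolds < 5)

[root@hp-dl180g6-01 ~]# condor_reconfig

The same expression can also be set per job as periodic_release in the submit description.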


Recreate:
1. set up Conductor for VMware
2. create a deployment with four or more instances
3. start the deployable
4. VMware will take 10-15 minutes to start (the queue can be watched as shown below)
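
While the deployment starts, the hold/idle/run transitions can be followed from the Condor side, e.g.:

[root@hp-dl180g6-01 ~]# watch -n 30 condor_q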



[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:41:39 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:41:39 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:42:10 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:42:40 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:36:06 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:29:43 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:26:40 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:26:39 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:13:03 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:13:18 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:00 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:13:03 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:12:18 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:02 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:11:48 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:11:48 R  0   0.0  job_userquota04_mi

23 jobs; 0 idle, 19 running, 4 held
[root@hp-dl180g6-01 ~]# condor_release 22.0 23.0 24.0 25.0
Job 22.0 released
Job 23.0 released
Job 24.0 released
Job 25.0 released
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:05 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:05 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:42:36 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:06 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:36:32 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:09 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:06 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:05 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:13:29 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:13:44 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:26 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:13:29 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:12:44 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:28 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:14 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:14 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:21 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:21 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:42:52 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:22 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:36:48 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:25 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:22 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:21 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:13:45 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:42 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:13:45 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:13:00 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:44 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:30 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:30 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:36 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:36 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:43:07 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:37 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:37:03 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:40 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:37 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:36 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:14:15 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:57 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:13:15 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:59 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:45 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:45 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:42 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:42 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:43:13 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:43 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:37:09 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:04 R  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:46 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:43 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:42 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:14:06 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:14:21 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:14:03 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:14:06 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:13:21 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:13:05 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:51 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:51 R  0   0.0  job_userquota04_mi

23 jobs; 0 idle, 23 running, 0 held

--- Additional comment from matt on 2011-07-20 15:50:31 EDT ---

FYI - condor_q 22.0 -l | grep LastHoldReason -> "Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)"

The jobs were running.

Should investigate why the transfer failed. Possibly a timing issue?
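
For anyone digging further, the hold reason for every held job can be pulled in one pass rather than per job id; a sketch using condor_q's ClassAd options (JobStatus == 5 is the held state):

[root@hp-dl180g6-01 ~]# condor_q -constraint 'JobStatus == 5' \
    -format '%d.' ClusterId -format '%-4d ' ProcId -format '%s\n' HoldReason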

--- Additional comment from clalance on 2011-07-20 16:11:38 EDT ---

Yeah, those sorts of errors are usually some sort of timeout, or a bug in Deltacloud itself.  At the very least, deltacloudd should always return an error code (and not "no headers, no data").
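
A quick way to check whether deltacloudd is dropping connections outright (versus returning a proper HTTP error) is to probe its entry point with verbose headers; port 3001 and the /api path are the Deltacloud defaults and may need adjusting for a given setup:

[root@hp-dl180g6-01 ~]# curl -sv http://localhost:3001/api >/dev/null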

Comment 1 Justin Clift 2011-07-29 11:06:02 UTC
Ugh, this is *not* good to blame on NFS.

Different NFS implementations have extremely different performance with VMware.

Enterprise NAS gear, such as NetApp and similar, gives _very good_ performance.  In comparison, Linux-based NFS servers don't.  "NFS" itself isn't to blame here.

Comment 2 wes hayutin 2011-08-01 19:12:23 UTC
BZ 723894 - VMware deployments to low-spec NFS datastores error out
Low-spec NFS datastores are not recommended due to poor performance

Comment 3 wes hayutin 2011-08-01 19:48:45 UTC
removing from tracker

Comment 4 wes hayutin 2011-08-01 19:58:02 UTC
release pending...

Comment 5 wes hayutin 2011-08-01 19:59:07 UTC
release pending...

Comment 7 wes hayutin 2011-12-08 13:57:53 UTC
closing out old bugs

Comment 8 wes hayutin 2011-12-08 14:10:44 UTC
perm close