Bug 723894 - Documentation: vmware deployments to NFS datastore can error out
Summary: Documentation: vmware deployments to NFS datastore can error out
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: Documentation
Version: 0.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Justin Clift
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-21 13:37 UTC by wes hayutin
Modified: 2015-07-13 04:35 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 723351
Environment:
Last Closed:


Attachments

Description wes hayutin 2011-07-21 13:37:21 UTC
+++ This bug was initially created as a clone of Bug #723351 +++

Description of problem:

When you bring up a VMware-based deployment with more than two instances, it seems to take quite some time, and Condor places the jobs in the "Hold" state.

Going into Condor and releasing the jobs resolves the issue; however, I'm wondering whether there is a default timeout in Condor that can be adjusted for VMware.
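For reference, a minimal sketch of that manual workaround using standard Condor command-line tools (the job IDs below are the held vmware jobs from the condor_q output further down; substitute your own):

# list only the held jobs, with a short hold reason
condor_q -hold

# show the full hold reason for one job, e.g. 22.0
condor_q 22.0 -l | grep -i HoldReason

# release the held jobs so Condor schedules them again
condor_release 22.0 23.0 24.0 25.0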


Recreate:
1. Set up Conductor for VMware.
2. Create a deployment with four or more instances.
3. Start the deployable.
4. VMware will take 10-15 minutes to start.



[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:41:39 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:41:39 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:42:10 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:42:40 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:36:06 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:29:43 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:26:40 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:26:39 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:13:03 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:13:18 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:00 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:13:03 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:12:18 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:02 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:11:48 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:11:48 R  0   0.0  job_userquota04_mi

23 jobs; 0 idle, 19 running, 4 held
[root@hp-dl180g6-01 ~]# condor_release 22.0 23.0 24.0 25.0
Job 22.0 released
Job 23.0 released
Job 24.0 released
Job 25.0 released
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:05 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:05 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:42:36 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:06 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:36:32 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:09 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:06 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:05 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:13:29 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:13:44 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:26 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:13:29 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:12:44 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:28 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:14 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:14 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:21 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:21 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:42:52 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:22 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:36:48 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:25 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:22 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:21 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:13:45 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:42 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:13:45 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:13:00 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:44 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:30 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:30 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:36 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:36 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:43:07 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:37 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:37:03 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:40 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:37 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:36 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:14:15 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:13:57 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:13:15 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:12:59 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:45 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:45 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held
[root@hp-dl180g6-01 ~]# condor_q


-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  17.0   aeolus          7/19 16:05   0+00:42:42 R  0   0.0  job_1_frontend_21 
  18.0   aeolus          7/19 16:05   0+00:42:42 R  0   0.0  job_1_backend_22  
  19.0   aeolus          7/19 16:05   0+00:43:13 R  0   0.0  job_1_middle01_23 
  20.0   aeolus          7/19 16:05   0+00:43:43 R  0   0.0  job_1_middle02_24 
  21.0   aeolus          7/19 16:11   0+00:37:09 R  0   0.0  job_2_frontend_25 
  22.0   aeolus          7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_fronte
  23.0   aeolus          7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_backen
  24.0   aeolus          7/19 16:13   0+00:00:04 R  0   0.0  job_vmware1_middle
  25.0   aeolus          7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_middle
  26.0   aeolus          7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_fr
  27.0   aeolus          7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_ba
  28.0   aeolus          7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_mi
  29.0   aeolus          7/19 16:17   0+00:30:46 R  0   0.0  job_userquota01_mi
  30.0   aeolus          7/19 16:21   0+00:27:43 R  0   0.0  job_userquota02_fr
  31.0   aeolus          7/19 16:21   0+00:27:42 R  0   0.0  job_userquota02_ba
  32.0   aeolus          7/19 16:34   0+00:14:06 R  0   0.0  job_userquota03_fr
  33.0   aeolus          7/19 16:34   0+00:14:21 R  0   0.0  job_userquota03_ba
  34.0   aeolus          7/19 16:34   0+00:14:03 R  0   0.0  job_userquota03_mi
  35.0   aeolus          7/19 16:34   0+00:14:06 R  0   0.0  job_userquota03_mi
  36.0   aeolus          7/19 16:36   0+00:13:21 R  0   0.0  job_userquota04_fr
  37.0   aeolus          7/19 16:36   0+00:13:05 R  0   0.0  job_userquota04_ba
  38.0   aeolus          7/19 16:36   0+00:12:51 R  0   0.0  job_userquota04_mi
  39.0   aeolus          7/19 16:36   0+00:12:51 R  0   0.0  job_userquota04_mi

23 jobs; 0 idle, 23 running, 0 held

--- Additional comment from matt on 2011-07-20 15:50:31 EDT ---

FYI - condor_q 22.0 -l | grep LastHoldReason -> "Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)"

The jobs were running.

Should investigate why the transfer failed. Possibly a timing issue?

--- Additional comment from clalance on 2011-07-20 16:11:38 EDT ---

Yeah, those sorts of errors are usually some sort of timeout, or a bug in Deltacloud itself.  At the very least, deltacloudd should always return an error code (and not no headers, no data).
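As a possible interim mitigation (a sketch only, not the documented fix, and it assumes the submit description Condor is given for these jobs can be customized), Condor's periodic_release expression can release held jobs automatically after a delay instead of requiring a manual condor_release:

# hypothetical addition to the job's submit description:
# auto-release a job that has been held for more than 10 minutes,
# but stop retrying once it has been held 5 times
periodic_release = (NumSystemHolds < 5) && ((CurrentTime - EnteredCurrentStatus) > 600)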

Comment 1 Justin Clift 2011-07-29 11:06:02 UTC
Ugh, this is *not* good to blame on NFS.

Different NFS implementations have extremely different performance with VMware.

Using enterprise NAS gear, such as NetApp and similar, gives _very good_ performance.  In comparison, Linux-based NFS servers don't.  "NFS" itself isn't to blame here.

Comment 2 wes hayutin 2011-08-01 19:12:23 UTC
BZ 723894 - VMware deployments to low-spec NFS datastores error out
Low-spec NFS datastores are not recommended due to poor performance.

Comment 3 wes hayutin 2011-08-01 19:48:45 UTC
removing from tracker

Comment 4 wes hayutin 2011-08-01 19:58:02 UTC
release pending...

Comment 5 wes hayutin 2011-08-01 19:59:07 UTC
release pending...

Comment 7 wes hayutin 2011-12-08 13:57:53 UTC
closing out old bugs

Comment 8 wes hayutin 2011-12-08 14:10:44 UTC
perm close

