+++ This bug was initially created as a clone of Bug #723351 +++

Description of problem:
It seems that when you bring up a VMware-based deployment with more than two instances, it takes quite some time and condor places the jobs in "hold". Going into condor and releasing the jobs resolves the issue; however, I'm wondering if there is some default timeout for condor that can be adjusted for VMware.

Recreate:
1. Set up Conductor for VMware.
2. Create a deployment with four or more instances.
3. Start the deployable.
4. VMware will take 10-15 minutes to start.

[root@hp-dl180g6-01 ~]# condor_q

-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID     OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
17.0   aeolus   7/19 16:05   0+00:41:39 R  0   0.0  job_1_frontend_21
18.0   aeolus   7/19 16:05   0+00:41:39 R  0   0.0  job_1_backend_22
19.0   aeolus   7/19 16:05   0+00:42:10 R  0   0.0  job_1_middle01_23
20.0   aeolus   7/19 16:05   0+00:42:40 R  0   0.0  job_1_middle02_24
21.0   aeolus   7/19 16:11   0+00:36:06 R  0   0.0  job_2_frontend_25
22.0   aeolus   7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_fronte
23.0   aeolus   7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_backen
24.0   aeolus   7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_middle
25.0   aeolus   7/19 16:13   0+00:00:00 H  0   0.0  job_vmware1_middle
26.0   aeolus   7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_fr
27.0   aeolus   7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_ba
28.0   aeolus   7/19 16:17   0+00:30:13 R  0   0.0  job_userquota01_mi
29.0   aeolus   7/19 16:17   0+00:29:43 R  0   0.0  job_userquota01_mi
30.0   aeolus   7/19 16:21   0+00:26:40 R  0   0.0  job_userquota02_fr
31.0   aeolus   7/19 16:21   0+00:26:39 R  0   0.0  job_userquota02_ba
32.0   aeolus   7/19 16:34   0+00:13:03 R  0   0.0  job_userquota03_fr
33.0   aeolus   7/19 16:34   0+00:13:18 R  0   0.0  job_userquota03_ba
34.0   aeolus   7/19 16:34   0+00:13:00 R  0   0.0  job_userquota03_mi
35.0   aeolus   7/19 16:34   0+00:13:03 R  0   0.0  job_userquota03_mi
36.0   aeolus   7/19 16:36   0+00:12:18 R  0   0.0  job_userquota04_fr
37.0   aeolus   7/19 16:36   0+00:12:02 R  0   0.0  job_userquota04_ba
38.0   aeolus   7/19 16:36   0+00:11:48 R  0   0.0  job_userquota04_mi
39.0   aeolus   7/19 16:36   0+00:11:48 R  0   0.0  job_userquota04_mi

23 jobs; 0 idle, 19 running, 4 held

[root@hp-dl180g6-01 ~]# condor_release 22.0 23.0 24.0 25.0
Job 22.0 released
Job 23.0 released
Job 24.0 released
Job 25.0 released

[root@hp-dl180g6-01 ~]# condor_q

-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID     OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
17.0   aeolus   7/19 16:05   0+00:42:05 R  0   0.0  job_1_frontend_21
18.0   aeolus   7/19 16:05   0+00:42:05 R  0   0.0  job_1_backend_22
19.0   aeolus   7/19 16:05   0+00:42:36 R  0   0.0  job_1_middle01_23
20.0   aeolus   7/19 16:05   0+00:43:06 R  0   0.0  job_1_middle02_24
21.0   aeolus   7/19 16:11   0+00:36:32 R  0   0.0  job_2_frontend_25
22.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
23.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
24.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
25.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
26.0   aeolus   7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_fr
27.0   aeolus   7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_ba
28.0   aeolus   7/19 16:17   0+00:30:39 R  0   0.0  job_userquota01_mi
29.0   aeolus   7/19 16:17   0+00:30:09 R  0   0.0  job_userquota01_mi
30.0   aeolus   7/19 16:21   0+00:27:06 R  0   0.0  job_userquota02_fr
31.0   aeolus   7/19 16:21   0+00:27:05 R  0   0.0  job_userquota02_ba
32.0   aeolus   7/19 16:34   0+00:13:29 R  0   0.0  job_userquota03_fr
33.0   aeolus   7/19 16:34   0+00:13:44 R  0   0.0  job_userquota03_ba
34.0   aeolus   7/19 16:34   0+00:13:26 R  0   0.0  job_userquota03_mi
35.0   aeolus   7/19 16:34   0+00:13:29 R  0   0.0  job_userquota03_mi
36.0   aeolus   7/19 16:36   0+00:12:44 R  0   0.0  job_userquota04_fr
37.0   aeolus   7/19 16:36   0+00:12:28 R  0   0.0  job_userquota04_ba
38.0   aeolus   7/19 16:36   0+00:12:14 R  0   0.0  job_userquota04_mi
39.0   aeolus   7/19 16:36   0+00:12:14 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held

[root@hp-dl180g6-01 ~]# condor_q

-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID     OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
17.0   aeolus   7/19 16:05   0+00:42:21 R  0   0.0  job_1_frontend_21
18.0   aeolus   7/19 16:05   0+00:42:21 R  0   0.0  job_1_backend_22
19.0   aeolus   7/19 16:05   0+00:42:52 R  0   0.0  job_1_middle01_23
20.0   aeolus   7/19 16:05   0+00:43:22 R  0   0.0  job_1_middle02_24
21.0   aeolus   7/19 16:11   0+00:36:48 R  0   0.0  job_2_frontend_25
22.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
23.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
24.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
25.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
26.0   aeolus   7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_fr
27.0   aeolus   7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_ba
28.0   aeolus   7/19 16:17   0+00:30:55 R  0   0.0  job_userquota01_mi
29.0   aeolus   7/19 16:17   0+00:30:25 R  0   0.0  job_userquota01_mi
30.0   aeolus   7/19 16:21   0+00:27:22 R  0   0.0  job_userquota02_fr
31.0   aeolus   7/19 16:21   0+00:27:21 R  0   0.0  job_userquota02_ba
32.0   aeolus   7/19 16:34   0+00:13:45 R  0   0.0  job_userquota03_fr
33.0   aeolus   7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_ba
34.0   aeolus   7/19 16:34   0+00:13:42 R  0   0.0  job_userquota03_mi
35.0   aeolus   7/19 16:34   0+00:13:45 R  0   0.0  job_userquota03_mi
36.0   aeolus   7/19 16:36   0+00:13:00 R  0   0.0  job_userquota04_fr
37.0   aeolus   7/19 16:36   0+00:12:44 R  0   0.0  job_userquota04_ba
38.0   aeolus   7/19 16:36   0+00:12:30 R  0   0.0  job_userquota04_mi
39.0   aeolus   7/19 16:36   0+00:12:30 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held

[root@hp-dl180g6-01 ~]# condor_q

-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID     OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
17.0   aeolus   7/19 16:05   0+00:42:36 R  0   0.0  job_1_frontend_21
18.0   aeolus   7/19 16:05   0+00:42:36 R  0   0.0  job_1_backend_22
19.0   aeolus   7/19 16:05   0+00:43:07 R  0   0.0  job_1_middle01_23
20.0   aeolus   7/19 16:05   0+00:43:37 R  0   0.0  job_1_middle02_24
21.0   aeolus   7/19 16:11   0+00:37:03 R  0   0.0  job_2_frontend_25
22.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_fronte
23.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_backen
24.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
25.0   aeolus   7/19 16:13   0+00:00:00 I  0   0.0  job_vmware1_middle
26.0   aeolus   7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_fr
27.0   aeolus   7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_ba
28.0   aeolus   7/19 16:17   0+00:31:10 R  0   0.0  job_userquota01_mi
29.0   aeolus   7/19 16:17   0+00:30:40 R  0   0.0  job_userquota01_mi
30.0   aeolus   7/19 16:21   0+00:27:37 R  0   0.0  job_userquota02_fr
31.0   aeolus   7/19 16:21   0+00:27:36 R  0   0.0  job_userquota02_ba
32.0   aeolus   7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_fr
33.0   aeolus   7/19 16:34   0+00:14:15 R  0   0.0  job_userquota03_ba
34.0   aeolus   7/19 16:34   0+00:13:57 R  0   0.0  job_userquota03_mi
35.0   aeolus   7/19 16:34   0+00:14:00 R  0   0.0  job_userquota03_mi
36.0   aeolus   7/19 16:36   0+00:13:15 R  0   0.0  job_userquota04_fr
37.0   aeolus   7/19 16:36   0+00:12:59 R  0   0.0  job_userquota04_ba
38.0   aeolus   7/19 16:36   0+00:12:45 R  0   0.0  job_userquota04_mi
39.0   aeolus   7/19 16:36   0+00:12:45 R  0   0.0  job_userquota04_mi

23 jobs; 4 idle, 19 running, 0 held

[root@hp-dl180g6-01 ~]# condor_q

-- Submitter: hp-dl180g6-01.rhts.eng.bos.redhat.com : <10.16.65.63:41877> : hp-dl180g6-01.rhts.eng.bos.redhat.com
 ID     OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
17.0   aeolus   7/19 16:05   0+00:42:42 R  0   0.0  job_1_frontend_21
18.0   aeolus   7/19 16:05   0+00:42:42 R  0   0.0  job_1_backend_22
19.0   aeolus   7/19 16:05   0+00:43:13 R  0   0.0  job_1_middle01_23
20.0   aeolus   7/19 16:05   0+00:43:43 R  0   0.0  job_1_middle02_24
21.0   aeolus   7/19 16:11   0+00:37:09 R  0   0.0  job_2_frontend_25
22.0   aeolus   7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_fronte
23.0   aeolus   7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_backen
24.0   aeolus   7/19 16:13   0+00:00:04 R  0   0.0  job_vmware1_middle
25.0   aeolus   7/19 16:13   0+00:00:05 R  0   0.0  job_vmware1_middle
26.0   aeolus   7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_fr
27.0   aeolus   7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_ba
28.0   aeolus   7/19 16:17   0+00:31:16 R  0   0.0  job_userquota01_mi
29.0   aeolus   7/19 16:17   0+00:30:46 R  0   0.0  job_userquota01_mi
30.0   aeolus   7/19 16:21   0+00:27:43 R  0   0.0  job_userquota02_fr
31.0   aeolus   7/19 16:21   0+00:27:42 R  0   0.0  job_userquota02_ba
32.0   aeolus   7/19 16:34   0+00:14:06 R  0   0.0  job_userquota03_fr
33.0   aeolus   7/19 16:34   0+00:14:21 R  0   0.0  job_userquota03_ba
34.0   aeolus   7/19 16:34   0+00:14:03 R  0   0.0  job_userquota03_mi
35.0   aeolus   7/19 16:34   0+00:14:06 R  0   0.0  job_userquota03_mi
36.0   aeolus   7/19 16:36   0+00:13:21 R  0   0.0  job_userquota04_fr
37.0   aeolus   7/19 16:36   0+00:13:05 R  0   0.0  job_userquota04_ba
38.0   aeolus   7/19 16:36   0+00:12:51 R  0   0.0  job_userquota04_mi
39.0   aeolus   7/19 16:36   0+00:12:51 R  0   0.0  job_userquota04_mi

23 jobs; 0 idle, 23 running, 0 held

--- Additional comment from matt on 2011-07-20 15:50:31 EDT ---

FYI - condor_q 22.0 -l | grep LastHoldReason ->
"Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)"

The jobs were running. Should investigate why the transfer failed. Possibly a timing issue?

--- Additional comment from clalance on 2011-07-20 16:11:38 EDT ---

Yeah, those sorts of errors are usually some sort of timeout, or a bug in deltacloud itself. At the very least, deltacloudd should always be returning an error code (and not no headers, no data).
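A note on the manual workaround above: rather than releasing each cluster ID by hand, held jobs can be inspected and released in bulk with standard condor tools. A minimal sketch (the constraint expression is illustrative, not taken from this report):

  # list held jobs together with the hold reason condor recorded for each
  condor_q -hold

  # release everything currently held (JobStatus == 5 means "Held") in one shot
  condor_release -constraint 'JobStatus == 5'

On the "default timeout" question in the description: one knob that exists for this is SYSTEM_PERIODIC_RELEASE in the condor configuration. A sketch, assuming a five-minute grace period and at most three automatic releases (both numbers are arbitrary choices, not validated against this setup):

  # Assumption: auto-release jobs the system put on hold after 5 minutes,
  # at most 3 times, instead of leaving them held for a manual condor_release.
  SYSTEM_PERIODIC_RELEASE = (JobStatus == 5) && \
      ((time() - EnteredCurrentStatus) > 300) && (NumSystemHolds < 3)

Neither of these addresses the underlying Create_Instance_Failure; they only avoid the manual condor_release step while the root cause is investigated.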
Ugh, this is *not* good to blame on NFS. Different NFS implementations have extremely different performance with VMware: enterprise NAS gear, such as NetApp and similar, delivers _very good_ performance, while Linux-based NFS servers generally don't. "NFS" itself isn't to blame here.
BZ 723894 - VMware deployments to low spec NFS datastores error out

Low-spec NFS datastores are not recommended due to poor performance.
Removing from tracker.
Release pending...
Closing out old bugs.
Permanent close.