Bug 1171815 - Cannot create Jenkins cartridge
Summary: Cannot create Jenkins cartridge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Timothy Williams
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1126826
Blocks: 1131011
Reported: 2014-12-08 16:32 UTC by Miheer Salunke
Modified: 2019-08-15 04:08 UTC (History)

Fixed In Version: openshift-origin-broker-util-1.36.2.2-1.el6op
Doc Type: Bug Fix
Doc Text:
It is possible that when a Jenkins application fails to create and is rolled back, its domain environment variables still exist. These map to a non-existing gear component, and any new Jenkins applications cannot be created since the environment variables already exist. This bug fix updates the `oo-admin-repair` command to add the ability to clean up domains that have Jenkins environment variables with missing components. An administrator can use the `--orphaned-envs` switch with `oo-admin-repair` to clean environment variables from domains that do not have a related component. An administrator can also use `--domain <domain>` to specify a specific domain to repair.
Clone Of: 1126826
Environment:
Last Closed: 2015-09-30 16:36:21 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2015:1844
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat OpenShift Enterprise 2.2.7 security, bug fix and enhancement update
Last Updated: 2015-09-30 20:35:28 UTC

Comment 5 Luke Meyer 2014-12-12 19:31:34 UTC
This particular bug is about domain env vars being left around in Mongo when a Jenkins creation failed to complete and was rolled back, which blocks the creation of another Jenkins app. That problem should be fixed upstream and in OSE 2.2 (though the last few comments on the upstream bug leave room for some doubt).

(In reply to Miheer Salunke from comment #2)
> I gave the customer 2 workarounds -
> 
> 1.
> Try
> oo-admin-ctl-domain -c env_del -e JENKINS_USER -l <username> -n <domain> ;
> oo-admin-ctl-domain -c env_del -e JENKINS_PASSWORD -l <username> -n <domain>
> 
> 
>        After trying option 1, the results were as follows -
> Still get same error when adding Jenkins:
> 
> Unexpected error: Cartridge attempted to override the following gear
> environment variables: JENKINS_USERNAME, JENKINS_PASSWORD

I might add JENKINS_URL to the list of vars to delete, although the error isn't complaining about it.

So, that should have worked; given that it didn't, I would be interested to see the domain record for the faulty domain after deleting with oo-admin-ctl-domain, similar to  https://bugzilla.redhat.com/show_bug.cgi?id=1126826#c12

> 2.
> Try deleting and recreating the domain using the rhc tools which should
> clear things up.
> 
>        Trying option 2 gives the following result and error:
> 
> Jenkins created successfully.  Please make note of these credentials:
> [...] 
>     Unable to complete the requested operation.
> 
>     Shell command '/sbin/runuser -s /bin/sh 5486e2883431a02b0e000001 -c
> "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c591' /bin/sh
> -c \"/var/lib/openshift/5486e2883431a02b0e000001/jenkins-client/bin/install
> --version 1\""' returned an error. rc=1
>     Could not disable job 'appname-build' in Jenkins server: Continuing
> anyway.
>     Could not add job 'appname-build' in Jenkins server: create_job
> status: 000 You'll need to correct this error before attempting to embed the
> Jenkins client again. application is not scalable

This seems like an unrelated bug. The Jenkins app was actually successfully created given a clean domain. However it didn't handle a client correctly. Could you file that as a separate bug?

> The above customer has following jenkins installed-
> 
> openshift-origin-cartridge-jenkins-1.20.3.6-1.el6op.noarch  Wed Nov 12
> 16:20:10 2014
> openshift-origin-cartridge-jenkins-client-1.19.3.5-1.el6op.noarch Wed Nov 12
> 16:23:16 2014

These indicate the use of OSE 2.1 which does not have the fix for this particular bug. They ought to be able to work around it as you indicated but it will potentially recur whenever a Jenkins app creation fails partway through.

Comment 7 Timothy Williams 2015-02-03 14:27:47 UTC
We are also seeing odd behavior from `oo-admin-ctl-domain`. Unsure if this is related or should be another bug. Specifically, oo-admin-ctl-domain will not remove the Jenkins environment variables:

$ sudo oo-admin-ctl-domain -c env_del -e JENKINS_PASSWORD -n exdomain -l exuser
Environment variable JENKINS_PASSWORD not found withing domain exdomain

$ sudo oo-admin-ctl-domain -c env_add -e JENKINS_PASSWORD -v secret -n exdomain -l exuser
Environment variable with name JENKINS_PASSWORD already exists in the domain. Please remove it first

We can check the database and see that the environment variables are still present. The only way we were able to remove them was manually through the rails console:

$ oo-exec-ruby irb
  > require '/var/www/openshift/broker/config/environment'
  > my_domain = Domain.find_by(:canonical_namespace => "exdomain")
  > my_domain.update_attributes(:env_vars => [])

Removing the env variables successfully works around the issue.
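
Note that `update_attributes(:env_vars => [])` clears every domain environment variable, not only the Jenkins ones. A narrower filter might look like the following minimal sketch. It works on plain hashes with a "key" field; the helper name `without_jenkins_vars` and the `JENKINS_` prefix match are assumptions for illustration and are not part of the broker code.

```ruby
# Hypothetical helper: returns a copy of env_vars without Jenkins entries.
# Entries are modeled as plain hashes with a "key" field (an assumption).
def without_jenkins_vars(env_vars)
  env_vars.reject { |var| var["key"].to_s.start_with?("JENKINS_") }
end

# Illustrative data only; none of these values come from a real domain.
env_vars = [
  { "key" => "JENKINS_URL",      "value" => "https://jenkins.example.com/" },
  { "key" => "JENKINS_PASSWORD", "value" => "secret" },
  { "key" => "MY_SETTING",       "value" => "keep-me" }
]

remaining = without_jenkins_vars(env_vars)
puts remaining.map { |var| var["key"] }.inspect  # prints ["MY_SETTING"]
```

Assuming `my_domain.env_vars` returns an array of such hashes, the filtered result could be passed to `update_attributes(:env_vars => ...)` in place of the empty array, so that unrelated domain variables survive the cleanup.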

Comment 9 Ma xiaoqiang 2015-02-04 06:22:13 UTC
Check on puddle[2.2.4/2015-02-02.1]

1. Create a Jenkins app.
2. Stop the ruby193-mcollective service as soon as possible once env_vars is found to be written in the domain.
3. Clear the pending ops:
#oo-admin-clear-pending-ops 
0 applications were cleaned up. 0 users were cleaned up. 1 domains were cleaned up. 0 teams were cleaned up.

4. Create the Jenkins app again.
The app fails to create: Unexpected error: Cartridge attempted to override the following gear environment variables:
JENKINS_USERNAME, JENKINS_PASSWORD

5. Check the domain data in Mongo:
> db.domains.findOne()
"env_vars" : [
		{
			"key" : "JENKINS_URL",
			"value" : "https://jenkins-xiaom.ose22-manual.com.cn/",
			"component_id" : ObjectId("54d18db8e5fed51569000088"),
			"unique" : false
		},
		{
			"key" : "JENKINS_USERNAME",
			"value" : "system_builder",
			"component_id" : ObjectId("54d18db8e5fed51569000088"),
			"unique" : false
		},
		{
			"key" : "JENKINS_PASSWORD",
			"value" : "p79_rXfWBluw",
			"component_id" : ObjectId("54d18db8e5fed51569000088"),
			"unique" : false
		}
	],

The env_vars are not cleared.

Comment 10 Ma xiaoqiang 2015-02-05 01:59:21 UTC
(In reply to Ma xiaoqiang from comment #9)

I ran 'oo-admin-ctl-domain -c env_del -e JENKINS_USERNAME -n xiaom -l xiaom' and then tried to create the Jenkins app again; creation still fails.

The Jenkins env vars in Mongo are not deleted.

Comment 12 Andy Grimm 2015-04-10 17:48:58 UTC
In order to delete Jenkins env vars from the domain, I found that I had to patch the script to delete by component ID:

--- /usr/sbin/oo-admin-ctl-domain	2015-02-19 14:33:38.000000000 -0500
+++ oo-admin-ctl-domain.bz1171815	2015-04-10 13:47:09.962283575 -0400
@@ -136,7 +136,11 @@
     exit 1
   end
 
-  domain.remove_env_variables([{"key" => env["key"], "value" => env["value"]}])
+  if env['component_id'].nil?
+    domain.remove_env_variables([{"key" => env["key"], "value" => env["value"]}])
+  else
+    domain.remove_env_variables(env['component_id'])
+  end
   domain.reload
   domain.run_jobs
 when "create"

Comment 13 Andy Grimm 2015-04-10 17:51:33 UTC
I should mention that we may want some sort of command-line switch for the patch above, to avoid having admins delete several related variables at once when that was not their intent.
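
To make that concern concrete, here is a minimal sketch that models which entries each branch of the patch would remove. It is a plain-Ruby model, not the broker's `Domain#remove_env_variables`; the helper name `select_for_removal`, the string component IDs, and the sample data are assumptions for illustration.

```ruby
# Model of the patched branch: picks which entries a delete would remove.
# select_for_removal is a hypothetical name; entries are plain hashes.
def select_for_removal(env_vars, key)
  target = env_vars.find { |var| var["key"] == key }
  return [] if target.nil?

  if target["component_id"].nil?
    # Original path: match the single entry by key and value.
    env_vars.select do |var|
      var["key"] == target["key"] && var["value"] == target["value"]
    end
  else
    # Patched path: every entry owned by the same component goes together.
    env_vars.select { |var| var["component_id"] == target["component_id"] }
  end
end

# Illustrative data only; "c1" stands in for a real component ObjectId.
env_vars = [
  { "key" => "JENKINS_URL",      "value" => "https://jenkins.example.com/", "component_id" => "c1" },
  { "key" => "JENKINS_USERNAME", "value" => "system_builder",               "component_id" => "c1" },
  { "key" => "JENKINS_PASSWORD", "value" => "secret",                       "component_id" => "c1" },
  { "key" => "MY_SETTING",       "value" => "keep-me" }
]

removed_keys = select_for_removal(env_vars, "JENKINS_PASSWORD").map { |var| var["key"] }
puts removed_keys.inspect
```

In this model, asking to delete only JENKINS_PASSWORD selects all three Jenkins entries, because they share one component ID. That is the side effect an explicit switch would guard against.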

Comment 14 Michal Fojtik 2015-05-05 08:36:43 UTC
Andy, I made a PR with your patch and added a --remove_all_env option to confirm this action: https://github.com/openshift/origin-server/pull/6134

Comment 19 openshift-github-bot 2015-07-08 23:37:42 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/584a456d0135cbc0d43995b0407e6f1d16531a57
Add fixing orphaned domain environment variables

Bug 1171815
Bugzilla link https://bugzilla.redhat.com/show_bug.cgi?id=1171815
It is possible, in some cases, for a domain environment variable to remain part of the domain after the related component has been removed. oo-admin-repair should be able to detect and clear these orphaned environment variables
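
The detection described here can be sketched roughly as follows. This is a plain-Ruby model under stated assumptions, not the code from the commit: the helper name `partition_orphaned`, the string IDs, and the choice to keep entries that have no `component_id` are all hypothetical.

```ruby
require "set"

# Hypothetical helper: splits env_vars into [kept, orphaned].
# An entry counts as orphaned when it names a component_id that is not
# among the IDs of components that still exist (live_component_ids).
# Entries without a component_id are kept (an assumption of this sketch).
def partition_orphaned(env_vars, live_component_ids)
  live = live_component_ids.to_set
  env_vars.partition do |var|
    var["component_id"].nil? || live.include?(var["component_id"])
  end
end

# Illustrative data only; the IDs stand in for real component ObjectIds.
env_vars = [
  { "key" => "JENKINS_URL",      "value" => "https://jenkins.example.com/", "component_id" => "gone-component" },
  { "key" => "JENKINS_PASSWORD", "value" => "secret",                       "component_id" => "gone-component" },
  { "key" => "DB_HOST",          "value" => "db.example.com",               "component_id" => "live-component" },
  { "key" => "MY_SETTING",       "value" => "keep-me" }
]

kept, orphaned = partition_orphaned(env_vars, ["live-component"])
puts "kept:     #{kept.map { |var| var["key"] }.inspect}"
puts "orphaned: #{orphaned.map { |var| var["key"] }.inspect}"
```

Only the entries whose component no longer exists end up in `orphaned`; a repair step would then drop those and leave the rest of the domain's variables alone.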

Comment 22 Ma xiaoqiang 2015-08-19 06:47:23 UTC
Check on openshift-origin-broker-util-1.35.2.4-1.el6op.noarch

# oo-admin-repair --orphaned-envs --domain xiaom
/usr/sbin/oo-admin-repair: unrecognized option `--orphaned-envs

Comment 25 Ma xiaoqiang 2015-08-20 02:07:12 UTC
Check on openshift-origin-broker-util-1.36.2-1.el6oso.noarch.rpm

After repairing the env vars, QE can create a Jenkins app again.

If dev adds the package to an OSE puddle, QE will move this bug to VERIFIED as soon as possible.

Comment 26 Johnny Liu 2015-08-20 02:28:32 UTC
As stated in comment 25, our workflow requires an OSE puddle before QE can verify this bug fix, so please create one. I will move this bug to "MODIFIED" status.

BTW, the fix package in comment 24 has "oso" in its name, so I guess it targets Origin, not OSE. I think a new "ose" RPM package needs to be built and included in the puddle.

Comment 31 Ma xiaoqiang 2015-09-22 02:52:40 UTC
Check on puddle [2.2.7/2015-09-21.1]

After orphaning env vars in the domain, the Jenkins app is created successfully.

Comment 33 errata-xmlrpc 2015-09-30 16:36:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1844.html

