Bug 1202511

Summary: oo-admin-repair does not fix stale env vars in domains with no apps
Product: OpenShift Container Platform Reporter: Brenton Leanhardt <bleanhar>
Component: Node Assignee: Brenton Leanhardt <bleanhar>
Status: CLOSED ERRATA QA Contact: libra bugs <libra-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.2.0 CC: abhgupta, adellape, agrimm, jhou, jokerman, libra-bugs, libra-onpremise-devel, mmccomas, pruan, qizhao, xiama
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rubygem-openshift-origin-controller-1.35.1.1-1.el6op Doc Type: Bug Fix
Doc Text:
When checking for stale SSH keys and environment variables to repair, previously the oo-admin-repair tool on brokers did not check in user domains where there were no existing applications. This bug fix updates this logic so that domains without existing applications are now also checked, and as a result all stale SSH keys and environment variables are repaired as expected.
Story Points: ---
Clone Of: 1183048 Environment:
Last Closed: 2015-04-06 17:06:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1183048    
Bug Blocks:    

Description Brenton Leanhardt 2015-03-16 18:58:15 UTC
+++ This bug was initially created as a clone of Bug #1183048 +++

Description of problem:

If a domain has stale jenkins environment variables, the suggested workaround is to run "oo-admin-repair --ssh-keys".  I found that oo-admin-repair iterates over existing gears to determine the set of domains to examine, so domains with no apps are omitted.
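
The gap can be illustrated with a minimal Python sketch. It is not the actual oo-admin-repair code (the tool is written in Ruby); the data layout and function names below are hypothetical and only mirror the behavior described above: deriving the candidate domain set from gears skips any domain that has no apps.

```python
# Hypothetical in-memory stand-ins for the broker's Mongo collections.
domains = [
    {"_id": "dom-a", "namespace": "hasapps"},
    {"_id": "dom-b", "namespace": "empty"},  # no applications left
]
applications = [
    {"name": "php", "domain_id": "dom-a", "gears": [{"uuid": "gear-1"}]},
]


def domain_ids_from_gears(apps):
    """Behavior as reported: only domains that own at least one gear are seen."""
    return {app["domain_id"] for app in apps for _gear in app["gears"]}


def domain_ids_from_domains(doms):
    """Behavior the report asks for: every domain is examined."""
    return {dom["_id"] for dom in doms}


# Domains that a gear-driven scan never visits.
missed = domain_ids_from_domains(domains) - domain_ids_from_gears(applications)
print(sorted(missed))  # prints ['dom-b']
```

In this toy data set the "empty" domain is the only one missed, which matches the symptom in the steps below.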

Version-Release number of selected component (if applicable):
openshift-origin-broker-util-1.32.1-1.el6oso.noarch

How reproducible:
uncertain

Steps to Reproduce:
I'm speculating a bit here...

1. Find a domain with stale env vars
2. Ensure the domain contains at least one app (create one if necessary)
3. Run "oo-admin-repair -r --ssh-keys" to confirm that the stale env vars are detected
4. Delete all apps in the domain
5. Re-run "oo-admin-repair -r --ssh-keys" and see that the stale env vars are no longer detected.

Actual results:

stale env vars in an "empty" domain cannot be repaired.

Expected results:

stale env vars should be detected in all domains, regardless of whether apps are present.

Additional info:

I worked around the customer issue related to this by hard-coding the affected domain id into stale_keys_vars_domain_ids in the oo-admin-repair script, and this properly cleaned up the domain.

--- Additional comment from Abhishek Gupta on 2015-02-02 17:45:43 EST ---

Proposed fix: https://github.com/openshift/origin-server/pull/6066

--- Additional comment from Zhao Qiang on 2015-02-11 03:41:52 EST ---

Verified on devenv-5428

Steps:
1, I created some apps. Running "rhc apps" gives the following result:
[root@local ~]# rhc apps
d1 @ http://d1-qizhaotest.dev.rhcloud.com/ (uuid: 54db519abf14976fb5000062)
---------------------------------------------------------------------------
  Domain:     qizhaotest
  Created:    8:56 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://54db519abf14976fb5000062.rhcloud.com/~/git/d1.git/
  SSH:        54db519abf14976fb5000062.rhcloud.com
  Deployment: auto (on git push)

  diy-0.1 (Do-It-Yourself 0.1)
  ----------------------------
    Gears: Located with jenkins-client-1

  jenkins-client-1 (Jenkins Client)
  ---------------------------------
    Gears:   Located with diy-0.1
    Job URL: https://jks-qizhaotest.dev.rhcloud.com/job/d1-build/

jks @ http://jks-qizhaotest.dev.rhcloud.com/ (uuid: 54db5112bf14976fb500002b)
-----------------------------------------------------------------------------
  Domain:     qizhaotest
  Created:    8:54 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://54db5112bf14976fb500002b.rhcloud.com/~/git/jks.git/
  SSH:        54db5112bf14976fb500002b.rhcloud.com
  Deployment: auto (on git push)

  jenkins-1 (Jenkins Server)
  --------------------------
    Gears: 1 small

php @ http://php-qizhaotest.dev.rhcloud.com/ (uuid: 54db5097bf14976fb5000001)
-----------------------------------------------------------------------------
  Domain:     qizhaotest
  Created:    8:52 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://54db5097bf14976fb5000001.rhcloud.com/~/git/php.git/
  SSH:        54db5097bf14976fb5000001.rhcloud.com
  Deployment: auto (on git push)

  php-5.3 (PHP 5.3)
  -----------------
    Gears: 1 small

You have access to 3 applications.

2, SSH to the instance as the root user.
[root@ip-10-180-44-88 ~]# mongo openshift_broker_dev
MongoDB shell version: 2.4.12
connecting to: openshift_broker_dev
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user

libra_rs:PRIMARY> db.applications.findOne({name:"jks"})
{
	"_id" : ObjectId("54db5112bf14976fb500002b"),
	"analytics" : {
		"user_agent" : "rhc/1.34.2 (ruby 2.0.0; x86_64-linux) (API [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]) (2.5.3.3, ruby 2.0.0 (2013-11-22))"
	},
	"builder_id" : null,
	"canonical_name" : "jks",
	"component_instances" : [
		{
			"_id" : ObjectId("54db5112bf14976fb500003d"),
			"cartridge_id" : ObjectId("54da9e66b494fef2dc000020"),
			"cartridge_name" : "jenkins-1",
			"cartridge_vendor" : "redhat",
			"component_name" : "jenkins-1",
			"component_properties" : {
				"username" : "system_builder",
				"password" : "T_9zsdsvjhab"
			},
			"created_at" : ISODate("2015-02-11T12:54:42.876Z"),
			"group_instance_id" : ObjectId("54db5112bf14976fb500002d")
		}
	],
	"config" : {
		"auto_deploy" : true,
		"deployment_branch" : "master",
		"keep_deployments" : 1,
		"deployment_type" : "git"
	},
	"created_at" : ISODate("2015-02-11T12:54:42.877Z"),
	"default_gear_size" : "small",
	"deployments" : [
		{
			"_id" : ObjectId("54db515abf14976fb5000046"),
			"activations" : [
				1423659298.050458
			],
			"artifact_url" : null,
			"created_at" : ISODate("2015-02-11T12:54:45.799Z"),
			"deployment_id" : "85acbac7",
			"force_clean_build" : false,
			"hot_deploy" : false,
			"ref" : "master",
			"sha1" : "a4727e6"
		}
	],
	"domain_id" : ObjectId("54db5080bf149765fc000008"),
	"domain_namespace" : "qizhaotest",
	"gears" : [
		{
			"_id" : ObjectId("54db5112bf14976fb500002b"),
			"app_dns" : true,
			"group_instance_id" : ObjectId("54db5112bf14976fb500002d"),
			"host_singletons" : true,
			"name" : "jks",
			"quarantined" : false,
			"server_identity" : "ip-10-180-44-88",
			"sparse_carts" : [ ],
			"uid" : null,
			"uuid" : "54db5112bf14976fb500002b"
		}
	],
	"group_instances" : [
		{
			"_id" : ObjectId("54db5112bf14976fb500002d"),
			"platform" : "linux",
			"addtl_fs_gb" : 0,
			"gear_size" : "small"
		}
	],
	"group_overrides" : [ ],
	"ha" : false,
	"init_git_url" : null,
	"members" : [
		{
			"t" : null,
			"n" : "qizhao",
			"r" : "admin",
			"f" : [
				[
					"domain",
					"admin"
				]
			],
			"e" : null,
			"_id" : ObjectId("54db506ebf149765fc000001")
		}
	],
	"name" : "jks",
	"owner_id" : ObjectId("54db506ebf149765fc000001"),
	"pending_op_groups" : [ ],
	"scalable" : false,
	"secret_token" : "YSFHTa_wHVmzm0f6BW5RAOVtMyVpbMt6IdYMDBLHS3nXNJtGSSurWJR93qYKyRMiuY75dwOzZRnAhBXwG4HxJx-qya7ZmOAKMup-7RLzXRVY_KVIwGG0MON8W7wT_6rX",
	"updated_at" : ISODate("2015-02-11T12:55:54.998Z"),
	"uuid" : "25ad8748b1ed11e482330242ac110002"
}

libra_rs:PRIMARY> db.applications.remove({name:"jks"})

3, Then I ran "rhc apps" from the client:
[root@local ~]# rhc apps
d1 @ http://d1-qizhaotest.dev.rhcloud.com/ (uuid: 54db519abf14976fb5000062)
---------------------------------------------------------------------------
  Domain:     qizhaotest
  Created:    8:56 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://54db519abf14976fb5000062.rhcloud.com/~/git/d1.git/
  SSH:        54db519abf14976fb5000062.rhcloud.com
  Deployment: auto (on git push)

  diy-0.1 (Do-It-Yourself 0.1)
  ----------------------------
    Gears: Located with jenkins-client-1

  jenkins-client-1 (Jenkins Client)
  ---------------------------------
    Gears:   Located with diy-0.1
    Job URL: https://jks-qizhaotest.dev.rhcloud.com/job/d1-build/

php @ http://php-qizhaotest.dev.rhcloud.com/ (uuid: 54db5097bf14976fb5000001)
-----------------------------------------------------------------------------
  Domain:     qizhaotest
  Created:    8:52 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://54db5097bf14976fb5000001.rhcloud.com/~/git/php.git/
  SSH:        54db5097bf14976fb5000001.rhcloud.com
  Deployment: auto (on git push)

  php-5.3 (PHP 5.3)
  -----------------
    Gears: 1 small

You have access to 2 applications.

4, I returned to the SSH command line of the instance and ran:
[root@ip-10-180-44-88 ~]# oo-broker oo-admin-repair -r --ssh-keys
Started at: 2015-02-11 13:03:43 UTC
Total gears found in mongo: 2
Gear '54db5097bf14976fb5000001' has a stale key 'domain-jks' in mongo with missing component/gear '54db5112bf14976fb500003d'.
Gear '54db5097bf14976fb5000001' has a stale environment variable 'JENKINS_URL' in mongo with missing component/gear '54db5112bf14976fb500003d'.
Gear '54db5097bf14976fb5000001' has a stale environment variable 'JENKINS_USERNAME' in mongo with missing component/gear '54db5112bf14976fb500003d'.
Gear '54db5097bf14976fb5000001' has a stale environment variable 'JENKINS_PASSWORD' in mongo with missing component/gear '54db5112bf14976fb500003d'.

Finished at: 2015-02-11 13:04:04 UTC
Total time: 20.888s
SUCCESS

5, From the rhc client I deleted all the applications under this domain:
[root@local ~]# rhc app delete php
This is a non-reversible action! Your application code and data will be permanently deleted if you continue!

Are you sure you want to delete the application 'php'? (yes|no): yes

Deleting application 'php' ... deleted

[root@local ~]# rhc app delete d1
This is a non-reversible action! Your application code and data will be permanently deleted if you continue!

Are you sure you want to delete the application 'd1'? (yes|no): yes

Deleting application 'd1' ... deleted

The corresponding job 'd1-build' in Jenkins has been disabled.
You can re-enable or delete as desired.
Job URL: https://jks-qizhaotest.dev.rhcloud.com/job/d1-build/

[root@local ~]# rhc apps
No applications. Use 'rhc create-app'.

6, I returned to the SSH command line of the instance, re-ran "oo-admin-repair -r --ssh-keys", and saw these results:
[root@ip-10-180-44-88 ~]# oo-broker oo-admin-repair -r --ssh-keys
Started at: 2015-02-11 13:25:11 UTC
Total gears found in mongo: 0
Domain 'qizhaotest' has a stale key 'domain-jks' in mongo with missing component/gear '54db5112bf14976fb500003d'.
Domain 'qizhaotest' has a stale environment variable 'JENKINS_URL' in mongo with missing component/gear '54db5112bf14976fb500003d'.
Domain 'qizhaotest' has a stale environment variable 'JENKINS_USERNAME' in mongo with missing component/gear '54db5112bf14976fb500003d'.
Domain 'qizhaotest' has a stale environment variable 'JENKINS_PASSWORD' in mongo with missing component/gear '54db5112bf14976fb500003d'.

Finished at: 2015-02-11 13:25:32 UTC
Total time: 20.972s
SUCCESS


Result:
1. Stale env vars are detected in all domains, regardless of whether apps are present.
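
The check behind these messages can be sketched as follows. This is an illustrative Python sketch, not the tool's Ruby implementation: it assumes an env var is stale when its `component_id` matches no component instance of any remaining application, which is what the "missing component/gear" wording suggests. The function name and the trimmed documents are hypothetical.

```python
def stale_env_vars(domain, applications):
    """Return keys of domain env vars whose owning component no longer exists."""
    live = {
        ci["_id"]
        for app in applications
        for ci in app["component_instances"]
    }
    return [ev["key"] for ev in domain["env_vars"] if ev["component_id"] not in live]


# Trimmed data modeled on the verification run: the jenkins component is gone.
domain = {
    "namespace": "qizhaotest",
    "env_vars": [
        {"key": "JENKINS_URL", "component_id": "54db5112bf14976fb500003d"},
        {"key": "JENKINS_USERNAME", "component_id": "54db5112bf14976fb500003d"},
        {"key": "JENKINS_PASSWORD", "component_id": "54db5112bf14976fb500003d"},
    ],
}

# With zero applications left, every variable is still reported as stale.
for key in stale_env_vars(domain, applications=[]):
    print(f"Domain '{domain['namespace']}' has a stale environment variable '{key}'")
```

The point of the sketch is that the scan starts from the domain document, so it yields results even when the application list is empty.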

--- Additional comment from Zhao Qiang on 2015-02-11 04:22:30 EST ---

7, I repaired the stale env vars from the SSH console:
[root@ip-10-180-44-88 ~]# oo-broker oo-admin-repair --ssh-keys
Started at: 2015-02-11 14:14:24 UTC
Total gears found in mongo: 0
......
Finished at: 2015-02-11 14:14:45 UTC
Total time: 20.891s
SUCCESS

8, I retested for the error:
[root@ip-10-180-44-88 ~]# oo-broker oo-admin-repair -r --ssh-keys
Started at: 2015-02-11 14:15:16 UTC
Total gears found in mongo: 0

Finished at: 2015-02-11 14:15:37 UTC
Total time: 20.864s
SUCCESS


Result:
1. The stale env vars error can be repaired.

Comment 3 Ma xiaoqiang 2015-03-17 09:42:19 UTC
Checked on puddle [2.2.z/2015-03-16.2]

1. Create a jenkins app
#rhc app create jenkins jenkins
2. Delete the jenkins app from mongo
>db.applications.remove({"canonical_name":"jenkins"})
> db.domains.findOne({"namespace":"123"})
{
        "_allowed_domains" : null,
        "_id" : ObjectId("5507cc46e5fed5adf90000b0"),
        "_type" : "Domain",
        "allowed_gear_sizes" : [
                "small",
                "medium"
        ],
        "canonical_namespace" : "123",
        "created_at" : ISODate("2015-03-17T06:40:06.836Z"),
        "env_vars" : [
                {
                        "key" : "JENKINS_URL",
                        "value" : "https://jenkins-123.ose22-manual.com.cn/",
                        "component_id" : ObjectId("5507f437e5fed55c720001d8"),
                        "unique" : false
                },
                {
                        "key" : "JENKINS_USERNAME",
                        "value" : "system_builder",
                        "component_id" : ObjectId("5507f437e5fed55c720001d8"),
                        "unique" : false
                },
                {
                        "key" : "JENKINS_PASSWORD",
                        "value" : "C_YErE1f2nAm",
                        "component_id" : ObjectId("5507f437e5fed55c720001d8"),
                        "unique" : false
                }
Some stale env vars exist in the domain.
3. Fix the issue
# oo-admin-repair --ssh-keys 
Started at: 2015-03-17 09:34:32 UTC
Total gears found in mongo: 6
Gear '123-php3-1' has a stale key 'domain-jenkins' in mongo with missing component/gear '5507f437e5fed55c720001d8'.
Gear '123-php3-1' has a stale environment variable 'JENKINS_URL' in mongo with missing component/gear '5507f437e5fed55c720001d8'.
Gear '123-php3-1' has a stale environment variable 'JENKINS_USERNAME' in mongo with missing component/gear '5507f437e5fed55c720001d8'.
Gear '123-php3-1' has a stale environment variable 'JENKINS_PASSWORD' in mongo with missing component/gear '5507f437e5fed55c720001d8'.

Finished at: 2015-03-17 09:34:53 UTC
Total time: 21.336s
SUCCESS
4. Check the env vars in the domain
> db.domains.findOne({"namespace":"123"})
{
        "_allowed_domains" : null,
        "_id" : ObjectId("5507cc46e5fed5adf90000b0"),
        "_type" : "Domain",
        "allowed_gear_sizes" : [
                "small",
                "medium"
        ],
        "canonical_namespace" : "123",
        "created_at" : ISODate("2015-03-17T06:40:06.836Z"),
        "env_vars" : [ ],
        "members" : [
                {
                        "_type" : "Member",
                        "_id" : ObjectId("55078fa3e5fed55c7200002b"),
                        "t" : null,
                        "n" : "gpei",
                        "r" : "admin",
                        "f" : [
                                [
                                        "owner",
                                        "admin"
                                ]
                        ],
                        "e" : null
                }
        ],
        "namespace" : "123",
        "owner_id" : ObjectId("55078fa3e5fed55c7200002b"),
        "pending_ops" : [ ],
        "system_ssh_keys" : [ ],
        "updated_at" : ISODate("2015-03-17T06:40:06.836Z")
}

The stale env vars and SSH key in the domain are deleted.
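
The before/after state of the domain document can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than the tool's actual logic: `prune_stale` is a hypothetical helper, and it assumes that entries in `system_ssh_keys` carry a `component_id` field like the `env_vars` entries shown above.

```python
import copy


def prune_stale(domain, live_component_ids):
    """Return a copy of the domain with entries for missing components dropped."""

    def is_live(entry):
        # Entries without a component_id are left alone; only entries that
        # point at a component which no longer exists are treated as stale.
        cid = entry.get("component_id")
        return cid is None or cid in live_component_ids

    repaired = copy.deepcopy(domain)
    for field in ("env_vars", "system_ssh_keys"):
        repaired[field] = [e for e in repaired.get(field, []) if is_live(e)]
    return repaired


stale_id = "5507f437e5fed55c720001d8"  # component of the deleted jenkins app
before = {
    "namespace": "123",
    "env_vars": [
        {"key": "JENKINS_URL", "component_id": stale_id},
        {"key": "JENKINS_USERNAME", "component_id": stale_id},
        {"key": "JENKINS_PASSWORD", "component_id": stale_id},
    ],
    # Assumed shape; the report does not show the key entry before the repair.
    "system_ssh_keys": [{"name": "domain-jenkins", "component_id": stale_id}],
}

after = prune_stale(before, live_component_ids=set())
print(after["env_vars"], after["system_ssh_keys"])  # prints [] []
```

Working on a copy keeps the original document available for comparison, which mirrors the two `findOne` outputs in this comment.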

Comment 5 errata-xmlrpc 2015-04-06 17:06:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0779.html