Bug 1120966

Summary: oo-admin-* utils not installed on broker
Product: OKD
Component: Installer
Version: 2.x
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Target Milestone: ---
Target Release: ---
Reporter: Nicholas Schuetz <nick>
Assignee: N. Harrison Ripps <hripps>
CC: mmccomas, nick
Type: Bug
Doc Type: Bug Fix
Last Closed: 2014-07-18 18:40:05 UTC

Attachments:
oo-install-cfg.yml
openshift-deploy.log

Description Nicholas Schuetz 2014-07-18 04:25:43 UTC
Description of problem:

oo-install does not install the oo-admin-* utilities on the broker.

At the end, I get this:

Now performing post-installation tasks.
Failed to create district 'Default'.
You will need to run the following manually on a Broker to create the district:

	oo-admin-ctl-district -c create -n Default -p small

Then you will need to run the add-node command for each associated node:

	oo-admin-ctl-district -c add-node -n Default -i <node_hostname>

Attempting to register available cartridge types with Broker(s).
Could not register cartridge types with Broker(s).
Log into any Broker and attempt to register the carts with this command:

	oo-admin-ctl-cartridge -c import-node --activate

However, none of the oo-admin-* utils are available on the broker.

Origin 4.0
RHEL 6.5

Comment 1 Nicholas Schuetz 2014-07-18 07:13:13 UTC
Looking at it further, the openshift-origin repo isn't even enabled on the broker the way it is on the node. I copied the openshift-origin.repo file to the broker and then installed the openshift-origin-broker-util package via yum. This gave me the oo-admin tools on the broker so I could manually create my districts.
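
For reference, the manual workaround was roughly the following (a sketch from memory; it assumes the repo file comes from the node and lands in the standard /etc/yum.repos.d/ directory), followed by the oo-admin-ctl-district commands from the installer output above:

	scp node01.nicknach.net:/etc/yum.repos.d/openshift-origin.repo /etc/yum.repos.d/
	yum install -y openshift-origin-broker-util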

Comment 2 N. Harrison Ripps 2014-07-18 13:06:14 UTC
Can you please provide:

1. The ~/.openshift/oo-install-cfg.yml file from the host where you ran oo-install

2. The /tmp/openshift-deploy.log file from your broker host?

It sounds like there was a problem during the puppet deployment; this will help me confirm it.

Comment 3 Nicholas Schuetz 2014-07-18 13:18:52 UTC
Created attachment 919071 [details]
oo-install-cfg.yml

Comment 4 Nicholas Schuetz 2014-07-18 13:19:23 UTC
Created attachment 919072 [details]
openshift-deploy.log

Comment 5 N. Harrison Ripps 2014-07-18 13:52:50 UTC
Okay, interesting. The output log only contains one line:

"Could not run: Could not find file /tmp/oo_install_configure_broker.nicknach.net.pp"

So the puppet config script was never copied to the broker.

From the oo-install run, do you still have any of the STDOUT content? I would expect to see some sort of error related to the scp attempt.

Comment 6 Nicholas Schuetz 2014-07-18 14:29:37 UTC
Preflight check: verifying system and resource availability.

Checking broker.nicknach.net:
* SSH connection succeeded
* Target host is running Red Hat Enterprise Linux
* Located getenforce
* SELinux is running in enforcing mode
* Located yum
* Could not find optional channel through RHN.
* Found enabled optional repo through RHSM.
* puppet RPM is installed.
* openssh-clients RPM is installed.
* The 'bind' package is not installed on this host.

The 'bind' RPM is required, but not installed on broker.nicknach.net.
Do you want me to try to install it for you? (y/n/q) y

Checking availability of 'bind' RPM... available.
Attempting to install... success!

Checking node01.nicknach.net:
* SSH connection succeeded
* Target host is running Red Hat Enterprise Linux
* Located getenforce
* SELinux is running in enforcing mode
* Located yum
* Could not find optional channel through RHN.
* Found enabled optional repo through RHSM.
* puppet RPM is installed.
* openssh-clients RPM is installed.

Deploying workflow 'origin_deploy'.

Preparing to install OpenShift Origin on the following hosts:
  * broker.nicknach.net (Broker, DBServer, MsgServer, NameServer)
  * node01.nicknach.net (Node)

Generating template for 'broker.nicknach.net'
* Checking for apps.nicknach.net DNS key... not found; attempting to generate.
* Key generation successful.
* Checking for nicknach.net DNS key... not found; attempting to generate.
* Key generation successful.
* BIND DNS enabled.
* Created template /tmp/oo_install_configure_broker.nicknach.net.pp
* Copying Puppet script to host... success. Removing local copy.

Generating template for 'node01.nicknach.net'
* Created template /tmp/oo_install_configure_node01.nicknach.net.pp
* Copying Puppet script to host... success. Removing local copy.

node01.nicknach.net: Running Puppet deployment for host
broker.nicknach.net: Running Puppet deployment for host
node01.nicknach.net: Puppet module removal failed. This is expected if the module was not installed.
node01.nicknach.net: Attempting Puppet module installation (try #1)
broker.nicknach.net: Puppet module removal failed. This is expected if the module was not installed.
broker.nicknach.net: Attempting Puppet module installation (try #1)
node01.nicknach.net: Puppet module installation succeeded.
node01.nicknach.net: Cleaning yum repos.
node01.nicknach.net: Running the Puppet deployment. This step may take up to an hour.
broker.nicknach.net: Puppet module installation succeeded.
broker.nicknach.net: Cleaning yum repos.
broker.nicknach.net: Running the Puppet deployment. This step may take up to an hour.
broker.nicknach.net: Puppet deployment completed.
broker.nicknach.net: Cleaning up temporary files.
broker.nicknach.net: Clean up of /tmp/#hostfile} failed; please remove this file manually.
node01.nicknach.net: Puppet deployment completed.
node01.nicknach.net: Cleaning up temporary files.

Host deployments completed succesfully.

Restarting services in dependency order.
broker.nicknach.net: service named restart succeeded.
broker.nicknach.net: service mongod restart failed: 
node01.nicknach.net: service ruby193-mcollective stop succeeded.
broker.nicknach.net: service activemq restart failed: 
node01.nicknach.net: service ruby193-mcollective start succeeded.
broker.nicknach.net: service openshift-broker restart failed: 
broker.nicknach.net: service openshift-console restart failed: 

Now performing post-installation tasks.
Failed to create district 'Default'.
You will need to run the following manually on a Broker to create the district:

	oo-admin-ctl-district -c create -n Default -p small

Then you will need to run the add-node command for each associated node:

	oo-admin-ctl-district -c add-node -n Default -i <node_hostname>

Attempting to register available cartridge types with Broker(s).
Could not register cartridge types with Broker(s).
Log into any Broker and attempt to register the carts with this command:

	oo-admin-ctl-cartridge -c import-node --activate



The following user / password combinations were created during the configuration:
Web console:   demo / bUhawVfOgYs1JwqEchbyg
MCollective:   mcollective / VTBOmhjAqYKOjdlczrmg
MongoDB Admin: admin / iB0ByQ9vQCYFOVMiTAEMXw
MongoDB User:  openshift / hgONaAeAOCQFF5ZAzcOg


Be sure to record these somewhere for future use.

Deployment successful. Exiting installer.

All tasks completed.
oo-install exited; removing temporary assets.

Comment 7 N. Harrison Ripps 2014-07-18 15:08:12 UTC
The two interesting lines from above come from the section that begins with:

Generating template for 'broker.nicknach.net'

Specifically:

* Created template /tmp/oo_install_configure_broker.nicknach.net.pp
* Copying Puppet script to host... success. Removing local copy.

The scp command that is run looks like this:

scp -q /tmp/<puppet_file_name> <target_host_user>@<target_host_name>:/tmp/<puppet_file_name>

And if this command returns exit code 0, we assume that the copy was successful.

Two thoughts: either /tmp/ got cleared out between the file copy and the puppet run, or the copy was not successful but returned a 0 exit code anyway.

If you run oo-install with a '-d' flag, you will see a ton of debug output, mostly related to SSH channel info.

If you run it and then search for "Copying Puppet script to host..." in the output, you will see the scp command being run with the '-v' flag for verbose logging. If there is some sort of error, this is the place where you would see it.
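
For example, something like the following should capture the relevant section (a rough sketch; the log path is arbitrary and the grep context size is just a guess):

	oo-install -d 2>&1 | tee /tmp/oo-install-debug.log
	grep -C 60 'Copying Puppet script to host' /tmp/oo-install-debug.log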

Comment 8 Nicholas Schuetz 2014-07-18 15:24:01 UTC
* BIND DNS enabled.
* Created template /tmp/oo_install_configure_broker.nicknach.net.pp
Executing: program /usr/bin/ssh host broker.nicknach.net, user root, command scp -v -t /tmp/oo_install_configure_broker.nicknach.net.pp
OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Executing proxy command: exec /usr/bin/sss_ssh_knownhostsproxy -p 22 broker.nicknach.net
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/identity-cert type -1
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: permanently_drop_suid: 0
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'broker.nicknach.net' is known and matches the RSA host key.
debug1: Found key in /var/lib/sss/pubconf/known_hosts:1
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found

debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found

debug1: Unspecified GSS failure.  Minor code may provide more information


debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found

debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/identity
debug1: Trying private key: /root/.ssh/id_rsa
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env XMODIFIERS = @im=ibus
debug1: Sending env LANG = en_US.UTF-8
debug1: Sending command: scp -v -t /tmp/oo_install_configure_broker.nicknach.net.pp
Sending file modes: C0644 1786 oo_install_configure_broker.nicknach.net.pp
Sink: C0644 1786 oo_install_configure_broker.nicknach.net.pp
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 1
debug1: fd 0 clearing O_NONBLOCK
debug1: fd 1 clearing O_NONBLOCK
Transferred: sent 4104, received 2176 bytes, in 0.1 seconds
Bytes per second: sent 40717.2, received 21588.8
debug1: Exit status 0
* Copying Puppet script to host... success. Removing local copy.

Generating template for 'node01.nicknach.net'

Comment 9 Nicholas Schuetz 2014-07-18 15:30:03 UTC
No failed events in the audit log either...

# cat /var/log/audit/audit.log |grep failed
[root@broker ~]#

Comment 10 N. Harrison Ripps 2014-07-18 15:32:13 UTC
Okay, that seems legit. The /tmp/oo_install_configure_broker.nicknach.net.pp file should be present on your broker from the time that the scp is performed until after the puppet module is run.

Can you please confirm that, after that reported success, the file was present on the broker at /tmp/oo_install_configure_broker.nicknach.net.pp?

If it is present on the broker host during the window between the scp command and the completion of the puppet module deployment, then your original bug was possibly the result of a transient SCP error.

If it is not present, then something about your broker host is unusual.
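
One simple way to check that timing (just a sketch, run from a second shell on the broker while the deployment is in progress):

	F=/tmp/oo_install_configure_broker.nicknach.net.pp
	while [ ! -e "$F" ]; do sleep 1; done; echo "appeared: $(date)"
	while [ -e "$F" ]; do sleep 1; done; echo "removed: $(date)"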

Comment 11 Nicholas Schuetz 2014-07-18 17:30:11 UTC
The broker portion is very quick (probably because it's erroring out), so the .pp file exists only fleetingly. This is an up-to-date, vanilla RHEL 6.5 install with the optional repo added.

Comment 12 N. Harrison Ripps 2014-07-18 17:33:52 UTC
I think I know what the problem is. Are you running oo-install -on- the broker host?

Comment 13 Nicholas Schuetz 2014-07-18 17:36:52 UTC
Yes!

Comment 14 Nicholas Schuetz 2014-07-18 17:48:09 UTC
I was able to complete the install properly when not running oo-install from the broker node. Interesting that it couldn't provision itself; that *used* to work. If there is a technical reason for this, maybe there should be a warning, or an exit, or both.

Comment 15 N. Harrison Ripps 2014-07-18 17:51:49 UTC
Okay. Because your broker's ssh_host name is not 'localhost', oo-install doesn't know that the puppet config file that it generates at localhost:/tmp/<puppet_file_name> is literally the same file as broker.nicknach.net:/tmp/<puppet_file_name>.

SCPing a file over itself is not a problem, but oo-install cleans up the oo-install host's copy of that file as soon as the scp is completed. In your case, this means deleting the file from where it needed to be. If I defer the 'local' delete until the 'remote' host operations are done, this situation will not impact the installation.
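
In shell terms, the ordering problem looks roughly like this ($PUPPET_FILE and $TARGET are placeholders; this is an illustration of the sequence, not the actual installer code):

	# current behavior: the local copy is removed right after the scp
	scp -q /tmp/$PUPPET_FILE root@$TARGET:/tmp/$PUPPET_FILE
	rm -f /tmp/$PUPPET_FILE      # when the installer host IS the broker, this deletes the only copy
	# ... the remote puppet run then fails to find /tmp/$PUPPET_FILE ...

	# planned fix: defer the local cleanup until the remote host operations are done
	scp -q /tmp/$PUPPET_FILE root@$TARGET:/tmp/$PUPPET_FILE
	# ... run the puppet deployment on $TARGET ...
	rm -f /tmp/$PUPPET_FILE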

As an aside, you might wonder why I am not doing a sanity check like "If I run `hostname` and it is identical to this oo-install target system's host name, I must be on localhost for that system." The reason is that the hostname oo-install knows about is not necessarily a target host's actual -current- hostname. oo-install can still reach that host as long as the right SSH alias has been set up.

Anyhow, I will update once I have patched this issue.

Comment 16 N. Harrison Ripps 2014-07-18 18:40:05 UTC
Okay, I've deployed a fix for this. The PR is here:

https://github.com/openshift/openshift-extras/pull/412

And the code has been pushed to install.openshift.com.

Thanks for helping me track this down. Please re-open this bug if the problem isn't solved.