Bug 1475506

Summary: Undercloud Installation fails due to rabbitmq-server not starting
Product: Red Hat OpenStack Reporter: Sai Sindhur Malleni <smalleni>
Component: erlangAssignee: Peter Lemenkov <plemenko>
Status: CLOSED ERRATA QA Contact: pkomarov
Severity: medium Docs Contact:
Priority: medium    
Version: 12.0 (Pike)CC: apevec, aschultz, chjones, jeckersb, jschluet, lhh, mburns, plemenko, rhel-osp-director-maint, rscarazz, smalleni, ushkalim
Target Milestone: betaKeywords: Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: erlang-18.3.4.5-4.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-13 21:45:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sai Sindhur Malleni 2017-07-26 19:50:23 UTC
Description of problem: Undercloud Installation fails. rabbitmq-server failing to start appears to be the reason.


2017-07-26 13:47:12,355 INFO: ^[[1;31mError: /Stage[main]/Rabbitmq::Service/Service[rabbitmq-server]/ensure: change from stopped to running failed: Systemd start for rabbitmq-server failed!
2017-07-26 13:47:12,355 INFO: journalctl log for rabbitmq-server:
2017-07-26 13:47:12,355 INFO: -- Logs begin at Wed 2017-07-26 12:58:04 EDT, end at Wed 2017-07-26 13:47:12 EDT. --
2017-07-26 13:47:12,356 INFO: Jul 26 13:47:09 10.12.20.128.redhat.local systemd[1]: Starting RabbitMQ broker...
2017-07-26 13:47:12,356 INFO: Jul 26 13:47:10 10.12.20.128.redhat.local rabbitmq-server[19581]: ERROR: epmd error for host 10: badarg (unknown POSIX error)
2017-07-26 13:47:12,356 INFO: Jul 26 13:47:10 10.12.20.128.redhat.local systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
2017-07-26 13:47:12,357 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local rabbitmqctl[19722]: Stopping and halting node rabbit@10 ...
2017-07-26 13:47:12,357 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local rabbitmqctl[19722]: Error: unable to connect to node rabbit@10: nodedown
2017-07-26 13:47:12,358 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local rabbitmqctl[19722]: DIAGNOSTICS
2017-07-26 13:47:12,358 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local rabbitmqctl[19722]: ===========
2017-07-26 13:47:12,358 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local rabbitmqctl[19722]: attempted to contact: [rabbit@10]
2017-07-26 13:47:12,359 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local systemd[1]: Failed to start RabbitMQ broker.
2017-07-26 13:47:12,359 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local systemd[1]: Unit rabbitmq-server.service entered failed state.
2017-07-26 13:47:12,359 INFO: Jul 26 13:47:12 10.12.20.128.redhat.local systemd[1]: rabbitmq-server.service failed.
2017-07-26 13:47:12,360 INFO: ^[[0m
2017-07-26 13:47:12,360 INFO: ^[[mNotice: /Stage[main]/Rabbitmq/Anchor[rabbitmq::end]: Dependency Service[rabbitmq-server] has failures: true^[[0m
2017-07-26 13:47:12,360 INFO: ^[[1;33mWarning: /Stage[main]/Rabbitmq/Anchor[rabbitmq::end]: Skipping because of failed dependencies^[[0m
2017-07-26 13:47:18,777 INFO: ^[[mNotice: /Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-registry]/returns: executed successfully^[[0m
2017-07-26 13:47:18,820 INFO: ^[[mNotice: /Stage[main]/Tripleo::Profile::Base::Docker_registry/File_line[docker insecure registry]/ensure: created^[[0m
2017-07-26 13:47:21,502 INFO: ^[[mNotice: /Stage[main]/Tripleo::Profile::Base::Docker/Service[docker]: Triggered 'refresh' from 2 events^[[0m
2017-07-26 13:47:22,216 INFO: ^[[mNotice: /Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]: Dependency Service[rabbitmq-server] has failures: true^[[0m
2017-07-26 13:47:22,217 INFO: ^[[1;33mWarning: /Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]: Skipping because of failed dependencies^[[0m
2017-07-26 13:47:22,217 INFO: ^[[mNotice: /Stage[main]/Rabbitmq::Install::Rabbitmqadmin/File[/usr/local/bin/rabbitmqadmin]: Dependency Service[rabbitmq-server] has failures: true^[[0m
2017-07-26 13:47:22,217 INFO: ^[[1;33mWarning: /Stage[main]/Rabbitmq::Install::Rabbitmqadmin/File[/usr/local/bin/rabbitmqadmin]: Skipping because of failed dependencies^[[0m
2017-07-26 13:47:49,619 INFO: ^[[mNotice: /Stage[main]/Glance::Db::Sync/Exec[glance-manage db_sync]/returns: executed successfully^[[0m
2017-07-26 13:47:49,622 INFO: ^[[mNotice: /Stage[main]/Glance::Deps/Anchor[glance::dbsync::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:51,573 INFO: ^[[mNotice: /Stage[main]/Glance::Db::Metadefs/Exec[glance-manage db_load_metadefs]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:55,231 INFO: ^[[mNotice: /Stage[main]/Nova::Db::Sync_api/Exec[nova-db-sync-api]/returns: executed successfully^[[0m
2017-07-26 13:47:55,232 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::dbsync_api::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:55,233 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::cell_v2::begin]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:55,235 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::db_online_data_migrations::begin]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:58,422 INFO: ^[[mNotice: /Stage[main]/Nova::Cell_v2::Map_cell0/Exec[nova-cell_v2-map_cell0]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:58,423 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::cell_v2::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:47:58,424 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::dbsync::begin]: Triggered 'refresh' from 2 events^[[0m
2017-07-26 13:48:02,141 INFO: ^[[mNotice: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]/returns: executed successfully^[[0m
2017-07-26 13:48:05,791 INFO: ^[[mNotice: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:48:05,792 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::dbsync::end]: Triggered 'refresh' from 2 events^[[0m
2017-07-26 13:48:05,793 INFO: ^[[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::service::begin]: Triggered 'refresh' from 2 events^[[0m
2017-07-26 13:48:16,527 INFO: ^[[mNotice: /Stage[main]/Nova::Api/Nova::Generic_service[api]/Service[nova-api]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:48:20,415 INFO: ^[[mNotice: /Stage[main]/Nova::Conductor/Nova::Generic_service[conductor]/Service[nova-conductor]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:24,632 INFO: ^[[mNotice: /Stage[main]/Nova::Scheduler/Nova::Generic_service[scheduler]/Service[nova-scheduler]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:26,917 INFO: ^[[mNotice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: executed successfully^[[0m
2017-07-26 13:49:26,920 INFO: ^[[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::dbsync::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:26,921 INFO: ^[[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:27,280 INFO: ^[[mNotice: /Stage[main]/Neutron::Agents::Dhcp/Service[neutron-dhcp-service]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:27,518 INFO: ^[[mNotice: /Stage[main]/Neutron::Agents::Ml2::Ovs/Service[neutron-ovs-agent-service]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:29,428 INFO: ^[[mNotice: /Stage[main]/Heat::Db::Sync/Exec[heat-dbsync]/returns: executed successfully^[[0m
2017-07-26 13:49:29,430 INFO: ^[[mNotice: /Stage[main]/Heat::Deps/Anchor[heat::dbsync::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:29,431 INFO: ^[[mNotice: /Stage[main]/Heat::Deps/Anchor[heat::service::begin]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:29,514 INFO: ^[[mNotice: /Stage[main]/Heat::Api/Service[heat-api]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:29,571 INFO: ^[[mNotice: /Stage[main]/Heat::Api_cfn/Service[heat-api-cfn]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:30,008 INFO: ^[[mNotice: /Stage[main]/Heat::Engine/Service[heat-engine]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:31,904 INFO: ^[[mNotice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]/returns: executed successfully^[[0m
2017-07-26 13:49:31,905 INFO: ^[[mNotice: /Stage[main]/Ironic::Deps/Anchor[ironic::dbsync::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:31,906 INFO: ^[[mNotice: /Stage[main]/Ironic::Deps/Anchor[ironic::service::begin]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:49:32,696 INFO: ^[[mNotice: /Stage[main]/Ironic::Api/Service[ironic-api]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:50:32,780 INFO: ^[[mNotice: /Stage[main]/Ironic::Conductor/Service[ironic-conductor]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:50:32,783 INFO: ^[[mNotice: /Stage[main]/Ironic::Deps/Anchor[ironic::service::end]: Triggered 'refresh' from 2 events^[[0m
2017-07-26 13:50:37,806 INFO: ^[[mNotice: /Stage[main]/Mistral::Db::Sync/Exec[mistral-db-sync]/returns: executed successfully^[[0m
2017-07-26 13:50:40,714 INFO: ^[[mNotice: /Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]/returns: executed successfully^[[0m
2017-07-26 13:50:40,716 INFO: ^[[mNotice: /Stage[main]/Keystone::Deps/Anchor[keystone::dbsync::end]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:55:40,721 INFO: ^[[1;31mError: /Stage[main]/Keystone/Exec[keystone-manage bootstrap]: Failed to call refresh: Command exceeded timeout^[[0m
2017-07-26 13:55:40,721 INFO: ^[[1;31mError: /Stage[main]/Keystone/Exec[keystone-manage bootstrap]: Command exceeded timeout^[[0m
2017-07-26 13:55:40,723 INFO: ^[[mNotice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::begin]: Triggered 'refresh' from 1 events^[[0m
2017-07-26 13:55:40,727 INFO: ^[[mNotice: /Stage[main]/Apache::Service/Service[httpd]: Dependency Service[rabbitmq-server] has failures: true^[[0m
2017-07-26 13:55:40,727 INFO: ^[[1;33mWarning: /Stage[main]/Apache::Service/Service[httpd]: Skipping because of failed dependencies^[[0m
2017-07-26 13:55:40,729 INFO: ^[[mNotice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Dependency Service[rabbitmq-server] has failures: true^[[0m
2017-07-26 13:55:40,729 INFO: ^[[1;33mWarning: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Skipping because of failed dependencies^[[0m
2017-07-26 13:58:33,942 INFO: ^[[1;31mError: Failed to apply catalog: Execution of '/bin/openstack role list --quiet --format csv' returned 1: Unable to establish connection to http://192.168.24.1:35357/v3/roles?: HTTPConnectionPool(host='192.168.24.1', port=35357): Max retries exceeded with url: /v3/roles (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x2e9a2d0>: Failed to establish a new connection: [Errno 111] Connection refused',)) (tried 36, for a total of 170 seconds)^[[0m
2017-07-26 13:58:42,867 INFO: + rc=1
2017-07-26 13:58:42,868 INFO: + set -e
2017-07-26 13:58:42,868 INFO: + echo 'puppet apply exited with exit code 1'
2017-07-26 13:58:42,868 INFO: puppet apply exited with exit code 1
2017-07-26 13:58:42,868 INFO: + '[' 1 '!=' 2 -a 1 '!=' 0 ']'
2017-07-26 13:58:42,868 INFO: + exit 1
2017-07-26 13:58:42,868 INFO: [2017-07-26 13:58:42,868] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 1]
2017-07-26 13:58:42,868 INFO:
2017-07-26 13:58:42,869 INFO: [2017-07-26 13:58:42,868] (os-refresh-config) [ERROR] Aborting...
2017-07-26 13:58:42,878 DEBUG: An exception occurred
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1629, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1307, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 597, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)




Version-Release number of selected component (if applicable):
12- Build 2017-07-22.2

How reproducible:
100%

Steps to Reproduce:
1. openstack undercloud install
2.
3.

Actual results:
Undercloud install fails

Expected results:
Undercloud installs successfully

Additional info:

Comment 1 Sai Sindhur Malleni 2017-07-26 19:53:20 UTC
RHEL version was 7.4
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Comment 2 Alex Schultz 2017-07-26 22:09:54 UTC
2017-07-26 17:28:24,032 INFO: Error: Execution of '/bin/yum -d 0 -e 0 -y install rabbitmq-server' returned 1: Error: Package: erlang-crypto-18.3.4.5-4.el7ost.x86_64 (rhelosp-12.0-puddle)
2017-07-26 17:28:24,032 INFO:            Requires: libcrypto.so.10(OPENSSL_1.0.2)(64bit)
2017-07-26 17:28:24,032 INFO:  You could try using --skip-broken to work around the problem
2017-07-26 17:28:24,033 INFO:  You could try running: rpm -Va --nofiles --nodigest
2017-07-26 17:28:24,033 INFO: Error: /Stage[main]/Rabbitmq::Install/Package[rabbitmq-server]/ensure: change from purged to present failed: Execution of '/bin/yum -d 0 -e 0 -y install rabbitmq-server' returned 1: Error: Package: erlang-crypto-18.3.4.5-4.el7ost.x86_64 (rhelosp-12.0-puddle)
2017-07-26 17:28:24,033 INFO:            Requires: libcrypto.so.10(OPENSSL_1.0.2)(64bit)
2017-07-26 17:28:24,033 INFO:  You could try using --skip-broken to work around the problem
2017-07-26 17:28:24,033 INFO:  You could try running: rpm -Va --nofiles --nodigest


Looks like packaging issues with the puddle.

Comment 3 Raoul Scarazzini 2017-07-27 07:16:24 UTC
There's something strange here, because yes, problem seem to be related to the puddle packages, but the same puddle worked fine yesterday [1] in a complete deployment that of course included the undercloud installation.
It would be good to take a look to the repo of the undercloud and eventually I would suggest to delete and reconfigure them to be sure that just the rhel and the puddle's one are present.

[1] https://thirdparty.logs.rdoproject.org/jenkins-oooq-rhos-12-puddle-bmu-had00-lab-float_nic_with_vlans-8/undercloud/etc/yum.repos.d/rhos-release-12.repo.txt.gz

Comment 4 Sai Sindhur Malleni 2017-07-27 13:26:45 UTC
Repos
https://gist.github.com/smalleni/168240a3797783f9ceff9c859af0d6cf

Comment 5 Alex Schultz 2017-07-27 13:41:21 UTC
I think I figured it out, it appears that it was trying to be installed on 7.3 (in my case). I think 7.4 is required for 12?

Comment 6 Sai Sindhur Malleni 2017-07-27 13:43:46 UTC
Yes, my undercloud is 7.4. Do I need to use -r argument with rhos-release to force installation on 7.4?

However, I was using infrared to deploy undercloud, I was hoping that could take care of it.

Comment 7 Raoul Scarazzini 2017-07-27 13:47:30 UTC
Yes 7.4 is required for OSP12.
The main difference I see is the fact that in my env I have "latest" instead of 2017-07-22.2 in the director repo, which basically means that I have the repos rhelosp-12.0-puddle and rhelosp-12.0-devtools-puddle pointing a url with "latest" and not the puddle id.
Other than that, everything is the same, so maybe you can try modifying that and see if it fixes the issue.
The rhos-release command I launched was just "rhos-release 12 -p latest".

Comment 8 Sai Sindhur Malleni 2017-07-27 13:49:42 UTC
Can I change it on the same installation? Or does it need a fresh OS? What is the -r argument of rhos-release for? Would help here if I passed -r 7.4?

Comment 9 Raoul Scarazzini 2017-07-27 13:52:52 UTC
A fresh install might be better, but I would go first for a rhos-release -x to cleanup and then the rhos-release 12 -p latest command.

Comment 10 Sai Sindhur Malleni 2017-07-27 14:16:50 UTC
rasca,
Hey however rhos-release 12 -p now picks a build that has not passed_phase1, however I need a build that passed phase1

Comment 11 Peter Lemenkov 2017-08-21 13:28:15 UTC
This shoudl be fixed in any recent Erlang build (more recent than erlang-18.3.4.5-4.el7ost). Sai, could you try it again and tell us if the issue is still present?

Comment 13 Sai Sindhur Malleni 2017-10-02 12:38:50 UTC
Peter, 

I haven't seen this with more recent builds.

Comment 15 pkomarov 2017-10-15 14:18:59 UTC
Verified , tested in osp 12 with erlang-18.3.4.5-4.

#check rhos release: 
[root@undercloud ~]# rhos-release -L
Installed repositories (rhel-7.4):
  12
  rhel-7.4

#check erlang package: 
[root@puma33 ~]# rpm -qa |grep erlang
erlang-kernel-18.3.4.5-4.el7ost.x86_64
erlang-public_key-18.3.4.5-4.el7ost.x86_64
...

#check undercloud installation success:
 [stack@undercloud ~]$ grep 'Undercloud install' undercloud_install.log
Undercloud install complete.

#ir success message : 
msg: All items completed

PLAY RECAP ************************************************************************************
localhost                  : ok=6    changed=2    unreachable=0    failed=0   
undercloud.com : ok=69   changed=44   unreachable=0    failed=0   

                    [[ previous task time: 0:00:02.526307 = 2.53s / 3965.07s ]]
                  [[ previous play time: 0:11:00.466845 = 660.47s / 3965.07s ]]
             [[ previous playbook time: 1:06:05.070158 = 3965.07s / 3965.07s ]]
                   [[ previous total time: 1:06:05.070329 = 3965.07s / 0.00s ]]


#############################################################################
Undercloud install complete.

Comment 18 errata-xmlrpc 2017-12-13 21:45:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462