Description of problem:
Deploying the overcloud from a virt install of the bits from the latest poodle fails: the overcloud nodes don't PXE boot and show "No bootable device". As a side note, I was trying an HA deployment with 1 compute and three controller nodes.

Version-Release number of selected component (if applicable):
[stack@instack ~]$ rpm -qa | grep openstack
openstack-nova-console-2015.1.0-7.el7ost.noarch
openstack-neutron-2015.1.0-6.el7ost.noarch
openstack-ironic-conductor-2015.1.0-4.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-2.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-2.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-2.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-6.el7ost.noarch
openstack-nova-api-2015.1.0-7.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-1.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
openstack-tripleo-0.0.6-0.1.git812abe0.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-nova-common-2015.1.0-7.el7ost.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch
openstack-tuskar-0.4.18-2.el7ost.noarch
python-django-openstack-auth-1.2.0-2.el7ost.noarch
openstack-dashboard-theme-2015.1.0-9.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-2.el7ost.noarch
openstack-tuskar-ui-extras-0.0.3-3.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-neutron-ml2-2015.1.0-6.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-7.el7ost.noarch
openstack-keystone-2015.1.0-1.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-neutron-common-2015.1.0-6.el7ost.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-ceilometer-common-2015.1.0-2.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-ceilometer-api-2015.1.0-2.el7ost.noarch
openstack-ironic-api-2015.1.0-4.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-2.el7ost.noarch
openstack-ironic-common-2015.1.0-4.el7ost.noarch
openstack-selinux-0.6.31-1.el7ost.noarch
openstack-nova-compute-2015.1.0-7.el7ost.noarch
openstack-nova-conductor-2015.1.0-7.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
openstack-ceilometer-central-2015.1.0-2.el7ost.noarch
openstack-puppet-modules-2015.1.3-3.el7ost.noarch
openstack-nova-scheduler-2015.1.0-7.el7ost.noarch
openstack-nova-cert-2015.1.0-7.el7ost.noarch
openstack-dashboard-2015.1.0-9.el7ost.noarch

How reproducible:
Always - confirmed by dsneddon on his install

Steps to Reproduce:
1. Install the virt host and undercloud with bits from the latest poodle (06/10/2015)
2. Register nodes and deploy overcloud
   instack-deploy-overcloud --tuskar
3. See error on overcloud nodes consoles

Actual results:
Overcloud nodes don't PXE boot; expect the deploy to time out and fail

Expected results:
Successful overcloud deployment

Additional info:
Added the changes from https://review.gerrithub.io/#/c/235148/ to the install.
I tried the same thing with a very basic configuration. RHOS on virt with Delorean trunk, all default settings. The overcloud instances aren't getting PXE booted during deployment (but discovery works).
I took a closer look at the instack host when deploying, here is what I found:

Discovery works without a hitch. During deployment, DHCP requests are seen on the instack host, but no DHCP offer or PXE boot info is sent by the host.
==========
[stack@instack ~]$ find /tftpboot/
/tftpboot/
/tftpboot/token-1ecb36a1-9cad-4690-8aaa-e1e1fcb6e864
/tftpboot/token-bff29ba1-a116-4061-a360-285191122f8d
/tftpboot/pxelinux.0
/tftpboot/master_images
/tftpboot/master_images/3a3a7468-a2eb-4666-b44f-2eb99609295c
/tftpboot/master_images/aa7a69a1-b59b-4b79-b316-ea2c854e1414
/tftpboot/master_images/00d07ccc-ee89-45d8-b00e-8426e9925c7a
/tftpboot/master_images/8a6244bf-1956-44c8-bef4-ca7a6105234a
/tftpboot/undionly.kpxe
/tftpboot/map-file
/tftpboot/pxelinux.cfg
(the pxelinux.cfg is an empty directory).
==========
[stack@instack ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| bff29ba1-a116-4061-a360-285191122f8d | None | 2bde80ea-5cf3-4101-9f47-79c75f050246 | power on    | wait call-back  | False       |
| 1ecb36a1-9cad-4690-8aaa-e1e1fcb6e864 | None | fc378bb1-3c50-405f-a78b-beea2f4580c1 | power on    | wait call-back  | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
==========
[stack@instack ~]$ cat /etc/ironic-discoverd/dnsmasq.conf
port=0
interface=br-ctlplane
bind-interfaces
dhcp-range=192.0.2.100,192.0.2.120,29
enable-tftp
tftp-root=/tftpboot
dhcp-match=ipxe,175
dhcp-boot=tag:!ipxe,undionly.kpxe,localhost.localdomain,192.0.2.1
dhcp-boot=tag:ipxe,http://192.0.2.1:8088/discoverd.ipxe
==========
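The empty pxelinux.cfg observation above can be re-checked on other setups with a small helper. This is only a sketch, not something from the original report: the function name is made up for illustration, and it assumes that an empty or missing pxelinux.cfg directory means no per-node boot config has been written for the nodes sitting in "wait call-back".

```python
import os


def pxe_config_entries(tftp_root="/tftpboot"):
    """Return the sorted entry names under <tftp_root>/pxelinux.cfg.

    An empty list means the directory is missing or has no entries,
    which matches the state described in the comment above.
    (Hypothetical helper, named here for illustration only.)
    """
    cfg_dir = os.path.join(tftp_root, "pxelinux.cfg")
    if not os.path.isdir(cfg_dir):
        return []
    return sorted(os.listdir(cfg_dir))
```

Calling pxe_config_entries() on the instack host while the deploy is waiting would show whether anything has appeared in the directory since the find output was captured.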
In a possibly related twist, I just tried deploying the latest upstream TripleO devtest, and I got the same behavior when I tried to deploy. The instances did not get DHCP or PXE boot.
Ironic error message: Jun 10 19:04:43 instack ironic-discoverd: ERROR:ironicclient.common.http:Error contacting Ironic server: A port with MAC address 00:5f:08:71:32:13 already exists. (HTTP 409). Attempt 6 of 6
Yeah, I came here to say the same thing ^^^. Poking around on Dan's setup, this is the only thing I could quickly see. The nova computes aren't even spawned.

Jun 10 19:04:43 instack.localdomain ironic-discoverd[30934]: ERROR:ironicclient.common.http:Error contacting Ironic server: A port with MAC address 00:5f:08:71:32:13 already exists. (HTTP 409). Attempt 6 of 6
Jun 10 19:04:54 instack.localdomain ironic-discoverd[30934]: ERROR:ironicclient.common.http:Error contacting Ironic server: A port with MAC address 00:1a:89:60:d1:b5 already exists. (HTTP 409). Attempt 6 of 6

So in any case, unlike other recent issues, this has nothing to do with overcloud config/heat/puppet.
so actually this isn't an error:

09:12 < dsneddon_zzz> lucasagomes, dtantsur: Error from my deployment: Jun 10 19:04:43 instack ironic-discoverd: ERROR:ironicclient.common.http:Error contacting Ironic server: A port with MAC address 00:5f:08:71:32:13 already exists. (HTTP 409). Attempt 6 of 6
09:13 < lucasagomes> dsneddon_zzz, this looks like discoverd is trying to create a port resource in ironic with a macaddress that is already registered
09:13 < dtantsur> damn ironicclient, how to silence your faults? >_<
09:13 < dtantsur> dsneddon_zzz, that's not an error. ignore it.
09:14 < dtantsur> the problem is that I don't know the way to tell ironicclient "error is ok, don't report it as ERROR in logs"...
09:14 < dsneddon_zzz> marios, ^
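Since dtantsur says these HTTP 409 lines can be ignored, it may help to filter them out when scanning the discoverd log for real failures. The sketch below is not part of the original discussion; the function names are hypothetical, and the regular expression is built only from the message text quoted in this report.

```python
import re

# Pattern built from the ironicclient message quoted in this bug.
_DUPLICATE_PORT = re.compile(
    r"A port with MAC address (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"already exists\. \(HTTP 409\)",
    re.IGNORECASE,
)


def is_duplicate_port_noise(line):
    """True if the line is the 'port already exists' 409 message."""
    return _DUPLICATE_PORT.search(line) is not None


def duplicate_port_macs(log_lines):
    """Return the MACs named in duplicate-port lines, first seen first."""
    macs = []
    for line in log_lines:
        match = _DUPLICATE_PORT.search(line)
        if match and match.group("mac") not in macs:
            macs.append(match.group("mac"))
    return macs


def remaining_lines(log_lines):
    """Drop the duplicate-port noise and keep everything else."""
    return [line for line in log_lines if not is_duplicate_port_noise(line)]
```

Running remaining_lines() over the journal output leaves only the entries that are not this known-harmless message.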
Created attachment 1037610
some logs from the failed virt setup

Some assorted logs from the failing setup. There are a few different things going on, especially an auth issue (nova-api.log: keystonemiddleware.auth_token [-] Authorization failed for token). The neutron logs are towards the end; it seems the DHCP agent is having a permissions issue (see ./dhcp-agent.log:2015-06-11 05:39:54.472 9237 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 693ed227-df2c-423e-a1d3-0374db845c48). Hopefully this helps someone.
Some more context: I could only get the neutron DHCP agent to come up clean after running setenforce 0. I then tried the deploy, but eventually the neutron DHCP agent had auth issues again (still with setenforce 0):

[stack@instack ~]$ sudo service neutron-dhcp-agent status -l
Redirecting to /bin/systemctl status -l neutron-dhcp-agent.service
neutron-dhcp-agent.service - OpenStack Neutron DHCP Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-dhcp-agent.service; enabled)
   Active: active (running) since Thu 2015-06-11 06:06:30 EDT; 28min ago
 Main PID: 4923 (neutron-dhcp-ag)
   CGroup: /system.slice/neutron-dhcp-agent.service
           └─4923 /usr/bin/python2 /usr/bin/neutron-dhcp-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-dhcp-agent --log-file /var/log/neutron/dhcp-agent.log

Jun 11 06:06:30 instack.localdomain systemd[1]: Starting OpenStack Neutron DHCP Agent...
Jun 11 06:06:30 instack.localdomain systemd[1]: Started OpenStack Neutron DHCP Agent.
Jun 11 06:34:19 instack.localdomain sudo[9147]: pam_unix(sudo:auth): conversation failed
Jun 11 06:34:19 instack.localdomain sudo[9147]: pam_unix(sudo:auth): auth could not identify password for [neutron]
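The last journal line suggests sudo tried to authenticate the neutron user by password, which would be consistent with no matching NOPASSWD rule being found. A rough way to spot the same symptom for any service account is sketched below; the function name is made up, and the pattern is taken only from the pam_unix message shown above.

```python
import re

# Pattern taken from the pam_unix line in the status output above.
_SUDO_AUTH_FAILURE = re.compile(
    r"pam_unix\(sudo:auth\): auth could not identify password "
    r"for \[(?P<user>[^\]]+)\]"
)


def users_prompted_for_sudo_password(journal_lines):
    """Return the sorted set of users sudo failed to authenticate."""
    users = set()
    for line in journal_lines:
        match = _SUDO_AUTH_FAILURE.search(line)
        if match:
            users.add(match.group("user"))
    return sorted(users)
```

If a service user such as neutron shows up in the result, the sudoers rules for that user are worth a closer look.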
Hi,

So the problem is permission related. I noticed that neutron-openvswitch-agent failed to start with [1], and neutron-dhcp-agent was started but the dnsmasq process couldn't be spawned due to similar problems [2].

Talking to ajo on IRC, he pointed out that the rootwrap-daemon was recently introduced and is probably not working [3]. In [3] he also suggested adding an open rule in /etc/sudoers for the neutron user. That worked for me: I could start neutron-openvswitch-agent, and neutron-dhcp-agent could start the dnsmasq process.

So, *as a workaround*:

1) Edit /etc/sudoers and add an entry at the end like:

neutron ALL=(ALL) NOPASSWD: ALL

2) Restart the neutron services:

$ sudo systemctl restart neutron-dhcp-agent neutron-server neutron-openvswitch-agent.service

[1] http://paste.openstack.org/show/284064/
[2] http://paste.openstack.org/show/283919/
[3] http://paste.openstack.org/show/284125/

Cheers,
Lucas
Hi,

Sorry, one more thing. SELinux was in permissive mode on my setup, so the workaround is:

1) Edit /etc/sudoers and add an entry at the end like:

neutron ALL=(ALL) NOPASSWD: ALL

2) Put SELinux in permissive mode:

$ sudo setenforce 0

3) Restart the neutron services:

$ sudo systemctl restart neutron-dhcp-agent neutron-server neutron-openvswitch-agent.service

...

Creating an SELinux rule from the audit logs gives me:

$ cat neutron.te
module neutron 1.0;

require {
	type neutron_t;
	type chkpwd_exec_t;
	type sudo_db_t;
	type shadow_t;
	type sendmail_exec_t;
	class dir { getattr create add_name };
	class file { execute read execute_no_trans getattr open };
}

#============= neutron_t ==============
allow neutron_t chkpwd_exec_t:file { read execute open execute_no_trans };
allow neutron_t sendmail_exec_t:file execute;
allow neutron_t shadow_t:file { read getattr open };
allow neutron_t sudo_db_t:dir { getattr create add_name };
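For anyone who wants to try the generated module instead of running permissive, the usual way to compile and load a .te file is checkmodule, semodule_package and semodule. The sketch below only builds the command lines and does not run them; the helper name is hypothetical, and the file names are assumed from the neutron.te shown above. Whether these rules are the right long-term fix is not established in this report.

```python
def selinux_module_build_commands(name="neutron"):
    """Return the argv lists to compile, package and install <name>.te.

    Nothing is executed here; the caller decides whether to run the
    commands (as root) after reviewing the module source.
    """
    te_file = name + ".te"
    mod_file = name + ".mod"
    pp_file = name + ".pp"
    return [
        # Compile the policy source into a binary module.
        ["checkmodule", "-M", "-m", "-o", mod_file, te_file],
        # Wrap the binary module in a policy package.
        ["semodule_package", "-o", pp_file, "-m", mod_file],
        # Install the policy package.
        ["semodule", "-i", pp_file],
    ]
```

Each list can be passed to subprocess.check_call once the rules have been reviewed.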
So, as Terry Wilson found out, there is a wrong rule in /etc/sudoers.d/neutron. The rule

neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

should be without an asterisk at the end.

Another bug is the SELinux issue: we need to add rules that let rootwrap-daemon access the sudo db.
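A quick check for the sudoers problem described above could look like the following. It is a sketch under assumptions: the function name is invented, and because the exact wrong line is not quoted in this report, the asterisk-terminated form it looks for is inferred from the comment rather than copied from a real file.

```python
def rootwrap_daemon_rules_with_wildcard(sudoers_text):
    """Return neutron-rootwrap-daemon rules that end with an asterisk.

    Blank lines and '#' comment lines are skipped. An empty result
    means no rule of the suspected wrong form was found.
    """
    flagged = []
    for raw_line in sudoers_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "neutron-rootwrap-daemon" in line and line.endswith("*"):
            flagged.append(line)
    return flagged
```

Feeding it the contents of /etc/sudoers.d/neutron gives a list of lines to fix, if any.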
Verified - installation and deployment successful

Connection to 192.0.2.7 closed.
Overcloud Endpoint: http://192.0.2.7:5000/v2.0/
Overcloud Deployed

[stack@instack ~]$ rpm -qa |grep openstack-neutron
openstack-neutron-common-2015.1.0-10.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-10.el7ost.noarch
openstack-neutron-2015.1.0-10.el7ost.noarch
openstack-neutron-ml2-2015.1.0-10.el7ost.noarch

Tested on: RHEL-OSP director puddle 7.0 RC - 2015-06-29.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1548