Bug 1320777

Summary: rhel-osp-director: After upgrade 7.3->8.0 nova compute has state "Down" in the nova services list
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: James Slagle <jslagle>
Status: CLOSED NOTABUG
QA Contact: Arik Chernetsky <achernet>
Severity: medium
Priority: medium
Version: 8.0 (Liberty)
CC: bnemec, dbecker, emacchi, jslagle, mburns, mcornea, morazi, rhel-osp-director-maint, sasha
Target Milestone: ga
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-03-31 18:52:49 UTC
Type: Bug

Description Alexander Chuzhoy 2016-03-24 01:17:52 UTC
rhel-osp-director: After upgrade 7.3->8.0 nova compute has state "Down" in the nova services list

Environment:
openstack-nova-compute-12.0.2-4.el7ost.noarch
python-novaclient-3.1.0-2.el7ost.noarch
openstack-tripleo-heat-templates-0.8.12-2.el7ost.noarch
instack-undercloud-2.2.6-1.el7ost.noarch
openstack-puppet-modules-7.0.15-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.12-2.el7ost.noarch


Steps to reproduce:
1. Deploy 7.3 overcloud.
2. Upgrade to 8.0.
3. Run "nova service-list"

Result:

+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                               | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| 2  | nova-scheduler   | overcloud-controller-0.localdomain | internal | enabled | up    | 2016-03-24T01:07:09.000000 | -               |
| 5  | nova-scheduler   | overcloud-controller-2.localdomain | internal | enabled | up    | 2016-03-24T01:07:08.000000 | -               |
| 8  | nova-scheduler   | overcloud-controller-1.localdomain | internal | enabled | up    | 2016-03-24T01:07:08.000000 | -               |
| 11 | nova-consoleauth | overcloud-controller-0.localdomain | internal | enabled | up    | 2016-03-24T01:07:00.000000 | -               |
| 14 | nova-consoleauth | overcloud-controller-2.localdomain | internal | enabled | up    | 2016-03-24T01:07:01.000000 | -               |
| 17 | nova-consoleauth | overcloud-controller-1.localdomain | internal | enabled | up    | 2016-03-24T01:07:07.000000 | -               |
| 20 | nova-conductor   | overcloud-controller-2.localdomain | internal | enabled | up    | 2016-03-24T01:07:04.000000 | -               |
| 23 | nova-conductor   | overcloud-controller-1.localdomain | internal | enabled | up    | 2016-03-24T01:06:59.000000 | -               |
| 26 | nova-compute     | overcloud-compute-0.localdomain    | nova     | enabled | down  | 2016-03-23T22:49:40.000000 | -               |
| 29 | nova-conductor   | overcloud-controller-0.localdomain | internal | enabled | up    | 2016-03-24T01:07:06.000000 | -               |
| 32 | nova-compute     | overcloud-compute-0                | nova     | enabled | up    | 2016-03-24T01:07:00.000000 | -               |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+


Expected result:
nova-compute     | overcloud-compute-0.localdomain    | nova     | enabled | up

Comment 2 Alexander Chuzhoy 2016-03-24 01:18:56 UTC
Note:
The service is active on the compute machine:

[root@overcloud-compute-0 ~]# openstack-service status
MainPID=4314 Id=neutron-openvswitch-agent.service ActiveState=active
MainPID=4481 Id=openstack-ceilometer-compute.service ActiveState=active
MainPID=4414 Id=openstack-nova-compute.service ActiveState=active     


and I was able to launch an instance.

Comment 4 Alexander Chuzhoy 2016-03-30 13:57:41 UTC
*** Bug 1322427 has been marked as a duplicate of this bug. ***

Comment 6 James Slagle 2016-03-30 14:50:29 UTC
Ben, can you comment here and confirm that this is only cosmetic and doesn't cause an actual issue? This is related to the domain name (localdomain in this case) in the compute service's host name: the service registered under the old name now shows as down, and the one registered under the new name shows as up.
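
To spot this situation in "nova service-list" output, the rows can be grouped by short host name. The following is a minimal sketch only, not part of nova or the director tooling: the function name find_renamed_hosts is hypothetical, and it assumes the column layout shown in the table above (Id, Binary, Host, Zone, Status, State, ...).

```python
from collections import defaultdict


def find_renamed_hosts(table_text, binary="nova-compute"):
    """Return {short_host: {registered_host: state}} for every host that
    appears under more than one name for the given binary."""
    seen = defaultdict(dict)
    for line in table_text.splitlines():
        # Split a "| a | b | c |" row into stripped cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip border lines, the header row, and other binaries.
        if len(cells) < 6 or cells[1] != binary:
            continue
        host, state = cells[2], cells[5]
        short = host.split(".", 1)[0]
        seen[short][host] = state
    # Keep only hosts registered under two or more distinct names.
    return {short: names for short, names in seen.items() if len(names) > 1}
```

Fed the table from the description, this reports overcloud-compute-0 as registered twice, once as overcloud-compute-0.localdomain (down) and once as overcloud-compute-0 (up).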

Comment 7 Emilien Macchi 2016-03-31 14:02:20 UTC
This is not a bug. Your DNS is not sending your domain name on DHCP; look at your resolv.conf.

See https://github.com/puppetlabs/facter/blob/2.4.3/lib/facter/domain.rb#L44-L71

Facter first tries to run "hostname -f". If that yields no domain, it tries to find one in resolv.conf; otherwise it returns nothing, which is your case.
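
The fallback order described above can be sketched as follows. This is a simplified illustration, not Facter's actual code (Facter is written in Ruby; see the link above for the real logic). The function name guess_domain is hypothetical, and preferring a "domain" line over a "search" line in resolv.conf is an assumption of this sketch.

```python
import re


def guess_domain(hostname_f_output, resolv_conf_text):
    """Return the DNS domain, or None if neither source provides one.

    hostname_f_output: text printed by "hostname -f"
    resolv_conf_text:  contents of /etc/resolv.conf
    """
    # Step 1: everything after the first dot of the FQDN is the domain.
    name = hostname_f_output.strip()
    if "." in name:
        domain = name.split(".", 1)[1]
        if domain:
            return domain

    # Step 2: fall back to resolv.conf, reading "domain" and "search" lines.
    domain = search = None
    for line in resolv_conf_text.splitlines():
        m = re.match(r"\s*domain\s+(\S+)", line)
        if m:
            domain = m.group(1)
            continue
        m = re.match(r"\s*search\s+(\S+)", line)
        if m:
            search = m.group(1)  # first entry of the search list

    # Assumption: "domain" wins over "search"; None if both are missing.
    return domain or search
```

With a short host name and a resolv.conf that has only nameserver lines, the function returns None, which corresponds to the case reported here: no domain is found, so the compute service registers under the short name.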

Comment 8 James Slagle 2016-03-31 16:20:37 UTC
After debugging a bit with Sasha, we found that /etc/resolv.conf was being manually set after the deployment. This meant that the "search" line in resolv.conf, which would have specified a domain (and caused puppet to return the correct value for fqdn), was not set.

The recommendation is to set the correct values for the DNS servers via the DnsServers parameter in network-environment.yaml and to try to reproduce without manually configuring DNS.

Comment 9 James Slagle 2016-03-31 18:52:49 UTC
Didn't reproduce when setting DNS via DnsServers.

Comment 10 Marius Cornea 2016-04-07 10:39:24 UTC
*** Bug 1324739 has been marked as a duplicate of this bug. ***