Bug 1412352

Summary: rhel-registration script broken with satellite 6.2.5+
Product: Red Hat OpenStack Reporter: Jason Montleon <jmontleo>
Component: diskimage-builder Assignee: James Slagle <jslagle>
Status: CLOSED CURRENTRELEASE QA Contact: Gurenko Alex <agurenko>
Severity: medium Docs Contact:
Priority: medium    
Version: 10.0 (Newton) CC: aschultz, bnemec, emacchi, greartes, jcoufal, jmontleo, jslagle, mburns, rhel-osp-director-maint
Target Milestone: Upstream M2 Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-5.3.0-5 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-12 21:48:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1411935    

Description Jason Montleon 2017-01-11 20:18:24 UTC
Description of problem:
The script /usr/share/diskimage-builder/elements/rhel-common/os-refresh-config/pre-configure.d/06-rhel-registration, which is part of the diskimage-builder package and is included in the overcloud images shipped with RHEL OSP 10, is broken with Satellite 6.2.5+.

The problem is that the katello-ca-consumer package now runs the command /usr/bin/katello-rhsm-consumer as a post-install script.

This in turn at the very end writes a file /etc/rhsm/facts/katello.facts by running:
if [ -d /etc/rhsm/facts/ ]; then
  echo "{\"network.hostname-override\":\"`hostname -f`\"}" > /etc/rhsm/facts/katello.facts
fi

The problem is that every overcloud node populates this file with:
{"network.hostname-override":"localhost"}

This in turn causes every overcloud host to register with the name 'localhost', overwriting previous registrations. Depending on timing, this can cause later commands in the script to fail with an error, which in turn fails the deployment.
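
A quick way to check whether a node is affected is to look for the localhost override in the facts file. This is only a minimal sketch: the function name has_localhost_override is hypothetical, and the default path is the one quoted above.

```shell
# Hypothetical helper (not part of any shipped script): returns 0 if the
# given facts file contains the "localhost" hostname override.
has_localhost_override() {
    local facts_file="${1:-/etc/rhsm/facts/katello.facts}"
    # -F matches the JSON fragment literally instead of as a regex.
    [ -f "$facts_file" ] &&
        grep -qF '"network.hostname-override":"localhost"' "$facts_file"
}

# Example use on an overcloud node:
#   has_localhost_override && echo "node will register as localhost"
```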

Version-Release number of selected component (if applicable):
On the Satellite:
[root@qci ~]# rpm -q satellite
satellite-6.2.6-2.0.el7sat.noarch

On the Director:
[root@undercloud ~]# rpm -qa | grep rhosp
rhosp-director-images-ipa-10.0-20161212.1.el7ost.noarch
rhosp-director-images-10.0-20161212.1.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install a Satellite 6.2.6 host
2. Set up a director with plan parameters to register to the Satellite
3. Run the deployment

Actual results:
You end up with one host registered to Satellite with the name localhost. All previous host registrations are invalidated.

Expected results:
All hosts register correctly and deployment does not fail.

Additional Information:
It looks like hostname -f returns localhost until /etc/hosts is updated, which as far as I can tell happens well after registration.
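
To confirm this on a node, the result of hostname -f can be tested before it is trusted. A rough sketch follows; is_placeholder_fqdn is a hypothetical name, and treating an empty result and localhost.localdomain as placeholders is my assumption (only "localhost" was actually observed here).

```shell
# Hypothetical guard: returns 0 when the name looks like the fallback
# value seen before /etc/hosts is populated.
# Assumption: "" and "localhost.localdomain" are treated the same way as
# the "localhost" value reported in this bug.
is_placeholder_fqdn() {
    case "$1" in
        ""|localhost|localhost.localdomain) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   is_placeholder_fqdn "$(hostname -f 2>/dev/null)" && echo "FQDN not resolvable yet"
```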

Comment 1 Jason Montleon 2017-01-12 14:58:15 UTC
The script being run is from the plan:
extraconfig/pre_deploy/rhel-registration/scripts/rhel-registration

Although I see 06-rhel-registration running on the hosts in /var/log/messages, it does not appear to be doing anything, as it exits within a couple of seconds.

Again, `hostname -f` is returning localhost because /etc/hosts has not been populated yet and the name does not resolve from DNS, despite the hostname being set.

To work around this, I added the line:
echo "{\"network.hostname\":\"$HOSTNAME\"}" > /etc/rhsm/facts/katello.facts
immediately after:
rpm -Uvh katello-ca-consumer-latest.noarch.rpm || true

The registration process then goes on using the correct hostname.
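
For anyone wanting to try this outside the registration script, the workaround line can be wrapped in a small function. This is a sketch only: write_hostname_fact is a hypothetical name, the directory check mirrors the one in katello-rhsm-consumer quoted in the description, and the fact key is the one from the line above.

```shell
# Hypothetical wrapper around the workaround line from this comment.
# $1: facts directory (normally /etc/rhsm/facts), $2: hostname to record.
write_hostname_fact() {
    local facts_dir="$1" name="$2"
    # Same guard as katello-rhsm-consumer: only write if the directory exists.
    if [ -d "$facts_dir" ]; then
        printf '{"network.hostname":"%s"}\n' "$name" > "$facts_dir/katello.facts"
    fi
}

# In the registration script this would go right after the rpm -Uvh line:
#   write_hostname_fact /etc/rhsm/facts "$HOSTNAME"
```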

Comment 3 Alex Schultz 2017-03-13 16:20:07 UTC
This might be a duplicate of BZ#1421228

Comment 4 James Slagle 2017-04-13 17:25:24 UTC
The NodeExtraConfig resource that is mapped to rhel registration definitely comes before the {{role.name}}HostsDeployment in the templates. Still, I would have expected cloud-init to have already set the hostname.

If the workaround from comment 1 works, then let's just add that to tripleo-heat-templates/extraconfig/pre_deploy/rhel-registration/scripts/rhel-registration as the fix. Perhaps we could use hostnamectl instead of $HOSTNAME to get the hostname, though. I would have to investigate that.
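
As a starting point for that investigation, something like the following could be tried. It is an untested sketch, not the actual fix: node_hostname is a hypothetical name, and the fallback order (hostnamectl --static, then $HOSTNAME, then uname -n) is my assumption.

```shell
# Hypothetical helper: print the node's hostname without relying on
# /etc/hosts or DNS resolution.
node_hostname() {
    local name=""
    # hostnamectl --static prints the static hostname when systemd is running.
    if command -v hostnamectl >/dev/null 2>&1; then
        name=$(hostnamectl --static 2>/dev/null) || name=""
    fi
    # Fall back to the shell variable, then to the kernel's node name.
    [ -n "$name" ] || name="${HOSTNAME:-}"
    [ -n "$name" ] || name=$(uname -n)
    printf '%s\n' "$name"
}

# Example use:
#   echo "{\"network.hostname\":\"$(node_hostname)\"}"
```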

Comment 5 Ben Nemec 2017-04-19 22:01:00 UTC
James has this.  Cancelling needinfo.

Comment 6 James Slagle 2017-08-16 15:54:19 UTC
Can you clarify whether you are building RHEL images with diskimage-builder and registering them to Satellite during that image build process?

Or

are you only attempting to register the unmodified director images to Satellite with the extraconfig/pre_deploy/rhel-registration/environment-rhel-registration.yaml template from tripleo-heat-templates?

From what I can tell, katello-ca-consumer would only be installed by either the script during the image build or during the overcloud deploy.

If used during the image build, then I'm not surprised the hostname is "localhost".

If used during the overcloud deploy, then the hostname should be set by cloud-init, and if it's not then we need to investigate why that's not the case.

Or possibly if you're doing both, then the image build fact is still taking precedence.

Comment 7 James Slagle 2017-08-16 21:30:01 UTC
There are a few more details on https://bugzilla.redhat.com/show_bug.cgi?id=1476760

I believe I've tracked this down to Heat configuring /etc/hosts after rhel registration. This is a change from when we were using the 51-hosts script to configure /etc/hosts.

Since satellite forces a hostname override in /etc/rhsm/facts/katello.facts to the result of "hostname -f", if /etc/hosts is not configured, that value will always be localhost.

As I mentioned in the other bug, a quick fix may be to rm /etc/rhsm/facts/katello.facts after katello-ca-consumer is installed in the rhel-registration script.
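
A minimal sketch of that quick fix, assuming it runs right after the katello-ca-consumer install step in the rhel-registration script. The function name remove_katello_override is hypothetical. With the file gone there is no override fact at all, so registration would presumably fall back to the system hostname.

```shell
# Hypothetical helper: drop the facts file written by katello-rhsm-consumer
# so the "localhost" override is not used at registration time.
remove_katello_override() {
    local facts_file="${1:-/etc/rhsm/facts/katello.facts}"
    # -f keeps this a no-op if the file was never written.
    rm -f "$facts_file"
}

# In the registration script, after katello-ca-consumer is installed:
#   remove_katello_override
```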