
Bug 1142295

Summary: Rubygem-Staypuft: The default gateway resides on the wrong NIC. Should be on the external NIC where it exists otherwise on Provisioning/PXE.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhel-osp-installer
Assignee: Marek Hulan <mhulan>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: high
Docs Contact:
Priority: high
Version: 5.0 (RHEL 7)
CC: aberezin, mburns, mhulan, oblaut, rhos-maint, sasha, slong, sseago, stevenca, yeylon
Target Milestone: z1
Keywords: ZStream
Target Release: Installer
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: rhel-osp-installer-0.3.6-1.el6ost
Doc Type: Bug Fix
Doc Text:
Previously, the kickstart template for the RHEL OpenStack Installer assigned the default gateway to the wrong NIC. With this update, the gateway is now assigned to the subnet with public API traffic. If the host does not have a NIC in that network (for example, a compute host in an OpenStack Networking deployment), the default gateway is set to the NIC used for provisioning.
Story Points: ---
Clone Of:
: 1148746
Environment:
Last Closed: 2014-10-02 12:56:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1142873, 1154145, 1154159, 1154162

Description Alexander Chuzhoy 2014-09-16 14:11:53 UTC
Rubygem-Staypuft: The default gateway resides on the wrong NIC. It should be on the external NIC where one exists, otherwise on the Provisioning/PXE NIC.

Environment:
rhel-osp-installer-0.3.4-3.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-foreman-installer-2.0.23-1.el6ost.noarch
openstack-puppet-modules-2014.1-21.8.el6ost.noarch



Steps to reproduce:
1. Install rhel-osp-installer.
2. Configure/run a Nova/Flat deployment.
3. Check the default route on the controller (it should have an external connection).


Result:
The default route is configured via the provisioning/PXE network.

Expected result:
The default route should be through the external NIC/network.
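
A minimal sketch of the check in step 3, assuming the standard `ip route` output format. The helper name `default_route_iface` is hypothetical; the awk filter is the same one the kickstart template uses to find the provisioning interface.

```shell
# Print the interface that carries the default route, reading `ip route`
# output on stdin. default_route_iface is a hypothetical helper name and is
# not part of rhel-osp-installer.
default_route_iface() {
    awk '$1 == "default" {print $5; exit}'
}

# On a live controller this would be used as:
#   ip route | default_route_iface
```

If the name printed is the provisioning/PXE NIC rather than the external one, the host shows the behaviour reported here.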

Comment 3 Alexander Chuzhoy 2014-09-16 18:17:47 UTC
This also affects the controller: Horizon isn't reachable from external hosts until the route is fixed.

Comment 4 Mike Burns 2014-09-17 01:23:14 UTC
Scott, Marek, I'm not sure if this is something to fix in the kickstart or in staypuft somewhere.  Can you take a look?

Comment 5 Marek Hulan 2014-09-17 08:59:22 UTC
There's a configuration in the kickstart template that disables DEFROUTE on the provisioning interface for the Controller. I'm afraid we currently can't select which interface gets DEFROUTE and configure it. I wonder why it appeared in this version; we didn't set it before either. Scott, was it somehow configured from the puppet modules?

Comment 6 Scott Seago 2014-09-18 17:46:29 UTC
So, after talking to Lars and others, it looks like the main difference from this bug as reported is that we should use the Public API network for the gateway, not the External network. Otherwise this looks about right.

Comment 7 Mike Burns 2014-09-18 19:53:00 UTC
Summary:

Public API is used when available
When not available, use Provisioning

This does not need to be configurable in the near future.

There was some concern over whether setting this in the kickstart breaks things.  One possible way to do this is to make the configuration of the default gateway the *last* step in the %post section of the kickstart.
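
The selection rule summarized above can be sketched as a small shell function. This is only an illustration: `pick_gateway_iface` and its two arguments are hypothetical names, and the shipped template may compute the interface differently.

```shell
# Choose the interface that should carry the default gateway.
#   $1: interface on the Public API network (empty if the host has none)
#   $2: provisioning interface
# pick_gateway_iface is a hypothetical name used only for illustration.
pick_gateway_iface() {
    public_api_iface=$1
    provision_iface=$2
    if [ -n "$public_api_iface" ]; then
        # Public API is used when available.
        echo "$public_api_iface"
    else
        # Otherwise fall back to Provisioning.
        echo "$provision_iface"
    fi
}
```

For example, `pick_gateway_iface ens8 ens7` prints `ens8`, while `pick_gateway_iface "" ens7` prints `ens7`.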

Comment 8 Alexander Chuzhoy 2014-09-18 20:15:38 UTC
Sometimes there's no default gateway at all on the controllers.
Sometimes all the default gateways are configured properly.

Comment 9 Scott Seago 2014-09-19 03:53:12 UTC
PR https://github.com/theforeman/staypuft/pull/309 addresses the rubygem-staypuft side of things. To reference the gateway subnet or interface in the kickstart template, call one of these methods:

@host.network_query.gateway_subnet
@host.network_query.gateway_interface

Comment 10 Marek Hulan 2014-09-19 13:38:18 UTC
Kickstart template ready for testing in https://github.com/theforeman/foreman-installer-staypuft/pull/92

Comment 11 Mike Burns 2014-09-19 21:53:05 UTC
Latest patch in the kickstart appears to cause a reboot loop.  The host never boots from disk after the provisioning.

Comment 12 Alexander Chuzhoy 2014-09-19 22:46:20 UTC
The latest patch still doesn't resolve the issue - there's no default gateway.

There's actually "DEFROUTE=no" in the external NIC's config file.

Comment 13 Mike Burns 2014-09-19 23:00:15 UTC
Just for clarity, it's the public_api network that matters, not the external network.

Even with that statement, this still isn't fixed.

Comment 15 Marek Hulan 2014-09-22 08:52:14 UTC
Could you please check the result of kickstart template? (Host detail -> Templates tab -> provisioning template -> review). Is the name of DEFROUTE_IFACE= correct? Could you upload the installer log from %post? We may need to detect new interface name using MAC address (renaming issue).
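
A rough sketch of what detecting the interface name from the MAC address could look like, assuming the usual sysfs layout where each device directory holds an `address` file. The helper name `iface_by_mac` and the optional directory argument are hypothetical; this is not the code from the PRs.

```shell
# Print the name of the interface whose MAC address matches $1.
#   $2: sysfs net directory, defaults to /sys/class/net. It is a parameter
#       only so the lookup can be tried against a scratch directory.
# iface_by_mac is a hypothetical helper name.
iface_by_mac() {
    # sysfs reports MAC addresses in lower case, so normalize the input.
    mac=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    netdir=${2:-/sys/class/net}
    for dev in "$netdir"/*; do
        [ -r "$dev/address" ] || continue
        if [ "$(cat "$dev/address")" = "$mac" ]; then
            echo "${dev##*/}"
            return 0
        fi
    done
    # No interface carries this MAC address.
    return 1
}
```

Unlike a hard-coded name, the result follows whatever name the installed system gives the NIC. The first match wins, so devices that share a MAC address (bonds, for instance) would need extra handling.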

Comment 16 Alexander Chuzhoy 2014-09-22 14:39:30 UTC
The DEFROUTE_IFACE is correct, pointing to the external NIC.
I noticed the following, which seems to be the cause of the issue:
The line   'if [ "$i" = "$DEFROUTE_IFACE"]; then', as it appears below, has no space between IFACE" and ]

Adding the space there appears to resolve the issue. The outputs are:
1. Without the space:
setting DEFROUTE=no on ens8

2. With the space:
setting DEFROUTE=yes on ens8
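
The reason the missing space matters can be reproduced outside the kickstart. Without the space, `"$DEFROUTE_IFACE"]` expands to the single word `ens8]`, so `[` never receives its closing `]` argument, exits with an error status, and the `if` falls through to the `else` branch. The snippet below is a standalone illustration; the variables `broken` and `fixed` are introduced only for this demonstration.

```shell
i="ens8"
DEFROUTE_IFACE="ens8"

# Broken form from the template: the last argument becomes "ens8]", so [
# has no closing ] and returns a non-zero status (error message silenced).
if [ "$i" = "$DEFROUTE_IFACE"] 2>/dev/null; then
    broken="DEFROUTE=yes"
else
    broken="DEFROUTE=no"
fi

# Corrected form: ] is a separate argument, so the comparison runs.
if [ "$i" = "$DEFROUTE_IFACE" ]; then
    fixed="DEFROUTE=yes"
else
    fixed="DEFROUTE=no"
fi

echo "without the space: $broken"
echo "with the space:    $fixed"
```

The first line reports `DEFROUTE=no` and the second `DEFROUTE=yes`, matching the two outputs observed on ens8.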





Below are the %post sections from the kickstart:

%post --nochroot
exec < /dev/tty3 > /dev/tty3
#changing to VT 3 so that we can see whats going on....
/usr/bin/chvt 3
(
cp -va /etc/resolv.conf /mnt/sysimage/etc/resolv.conf
/usr/bin/chvt 1
) 2>&1 | tee /mnt/sysimage/root/install.postnochroot.log
%end

%post
logger "Starting anaconda maca25400702875.example.com postinstall"
exec < /dev/tty3 > /dev/tty3
#changing to VT 3 so that we can see whats going on....
/usr/bin/chvt 3
(


#update local time
echo "updating system time"
/usr/sbin/ntpdate -sub clock.redhat.com
/usr/sbin/hwclock --systohc

#disable NetworkManager and enable network
chkconfig NetworkManager off
chkconfig network on

# setup SSH key for root user
mkdir --mode=700 /root/.ssh
cat >> /root/.ssh/authorized_keys << PUBLIC_KEY

PUBLIC_KEY
chmod 600 /root/.ssh/authorized_keys



# Red Hat Registration Snippet
#
# Set these parameters if you're using rhnreg_ks:
#
#   spacewalk_type = 'site'     (local Spacewalk/Satellite server)
#                  = 'hosted'   (RHN hosted)
#   spacewalk_host = <hostname> (hostname of Spacewalk server, optional for
#                                RHN hosted)
#
# Set these parameters if you're using subscription-manager:
#
#   subscription_manager = 'true' (you're going to use subscription-manager)
#
#   subscription_manager_username = <username> (if using hosted RHN)
#
#   subscription_manager_password = <password> (if using hosted RHN)
#
#   subscription_manager_host = <hostname> (hostname of SAM/Katello
#                                           installation, if using SAM)
#
#   subscription_manager_org = <org name> (organization name, if using
#                                          SAM/Katello)
#
#   subscription_manager_repos = <repos> (comma separated list of repos (like
#                                         rhel-6-server-optional-rpms) to
#                                         enable after registration)
#
#   subscription_manager_pool = <pool> (specific pool to be used for
#                                       registration)
#
#   http-proxy = <host> (proxy hostname to be used for registration)
#
#   http-proxy-port = <port> (proxy port to be used for registration)
#
#   http-proxy-user = <user> (proxy user to be used for registration)
#
#   http-proxy-password = <password> (proxy password to be
#                                           used for registration)
#
# Set this parameter regardless of which registration method you're using:
#
#   activation_key = <key>      (activation key string, not needed if using
#                                subscription-manager with hosted RHN)
#


  

  
    # Not registering - host.params['activation_key'] not found.
  

# End Red Hat Registration Snippet

yum remove puppet -y 
find /var/lib/puppet/ssl -type f -delete 
mv /etc/yum.repos.d/* /root/ 
yum localinstall -y  http://team.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm 
rhos-release 5 
yum clean all


# update all the base packages from the updates repository
yum -t -y -e 0 update

# ensure firewalld is absent (BZ#1125075)
yum -t -y -e 0 remove firewalld


# and add the puppet package
yum -t -y -e 0 install puppet

echo "Configuring puppet"
cat > /etc/puppet/puppet.conf << EOF

[main]
vardir = /var/lib/puppet
logdir = /var/log/puppet
rundir = /var/run/puppet
ssldir = \$vardir/ssl

[agent]
pluginsync      = true
report          = true
ignoreschedules = true
daemon          = false
ca_server       = staypuft.example.com
certname        = maca25400702875.example.com
environment     = production
server          = staypuft.example.com

EOF

# Setup puppet to run on system reboot
/sbin/chkconfig --level 345 puppet on

/usr/bin/puppet agent --config /etc/puppet/puppet.conf -o --tags no_such_tag --server staypuft.example.com --no-daemonize


sync

# Inform the build system that we are done.
echo "Informing Foreman that we are built"
wget -q -O /dev/null --no-check-certificate http://staypuft.example.com:80/unattended/built


real=`ip -o link | grep a2:54:00:70:28:75 | awk '{print $2;}' | sed s/://`

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-$real
BOOTPROTO="dhcp"
DEVICE="$real"
HWADDR="a2:54:00:70:28:75"
ONBOOT=yes
NM_CONTROLLED=no
EOF




real=`ip -o link | grep 52:54:00:02:dd:e9 | awk '{print $2;}' | sed s/:$//`

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-$real
BOOTPROTO="dhcp"
DEVICE="$real"
HWADDR="52:54:00:02:dd:e9"
ONBOOT=yes
PEERDNS=no
PEERROUTES=no
NM_CONTROLLED=no
EOF



# get name of provisioning interface
PROVISION_IFACE=$(ip route  | awk '$1 == "default" {print $5}' | head -1)
echo "found provisioning interface = $PROVISION_IFACE"
DEFROUTE_IFACE="ens8"

IFACES=$(ls -d /sys/class/net/* | while read iface; do readlink $iface | grep -q virtual || echo ${iface##*/}; done)
for i in $IFACES; do
    sed -i 's/ONBOOT.*/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-$i
    if [ "$i" != "$PROVISION_IFACE" ]; then
        echo "setting PEERDNS=no on $i"
        sed -i '
            /PEERDNS/ d
            $ a\PEERDNS=no
        ' /etc/sysconfig/network-scripts/ifcfg-$i
    fi

    if [ "$i" = "$DEFROUTE_IFACE"]; then
        echo "setting DEFROUTE=yes on $i"
        sed -i '
            /DEFROUTE/ d
            $ a\DEFROUTE=yes
        ' /etc/sysconfig/network-scripts/ifcfg-$i
    else
        echo "setting DEFROUTE=no on $i"
        sed -i '
            /DEFROUTE/ d
            $ a\DEFROUTE=no
        ' /etc/sysconfig/network-scripts/ifcfg-$i
    fi
done

service network restart

# Sleeping an hour for debug
) 2>&1 | tee /root/install.post.log
exit 0
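
A side note on the delete-then-append sed idiom used in the loop above: when the matching line is the last line of the file, `d` ends the cycle before `$ a` runs, so the key is removed and never re-added. That does not happen in the single pass shown here, but it would if the loop ran a second time. A more defensive variant, sketched under the assumption that ifcfg files are plain KEY=VALUE lines, could look like the following. The helper name `set_ifcfg_key` is hypothetical and not part of the shipped template.

```shell
# Set KEY=VALUE in an ifcfg-style file, replacing any existing KEY= line.
# set_ifcfg_key is a hypothetical helper, not part of the shipped template.
set_ifcfg_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing KEY= line. grep exits non-zero when
    # it selects nothing, which is acceptable here.
    grep -v "^${key}=" "$file" > "$tmp" || true
    echo "${key}=${value}" >> "$tmp"
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Calling `set_ifcfg_key /etc/sysconfig/network-scripts/ifcfg-ens8 DEFROUTE yes` twice leaves exactly one `DEFROUTE=yes` line in the file.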

Comment 17 Marek Hulan 2014-09-23 08:48:10 UTC
So if I understand correctly, the whitespace fixed it for you. I also added a fix for a possible interface renaming issue. Here is a list of the current PRs related to this BZ:

installer:
https://github.com/theforeman/foreman-installer-staypuft/pull/92
staypuft:
https://github.com/theforeman/staypuft/pull/309 (merged)
https://github.com/theforeman/staypuft/pull/316

Comment 18 Marek Hulan 2014-09-23 12:44:32 UTC
All PRs got merged.

Comment 21 Alexander Chuzhoy 2014-09-24 18:04:11 UTC
Verified: rhel-osp-installer-0.3.6-1.el6ost.noarch

Keeping comment #13 in mind, this bug seems to be fixed. The default gateway resides on the network where the public API role resides. If a host doesn't have a NIC in that network (for example, a compute host in a Neutron deployment), the default gateway is set on the NIC used for provisioning.

Comment 22 Summer Long 2014-09-29 23:22:06 UTC
Mike, the doc text looks odd. Doesn't describe the problem or fix. Is there something else going on or can I use the following?

"Previously, the kickstart template for the RHEL OpenStack Installer assigned the wrong NIC to the default gateway.  With this update, a white-space error in the template has been fixed, and the gateway is now assigned to the network that has the public API. If the host does not have a NIC in that network (for example, a compute host in an OpenStack Networking deployment), the default gateway is set to the NIC used for provisioning." thanks, Summer

Comment 23 Mike Burns 2014-09-30 12:36:38 UTC
(In reply to Summer Long from comment #22)
> Mike, the doc text looks odd. Doesn't describe the problem or fix. Is there
> something else going on or can I use the following?
> 
> "Previously, the kickstart template for the RHEL OpenStack Installer
> assigned the wrong NIC to the default gateway.  With this update, a
> white-space error in the template has been fixed, and the gateway is now
> assigned to the network that has the public API. If the host does not have a
> NIC in that network (for example, a compute host in an OpenStack Networking
> deployment), the default gateway is set to the NIC used for provisioning."
> thanks, Summer

"Previously, the kickstart template for the RHEL OpenStack Installer assigned the default gateway to the wrong NIC.  With this update, the gateway is now assigned to the subnet with public API traffic. If the host does not have a NIC in that network (for example, a compute host in an OpenStack Networking deployment), the default gateway is set to the NIC used for provisioning."

Comment 24 Mike Burns 2014-09-30 12:39:39 UTC
*** Bug 1144862 has been marked as a duplicate of this bug. ***

Comment 25 Scott Lewis 2014-10-02 12:56:35 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1350.html