1125147 – During deployment, my host lost its networking and needed a 'service network restart'

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	rubygem-staypuft
Sub Component:
Version:	5.0 (RHEL 7)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	Installer
Assignee:	Lars Kellogg-Stedman
QA Contact:	Omri Hochman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-07-31 07:56 UTC by Udi Kalifon
Modified:	2014-08-07 13:42 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-08-07 13:42:46 UTC
Target Upstream Version:
Embargoed:

Description Udi Kalifon 2014-07-31 07:56:09 UTC

Description of problem:
I left the system running overnight to deploy an OpenStack installation with Neutron. In the morning it was impossible to ping or ssh to the host machine. I connected to the management console, ran "ip a", and saw that no interface had a valid IPv4 address. I restarted the network service and things went back to normal.

This is from the output of "ip a" before restarting the network:
6: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 00:9c:02:b0:8f:60 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::29c:2ff:feb0:8f60/64 scope link 
       valid_lft forever preferred_lft forever

and after restarting the network:
6: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 00:9c:02:b0:8f:60 brd ff:ff:ff:ff:ff:ff
    inet 10.35.160.13/24 brd 10.35.160.255 scope global dynamic br0
       valid_lft 43179sec preferred_lft 43179sec
    inet6 fe80::29c:2ff:feb0:8f60/64 scope link 
       valid_lft forever preferred_lft forever
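
The difference between the two outputs is the missing "inet" (IPv4) line on br0. As a rough illustration only, the check could be scripted along these lines. This is a minimal sketch, not something that was run as part of this report; the function name interfaces_without_ipv4 and the sample constants are hypothetical, and the parser assumes the "ip a" text layout shown above.

```python
import re

def interfaces_without_ipv4(ip_a_output):
    """Return the names of interfaces that have no 'inet' (IPv4) line."""
    has_ipv4 = {}
    current = None
    for line in ip_a_output.splitlines():
        # Interface header lines look like "6: br0: <FLAGS> mtu ..."
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            has_ipv4[current] = False
        # "inet " followed by whitespace is IPv4; "inet6" does not match
        elif current and re.match(r"^\s+inet\s", line):
            has_ipv4[current] = True
    return [name for name, ok in has_ipv4.items() if not ok]

# Sample text copied from the two "ip a" excerpts in this report
BEFORE_RESTART = """6: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:9c:02:b0:8f:60 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::29c:2ff:feb0:8f60/64 scope link
       valid_lft forever preferred_lft forever
"""

AFTER_RESTART = """6: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:9c:02:b0:8f:60 brd ff:ff:ff:ff:ff:ff
    inet 10.35.160.13/24 brd 10.35.160.255 scope global dynamic br0
       valid_lft 43179sec preferred_lft 43179sec
    inet6 fe80::29c:2ff:feb0:8f60/64 scope link
       valid_lft forever preferred_lft forever
"""

print(interfaces_without_ipv4(BEFORE_RESTART))
print(interfaces_without_ipv4(AFTER_RESTART))
```

Run against the excerpts above, the first call reports br0 and the second reports nothing.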


Version-Release number of selected component (if applicable):
Foreman puddle from July 29th.


How reproducible:
Randomly


Steps to Reproduce:
1. Deploy using Foreman according to the instructions: http://etherpad.corp.redhat.com/Create-staypuft-test-setup

Comment 2 Mike Burns 2014-07-31 11:31:03 UTC

Any logs? Any indication why networking was lost? Any puppet issues listed? How reproducible is it? "Randomly" doesn't really tell me much: is it every other time, 1 out of 100, 95 out of 100?

Comment 5 Mike Burns 2014-08-05 14:26:52 UTC

If this happens again, please let us know so we can look for more information.

Comment 7 Udi Kalifon 2014-08-07 08:00:31 UTC

This happened again. Mail was sent to Mike and Lars with the details of the server to connect to.

Comment 8 Lars Kellogg-Stedman 2014-08-07 13:42:46 UTC

This is a networking problem occurring on the physical host that is hosting the Staypuft VM. Staypuft is not running on the affected host, nor is it configuring the affected host.

Looking in the logs, I see:

Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: DHCPREQUEST on br0 to 10.35.28.1 port 67 (xid=0x215a303a)
Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: DHCPACK from 10.35.28.1 (xid=0x215a303a)
Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: suspect value in domain_name option - discarded
Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: suspect value in domain_name option - discarded
Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: DHCPDECLINE on br0 to 255.255.255.255 port 67 (xid=0x215a303a)

The DHCPDECLINE message indicates that dhclient has determined that the address provided by the DHCP server is not usable, typically because it looks as if another host on the network is already using the given IP address.

After the DHCPDECLINE message, there is no more activity from dhclient, suggesting that this error condition causes it to stop requesting a lease.  I'm not sure that this is expected or desired behavior, but in any case it is unrelated to Staypuft.
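
To spot the same pattern on other hosts, the log can be scanned for a DHCPACK that is followed by a DHCPDECLINE carrying the same transaction id (xid). The following is only a sketch under assumptions: the function name declined_transactions is hypothetical, and the regular expression assumes the dhclient log line format shown in the excerpt above.

```python
import re

# Matches e.g. "dhclient[2340]: DHCPACK from 10.35.28.1 (xid=0x215a303a)"
DHCP_LINE = re.compile(
    r"dhclient\[\d+\]: (DHCP[A-Z]+)\b.*\(xid=(0x[0-9a-f]+)\)"
)

def declined_transactions(log_lines):
    """Return xids for which an offered lease was ACKed and then declined."""
    acked = set()
    declined = []
    for line in log_lines:
        match = DHCP_LINE.search(line)
        if not match:
            continue  # not a DHCP message line (e.g. "suspect value ...")
        message, xid = match.groups()
        if message == "DHCPACK":
            acked.add(xid)
        elif message == "DHCPDECLINE" and xid in acked:
            declined.append(xid)
    return declined

# Log lines copied from the excerpt in this comment
SAMPLE_LOG = [
    "Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: DHCPREQUEST on br0 to 10.35.28.1 port 67 (xid=0x215a303a)",
    "Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: DHCPACK from 10.35.28.1 (xid=0x215a303a)",
    "Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: suspect value in domain_name option - discarded",
    "Aug 06 23:11:41 puma03.scl.lab.tlv.redhat.com dhclient[2340]: DHCPDECLINE on br0 to 255.255.255.255 port 67 (xid=0x215a303a)",
]

print(declined_transactions(SAMPLE_LOG))
```

A non-empty result only shows that a lease was declined; it does not say which other host, if any, holds the address.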
