Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1553099

Summary: interface on compute node is not started after reboot
Product: Red Hat OpenStack
Reporter: Eduard Barrera <ebarrera>
Component: os-net-config
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: Shai Revivo <srevivo>
Severity: medium
Docs Contact:
Priority: low
Version: 7.0 (Kilo)
CC: aschultz, bfournie, dsneddon, ebarrera, hbrock, jmelvin, jraju, jslagle, mburns, mori, pablo.iranzo, rcernin, rhel-osp-director-maint, rhos-maint, srevivo, ssigwald
Target Milestone: ---
Keywords: Reopened, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1267169
Environment:
Last Closed: 2018-04-16 12:27:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1267169, 1579831, 1585763, 1585764
Bug Blocks:

Comment 4 Bob Fournier 2018-03-29 14:49:19 UTC
Thanks Eduard.  This is a bit confusing, let me try to back up.  The issue seems to be that the enp8s0 interface in the bond is not coming up.  That seems to not be related to the messages with:
"Failed to start DHCP interface br/ex"

Do you agree?

This "Failed to start DHCP interface br/ex" message was due to the '_' in the bridge name not being handled properly, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1403795

It doesn't appear that this patch would fix the problem, as it resolves a cosmetic issue only.  We shouldn't need to start DHCP on br-ex as it's not a proper interface.

So what is the issue?  From the case it seems to be:
"The problem is that after an unexpected failure of a compute node (which happens sometimes), on reboot it doesn't bring up interfaces and requires a manual operation. This means that an outage caused by an unexpected reboot, which would otherwise fix itself in 5-10 minutes, no longer resolves itself and requires a manual operation after reboot that could make the issue last up to 1 hour (nightly operation that requires someone on-call to be woken up, etc). Hence the importance of solving this"

I've been looking through the sosreports but haven't been able to pinpoint any issues with enp7s0 or enp8s0 coming up.  I do see these cloud-init messages:
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: 2017-11-14 17:31:25,453 - stages.py[WARNING]: Failed to rename devices: [unknown] Error performing rename('enp7s0', 'br-bond1') for 00:25:b5:25:0a:6a, br-bond1: Unexpected error while running command.
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Command: ['ip', 'link', 'set', 'enp7s0', 'name', 'br-bond1']
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Exit code: 2
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Reason: -
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Stdout: -
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Stderr: RTNETLINK answers: File exists

But I'm not sure whether those are indicative of a problem.
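
For anyone scanning the sosreports for the same pattern, here is a minimal Python sketch that pulls the failed rename attempts out of the cloud-init log lines. The regular expression is based only on the single sample above, so the message format is an assumption and may differ across cloud-init versions; the function name find_failed_renames is hypothetical. Note that "RTNETLINK answers: File exists" from ip link typically means a device with the target name (here br-bond1) already exists, so the rename cannot succeed.

```python
import re

# Matches the cloud-init warning quoted above. The format is inferred from
# one log sample only (an assumption), not from cloud-init documentation.
RENAME_FAILURE = re.compile(
    r"Error performing rename\('(?P<src>[^']+)', '(?P<dst>[^']+)'\) "
    r"for (?P<mac>[0-9a-fA-F:]{17})"
)


def find_failed_renames(lines):
    """Return a (source, target, mac) tuple for each failed rename found."""
    failures = []
    for line in lines:
        match = RENAME_FAILURE.search(line)
        if match:
            failures.append((match["src"], match["dst"], match["mac"]))
    return failures


# Two of the log lines from this comment, used as sample input.
sample = [
    "Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: 2017-11-14 17:31:25,453 - "
    "stages.py[WARNING]: Failed to rename devices: [unknown] Error performing "
    "rename('enp7s0', 'br-bond1') for 00:25:b5:25:0a:6a, br-bond1: "
    "Unexpected error while running command.",
    "Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Exit code: 2",
]

for src, dst, mac in find_failed_renames(sample):
    print(f"rename failed: {src} -> {dst} (MAC {mac})")
```

Running it over the full messages file from each boot would show whether the rename is attempted for enp8s0 as well, which might help correlate it with the interface that stays down.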

It seems from Comment 1 that you're not seeing a problem either.  Is it just that we need to backport the fix to remove the "Failed to start DHCP interface br/ex" messages?  Thanks.

Comment 6 Bob Fournier 2018-04-16 12:27:40 UTC
Thanks for the clarification Eduard. https://bugzilla.redhat.com/show_bug.cgi?id=1267169 has been flagged for inclusion in OSP-10.  Closing this one out.