Thanks Eduard. This is a bit confusing, so let me try to back up. The issue seems to be that the enp8s0 interface in the bond is not coming up. That doesn't seem to be related to the messages with:
"Failed to start DHCP interface br/ex"
Do you agree?
This "Failed to start DHCP interface br/ex" message was due to the '-' in the bridge name not being handled properly, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1403795
It doesn't appear that this patch would fix the problem, as it resolves a cosmetic issue only. We shouldn't need to start DHCP on br-ex as it's not a proper interface.
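For reference, here is a rough sketch of why the name would be printed as br/ex. It assumes the message comes from a templated systemd unit whose description uses the unescaped instance specifier (%I); I haven't confirmed the unit file itself, and systemd_unescape is just an illustrative helper that approximates systemd's unescaping rules, not a real API.

```python
import re


def systemd_unescape(instance: str) -> str:
    """Approximate how systemd unescapes a unit instance name (the %I form)."""
    # In an escaped unit name, '-' stands for '/'.
    text = instance.replace("-", "/")
    # Escaped bytes are written as \xNN (for example \x2d for a literal '-').
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


print(systemd_unescape("br-ex"))       # br/ex
print(systemd_unescape(r"br\x2dex"))   # br-ex
```

If that assumption holds, the bridge itself is still named br-ex and only the text of the log message is affected, which fits with this being cosmetic.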
So what is the issue? From the case it seems to be:
"The problem is that after an unexpected failure of compute node (which happens someteimes), on reboot it doesn't bring up interfaces and requires a manual operation. This means that a outage caused by unexpected reboot that will be fixed by itself in 5-10 minutes is not longer resolving itself and requires a manual operation after reboot that could make the issue to last until 1 hour (nightly operation that requires someone on-call to be woken up, etc). Here the importance in solving this"
I've been looking through the sosreports but haven't been able to pinpoint any issues with enp7s0 or enp8s0 coming up. I do see these cloud-init messages:
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: 2017-11-14 17:31:25,453 - stages.py[WARNING]: Failed to rename devices: [unknown] Error performing rename('enp7s0', 'br-bond1') for 00:25:b5:25:0a:6a, br-bond1: Unexpected error while running command.
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Command: ['ip', 'link', 'set', 'enp7s0', 'name', 'br-bond1']
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Exit code: 2
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Reason: -
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Stdout: -
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Stderr: RTNETLINK answers: File exists
But I'm not sure whether those are indicative of a problem.
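One possible reading, which I haven't confirmed from the sosreports: "RTNETLINK answers: File exists" is what ip prints when the kernel returns EEXIST, and for a rename that would mean a device called br-bond1 was already present when cloud-init tried to rename enp7s0. A minimal sketch of that check on a Linux host follows; rename_would_collide and its sysfs_net parameter are hypothetical names used only for illustration.

```python
import os


def rename_would_collide(old: str, new: str, sysfs_net: str = "/sys/class/net") -> bool:
    """Return True if a network device named `new` already exists (other than `old`)."""
    if old == new:
        return False
    # On Linux each network device has an entry under /sys/class/net/<name>.
    return os.path.exists(os.path.join(sysfs_net, new))


if __name__ == "__main__":
    # Names taken from the cloud-init messages above.
    print(rename_would_collide("enp7s0", "br-bond1"))
```

If this prints True on the compute node after boot, the rename warning may simply reflect that br-bond1 had already been created, rather than a failure of enp7s0 itself.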
It seems from Comment 1 that you're not seeing a problem either. Is it just that we need to backport the fix to remove the "Failed to start DHCP interface br/ex" messages? Thanks.