Bug 1292919 - Director introspection caused disruption of IBM server IMM (IPMI) network interfaces
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-discoverd
Version: 8.0 (Liberty)
Hardware: x86_64 All
Priority: unspecified Severity: low
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: RHOS Maint
QA Contact: Shai Revivo
Depends On:
Blocks:
Reported: 2015-12-18 13:20 EST by Bradford Nichols
Modified: 2016-10-14 15:49 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-14 15:49:45 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Bradford Nichols 2015-12-18 13:20:04 EST
Description of problem:
Introspection caused a reset of the IBM server IMM (IPMI) modules, which caused them to lose network connectivity when configured to use the server NIC. Configuring the servers' IMMs (IPMI) to use their embedded NICs eliminated the issue.

Version-Release number of selected component (if applicable):
Beta2

How reproducible:
Yes

Steps to Reproduce:
1. Configure your IMM (IPMI) modules to use a server NIC and not the embedded NIC
2. Run introspection

Actual results:
The reset of the modules left them unable to get back their DHCP reservations, and they became unavailable.

Expected results:
No disruption or change of a configured IMM (IPMI) as a side effect of introspection.

Additional info:
Servers: IBM x3550 M5
FW: DSA 10.0, IMM2 1.02, UEFI 1.03, Bootcode 1.38

In a pool of 8 IBM x3550 servers, after introspection we found that 3 servers’ IMM (IPMI) management interfaces were unreachable on the network.

On further investigation, we found that the IMM modules of the impacted servers were configured to use the server’s native first NIC instead of the IMM module’s own dedicated NIC, and that those IMMs had a DHCP-provided IP address keyed to their MAC addresses in the network environment. It seems something in the introspection caused the IMMs to reset. After the reset, the modules did not manage to get an IP address back.

The 7 unaffected servers were configured to use the IMM’s embedded NIC.  

The same equipment had been used to do a 7.0 OSP install 8 weeks ago and during that introspection the IMM outage did not occur. 

We’ve now switched the 3 servers over to use the embedded IMM NIC, and have been able to continue our OSP 8.0 beta2 install.
/Brad

Reference to Red Hat internal CEE Lab ticket:

 INC0342532
2015-12-09 11:04:31 EST - Brian Rhatigan - Customer update
Hi Brad,
 
The IMMs are available now. I found the 3 servers were powered off so I powered them up and did find an IMM configuration issue probably going back to when they were first deployed as ceph servers. The IMM was configured to "share" the first ethernet interface (even though there is a dedicated interface for the IMM *and* it was cabled). So I converted them to use the dedicated interface (which then caused their MAC address to change). I updated the DHCP server with their new MAC and they instantly pulled the appropriate IP address. I would imagine the other servers have the same configuration issue as the same person set them up so I can't blame the issue on it as they seem to be working fine. But all 3 that I looked at were unable to pull an IP address prior to me making the changes...
 
This might be something to consider fixing down the road if we see the issue crop up again on the other servers.
 
Anyway, you should be good to go. Closing this ticket.
 
Thanks,
Brian
 
If you feel that your request should not be closed yet, please reply to this email and let us know. We want to make sure we are providing a resolution as quickly as possible.
 
 
Ref:MSG5671759
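
Since the outage came down to DHCP reservations keyed to the IMM's MAC address, a quick check before and after introspection is to compare the MAC and IP the IMM reports against the reservation table. The following is a minimal sketch only, not part of the original report: it assumes `ipmitool` is installed and run locally on the server, that the IMM sits on LAN channel 1, and that the reservations are available as a dict of lowercase MAC to IP (a hypothetical export from the DHCP server). The function names are illustrative.

```python
import subprocess


def parse_lan_print(text):
    """Parse `ipmitool lan print` style output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key:  # skip continuation lines that carry no field name
            fields[key] = value.strip()
    return fields


def reservation_mismatch(lan_fields, reservations):
    """Return a description of the problem, or None if everything matches.

    reservations: dict of lowercase MAC -> reserved IP (hypothetical
    export from the DHCP server; the format is an assumption).
    """
    mac = lan_fields.get("MAC Address", "").lower()
    if not mac:
        return "no MAC address reported"
    if mac not in reservations:
        return f"MAC {mac} has no DHCP reservation"
    ip = lan_fields.get("IP Address", "")
    if ip != reservations[mac]:
        return f"MAC {mac} holds {ip or 'no address'}, reserved {reservations[mac]}"
    return None


def read_lan_config(channel=1):
    """Run ipmitool in-band; channel 1 is an assumption for this hardware."""
    result = subprocess.run(
        ["ipmitool", "lan", "print", str(channel)],
        capture_output=True, text=True, check=True,
    )
    return parse_lan_print(result.stdout)
```

Running `reservation_mismatch(read_lan_config(), reservations)` on each node would flag an IMM whose MAC changed when it was moved between the shared and dedicated interface, which is the situation the lab ticket describes.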
Comment 2 Mike Burns 2016-04-07 17:00:12 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
