Description of problem:

In a recent RHCI deployment setting up RHEV + CFME, I got into trouble while provisioning the two RHEV nodes, and despite our efforts we were unable to get back on track.

The blades in our environment are old and need manual rebooting and PXE selection during discovery and provisioning. While the unified installer was waiting for the hypervisor system to deploy (I needed to reboot it and make sure it PXE booted off the right NIC), I accidentally rebooted the engine system. It re-registered with Satellite, which then showed two entries in Discovered Hosts: one with the name I had set in the unified installer, and one that was just the MAC address.

After the hypervisor was successfully deployed, I started the process for the engine, but when it rebooted to be provisioned it was blocked: its MAC and IP addresses were recognized as duplicates of another system, and the system just looped, complaining about that. We ended up deleting both entries from Discovered Hosts and manually provisioning the engine system, giving it the same name as originally specified in the unified installer. After the system was provisioned and online, we thought we were back on track, but the unified installer never recognized it as the system it was waiting for and eventually timed out with the error "Couldn't find Host::Base with id=5".

It would be great if we could make this process more robust. Having to throw out the deployment and start over hurts. I don't know whether we could recognize duplicate hosts and keep them from breaking the deployment, or provide a way to manually indicate that the node the installer was looking for had issues and that a different node should be used instead, or something smarter.

I imagine issues like this would be harder to run into in a more modern environment, but mistakes happen, and it would be nice to have more leeway to get back on track when troubleshooting is needed.

Version-Release number of selected component (if applicable): 1-22 build
Created attachment 1119302: Error message displayed on Installation Progress screen
The root of the problem is that rebooting a discovered host after it has been renamed on the RHEV configuration pages, but before it is converted to a managed host, creates a duplicate discovered host entry with a conflicting IP and MAC address. This will generally also cause the conversion from discovered host to managed host to fail. The simplest strategy is probably to not rename the discovered host until we convert it to a managed host.
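To illustrate the duplicate condition described above, here is a minimal sketch of how conflicting discovered host entries could be spotted by MAC address. It is not the installer's or Satellite's actual code: the record layout, the host names, the addresses, and the function name `find_duplicate_discovered_hosts` are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_discovered_hosts(hosts):
    """Return {mac: [names]} for every MAC that appears on more than one entry.

    `hosts` is a list of dicts with hypothetical "name" and "mac" keys;
    real discovered-host records may be shaped differently.
    """
    names_by_mac = defaultdict(list)
    for host in hosts:
        # Normalize case so the same NIC is not counted as two different MACs.
        names_by_mac[host["mac"].lower()].append(host["name"])
    return {mac: names for mac, names in names_by_mac.items() if len(names) > 1}


# Hypothetical data: the renamed engine entry plus the entry created by the
# accidental reboot, which is named after the MAC address.
discovered = [
    {"name": "rhev-engine", "mac": "52:54:00:AA:BB:01", "ip": "192.0.2.10"},
    {"name": "mac525400aabb01", "mac": "52:54:00:aa:bb:01", "ip": "192.0.2.10"},
    {"name": "rhev-hypervisor", "mac": "52:54:00:aa:bb:02", "ip": "192.0.2.11"},
]
duplicates = find_duplicate_discovered_hosts(discovered)
```

With this sample data, only the engine's MAC is reported, with both of its entry names, which is the situation that blocked provisioning in this report.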
We are deferring this to post-GA. When we come back to working on this, we would like the fix to ensure that no changes are made to the state of Satellite until the deploy button is clicked; i.e., we should not make any name changes to the discovered hosts during the UI selection process. We want to queue the changes and execute them only after deploy is clicked.
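The queueing approach proposed above could look roughly like the following sketch. It only illustrates the idea of deferring renames until deploy; the class name `PendingHostChanges`, the `rename_host` callback, the host id, and the host names are assumptions for illustration, not part of the installer or of any Satellite API.

```python
class PendingHostChanges:
    """Collects host renames chosen in the UI without touching Satellite."""

    def __init__(self):
        # host id -> requested name; a later selection replaces an earlier one
        self._renames = {}

    def queue_rename(self, host_id, new_name):
        """Record the rename locally; nothing is sent anywhere yet."""
        self._renames[host_id] = new_name

    def apply(self, rename_host):
        """Run the queued renames through the given callback, then clear the queue.

        `rename_host(host_id, new_name)` stands in for whatever call would
        actually perform the rename; it is a hypothetical hook.
        """
        applied = []
        for host_id, new_name in self._renames.items():
            rename_host(host_id, new_name)
            applied.append((host_id, new_name))
        self._renames.clear()
        return applied


# A plain dict stands in for Satellite's view of discovered host names.
satellite_names = {5: "mac525400aabb01"}

pending = PendingHostChanges()
pending.queue_rename(5, "rhev-engine")         # user picks a name in the UI
names_before_deploy = dict(satellite_names)    # still unchanged at this point

# Deploy is clicked: only now are the queued renames executed.
applied = pending.apply(satellite_names.__setitem__)
```

Because the rename exists only in the queue until deploy, a reboot of the discovered host during UI selection would re-register it under the same unrenamed entry rather than alongside a renamed one.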