Bug 1830350

Summary: ironic mac ports not deleted on node (worker) deletion
Product: OpenShift Container Platform Reporter: Dave Wilson <dwilson>
Component: Bare Metal Hardware Provisioning Assignee: Steven Hardy <shardy>
Bare Metal Hardware Provisioning sub component: baremetal-operator QA Contact: Amit Ugol <augol>
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified CC: derekh, hpokorny
Version: 4.4   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-26 16:29:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dave Wilson 2020-05-01 17:35:25 UTC
Description of problem: Deletion of a worker and then redeploy of that worker appears to cause a port conflict with "A port with MAC address 40:a6:b7:00:47:e0 already exists. (HTTP 409)."


Version-Release number of selected component (if applicable): 4.4


How reproducible:
100%

Steps to Reproduce:
1. Create new worker to existing allocation
2. Delete worker
3. Redeploy the worker; ironic inspection fails with a "port already exists" error

Actual results:
ironic inspection fails

Expected results:
ironic inspection success 

Additional info:

Comment 1 Dmitry Tantsur 2020-05-04 10:00:31 UTC
Please provide any logs. From the ironic side, we need the logs from the ironic-inspector and dnsmasq containers.

I'm moving this bug to the BMO component, since I suspect the first worker is not fully deleted by the time a new one is created. Ironic removes ports for removed nodes.

Comment 2 Steven Hardy 2020-05-04 10:41:44 UTC
We also need the exact steps to reproduce this; "Delete worker" isn't specific enough.

The logs from the baremetal-operator container would also be helpful. One possible reason is that the node delete had not yet completed at the time you created the new BMH, and the logs should confirm that. You can also interact directly with the ironic API on the master running the ironic-api container, e.g.:

  $ oc get pods -n openshift-machine-api | grep metal3
  metal3-796ddb8446-kbn7r                        8/8     Running   0          3d

  $ oc describe pod metal3-796ddb8446-kbn7r -n openshift-machine-api | grep IRONIC_ENDPOINT
      IRONIC_ENDPOINT:            http://[fd00:1101::3]:6385/v1/

  $ curl http://[fd00:1101::3]:6385/v1/ports | jq .  # here you can grep for the conflicting MAC, and also check v1/nodes for an existing node
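
If grepping the jq output by eye is awkward, the lookup can be scripted. The following is only a minimal sketch, not part of the original triage: it assumes the ports response has already been saved as JSON with a top-level "ports" list whose entries carry "uuid" and "address", and it treats "node_uuid" as optional, since that field may only be present in the detailed listing. The helper names and the sample data are hypothetical.

```python
import json


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' separators so comparisons are consistent."""
    return mac.strip().lower().replace("-", ":")


def find_ports_by_mac(ports_response: dict, mac: str) -> list:
    """Return every port entry whose address matches the given MAC.

    Assumes ports_response is the parsed JSON body of the ports listing,
    i.e. a dict with a "ports" list (hypothetical helper, not an ironic API).
    """
    wanted = normalize_mac(mac)
    return [
        port
        for port in ports_response.get("ports", [])
        if normalize_mac(port.get("address", "")) == wanted
    ]


# Made-up sample data for illustration only; the UUIDs are placeholders.
SAMPLE = json.loads("""
{
  "ports": [
    {"uuid": "port-uuid-1", "address": "40:a6:b7:00:47:e0", "node_uuid": "node-uuid-1"},
    {"uuid": "port-uuid-2", "address": "52:54:00:aa:bb:cc", "node_uuid": "node-uuid-2"}
  ]
}
""")

if __name__ == "__main__":
    for port in find_ports_by_mac(SAMPLE, "40:A6:B7:00:47:E0"):
        # node_uuid is read with .get() because it may be absent from a brief listing
        print(port["uuid"], port["address"], port.get("node_uuid"))
```

A match whose node still appears under v1/nodes would suggest the earlier node delete had not completed; no match at all would point away from a stale port.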

Comment 3 Derek Higgins 2020-05-12 16:16:49 UTC
(In reply to Dave Wilson from comment #0)
> Description of problem: Deletion of a worker and then redeploy of that worker
> appears to cause a port conflict with " A port with MAC address
> 40:a6:b7:00:47:e0 already exists. (HTTP 409)."

Note: this message "A port with...." is expected when a node powers on and boots the inspection image while inspector is not inspecting it.
Can you check whether inspection succeeded? If so, the node may just be waiting to be deployed.

Details on how the node was deleted and added along with the logs requested above should help in figuring this out.

Comment 4 Dave Wilson 2020-05-19 14:34:35 UTC
I've not been able to repeat/verify this issue. At the time it was reported, some switch config settings were erroneous, causing intermittent link issues on the prov/bm NICs. This resulted in having to delete and add the worker multiple times. Since then, the issues with the switch port configs have been rectified and workers deploy consistently.