1791949 – [B&R] After controller restore using ReaR cloud-init fails with error 'Failed to start Initial cloud-init job (metadata service crawler)' - 'RuntimeError: duplicate mac found! both 'br-ex' and 'ens5' have mac...'

Bug 1791949 - [B&R] After controller restore using ReaR cloud-init fails with error 'Failed to start Initial cloud-init job (metadata service crawler)' - 'RuntimeError: duplicate mac found! both 'br-ex' and 'ens5' have mac...'

Summary: [B&R] After controller restore using ReaR cloud-init fails with error 'Failed...

Keywords:
Status:	CLOSED DUPLICATE of bug 1795383
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	tripleo-ansible
Sub Component:
Version:	13.0 (Queens)
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Toure Dunnon
QA Contact:	Eliad Cohen
Docs Contact:
URL:
Whiteboard:
Depends On:	1768770 1802152
Blocks:
TreeView+	depends on / blocked

Reported:	2020-01-16 18:50 UTC by Eliad Cohen
Modified:	2020-02-17 07:44 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-01-27 22:45:05 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Eliad Cohen 2020-01-16 18:50:56 UTC

Description of problem:
After doing a restore using ReaR as per the procedure described at [1], upon looking at the console for the restored controller, it is evident that cloud-init failed [2]. Stack shows "RuntimeError: duplicate mac found! both 'br-ex' and 'ens5' have mac..."

oddly, pcs status shows nothing wrong.


Version-Release number of selected component (if applicable):


How reproducible:
100% with every systemctl restart cloud-init.service

Steps to Reproduce:
1. [In a virtual monolithic deployment] Use the tripleo-ansible role to create a backup of all controllers on the hypervisor
2. Restore one of the controllers as per [1]
3. reboot the restored controller and see the failure in the console

Actual results:
cloud-init fails to run

Expected results:
Cloud init should run successfully

Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/undercloud_and_control_plane_back_up_and_restore/index
[2] http://pastebin.test.redhat.com/828099

Comment 1 Bob Fournier 2020-01-17 13:59:01 UTC

Not sure why the component is python-ironic-lib but this looks more like an upgrade issue. Including Upgrades DFG.

Eliad - is it possible to get an sosreport?

Comment 3 Julia Kreger 2020-01-17 16:17:47 UTC

Greetings,

I suspect I know what is happening (Mainly I had the same bug in another case long ago.) Essentially, upon restore the instance-id is different. Because cloud-init identifies the different id value from it's last configuration run, it attempts to reconfigure the machine as if it is a brand new cloned machine. Obviously this can be problematic with system that has undergone configuration from another tool set. Ideally, post initial configuration, we would disable cloud-init so it can never run again. That may not be the actual solution though.

-Julia

Comment 4 Eliad Cohen 2020-01-17 16:20:47 UTC

Thanks Julia, Bob. To make things more complicated, looks like all nodes went into maintenance mode. Any idea?

Comment 7 Bob Fournier 2020-01-24 14:54:15 UTC

Removing HardProv as looks like B+R issue.

Comment 9 Bob Fournier 2020-01-27 16:26:39 UTC

Note that there is a bug to prevent cloud-init from modifying network config after first boot - https://bugzilla.redhat.com/show_bug.cgi?id=1773642, not sure if that is relevant here.  That bug was created as a result of https://bugzilla.redhat.com/show_bug.cgi?id=1760806.

Comment 11 Steve Baker 2020-01-27 22:45:05 UTC


*** This bug has been marked as a duplicate of bug 1795383 ***

Note You need to log in before you can comment on or make changes to this bug.