Bug 1791949 - [B&R] After controller restore using ReaR cloud-init fails with error 'Failed to start Initial cloud-init job (metadata service crawler)' - 'RuntimeError: duplicate mac found! both 'br-ex' and 'ens5' have mac...'
Summary: [B&R] After controller restore using ReaR cloud-init fails with error 'Failed...
Keywords:
Status: CLOSED DUPLICATE of bug 1795383
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: Toure Dunnon
QA Contact: Eliad Cohen
URL:
Whiteboard:
Depends On: 1768770 1802152
Blocks:
Reported: 2020-01-16 18:50 UTC by Eliad Cohen
Modified: 2020-02-17 07:44 UTC
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-27 22:45:05 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Eliad Cohen 2020-01-16 18:50:56 UTC
Description of problem:
After a restore using ReaR, following the procedure described in [1], the console of the restored controller shows that cloud-init failed [2]. The stack trace shows "RuntimeError: duplicate mac found! both 'br-ex' and 'ens5' have mac..."

Oddly, pcs status shows nothing wrong.
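For readers unfamiliar with this error: cloud-init builds a map from MAC address to interface name and raises when two interfaces report the same address. A plausible cause here, assumed rather than confirmed in this report, is that the OVS bridge br-ex carries the same MAC as its physical port ens5, which is common when a bridge takes the address of the NIC attached to it. The minimal Python sketch below models that check. It is a simplification, not cloud-init's actual code, and the helper names read_sysfs_macs and interfaces_by_mac are hypothetical; only the /sys/class/net/<iface>/address layout is standard Linux.

```python
import os
from typing import Dict


def read_sysfs_macs(root: str = "/sys/class/net") -> Dict[str, str]:
    """Return {interface_name: mac} for every interface that exposes an address file."""
    macs: Dict[str, str] = {}
    # Sorted order, so 'br-ex' is visited before 'ens5'.
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "address")
        try:
            with open(path) as f:
                mac = f.read().strip().lower()
        except OSError:
            continue  # interface vanished or has no readable address file
        # Skip loopback and other devices without a real hardware address.
        if mac and mac != "00:00:00:00:00:00":
            macs[name] = mac
    return macs


def interfaces_by_mac(macs: Dict[str, str]) -> Dict[str, str]:
    """Invert {name: mac} into {mac: name}, raising if two interfaces share a MAC.

    Hypothetical, simplified model of the check behind
    "RuntimeError: duplicate mac found! both 'X' and 'Y' have mac ...".
    """
    by_mac: Dict[str, str] = {}
    for name, mac in macs.items():
        if mac in by_mac:
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (by_mac[mac], name, mac)
            )
        by_mac[mac] = name
    return by_mac
```

On the restored controller, interfaces_by_mac(read_sysfs_macs()) would be one way to see which interfaces collide; for example, interfaces_by_mac({"br-ex": "52:54:00:aa:bb:cc", "ens5": "52:54:00:aa:bb:cc"}) raises RuntimeError (the MAC values are made up for illustration).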


Version-Release number of selected component (if applicable):


How reproducible:
100%, every time cloud-init is restarted with systemctl restart cloud-init.service

Steps to Reproduce:
1. [In a virtual monolithic deployment] Use the tripleo-ansible role to create a backup of all controllers on the hypervisor
2. Restore one of the controllers as per [1]
3. Reboot the restored controller and observe the failure in the console

Actual results:
cloud-init fails to run

Expected results:
cloud-init should run successfully

Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/undercloud_and_control_plane_back_up_and_restore/index
[2] http://pastebin.test.redhat.com/828099

Comment 1 Bob Fournier 2020-01-17 13:59:01 UTC
Not sure why the component is python-ironic-lib, but this looks more like an upgrade issue. Including the Upgrades DFG.

Eliad - is it possible to get an sosreport?

Comment 3 Julia Kreger 2020-01-17 16:17:47 UTC
Greetings,

I suspect I know what is happening (mainly because I hit the same bug in another case long ago). Essentially, upon restore the instance-id is different. Because cloud-init sees an id that differs from the one recorded during its last configuration run, it attempts to reconfigure the machine as if it were a brand-new cloned machine. Obviously this can be problematic on a system that has been configured by another toolset. Ideally, after the initial configuration we would disable cloud-init so it can never run again. That may not be the actual solution, though.

-Julia
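Julia's hypothesis can be modeled in a few lines. The Python sketch below is a simplification under stated assumptions: that the previously seen id is cached in /var/lib/cloud/data/instance-id (the usual location in upstream cloud-init), and that the installed cloud-init honors an /etc/cloud/cloud-init.disabled marker file for "disable after initial configuration". Neither assumption is verified here against the RHOSP 13 packages, and all function names are hypothetical.

```python
import os
from typing import Optional


def cached_instance_id(data_dir: str = "/var/lib/cloud/data") -> Optional[str]:
    """Return the instance-id recorded by the last cloud-init run, or None if absent."""
    try:
        with open(os.path.join(data_dir, "instance-id")) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def treated_as_new_instance(previous_id: Optional[str], current_id: str) -> bool:
    """Simplified model: per-instance configuration reruns whenever the id changes.

    After a ReaR restore, the datasource may report an id that differs from the
    cached one, so the machine looks like a freshly cloned instance.
    """
    return previous_id != current_id


def disable_cloud_init(etc_cloud: str = "/etc/cloud") -> str:
    """Create the marker file that keeps cloud-init from running on later boots.

    Assumes a cloud-init build whose systemd generator checks for
    /etc/cloud/cloud-init.disabled; only the file's presence matters.
    """
    marker = os.path.join(etc_cloud, "cloud-init.disabled")
    with open(marker, "a"):
        pass  # create the file (or leave an existing one untouched)
    return marker
```

Comparing cached_instance_id() with the id the metadata service reports after restore would be one way to test the hypothesis before deciding whether disabling cloud-init is appropriate.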

Comment 4 Eliad Cohen 2020-01-17 16:20:47 UTC
Thanks Julia, Bob. To make things more complicated, it looks like all nodes went into maintenance mode. Any idea?

Comment 7 Bob Fournier 2020-01-24 14:54:15 UTC
Removing HardProv, as this looks like a B+R issue.

Comment 9 Bob Fournier 2020-01-27 16:26:39 UTC
Note that there is a bug to prevent cloud-init from modifying the network config after first boot - https://bugzilla.redhat.com/show_bug.cgi?id=1773642, though I'm not sure it is relevant here. That bug was created as a result of https://bugzilla.redhat.com/show_bug.cgi?id=1760806.
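The usual upstream way to stop cloud-init from touching network configuration is a drop-in under /etc/cloud/cloud.cfg.d containing network: {config: disabled}. The Python sketch below writes such a file. The file name 99-disable-network-config.cfg and the function name are illustrative choices, and this report does not say whether bug 1773642 was resolved this way or whether it would avoid the duplicate-MAC failure described above.

```python
import os

# Standard cloud-init setting: do not generate or apply network configuration.
DISABLE_NETWORK_CFG = "network: {config: disabled}\n"


def disable_network_config(cfg_dir: str = "/etc/cloud/cloud.cfg.d",
                           filename: str = "99-disable-network-config.cfg") -> str:
    """Write a cloud.cfg.d drop-in that stops cloud-init from rewriting network config.

    The '99-' prefix is an illustrative choice so the drop-in sorts after
    (and therefore overrides) earlier files in the directory.
    """
    os.makedirs(cfg_dir, exist_ok=True)
    path = os.path.join(cfg_dir, filename)
    with open(path, "w") as f:
        f.write(DISABLE_NETWORK_CFG)
    return path
```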

Comment 11 Steve Baker 2020-01-27 22:45:05 UTC

*** This bug has been marked as a duplicate of bug 1795383 ***

