Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1551603

Summary:	virt-customize on rhel-7.5 re-introduces a static /etc/machine-id in overcloud-full.qcow2
Product:	Red Hat OpenStack	Reporter:	Omri Hochman <ohochman>
Component:	rhosp-director-images	Assignee:	Alex Schultz <aschultz>
Status:	CLOSED ERRATA	QA Contact:	Artem Hrechanychenko <ahrechan>
Severity:	high	Docs Contact:
Priority:	high
Version:	13.0 (Queens)	CC:	agurenko, ahrechan, aschultz, bcafarel, dhill, dsariel, ekuris, evelu, hbrock, jjoyce, jschluet, jslagle, lmarsh, mburns, pablo.iranzo, radoslaw.smigielski, ragiman, rhel-osp-director-maint, sasha, wznoinsk
Target Milestone:	beta	Keywords:	Regression, Triaged
Target Release:	13.0 (Queens)
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	rhosp-director-images-13.0-20180315.1.el7ost	Doc Type:	Bug Fix
Doc Text:	Recent versions of libguestfs generate a machine-id when virt-customize or the customize action of virt-sysprep runs. When this happens, a static /etc/machine-id is included in the image, which can cause issues with services that rely on this value being unique across hosts. To fix the issue, the build process cleans the overcloud image to provide a blank /etc/machine-id, so that a machine-id is generated correctly when each system boots for the first time. However, if you use virt-customize to update the overcloud image prior to deployment, run "virt-sysprep --operation machine-id -a <image>" again prior to uploading the image.	Story Points:	---
Clone Of:	1476612
Clones:	1555474 1557046		Environment:
Last Closed:	2018-06-27 13:24:35 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1554546, 1270860, 1476612, 1481443
Bug Blocks:	1555474, 1557046

Description Omri Hochman 2018-03-05 13:32:08 UTC

Cloning because the bug reproduced again in OSP13.

Environment: 
------------
[stack@undercloud75 ~]$ rpm -qa | grep rhosp-director-images
rhosp-director-images-ipa-13.0-20180302.1.el7ost.noarch
rhosp-director-images-13.0-20180302.1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.0-0.20180227121938.e0f59ee.el7ost.noarch



[stack@undercloud75 ~]$ ssh heat-admin@192.168.0.11
The authenticity of host '192.168.0.11 (192.168.0.11)' can't be established.
ECDSA key fingerprint is SHA256:wQ+dLwaTklIrk0vYtTOcDQg/C8MpmIQ8xrCwk2R8i+U.
ECDSA key fingerprint is MD5:f1:a3:20:da:a6:65:05:db:f8:1f:14:86:a5:35:a2:dd.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.0.11' (ECDSA) to the list of known hosts.
[heat-admin@overcloud-controller-1 ~]$ sudo cat /etc/machine-id
b9270d3f95c6be35104f175dd46e8486
[heat-admin@overcloud-controller-1 ~]$ exit
logout
Connection to 192.168.0.11 closed.
[stack@undercloud75 ~]$ ssh heat-admin.0.8
Last login: Sat Mar  3 18:19:55 2018 from gateway
[heat-admin@overcloud-cephstorage-2 ~]$ sudo su -
[root@overcloud-cephstorage-2 ~]# cat /etc/machine-id
b9270d3f95c6be35104f175dd46e8486
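
The duplicate shown above can be checked across a whole deployment with a small filter. This is only a sketch, not something from the original report: the function name find_duplicate_machine_ids and the "<hostname> <machine-id>" input format are assumptions made for illustration.

```shell
# Hypothetical helper: reads "<hostname> <machine-id>" lines on stdin and
# prints every machine-id that is shared by more than one host.
find_duplicate_machine_ids() {
    awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id ":" hosts[id] }'
}

# Example input using the two nodes from the transcript above.
printf '%s\n' \
    'overcloud-controller-1 b9270d3f95c6be35104f175dd46e8486' \
    'overcloud-cephstorage-2 b9270d3f95c6be35104f175dd46e8486' \
    | find_duplicate_machine_ids
```

No output means every host reported a different value. How the host and ID pairs are collected (for example over ssh) is left open here.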





----------------------------------------------------------------------------
+++ This bug was initially created as a clone of Bug #1476612 +++

Description of problem:
/etc/machine-id is the same on all overcloud nodes (it comes from the base RHEL 7.x image), so anything that validates /etc/machine-id as unique (RHCS, let's say) hits a conflict and only one node can be added. I'm hesitating between filing a BZ against cloud-init or heat-templates to address this.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Deploy an overcloud

Actual results:
They all have the same /etc/machine-id value

Expected results:
The value should be different on each node.

Additional info:

--- Additional comment from David Hill on 2017-07-30 14:20:59 EDT ---

(undercloud) [stack@undercloud-0-trunk ~]$ cat /etc/machine-id 
c9b62f7bee8b444da86ee1bc26aa7e72
(undercloud) [stack@undercloud-0-trunk ~]$ ssh heat-admin@192.0.2.9
The authenticity of host '192.0.2.9 (192.0.2.9)' can't be established.
ECDSA key fingerprint is d8:29:01:55:ef:7e:c7:29:08:d2:c0:fa:7e:28:9f:28.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.0.2.9' (ECDSA) to the list of known hosts.
[heat-admin@overcloud-controller-0 ~]$ cat /etc/machine-id 
c9b62f7bee8b444da86ee1bc26aa7e72



This could also be fixed in the image builder tool by deleting that file and creating it if it's missing on first boot.

--- Additional comment from Alex Schultz on 2017-07-31 09:05:36 EDT ---

This was fixed in Ocata as part of Bug 1270860. We would need to backport the appropriate changes.

--- Additional comment from David Hill on 2017-07-31 09:27:36 EDT ---

Is it fixed? I clicked one of the OpenStack Gerrit links and the changes were abandoned. Also, deleting the file will prevent the system from starting properly, but setting it to an empty file should work.

--- Additional comment from Alex Schultz on 2017-07-31 11:26:33 EDT ---

Correct, the two abandoned reviews were for fixes to DIB, which were rejected. The first two listed on that bug, which were TripleO-specific fixes for this issue, were merged. It's not a problem from 11 onward. We would have to backport to 10 and make sure the images are rebuilt for 10.

https://review.openstack.org/#/c/445174/
https://review.openstack.org/#/c/445173/

--- Additional comment from Alex Schultz on 2017-07-31 14:08:59 EDT ---

Well I just verified that the file is still there. So I guess it needs further investigation.

--- Additional comment from David Hill on 2017-08-06 18:29:57 EDT ---

In 11, it'll be a problem if we only do an "rm -rf /etc/machine-id": we must recreate the file afterwards with "touch /etc/machine-id"! We can probably abandon my change if we take this one.
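
The "delete, then recreate" step described above can be collapsed into a single truncation, so the file is never missing. A minimal sketch, assuming the image's root filesystem is available as a directory tree; the helper name reset_machine_id and its ROOT_DIR argument are hypothetical and not taken from the tripleo element under review:

```shell
# Hypothetical helper: leave an empty /etc/machine-id under ROOT_DIR.
reset_machine_id() {
    root="${1:?usage: reset_machine_id ROOT_DIR}"
    # Truncate rather than remove, so the file always exists but is empty.
    : > "${root}/etc/machine-id"
}

# Demonstration on a scratch directory standing in for an image root.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/etc"
echo b9270d3f95c6be35104f175dd46e8486 > "${demo_root}/etc/machine-id"
reset_machine_id "${demo_root}"
ls -l "${demo_root}/etc/machine-id"
```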

--- Additional comment from David Hill on 2017-08-06 18:30:42 EDT ---

https://review.openstack.org/#/c/489013/

--- Additional comment from David Hill on 2017-08-06 18:33:51 EDT ---

This needs to be changed:
https://review.openstack.org/#/c/445173/1/elements/remove-machine-id/post-install.d/70-remove-machine-id

--- Additional comment from David Hill on 2017-08-06 18:35:46 EDT ---

Let me test this in Ocata and confirm if it's removing /etc/machine-id

--- Additional comment from Alex Schultz on 2017-08-07 11:46:36 EDT ---

So we'll fix this for tripleo via a backport of the previously mentioned items. https://review.openstack.org/#/c/489618/ is required to make it work for tripleo at the moment. We won't backport any work for disk image builder if that gets merged.

--- Additional comment from Lon Hohberger on 2017-11-16 16:05:33 EST ---

According to our records, this should be resolved by openstack-tripleo-puppet-elements-5.3.2-1.el7ost.  This build is available now.

--- Additional comment from Lon Hohberger on 2017-11-16 16:05:39 EST ---

According to our records, this should be resolved by openstack-tripleo-common-5.4.4-1.el7ost.  This build is available now.

--- Additional comment from Lon Hohberger on 2017-11-16 16:05:45 EST ---

According to our records, this should be resolved by rhosp-director-images-10.0-20171108.1.el7ost.  This build is available now.

--- Additional comment from Gurenko Alex on 2017-12-13 06:17:41 EST ---

Verified on build 2017-12-05.1 with RHEL 7.4 and rhosp-director-images-10.0-20171204.1.el7ost

Comment 6 Alex Schultz 2018-03-05 15:32:05 UTC

Jon, can you verify that the image build process has the remove-machine-id element in its configuration?

https://github.com/openstack/tripleo-common/blob/master/image-yaml/overcloud-images.yaml#L22

Comment 8 Alex Schultz 2018-03-12 23:10:38 UTC

This is caused by a newer version of virt-customize that we use in the image building process.

https://github.com/libguestfs/libguestfs/commit/d5ce659e2c136fbcf0a0b9058711765cfae6c210

Comment 13 Alex Schultz 2018-03-16 16:11:59 UTC

A quick way to verify this has been done is to run:

guestfish -a overcloud-full.qcow2 run : mount /dev/sda / : cat /etc/machine-id

It should return a blank line. The file should exist but be empty.


10:10 AM tmp  ➜ guestfish -a overcloud-full.qcow2 run : mount /dev/sda / : cat /etc/machine-id          

10:10 AM tmp  ➜
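
For scripted checks, the "blank line" test above can be wrapped in a small predicate. This is a sketch under stated assumptions: the function name machine_id_is_blank is hypothetical, and the commented usage simply reuses the guestfish command from this comment.

```shell
# Hypothetical helper: succeeds only when the given text contains no
# non-whitespace characters, i.e. the machine-id file was empty.
machine_id_is_blank() {
    [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Assumed usage (requires the libguestfs tools and the image file):
#   out=$(guestfish -a overcloud-full.qcow2 run : mount /dev/sda / : cat /etc/machine-id) || exit 1
#   machine_id_is_blank "$out" && echo "machine-id is blank"
```

Checking the guestfish exit status first matters: a failed command also produces empty output, which would otherwise look like a clean image.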

Comment 14 Bernard Cafarelli 2018-03-16 16:18:40 UTC

*** Bug 1545085 has been marked as a duplicate of this bug. ***

Comment 19 Artem Hrechanychenko 2018-03-30 13:52:26 UTC

VERIFIED

(undercloud) [stack@undercloud-0 images]$ tar -xvf /usr/share/rhosp-director-images/overcloud-full-latest-13.0.tar 
overcloud-full.qcow2
overcloud-full.initrd
overcloud-full.vmlinuz
overcloud-full-rpm.manifest
overcloud-full-signature.manifest
(undercloud) [stack@undercloud-0 images]$ guestfish -a overcloud-full.qcow2 run : mount /dev/sda / : cat /etc/machine-id

(undercloud) [stack@undercloud-0 images]$ 

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.14
Last login: Fri Mar 30 13:51:42 2018 from 192.168.24.1
[heat-admin@compute-1 ~]$ cat /etc/machine-id 
e420ff129b2243b89c2f9536a6e66d03
[heat-admin@compute-1 ~]$ exit
logout
Connection to 192.168.24.14 closed.
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.11
The authenticity of host '192.168.24.11 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:fmnNktU4xBACzFTYS0daYlaYlTQTXaOo/6F9yUc+m2s.
ECDSA key fingerprint is MD5:ed:4d:53:25:d8:97:94:f0:79:e3:2d:12:07:42:0c:2c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.24.11' (ECDSA) to the list of known hosts.
Last login: Fri Mar 30 13:38:25 2018 from 192.168.24.254
[heat-admin@controller-1 ~]$ cat /etc/machine-id 
3a34f59f127b435cadbc727c32412b05

Comment 20 Artem Hrechanychenko 2018-03-30 13:53:02 UTC

(undercloud) [stack@undercloud-0 ~]$ sudo rpm -qa "rhosp-director-image*"
rhosp-director-images-13.0-20180328.1.el7ost.noarch

Comment 21 Roee Agiman 2018-04-10 07:19:32 UTC

Hey.
Re-opening due to issues popping out recently.

https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OSPD-Customized-Deployment-virt/3514/

Take a look at this customized job, which fails in the overcloud step for the same reasons that should have been fixed here.

Console Output:

13:42:58 TASK [get ironic info for the node] ********************************************
13:42:58 task path: /home/rhos-ci/jenkins/workspace/OSPD-Customized-Deployment-virt@2/infrared/plugins/tripleo-overcloud/tasks/add_overcloud_host.yml:2
13:42:58 fatal: [undercloud-0]: FAILED! => {
13:42:58     "changed": true, 
13:42:58     "cmd": "source ~/stackrc\n openstack baremetal node show  -c name -f value", 
13:42:58     "delta": "0:00:02.120858", 
13:42:58     "end": "2018-04-09 09:43:11.794658", 
13:42:58     "failed": true, 
13:42:58     "rc": 2, 
13:42:58     "start": "2018-04-09 09:43:09.673800"
13:42:58 }
13:42:58 
13:42:58 STDERR:
13:42:58 
13:42:58 usage: openstack baremetal node show [-h] [-f {json,shell,table,value,yaml}]
13:42:58                                      [-c COLUMN] [--max-width <integer>]
13:42:58                                      [--fit-width] [--print-empty]
13:42:58                                      [--noindent] [--prefix PREFIX]
13:42:58                                      [--instance]
13:42:58                                      [--fields <field> [<field> ...]]
13:42:58                                      <node>
13:42:58 openstack baremetal node show: error: too few arguments
13:42:58 
13:42:58 
13:42:58 MSG:
13:42:58 
13:42:58 non-zero return code

Please contact me if any further information is needed.

Comment 22 Alex Schultz 2018-04-10 15:48:06 UTC

The job did not fail for the same reason; the deployment timed out. Also, this bug was never a deployment-failure problem.


2018-04-09 13:41:27Z [overcloud.Compute]: CREATE_FAILED  CREATE aborted (Task create from ResourceGroup "Compute" Stack "overcloud" [fbd5b33b-6892-40b6-8bde-1e0f48d39c35] Timed out)
2018-04-09 13:41:27Z [overcloud.Compute]: UPDATE_FAILED  Stack UPDATE cancelled
2018-04-09 13:41:27Z [overcloud]: CREATE_FAILED  Timed out
2018-04-09 13:41:28Z [overcloud.Compute.0]: CREATE_FAILED  Stack CREATE cancelled
2018-04-09 13:41:28Z [overcloud.Compute.0]: CREATE_FAILED  resources[0]: Stack CREATE cancelled
2018-04-09 13:41:28Z [overcloud.Compute]: UPDATE_FAILED  Resource CREATE failed: resources[0]: Stack CREATE cancelled

Comment 23 Omri Hochman 2018-04-10 18:58:56 UTC

(In reply to Roee Agiman from comment #21)
> Hey.
> Re-opening due to issues popping out recently.
> 
> Please contact me if any further information is needed.

Hi Roee,  

The original bug describes an issue with the value in /etc/machine-id on nodes deployed from the overcloud image.
It was always the same value, whereas it should be unique on each node.

I'm re-verifying this bug; if you encounter this specific issue in the future, please re-open.

Comment 24 Bob Fournier 2018-04-12 16:29:34 UTC

*** Bug 1545085 has been marked as a duplicate of this bug. ***

Comment 25 Alex Schultz 2018-04-17 23:59:11 UTC

*** Bug 1545085 has been marked as a duplicate of this bug. ***

Comment 29 errata-xmlrpc 2018-06-27 13:24:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2083