While trying to test instance-ha in an OSPd 8 environment, using the images from http://rhos-release.virt.bos.redhat.com/mburns/latest-8.0-images/, we found out that the SELinux context of one critical file is wrong:

[root@overcloud-novacompute-2 ~]# ls -lZ /etc/machine-id
-rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id

This causes the following situation while rebooting:

[root@overcloud-novacompute-0 ~]# systemctl start systemd-journald
Job for systemd-journald.service failed because the control process exited with error code. See "systemctl status systemd-journald.service" and "journalctl -xe" for details.

The only way to fix it is to run restorecon:

[root@overcloud-novacompute-0 ~]# restorecon -rv /etc/machine-id
restorecon reset /etc/machine-id context system_u:object_r:unlabeled_t:s0->system_u:object_r:machineid_t:s0

After that, systemd-journald can be started again:

[root@overcloud-novacompute-0 ~]# systemctl start systemd-journald

Subsequent reboots go fine because of this. Note that there are other files with the unlabeled_t SELinux context. A quick "restorecon -rvn /" will show them all, although the only file that gave us a problem was /etc/machine-id, which made systemd-journald non-functional.
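As a minimal sketch of the scan-and-fix sequence described above (run as root on an affected node; the flags are standard restorecon options):

```shell
# Dry run: recursively (-r) list every file whose on-disk SELinux label
# differs from policy, verbosely (-v), without changing anything (-n).
restorecon -rvn /

# Fix only the file that blocks systemd-journald from starting.
restorecon -v /etc/machine-id

# Confirm the label is now machineid_t before retrying the service.
ls -lZ /etc/machine-id
systemctl start systemd-journald
```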
This does not appear to be an image problem. I just deployed the 12-3 images (which is what the latest link in the OP takes me to) with the latest OSPd 8 puddle, and /etc/machine-id on the deployed nodes is fine:

[root@overcloud-controller-0 etc]# ls -lZ machine-id
-r--r--r--. root root unconfined_u:object_r:machineid_t:s0 machine-id

So I don't know what is causing this, but it isn't anything in the image build, or I should be seeing the problem in my deployment too. What exactly is being done to enable instance HA? I would look to that at this point.
Not sure it's an image problem either. I just deployed an OSP 7 environment with Packstack and had the same issue. Issuing 'restorecon /etc/machine-id' allowed systemd-journald to start, and then I was also able to run httpd/openstack-dashboard, which was originally failing as described in https://bugzilla.redhat.com/show_bug.cgi?id=1300800
*** Bug 1301050 has been marked as a duplicate of this bug. ***
(In reply to Ben Nemec from comment #2)
[...]
> So I don't know what is causing this, but it isn't anything in the image
> build or I should be seeing the problem in my deployment too. What exactly
> is being done to enable instance ha? I would look to that at this point.

Enabling instance HA (as described here: https://access.redhat.com/articles/1544823) could not affect the context of that file, since all the steps are purely configuration: no new package is installed and no SELinux context is changed anywhere. The only thing it introduces is fencing, which means nodes can get rebooted, and if you reboot with the wrong context you hit the problem.
As reported on the clone of this bug, this problem is also present in a Mitaka environment with images generated from scratch:

[root@overcloud-controller-1 ~]# ls -lZ /etc/machine-id
-rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id

Note that on this environment I wasn't able to complete an overcloud deploy (it fails for other reasons), but once the controllers and computes come up the problem is there. So I don't think it's something due to the overcloud deployment; the image is born with this problem already.
(In reply to Raoul Scarazzini from comment #6)
> As reported on the clone of this bug this problem is also present in a
> Mitaka environment with images generated from scratch:
>
> [root@overcloud-controller-1 ~]# ls -lZ /etc/machine-id
> -rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id
>
> Note that on this environment I wasn't able to complete an overcloud deploy
> (it fails for other reasons), but once the controllers and computes comes up
> the problem is there. So I don't think it's something due to overcloud
> deployment, the image born with this problem already.

I'm not seeing that in my upstream images. Is there any chance you could upload your exact overcloud-full.qcow2 somewhere so I can pull it down and try it myself?

FWIW, when I mount a locally built upstream image I see the following:

[bnemec@RedHat ~]$ ls -lZ /mnt/temp/etc/machine-id
-r--r--r--. 1 root root unconfined_u:object_r:machineid_t:s0 33 Mar 31 2015 /mnt/temp/etc/machine-id

I see the same thing after the nodes are deployed, so I don't think anything bad is happening during deployment either.
Hey Ben, this is the image (CentOS, since it's taken from a Mitaka deployment) with the wrong /etc/machine-id context:

http://file.rdu.redhat.com/rscarazz/overcloud-full.qcow2

I've tried to use this image locally by creating a vm (just for information: how can you mount a qcow image and view the SELinux contexts?) and the context is unlabeled_t, so it is wrong.
*** Bug 1305486 has been marked as a duplicate of this bug. ***
Okay, that's very strange. I can also see that the SELinux context is broken in that image. Where did you get this file? Is it built locally or did you download it from somewhere?

FWIW, when I mount the image this is what I see:

[bnemec@RedHat ~]$ ls -lZ /mnt/temp/etc/machine-id
-rw-r--r-- 1 root root ? 0 Feb 4 03:45 /mnt/temp/etc/machine-id

Which makes me think the SELinux context got dropped entirely when the image was built. I don't see that in either my current upstream images or the latest OSP 8 puddle images.

I use a process like http://blog.loftninjas.org/2008/10/27/mounting-kvm-qcow2-qemu-disk-images/ to mount qcows.
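To answer the earlier question about inspecting labels inside a qcow image, here is a sketch of the qemu-nbd approach from the linked post (run as root; /dev/nbd0, the partition number, and /mnt/temp are assumptions that may differ per image):

```shell
# Load the network block device module so qcow2 images can be attached.
modprobe nbd max_part=8

# Attach the image as a block device (overcloud-full.qcow2 assumed local).
qemu-nbd --connect=/dev/nbd0 overcloud-full.qcow2

# Mount the first partition; images built without a partition table
# would need "mount /dev/nbd0 /mnt/temp" instead.
mount /dev/nbd0p1 /mnt/temp

# Inspect the stored SELinux label without booting the image.
ls -lZ /mnt/temp/etc/machine-id

# Clean up.
umount /mnt/temp
qemu-nbd --disconnect /dev/nbd0
```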
Hi Ben, the image is generated on the fly after installing the undercloud, using the standard procedure to build the overcloud:

export NODE_DIST=centos7
export USE_DELOREAN_TRUNK=1
export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/centos7/current-tripleo/"
export DELOREAN_REPO_FILE="delorean.repo"
openstack overcloud image build --all

Now it's clear to me how you verified the context on the image. To be sure, from my side, as I wrote, I created a vm with this image and saw the wrong context (unlabeled_t) on it.

Let me add that before using the image I do one additional thing to it, resetting the root password:

virt-sysprep --root-password password:redhat -a overcloud-full.qcow2

Do you think this could affect that specific SELinux context in some way?
[ 68.4] Performing "machine-id" ...

Ding, ding, ding, we have a winner. :-)

virt-sysprep wasn't a problem on my F23 laptop, but when I ran it from CentOS it broke the SELinux context exactly as described here. I'm guessing there was a bug in the version of virt-sysprep in CentOS that causes this.
That's great, so do I need to create a bug against libguestfs-tools for this? This command caused problems on both CentOS and RHEL.
I would think so, yes. Like I said, the version on F23 is already fixed, so it's probably a question of backporting the fix to the EL7 version.
Hi Ben, here it is: https://bugzilla.redhat.com/show_bug.cgi?id=1308997 I think we can close this bug as not a bug, at least not a bug of the director.
I just encountered this bug on RHEL-OSP director 8.0, puddle 2016-03-11.1. The deployment failed at the end.

[root@overcloud-controller-2 ~]# cd /var/log
[root@overcloud-controller-2 log]# less messages
Mar 16 09:38:15 localhost rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="712" x-info="http://www.rsyslog.com"] start
Mar 16 09:38:15 localhost rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Mar 16 09:38:15 localhost rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
[root@overcloud-controller-2 log]# ls -ltrZ /etc/machine-id
-rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id
[root@overcloud-controller-2 log]# restorecon -v /etc/machine-id
restorecon reset /etc/machine-id context system_u:object_r:unlabeled_t:s0->system_u:object_r:machineid_t:s0
[root@overcloud-controller-2 log]#
Asaf, are you using virt-sysprep on the overcloud images for some reason? If so, you need to use virt-customize instead. See https://bugzilla.redhat.com/show_bug.cgi?id=1308997 for the full explanation.
Raoul, yes:

virt-sysprep --root-password password:12345678 -a overcloud-full.qcow2
virt-customize -a overcloud-full.qcow2 --run-command "rpm -ivh http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm"
virt-customize -a overcloud-full.qcow2 --run-command "rhos-release 8-director"

(*** this sets the target node repo files, so you can choose which version/puddle to work with; this should probably be the same as the repo version you set for the undercloud)
Ok, as explained here [1], you need to use virt-customize for changing the password as well or, if you still need virt-sysprep for some reason, you need to pass --selinux-relabel to the command. Then your context will be fine. I'm closing this bug again since... it's not a bug.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1308997
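To make the two working alternatives concrete, here is a sketch (the password value is only an example; both options are documented virt-customize/virt-sysprep usage):

```shell
# Option 1: use virt-customize, which handles SELinux labeling correctly.
virt-customize -a overcloud-full.qcow2 --root-password password:redhat

# Option 2: keep virt-sysprep, but force an SELinux relabel so the
# labels it dropped get restored on first boot.
virt-sysprep --root-password password:redhat --selinux-relabel -a overcloud-full.qcow2
```

Either way, after deploying, `ls -lZ /etc/machine-id` on the node should show machineid_t rather than unlabeled_t.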