While trying to test instance-ha in an OSPd 8 environment, using the images from http://rhos-release.virt.bos.redhat.com/mburns/latest-8.0-images/, we found out that the SELinux context of one critical file is wrong:

[root@overcloud-novacompute-2 ~]# ls -lZ /etc/machine-id
-rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id

This causes the following situation while rebooting:

[root@overcloud-novacompute-0 ~]# systemctl start systemd-journald
Job for systemd-journald.service failed because the control process exited with error code. See "systemctl status systemd-journald.service" and "journalctl -xe" for details.

The only way to fix it is to run restorecon:

[root@overcloud-novacompute-0 ~]# restorecon -rv /etc/machine-id
restorecon reset /etc/machine-id context system_u:object_r:unlabeled_t:s0->system_u:object_r:machineid_t:s0

After that, systemd-journald can be started again:

[root@overcloud-novacompute-0 ~]# systemctl start systemd-journald

Subsequent reboots go fine because of this. Note that there are other files with the unlabeled_t SELinux context. A quick "restorecon -rvn /" will show them all, although the only file that gave us a problem was /etc/machine-id, which made systemd-journald non-functional.
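As a minimal sketch of the scan-and-fix sequence described above (run as root on an affected node; the flags are standard restorecon options):

```shell
# Dry run: recursively (-r) list every file whose on-disk SELinux label
# differs from policy, verbosely (-v), without changing anything (-n).
restorecon -rvn /

# Fix only the file that blocks systemd-journald from starting.
restorecon -v /etc/machine-id

# Confirm the label is now machineid_t before retrying the service.
ls -lZ /etc/machine-id
systemctl start systemd-journald
```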
This does not appear to be an image problem. I just deployed the 12-3 images (which is what the latest link in the OP takes me to) with the latest OSPd 8 puddle, and /etc/machine-id on the deployed nodes is fine:

[root@overcloud-controller-0 etc]# ls -lZ machine-id
-r--r--r--. root root unconfined_u:object_r:machineid_t:s0 machine-id

So I don't know what is causing this, but it isn't anything in the image build, or I should be seeing the problem in my deployment too. What exactly is being done to enable instance HA? I would look to that at this point.
Not sure it's an image problem either. I just deployed an OSP 7 environment with Packstack and had the same issue. Issuing 'restorecon /etc/machine-id' allowed systemd-journald to start, and then I was also able to run httpd/openstack-dashboard, which was originally failing as described in https://bugzilla.redhat.com/show_bug.cgi?id=1300800
*** Bug 1301050 has been marked as a duplicate of this bug. ***
(In reply to Ben Nemec from comment #2)
[...]
> So I don't know what is causing this, but it isn't anything in the image
> build or I should be seeing the problem in my deployment too. What exactly
> is being done to enable instance ha? I would look to that at this point.

Enabling instance HA (as described here: https://access.redhat.com/articles/1544823) could not affect the context of that file, since all the steps are purely configuration: no new package is installed and no SELinux context is changed anywhere. The only thing it introduces is fencing, which means nodes can get rebooted, and if you reboot with the wrong context you hit the problem.
As reported on the clone of this bug, this problem is also present in a Mitaka environment with images generated from scratch:

[root@overcloud-controller-1 ~]# ls -lZ /etc/machine-id
-rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id

Note that on this environment I wasn't able to complete an overcloud deploy (it fails for other reasons), but once the controllers and computes come up the problem is there. So I don't think it's something due to the overcloud deployment; the image is born with this problem already.
(In reply to Raoul Scarazzini from comment #6)
> As reported on the clone of this bug this problem is also present in a
> Mitaka environment with images generated from scratch:
>
> [root@overcloud-controller-1 ~]# ls -lZ /etc/machine-id
> -rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id
>
> Note that on this environment I wasn't able to complete an overcloud deploy
> (it fails for other reasons), but once the controllers and computes comes up
> the problem is there. So I don't think it's something due to overcloud
> deployment, the image born with this problem already.

I'm not seeing that in my upstream images. Is there any chance you could upload your exact overcloud-full.qcow2 somewhere so I can pull it down and try it myself?

FWIW, when I mount a locally built upstream image I see the following:

[bnemec@RedHat ~]$ ls -lZ /mnt/temp/etc/machine-id
-r--r--r--. 1 root root unconfined_u:object_r:machineid_t:s0 33 Mar 31 2015 /mnt/temp/etc/machine-id

I see the same thing after the nodes are deployed, so I don't think anything bad is happening during deployment either.
Hey Ben, this is the image (CentOS, since it's taken from a Mitaka deployment) with the wrong /etc/machine-id context:

http://file.rdu.redhat.com/rscarazz/overcloud-full.qcow2

I've tried to use this image locally by creating a vm (just for information: how can you mount a qcow image and view the SELinux contexts?) and the context is unlabeled_t, so it is wrong.
*** Bug 1305486 has been marked as a duplicate of this bug. ***
Okay, that's very strange. I can also see that the SELinux context is broken in that image. Where did you get this file? Is it built locally or did you download it from somewhere?

FWIW, when I mount the image this is what I see:

[bnemec@RedHat ~]$ ls -lZ /mnt/temp/etc/machine-id
-rw-r--r-- 1 root root ? 0 Feb 4 03:45 /mnt/temp/etc/machine-id

Which makes me think the SELinux context got dropped entirely when the image was built. I don't see that in either my current upstream images or the latest OSP 8 puddle images.

I use a process like http://blog.loftninjas.org/2008/10/27/mounting-kvm-qcow2-qemu-disk-images/ to mount qcows.
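To answer the earlier question about inspecting labels inside a qcow image, here is a sketch of the qemu-nbd approach from the linked post (run as root; /dev/nbd0, the partition number, and /mnt/temp are assumptions that may differ per image):

```shell
# Load the network block device module so qcow2 images can be attached.
modprobe nbd max_part=8

# Attach the image as a block device (overcloud-full.qcow2 assumed local).
qemu-nbd --connect=/dev/nbd0 overcloud-full.qcow2

# Mount the first partition; images built without a partition table
# would need "mount /dev/nbd0 /mnt/temp" instead.
mount /dev/nbd0p1 /mnt/temp

# Inspect the stored SELinux label without booting the image.
ls -lZ /mnt/temp/etc/machine-id

# Clean up.
umount /mnt/temp
qemu-nbd --disconnect /dev/nbd0
```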
Hi Ben, the image is generated on the fly after installing the undercloud, using the standard procedure to build the overcloud:

export NODE_DIST=centos7
export USE_DELOREAN_TRUNK=1
export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/centos7/current-tripleo/"
export DELOREAN_REPO_FILE="delorean.repo"
openstack overcloud image build --all

Now it's clear to me how you verified the context on the image. To be sure, from my side, as I wrote, I created a vm with this image and saw the wrong context (unlabeled_t) on it.

Let me add that before using the image I do one additional thing to it, resetting the root password:

virt-sysprep --root-password password:redhat -a overcloud-full.qcow2

Do you think this could affect that specific SELinux context in some way?
[ 68.4] Performing "machine-id" ...

Ding, ding, ding, we have a winner. :-)

virt-sysprep wasn't a problem on my F23 laptop, but when I ran it from CentOS it broke the SELinux context exactly as described here. I'm guessing there was a bug in the version of virt-sysprep in CentOS that causes this.
That's great, so do I need to create a bug against libguestfs-tools for this? This command caused problems on both CentOS and RHEL.
I would think so, yes. Like I said, the version on F23 is already fixed, so it's probably a question of backporting the fix to the EL7 version.
Hi Ben, here it is: https://bugzilla.redhat.com/show_bug.cgi?id=1308997 I think we can close this bug as not a bug, at least not a bug of the director.
I just encountered this bug on RHEL-OSP director 8.0, puddle 2016-03-11.1. The deployment failed at the end.

[root@overcloud-controller-2 ~]# cd /var/log
[root@overcloud-controller-2 log]# less messages
Mar 16 09:38:15 localhost rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="712" x-info="http://www.rsyslog.com"] start
Mar 16 09:38:15 localhost rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Mar 16 09:38:15 localhost rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
[root@overcloud-controller-2 log]# ls -ltrZ /etc/machine-id
-rw-r--r--. root root system_u:object_r:unlabeled_t:s0 /etc/machine-id
[root@overcloud-controller-2 log]# restorecon -v /etc/machine-id
restorecon reset /etc/machine-id context system_u:object_r:unlabeled_t:s0->system_u:object_r:machineid_t:s0
[root@overcloud-controller-2 log]#
Asaf, are you using virt-sysprep on the overcloud images for some reason? If so, you need to use virt-customize instead. See https://bugzilla.redhat.com/show_bug.cgi?id=1308997 for the full explanation.
Raoul, yes:

virt-sysprep --root-password password:12345678 -a overcloud-full.qcow2
virt-customize -a overcloud-full.qcow2 --run-command "rpm -ivh http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm"
virt-customize -a overcloud-full.qcow2 --run-command "rhos-release 8-director"

(*** this sets the target node repo files, so you can choose which version/puddle to work with; this should probably be the same as the repo version you set for the undercloud)
Ok, as explained here [1], you need to use virt-customize for changing the password as well or, if you still need virt-sysprep for some reason, you need to pass --selinux-relabel to the command. Then your context will be fine. I'm closing this bug again since... it's not a bug.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1308997
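To make the two working alternatives concrete, here is a sketch (the password value is only an example; both options are documented virt-customize/virt-sysprep usage):

```shell
# Option 1: use virt-customize, which handles SELinux labeling correctly.
virt-customize -a overcloud-full.qcow2 --root-password password:redhat

# Option 2: keep virt-sysprep, but force an SELinux relabel so the
# labels it dropped get restored on first boot.
virt-sysprep --root-password password:redhat --selinux-relabel -a overcloud-full.qcow2
```

Either way, after deploying, `ls -lZ /etc/machine-id` on the node should show machineid_t rather than unlabeled_t.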