Bug 1347436 - fedora-import-state sets incorrect mode for /dev/shm when dracut places it in /run/initramfs/state (causes various things to break, inc. webkit and Boxes)
Summary: fedora-import-state sets incorrect mode for /dev/shm when dracut places it in...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: initscripts
Version: 25
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: David Kaspar // Dee'Kej
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker AcceptedFreezeException
Duplicates: 1349804
Depends On:
Blocks: F25AlphaFreezeException F25BetaBlocker 1406254
 
Reported: 2016-06-16 20:59 UTC by Adam Williamson
Modified: 2016-12-20 06:32 UTC
CC List: 29 users

Fixed In Version: initscripts-9.68-1.fc25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1406254
Environment:
Last Closed: 2016-08-19 02:26:18 UTC
Type: Bug
Embargoed:


Attachments
initscripts patch: explicitly set correct mode for /dev/shm after import (1.88 KB, patch)
2016-08-05 22:07 UTC, Adam Williamson
deekej: review-
fedora-import-state.patch (1.75 KB, patch)
2016-08-09 13:38 UTC, David Kaspar // Dee'Kej
no flags

Description Adam Williamson 2016-06-16 20:59:38 UTC
This is an extremely bizarre bug but I don't think we can write it off as a glitch...

For the last two days, in both openQA production and staging, the 'Welcome' screen (the one run by /usr/libexec/gnome-welcome-tour, which triggers after initial-setup is complete) has failed to display correctly after an install of the Workstation live image to a UEFI test VM. It seems to work OK for a BIOS install.

All that shows up is an empty Yelp window - with the normal title/tool bar, the title 'Help', and a completely empty grey square as the content. See:

https://openqa.fedoraproject.org/tests/22900/modules/_graphical_wait_login/steps/11

On a BIOS install, the expected content shows up: the window title switches to 'Getting Started' (subtitle 'GNOME Help') and there's the expected page with demo videos, Common Tasks links and so on.

In the system journal for the UEFI install, I see a bunch of errors like this:

Jun 16 13:34:11 localhost.localdomain gnome-welcome-tour.desktop[1827]: Failed to create shared memory file /WK2SharedMemory.2546975387: Permission denied

There are a little over 30,000 of those, and nothing like this in the BIOS boot. I think that message is coming from WebKit?

Example affected image: https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160616.n.0/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-Rawhide-20160616.n.0.iso

Comment 1 Adam Williamson 2016-06-16 21:01:44 UTC
I seem to be able to trigger the same thing just by trying to open an application's help screen - I tried with Nautilus, and it did set the window title correctly ("Files, folders & search"), but did not load the actual content (still just a grey space) and another flood of "Permission denied" errors appeared in the logs.

I don't see any SELinux denials, note.

Comment 2 Adam Williamson 2016-06-16 21:08:37 UTC
Aha, so yes, ultimately this seems to be a general WebKit2 issue on UEFI: I can trigger the same thing by trying to add a Flickr online account, which causes a webkitgtk4-powered mini-browser to try to load the Yahoo! auth page. On a BIOS install, the page displays correctly. On a UEFI install, the mini-browser displays a bit of progress bar and the text 'Loading "m.flickr.com"...', but then sticks there, and in the log I see another flood of the same WK2SharedMemory errors, from org.gnome.ControlCenter.SearchProvider.

Comment 3 Adam Williamson 2016-06-16 21:13:43 UTC
nirik noted that he'd seen odd permissions on /dev/shm, and that does indeed seem to be the problem here!

On the BIOS install, /dev/shm is rwxrwxrwt. On the UEFI install, it's rwxr-xr-t. If I do 'chmod go+w /dev/shm', the bug goes away: I can open help pages, and the Online Accounts mini-browser starts working properly.

So now I guess the question is, why are the /dev/shm perms different between the BIOS and UEFI installs (and which is correct)?
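To compare machines quickly while triaging, a minimal sketch (assuming GNU `stat`) can report the firmware type, using the same `/sys/firmware/efi` test that dracut's fcoe-uefi module uses, along with whether the shm directory has mode 1777 (`rwxrwxrwt`), the mode seen on the working installs. `shm_report`, `SYSFS` and `SHM` are hypothetical names; the two variables exist only so the function can be pointed at a scratch directory.

```shell
#!/usr/bin/env bash
# shm_report: print "<firmware> <octal mode> ok|BAD" for the shm directory.
# SYSFS and SHM are hypothetical overrides; they default to the real paths.
shm_report() {
    local sysfs="${SYSFS:-/sys}" shm="${SHM:-/dev/shm}" fw mode
    # /sys/firmware/efi only exists when the kernel was booted via UEFI
    if [ -d "$sysfs/firmware/efi" ]; then fw=UEFI; else fw=BIOS; fi
    # GNU stat: %a is the octal mode, sticky bit included (e.g. 1777 or 755)
    mode=$(stat -c %a "$shm") || return 1
    if [ "$mode" = 1777 ]; then
        echo "$fw $mode ok"
    else
        echo "$fw $mode BAD"    # e.g. 1755 is rwxr-xr-t, missing go+w
    fi
}
```

Running `shm_report` with no overrides checks the running system itself.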

Comment 4 Adam Williamson 2016-06-16 21:45:19 UTC
So there's one other odd wrinkle to this bug: it really only happens with the *live* image. If I install from the Workstation netinst of the same day - https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160616.n.0/compose/Workstation/x86_64/iso/Fedora-Workstation-netinst-x86_64-Rawhide-20160616.n.0.iso - the bug does not happen: /dev/shm is rwxrwxrwt and all the WebKit2-based things work fine.

So the bug *only* happens with a UEFI install of the live image.

Comment 5 Adam Williamson 2016-06-16 22:22:09 UTC
OK, so a bit more data. /dev/shm seems to be systemd's responsibility; it's mounted as a tmpfs and I see code in systemd for doing that, so kicking this over one more time to systemd.

I confirmed that the issue occurs on bare metal as well as in a KVM guest. Write access is missing both when running live and after installing from the live session, and KDE live images are affected too, so this is not specific to the Workstation live. Basically, write access to /dev/shm seems to be unavailable when UEFI booting a Rawhide live image, but available in all other cases (BIOS boots of live images, and network installs regardless of whether BIOS or UEFI).

Comment 6 Adam Williamson 2016-06-16 22:28:44 UTC
Finally, I've also confirmed this is really new in Rawhide: it's not that /dev/shm has been like this all along and WebKit changed, or anything. I booted the F24 final Workstation live in my UEFI test VM and /dev/shm is rwxrwxrwt. So this is really something new in Rawhide's systemd, I guess.

Comment 7 Zbigniew Jędrzejewski-Szmek 2016-06-16 23:44:27 UTC
I'd guess that this is another instance of some service changing the permissions. I'm pretty sure that systemd initially sets them correctly (there was a small change in this area, I'll double check the patch, but I don't think it could be related).

Comment 8 Josh Boyer 2016-06-24 14:16:36 UTC
*** Bug 1349804 has been marked as a duplicate of this bug. ***

Comment 9 Laura Abbott 2016-06-24 16:14:32 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1282981 - this was the previous instance of the same problem that I was thinking of.

Comment 10 Adam Williamson 2016-06-24 20:25:02 UTC
Looking at Laura's bug, fedora-import-state could be involved here, I guess. It definitely runs and does something to /dev/shm:

fedora-import-state[938]: './dev/shm/lldpad.state' -> '/dev/shm/lldpad.state'

fedora-import-state is part of initscripts, so CCing Lukas in case he has any ideas here. initscripts 9.66 is in Rawhide but not F24, so we do have a delta there which matches the bug.

Comment 11 Adam Williamson 2016-06-24 21:07:16 UTC
So yeah, fedora-import-state seems to be the culprit here - `systemctl mask fedora-import-state.service` and rebooting gives me a properly-permissioned /dev/shm.

Downgrading initscripts to the F24 build doesn't help, though. Nor does booting the most recent F24 kernel (4.5.7-300.fc24) instead of a Rawhide kernel. So I'm having trouble triaging this...

Comment 12 Adam Williamson 2016-06-24 21:50:39 UTC
Oh, OK. So what fedora-import-state actually *does* is important here. It's a pretty simple script. This is the relevant bit:

# copy state into root
cd /run/initramfs/state
find . -mindepth 1 -maxdepth 1 -exec cp -av -t / {} \;

So, obviously, its impact depends on what's in /run/initramfs/state. And that's our difference here: when you boot an F24 live, /run/initramfs/state contains etc/ and var/, but no dev/.

Interestingly, while checking this, I noticed the bug does actually affect BIOS boots *when running live* - I guess I didn't check that case before (I'll boot a few more times in case it's intermittent or something). When just booting the live on BIOS, there is a dev/ in /run/initramfs/state, and the permissions on /dev/shm are wrong. But after installing and booting the installed system, there is no dev/ in /run/initramfs/state, and the /dev/shm permissions are right.

So yeah, it's definitely fedora-import-state's copy of /dev/shm/lldpad.state that causes this.

The difference between BIOS and UEFI cases with the live image appears to be what's in the initramfs. I ran `lsinitrd /boot/initramfs-4.7.0-0.rc3.git1.1.fc25.x86_64.img | grep lldpad` on both a BIOS install and a UEFI install of the 2016-06-16 live image. On the BIOS install, there are no results. On the UEFI install, there are:

usr/lib/dracut/hooks/pre-trigger/03-lldpad.sh
usr/sbin/lldpad
var/lib/lldpad

So that's obviously the difference here: for some reason, UEFI live installs and BIOS live installs differ in what their initramfs contains.

Aha, and now I look, I *do* see a suspicious difference in dracut here: dracut contains a '95fcoe-uefi' module (note the name) which seems to test whether it's running on UEFI:

check() {
    [[ $hostonly ]] || [[ $mount_needs ]] && {
        [ -d /sys/firmware/efi ] || return 255
    }
    require_binaries dcbtool fipvlan lldpad ip readlink || return 1
    return 0
}

So that could certainly explain the difference: on a UEFI system the module passes the firmware test, so it gets included, and with it the lldpad binaries named in the 'require_binaries' line; on a BIOS system it does not.

Not sure what's different between 24 and 25 yet, though, nor why this doesn't happen on network installs. More investigation to come.

Comment 13 Adam Williamson 2016-06-24 21:55:36 UTC
Aha. I think I see what changed between F24 and F25:

https://git.kernel.org/cgit/boot/dracut/dracut.git/commit?id=b99e72427b517dea0d91d15fe43cf0a37420af36

Note that it *reverts* a commit that stopped dracut putting lldpad.state in /run/initramfs/state/dev/shm/ - in other words, it made dracut start doing that again. And also note that it's this code that stuffs up the permissions, because it does:

mkdir -m 0755 -p /run/initramfs/state/dev/shm

so, I think we can see how to fix that. :)
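The whole chain can be replayed without root in a scratch directory. This is a sketch under stated assumptions: `repro_import` is a hypothetical helper, its argument stands in for `/`, and GNU `cp`, `find` and `stat` are assumed. It creates a 1777 `dev/shm` the way systemd does, runs dracut's `mkdir -m 0755 -p` line, then runs the fedora-import-state loop with the copy target pointed at the scratch root.

```shell
#!/usr/bin/env bash
# repro_import ROOT: replay the dracut + fedora-import-state sequence under
# ROOT (an existing scratch directory standing in for /) and print the
# resulting mode of ROOT/dev/shm in ls-style notation.
repro_import() {
    local root state
    root=$(cd "$1" && pwd) || return 1      # absolute path, needed after cd
    state="$root/run/initramfs/state"

    # systemd: /dev/shm is a world-writable, sticky directory (1777)
    mkdir -p "$root/dev/shm"
    chmod 1777 "$root/dev/shm"

    # dracut (the reverted commit): save lldpad.state under a 0755 dev/shm
    mkdir -m 0755 -p "$state/dev/shm"
    touch "$state/dev/shm/lldpad.state"

    # fedora-import-state: copy each top-level entry of the state dir into /
    (cd "$state" && find . -mindepth 1 -maxdepth 1 -exec cp -a -t "$root" {} \;)

    stat -c %A "$root/dev/shm"
}
```

`cp -a` preserves the source directory's mode even when the destination directory already exists, which is how group and other write access get stripped from the real /dev/shm.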

Comment 14 Adam Williamson 2016-06-24 22:08:03 UTC
https://github.com/dracutdevs/dracut/pull/138

Comment 15 Adam Williamson 2016-06-24 22:17:28 UTC
There's probably a second bug here, btw - I don't think dracut should be including the fcoe-uefi module in a hostonly initramfs for a UEFI system either, but it clearly *does*, and I'm not sure why it does.

If you compare the journal.log from the BIOS and UEFI installs, the BIOS install one clearly shows that the 'fcoe' and 'fcoe-uefi' dracut modules are included in the rescue initramfs (which is generic, not hostonly), but aren't included in the 'normal' initramfs (which is hostonly). The UEFI install log shows the modules being included in both initramfs'es. I don't think it's as simple as the UEFI initramfs not being properly hostonly, though, as some modules definitely *are* left out.

Comment 16 Adam Williamson 2016-06-24 22:32:00 UTC
So for that second bug - I'm not 100% confident, but if I understand dracut correctly, I think the problem is this:

Each dracut module has a check() section which is used to indicate whether the module can and/or should be included in the initramfs being built. The check() function for the fcoe module does this:

check() {
    [[ $hostonly ]] || [[ $mount_needs ]] && {
        for c in /sys/bus/fcoe/devices/ctlr_* ; do
            [ -L $c ] || continue
            fcoe_ctlr=$c
        done
        [ -z "$fcoe_ctlr" ] && return 255
    }

    require_binaries dcbtool fipvlan lldpad ip readlink fcoemon fcoeadm || return 1
    return 0
}

So I think the first part there is basically a 'is this module needed' check: if we're in hostonly mode (or 'mount_needs', dunno what that is), let's check and see if there are actually any FCoE devices, and if not, return 255, indicating 'module isn't needed'.

The second part means 'make sure we can include these binaries in the initramfs, otherwise return 1 indicating the module should be included, but cannot be because some required binaries are missing'.

However, we have this 95fcoe-uefi module, too. How does check() for that module look?

check() {
    [[ $hostonly ]] || [[ $mount_needs ]] && {
        [ -d /sys/firmware/efi ] || return 255
    }
    require_binaries dcbtool fipvlan lldpad ip readlink || return 1
    return 0
}

Note that it doesn't, AFAICS, include the check from the fcoe module. It just checks whether this is a UEFI boot, and wants to be enabled if it is. It also has this:

depends() {
    echo fcoe uefi-lib
    return 0
}

which obviously indicates a requirement for the 'fcoe' module. So I think what happens when you build a 'hostonly' initramfs on UEFI is that the fcoe module check() returns 255 (meaning 'nope, I don't need to be enabled'), but the fcoe-uefi module check() returns 0 (meaning 'yes, I do want to be enabled!') and drags fcoe along via dependencies (presumably a dependency beats out a 255 check() return code).

So you wind up with the fcoe module *always* included in initramfs'es built on UEFI systems, even though there are no FCoE mounts.
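To make that reasoning concrete, here is a toy model of the selection step. It is not dracut's implementation; it only encodes the rules as described in this comment: 0 means 'include me', 255 means 'only if something else needs me', 1 means 'cannot be included', and a wanted module pulls in its depends() list regardless of those modules' own 255. `resolve_modules` and its `name:rc:deps` argument format are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the selection logic described above (NOT dracut's code).
# Each argument is "name:rc:deps": rc is what check() returned, deps is a
# comma-separated depends() list. Dependencies are resolved one level deep,
# which is enough for fcoe-uefi -> fcoe, uefi-lib; rc=1 deps are not modelled.
resolve_modules() {
    local spec name rest rc deps dep out=""
    for spec in "$@"; do
        name=${spec%%:*}
        rest=${spec#*:}
        rc=${rest%%:*}
        case $rest in *:*) deps=${rest#*:} ;; *) deps="" ;; esac
        # only modules whose own check() returned 0 are wanted
        [ "$rc" -eq 0 ] || continue
        out="$out $name"
        # a wanted module drags its dependencies in, whatever their own rc
        for dep in $(echo "$deps" | tr ',' ' '); do
            out="$out $dep"
        done
    done
    # print each selected module once, in byte order
    if [ -n "$out" ]; then
        printf '%s\n' $out | LC_ALL=C sort -u
    fi
}
```

Feeding it the hostonly-on-UEFI case, `resolve_modules "fcoe:255:" "fcoe-uefi:0:fcoe,uefi-lib"`, selects fcoe even though fcoe's own check() said it was not needed.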

I dunno what the right way to fix this is: just duplicate the fcoe module's 'are there any FCoE mounts?' check into fcoe-uefi, or something better.
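One possible shape for the 'duplicate the check' option, as a hypothetical sketch rather than an actual dracut change: fcoe-uefi's check() keeps its UEFI test but also returns 255 unless an FCoE controller is present, mirroring the fcoe module's loop. `SYSFS_ROOT` is an invented override for testing; `require_binaries` is dracut's own helper and has to be stubbed when running this outside dracut.

```shell
#!/usr/bin/env bash
# Hypothetical check() for 95fcoe-uefi: in hostonly mode, only ask to be
# included when booted via UEFI *and* an FCoE controller exists.
# SYSFS_ROOT is a made-up override (defaults to /sys) for testing only.
check() {
    local root="${SYSFS_ROOT:-/sys}" c fcoe_ctlr=""
    [[ $hostonly ]] || [[ $mount_needs ]] && {
        # not a UEFI boot: not needed
        [ -d "$root/firmware/efi" ] || return 255
        # same controller scan as the fcoe module's check()
        for c in "$root"/bus/fcoe/devices/ctlr_*; do
            [ -L "$c" ] || continue
            fcoe_ctlr=$c
        done
        # no FCoE controller: not needed either
        [ -z "$fcoe_ctlr" ] && return 255
    }
    require_binaries dcbtool fipvlan lldpad ip readlink || return 1
    return 0
}
```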

Comment 17 Adam Williamson 2016-06-24 22:57:19 UTC
To fill in the final question (why it only happens on lives): I think it's probably because some of the required binaries for the fcoe module are missing on a network install when there are no FCoE devices, because anaconda knows not to bother installing those packages in that case, so dracut leaves the module out because the binary requirements can't be satisfied. Live images, by contrast, have to include the required binaries, and installs from live images always include everything that's in the live image.

Comment 18 Adam Williamson 2016-06-24 23:15:27 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=14639525 is a scratch build of dracut with my patch (for the permissions) applied. I tested and it does indeed appear to resolve the issue.

Comment 19 Adam Williamson 2016-06-25 14:22:54 UTC
Harald says that a world-writable directory in /run/initramfs is a security hole (though he doesn't say why) and rejected my patch, saying it needs fixing in fedora-import-state. Fine, whatever, someone please just fix this crap.

Comment 20 Jan Kurik 2016-07-26 04:36:22 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 21 Matthias Clasen 2016-07-26 16:03:56 UTC
Can we get this fixed? It breaks not just WebKit, but Boxes and lots of other things.

Comment 22 Fedora Blocker Bugs Application 2016-08-01 12:46:22 UTC
Proposed as a Blocker for 25-beta by Fedora user mclasen using the blocker tracking app because:

 This breaks the functionality of several applications, some of which (boxes) are installed by default.

Comment 23 Adam Williamson 2016-08-05 22:07:00 UTC
Created attachment 1188049 [details]
initscripts patch: explicitly set correct mode for /dev/shm after import

So here's my new proposal. This is a fairly simple-minded patch for initscripts that just explicitly resets the mode of /dev/shm after doing the import, if /run/initramfs/state/dev/shm exists. There's one obvious drawback to this (if we ever wanted to change the default mode of /dev/shm , we'd have to adjust both systemd *and* initscripts), but it should at least fix the bug, I can't think of anything better, and no-one else has suggested anything better.
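The idea can be sketched roughly as below. This illustrates the approach described, not the attached patch; `reset_shm_mode`, `STATE_DIR` and `ROOT_DIR` are hypothetical names. The value 1777 is hard-coded because that is the mode systemd mounts /dev/shm with, which is exactly the duplication drawback mentioned above.

```shell
#!/usr/bin/env bash
# After the import, put /dev/shm back to 1777 if the initramfs state
# contained a dev/shm directory (which is what clobbers the mode).
# STATE_DIR and ROOT_DIR are hypothetical overrides for testing.
reset_shm_mode() {
    local state="${STATE_DIR:-/run/initramfs/state}" root="${ROOT_DIR:-}"
    if [ -d "$state/dev/shm" ]; then
        # 1777 = rwxrwxrwt; must be kept in sync with systemd's default
        chmod 1777 "$root/dev/shm"
    fi
}
```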

I've tested, and this does solve the problem. Lukas, can you please review this? If I don't hear from Lukas in the next day or two I think I will just send out an initscripts build with this patch applied, since we really ought to fix this bug.

Comment 24 Kevin Kofler 2016-08-07 11:45:58 UTC
This bug also breaks QtWebEngine (along with bug #1363914 and bug #1364781 – 3 unrelated bugs hitting 1 component).

Comment 25 David Kaspar // Dee'Kej 2016-08-08 16:31:38 UTC
Hello guys,

I've been assigned to this BZ today. We have almost finished the patch for this, just doing some additional reviews and testing it. We will post more info tomorrow.

Best regards,

Dee'Kej

Comment 26 Geoffrey Marr 2016-08-08 17:35:45 UTC
Discussed during the 2016-08-08 blocker review meeting: [1]

This bug has been classified as an Accepted Blocker as it violates the following criteria:

"All applications that can be launched using the standard graphical mechanism of a release-blocking desktop after a default installation of that desktop must start successfully and withstand a basic functionality test." [2]

The consequences of this bug are obvious and serious and so the classification is also made as an AcceptedFreezeException.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-08-08/f25-blocker-review.2016-08-08-16.01.txt

[2] https://fedoraproject.org/wiki/Fedora_25_Final_Release_Criteria#Default_application_functionality

Comment 27 David Kaspar // Dee'Kej 2016-08-09 13:35:06 UTC
Comment on attachment 1188049 [details]
initscripts patch: explicitly set correct mode for /dev/shm after import

We have decided not to use this patch, because it is too specific and would probably require additional maintenance in the future.

Comment 28 David Kaspar // Dee'Kej 2016-08-09 13:38:31 UTC
Created attachment 1189268 [details]
fedora-import-state.patch

This is the patch we will be using for the new release for Rawhide and F25. It copies files as it did before, but does not copy folders that already exist. That way, the attributes of already existing folders are not overwritten.

If the destination folder does not exist, it is created normally.
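A rough sketch of the behaviour described above, assuming GNU coreutils: files are copied with `cp -a`, directories that already exist at the destination are left alone, and missing ones are created and given the source's owner and mode. This is not the content of fedora-import-state.patch, and `import_state` is a hypothetical name. Creating each new directory as 0700 before applying the final owner and mode is an extra precaution not taken from the thread, aimed at the mkdir/chmod window raised in comment 30: during that window the directory is private to its creator rather than more permissive than intended.

```shell
#!/usr/bin/env bash
# import_state SRC DST: merge SRC into DST without changing existing dirs.
import_state() {
    local src="$1" dst="$2" p
    # find lists each directory before its contents (pre-order), so parents
    # are created before children. Paths containing newlines are not handled.
    (cd "$src" && find . -mindepth 1) | while IFS= read -r p; do
        if [ -d "$src/$p" ] && [ ! -L "$src/$p" ]; then
            # existing directory: leave its owner and mode untouched
            [ -d "$dst/$p" ] && continue
            # new directory: start private, then copy owner and mode
            mkdir -m 0700 "$dst/$p" || return 1
            chown --reference="$src/$p" "$dst/$p"
            chmod --reference="$src/$p" "$dst/$p"
        else
            # files, symlinks, etc.: copy with attributes preserved
            cp -a "$src/$p" "$dst/$p"
        fi
    done
}
```

Because find lists a directory before its contents, nested new directories such as etc/foo/ are each created and given their mode before anything is copied into them.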

Comment 29 Fedora Update System 2016-08-09 14:15:04 UTC
initscripts-9.68-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-17b522dda8

Comment 30 Adam Williamson 2016-08-09 15:18:24 UTC
Looks to me like if more than one depth of directory needs to be created, the mode for the lower levels won't be preserved. e.g. say initramfs has:

/run/initramfs/state/foo (0400)
/run/initramfs/state/foo/bar (0400)

and /foo does not exist on the installed system, the script will create /foo/bar with mode 0400, but I don't believe it'll set the mode of /foo to 0400.

There's also obviously a gap between 'mkdir' and 'chmod' / 'chown' during which even the top level directory will have default ownership and permissions...

Comment 31 Adam Williamson 2016-08-09 15:20:53 UTC
oh wait no, didn't read closely enough, it should work. sorry.

Comment 32 Adam Williamson 2016-08-09 20:37:54 UTC
OK, tested and this does fix the main /dev/shm case.

Comment 33 Adam Williamson 2016-08-09 21:22:26 UTC
Geoff: when you're secretarializing, take care to make sure bugs actually block the correct trackers, as well as setting the whiteboard fields. This was not formally proposed as a freeze exception issue prior to the meeting; we decided to give it that status during the meeting. In this case you have to set it to block AlphaFreezeException as well as setting the whiteboard.

Comment 34 Fedora Update System 2016-08-10 02:50:16 UTC
initscripts-9.68-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-17b522dda8

Comment 35 David Kaspar // Dee'Kej 2016-08-10 11:26:24 UTC
(In reply to Adam Williamson from comment #30)
> Looks to me like if more than one depth of directory needs to be created,
> the mode for the lower levels won't be preserved. e.g. say initramfs has:
> 
> /run/initramfs/state/foo (0400)
> /run/initramfs/state/foo/bar (0400)
> 
> and /foo does not exist on the installed system, the script will create
> /foo/bar with mode 0400, but I don't believe it'll set the mode of /foo to
> 0400.
> 
> There's also obviously a gap between 'mkdir' and 'chmod' / 'chown' during
> which even the top level directory will have default ownership and
> permissions...

I see what you mean, Adam. We discussed this before as well. However, AFAIK, find always prints a directory before the entries inside it. Therefore the top-level directory is always created first (with the correct permissions), and only after that are the files from the original directory copied into the destination directory.

If we can't trust the order in which find prints its results, we can pipe them through the 'sort' command. A shorter string always sorts before a longer string that it is a prefix of, so a parent directory will always come before the paths inside it.
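Both orderings can be checked in a scratch tree. This sketch (hypothetical `list_parents_first`, GNU `find` assumed) relies on find's default pre-order walk and additionally pipes through `sort` under `LC_ALL=C`, since byte-wise comparison is what guarantees that a path sorts before any longer path it is a prefix of; locale-aware collation is less predictable about punctuation such as `/`.

```shell
#!/usr/bin/env bash
# list_parents_first DIR: list entries under DIR so that every directory
# precedes its contents. find already walks in pre-order (unless -depth is
# given); the byte-wise sort gives the same guarantee independently of find.
list_parents_first() {
    (cd "$1" && find . -mindepth 1 | LC_ALL=C sort)
}
```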

If you insist, I can add the sort option there, just to be sure.

Comment 36 Adam Williamson 2016-08-10 15:40:17 UTC
Yeah, as I said in my follow-up, it should be fine; I wasn't considering that find walks all the way through the directory tree. I don't know whether the window between creation and mode change could be a problem in any way.

Comment 37 David Kaspar // Dee'Kej 2016-08-10 15:52:03 UTC
(In reply to Adam Williamson from comment #36)
> I don't know if the window between creation and mode change may be a problem in any way.

Is there a chance that some other process will access those files before fedora-import-state finishes? If not, we should be fine, since the bash script runs single-threaded and copying always happens after folder creation.

A problem would occur if the folder creation somehow failed, but I really don't know what the proper way to recover from that would be.

Comment 38 Fedora Update System 2016-08-19 02:26:09 UTC
initscripts-9.68-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

