Bug 2025651 - startx fails to start X with the latest glibc in rawhide, xfree86 call to iopl fails. 20211121
Summary: startx fails to start X with the latest glibc in rawhide, xfree86 call to iop...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-22 17:03 UTC by stan
Modified: 2021-12-23 18:04 UTC
CC List: 31 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-12-23 18:04:14 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
These are the packages that were updated just before the error happened. (25.17 KB, text/plain)
2021-11-22 17:05 UTC, stan
This is the complete xorg log file that ended with the backtrace from a segmentation fault. (15.79 KB, text/plain)
2021-11-25 17:09 UTC, stan

Description stan 2021-11-22 17:03:46 UTC
Description of problem:  After yesterday's updates in rawhide, I cannot use startx to start an LXDE session.  (I update from the multi-user target using dnf.)  There had been no updates to rawhide for many days, and 20211121 saw 271 updates, 775 MB.  After the update completed, I used the script that I have used for many years to start LXDE.  It failed.
The command that is causing the lockup is
startx -- vt10
with
exec startlxde 
in ~/.Xclients

The error is 
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)


Version-Release number of selected component (if applicable):
xorg-x11-server 1.20.11-2.fc35
I have since downloaded the src.rpm and rebuilt and re-installed the package on rawhide without any change.

How reproducible:
every time


Steps to Reproduce:
1. Update rawhide to latest available packages
2. Try to start X with the above command
3. Error.

I tried starting Gnome and KDE with startx, and both failed also.

Actual results:
*** Replacing existing ~/.Xclients or creating new one.
Starting X with LXDE...
xauth:  file /home/stan/.serverauth.3347 does not exist
xauth: (stdin):2:  unknown command "0a4862ba6b1a93c3728819b71229a730"


X.Org X Server 1.20.11
X Protocol Version 11, Revision 0
Build Operating System:  5.12.13-300.fc34.x86_64
Current Operating System: Linux fedora 5.15.0-60.20211107.fc36.x86_64 #1 SMP PREEMPT Sun Nov 7 08:47:47 MST 2021 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-60.20211107.fc36.x86_64 root=/dev/sdb10
Build Date: 23 July 2021  12:00:00AM
Build ID: xorg-x11-server 1.20.11-2.fc35
Current version of pixman: 0.40.0
Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/stan/.local/share/xorg/Xorg.0.log", Time: Sun Nov 21 10:52:24 2021
(==) Using config directory: "/etc/X11/xorg.conf.d"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
(II) [KMS] Kernel modesetting enabled.
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error

Expected results:
X starts

Additional info:

From hw/xfree86/os-support/linux/lnx_video.c

<code>

static Bool
hwEnableIO(void)
{
    if (ioperm(0, 1024, 1) || iopl(3)) {
        ErrorF("xf86EnableIOPorts: failed to set IOPL for I/O (%s)\n",
               strerror(errno));
        return FALSE;
    }
#if !defined(__alpha__)
    /* XXX: this is actually not trapping anything because of iopl(3)
     * above */
    ioperm(0x40, 4, 0);         /* trap access to the timer chip */
    ioperm(0x60, 4, 0);         /* trap access to the keyboard controller */
#endif

    return TRUE;
}

</code>

The iopl call is from <sys/io.h> in glibc.  So is the ioperm call.
Since glibc was updated just before the problem, it is a good place to
start.  It might be that something else has disallowed permissions for
this call.  The iopl(2) man page says:

iopl() changes the I/O privilege level of the calling process, as
specified by the two least significant bits in level.

This call is necessary to allow 8514-compatible X servers to run
under Linux.  Since these X servers require access to all 65536
I/O ports, the ioperm(2) call is not sufficient.
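The interaction described in the XXX comment in hwEnableIO() can be sketched with a small user-space model. This is not kernel or Xorg code: the io_perms type and the model_* functions are hypothetical, and the model assumes, as that comment says, that once the level is 3 every port is accessible and the per-port bitmap set by ioperm() no longer matters.

```c
#include <stdbool.h>
#include <string.h>

#define NPORTS 65536 /* x86 has 65536 I/O ports */

/* Hypothetical model of a process's I/O permissions: a per-port
 * bitmap set by ioperm() plus the level set by iopl(). */
typedef struct {
    unsigned char bitmap[NPORTS / 8]; /* 1 bit per port, 1 = allowed */
    unsigned int iopl;                /* 0..3 */
} io_perms;

/* Model of ioperm(from, num, on): set or clear bits in the bitmap. */
static void model_ioperm_set(io_perms *p, unsigned long from,
                             unsigned long num, bool on)
{
    for (unsigned long port = from; port < from + num && port < NPORTS; port++) {
        if (on)
            p->bitmap[port / 8] |= (unsigned char)(1u << (port % 8));
        else
            p->bitmap[port / 8] &= (unsigned char)~(1u << (port % 8));
    }
}

/* Assumption from the XXX comment: at level 3 the bitmap is ignored. */
static bool model_port_allowed(const io_perms *p, unsigned long port)
{
    if (p->iopl == 3)
        return true;
    return port < NPORTS && (p->bitmap[port / 8] >> (port % 8)) & 1u;
}

/* Replays the call sequence of hwEnableIO() quoted above. */
static void model_hw_enable_io(io_perms *p)
{
    memset(p, 0, sizeof *p);
    model_ioperm_set(p, 0, 1024, true);  /* ioperm(0, 1024, 1) */
    p->iopl = 3;                         /* iopl(3) */
    model_ioperm_set(p, 0x40, 4, false); /* "trap" the timer chip */
    model_ioperm_set(p, 0x60, 4, false); /* "trap" the keyboard controller */
}
```

Replaying hwEnableIO() this way leaves the timer port 0x40 accessible; the cleared bit would only take effect if the level were dropped back to 0.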

Comment 1 stan 2021-11-22 17:05:44 UTC
Created attachment 1843051 [details]
These are the packages that were updated just before the error happened.

I don't see many possible culprits other than glibc for the problem in this list of updates.

Comment 2 stan 2021-11-22 17:10:21 UTC
Comment on attachment 1843051 [details]
These are the packages that were updated just before the error happened.

It isn't SELinux.  I switched to permissive (setenforce 0), and the problem was still there.

Comment 3 stan 2021-11-22 18:24:23 UTC
I downgraded to the previous version of glibc in koji.  Problem persists.  The journal showed the same output for both versions of glibc.

audit[3929]: ANOM_ABEND auid=9999 uid=9999 gid=9999 ses=17 subj=unconfined_u:unconfined_r:xserver_t:s0-s0:c0.c1023 pid=3929 comm="Xorg" exe="/usr/libexec/Xorg" sig=11 res=1
systemd-coredump[3942]: Resource limits disable core dumping for process 3929 (Xorg).
fedora systemd-coredump[3942]: [�] Process 3929 (Xorg) of user 9999 dumped core.
fedora systemd[1]: systemd-coredump: Deactivated successfully.

Comment 4 stan 2021-11-22 18:25:16 UTC

Comment 5 Samuel Sieb 2021-11-23 07:09:43 UTC
Did you try an earlier kernel version?

Comment 6 Carlos O'Donell 2021-11-23 14:51:20 UTC
The glibc implementation of ioperm() is a thin assembly wrapper around the actual syscall, so it calls directly into the kernel.

I'm reassigning to the kernel to let them comment. This certainly looks like a secure boot or security related issue.

I don't think that this is a glibc bug, but I'll stay on the CC to help out if you have any questions.
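Carlos's point can be checked on x86 by issuing the same request twice, once through the glibc ioperm() wrapper and once as a raw system call via syscall(SYS_ioperm, ...), and comparing the results. This is a sketch assuming an x86 Linux system with glibc (<sys/io.h> is not available on other architectures); the helper name is hypothetical. If both paths agree, any EPERM is coming from the kernel rather than from glibc.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/io.h>      /* ioperm(); x86 only */
#include <sys/syscall.h> /* SYS_ioperm */
#include <unistd.h>      /* syscall() */

/* Requests access to ports [from, from + num) through glibc and then
 * through the raw syscall; returns true if return values and errno match.
 * Any access that was granted is turned off again before returning. */
static bool wrapper_matches_raw_syscall(unsigned long from, unsigned long num)
{
    errno = 0;
    int via_glibc = ioperm(from, num, 1);
    int glibc_errno = errno;
    if (via_glibc == 0)
        ioperm(from, num, 0);

    errno = 0;
    long via_syscall = syscall(SYS_ioperm, from, num, 1);
    int syscall_errno = errno;
    if (via_syscall == 0)
        syscall(SYS_ioperm, from, num, 0);

    return via_glibc == via_syscall && glibc_errno == syscall_errno;
}
```

Run as an ordinary user, both calls are expected to fail with EPERM and the helper returns true; run as root on a system without Secure Boot, both should succeed.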

Comment 7 stan 2021-11-23 15:13:07 UTC
@5

Yes, the original attempt was with a 5.15 kernel that had been working fine until then.  It also failed with 5.16-rc1.  Both of these are custom compiled locally to fit the local hardware.  You've given me an idea, though.  When I downgraded glibc and rebooted, I tried only the 5.16 kernel, not the 5.15 kernel.  That might be worth a try, especially in light of Carlos's comment that this glibc call is only a wrapper around a kernel call.  It also means that I might be able to patch the kernel to work around this issue; not a big deal since I'm already compiling locally.  I did examine the kernel changelog and didn't see anything that might have caused this.  Nor have I seen anything on the kernel list, though I confess that I mostly just skim all the ark changes.

Comment 8 stan 2021-11-23 15:23:17 UTC
Thanks for peeling one more layer off the onion.  I thought about grabbing the glibc src.rpm and unpacking it to look at the iopl and ioperm functions, but hadn't done it yet.  This saves me the effort; I can just look in the kernel source.  I agree that this seems like a policy decision to enhance security, and that this is a security issue.  But I'm on a single-user system with no internet-facing services being accessed.  I think the only threats are from the browser and my dnf updates.

Comment 9 Samuel Sieb 2021-11-23 18:39:38 UTC
If you're compiling your own kernels, then double check the protection options.  I know the ioperm functions have more restrictions on them now.  The idea is to not be using them at all.  The X server shouldn't need that, which is why I was wondering which X driver you're using.

Comment 10 stan 2021-11-23 19:57:32 UTC
The config file for the kernel shows CONFIG_X86_IOPL_IOPERM=y, and when I look in the kernel source, that seems to be the only required setting for iopl to work.  The kernel seems to explicitly accept that this should be allowed for legacy reasons.  Another dead end.  If glibc has no restrictions, and the kernel has no restrictions, what is stopping X from getting the I/O permissions it requests?  Something is producing that error (Operation not permitted).

Maybe I'm not interpreting what you mean by protection options correctly.  Are there other settings than CONFIG_X86_IOPL_IOPERM=y needed?  The 5.15 kernel, with essentially the same options as the 5.16 kernel, was working just fine until the update occurred.  The main difference was the addition of CONFIG_SCHED_CLUSTER=y to the 5.16 kernel, but that should have nothing to do with this.

The fact that the 5.15 kernel no longer lets X start after the update, whereas it did before, leads me to believe that Carlos is correct: this is a security setting being tightened.  It seems pretty disruptive for that to happen without any announcement or discussion on the devel or test lists, which leads me to conclude that it was inadvertent, a side effect of something else.

Comment 11 Samuel Sieb 2021-11-24 01:07:58 UTC
Are you using secure boot?  Neither one of those functions work in that case, even for root.  If not, then is the X server starting as root?  Direct IO access like that is very deprecated and is being removed.  What is your X config that it's even trying something like that?

Here's a program you can use to test:

#include <sys/io.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

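int main(void) {
  /* Reset errno before each call so that a successful call prints
     "Success" rather than a stale error left by an earlier call. */
  errno = 0;
  int err = ioperm(0, 1024, 1);
  printf("ioperm err %d, %s\n", err, strerror(errno));
  errno = 0;
  err = iopl(3);
  printf("iopl err %d, %s\n", err, strerror(errno));
  return 0;
}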

For me, as a user or root (secure boot enabled), I get:
ioperm err -1, Operation not permitted
iopl err -1, Operation not permitted
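Why the same call fails for a user and, under Secure Boot, even for root can be summarized with a simplified model of the checks that, as far as I can tell, arch/x86/kernel/ioport.c applies in 5.x kernels built with CONFIG_X86_IOPL_IOPERM=y: gaining port access needs CAP_SYS_RAWIO and is refused when kernel lockdown is active, which Fedora kernels typically enable when booted with Secure Boot. The function names and boolean parameters are hypothetical; this models the decision and does not call into the kernel.

```c
#include <errno.h>
#include <stdbool.h>

#define IO_BITMAP_BITS 65536 /* number of x86 I/O ports */

/* has_rawio: caller has CAP_SYS_RAWIO (normally only root).
 * locked_down: kernel lockdown is active (typically under Secure Boot). */
static int model_ioperm(unsigned long from, unsigned long num, int turn_on,
                        bool has_rawio, bool locked_down)
{
    /* Empty, overflowing, or out-of-range port ranges are rejected. */
    if (from + num <= from || from + num > IO_BITMAP_BITS)
        return -EINVAL;
    /* Only granting access is privileged; revoking it is always allowed. */
    if (turn_on && (!has_rawio || locked_down))
        return -EPERM;
    return 0;
}

static int model_iopl(unsigned int level, unsigned int old_level,
                      bool has_rawio, bool locked_down)
{
    if (level > 3)
        return -EINVAL;
    /* Only raising the level is privileged. */
    if (level > old_level && (!has_rawio || locked_down))
        return -EPERM;
    return 0;
}
```

This lines up with the results here and in the next comment: an ordinary user gets EPERM from both calls, root succeeds on a system booted without Secure Boot, and root gets EPERM when Secure Boot has enabled lockdown.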

Comment 12 stan 2021-11-24 03:21:10 UTC
I'm not sure whether I have secure boot enabled on this system.  It can boot in UEFI secure boot, and while it was my primary system, I did boot it in secure mode.  However, because of the way secure boot works on this hardware, it is difficult to have a second Fedora system boot in secure boot from the same EFI partition.  So, I normally boot without secure boot, and then select the rawhide partition to boot using the menu from this system.  Clumsy, but as long as there is an EFI partition, necessary.  I've thought about using systemd-boot to get around this, but I've been too lazy to set it up.

My results mimic yours.
$ ./test_iopl 
ioperm err -1, Operation not permitted
iopl err -1, Operation not permitted

However, this system *does* start X without complaint, by me as a user.  I am typing this on that system.

When I run that program as root,
# ./test_iopl 
ioperm err 0, Success
iopl err 0, Success

I see that there is a wrapper program for Xorg that is setuid root, so that is probably why it works on this system.
# ls -nZ /usr/libexec/Xorg*
-rwxr-xr-x. 1 0 0 system_u:object_r:bin_t:s0          3552088 Nov 25  2019 /usr/libexec/Xorg
-rwsr-xr-x. 1 0 0 system_u:object_r:xserver_exec_t:s0   19088 Nov 25  2019 /usr/libexec/Xorg.wrap

I haven't run your test program there yet, but it probably fails for a user just as it does on this system.  After I post this, I'll reboot into that system and check as both user and root.  I expect that it probably won't be successful for root now, because it has the same setuid Xorg.wrap as this system, so if root were successful, X would start.  We'll see.
# ls -nZ /usr/libexec/Xorg*
-rwxr-xr-x. 1 0 0 system_u:object_r:bin_t:s0          2485136 Nov 21 13:51 /usr/libexec/Xorg
-rwsr-xr-x. 1 0 0 system_u:object_r:xserver_exec_t:s0   16352 Nov 21 13:51 /usr/libexec/Xorg.wrap
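The setuid-root property shown by "-rwsr-xr-x. 1 0 0" in these listings can also be checked programmatically. A minimal sketch using stat(2); the helper name is hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True if path is a regular file owned by uid 0 with the setuid bit set. */
static bool is_setuid_root(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false; /* missing or inaccessible: treat as not setuid */
    return S_ISREG(st.st_mode) && st.st_uid == 0 && (st.st_mode & S_ISUID);
}
```

On both systems in the listings above, is_setuid_root("/usr/libexec/Xorg.wrap") should return true and is_setuid_root("/usr/libexec/Xorg") false.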

Comment 13 stan 2021-11-24 17:31:22 UTC
It turns out that the test_iopl program has the same results on rawhide as on the working system.
# ./test_iopl 
ioperm err 0, Success
iopl err 0, Success

The defaults for the Xorg.wrap program are to allow only console users to start X, and to detect whether root rights are needed.  A config file, Xwrapper.config, can be installed in /etc/X11 to change these (see man Xorg.wrap), and I did so.  I told Xorg.wrap to allow anybody to start X and to use root rights.  This got rid of the permissions error, but X still would not start.  The Xorg.wrap program has not changed since version 1.18 (we are at 1.20 now), so I don't see how this is an X issue.  Something else is preventing it from functioning as normal.  The error in the journal shows that the program is still being run with the user's UID instead of the setuid root UID, and so is failing.  Something in the environment has changed.  That is happening in this section of code, after the config file is read and root use is mandated.

Entering this code with the config file and initializations, needs_root_rights == 1, total_cards == 0, and kms_cards == 0.  I read the logic as saying that the program uid is set to the user uid in the second if statement, always, because the first if block is skipped, so total_cards and kms_cards are still zero and therefore equal.  I must be missing something, but I don't see how this ever worked.  That is, I don't see how Xorg is ever started with root rights.  And yet it did work.  =><=

<code>
#ifdef WITH_LIBDRM
    /* Detect if we need root rights, except when overridden by the config */
    if (needs_root_rights == -1) {
        for (i = 0; i < 16; i++) {
            snprintf(buf, sizeof(buf), DRM_DEV_NAME, DRM_DIR_NAME, i);
            fd = open(buf, O_RDWR);
            if (fd == -1)
                continue;

            total_cards++;

            memset(&res, 0, sizeof(struct drm_mode_card_res));
            r = ioctl(fd, DRM_IOCTL_MODE_GETRESOURCES, &res);
            if (r == 0)
                kms_cards++;

            close(fd);
        }
    }
#endif

    /* If we've found cards, and all cards support kms, drop root rights */
    if (needs_root_rights == 0 || (total_cards && kms_cards == total_cards)) {
        gid_t realgid = getgid();
        uid_t realuid = getuid();

        if (setresgid(-1, realgid, realgid) != 0) {
            fprintf(stderr, "%s: Could not drop setgid privileges: %s\n",
                progname, strerror(errno));
            exit(1);
        }
        if (setresuid(-1, realuid, realuid) != 0) {
            fprintf(stderr, "%s: Could not drop setuid privileges: %s\n",
                progname, strerror(errno));
            exit(1);
        }
    }

    snprintf(buf, sizeof(buf), "%s/Xorg", SUID_WRAPPER_DIR);

    /* Check if the server is executable by our real uid */
    if (access(buf, X_OK) != 0) {
        fprintf(stderr, "%s: Missing execute permissions for %s: %s\n",
            progname, buf, strerror(errno));
        exit(1);
    }

    argv[0] = buf;
    if (getuid() == geteuid())
        (void) execv(argv[0], argv);
    else
        (void) execve(argv[0], argv, empty_envp);
    fprintf(stderr, "%s: Failed to execute %s: %s\n",
        progname, buf, strerror(errno));
    exit(1);
}
</code>
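The needs_root_rights value tested in this excerpt comes from /etc/X11/Xwrapper.config (see man Xorg.wrap), whose documented settings are allowed_users=rootonly|console|anybody and needs_root_rights=yes|no|auto; the configuration described above amounts to allowed_users=anybody plus needs_root_rights=yes. A sketch of how one such line could map to the tri-state the excerpt uses (1 = yes, 0 = no, -1 = auto detection) follows. It is not the wrapper's actual parser, and ROOT_RIGHTS_INVALID is a hypothetical error value.

```c
#include <stdio.h>
#include <string.h>

#define ROOT_RIGHTS_INVALID (-2) /* hypothetical: not part of Xorg.wrap */

/* Parses one "key=value" line such as "needs_root_rights=yes".
 * Returns 1, 0 or -1 for yes, no or auto, or ROOT_RIGHTS_INVALID for
 * a different key, an unknown value, or a malformed line. */
static int parse_needs_root_rights(const char *line)
{
    char key[64], value[64];

    /* Key runs up to '=' or a space; spaces around '=' are skipped. */
    if (sscanf(line, " %63[^= ] = %63s", key, value) != 2)
        return ROOT_RIGHTS_INVALID;
    if (strcmp(key, "needs_root_rights") != 0)
        return ROOT_RIGHTS_INVALID;
    if (strcmp(value, "yes") == 0)
        return 1;
    if (strcmp(value, "no") == 0)
        return 0;
    if (strcmp(value, "auto") == 0)
        return -1;
    return ROOT_RIGHTS_INVALID;
}
```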

Comment 14 stan 2021-11-24 17:46:15 UTC
Reading it in the comment, I see that the second if statement is also skipped: needs_root_rights is not 0, and total_cards is zero, so total_cards && (kms_cards == total_cards) is false even though kms_cards == total_cards is true.  So it is only the last few lines that are changing the uid back to being the user id instead of the suid.
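The two conditions traced in comments 13 and 14 can be isolated in a small model of the excerpt. The names are hypothetical and nothing here calls setresuid() or exec; it only reports whether root rights are dropped and which exec branch the excerpt would take.

```c
#include <stdbool.h>

typedef struct {
    bool drops_root; /* setresgid()/setresuid() to the real ids are called */
    bool clears_env; /* execve() with an empty environment is used */
} wrapper_decision;

/* uids_differ: real uid != effective uid at startup, i.e. a non-root
 * user ran the setuid-root wrapper. */
static wrapper_decision model_wrapper(int needs_root_rights, int total_cards,
                                      int kms_cards, bool uids_differ)
{
    wrapper_decision d;

    /* The detection loop only runs for -1 (auto); otherwise the
     * counts keep their initial value of 0. */
    if (needs_root_rights != -1) {
        total_cards = 0;
        kms_cards = 0;
    }

    /* Same condition as the excerpt's second if statement. */
    d.drops_root = needs_root_rights == 0 ||
                   (total_cards && kms_cards == total_cards);

    /* After dropping, getuid() == geteuid(), so execv() keeps the
     * environment; if root is kept, the uids still differ and the
     * excerpt calls execve() with empty_envp. */
    d.clears_env = uids_differ && !d.drops_root;
    return d;
}
```

With needs_root_rights=yes, model_wrapper(1, 0, 0, true) keeps root and takes the execve() branch. Since execve() of the non-setuid /usr/libexec/Xorg keeps the effective uid, the server would then run with an effective uid of root.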

Comment 15 stan 2021-11-25 17:07:25 UTC
Developments:

It turns out that the problem isn't startx or Xorg.wrap (from xorg-wrapper.c), apart from a small difference explained below.  Xorg.wrap is actually starting Xorg; it is Xorg itself that is failing with a segmentation fault.  I reset the default target on the rawhide system to graphical.target and rebooted.  It hung the system, but I finally got a backtrace in the /var/log/xorg.log* files.  I'll attach that log to this bugzilla.  Here is the backtrace from graphical.target with a little context.

[   201.612] (II) Module glamoregl: vendor="X.Org Foundation"
[   201.612] 	compiled for 1.20.11, module version = 1.0.1
[   201.612] 	ABI class: X.Org ANSI C Emulation, version 0.4                <----------  this is where every startx xorg log ended.
[   201.617] (EE) 
[   201.617] (EE) Backtrace:
[   201.686] (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x139) [0x5595fb4cff09]
[   201.687] (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7fa9c6de2740]
[   201.687] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   201.687] (EE) 2: /lib64/libEGL_mesa.so.0 (?+0x0) [0x7fa9bc05314e]
[   201.688] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   201.688] (EE) 3: /lib64/libEGL_mesa.so.0 (?+0x0) [0x7fa9bc04b438]
[   201.688] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   201.688] (EE) 4: /lib64/libEGL_mesa.so.0 (?+0x0) [0x7fa9bc0437fc]
[   201.688] (EE) 5: /usr/lib64/xorg/modules/libglamoregl.so (glamor_egl_init+0xa6) [0x7fa9bc0e17c6]
[   201.689] (EE) unw_get_proc_name failed: no unwind info found [-10]
[   201.689] (EE) 6: /usr/lib64/xorg/modules/drivers/radeon_drv.so (?+0x0) [0x7fa9c6195ebf]
[   201.689] (EE) 7: /usr/libexec/Xorg (InitOutput+0x1153) [0x5595fb3b92b3]
[   201.690] (EE) 8: /usr/libexec/Xorg (miPutImage+0xb59) [0x5595fb35f8da]
[   201.690] (EE) 9: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7fa9c6dcd590]
[   201.690] (EE) 10: /lib64/libc.so.6 (__libc_start_main+0x7c) [0x7fa9c6dcd63c]
[   201.691] (EE) 11: /usr/libexec/Xorg (_start+0x25) [0x5595fb360695]
[   201.691] (EE) 
[   201.691] (EE) Segmentation fault at address 0x59
[   201.691] (EE) 
Fatal server error:
[   201.691] (EE) Caught signal 11 (Segmentation fault). Server aborting


Here is the same part of the log from a successful startx, for illustration.  The successful logs from rawhide have rolled off; I didn't think to save one in time.

[   479.577] (II) Module glamoregl: vendor="X.Org Foundation"
[   479.577] 	compiled for 1.20.6, module version = 1.0.1
[   479.577] 	ABI class: X.Org ANSI C Emulation, version 0.4                     <--------------  where the startx logs are failing
[   479.790] (II) RADEON(0): glamor X acceleration enabled on AMD OLAND (DRM 2.50.0, 5.10.23-100.20210410.fc31.x86_64, LLVM 9.0.1)   <---- successfully continues
[   479.790] (II) RADEON(0): glamor detected, initialising EGL layer.
[   479.790] (II) RADEON(0): KMS Color Tiling: enabled
[   479.790] (II) RADEON(0): KMS Color Tiling 2D: enabled
[   479.790] (==) RADEON(0): TearFree property default: auto
[   479.790] (II) RADEON(0): KMS Pageflipping: enabled
[   479.814] (II) RADEON(0): Output DisplayPort-0 has no monitor section
[   479.847] (II) RADEON(0): Output DVI-0 has no monitor section
[   479.871] (II) RADEON(0): EDID for output DisplayPort-0
[   479.904] (II) RADEON(0): EDID for output DVI-0

There is a slight difference between a startx from multi-user and a graphical.target start of Xorg.  It shows up in the user and group of the log files.

-rw-r--r--. 1   0     0      14402 Nov 24 16:41 Xorg.0.log
-rw-r--r--. 1   0     0      16174 Nov 24 16:41 Xorg.0.log.old
-rw-r--r--. 1   0  9999      14352 Nov 24 12:27 Xorg.1.log
-rw-r--r--. 1   0  9999      14304 Nov 23 20:52 Xorg.1.log.old

The graphical start of Xorg has both uid and gid as root, while the startx start of Xorg from multi-user has only uid as root and the gid is from the person running the startx.  I don't know if this is the reason that the backtrace succeeded in the graphical start, but it is a suggestive coincidence.

If I'm reading the backtrace correctly (not a given), it seems to be libglamoregl.so that is causing the problem.  But that doesn't seem to have been updated as part of the updates that caused the problem.  Maybe it has a dependency on something that was updated, needed to be rebuilt, and wasn't?

Comment 16 stan 2021-11-25 17:09:47 UTC
Created attachment 1843614 [details]
This is the complete xorg log file that ended with the backtrace from a segmentation fault.

Attached the complete xorg log from the graphical.target segmentation fault.

Comment 17 stan 2021-12-23 18:04:14 UTC
It seems that there was a package conflict preventing full updates.  I removed some blocking packages, got a whole slug of updates, and X is now working again.  Closing.

