Bug 1370136 - glibc update corrupts display of a running system
Summary: glibc update corrupts display of a running system
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 25
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Duplicates: 1362711
Depends On:
Blocks: F25FinalBlocker
 
Reported: 2016-08-25 12:00 UTC by Kamil Páral
Modified: 2016-09-13 12:09 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-13 12:09:49 UTC
Type: Bug
Embargoed:


Attachments
screen corruption photo (357.19 KB, image/jpeg)
2016-08-25 12:00 UTC, Kamil Páral
VT cursor visible in graphics (VM) (502.31 KB, image/png)
2016-08-25 12:51 UTC, Kamil Páral
VT cursor moving in graphics (VM) (392.18 KB, image/png)
2016-08-25 12:52 UTC, Kamil Páral
KDE corruption in VM (1004.49 KB, application/octet-stream)
2016-08-25 13:31 UTC, Kamil Páral

Description Kamil Páral 2016-08-25 12:00:30 UTC
Created attachment 1193995 [details]
screen corruption photo

Description of problem:
On bare metal, I've seen heavy screen corruption several times during dnf update. I narrowed the problem down to glibc. When glibc is updated, the display gets corrupted and becomes completely unreadable (see attached photo). The corruption is not static, it "moves" when something on the display is updated. The corruption can usually be fixed by switching to a different VT and back, but I also experienced a case where this did not help and I had to blindly reboot the computer.

Petr Schindler also reproduced this on a different computer (both have an AMD graphics card, not sure if it is related).

The corruption happens immediately when the glibc package is updated. The system journal is not very helpful:

Aug 25 13:44:16 dhcp-28-134.brq.redhat.com systemd[1]: Reexecuting.
Aug 25 13:44:16 dhcp-28-134.brq.redhat.com systemd[1]: systemd 231 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Aug 25 13:44:16 dhcp-28-134.brq.redhat.com systemd[1]: Detected architecture x86-64.
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com systemd[1]: Received SIGHUP from PID 1 (systemd).
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com systemd[1]: Reloading.
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com systemd[1]: Stopping Command Scheduler...
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com crond[1006]: (CRON) INFO (Shutting down)
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com systemd[1]: Stopped Command Scheduler.
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=crond comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=crond comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com systemd[1]: Started Command Scheduler.
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=crond comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com crond[2212]: (CRON) INFO (Syslog will be used instead of sendmail.)
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com crond[2212]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 6% if used.)
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com crond[2212]: (CRON) INFO (running with inotify support)
Aug 25 13:44:17 dhcp-28-134.brq.redhat.com crond[2212]: (CRON) INFO (@reboot jobs will be run at computer's startup.)


Should systemd be reloading itself? Is this a bug in systemd?

This happens when you update from
glibc-2.24-1.fc25.x86_64
to
glibc-2.24-3.fc25.x86_64


Version-Release number of selected component (if applicable):
glibc-2.24-3.fc25.x86_64
glibc-all-langpacks-2.24-3.fc25.x86_64
glibc-common-2.24-3.fc25.x86_64
glibc-devel-2.24-3.fc25.x86_64
glibc-headers-2.24-3.fc25.x86_64
glibc-langpack-en-2.24-3.fc25.x86_64
kernel-4.8.0-0.rc2.git3.1.fc25.x86_64
libwayland-client-1.11.91-1.fc25.x86_64
libwayland-cursor-1.11.91-1.fc25.x86_64
libwayland-server-1.11.91-1.fc25.x86_64
mesa-libwayland-egl-12.0.1-2.fc25.x86_64
mesa-dri-drivers-12.0.1-2.fc25.x86_64
systemd-231-3.fc25.x86_64
xorg-x11-drv-ati-7.7.0-1.20160518git1181b9c.fc25.x86_64
xorg-x11-server-common-1.18.4-2.fc25.x86_64
xorg-x11-server-utils-7.7-19.fc24.x86_64
xorg-x11-server-Xorg-1.18.4-2.fc25.x86_64
xorg-x11-server-Xwayland-1.18.4-2.fc25.x86_64

Fedora-Workstation-Live-x86_64-25_Alpha-2.iso

How reproducible:
always, at least on my computer

Steps to Reproduce:
1. install Fedora-Workstation-Live-x86_64-25_Alpha-2.iso (F25 Alpha 1.2)
2. run sudo dnf update glibc*
3. see corruption during update
4. switch to VT to fix it or ssh in, downgrade glibc back
5. clean boot the system
6. go back to 2, reproduce it again

Comment 1 Florian Weimer 2016-08-25 12:06:23 UTC
Does the corruption only happen during the glibc update, or does it persist after the update?

Comment 2 Kamil Páral 2016-08-25 12:09:05 UTC
(In reply to Florian Weimer from comment #1)
> Does the corruption only happen during the glibc update, or does it persist
> after the update?

Just during the update (until you reboot or maybe fix it by switching VTs). After reboot, everything is fine.

I reproduced this on X11 as well, so this is not a wayland problem. Also, with X11 (with my limited 1 attempt), I was not able to get rid of the corruption by switching VTs.

Comment 3 Florian Weimer 2016-08-25 12:39:08 UTC
(In reply to Kamil Páral from comment #2)
> (In reply to Florian Weimer from comment #1)
> > Does the corruption only happen during the glibc update, or does it persist
> > after the update?
> 
> Just during the update (until you reboot or maybe fix it by switching VTs).
> After reboot, everything is fine.
> 
> I reproduced this on X11 as well, so this is not a wayland problem. Also,
> with X11 (with my limited 1 attempt), I was not able to get rid of the
> corruption by switching VTs.

The display stack is not restarted while the glibc update is running, and it keeps using the previous glibc version.  So the glibc update itself can hardly be the cause of this issue.  It just happens to trigger it.

I expect that this is some sort of temporary interference from elevated CPU or PCI bus load.  Would you please reassign this bug to some package in the display stack?  Thanks.

Comment 4 Kamil Páral 2016-08-25 12:49:10 UTC
So, this seems to affect all machines, bare metal and VMs, but in a different way:

* on our two machines with AMD GPUs, we see the screen corruption as described. Their graphics cards are:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 6570/7570/8550] [1002:6759]
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 Graphics] [1002:130f] (rev d4)

* on our machine with Intel GPU, the screen goes black for half a second during installation, but then is restored with no corruption. The graphics card is:
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller [8086:0152] (rev 09)

* in VMs (kvm,spice,qxl) the screen goes black for a split second and is restored mostly intact, but there's a "VT-like" cursor blinking in the top left corner. Whatever you type on keyboard is printed on this "overlay" and immediately erased, and the cursor keeps moving forward with every typed letter (see screenshots).
00:02.0 VGA compatible controller [0300]: Red Hat, Inc. QXL paravirtual graphic card [1b36:0100] (rev 04)


Offline upgrade doesn't seem to be affected in any way, only when the update is running in a live session.

Comment 5 Kamil Páral 2016-08-25 12:51:13 UTC
Created attachment 1194005 [details]
VT cursor visible in graphics (VM)

This is what the VM screen looks like after the glibc update. Notice the blinking cursor in the top left corner.

Comment 6 Kamil Páral 2016-08-25 12:52:21 UTC
Created attachment 1194010 [details]
VT cursor moving in graphics (VM)

And on this screen, you can see the cursor moved (it has a black background). It moves with every character typed on the keyboard.

Comment 7 Kamil Páral 2016-08-25 13:31:22 UTC
Created attachment 1194035 [details]
KDE corruption in VM

This also affects KDE, but much worse. On AMD cards, the corruption is similar (and I was not able to get rid of it). In VMs, this switches you to a different VT (I guess - in any case, the desktop disappears). You can switch back, but you again see the blinking cursor, and this time, the text typed stays on the screen. That also has the "benefit" of showing your passwords, if you happen to type one. See the video.

This also affects the KDE graphical update manager, it also triggers this bug (i.e. this is not specific to DNF).

Also, this doesn't seem to be caused by the newer glibc. I have triggered the same problem when downgrading to an older glibc. You just need to reboot between your attempts, and (it seems) any glibc version change will trigger this.

Comment 8 Kamil Páral 2016-08-25 13:34:36 UTC
> Would you please reassign this bug to some package in the
> display stack?  Thanks.

I have no idea where to assign this. It affects bare metal and VM, all graphics drivers (in a different way). So my blind guesses are systemd, mesa, kernel. Because there is some weirdness with framebuffer/switching VTs at the very same time that systemd prints it's reloading, trying systemd.

Comment 9 Kamil Páral 2016-08-25 13:39:36 UTC
This is probably at least a conditional (if not full) violation of:
"The installed system must be able to download and install updates with the default console package manager. "
https://fedoraproject.org/wiki/Fedora_25_Alpha_Release_Criteria#Updates

Proposing for discussion.

To summarize, this is what we learned so far:
* GNOME + Radeon -> game stopper (display unreadable)
* GNOME + Intel -> cosmetic issue (screen blinking)
* GNOME + VM -> minor annoyance (screen blinking, VT cursor visible)
* KDE + Radeon -> game stopper (display unreadable)
* KDE + VM -> major annoyance (VT switch, typed text permanently visible)

KDE + Intel wasn't tested (yet), because we've hit an unrelated issue which prevented us from trying it.

Comment 10 Kamil Páral 2016-08-25 13:53:42 UTC
Another symptom was found by nirik. Once the bug happens, regardless of desktop environment or hardware/VM, you can then use alt+left/right to switch VTs, *even when you're in graphics*. That is usually impossible, alt+left/right only works in text VTs, but not in graphics. But after glibc update glitch, you can do it. It almost seems like the glibc update switched you to a text VT mode, while the graphics is still displaying "in the background". Systemd or kernel folk will know more, hope this helps. Trivially reproducible in VMs.

Comment 11 Adam Williamson 2016-08-25 14:32:12 UTC
meh, this is why we have offline update. ;) not sure if I'd call it a blocker.

Comment 12 Kamil Páral 2016-08-25 14:38:26 UTC
(In reply to Kamil Páral from comment #9)
> To summarize, this is what we learned so far:
> * GNOME + Radeon -> game stopper (display unreadable)
> * GNOME + Intel -> cosmetic issue (screen blinking)
> * GNOME + VM -> minor annoyance (screen blinking, VT cursor visible)
> * KDE + Radeon -> game stopper (display unreadable)
> * KDE + VM -> major annoyance (VT switch, typed text permanently visible)
> 
> KDE + Intel wasn't tested (yet), because we've hit an unrelated issue which
> prevented us from trying it.

Please note that we have very limited hardware to test with (2 Radeon cards, 1 Intel). We should not draw any definitive conclusions from that (applying it to all cards from that vendor). Also, we have no nvidia card to test with.

Comment 13 Adam Williamson 2016-08-25 16:40:24 UTC
On my test box, with an Intel adapter (8086:0126), the system seems to flip to a real live VT for a second or two in glibc %post, then switches back to the desktop, with no graphical corruption evident.

I also have a system with an AMD adapter I'll test soonish. For the record, this can be tested from a live image, you don't need to install (though the live image will start behaving a bit strangely in other ways after glibc is updated).

Comment 14 Kevin Fenzi 2016-08-25 17:38:40 UTC
So, glibc calls glibc_post_upgrade in post. In that it has: 

  /* Check if telinit is available and either SysVInit fifo,
     or upstart telinit.  */
  if (access ("/sbin/telinit", X_OK)
      || ((!!access ("/dev/initctl", F_OK))
          ^ !access ("/sbin/initctl", X_OK)))
    _exit (0);

  /* Check if we are not inside of some chroot, because we'd just
     timeout and leave /etc/initrunlvl.

     On more modern systems this test is not sufficient to detect
     if we're in a chroot.  */
  if (readlink ("/proc/1/exe", initpath, 256) <= 0 ||
      readlink ("/proc/1/root", initpath, 256) <= 0)
    _exit (0);

  /* Here's another well known way to detect chroot, at least on an
     ext and xfs filesystems and assuming nothing mounted on the chroot's
     root. */
  if (stat ("/", &statbuf) != 0
      || (statbuf.st_ino != 2
          && statbuf.st_ino != 128))
    _exit (0);

  if (check_elf ("/proc/1/exe"))
    verbose_exec (116, "/sbin/telinit", "/sbin/telinit", "u");

The 'telinit' man page says: 

"
       u, U
           Serialize state, reexecute daemon and deserialize state again. This is equivalent to systemctl daemon-reexec."

But this doesn't seem to be the case. 
Manually running: 

sudo /sbin/telinit u

here causes this exact bug. 

Doing: 

sudo /sbin/systemctl daemon-reexec 

doesn't. 

So, there's some systemd behavior change here. Either it needs to fix 'telinit u' or glibc needs to change to call 'systemctl daemon-reexec' or something.
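
The guard conditions quoted above can be read as a small truth table. The following Python sketch is only a model of the quoted C conditions, not glibc code: the function name and parameters are made up for illustration, and the results of the file checks are passed in as booleans rather than read from the filesystem.

```python
def would_run_telinit(telinit_executable, initctl_fifo_exists,
                      upstart_initctl_executable,
                      proc_links_readable, root_inode, init_is_elf):
    """Model of the quoted glibc_post_upgrade checks (hypothetical helper)."""
    # access() returns 0 on success, so a non-zero result for /sbin/telinit
    # means "not executable" and the helper exits early.
    if not telinit_executable:
        return False
    # (!!access("/dev/initctl", F_OK)) ^ !access("/sbin/initctl", X_OK):
    # exit when "FIFO missing" XOR "upstart initctl executable" is true,
    # i.e. continue only when exactly one of the two init interfaces exists.
    fifo_missing = not initctl_fifo_exists
    if fifo_missing ^ upstart_initctl_executable:
        return False
    # readlink() on /proc/1/exe and /proc/1/root must both succeed.
    if not proc_links_readable:
        return False
    # Root inode heuristic from the quoted comment (ext and xfs):
    # any inode other than 2 or 128 is treated as a chroot.
    if root_inode not in (2, 128):
        return False
    # check_elf("/proc/1/exe") must succeed before "telinit u" is executed.
    return init_is_elf


# FIFO present, no upstart initctl, not in a chroot: "telinit u" would run.
print(would_run_telinit(True, True, False, True, 2, True))
```

In this model, having both /dev/initctl and an executable /sbin/initctl, or neither of them, makes the helper exit without calling telinit.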

Comment 15 Adam Williamson 2016-08-25 17:49:36 UTC
We think Kevin's wrong there; it's not that direct systemctl call is OK and telinit -u is bad, it's simply that the bug only happens on the *first* re-exec. He didn't see the bug when he ran 'systemctl daemon-reexec' directly because he'd already done `telinit -u` on that boot.

Comment 16 Kamil Páral 2016-08-25 17:54:06 UTC
Discussed today at Go/NoGo meeting. Voted as RejectedBlocker - this is obviously annoying if you hit it, but it's not exactly fatal (you can just reboot), it doesn't affect all systems (many show no significant symptoms), and it's probably workaroundable by doing the update from a VT (or offline, for GNOME).

Comment 17 Kamil Páral 2016-08-25 17:56:27 UTC
I have tested "sudo systemctl daemon-reexec" on both GNOME and KDE in a VM and the symptoms are exactly the same as described in this bug. So this is the real issue, daemon-reexec causes this. It's trivial to reproduce. If you do, please note that the corruption/issues are only visible on your first invocation of daemon-reexec. If you want to trigger it again, reboot the system first.
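
Since the symptoms only show up on the first re-exec after boot, it can help to check whether PID 1 has already re-executed before running a test. The sketch below is one possible way to do that, assuming the input is journal text for the current boot (for example saved from `journalctl -b`) and that the message text matches the "Reexecuting." line in the journal excerpt from the description; other systemd versions may word it differently. The helper names are hypothetical.

```python
import re

# Matches lines such as:
# "Aug 25 13:44:16 host systemd[1]: Reexecuting."
REEXEC_RE = re.compile(r"\bsystemd\[1\]: Reexecuting\.\s*$")


def count_reexecs(journal_lines):
    """Count PID 1 re-exec messages in an iterable of journal text lines."""
    return sum(1 for line in journal_lines if REEXEC_RE.search(line))


def first_reexec_pending(journal_lines):
    """Return True if no re-exec has been logged yet in the given lines."""
    return count_reexecs(journal_lines) == 0


sample = [
    "Aug 25 13:44:16 dhcp-28-134.brq.redhat.com systemd[1]: Reexecuting.",
    "Aug 25 13:44:17 dhcp-28-134.brq.redhat.com systemd[1]: Reloading.",
]
print(count_reexecs(sample))         # prints 1
print(first_reexec_pending(sample))  # prints False
```

If `first_reexec_pending` returns False for the current boot, a reboot is needed before the next reproduction attempt.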

Comment 18 Kevin Fenzi 2016-08-25 17:59:22 UTC
(In reply to Adam Williamson from comment #15)
> We think Kevin's wrong there; it's not that direct systemctl call is OK and
> telinit -u is bad, it's simply that the bug only happens on the *first*
> re-exec. He didn't see the bug when he ran 'systemctl daemon-reexec'
> directly because he'd already done `telinit -u` on that boot.

Yeah, my bad. I didn't reboot between tests. It happens with daemon-reexec too.

Comment 19 Zbigniew Jędrzejewski-Szmek 2016-08-25 19:18:28 UTC
This is fixed upstream [https://github.com/systemd/systemd/commit/bd64d82c1c0e3fe2a5f9b3dd9132d62834f50b2d, https://github.com/systemd/systemd/commit/158fbf7661912adf0f42c93155499119811dde82].

I'd mark this as a duplicate of #1367766 (in fact, the fix above was developed for #1367766), but having checked the version numbers, I see that #1367766 must have a different cause: the patch that was the underlying cause here, and that the fix reverts, was not yet present in systemd-229, which is the version in F24.

Comment 20 Kamil Páral 2016-08-26 08:10:58 UTC
Proposing as a Final blocker, just to make sure this doesn't slip through the cracks.

Comment 21 Geoffrey Marr 2016-08-29 19:59:01 UTC
Discussed during the 2016-08-29 blocker review meeting: [1]

The decision to delay classifying this bug as a blocker was made because it is likely that the bug will be fixed before Final.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-08-29/f25-blocker-review.2016-08-29-16.00.txt

Comment 22 Kamil Páral 2016-09-01 11:06:09 UTC
For the record, I just experienced a system update in a VM where the screen went completely black with only a blinking cursor in the top left corner (so it looked like switching to a different TTY, but there was no login prompt there). That was in a *VM*, where I had never experienced this issue before (the screen usually just flashed). This leads me to the conclusion that this is likely a race condition: different drivers often behave consistently, but that doesn't have to be the rule. It's very possible that screen corruption or some other behavior will happen even with drivers for which it's not usual (in our testing), just because the timing is slightly different.

Comment 23 Kamil Páral 2016-09-05 14:14:28 UTC
When this bug occurs, the negative effects also include:
* Alt+left/right arrow switching your VTs, instead of say navigating back and forward in your browser/nautilus/etc.
* Ctrl+Alt+Del rebooting your computer immediately (no questions asked) instead of displaying a shutdown/reboot/cancel dialog (with possible warnings about unsaved work or other people logged in).

Comment 24 Kamil Páral 2016-09-06 09:24:29 UTC
Today I've twice seen my display go berserk during a live update. glibc was not in the package set, but glib2 and systemd were. The monitor started flashing, alternating between the actual picture and a black screen in fast succession. Impossible to work with that. I could switch to a VT, but only one of the two times. I assume this might be the same or a related problem. Intel graphics.

Comment 25 Florian Weimer 2016-09-11 20:10:06 UTC
*** Bug 1362711 has been marked as a duplicate of this bug. ***

Comment 26 Adam Williamson 2016-09-12 18:42:04 UTC
systemd-231-4.fc25 claims to have fixed this:

- Fix issue with daemon-reload messing up graphics (#1367766)

can anyone reproduce after that version of systemd is installed and the system rebooted?
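
For anyone retesting, a quick way to tell whether an installed build is at or past the one named above is to compare version and release numbers. This is a simplified sketch, assuming package strings of the shape `systemd-<version>-<release>.fc<N>` as they appear in this bug; it is not rpm's real version comparison, and the names below are hypothetical.

```python
import re

# Simplified pattern for strings like "systemd-231-3.fc25.x86_64".
NVR_RE = re.compile(r"^systemd-(\d+)-(\d+)\.fc\d+")

# First build reported in this bug as carrying the fix: systemd-231-4.fc25.
FIRST_FIXED = (231, 4)


def parse_systemd_nvr(nvr):
    """Return (version, release) as integers from a systemd package string."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError("unexpected package string: %r" % nvr)
    return int(match.group(1)), int(match.group(2))


def has_reexec_fix(nvr):
    """True if the build is at or after systemd-231-4 (numeric comparison only)."""
    return parse_systemd_nvr(nvr) >= FIRST_FIXED


print(has_reexec_fix("systemd-231-3.fc25.x86_64"))  # prints False
print(has_reexec_fix("systemd-231-4.fc25.x86_64"))  # prints True
```

The comparison only makes sense for F25 builds; as comment 19 notes, systemd-229 in F24 did not contain the patch that caused this.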

Comment 27 Geoffrey Marr 2016-09-13 02:50:12 UTC
Discussed during the 2016-09-12 blocker review meeting: [1]

The decision to classify this bug as an accepted blocker was made as the bug violates the following criteria:

"The installed system must be able to download and install updates with the default console package manager"

"All known bugs that can cause corruption of user data must be fixed or documented at Common F25 bugs"

This bug can cause the system to reboot before the update completes, breaking the system and risking data loss.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-09-12/f25-blocker-review.2016-09-12-16.01.txt

Comment 28 Kamil Páral 2016-09-13 12:09:49 UTC
(In reply to Adam Williamson from comment #26)
> systemd-231-4.fc25 claims to have fixed this:
> 
> - Fix issue with daemon-reload messing up graphics (#1367766)

I tested this by downgrading to glibc-2.24-1.fc25.x86_64 and back to -3, in VM and bare metal (AMD graphics), on GNOME and KDE. No issues. Fixed.

