Bug 1010603 (systemd-suspend-segv) - systemd Caught <SEGV> with kernel-PAE-3.11.1-200.fc19.i686 on suspend/resume
Summary: systemd Caught <SEGV> with kernel-PAE-3.11.1-200.fc19.i686 on suspend/resume
Keywords:
Status: CLOSED ERRATA
Alias: systemd-suspend-segv
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1010506 1011233 1013235 1014381 1014844 1015279 1020078 1021275 1021629 1023748 1025683 1027081 1029849 1039320
Depends On:
Blocks:
 
Reported: 2013-09-21 21:38 UTC by Satish Balay
Modified: 2014-02-17 06:14 UTC (History)
CC List: 54 users

Fixed In Version: kernel-3.11.7-100.fc18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-10 07:54:50 UTC
Type: Bug
Embargoed:


Attachments
dmesg (86.83 KB, text/plain), 2013-09-21 21:38 UTC, Satish Balay
systemd-test.txt (386.93 KB, text/plain), 2013-09-21 21:40 UTC, Satish Balay
backtrace (2.63 KB, text/plain), 2013-09-27 21:48 UTC, Jiri Popelka
messages_3.11.6-200.fc19.i686: not OK (131.69 KB, text/plain), 2013-11-03 16:14 UTC, openfred
messages_3.11.6-201.fc19.i686: OK (372.85 KB, text/plain), 2013-11-03 16:15 UTC, openfred
messages_3.11.6-200.fc19.i686.PAE: not OK (157.38 KB, text/plain), 2013-11-03 16:16 UTC, openfred
messages_3.11.6-201.fc19.i686.PAE: OK (384.35 KB, text/plain), 2013-11-03 16:17 UTC, openfred


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 69663 0 None None None Never
Linux Kernel 61781 0 None None None Never
Red Hat Bugzilla 1010506 0 unspecified CLOSED [abrt] systemd-204-11.fc19: crash: Process /usr/lib/systemd/systemd was killed by signal 11 (SIGSEGV) 2021-02-22 00:41:40 UTC

Internal Links: 1010506

Description Satish Balay 2013-09-21 21:38:11 UTC
Created attachment 801053 [details]
dmesg

Description of problem:

systemd appears to SEGV with kernel-PAE-3.11.1-200.fc19.i686 on a Thinkpad T400 on suspend/resume. The network fails to resume after this crash.

Version-Release number of selected component (if applicable):

systemd-204-11.fc19.i686
kernel-PAE-3.11.1-200.fc19.i686

How reproducible:
I've seen it multiple times with this kernel. [I've attempted suspend/resume until crash].

I've also seen this issue with one of the 3.11-rc kernels as mentioned at https://bugzilla.redhat.com/show_bug.cgi?id=989373#c14

Steps to Reproduce:
1. 32bit F19 [upgraded from F18] on a Thinkpad T400
2. upgrade kernel to kernel-PAE-3.11.1-200.fc19.i686
3. suspend/resume multiple times until you see a SEGV in /var/log/messages

Actual results:
SEGV in /var/log/messages [and network doesn't work.] Fn-F4 to suspend fails to work [however 'poweroff -f' did work on my recent try]

Expected results:

No such problem.

Additional info:

No issue when using kernel-PAE-3.9.9-302.fc19.i686. [I've briefly used  3.10 kernels without this issue].

I've also used 3.11 kernel with F19 [x86_64] on a couple of other machines without this issue.



text from /var/log/messages

>>>>>>>>>>>>>>
Sep 21 16:11:36 nemo kernel: [  333.051212] PM: resume of devices complete after 916.132 msecs
Sep 21 16:11:36 nemo kernel: [  333.051938] Restarting tasks ...
Sep 21 16:11:36 nemo kernel: [  333.052580] traps: systemd[1] general protection ip:b73442c0 sp:bf99784c error:0 in libc-2.17.so[b732d000+1b8000]
Sep 21 16:11:36 nemo ntpd[598]: Deleting interface #7 wlan0, 192.168.0.3#123, interface stats: received=4, sent=4, dropped=0, active_time=13 secs
Sep 21 16:11:36 nemo ntpd[598]: 128.10.19.24 interface 192.168.0.3 -> (none)
Sep 21 16:11:36 nemo ntpd[598]: 199.30.140.76 interface 192.168.0.3 -> (none)
Sep 21 16:11:36 nemo ntpd[598]: 18.85.44.59 interface 192.168.0.3 -> (none)
Sep 21 16:11:36 nemo ntpd[598]: 64.233.245.204 interface 192.168.0.3 -> (none)
Sep 21 16:11:36 nemo ntpd[598]: Deleting interface #6 wlan0, fe80::222:faff:fec7:b58a#123, interface stats: received=0, sent=0, dropped=0, active_time=18 secs
Sep 21 16:11:36 nemo ntpd[598]: peers refreshed
Sep 21 16:11:36 nemo systemd[1]: Caught <SEGV>, dumped core as pid 2624.
Sep 21 16:11:36 nemo systemd[1]: Executing crash shell in 10s...
Sep 21 16:11:36 nemo kernel: [  333.077054] done.
Sep 21 16:11:36 nemo kernel: [  333.081449] video LNXVIDEO:00: Restoring backlight state
Sep 21 16:11:36 nemo systemd-sleep[2621]: System resumed.
Sep 21 16:11:46 nemo systemd[1]: Successfully spawned crash shell as pid 2658.
Sep 21 16:11:46 nemo systemd[1]: Freezing execution.
Sep 21 16:14:15 nemo dbus-daemon[380]: dbus[380]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Sep 21 16:14:15 nemo dbus[380]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Sep 21 16:14:40 nemo dbus-daemon[380]: dbus[380]: [system] Failed to activate service 'net.reactivated.Fprint': timed out
Sep 21 16:14:40 nemo dbus[380]: [system] Failed to activate service 'net.reactivated.Fprint': timed out
<<<<<<<<<<<<<<<<<<

[root@nemo ~]# systemctl list-units -t service --all
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused
[root@nemo ~]# systemctl  list-jobs
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused
[root@nemo ~]# systemctl list-units -t service --all
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused
[root@nemo ~]# gdb /usr/lib/systemd/systemd 1
GNU gdb (GDB) Fedora (7.6-34.fc19)
<snip>
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
0xb7633424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install systemd-204-11.fc19.i686
(gdb) where
#0  0xb7633424 in __kernel_vsyscall ()
#1  0xb74f9b86 in __pause_nocancel () from /lib/libpthread.so.0
#2  0xb76d8bc5 in freeze ()
#3  0xb76662dc in crash ()
#4  <signal handler called>
#5  0xb73442c0 in free@plt () from /lib/libc.so.6
#6  0xb737175f in vfprintf () from /lib/libc.so.6
#7  0xb743a080 in __vsnprintf_chk () from /lib/libc.so.6
#8  0xb7439fa7 in __snprintf_chk () from /lib/libc.so.6
#9  0xb76cb8e6 in log_do_header.constprop.8 ()
#10 0xb76cd13c in log_struct_internal ()
#11 0xb766d606 in manager_loop ()
#12 0xb76631b0 in main ()
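
For readers following the kernel log above: the "traps:" line carries the faulting instruction pointer (ip) and the base and size of the library mapping it fell in, so the offset inside libc can be obtained by subtraction. The following is a minimal Python sketch and not part of the original report; the function name parse_trap_line and the regular expression are illustrative assumptions based only on the line format quoted in this bug.

```python
import re

# Pattern for the kernel "traps:" line as quoted in this report (an assumption
# about the format; other kernels may print it differently).
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] (?P<kind>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r" in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Return a dict describing the trap, or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "kind": m.group("kind"),
        "object": m.group("obj"),
        "offset": ip - base,                      # offset of ip inside the mapping
        "inside_mapping": base <= ip < base + size,
    }


# Trap line copied from the /var/log/messages excerpt in the description.
line = ("Sep 21 16:11:36 nemo kernel: [  333.052580] traps: systemd[1] general "
        "protection ip:b73442c0 sp:bf99784c error:0 in "
        "libc-2.17.so[b732d000+1b8000]")
info = parse_trap_line(line)
print(info["comm"], info["pid"], info["object"], hex(info["offset"]))
```

The ip value in this trap line, b73442c0, is the same address gdb reports for frame #5 in the backtrace above. Computing the offset makes trap lines from different boots comparable even when libc is mapped at a different base address.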

Comment 1 Satish Balay 2013-09-21 21:40:23 UTC
Created attachment 801054 [details]
systemd-test.txt

/usr/bin/systemd --test --system --log-level=debug > systemd-test.txt 2>&1

Comment 2 Satish Balay 2013-09-21 22:54:49 UTC
I now see systemd-204-15.fc19.i686 in the updates.

I installed the updates and still get a SEGV with systemd-204-15.fc19.i686 [and kernel-PAE-3.11.1-200.fc19.i686]

Comment 3 Satish Balay 2013-09-22 18:31:00 UTC
I tried the following - and still see this issue on the thinkpad T400

kernel-3.11.1-200.fc19.i686 [non-PAE kernel]
kernel-PAE-3.12.0-0.rc1.git4.1.fc21.i686 [current rawhide]

Comment 4 Satish Balay 2013-09-23 00:35:30 UTC
There is another report of systemd SEGV in fedora users mailing list [with the same kernel-PAE-3.11.1-200.fc19.i686 but on a different thinkpad (T61)]

https://lists.fedoraproject.org/pipermail/users/2013-September/441157.html

Comment 5 Christophe Delaere 2013-09-24 08:57:16 UTC
Hi,

Having the same issue on a Sony Vaio VGN-BZ21VN. Everything was fine until the upgrade to 3.11.1.

Comment 6 Satish Balay 2013-09-24 15:58:09 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=69663 looks similar, so I am adding it as the external bug tracker

Comment 7 Timothy Murphy 2013-09-25 13:16:17 UTC
I've had what I think is the same problem with a Thinkpad T61 under Fedora-19.
The problem occurs with kernel 3.11.1-200.fc19.i686.PAE but not with kernels  3.10.11-200.fc19.i686.PAE or 3.10.10-200.fc19.i686.PAE.
The problem occurs most often on resuming from suspension on opening the laptop lid; it is impossible to re-start or shutdown the machine except by pressing the power button.
The problem seems to involve systemd from /var/log/messages.

Comment 8 Lennart Poettering 2013-09-25 18:32:36 UTC
Can anyone get me a backtrace with debug symbols installed?

Comment 9 Satish Balay 2013-09-26 01:04:13 UTC
gcc-debuginfo is huge. So I did: debuginfo-install systemd-204-15.fc19.i686 --exclude=gcc-debuginfo

Let me know if I should install additional debuginfo packages. Here is the stack trace I get:

>>>>>>>>>>>>>>>>
0xb7650424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install libattr-2.4.46-10.fc19.i686 libgcc-4.8.1-1.fc19.i686 pcre-8.32-7.fc19.i686 zlib-1.2.7-10.fc19.i686
(gdb) where
#0  0xb7650424 in __kernel_vsyscall ()
#1  0xb7516b86 in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:81
#2  0xb76f5bc5 in freeze () at ../src/shared/util.c:3438
#3  0xb76832dc in crash (sig=11) at ../src/core/main.c:193
#4  <signal handler called>
#5  __strnlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen-sse2.S:125
#6  0xb738e75f in _IO_vfprintf_internal (s=s@entry=0xbfe40000, format=<optimized out>, 
    format@entry=0xb77433bc "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n", 
    ap=0xbfe40158 "yCt\267\242\062t\267\001", ap@entry=0xbfe40144 "\006") at vfprintf.c:1635
#7  0xb7457080 in ___vsnprintf_chk (s=s@entry=0xbfe404cc "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=\267g", 
    maxlen=<optimized out>, maxlen@entry=2048, flags=flags@entry=1, slen=slen@entry=4294967295, 
    format=format@entry=0xb77433bc "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n", args=args@entry=0xbfe40144 "\006") at vsnprintf_chk.c:63
#8  0xb7456fa7 in ___snprintf_chk (s=s@entry=0xbfe404cc "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=\267g", 
    maxlen=maxlen@entry=2048, flags=flags@entry=1, slen=slen@entry=4294967295, 
    format=format@entry=0xb77433bc "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n") at snprintf_chk.c:35
#9  0xb76e88e6 in snprintf (
    __fmt=0xb77433bc "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n", 
    __n=2048, __s=0xbfe404cc "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=\267g") at /usr/include/bits/stdio2.h:64
#10 log_do_header (header=header@entry=0xbfe404cc "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=\267g", 
    level=level@entry=30, file=file@entry=0xb771298f "../src/core/manager.c", line=line@entry=1644, 
    func=func@entry=0xb77140c2 <__func__.12899> "process_event", object_name=object_name@entry=0x0, 
    object=object@entry=0x0, size=2048) at ../src/shared/log.c:441
#11 0xb76ea13c in log_struct_internal (level=30, level@entry=6, file=file@entry=0xb771298f "../src/core/manager.c", 
    line=line@entry=1644, func=func@entry=0xb77140c2 <__func__.12899> "process_event", 
    format=format@entry=0xb771389c "MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x")
    at ../src/shared/log.c:746
#12 0xb768a606 in process_event (ev=0xbfe40d88, m=0xb791ed80) at ../src/core/manager.c:1641
#13 manager_loop (m=0xb791ed80) at ../src/core/manager.c:1755
#14 0xb76801b0 in main (argc=5, argv=0xbfe412f4) at ../src/core/main.c:1734
(gdb) 

<<<<<<<<<<<<<<<

Comment 10 Jiri Popelka 2013-09-27 21:48:24 UTC
Created attachment 804175 [details]
backtrace

Thinkpad T43
kernel-3.11.1-200.fc19.i686
systemd-204-15.fc19.i686

Comment 11 Leo Wolf 2013-09-27 22:42:50 UTC
This issue is related to kernel commit 1c441e9 "epoll: use freezable blocking call", see https://bugzilla.kernel.org/show_bug.cgi?id=61781

Comment 12 Michal Schmidt 2013-09-27 22:58:09 UTC
(In reply to Leo Wolf from comment #11)
> This issue is related to kernel commit 1c441e9 "epoll: use freezable
> blocking call", see https://bugzilla.kernel.org/show_bug.cgi?id=61781

Thanks. Reassigning to kernel.

Comment 13 Zbigniew Jędrzejewski-Szmek 2013-09-29 10:21:08 UTC
*** Bug 1010506 has been marked as a duplicate of this bug. ***

Comment 14 Milan Bouchet-Valat 2013-10-02 13:44:41 UTC
*** Bug 1014381 has been marked as a duplicate of this bug. ***

Comment 15 Zbigniew Jędrzejewski-Szmek 2013-10-03 19:20:25 UTC
*** Bug 1015279 has been marked as a duplicate of this bug. ***

Comment 16 craigbarnes85 2013-10-04 20:36:40 UTC
I am also experiencing this problem on an Asus EeePC 1005HA, with kernel-3.11.2-201.fc19.i686 and systemd-204-15.fc19.i686.

Sorry if this seems like a "me too" comment -- just thought I'd mention that the problem isn't specific to Thinkpad hardware.

Comment 17 Zbigniew Jędrzejewski-Szmek 2013-10-07 02:39:12 UTC
*** Bug 1011233 has been marked as a duplicate of this bug. ***

Comment 18 Walter Neumann 2013-10-11 21:48:01 UTC
I'm seeing this with kernels 3.11.3-201.fc19.i686, 3.11.2-201.fc19.i686 in about one in two suspends on a Dell D410

Comment 19 Christopher Meng 2013-10-12 01:42:48 UTC
Hitting this.

Comment 20 Dimitrios Apostolou 2013-10-12 15:48:37 UTC
I get the same about twice per day (I suspend-resume several times) since upgrading to kernel-PAE-3.11.x, some time now. Obviously systemd's signal handler catches the SIGSEGV and voluntarily freezes. 

Oct 08 23:48:32  systemd[1]: Caught <SEGV>, dumped core as pid 11989.
Oct 08 23:48:32  systemd[1]: Freezing execution.

I realise that any complex program will segfault from time to time. But having all system's utilities useless because they can't contact systemd (it closes its listening socket before freezing) is bothersome:

# reboot
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused


About twice per day I have to force poweroff my system, dangerous. A tip to everyone: enable magic SysRq to allow syncing and unmounting clean the filesystem. Dear systemd devs, is there some way to trigger a controlled shutdown?

Comment 21 Satish Balay 2013-10-12 16:49:31 UTC
(In reply to Dimitrios Apostolou from comment #20)

> # reboot
> Failed to get D-Bus connection: Failed to connect to socket
> /run/systemd/private: Connection refused
> 
> 
> About twice per day I have to force poweroff my system, dangerous. A tip to
> everyone: enable magic SysRq to allow syncing and unmounting clean the
> filesystem. Dear systemd devs, is there some way to trigger a controlled
> shutdown?

[per comment #0] 'poweroff -f' works - and presumably 'reboot -f' should also work.

Comment 22 Milan Bouchet-Valat 2013-10-12 18:10:23 UTC
(In reply to Dimitrios Apostolou from comment #20)
> I get the same about twice per day (I suspend-resume several times) since
> upgrading to kernel-PAE-3.11.x, some time now. Obviously systemd's signal
> handler catches the SIGSEGV and voluntarily freezes. 
> 
> Oct 08 23:48:32  systemd[1]: Caught <SEGV>, dumped core as pid 11989.
> Oct 08 23:48:32  systemd[1]: Freezing execution.
> 
> I realise that any complex program will segfault from time to time. But
> having all system's utilities useless because they can't contact systemd (it
> closes its listening socket before freezing) is bothersome:
> 
> # reboot
> Failed to get D-Bus connection: Failed to connect to socket
> /run/systemd/private: Connection refused
> 
> 
> About twice per day I have to force poweroff my system, dangerous. A tip to
> everyone: enable magic SysRq to allow syncing and unmounting clean the
> filesystem. Dear systemd devs, is there some way to trigger a controlled
> shutdown?
Go back to kernel 3.10 until the bug is fixed, that's the best workaround.

BTW, I don't think "any complex program will segfault from time to time". systemd doesn't on most systems, even when it runs for weeks.

Comment 23 Dimitrios Apostolou 2013-10-13 03:45:44 UTC
> Failed to get D-Bus connection: Failed to connect to socket
> /run/systemd/private: Connection refused

Just a minor correction, I got that message with "systemctl reboot", not just reboot. The latter gives a "can't contact /dev/initctl" message.

Satish: "systemctl reboot -f" gives an assertion failure. "reboot -f" reboots the system just like flipping the switch, *not* in a controlled manner.

Comment 24 Mikhail Zabaluev 2013-10-13 07:09:35 UTC
(In reply to Dimitrios Apostolou from comment #23)
> "reboot -f"
> reboots the system just like flipping the switch, *not* in a controlled
> manner.

Run "sync" before reboot to get some measure of data preservation.
I came here from bug #1013235, which looks like the same problem.
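
The advice in comments 20 to 24 (sync before a forced reboot, keep magic SysRq available) can be sketched in Python as follows. This is an illustration only, not something the commenters ran: it assumes the standard Linux /proc/sysrq-trigger interface, where writing "s" requests a sync, "u" an emergency read-only remount and "b" an immediate reboot, and the function names are hypothetical. Writing to that file requires root.

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # standard Linux procfs path; needs root

# Magic SysRq keys used here: s = sync, u = remount read-only, b = reboot now.
SEQUENCE = [
    ("s", "sync dirty data to disk"),
    ("u", "remount filesystems read-only"),
    ("b", "reboot immediately"),
]


def write_sysrq(key, path=SYSRQ_TRIGGER):
    """Send one SysRq command by writing its key to the trigger file."""
    with open(path, "w") as trigger:
        trigger.write(key)


def emergency_reboot(send=write_sysrq, pause=2.0, dry_run=True):
    """Walk the s, u, b sequence; with dry_run=True nothing is written."""
    sent = []
    for key, description in SEQUENCE:
        print("would send" if dry_run else "sending", key, "-", description)
        if not dry_run:
            send(key)
            time.sleep(pause)  # give the kernel a moment before the next step
        sent.append(key)
    return sent


if __name__ == "__main__":
    emergency_reboot()  # dry run by default; pass dry_run=False as root to act
```

The dry-run default is deliberate, since the last step reboots the machine without any further confirmation. The pause between steps is an arbitrary choice for the sketch.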

Comment 25 Walter Neumann 2013-10-16 03:00:24 UTC
Still a problem after update to kernel-3.11.4-201.fc19.i686 and systemd-204-16.fc19.i686

From /var/log/messages:

Oct 15 07:58:53 klein kernel: [ 5117.450268] PM: resume of devices complete after 1745.694 msecs
Oct 15 07:58:53 klein kernel: [ 5117.450666] Restarting tasks ... 
Oct 15 07:58:53 klein kernel: [ 5117.451178] traps: systemd[1] general protection ip:b73b9112 sp:bfa9e540 error:0 in libc-2.17.so[b7342000+1b8000]
Oct 15 07:58:53 klein kernel: [ 5117.472136] done.
Oct 15 07:58:53 klein kernel: [ 5117.475034] video LNXVIDEO:00: Restoring backlight state
Oct 15 07:58:54 klein systemd-sleep[7814]: System resumed.
Oct 15 07:58:54 klein gnome-session[1603]: JS LOG: SM: No battery found
Oct 15 07:58:54 klein upowerd[848]: (upowerd:848): UPower-Linux-WARNING **: energy 48.840000 bigger than full 43.134600
Oct 15 07:59:00 klein ddclient[7335]: WARNING:  cannot connect to checkip.dyndns.org:80 socket: IO::Socket::INET: Bad hostname 'checkip.dyndns.org'
Oct 15 07:59:47 klein gnome-session[1603]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
Oct 15 07:59:47 klein gnome-session[1603]: Received signal:15->'Terminated'
Oct 15 07:59:47 klein gnome-session[1603]: (tracker-miner-fs:1755): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
Oct 15 07:59:47 klein gnome-session[1603]: (tracker-miner-fs:1755): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
Oct 15 07:59:47 klein gnome-session[1603]: (tracker-miner-fs:1755): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
Oct 15 07:59:47 klein gnome-session[1603]: OK
Oct 15 08:01:06 klein dbus-daemon[483]: dbus[483]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.102" (uid=1343 pid=8009 comm="/bin/systemctl restart NetworkManager.service ") interface="org.freedesktop.systemd1.Manager" member="RestartUnit" error name="(unset)" requested_reply="0" destination="org.freedesktop.systemd1" (uid=0 pid=1 comm="/usr/lib/systemd/systemd --switched-root --system ")
Oct 15 08:01:06 klein dbus[483]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.102" (uid=1343 pid=8009 comm="/bin/systemctl restart NetworkManager.service ") interface="org.freedesktop.systemd1.Manager" member="RestartUnit" error name="(unset)" requested_reply="0" destination="org.freedesktop.systemd1" (uid=0 pid=1 comm="/usr/lib/systemd/systemd --switched-root --system ")

Comment 26 Paul Lipps 2013-10-19 02:21:12 UTC
*** Bug 1014844 has been marked as a duplicate of this bug. ***

Comment 27 Zbigniew Jędrzejewski-Szmek 2013-10-21 00:40:12 UTC
*** Bug 1021275 has been marked as a duplicate of this bug. ***

Comment 28 Zbigniew Jędrzejewski-Szmek 2013-10-21 18:27:26 UTC
*** Bug 1021629 has been marked as a duplicate of this bug. ***

Comment 29 Michal Schmidt 2013-10-23 16:01:53 UTC
*** Bug 1020078 has been marked as a duplicate of this bug. ***

Comment 30 openfred 2013-10-23 20:47:16 UTC
@ Michal Schmidt
Well done, Bug #1020078 is clearly a duplicate of this bug.
It makes sense that the bug is more on the kernel side than on the systemd side.

The same happens on a Thinkpad T500; detailed logs and "bt full" output are available in Bug #1020078.

Please note that this bug happens only when suspend is done by Gnome (at least on my T500).
That is, when I suspend using the menu (gnome-shell-extension-alternative-status-menu-3.8.4-1.fc19.noarch), or when the power settings trigger it (i.e. after, say, 15 minutes).

On the other hand, if the laptop is suspended using "pm-suspend", recovery is always successful!

Comment 31 openfred 2013-10-23 21:22:47 UTC
Just updated, and kernel bumped to 3.11.6
kernel-PAE-modules-extra-3.11.6-200.fc19.i686
kernel-PAE-3.11.6-200.fc19.i686

The problem remains:

Oct 23 23:17:18 localhost kernel: [  141.914115] PM: resume of devices complete after 1219.063 msecs
Oct 23 23:17:18 localhost kernel: [  141.914298] Restarting tasks ...
Oct 23 23:17:18 localhost kernel: [  141.914458] traps: systemd[1] general protection ip:b733f2c0 sp:bfcb196c error:0 in libc-2.17.so[b7328000+1b8000]
Oct 23 23:17:18 localhost kernel: [  141.914821] Core dump to |/usr/libexec/abrt-hook-ccpp 11 4294967295 2371 0 0 1382563038 e pipe failed
Oct 23 23:17:18 localhost systemd[1]: Caught <SEGV>, core dump failed.
Oct 23 23:17:18 localhost systemd[1]: Freezing execution.
Oct 23 23:17:18 localhost kernel: [  141.921024] done.
Oct 23 23:17:18 localhost kernel: [  141.921181] video LNXVIDEO:00: Restoring backlight state
Oct 23 23:17:18 localhost systemd-sleep[2368]: System resumed.
Oct 23 23:17:18 localhost systemd-cgroups-agent[2375]: Failed to get D-Bus connection: Failed to connect to socket /org/freedesktop/systemd1/private: Connection refused
Oct 23 23:17:21 localhost fprintd[1177]: ** Message: user 'fred' claiming the device: 0
Oct 23 23:17:21 localhost fprintd[1177]: ** Message: now monitoring fd 16
Oct 23 23:17:21 localhost fprintd[1177]: ** Message: device 0 claim status 0
Oct 23 23:17:21 localhost fprintd[1177]: ** Message: no longer monitoring fd 16
Oct 23 23:17:21 localhost fprintd[1177]: ** Message: released device 0
Oct 23 23:17:24 localhost gdm[357]: GdmSlave: could not fetch type of session 'c1': Aucun fichier ou dossier de ce type
Oct 23 23:17:25 localhost systemd-cgroups-agent[2398]: Failed to get D-Bus connection: Failed to connect to socket /org/freedesktop/systemd1/private: Connection refused
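
The log excerpts in this report share a recognizable signature: a kernel "traps: systemd[1]" line right after "Restarting tasks", followed by systemd's own "Caught <SEGV>" and "Freezing execution" messages. A small Python sketch for spotting that signature in a log is shown below. It is an illustration built only from the lines quoted in this bug; the name find_pid1_crash and the exact patterns are assumptions, and other systemd versions may word these messages differently.

```python
import re

# Signatures taken from the log excerpts quoted in this report.
SIGNATURES = {
    "kernel_trap": re.compile(r"kernel: .*traps: systemd\[1\] "),
    "segv_caught": re.compile(r"systemd\[1\]: Caught <SEGV>"),
    "frozen": re.compile(r"systemd\[1\]: Freezing execution"),
}


def find_pid1_crash(lines):
    """Return a list of (line_number, signature_name) for matching lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Lines copied from comment 31.
sample = [
    "Oct 23 23:17:18 localhost kernel: [  141.914298] Restarting tasks ...",
    "Oct 23 23:17:18 localhost kernel: [  141.914458] traps: systemd[1] "
    "general protection ip:b733f2c0 sp:bfcb196c error:0 in "
    "libc-2.17.so[b7328000+1b8000]",
    "Oct 23 23:17:18 localhost systemd[1]: Caught <SEGV>, core dump failed.",
    "Oct 23 23:17:18 localhost systemd[1]: Freezing execution.",
    "Oct 23 23:17:18 localhost systemd-sleep[2368]: System resumed.",
]
for number, name in find_pid1_crash(sample):
    print(number, name)
```

To check a real log, the same function can be given an open file object, for example open("/var/log/messages", errors="replace"), since it only iterates over lines.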

Comment 32 openfred 2013-10-24 06:40:32 UTC
This bug is assigned to the kernel team, but is it relevant ?

Even if it may be related to the kernel (3.11 to 3.11.6), what is currently failing is systemd (traps: systemd[1] general protection ip:b733f2c0 sp:bfcb196c error:0 in libc-2.17.so[b7328000+1b8000]).
Consequently, systemd can't connect to dbus (systemd-cgroups-agent[2375]: Failed to get D-Bus connection).

After 6 updates of kernel 3.11, it may be clear that the problem is not solved on the kernel side...

I tested Ubuntu (Gnome) 13.10, on the same machine, to check how suspend/resume was behaving for them, with Gnome 3.8 and kernel 3.11, like F19.
There is no such problem. As they are not using systemd, we may think this issue is related to systemd?

I tried F20 alpha, to check if systemd-208 solves this issue, which happens with systemd-204 (F19). Unfortunately, the new System Menu (still) doesn't give the opportunity to suspend manually, and alternate status menu extension is no more  available for Gnome 3.10. As the T500 never goes to sleep, even after configuring it to sleep after 15 minutes (bug)... I can't check.

Is it possible to assign this bug to systemd?

Comment 33 Milan Bouchet-Valat 2013-10-24 09:40:46 UTC
(In reply to openfred from comment #32)
> This bug is assigned to the kernel team, but is it relevant ?
> 
> Even if it may be related to the kernel (3.11 to 3.11.6), what is currently
> failing is systemd (traps: systemd[1] general protection ip:b733f2c0
> sp:bfcb196c error:0 in libc-2.17.so[b7328000+1b8000]).
> Consequently, systemd can't connect to dbus (systemd-cgroups-agent[2375]:
> Failed to get D-Bus connection).
> 
> After 6 updates of kernel 3.11, it may be clear that the problem is not
> solved on the kernel side...
> 
> I tested Ubuntu (Gnome) 13.10, on the same machine, to check how
> suspend/resume was behaving for them, with Gnome 3.8 and kernel 3.11, like
> F19.
> There is no such problem. As they are not using systemd, we may think this
> issue is related to systemd?
> 
> I tried F20 alpha, to check if systemd-208 solves this issue, which happens
> with systemd-204 (F19). Unfortunately, the new System Menu (still) doesn't
> give the opportunity to suspend manually, and alternate status menu
> extension is no more  available for Gnome 3.10. As the T500 never goes to
> sleep, even after configuring it to sleep after 15 minutes (bug)... I can't
> check.
> 
> Is it possible to assign this bug to systemd?
This bug is in the kernel and using 3.10 fixes the problem. Please read the comments above before spamming this report, thanks.

Comment 34 Nerijus Baliūnas 2013-10-25 22:08:13 UTC
(In reply to openfred from comment #30)
> On the other hand, if the laptop is suspended using "pm-suspend", recovery
> is always successful!
Not true, just happened after resuming from pm-suspend.

Comment 35 Matt Molyneaux 2013-10-25 22:32:45 UTC
For completeness' sake, the same thing happens on my laptop using Fedora 20 Alpha (32-bit again) with kernel-3.11.6-300.fc20.i686 and systemd-208-2.fc20.i686

openfred, when you said you didn't get this bug on Ubuntu, were you running a 32-bit or 64-bit kernel? That seems to be important here.

Comment 36 openfred 2013-10-26 08:42:35 UTC
All the tests were done with a 32bit kernel.

On F19, with i686 and i686-PAE kernel (with installation done with netinstall, and with installation done with livecd install).

On ubuntu (Ubuntu Gnome to compare Gnome 3.8 with kernel 3.11), with kernel i686 as well. I only tested Ubuntu Gnome using liveusb session, though.

And by the way, there is __8GB__ of memory on my T500 laptop, hope that helps.

Comment 37 Josh Boyer 2013-10-28 13:22:57 UTC
*** Bug 1023748 has been marked as a duplicate of this bug. ***

Comment 38 a.lexander.aristov 2013-10-28 15:42:44 UTC
Folks, if it would be useful, I can install additional software or turn on more logging to help trace the cause of the issue.

The issue is reproducible on my laptop, and it seems to be somehow GNOME-specific. I used to use KDE and did not hit the problem, but that was on F17. I then upgraded with FedUp to F19 and moved to GNOME.

Comment 39 Stan Trzmiel 2013-10-28 15:53:05 UTC
The issue is not Gnome-specific; it also happens on my system (F19 + KDE 4.11), though recently it's much less frequent. The problem lies somewhere between kernel 3.11+ and systemd.

Comment 40 Josh Boyer 2013-10-29 23:55:58 UTC
This is still broken in 3.12-rc7.  Upstream is going to revert 745cdb36da83aeec198650b410ca06304cf792 ("select: use freezable blocking call") and one other commit to fix it there, and it should fall back into 3.11.y stable shortly.

Comment 41 Fred Wells 2013-10-30 02:58:55 UTC
I have the same problem since update to kernel-3.11.2-201/systemd-204-15.

Comment 42 Dominique Brazziel 2013-10-30 23:30:31 UTC
Just happened here with kernel-3.11.6-200/systemd-204-17

Comment 43 Zbigniew Jędrzejewski-Szmek 2013-11-01 12:32:31 UTC
*** Bug 1025683 has been marked as a duplicate of this bug. ***

Comment 44 Fred Wells 2013-11-01 13:28:57 UTC
FWIW, https://bugzilla.redhat.com/show_bug.cgi?id=1005020 may be related, albeit somewhat different symptoms.  Seems suspend/resume have seen problems since kernel-3.10.

Comment 45 Zbigniew Jędrzejewski-Szmek 2013-11-01 13:44:32 UTC
No, #1005020 looks like a typical kernel suspend error. Please note that here the system resumes fine, apart from one process, process 1.

Comment 46 Josh Boyer 2013-11-01 17:33:50 UTC
I've grabbed the two revert patches from upstream that are supposed to fix the systemd segfault issue.  Please test this scratch build and see if that specific issue goes away.

http://koji.fedoraproject.org/koji/taskinfo?taskID=6123298

Comment 47 Satish Balay 2013-11-02 03:55:53 UTC
Ok installed kernel-PAE-3.11.6-200.9.fc19.i686 on the Thinkpad T400 and tried multiple suspends [via combination of 'gnome-shell-menu', Fn-f4, pm-suspend]

Here is the good news: I don't see systemd SEGV on the first or second suspend.

However after 10-15 suspend/resume cycles - the machine goes into suspend state and stays there - and does not resume from this state.

I hard-reset the machine and rebooted. Again around 10-15 suspend/resume steps - the machine fails to recover from suspend.

Then I tried using the 'x86_64' kernel on another Thinkpad T430 [to see if removing these commits causes problems on the 64bit kernel] - and it survived more than 30 suspend/resume cycles.

The Thinkpad T400 is back on kernel-PAE-3.10.11-200.fc19.i686 now. [I'll continue to monitor the x86_64 kernel on the T430]

Comment 48 Satish Balay 2013-11-02 04:33:22 UTC
I retried the same suspend/resume cycles with the old kernel-PAE-3.10.11-200.fc19.i686 - and even this one failed to resume from suspend  after 30 or so cycles.

I'm not sure if this is an issue that was always there - or if one of the recent [non-kernel] updates is triggering it. [This kernel was working well for the past many weeks]

One symptom related to this is - after force-reboot - the display is set to lowest brightness setting.

Perhaps others on this bugzilla will have a better experience. Oh well...

Comment 49 Josh Boyer 2013-11-02 12:46:00 UTC
I've committed the patches now.  Will be in the next official build.

Comment 50 Fedora Update System 2013-11-02 19:14:10 UTC
kernel-3.11.6-302.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.6-302.fc20

Comment 51 Fedora Update System 2013-11-02 19:18:28 UTC
kernel-3.11.6-201.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.11.6-201.fc19

Comment 52 Fedora Update System 2013-11-02 19:23:32 UTC
kernel-3.11.6-101.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.11.6-101.fc18

Comment 53 Fedora Update System 2013-11-03 04:34:22 UTC
Package kernel-3.11.6-101.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.6-101.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20545/kernel-3.11.6-101.fc18
then log in and leave karma (feedback).

Comment 54 a.lexander.aristov 2013-11-03 11:43:22 UTC
Updated to kernel-3.11.6-201.fc19. It fixes my issue. I have done a few suspend/resume cycles and all is good. Will monitor it and report if the issue reappears.

Comment 55 openfred 2013-11-03 16:11:35 UTC
On F19 Gnome, 32 bit, with normal and PAE kernels, 3.11.6-201 solves the problem !
Attached logs for 3.11.6-200 (not OK) and 3.11.6-201 (OK), for both normal and PAE kernels.

Regarding Comment #35, I installed F19 Gnome x86_64, and was enable to reproduce the problem, with 3.11.6-200 or 3.11.6-201. Seems 64b kernels are not affected.

Comment 56 openfred 2013-11-03 16:14:07 UTC
Created attachment 818782 [details]
messages_3.11.6-200.fc19.i686: not OK

Comment 57 openfred 2013-11-03 16:15:29 UTC
Created attachment 818783 [details]
messages_3.11.6-201.fc19.i686: OK

Comment 58 openfred 2013-11-03 16:16:40 UTC
Created attachment 818784 [details]
messages_3.11.6-200.fc19.i686.PAE: not OK

Comment 59 openfred 2013-11-03 16:17:57 UTC
Created attachment 818788 [details]
messages_3.11.6-201.fc19.i686.PAE: OK

Comment 60 openfred 2013-11-03 16:26:17 UTC
Update: 
I was __"unable"__ to reproduce the bug with x86_64 kernel, not __"enable"__... sorry

Comment 61 Fedora Update System 2013-11-04 20:16:22 UTC
kernel-3.11.7-300.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.7-300.fc20

Comment 62 Fedora Update System 2013-11-04 20:19:11 UTC
kernel-3.11.7-100.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.11.7-100.fc18

Comment 63 Fedora Update System 2013-11-05 02:57:12 UTC
kernel-3.11.6-201.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 64 Fedora Update System 2013-11-05 20:02:45 UTC
Package kernel-3.11.7-300.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.7-300.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20705/kernel-3.11.7-300.fc20
then log in and leave karma (feedback).

Comment 65 Samuel Sieb 2013-11-06 06:05:31 UTC
*** Bug 1027081 has been marked as a duplicate of this bug. ***

Comment 66 Samuel Sieb 2013-11-06 06:08:09 UTC
Can this fix get put in F18 as well?

Comment 67 Samuel Sieb 2013-11-06 06:09:14 UTC
Sorry, I see that it was...  It just hasn't been pushed yet.

Comment 68 Fedora Update System 2013-11-10 07:54:50 UTC
kernel-3.11.7-300.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 69 Fedora Update System 2013-11-13 02:13:50 UTC
kernel-3.11.7-100.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
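
To summarize the kernel builds named in this report, the following Python sketch classifies a Fedora kernel package name against the first builds the update system attached to this bug (3.11.6-101.fc18, 3.11.6-201.fc19, 3.11.6-302.fc20). It is a rough illustration under stated assumptions: the function name classify_kernel is hypothetical, the comparison is a simple numeric tuple comparison rather than real RPM version ordering, architecture is ignored (comments 55 and 60 could not reproduce the crash on x86_64), and anything outside 3.11.x on F18 to F20 is reported as unknown because the report does not cover it.

```python
import re

# First kernel builds that the Fedora update system attached to this bug,
# keyed by dist tag: (upstream version, build number).
FIRST_FIXED = {
    "fc18": ((3, 11, 6), 101),
    "fc19": ((3, 11, 6), 201),
    "fc20": ((3, 11, 6), 302),
}

# Accepts "kernel-PAE-3.11.1-200.fc19.i686" as well as the "uname -r" style
# "3.11.1-200.fc19.i686.PAE".
KERNEL_RE = re.compile(
    r"^(?:kernel-(?:PAE-)?)?(\d+)\.(\d+)\.(\d+)-(\d+)\.(fc\d+)"
)


def classify_kernel(name):
    """Return 'pre-3.11', 'affected', 'fixed' or 'unknown' for a kernel name."""
    m = KERNEL_RE.match(name)
    if m is None:
        return "unknown"
    version = tuple(int(part) for part in m.group(1, 2, 3))
    build = int(m.group(4))
    dist = m.group(5)
    if version[:2] < (3, 11):
        return "pre-3.11"  # predates the 3.11 regression reported here
    if version[:2] > (3, 11) or dist not in FIRST_FIXED:
        return "unknown"   # not covered by the builds named in this report
    return "fixed" if (version, build) >= FIRST_FIXED[dist] else "affected"


for name in ("kernel-PAE-3.11.1-200.fc19.i686",
             "3.11.6-201.fc19.i686.PAE",
             "kernel-3.11.6-300.fc20.i686",
             "kernel-PAE-3.10.11-200.fc19.i686"):
    print(name, "->", classify_kernel(name))
```

Note that "pre-3.11" only means the kernel predates the systemd SEGV described here; comment 48 reports a separate resume hang on a 3.10 kernel.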

Comment 70 Michal Schmidt 2013-11-13 12:13:47 UTC
*** Bug 1029849 has been marked as a duplicate of this bug. ***

Comment 71 Zbigniew Jędrzejewski-Szmek 2013-12-08 02:18:17 UTC
*** Bug 1039320 has been marked as a duplicate of this bug. ***

Comment 72 Zbigniew Jędrzejewski-Szmek 2014-02-17 06:14:49 UTC
*** Bug 1013235 has been marked as a duplicate of this bug. ***

