740121 – WARNING: at kernel/signal.c:2013 do_signal_stop+0x246/0x2c3(): TAINTED ---------W

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	15
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Oleg Nesterov
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	abrt_hash:dbb75361c9befde7a1176567a7e...
Depends On:
Blocks:

Reported:	2011-09-21 01:45 UTC by Luke Macken
Modified:	2016-09-20 02:42 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-06-06 19:45:16 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
[PATCH stable-3.0] do_signal_stop: don't clear GROUP_STOP_SIGMASK if task_is_stopped() (828 bytes, patch) 2011-09-23 16:43 UTC, Oleg Nesterov

Description Luke Macken 2011-09-21 01:45:17 UTC

abrt version: 2.0.3
architecture:   x86_64
cmdline:        ro root=/dev/mapper/vg_vorpalblade-lv_root rd_LUKS_UUID=luks-29b27f25-aec9-429d-82d1-ed512bb04bdb rd_LVM_LV=vg_vorpalblade/lv_root rd_LVM_LV=vg_vorpalblade/lv_swap rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us quiet initcall_debug printk.time=y init=/sbin/bootchartd
component:      kernel
kernel:         2.6.40.4-5.fc15.x86_64
kernel_tainted: 512
kernel_tainted_long: Taint on warning.
os_release:     Fedora release 15 (Lovelock)
package:        kernel
reason:         WARNING: at kernel/signal.c:2013 do_signal_stop+0x246/0x2c3()
time:           Tue Sep 20 12:11:30 2011
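
The kernel_tainted value of 512 above is a bitmask, and 512 is 1 << 9, the "taint on warning" bit. As a rough illustration of how that number maps to the "---------W" string in the bug title, here is a minimal sketch. The function name taint_to_string is hypothetical, and the letter table is an assumption recalled from the kernel's tainted-kernels documentation for kernels of this era, not something taken from this report.

```c
#include <stddef.h>

/* Assumed flag letters for taint bits 0..11 (bit 9 is 'W', taint on warning). */
static const char taint_letters[] = "PFSRMBUDAWCI";

/*
 * Render the low nbits taint bits of mask into buf, one character per bit:
 * the flag letter if the bit is set, '-' if it is clear.
 * Returns the number of characters written, not counting the trailing NUL.
 */
size_t taint_to_string(unsigned long mask, size_t nbits, char *buf, size_t buflen)
{
    size_t known = sizeof(taint_letters) - 1;
    size_t i;

    if (buflen == 0)
        return 0;
    if (nbits > known)
        nbits = known;
    if (nbits > buflen - 1)
        nbits = buflen - 1;

    for (i = 0; i < nbits; i++)
        buf[i] = (mask & (1UL << i)) ? taint_letters[i] : '-';
    buf[i] = '\0';
    return i;
}
```

Calling taint_to_string(512, 10, buf, sizeof buf) with a 16-byte buffer fills buf with nine dashes followed by W.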

backtrace:
:WARNING: at kernel/signal.c:2013 do_signal_stop+0x246/0x2c3()
:Hardware name: 4291CL9
:Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack tcp_lp fuse ebtable_nat ebtables xt_CHECKSUM bridge 8021q garp stp llc sunrpc cpufreq_ondemand acpi_cpufreq mperf rfcomm bnep ip6t_REJECT btusb bluetooth snd_hda_codec_hdmi snd_hda_codec_conexant arc4 snd_hda_intel snd_hda_codec snd_hwdep iwlagn snd_seq snd_seq_device mac80211 snd_pcm thinkpad_acpi uvcvideo cfg80211 iTCO_wdt videodev snd_timer e1000e xhci_hcd i2c_i801 media snd_page_alloc snd iTCO_vendor_support rfkill v4l2_compat_ioctl32 soundcore joydev microcode virtio_net kvm_intel kvm ipv6 xts gf128mul dm_crypt sdhci_pci sdhci mmc_core wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: nf_conntrack]
:Pid: 22112, comm: plugin-containe Tainted: G        W   2.6.40.4-5.fc15.x86_64 #1
:Call Trace:
: [<ffffffff81054c8e>] warn_slowpath_common+0x83/0x9b
: [<ffffffff81054cc0>] warn_slowpath_null+0x1a/0x1c
: [<ffffffff81064f4a>] do_signal_stop+0x246/0x2c3
: [<ffffffff8148690b>] ? schedule+0x690/0x6be
: [<ffffffff81065c18>] get_signal_to_deliver+0x153/0x3f6
: [<ffffffff81008f56>] do_signal+0x69/0x65e
: [<ffffffff814880b4>] ? _raw_spin_unlock_irqrestore+0x17/0x19
: [<ffffffff811584de>] ? ep_scan_ready_list+0x145/0x165
: [<ffffffff81158f3e>] ? sys_epoll_wait+0x2c7/0x341
: [<ffffffff8100958c>] do_notify_resume+0x28/0x83
: [<ffffffff8148eb10>] int_signal+0x12/0x17

comment:
:I was able to trigger this while playing around sending SIGSTOP/SIGCONT to various process groups, while using the tool tamefox http://github.com/lmacken/tamefox
:I have not been able to easily reproduce it, but it felt like a race condition involving the stopped app grabbing the x server display and locking the interface up.

Comment 1 Chuck Ebbert 2011-09-21 12:59:32 UTC

        if (current->group_stop & GROUP_STOP_PENDING) {
==>             WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
                goto retry;
        }
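
To make the condition behind this warning concrete, the following is a user-space sketch of the check, not kernel code. The bit layout of group_stop (low 16 bits holding the stop signal number, bit 16 meaning a stop is pending) is an assumption based on the 3.0-era headers as I recall them, and group_stop_would_warn is a hypothetical helper name.

```c
/* Assumed layout of task_struct->group_stop in 3.0-era kernels. */
#define GROUP_STOP_SIGMASK 0xffffu     /* signal number of the last group stop */
#define GROUP_STOP_PENDING (1u << 16)  /* task should stop for group stop */

/*
 * Returns 1 when the flags are in the state the WARN_ON_ONCE complains
 * about: a group stop is pending but no signal number is recorded.
 * Returns 0 otherwise.
 */
int group_stop_would_warn(unsigned int group_stop)
{
    return (group_stop & GROUP_STOP_PENDING) &&
           !(group_stop & GROUP_STOP_SIGMASK);
}
```

Under these assumptions, a value of GROUP_STOP_PENDING alone would trip the warning, while GROUP_STOP_PENDING | 19 (SIGSTOP on x86 Linux) would not.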

Comment 2 Oleg Nesterov 2011-09-21 15:03:48 UTC

(In reply to comment #0)
>
> :WARNING: at kernel/signal.c:2013 do_signal_stop+0x246/0x2c3()

Hmm. Thanks. So far I don't understand how this is possible.
And this code was significantly changed in 3.1, I already forgot
the details. I'll try to reread it again. Perhaps utrace patches
broke get_signal_to_deliver...

> :I was able to trigger this while playing around sending SIGSTOP/SIGCONT to
> various process groups, while using the tool tamefox
> http://github.com/lmacken/tamefox

unlikely I can reproduce ;)

> :I have not been able to easily reproduce it,

Maybe because of WARN_ON_ONCE: you need to reboot to see the warning
again.

I'll try to think more... Can you try the debugging patch if I make it?
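
The point about WARN_ON_ONCE above can be illustrated with a small user-space stand-in. This is only a sketch of the "report once per call site" behaviour: the kernel macro keeps its flag in a static variable at the call site, whereas this version passes the flag explicitly. The names warn_site and warn_on_once are hypothetical.

```c
#include <stdio.h>

/* Per-call-site state: whether a report was already printed, and how many. */
struct warn_site {
    int warned;
    int reports;
};

/*
 * Report the first time cond is true for this site and stay silent on
 * every later hit. Always returns whether cond was true, so the caller
 * can still branch on it.
 */
int warn_on_once(struct warn_site *site, int cond)
{
    if (cond && !site->warned) {
        site->warned = 1;
        site->reports++;
        fprintf(stderr, "WARNING: condition hit (reported only once)\n");
    }
    return cond != 0;
}
```

With a zero-initialised warn_site, calling warn_on_once(&site, 1) twice prints a single message, which is why a second occurrence of the bug would go unnoticed in the log until the flag is reset.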

Comment 3 Luke Macken 2011-09-21 16:44:10 UTC

(In reply to comment #2)
> (In reply to comment #0)
> Maybe because of WARN_ON_ONCE: you need to reboot to see the warning
> again.
> 
> I'll try to think more... Can you try the debugging patch if I make it?

Sure. I'll reboot and try and reproduce the issue at some point today as well.

Comment 4 Luke Macken 2011-09-21 17:41:40 UTC

I am able to reliably reproduce this issue by doing the following:

    firefox &
    pkill -STOP firefox
    strace -f -p $(pidof firefox)

Comment 5 Luke Macken 2011-09-21 17:51:14 UTC

(In reply to comment #4)
> I am able to reliably reproduce this issue by doing the following:

By 'reliably', I mean 'occasionally'. It feels a bit racy: sometimes the strace seems to CONT the process, and sometimes it doesn't. When it doesn't work, I am able to reproduce this by opening up 'htop' and attempting to strace one of firefox's threads.

Comment 6 Oleg Nesterov 2011-09-21 18:43:42 UTC

(In reply to comment #4)
> I am able to reliably reproduce this issue by doing the following:
> 
>     firefox &
>     pkill -STOP firefox
>     strace -f -p $(pidof firefox)

OK, thanks a lot... But this is quite different. I didn't try to
inspect the ptrace paths, because I assume your previous test-case
doesn't use ptrace but still triggers the warning?

Anyway, thanks for the info, I'll continue tomorrow.

Comment 7 Luke Macken 2011-09-21 19:32:43 UTC

Now that I think about it, the original oops *might* have been triggered using ptrace. I frequently use htop and hit 's' to strace the process; I may have done that out of muscle memory to begin with.

Comment 8 Oleg Nesterov 2011-09-23 16:43:00 UTC

Created attachment 524655 [details]
[PATCH stable-3.0] do_signal_stop: don't clear GROUP_STOP_SIGMASK if task_is_stopped()

Could you please test this patch? It should fix the problem.
But I wasn't able to reproduce it until I understood what
happens (damn! this took me 2 days of grepping ;). Maybe
your testing has found something else...

This is an upstream bug; I'll send the patch to -stable.

Ironically, 3.1 has a similar problem, although the code
and the reason are quite different.

Also, both are buggy wrt job control stop && ptrace/clone; this needs
another fix.
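
The attached patch itself is not reproduced in this report, so the following is only one reading of its title, written as a user-space model rather than kernel code. It assumes the same group_stop layout as the check quoted in comment 1 (low 16 bits for the signal number, bit 16 for "pending"), and the names model_thread, initiate_stop_unconditional and initiate_stop_guarded are hypothetical. The idea it tries to show: if the recorded signal number is cleared for every thread, a thread that is already stopped but still has the pending bit set ends up with "pending, no signal number", which is the state the warning checks for.

```c
#include <stddef.h>

/* Assumed layout of task_struct->group_stop in 3.0-era kernels. */
#define GROUP_STOP_SIGMASK 0xffffu     /* signal number of the last group stop */
#define GROUP_STOP_PENDING (1u << 16)  /* task should stop for group stop */

/* Minimal stand-in for a thread: its group_stop flags and stopped state. */
struct model_thread {
    unsigned int group_stop;
    int stopped;
};

/* Model of the suspected problem: the signal number is cleared for every
 * thread, but only re-recorded for threads that are not yet stopped. */
void initiate_stop_unconditional(struct model_thread *t, size_t n, unsigned int signr)
{
    for (size_t i = 0; i < n; i++) {
        t[i].group_stop &= ~GROUP_STOP_SIGMASK;
        if (!t[i].stopped)
            t[i].group_stop |= signr | GROUP_STOP_PENDING;
    }
}

/* Model of what the patch title describes: leave already-stopped threads
 * alone, so their recorded signal number survives. */
void initiate_stop_guarded(struct model_thread *t, size_t n, unsigned int signr)
{
    for (size_t i = 0; i < n; i++) {
        if (!t[i].stopped) {
            t[i].group_stop &= ~GROUP_STOP_SIGMASK;
            t[i].group_stop |= signr | GROUP_STOP_PENDING;
        }
    }
}
```

In this model, a stopped thread that starts with GROUP_STOP_PENDING | 19 keeps its signal number under the guarded variant and loses it under the unconditional one. How a stopped thread comes to have the pending bit set (presumably via the ptrace attach path mentioned in the comments) is outside what the model covers.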

Comment 9 Luke Macken 2011-09-26 18:44:51 UTC

(In reply to comment #8)
> Created attachment 524655 [details]
> [PATCH stable-3.0] do_signal_stop: don't clear GROUP_STOP_SIGMASK if
> task_is_stopped()
> 
> Could you please test this patch? It should fix the problem.
> But I wasn't able to reproduce it until I understood what
> happens (damn! this took me 2 days of grepping ;). Maybe
> your testing has found something else...
> 
> This is an upstream bug; I'll send the patch to -stable.
> 
> Ironically, 3.1 has a similar problem, although the code
> and the reason are quite different.
> 
> Also, both are buggy wrt job control stop && ptrace/clone; this needs
> another fix.

I am unable to reproduce my problem with your patch applied.

Thanks!
