1050964 – GDB testsuite: BUG: unable to handle kernel paging request

Summary: GDB testsuite: BUG: unable to handle kernel paging request

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	20
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Denys Vlasenko
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1060384 1065689
Depends On:
Blocks:

Reported:	2014-01-09 12:43 UTC by Jan Kratochvil
Modified:	2014-06-19 05:51 UTC
CC List:	12 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-05-22 10:28:24 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
The debugging patch for 3.12.6-300 (1.95 KB, patch) 2014-01-17 16:28 UTC, Denys Vlasenko
New debugging patch (2.66 KB, patch) 2014-01-20 11:43 UTC, Denys Vlasenko
Patch to make d_path(len=0) safe (1.79 KB, patch) 2014-01-21 10:23 UTC, Denys Vlasenko
Alternative upstream patch: do not call d_path(buflen=0) (1.52 KB, patch) 2014-02-04 13:41 UTC, Denys Vlasenko
Patch to fix prepend_name() with *buflen < 0 (1.39 KB, patch) 2014-02-05 17:39 UTC, Denys Vlasenko

Description Jan Kratochvil 2014-01-09 12:43:58 UTC

kernel-3.12.6-300.fc20.x86_64

command: gdb/testsuite.unix.-m64/gdb.base/break-interp-BINprelinkNOdebugSEPpieNO segv

unaware if reproducible, probably not

BUG: unable to handle kernel paging request at ffffc90006e58000
IP: [<ffffffff813105ea>] memmove+0x4a/0x1a0
PGD 19900e067 PUD 19900f067 PMD cfc86067 PTE 0
Oops: 0002 [#1] SMP 
Modules linked in: microcode rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6table_filter ip6_tables xt_LOG xt_recent ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_tftp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack netconsole tun ebtable_nat ebtables cfg80211 rfkill i2c_i801 snd_hda_codec_hdmi snd_hda_intel snd_hda_codec coretemp kvm_intel kvm iTCO_wdt serio_raw iTCO_vendor_support tulip lpc_ich r8169 mfd_core snd_hwdep snd_seq snd_seq_device snd_pcm mii snd_page_alloc snd_timer snd soundcore mxm_wmi shpchp wmi acpi_cpufreq i7core_edac edac_core nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc dm_crypt raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx ata_generic pata_acpi raid6_pq radeon i2c_algo_bit drm_kms_helper crc32c_intel ttm drm pata_jmicron i2c_core [last unloaded: microcode]
CPU: 7 PID: 14716 Comm: break-interp-BI Not tainted 3.12.6-300.fc20.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. EX58-UD4/EX58-UD4, BIOS F11t 01/28/2011
task: ffff8801069d8000 ti: ffff880109bae000 task.ti: ffff880109bae000
RIP: 0010:[<ffffffff813105ea>]  [<ffffffff813105ea>] memmove+0x4a/0x1a0
RSP: 0018:ffff880109bafaa8  EFLAGS: 00010206
RAX: ffffc90006e58000 RBX: 00000000ffffff30 RCX: ffffc90006e58000
RDX: 0000000000000090 RSI: ffffc90006e57f50 RDI: ffffc90006e58000
RBP: ffff880109bafc78 R08: 30322f6b636f6d6d R09: 61682f68636f7461
R10: 726b6a2f656d6f68 R11: 2f656661736e752f R12: 00000000000000d0
R13: ffff88008f4cc0b8 R14: ffffc90006e580d0 R15: ffffc90006e57190
FS:  00007fa0867bc740(0000) GS:ffff88019fce0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90006e58000 CR3: 0000000106b9e000 CR4: 00000000000007e0
DR0: 0000000056557020 DR1: 00000000ffffca3c DR2: 0000000056557090
DR3: 00000000006010ec DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 ffffffff81200dbb ffff8801069d8000 00000000000002b0 0000001e00000000
 ffffffff0000001e ffff880100001000 ffffffff81c24f20 ffff8801000002b0
 ffff880130bc10c0 ffff880109bafcf8 ffffc90006e572b0 ffffc90006e57000
Call Trace:
 [<ffffffff81200dbb>] ? elf_core_dump+0x9eb/0x1460
 [<ffffffff811edaf1>] ? fsnotify+0x241/0x320
 [<ffffffff811d805e>] ? __mark_inode_dirty+0x12e/0x270
 [<ffffffff812090fc>] do_coredump+0xabc/0xe60
 [<ffffffff813075ce>] ? radix_tree_lookup_slot+0xe/0x10
 [<ffffffff81077853>] ? __sigqueue_free.part.15+0x33/0x40
 [<ffffffff81077e9c>] ? __dequeue_signal+0x13c/0x220
 [<ffffffff8107a972>] get_signal_to_deliver+0x1c2/0x5c0
 [<ffffffff81012428>] do_signal+0x48/0x5e0
 [<ffffffff8107995d>] ? do_send_sig_info+0x5d/0x80
 [<ffffffff81012a30>] do_notify_resume+0x70/0xa0
 [<ffffffff816723e2>] int_signal+0x12/0x17
Code: 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 41 48 83 ea 20 48 83 ea 20 4c 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 <4c> 89 1f 4c 89 57 08 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 
RIP  [<ffffffff813105ea>] memmove+0x4a/0x1a0
 RSP <ffff880109bafaa8>
CR2: ffffc90006e58000
---[ end trace d364f39cd6b1c01c ]---

Comment 1 Josh Boyer 2014-01-09 12:47:30 UTC

Oleg, any ideas on this one?

Comment 2 Michele Baldessari 2014-01-09 12:53:26 UTC

I had a moderately similar trace a long time ago (closed as it happened only once):
https://bugzilla.redhat.com/show_bug.cgi?id=975360

Not exactly the same, but worth noting

Comment 3 Oleg Nesterov 2014-01-09 13:16:51 UTC

(In reply to Jan Kratochvil from comment #0)
>
> kernel-3.12.6-300.fc20.x86_64
> 
> command:
> gdb/testsuite.unix.-m64/gdb.base/break-interp-BINprelinkNOdebugSEPpieNO segv
> 
> unaware if reproducible, probably not
> 
> BUG: unable to handle kernel paging request at ffffc90006e58000
> IP: [<ffffffff813105ea>] memmove+0x4a/0x1a0

...

>  [<ffffffff81200dbb>] ? elf_core_dump+0x9eb/0x1460

looks like, fill_files_note() strikes again...

Denys, any ideas? Perhaps you can also look at the recent 72023656961
"fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing".

Comment 4 Jan Kratochvil 2014-01-09 18:29:17 UTC

It looks easily reproducible: not by a single binary run, but the full parallel testsuite run crashed the box again.  So I can test a fix.

Comment 5 Denys Vlasenko 2014-01-16 16:47:21 UTC

The first memmove in fill_files_note() is in this code block:

        /* *Estimated* file count and total data size needed */
        count = current->mm->map_count;
        size = count * 64;
                
        names_ofs = (2 + 3 * count) * sizeof(data[0]);
 alloc:
        if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
                return -EINVAL;
        size = round_up(size, PAGE_SIZE);
        data = vmalloc(size);
        if (!data)
                return -ENOMEM;

        start_end_ofs = data + 2;
        name_base = name_curpos = ((char *)data) + names_ofs;
        remaining = size - names_ofs;
        count = 0;
        for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
                struct file *file;
                const char *filename;

                file = vma->vm_file;
                if (!file)
                        continue;
                filename = d_path(&file->f_path, name_curpos, remaining);
                if (IS_ERR(filename)) {
                        if (PTR_ERR(filename) == -ENAMETOOLONG) {
                                vfree(data);
                                size = size * 5 / 4;
                                goto alloc;
                        }
                        continue;
                }
                
                /* d_path() fills at the end, move name down */
                /* n = strlen(filename) + 1: */
                n = (name_curpos + remaining) - filename;
                remaining = filename - name_curpos;
                memmove(name_curpos, filename, n);
                name_curpos += n;

                *start_end_ofs++ = vma->vm_start;
                *start_end_ofs++ = vma->vm_end;
                *start_end_ofs++ = vma->vm_pgoff;
                count++;
        }
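For illustration, the size estimate and the "goto alloc" regrow path in the excerpt above can be modeled in userspace C. This is only a sketch: it assumes 4 KiB pages and 8-byte array entries (as on x86_64), omits the MAX_FILE_NOTE_SIZE paranoia check, and uses hypothetical names (SKETCH_PAGE_SIZE, struct note_layout, sketch_layout()) that do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and 8-byte array entries,
 * as on x86_64. SKETCH_PAGE_SIZE stands in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

struct note_layout {
    unsigned size;       /* allocated size after rounding */
    unsigned names_ofs;  /* offset where the filename area starts */
    unsigned remaining;  /* bytes available for filenames */
};

/* Round x up to a multiple of align (align must be a power of two). */
static unsigned round_up_pow2(unsigned x, unsigned align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Mirror the estimate: 64 bytes per mapping, rounded up to a page.
 * 'retries' counts how many times -ENAMETOOLONG forced a regrow. */
static struct note_layout sketch_layout(unsigned map_count, unsigned retries)
{
    struct note_layout l;
    unsigned size = map_count * 64;

    /* 2 header words, then a (start, end, pgoff) triple per mapping */
    l.names_ofs = (2 + 3 * map_count) * (unsigned)sizeof(uint64_t);
    size = round_up_pow2(size, SKETCH_PAGE_SIZE);
    while (retries--)    /* the goto alloc path: grow by 5/4, round again */
        size = round_up_pow2(size * 5 / 4, SKETCH_PAGE_SIZE);
    l.size = size;
    l.remaining = size - l.names_ofs;
    return l;
}
```

For example, sketch_layout(30, 0) gives a 4096-byte buffer with filenames starting at offset 736, leaving 3360 bytes for names; one regrow takes the buffer to 8192 bytes.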

We make a few assumptions here:

* we assume that d_path() always stores the filename strictly at the end, so that the NUL byte goes into the very last available byte. If this assumption breaks, we may save some garbage past the NUL, but memmove would still touch only valid memory...
* we assume that the returned filename pointer can't be NULL. That would crash, but with a "NULL dereference" message instead...

So at the moment, I don't see a bug here.
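The "fill at the end, then move down" step can be modeled the same way. In this sketch mock_d_path() and store_name() are hypothetical userspace stand-ins; the real d_path() returns ERR_PTR(-ENAMETOOLONG) rather than NULL when the name does not fit. It shows how n and remaining are derived from the returned pointer, and that both stay inside the buffer as long as d_path() honors its buflen argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for d_path(): writes name so that its NUL lands
 * in the last byte of buf[0..buflen). Returns the start of the string,
 * or NULL if it does not fit. */
static char *mock_d_path(const char *name, char *buf, size_t buflen)
{
    size_t len = strlen(name) + 1;
    if (len > buflen)
        return NULL;
    char *p = buf + buflen - len;
    memcpy(p, name, len);
    return p;
}

/* Append one filename to the packed name area, following the bookkeeping
 * in fill_files_note(). Returns 0 on success, -1 if it did not fit. */
static int store_name(const char *name, char **name_curpos, size_t *remaining)
{
    char *filename = mock_d_path(name, *name_curpos, *remaining);
    if (!filename)
        return -1;
    /* n = strlen(filename) + 1, computed from where the string ends */
    size_t n = (size_t)((*name_curpos + *remaining) - filename);
    *remaining = (size_t)(filename - *name_curpos);
    memmove(*name_curpos, filename, n);   /* move the name down */
    *name_curpos += n;
    return 0;
}
```

Storing "/bin/a" and then "/lib/b" into a 32-byte area packs both names at the start of the area and leaves remaining at 18.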

The second memmove is here:

        /*
         * Count usually is less than current->mm->map_count,
         * we need to move filenames down.
         */
        n = current->mm->map_count - count;
        if (n != 0) {
                unsigned shift_bytes = n * 3 * sizeof(data[0]);
                memmove(name_base - shift_bytes, name_base,
                        name_curpos - name_base);
                name_curpos -= shift_bytes;
        }

This should be ok as long as n is not negative... which it should never be... right?? :/
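A userspace sketch of this final compaction, assuming 8-byte array entries. compact_names() is a hypothetical helper; its assert() makes the "n is never negative" assumption explicit, because n is unsigned in this sketch, so a count larger than map_count would wrap to a huge value rather than go negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Close the gap between the triples actually written and the filename
 * area, which was placed assuming map_count triples. Returns the new
 * end-of-names pointer. */
static char *compact_names(char *name_base, char *name_curpos,
                           unsigned map_count, unsigned count)
{
    /* The assumption questioned above: count never exceeds map_count. */
    assert(count <= map_count);
    unsigned n = map_count - count;

    if (n != 0) {
        unsigned shift_bytes = n * 3 * (unsigned)sizeof(uint64_t);
        memmove(name_base - shift_bytes, name_base,
                (size_t)(name_curpos - name_base));
        name_curpos -= shift_bytes;
    }
    return name_curpos;
}
```

With map_count = 4 and count = 2, names stored at offset 112 move down to offset 64, directly after the two triples that were written.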

Jan, want to try a debug kernel with paranoia checks thrown around these memmoves?

Comment 6 Jan Kratochvil 2014-01-16 16:50:41 UTC

(In reply to Denys Vlasenko from comment #5)
> Jan, want to try a debug kernel with paranoia checks thrown around these
> memmoves?

Going to run it with kernel-debug.rpm, but if you would like any additional code there, please provide a scratch build in Koji (or at least a .patch file), thanks.

Comment 7 Jan Kratochvil 2014-01-17 10:56:10 UTC

kernel-debug-3.12.8-300.fc20.x86_64

The first problem reported in /var/log/messages:

BUG: unable to handle kernel paging request at ffffc9000907e000
IP: [<ffffffff81384f2a>] memmove+0x4a/0x1a0
PGD 195488067 PUD 195489067 PMD 18e6da067 PTE 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole tun ebtable_nat ebtables ip6t_REJECT nf_conntrack_tftp nf_conntrack_ipv6 nf_defrag_ipv6 xt_LOG xt_recent ipt_MASQUERADE iptable_nat xt_conntrack nf_conntrack_ipv4 ip6table_filter nf_defrag_ipv4 ip6_tables nf_nat_ipv4 nf_nat nf_conntrack coretemp kvm_intel kvm snd_hda_codec_hdmi iTCO_wdt snd_hda_intel iTCO_vendor_support gpio_ich snd_hda_codec lpc_ich nfsd tulip snd_hwdep snd_seq mxm_wmi r8169 mii snd_seq_device microcode i2c_i801 auth_rpcgss snd_pcm mfd_core serio_raw nfs_acl snd_page_alloc lockd snd_timer snd soundcore shpchp wmi i7core_edac edac_core acpi_cpufreq sunrpc binfmt_misc dm_crypt raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq ata_generic pata_acpi crc32c_intel radeon i2c_algo_bit drm_kms_helper ttm pata_jmicron drm i2c_core
CPU: 2 PID: 23645 Comm: break-interp-BI Not tainted 3.12.8-300.fc20.x86_64+debug #1
Hardware name: Gigabyte Technology Co., Ltd. EX58-UD4/EX58-UD4, BIOS F11t 01/28/2011
task: ffff88010311cac0 ti: ffff88007605c000 task.ti: ffff88007605c000
RIP: 0010:[<ffffffff81384f2a>]  [<ffffffff81384f2a>] memmove+0x4a/0x1a0
RSP: 0018:ffff88007605da60  EFLAGS: 00010206
RAX: ffffc9000907e000 RBX: 00000000ffffff30 RCX: ffffc9000907e000
RDX: 0000000000000090 RSI: ffffc9000907df50 RDI: ffffc9000907e000
RBP: ffff88007605dc30 R08: 30322f6b636f6d6d R09: 61682f68636f7461
R10: 726b6a2f656d6f68 R11: 2f656661736e752f R12: 00000000000000d0
R13: ffff88017a64ce00 R14: ffffc9000907e0d0 R15: ffffc9000907d190
FS:  00007fa17063f740(0000) GS:ffff880197a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000907e000 CR3: 0000000192a2e000 CR4: 00000000000007e0
DR0: 000000005655702c DR1: 00000000565570a0 DR2: 00000000565570a8
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 ffffffff81256ceb ffffffff812569d8 0000001e0000001e ffff88010311cac0
 ffffffff81c293a0 ffff8801024f53e8 ffff88007605dcb0 00000000000002b0
 ffff880000001000 ffff8800000002b0 ffffc9000907d2b0 ffffc9000907d000
Call Trace:
 [<ffffffff81256ceb>] ? elf_core_dump+0xbfb/0x1920
 [<ffffffff812569d8>] ? elf_core_dump+0x8e8/0x1920
 [<ffffffff81021b73>] ? native_sched_clock+0x13/0x80
 [<ffffffff810b9475>] ? sched_clock_cpu+0xb5/0x100
 [<ffffffff810eed7d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8125f8b5>] do_coredump+0xcc5/0x10e0
 [<ffffffff817479fa>] ? __slab_free+0x19c/0x382
 [<ffffffff81021b73>] ? native_sched_clock+0x13/0x80
 [<ffffffff81021be9>] ? sched_clock+0x9/0x10
 [<ffffffff810b9475>] ? sched_clock_cpu+0xb5/0x100
 [<ffffffff8108c228>] get_signal_to_deliver+0x2d8/0x950
 [<ffffffff81019528>] do_signal+0x48/0x600
 [<ffffffff81019b50>] do_notify_resume+0x70/0xa0
 [<ffffffff8175d422>] int_signal+0x12/0x17
Code: 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 41 48 83 ea 20 48 83 ea 20 4c 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 <4c> 89 1f 4c 89 57 08 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4
RIP  [<ffffffff81384f2a>] memmove+0x4a/0x1a0
 RSP <ffff88007605da60>
CR2: ffffc9000907e000
---[ end trace eb729bf04b441855 ]---

Comment 8 Denys Vlasenko 2014-01-17 14:25:41 UTC

Jan, this is a test build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6419378

It's exactly the same kernel version from your first oops report, with one patch added.
I did not bump the version from 300 to 301, sorry.
As of this writing, it is still compiling.

The patch adds this debugging hack:

+static int bad_memmove(char *dst, char *src, unsigned n, char *data, unsigned size, const char *pfx)
+{
+       if (!dst) {
+               printk(KERN_ERR "%s: BAD MOVE: NULL dst\n", pfx);
+               return 1;
+       }
+       if (!src) {
+               printk(KERN_ERR "%s: BAD MOVE: NULL src\n", pfx);
+               return 1;
+       }
+       if ((int)n < 0) {
+               printk(KERN_ERR "%s: BAD MOVE: n=%d\n", pfx, n);
+               return 1;
+       }
+       if ((int)size < 0) {
+               printk(KERN_ERR "%s: BAD MOVE: size=%d\n", pfx, size);
+               return 1;
+       }
+       if (src < dst) {
+               printk(KERN_ERR "%s: BAD MOVE: src %p < dst %p\n", pfx, src, dst);
+               return 1;
+       }
+       if (src+n > data+size) {
+               printk(KERN_ERR "%s: BAD MOVE: src+n %p > data+size %p\n", pfx, src+n, data+size);
+               return 1;
+       }
+       if (dst < data) {
+               printk(KERN_ERR "%s: BAD MOVE: dst %p < data %p\n", pfx, dst, data);
+               return 1;
+       }
+       return 0;
+}

...
+if (bad_memmove(name_curpos, filename, n, (char*)data, size, "storing")) {
+ vfree(data);
+ return -EINVAL;
+}
                memmove(name_curpos, filename, n);
...
+if (bad_memmove(name_base - shift_bytes, name_base,
+                       name_curpos - name_base,
+               (char*)data, size, "final_fix")) {
+ vfree(data);
+ return -EINVAL;
+}
                memmove(name_base - shift_bytes, name_base,
                        name_curpos - name_base);


added to fs/binfmt_elf.c.

IOW: this kernel should NOT oops, but it should write a "BAD MOVE:" into dmesg when a bogus memmove() is attempted.
Please try reproducing with this kernel, and watch dmesg.

Comment 9 Denys Vlasenko 2014-01-17 16:28:31 UTC

Created attachment 851700 [details]
The debugging patch for 3.12.6-300

Comment 10 Jan Kratochvil 2014-01-17 18:52:19 UTC

[ 9570.335110] storing: BAD MOVE: src ffffc9000d7c8f30 < dst ffffc9000d7c9000
[ 9658.205069] storing: BAD MOVE: src ffffc9000bd32f30 < dst ffffc9000bd33000
[ 9750.637397] storing: BAD MOVE: src ffffc90009f53f2f < dst ffffc90009f54000

Comment 11 Denys Vlasenko 2014-01-20 10:53:43 UTC

It comes from here:

                filename = d_path(&file->f_path, name_curpos, remaining);
                if (IS_ERR(filename)) {
                        if (PTR_ERR(filename) == -ENAMETOOLONG) {
                                vfree(data);
                                size = size * 5 / 4;
                                goto alloc;
                        }
                        continue;
                }
        
                /* d_path() fills at the end, move name down */
                /* n = strlen(filename) + 1: */
                n = (name_curpos + remaining) - filename;
                remaining = filename - name_curpos;
if (bad_memmove(name_curpos, filename, n, (char*)data, size, "storing")) {
 vfree(data);
 return -EINVAL;
}

I'm confused. d_path() has returned a "filename" pointer which points *before* the start of the buffer we gave it for the filename??

Building a new debug kernel to see how that happens...

Comment 12 Denys Vlasenko 2014-01-20 11:43:12 UTC

Created attachment 852718 [details]
New debugging patch

The build is being compiled:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6428825

Jan, can you try this one?

Comment 13 Jan Kratochvil 2014-01-20 17:03:34 UTC

4914.040568 storing: BAD MOVE: src ffffc90007b3ef30 < dst ffffc90007b3f000, n 208
4914.040625  was d_path(ffff8800dbb6f650, ffffc90007b3f000, 0)
5017.422474 storing: BAD MOVE: src ffffc90006b76f30 < dst ffffc90006b77000, n 208
5017.422532  was d_path(ffff880101e4bf50, ffffc90006b77000, 0)
5117.538821 storing: BAD MOVE: src ffffc9000bbd1f2f < dst ffffc9000bbd2000, n 209
5117.538874  was d_path(ffff880024363f50, ffffc9000bbd2000, 0)

Comment 14 Denys Vlasenko 2014-01-20 18:20:26 UTC

Yes, we caught it.
d_path() is being called with buflen=0, and apparently in such a case it does not return -ENAMETOOLONG as the code expects.
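The logged numbers line up with the bookkeeping in fill_files_note(). The sketch below replays it on records from Comment 13, using plain integers instead of pointers so the replay itself is well-defined. It assumes remaining is a 32-bit unsigned value; replay_store() and struct replay are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct replay {
    uint32_t n;          /* bytes the memmove() would copy */
    uint32_t remaining;  /* new value of remaining, as 32-bit unsigned */
};

/* Replay the two assignments that follow d_path() in fill_files_note(),
 * given the addresses from the debug log. */
static struct replay replay_store(uint64_t name_curpos, uint32_t remaining,
                                  uint64_t filename)
{
    struct replay r;

    /* n = (name_curpos + remaining) - filename */
    r.n = (uint32_t)((name_curpos + remaining) - filename);
    /* remaining = filename - name_curpos: negative here, so it wraps */
    r.remaining = (uint32_t)(filename - name_curpos);
    return r;
}
```

With name_curpos = 0xffffc90007b3f000, remaining = 0 and filename = 0xffffc90007b3ef30, it yields n = 208, matching the log, and remaining wraps to 0xffffff30, the 32-bit encoding of -208. The same bit pattern appears in RBX in both oopses above, although which variable that register held is not established in this thread.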

Comment 15 Denys Vlasenko 2014-01-21 10:23:30 UTC

Created attachment 853096 [details]
Patch to make d_path(len=0) safe

New test build compiling:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6433553

Comment 16 Denys Vlasenko 2014-01-21 10:46:39 UTC

Oh oh.

fs/dcache.c: In function 'path_with_deleted':
fs/dcache.c:3018:2: error: 'error' undeclared (first use in this function)

Fixed. New test build compiling:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6433617

Comment 17 Jan Kratochvil 2014-01-21 10:54:32 UTC

BTW please use %define buildid, as identifying the installed builds is otherwise a mess.

Also please post the patches along with the builds, as 3.12.6 is an old kernel version now.

Thanks for fixing it.

Comment 18 Denys Vlasenko 2014-01-21 11:58:38 UTC

Jan, are you saying it works for you now, meaning no oopses *and no debug messages in dmesg*?

Comment 19 Jan Kratochvil 2014-01-21 12:07:15 UTC

I haven't yet tested your hopefully final patch from Comment 16, the kernel is still building in Koji.

So far I use your previous debug patches and it no longer oopses.  It probably does not dump the core properly in that case; I did not check that.

Comment 20 Jan Kratochvil 2014-01-21 14:35:36 UTC

I think it works now, it seems to no longer crash.

I do not see any obvious regressions against the GDB testsuite run from 2013-10-24.

Comment 21 Denys Vlasenko 2014-01-22 12:52:40 UTC

(In reply to Jan Kratochvil from comment #20)
> I think it works now, it seems to no longer crash.

It is not supposed to crash, since the debugging patch checks for bad memmove() attempts and prevents them, without even aborting the coredump.

Did you check dmesg for BAD MOVE messages?

I ask because we have serious doubts that my fix is effective. It makes the code a bit more robust, yes, but the observed failure may not be on the code paths it fixes!

Comment 22 Jan Kratochvil 2014-01-22 13:05:52 UTC

I verified first the build from Comment 16 (kernel-debug.rpm) fixes it.

I also verified (and I run) my own build with a trivial fix of the Comment 15 patch:
  https://koji.fedoraproject.org/koji/taskinfo?taskID=6434857

In neither of the two kernels above I see any 'bad move\|bug:\|bad2' regex.

There is no "BAD MOVE" message in the Comment 15 patch / build of mine so such message really cannot happen.  I do not know which patches you put into the Comment 16 build.

You were suggesting kernel-debug rpms, but I cannot run them beyond a simple crash verification, as they are too slow for any normal work.

Comment 23 Denys Vlasenko 2014-01-22 15:41:36 UTC

(In reply to Jan Kratochvil from comment #22)
> I verified first the build from Comment 16 (kernel-debug.rpm) fixes it.
> 
> I also verified / I run my own build with trivial fix of the Comment 15
> patch:
>   https://koji.fedoraproject.org/koji/taskinfo?taskID=6434857
> 
> In neither of the two kernels above I see any 'bad move\|bug:\|bad2' regex.

Well, if there is no oops with only this patch, then we have a fix.

> There is no "BAD MOVE" message in the Comment 15 patch / build of mine so
> such message really cannot happen.  I do not know which patches you put into
> the Comment 16 build.
> 
> You were suggesting kernel-debug rpms but I cannot run them besides a simple
> crash verification as they are too slow for any normal work.

I was suggesting to run this build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6433617

it is not a debug build.

Comment 24 Denys Vlasenko 2014-01-31 16:46:47 UTC

Al Viro has a patch which is going upstream:

http://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/commit/?h=for-linus&id=f6500801522c61782d4990fa1ad96154cb397cd4

However, it fixes the length check in the static function __dentry_path(), which is only reachable via dentry_path() and dentry_path_raw().

This fix is not affecting d_path() code path.
I am not sure this patch will fix the problem Jan sees.

Comment 25 Oleg Nesterov 2014-01-31 17:56:47 UTC

(In reply to Denys Vlasenko from comment #24)
> 
> However, it fixes the length check in the static function __dentry_path(), which
> is only reachable via dentry_path() and dentry_path_raw().
> 
> This fix is not affecting d_path() code path.
> I am not sure this patch will fix the problem Jan sees.

Yes... I guess we need to reproduce the problem once again with another
debugging patch which prints the filename and d_op->d_dname.

Comment 26 Josh Boyer 2014-02-03 13:11:54 UTC

*** Bug 1060384 has been marked as a duplicate of this bug. ***

Comment 27 Denys Vlasenko 2014-02-04 13:41:07 UTC

Created attachment 859160 [details]
Alternative upstream patch: do not call d_path(buflen=0)

Sent upstream for review

Comment 28 Jan Kratochvil 2014-02-05 09:32:13 UTC

BTW confirming the 2 patches from Comment 24 and Comment 27 fix it for my nightly testsuite runs.

kernel-3.12.9-301.dentrybuflen.fc20.x86_64
https://koji.fedoraproject.org/koji/taskinfo?taskID=6491049

Comment 29 Oleg Nesterov 2014-02-05 14:34:41 UTC

(In reply to Jan Kratochvil from comment #28)
> BTW confirming the 2 patches from Comment 24 and Comment 27 fix it for my
> nightly testsuite runs.

Sure, but the patch in #27 just hides the problem... We do
not know why/how/where d_path(buflen => 0) fails.

Comment 30 Denys Vlasenko 2014-02-05 17:24:28 UTC

The bug is here:

static int prepend_name(char **buffer, int *buflen, struct qstr *name)
{
        const char *dname = ACCESS_ONCE(name->name);
        u32 dlen = ACCESS_ONCE(name->len);
        char *p;

        if (*buflen < dlen + 1)     <---------------
                return -ENAMETOOLONG;


dlen is an **unsigned** type (in practice, on all arches it is 'unsigned int'). In the comparison, *buflen gets converted from int to this type.
Thus, if buflen was already negative, the check will fail to detect the problem it is intended to catch.
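A minimal userspace demonstration of the conversion problem, assuming (as stated above) that u32 is unsigned int. too_long_buggy() reproduces the quoted comparison; too_long_signed() is a hypothetical corrected comparison shown only for contrast.

```c
#include <assert.h>
#include <stdint.h>

/* The check as quoted from prepend_name(): buflen is int, dlen is u32.
 * In `buflen < dlen + 1` the int operand is converted to unsigned, so a
 * negative buflen becomes a huge value and the check does not fire. */
static int too_long_buggy(int buflen, uint32_t dlen)
{
    return buflen < dlen + 1;
}

/* Signed comparison that does catch a negative buflen (illustrative
 * only; assumes dlen fits in an int, true for real path components). */
static int too_long_signed(int buflen, uint32_t dlen)
{
    return buflen < (int)dlen + 1;
}
```

too_long_buggy(-1, 13) returns 0 (the name is wrongly accepted), while too_long_signed(-1, 13) returns 1. GCC's -Wsign-compare warning (enabled by -Wextra for C) flags this kind of comparison.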
Here is how it looks on an instrumented kernel:


will_segfault[5293]: segfault at 0 ip 000000000040044e sp 00007fffa9fcc2f0 error 4 in will_segfault[400000+1000]
 d_path(len:0): took path_with_deleted branch
 path_with_deleted: took normal branch, *buflen:-1
^^^^^^^^^^^^^^^^^ buflen is negative b/c path_with_deleted does prepend(buf, buflen, "\0", 1); without checking its failure status...

 >prepend_path(bptr:ffffc90004f88190,blen:-1)
 <prepend_path(bptr:ffffc90004f88182,blen:-15) error:0
^^^^^^^^^^^^^^^^^ happily prepended data before buffer start
 >prepend_path(bptr:ffffc90004f88182,blen:-15)
 <prepend_path(bptr:ffffc90004f8817e,blen:-19) error:0
 >prepend_path(bptr:ffffc90004f8817e,blen:-19)
 <prepend_path(bptr:ffffc90004f8817a,blen:-23) error:0
 BAD d_path(buf:ffffc90004f88190,len:0): ffffc90004f8817a (22 before start)
 BAD fname:'/usr/bin/will_segfault'
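The instrumented trace can be reproduced in userspace. This sketch models prepend() as described in this thread (it decrements *buflen before checking, so a failed call leaves it negative) and prepend_name() with the quoted unsigned check. The bodies are simplified reconstructions, not the kernel source: names are passed as C strings instead of struct qstr, and a larger backing array stands in for the memory before the vmalloc()ed buffer so the out-of-range writes stay defined here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Models prepend(): *buflen is decremented before the check, so a failed
 * call leaves *buflen negative while *buffer stays put. */
static int prepend(char **buffer, int *buflen, const char *str, int namelen)
{
    *buflen -= namelen;
    if (*buflen < 0)
        return -ENAMETOOLONG;
    *buffer -= namelen;
    memcpy(*buffer, str, (size_t)namelen);
    return 0;
}

/* Models prepend_name() with the quoted mixed-signedness check. */
static int prepend_name_buggy(char **buffer, int *buflen, const char *name)
{
    uint32_t dlen = (uint32_t)strlen(name);

    if (*buflen < dlen + 1)   /* *buflen converted to unsigned: -1 is huge */
        return -ENAMETOOLONG;
    *buflen -= (int)dlen + 1;
    *buffer -= dlen + 1;      /* with *buflen < 0 this walks before the buffer */
    (*buffer)[0] = '/';
    memcpy(*buffer + 1, name, dlen);
    return 0;
}
```

Starting with buflen = 0 and prepending "will_segfault", "bin" and "usr" reproduces the trace: buflen goes -1, -15, -19, -23, and the result starts 22 bytes before the buffer, holding "/usr/bin/will_segfault".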

Comment 31 Oleg Nesterov 2014-02-05 17:34:09 UTC

(In reply to Denys Vlasenko from comment #30)
>
> static int prepend_name(char **buffer, int *buflen, struct qstr *name)
> {
>         const char *dname = ACCESS_ONCE(name->name);
>         u32 dlen = ACCESS_ONCE(name->len);
>         char *p;
> 
>         if (*buflen < dlen + 1)     <---------------
>                 return -ENAMETOOLONG;
> 
> 
> dlen is **unsigned**

heh ;)

That is why unchecked prepend() does matter. I am wondering if we
should change it to not decrement  *buflen unconditionally too.
This will fix the bug as well.

But prepend_name() should be changed anyway, imo. Please send the
patch to Al.
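Oleg's alternative can be sketched as well. prepend_checked() is a hypothetical variant that only updates *buflen on success, and name_fits_buggy() is the unchanged unsigned check reduced to a predicate. With this variant, the ignored prepend("\0", 1) leaves buflen at 0, and even the unchanged check then rejects the name, because 0 < dlen + 1 holds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prepend() variant: *buflen is only updated when the data
 * actually fits, so an ignored failure leaves *buflen unchanged. */
static int prepend_checked(char **buffer, int *buflen, const char *str,
                           int namelen)
{
    if (*buflen < namelen)
        return -ENAMETOOLONG;
    *buflen -= namelen;
    *buffer -= namelen;
    memcpy(*buffer, str, (size_t)namelen);
    return 0;
}

/* The unchanged check from prepend_name(): 1 means "name accepted". */
static int name_fits_buggy(int buflen, uint32_t dlen)
{
    return !(buflen < dlen + 1);
}
```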

Comment 32 Denys Vlasenko 2014-02-05 17:39:25 UTC

Created attachment 859769 [details]
Patch to fix prepend_name() with *buflen < 0

Sent upstream for review

Comment 33 Josh Boyer 2014-02-17 20:53:48 UTC

*** Bug 1065689 has been marked as a duplicate of this bug. ***

Comment 34 Justin M. Forbes 2014-02-24 14:05:23 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.13.4-200.fc20.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 35 Jan Kratochvil 2014-02-28 11:17:45 UTC

3.13.4-200.fc20 does not contain the fix yet.

Comment 36 Jan Kratochvil 2014-02-28 12:05:41 UTC

Could there be some fix included at least into Fedora kernel?

The machine usually gets stuck overnight.
But since I run a custom kernel build, I doubt Fedora maintainers will debug it.
I am also not sure whether the following is related:

BUG: soft lockup - CPU#3 stuck for 22s! [gdb:13619]
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole tun ebtable_nat ebtables ip6t_REJECT nf_conntrack_tftp xt_LOG ipt_MASQUERADE nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_conntrack nf_nat nf_conntrack ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device coretemp kvm_intel snd_pcm kvm serio_raw snd_page_alloc tulip iTCO_wdt iTCO_vendor_support snd_timer nfsd i7core_edac gpio_ich i2c_i801 mxm_wmi r8169 mii snd edac_core microcode soundcore auth_rpcgss acpi_cpufreq shpchp lpc_ich mfd_core nfs_acl lockd wmi sunrpc binfmt_misc dm_crypt raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx ata_generic pata_acpi raid6_pq radeon i2c_algo_bit drm_kms_helper ttm crc32c_intel drm pata_jmicron i2c_core
CPU: 3 PID: 13619 Comm: gdb Tainted: G      D      3.13.4-200.dentrybuflen.fc20.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. EX58-UD4/EX58-UD4, BIOS F11t 01/28/2011
task: ffff880107e15ee0 ti: ffff8800cb6c8000 task.ti: ffff8800cb6c8000
RIP: 0010:[<ffffffff8168d5aa>]  [<ffffffff8168d5aa>] _raw_spin_lock+0x2a/0x40
RSP: 0000:ffff8800cb6c97a0  EFLAGS: 00000293
RAX: 000000000000002a RBX: ffff8800cb6c97f0 RCX: 000000000000002c
RDX: 000000000000002c RSI: ffff880000000848 RDI: ffffea000650e5b0
RBP: ffff8800cb6c97a0 R08: ffff8800cb6c97e8 R09: 00000000fffffff3
R10: ffff88019fbe6ec0 R11: ffffea00054c3740 R12: ffffea0005455b80
R13: ffff8800cb6c9b00 R14: 0000000000000297 R15: ffff8800cb6c9730
FS:  0000000000000000(0000) GS:ffff88019fc60000(0063) knlGS:00000000f5bf1940
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 000000002c3e0000 CR3: 00000000b54db000 CR4: 00000000000007e0
DR0: 000000000804a028 DR1: 000000000804a0ac DR2: 000000000804a080
DR3: 000000000804a084 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 ffff8800cb6c97d8 ffffffff811a7a9a ffff8800b2ca5590 ffff8800cb6c98d0
 0000000021200000 ffff8800cb6c9850 ffff8801686b9ac0 ffff8800cb6c9818
 ffffffff8117f3a3 ffff8800cb6c9818 ffffea00061b0000 0000000000000000
Call Trace:
 [<ffffffff811a7a9a>] page_check_address_pmd+0x7a/0x130
 [<ffffffff8117f3a3>] page_referenced_one+0x103/0x170
 [<ffffffff81180f20>] page_referenced+0x260/0x340
 [<ffffffff8115e115>] shrink_active_list+0x1a5/0x330
 [<ffffffff8115e723>] shrink_lruvec+0x483/0x6a0
 [<ffffffff8115e9a6>] shrink_zone+0x66/0x1a0
 [<ffffffff8115f090>] do_try_to_free_pages+0x100/0x670
 [<ffffffff8115f6ef>] try_to_free_pages+0xef/0x170
 [<ffffffff81153d70>] __alloc_pages_nodemask+0x720/0xa90
 [<ffffffff8119555a>] alloc_pages_vma+0x9a/0x140
 [<ffffffff811a904c>] do_huge_pmd_anonymous_page+0xfc/0x410
 [<ffffffff81174ed8>] handle_mm_fault+0x608/0xe30
 [<ffffffff816911dc>] __do_page_fault+0x15c/0x540
 [<ffffffff81179641>] ? vma_rb_erase+0x121/0x220
 [<ffffffff8117b4b7>] ? do_munmap+0x297/0x3b0
 [<ffffffff816915ce>] do_page_fault+0xe/0x10
 [<ffffffff8168dd48>] page_fault+0x28/0x30
Code: 00 66 66 66 66 90 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 0f b7 07 89 d1 66 39 d0 74 f4 f3 90 <0f> b7 07 66 39 c8 75 f6 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00

BUG: soft lockup - CPU#1 stuck for 22s! [kswapd0:65]
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole tun ebtable_nat ebtables ip6t_REJECT nf_conntrack_tftp xt_LOG ipt_MASQUERADE nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_conntrack nf_nat nf_conntrack ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device coretemp kvm_intel snd_pcm kvm serio_raw snd_page_alloc tulip iTCO_wdt iTCO_vendor_support snd_timer nfsd i7core_edac gpio_ich i2c_i801 mxm_wmi r8169 mii snd edac_core microcode soundcore auth_rpcgss acpi_cpufreq shpchp lpc_ich mfd_core nfs_acl lockd wmi sunrpc binfmt_misc dm_crypt raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx ata_generic pata_acpi raid6_pq radeon i2c_algo_bit drm_kms_helper ttm crc32c_intel drm pata_jmicron i2c_core
CPU: 1 PID: 65 Comm: kswapd0 Tainted: G      D      3.13.4-200.dentrybuflen.fc20.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. EX58-UD4/EX58-UD4, BIOS F11t 01/28/2011
task: ffff880194ad08a0 ti: ffff880195e2c000 task.ti: ffff880195e2c000
RIP: 0010:[<ffffffff8168d5ad>]  [<ffffffff8168d5ad>] _raw_spin_lock+0x2d/0x40
RSP: 0018:ffff880195e2dc50  EFLAGS: 00000297
RAX: 000000000000002a RBX: 0000000000000000 RCX: 000000000000002b
RDX: 000000000000002b RSI: ffff880000000ff8 RDI: ffffea000650e5b0
RBP: ffff880195e2dc50 R08: ffff880195e2dc98 R09: ffff8801937cfdf8
R10: ffffffff81cff9c0 R11: 0000000000000005 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88019fc20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000f5d39000 CR3: 0000000001c0c000 CR4: 00000000000007e0
DR0: 000000000804a028 DR1: 000000000804a01c DR2: 00000000006010e8
DR3: 00000000006010ec DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 ffff880195e2dc88 ffffffff811a7a9a ffff8800b2ca5928 ffff880195e2dd80
 000000003fe00000 ffff880195e2dd00 ffff8801686b94c0 ffff880195e2dcc8
 ffffffff8117f3a3 ffff880195e2dcc8 ffffea0005938000 0000000000000000
Call Trace:
 [<ffffffff811a7a9a>] page_check_address_pmd+0x7a/0x130
 [<ffffffff8117f3a3>] page_referenced_one+0x103/0x170
 [<ffffffff81180f20>] page_referenced+0x260/0x340
 [<ffffffff8115e115>] shrink_active_list+0x1a5/0x330
 [<ffffffff8115fdf0>] kswapd+0x320/0x840
 [<ffffffff8115fad0>] ? mem_cgroup_shrink_node_zone+0x120/0x120
 [<ffffffff8108f242>] kthread+0xd2/0xf0
 [<ffffffff8108f170>] ? insert_kthread_work+0x40/0x40
 [<ffffffff81695a7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108f170>] ? insert_kthread_work+0x40/0x40
Code: 66 66 90 55 48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 0f b7 07 89 d1 66 39 d0 74 f4 f3 90 0f b7 07 <66> 39 c8 75 f6 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66

3.13.4-200.dentrybuflen.fc20.x86_64:
  https://koji.fedoraproject.org/koji/taskinfo?taskID=6560937
page_check_address_pmd:
1618		*ptl = pmd_lock(mm, pmd);
   0xffffffff811a7a9a <+122>:	mov    %r15,(%r12)
   0xffffffff811a7a9e <+126>:	mov    (%rbx),%rdi

Comment 37 Jan Kratochvil 2014-03-02 20:41:33 UTC

kernel-3.13.5-200.fc20.x86_64 probably fixed the "stuck CPU" problems from Comment 36 above; a build of it with the fix included is at:
  https://koji.fedoraproject.org/koji/taskinfo?taskID=6580744

Comment 38 Jan Kratochvil 2014-04-07 11:39:56 UTC

FYI the latest F-20 scratch build:
kernel-3.13.8-200.dentrybuflen.fc20.x86_64
https://koji.fedoraproject.org/koji/taskinfo?taskID=6702294

Comment 39 Justin M. Forbes 2014-05-21 19:41:18 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 40 Denys Vlasenko 2014-05-22 08:56:32 UTC

Upstream fixes for this BZ:

commit f6500801522c61782d4990fa1ad96154cb397cd4
Author: Al Viro <viro.org.uk>
Date:   Sun Jan 26 12:37:55 2014 -0500

    __dentry_path() fixes

    * we need to save the starting point for restarts
    * reject pathologically short buffers outright

    Spotted-by: Denys Vlasenko <dvlasenk>
    Spotted-by: Oleg Nesterov <oleg>
    Signed-off-by: Al Viro <viro.org.uk>


commit e825196d48d2b89a6ec3a8eff280098d2a78207e
Author: Al Viro <viro.org.uk>
Date:   Sun Mar 23 00:28:40 2014 -0400

    make prepend_name() work correctly when called with negative *buflen

    In all callchains leading to prepend_name(), the value left in *buflen
    is eventually discarded unused if prepend_name() has returned a negative.
    So we are free to do what prepend() does, and subtract from *buflen
    *before* checking for underflow (which turns into checking the sign
    of subtraction result, of course).

    Cc: stable.org
    Signed-off-by: Al Viro <viro.org.uk>
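The approach described in the second commit message (subtract first, then test the sign of the result) can be sketched in userspace as follows. This is a simplified illustration, not the upstream code: prepend_name_fixed() is a hypothetical name, it takes a C string instead of struct qstr, and it omits ACCESS_ONCE().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Subtract from *buflen first, then check the sign: *buflen stays a
 * signed int throughout, so a negative starting value is caught. */
static int prepend_name_fixed(char **buffer, int *buflen, const char *name)
{
    uint32_t dlen = (uint32_t)strlen(name);

    *buflen -= (int)dlen + 1;
    if (*buflen < 0)
        return -ENAMETOOLONG;   /* callers discard *buflen on error */
    *buffer -= dlen + 1;
    (*buffer)[0] = '/';
    memcpy(*buffer + 1, name, dlen);
    return 0;
}
```

Because the test is on the sign of a signed result, a call that starts with *buflen == -1 is rejected and leaves *buffer untouched, instead of slipping through an unsigned comparison.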

Comment 41 Denys Vlasenko 2014-05-22 10:28:24 UTC

These fixes went into Linux-3.14.
Fedora's kernel is at 3.15-something.
Closing this BZ.

Comment 42 Jan Kratochvil 2014-06-19 05:51:35 UTC

Confirming kernel-3.14.7-200.fc20.x86_64 does not crash under local nightly workload, thanks.
