Bug 975360
| Field | Value |
|---|---|
| Summary | A userspace app crash made the kernel OOPS in elf_core_dump |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 18 |
| Hardware | All |
| OS | All |
| Status | CLOSED WORKSFORME |
| Severity | medium |
| Priority | unspecified |
| Reporter | Michele Baldessari <michele> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dvlasenk, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, michele, onestero |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2013-11-02 20:20:38 UTC |

Description
Michele Baldessari
2013-06-18 08:51:48 UTC
Oleg, have you seen anything like this?

(In reply to Josh Boyer from comment #1)
> Oleg, have you seen anything like this?

No...

Josh, if you have 3.9.5-201.fc18 sources installed, could you send me (privately) the result of "make fs/binfmt_elf.s"? And "objdump -d fs/binfmt_elf.o" just in case.

Not sure this will help, but elf_core_dump+0xaf0 tells me almost nothing :/

Is it easy to reproduce? I mean, does the kernel crash every time / often if you send a coredumping sig to ksmtuned?

Hi Oleg,

nope, definitely not reproducible. I just tried sending SIGSEGV to ksmtuned multiple times and nothing out of the ordinary showed up. (The box was upgraded to 3.9.6-200, for the record.)

I'll try fiddling some more and update here if I can somehow reproduce it.

regards,
Michele

Hi Michele,

(In reply to Michele Baldessari from comment #3)
> nope, definitely not reproducible.

As expected ;)

> I'll try fiddling some more and update here if I can somehow reproduce it.

Thanks. Meanwhile I am trying to guess where it crashes. scripts/decodecode reports:

All code
========
   0:   48 8b 81 90 02 00 00    mov    0x290(%rcx),%rax
   7:   4c 8b 28                mov    (%rax),%r13
   a:   4d 85 ed                test   %r13,%r13
   d:   0f 84 da 09 00 00       je     0x9ed
  13:   4c 8b b5 70 fe ff ff    mov    -0x190(%rbp),%r14
  1a:   c7 85 88 fe ff ff 00    movl   $0x0,-0x178(%rbp)
  21:   00 00 00
  24:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  2b:*  49 8b 85 a0 00 00 00    mov    0xa0(%r13),%rax    <-- trapping instruction
  32:   48 85 c0                test   %rax,%rax
  35:   74 60                   je     0x97
  37:   48 8d 78 10             lea    0x10(%rax),%rdi
  3b:   89 da                   mov    %ebx,%edx
  3d:   4c 89 f6                mov    %r14,%rsi

Code starting with the faulting instruction
===========================================
   0:   49 8b 85 a0 00 00 00    mov    0xa0(%r13),%rax
   7:   48 85 c0                test   %rax,%rax
   a:   74 60                   je     0x6c
   c:   48 8d 78 10             lea    0x10(%rax),%rdi
  10:   89 da                   mov    %ebx,%edx
  12:   4c 89 f6                mov    %r14,%rsi

My fs/binfmt_elf.s is quite different, but I tried to search for the constants above.
And this part looks promising:

    movq    160(%r12), %rax     # <variable>.vm_file, file
    testq   %rax, %rax          # file
    je      .L82                #,
    leaq    16(%rax), %rdi      #, tmp429
    movl    %r14d, %edx         # remaining, remaining
    movq    %r13, %rsi          # name_curpos.1180, name_curpos.1180
    call    d_path              #

If my wild guess is correct, this is fill_files_note()... r13 is vma. But it is not mm->mmap, rax == 0... looks like vma->vm_next is corrupted? Unlikely.

I recompiled 3.9.5-201.fc18.x86_64 on my machine (meaning: same source, different gcc). Attaching the resulting binfmt_elf.{o,s}

Created attachment 772764
binfmt_elf.o

Created attachment 772765
binfmt_elf.s

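The step from the displacement in the trapping instruction to a struct field can be illustrated with a small user-space sketch. The types below are hypothetical stand-ins, not the kernel's definitions: the padding is chosen only so that `vm_file` lands at byte offset 160 (0xa0), the value seen in the disassembly, and the position of `vm_next` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real kernel definitions. */
struct mock_file { int unused; };

struct mock_vma {
    struct mock_vma *vm_next;                               /* placement is an assumption */
    unsigned char opaque[160 - sizeof(struct mock_vma *)];  /* pads vm_file to offset 160 */
    struct mock_file *vm_file;
};

/* The displacement in "mov 0xa0(%r13),%rax" equals this field offset. */
static_assert(offsetof(struct mock_vma, vm_file) == 0xa0,
              "vm_file is expected at byte offset 160 in this mock layout");

/* Address the load would read, given the pointer value held in r13. */
static uintptr_t vm_file_load_address(uintptr_t vma)
{
    return vma + offsetof(struct mock_vma, vm_file);
}
```

Under this mock layout, a load that faults at `0xa0(%r13)` points at the value of `vma` itself being bad, since the instruction does nothing except read one field through that pointer.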
Disassembly of binfmt_elf.o:

     fa6:   48 89 85 80 fe ff ff    mov    %rax,-0x180(%rbp)
     fad:   48 8b 85 48 fe ff ff    mov    -0x1b8(%rbp),%rax
     fb4:   48 8b 80 90 02 00 00    mov    0x290(%rax),%rax
     fbb:   4c 8b 28                mov    (%rax),%r13
     fbe:   4d 85 ed                test   %r13,%r13
     fc1:   0f 84 ee 0c 00 00       je     1cb5 <elf_core_dump+0x1895>
     fc7:   c7 85 90 fe ff ff 00    movl   $0x0,-0x170(%rbp)
     fce:   00 00 00
     fd1:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
     fd8:   49 8b 85 a0 00 00 00    mov    0xa0(%r13),%rax
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     fdf:   48 85 c0                test   %rax,%rax
     fe2:   74 61                   je     1045 <elf_core_dump+0xc25>
     fe4:   48 8d 78 10             lea    0x10(%rax),%rdi
     fe8:   89 da                   mov    %ebx,%edx
     fea:   4c 89 f6                mov    %r14,%rsi
     fed:   e8 00 00 00 00          callq  ff2 <elf_core_dump+0xbd2>
                    fee: R_X86_64_PC32 d_path-0x4
     ff2:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
     ff8:   0f 87 74 0c 00 00       ja     1c72 <elf_core_dump+0x1852>
     ffe:   89 d9                   mov    %ebx,%ecx
    1000:   4c 89 f7                mov    %r14,%rdi
    1003:   48 89 c6                mov    %rax,%rsi
    1006:   4c 01 f1                add    %r14,%rcx
    1009:   89 c3                   mov    %eax,%ebx
    100b:   49 83 c7 18             add    $0x18,%r15
    100f:   41 89 cc                mov    %ecx,%r12d
    1012:   44 29 f3                sub    %r14d,%ebx
    1015:   41 29 c4                sub    %eax,%r12d
    1018:   4c 89 e2                mov    %r12,%rdx
    101b:   4d 01 e6                add    %r12,%r14
    101e:   e8 00 00 00 00          callq  1023 <elf_core_dump+0xc03>
                    101f: R_X86_64_PC32 memmove-0x4

Corresponding binfmt_elf.s:

            movq    %rax, -384(%rbp)        # name_curpos, %sfp
    .LVL288:
            .loc 1 1439 0
            movq    -440(%rbp), %rax        # %sfp, pfo_ret__
    .LVL289:
            movq    656(%rax), %rax         # pfo_ret___772->mm, pfo_ret___772->mm
            movq    (%rax), %r13            # _773->mmap, vma
    .LVL290:
            testq   %r13, %r13              # vma
            je      .L273                   #,
            .loc 1 1438 0
            movl    $0, -368(%rbp)          #, %sfp
    .LVL291:
            .p2align 4,,10
            .p2align 3
    .L199:
    .LBB1503:
            .loc 1 1443 0
            movq    160(%r13), %rax         # vma_1103->vm_file, file
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    .LVL292:
            .loc 1 1444 0
            testq   %rax, %rax              # file
            je      .L194                   #,
            .loc 1 1446 0
            leaq    16(%rax), %rdi          #, D.32731
            movl    %ebx, %edx              # remaining,
            movq    %r14, %rsi              # name_curpos,
            call    d_path                  #
    .LVL293:
            .loc 1 1447 0
            cmpq    $-4096, %rax            #, filename
            ja      .L515                   #,
            .loc 1 1458 0
            movl    %ebx, %ecx              # remaining, D.32709
            .loc 1 1460 0
            movq    %r14, %rdi              # name_curpos,
            movq    %rax, %rsi              # filename,
            .loc 1 1458 0
            addq    %r14, %rcx              # name_curpos, D.32732
    .LVL294:
            .loc 1 1459 0
            movl    %eax, %ebx              # filename, remaining
    .LVL295:
            .loc 1 1465 0
            addq    $24, %r15               #, start_end_ofs
    .LVL296:
            movl    %ecx, %r12d             # D.32732, D.32699
            .loc 1 1459 0
            subl    %r14d, %ebx             # name_curpos, remaining
    .LVL297:
            subl    %eax, %r12d             # filename, D.32699
            .loc 1 1460 0
            movq    %r12, %rdx              # D.32699,
            .loc 1 1461 0
            addq    %r12, %r14              # D.32699, name_curpos
    .LVL298:
            .loc 1 1460 0
            call    memmove                 #

binfmt_elf.c:

    static void fill_files_note(struct memelfnote *note)
    ...
        for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
            struct file *file;
            const char *filename;

            file = vma->vm_file;
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
            if (!file)
                continue;
            filename = d_path(&file->f_path, name_curpos, remaining);
            if (IS_ERR(filename)) {
                if (PTR_ERR(filename) == -ENAMETOOLONG) {
                    vfree(data);
                    size = size * 5 / 4;
                    goto alloc;
                }
                continue;
            }

            /* d_path() fills at the end, move name down */
            /* n = strlen(filename) + 1: */
            n = (name_curpos + remaining) - filename;
            remaining = filename - name_curpos;
            memmove(name_curpos, filename, n);

Corresponding part of kernel-3.9.5-201.fc18.x86_64's vmlinux.bin:

    ffffffff811f373a:   48 8b 8d 38 fe ff ff    mov    -0x1c8(%rbp),%rcx
    ffffffff811f3741:   49 83 c7 10             add    $0x10,%r15
    ffffffff811f3745:   48 8b 81 90 02 00 00    mov    0x290(%rcx),%rax
    ffffffff811f374c:   4c 8b 28                mov    (%rax),%r13
    ffffffff811f374f:   4d 85 ed                test   %r13,%r13
    ffffffff811f3752:   0f 84 da 09 00 00       je     0xffffffff811f4132
    ffffffff811f3758:   4c 8b b5 70 fe ff ff    mov    -0x190(%rbp),%r14
    ffffffff811f375f:   c7 85 88 fe ff ff 00    movl   $0x0,-0x178(%rbp)
    ffffffff811f3766:   00 00 00
    ffffffff811f3769:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    ffffffff811f3770:   49 8b 85 a0 00 00 00    mov    0xa0(%r13),%rax
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ffffffff811f3777:   48 85 c0                test   %rax,%rax
    ffffffff811f377a:   74 60                   je     0xffffffff811f37dc
    ffffffff811f377c:   48 8d 78 10             lea    0x10(%rax),%rdi
    ffffffff811f3780:   89 da                   mov    %ebx,%edx
    ffffffff811f3782:   4c 89 f6                mov    %r14,%rsi
    ffffffff811f3785:   e8 26 18 fc ff          callq  0xffffffff811b4fb0
    ffffffff811f378a:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
    ffffffff811f3790:   0f 87 21 09 00 00       ja     0xffffffff811f40b7
    ffffffff811f3796:   89 d9                   mov    %ebx,%ecx
    ffffffff811f3798:   4c 89 f7                mov    %r14,%rdi

Thanks Denys! So you came to the same conclusion...

Unfortunately, this means we need more info. Because this really looks like we have a bug somewhere else; it just manifests itself in elf_core_dump(). Either the vma list is corrupted, or we race with someone who plays with mm->mmap (nobody should).

Hi Oleg & Denys,

I've never managed to reproduce it here. I guess we can either close this one out or leave it open to see if some other soul stumbles into this. Whatever works for you ;)

thanks again,
Michele

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Never seen this one since I reported it. Closing
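
As a footnote to the analysis in this report, the loop shape quoted from fill_files_note() can be modeled in user space. This is only a sketch: `sketch_vma`, `sketch_file` and `count_file_backed` are hypothetical names, and the two-field struct is a simplification, not the kernel layout. It shows that the loop's only guard is `vma != NULL`, so the read of `vm_file` is the first dereference of each list element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified types, not the kernel's structures. */
struct sketch_file { int unused; };

struct sketch_vma {
    struct sketch_vma *vm_next;
    struct sketch_file *vm_file;
};

/* Count file-backed mappings using the same loop shape as the excerpt. */
static size_t count_file_backed(const struct sketch_vma *head)
{
    size_t n = 0;

    for (const struct sketch_vma *vma = head; vma != NULL; vma = vma->vm_next) {
        /* First dereference of vma in this iteration: a non-NULL but
         * invalid pointer would fault here, not at the loop condition. */
        if (!vma->vm_file)
            continue;
        n++;
    }
    return n;
}
```

A NULL `vm_next` ends the walk cleanly, while a stale or overwritten non-NULL link passes the loop condition and faults on the `vm_file` read, which is consistent with the suspicion voiced in the thread that the list itself was damaged elsewhere.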