Bug 968587
Summary: | instant and remote reliable crash - kernel 3.9.xx | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Laurent Jacquot <jk> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 18 | CC: | drjones, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-06-03 08:43:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Laurent Jacquot
2013-05-29 22:17:38 UTC
Some more data on the filesystem config: <root@host>#df -h|egrep Musique\|Vidéos /dev/mapper/Store-Rushs 385G 132G 234G 37% /home/alex/Vidéos /dev/mapper/Store-Music 385G 143G 223G 40% /home/alex/Musique /dev/mapper/Store-Videos 385G 270G 96G 74% /home/alex/Vidéos/Divx <root@host># lvs |egrep Videos\|Music\|Rushs Music Store -wi-ao--- 390,62g Rushs Store -wi-ao--- 390,62g Videos Store -wi-ao--- 390,62g <root@host># pvs |grep Store /dev/md1 Store lvm2 a-- 509,97g 21,69g /dev/md2 Store lvm2 a-- 599,97g 14,03g /dev/md3 Store lvm2 a-- 599,97g 129,34g /dev/md4 Store lvm2 a-- 599,97g 209,34g <root@host># cat /proc/mdstat |grep md[1-6] Personalities : [raid6] [raid5] [raid4] [raid1] md3 : active raid5 sdb7[1] sda7[0] sdd7[4] sdc7[2] 629140992 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] bitmap: 0/2 pages [0KB], 65536KB chunk md2 : active raid5 sda6[0] sdd6[4] sdb6[1] sdc6[2] 629140992 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] bitmap: 0/2 pages [0KB], 65536KB chunk md1 : active raid5 sdd5[4] sda5[0] sdb5[1] sdc5[2] 534769152 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] bitmap: 1/2 pages [4KB], 65536KB chunk md5 : active raid5 sda9[0] sdb9[1] sdd9[4] sdc9[2] 413683200 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] bitmap: 0/2 pages [0KB], 65536KB chunk md4 : active raid5 sdd8[4] sdb8[1] sda8[0] sdc8[2] 629140992 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] bitmap: 0/2 pages [0KB], 65536KB chunk Thanks for all the details. The piece that looks the most likely is the 9p use. Please enable kdump on the host, reproduce the issue, and then grab the backtraces from each cpu. If the backtrace confirms 9p, then it might be possible to identify which of the 30 or so changes to it since v3.8 introduced this regression by code analysis alone. Either way, please paste the backtraces here. Also, if a backtrace points at 9p, but more debugging is necessary, then you may want to consider building a kernel with CONFIG_NET_9P_DEBUG, reproducing with that (still with kdump), and then extracting the kernel's ring buffer from the core. Actually, either way, extracting the kernel's ring buffer could be useful, particularly if you boot your kernel with the ignore_loglevel kernel parameter. Ok I managed to capture the vmcore (BTW I'm impressed by the kdump feature!) here is my attemps to provide some info: <root@host># crash /var/crash/var/crash/127.0.0.1-2013.05.31-19\:02\:59/vmcore /usr/lib/debug/lib/modules/3.9.4-200.fc18.i686.PAE/vmlinux ...snip... KERNEL: /usr/lib/debug/lib/modules/3.9.4-200.fc18.i686.PAE/vmlinux DUMPFILE: /var/crash/var/crash/127.0.0.1-2013.05.31-19:02:59/vmcore [PARTIAL DUMP] CPUS: 4 DATE: Fri May 31 21:02:49 2013 UPTIME: 00:03:05 LOAD AVERAGE: 0.49, 0.51, 0.23 TASKS: 306 NODENAME: jack RELEASE: 3.9.4-200.fc18.i686.PAE VERSION: #1 SMP Fri May 24 20:24:58 UTC 2013 MACHINE: i686 (2942 Mhz) MEMORY: 8 GB PANIC: "Oops: 0000 [#1] SMP " (check log for details) PID: 1936 COMMAND: "vhost-1935" TASK: e7243fc0 [THREAD_INFO: e9f04000] CPU: 0 STATE: TASK_RUNNING (PANIC) crash> log ...snip dmesg output....[ 184.960203] BUG: unable to handle kernel NULL pointer dereference at 0000000c [ 184.962813] IP: [<c08b9c72>] dev_queue_xmit+0xb2/0x3c0 [ 184.965471] *pdpt = 0000000027077001 *pde = 0000000000000000 [ 184.968183] Oops: 0000 [#1] SMP [ 184.970984] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc bnep bluetooth rfkill raid1 snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm acpi_cpufreq mperf coretemp iTCO_wdt iTCO_vendor_support snd_page_alloc snd_timer snd soundcore uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core microcode videodev media i2c_i801 lpc_ich r8169 mfd_core mii asus_atk0110 vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx ata_generic pata_acpi nouveau mxm_wmi wmi video crc32c_intel i2c_algo_bit drm_kms_helper ttm drm [ 184.981413] firewire_ohci firewire_core i2c_core crc_itu_t pata_jmicron [ 184.988163] Pid: 1936, comm: vhost-1935 Not tainted 3.9.4-200.fc18.i686.PAE #1 System manufacturer System Product Name/P7P55D LE [ 184.991761] EIP: 0060:[<c08b9c72>] EFLAGS: 00010202 CPU: 0 [ 184.995422] EIP is at dev_queue_xmit+0xb2/0x3c0 [ 184.999052] EAX: e7195f40 EBX: e71c5800 ECX: 00000000 EDX: 000016e2 [ 185.002683] ESI: 18e6c000 EDI: e9dabe00 EBP: e9f05de4 ESP: e9f05dbc [ 185.006356] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 185.010214] CR0: 8005003b CR2: 0000000c CR3: 2713f000 CR4: 000027f0 [ 185.013992] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 185.017773] DR6: ffff0ff0 DR7: 00000400 [ 185.021578] Process vhost-1935 (pid: 1936, ti=e9f04000 task=e7243fc0 task.ti=e9f04000) [ 185.025596] Stack: [ 185.029521] e71956e2 000016e2 00000000 0000000c e9f05e00 ee916000 f6abcb00 e71c5800 [ 185.033508] f6bed800 0000000c e9f05e00 f7fcf2f9 c08af530 000016e2 e71c5800 00000001 [ 185.037542] 0000000c e9f05e90 f7fd8b17 000016ee 00000000 006860ee e9f05f0c 0000000c [ 185.041768] Call Trace: [ 185.045813] [<f7fcf2f9>] macvlan_start_xmit+0x59/0x110 [macvlan] [ 185.049917] [<c08af530>] ? memcpy_fromiovecend+0x50/0x80 [ 185.054053] [<f7fd8b17>] macvtap_get_user+0x407/0x610 [macvtap] [ 185.058420] [<f7fd8d5a>] macvtap_sendmsg+0x3a/0x50 [macvtap] [ 185.062708] [<f80231bf>] handle_tx+0x27f/0x5f0 [vhost_net] [ 185.066906] [<f8023560>] handle_tx_kick+0x10/0x20 [vhost_net] [ 185.071080] [<f801f8f9>] vhost_worker+0xb9/0x140 [vhost_net] [ 185.075374] [<f801f840>] ? memory_access_ok.isra.9+0xa0/0xa0 [vhost_net] [ 185.079665] [<c046e7c4>] kthread+0x94/0xa0 [ 185.083875] [<c09b0837>] ret_from_kernel_thread+0x1b/0x28 [ 185.088291] [<c046e730>] ? kthread_create_on_node+0xc0/0xc0 [ 185.092654] Code: 8b 83 b0 00 00 00 8b 53 50 89 53 18 66 83 78 02 00 74 34 8b 8b a0 00 00 00 89 ce 2b b3 a8 00 00 00 f6 40 06 11 0f 84 f2 02 00 00 <0f> b6 49 0c c0 e9 04 0f b6 c9 8d 34 8e 0f b7 40 04 83 e8 01 0f [ 185.102014] EIP: [<c08b9c72>] dev_queue_xmit+0xb2/0x3c0 SS:ESP 0068:e9f05dbc [ 185.106624] CR2: 000000000000000c crash> bt -a PID: 1936 TASK: e7243fc0 CPU: 0 COMMAND: "vhost-1935" #0 [e9f05c70] crash_kexec at c04b3129 #1 [e9f05cc8] update_sd_lb_stats at c0488858 #2 [e9f05cf8] __do_page_fault at c09acd30 #3 [e9f05d74] do_page_fault at c09aceb8 #4 [e9f05d7c] error_code (via page_fault) at c09a9e75 EAX: e7195f40 EBX: e71c5800 ECX: 00000000 EDX: 000016e2 EBP: e9f05de4 DS: 007b ESI: 18e6c000 ES: 007b EDI: e9dabe00 GS: 00e0 CS: 0060 EIP: c08b9c72 ERR: ffffffff EFLAGS: 00010202 #5 [e9f05db0] dev_queue_xmit at c08b9c72 #6 [e9f05de8] macvlan_start_xmit at f7fcf2f4 [macvlan] #7 [e9f05e04] macvtap_get_user at f7fd8b12 [macvtap] #8 [e9f05e94] macvtap_sendmsg at f7fd8d55 [macvtap] #9 [e9f05eb0] handle_tx at f80231bc [vhost_net] #10 [e9f05f3c] handle_tx_kick at f802355b [vhost_net] #11 [e9f05f44] vhost_worker at f801f8f6 [vhost_net] #12 [e9f05f70] kthread at c046e7c2 #13 [e9f05fb0] ret_from_kernel_thread at c09b0834 PID: 0 TASK: f70e8cc0 CPU: 1 COMMAND: "swapper/1" #0 [f70e7e70] crash_nmi_callback at c0431936 #1 [f70e7e74] nmi_handle.isra.0 at c09aa7f6 #2 [f70e7e94] do_nmi at c09aa903 #3 [f70e7ec0] nmi at c09a9f32 EAX: 00000020 EBX: 00000008 ECX: 00000001 EDX: 00000000 EBP: f70e7f14 DS: 007b ESI: 00000004 ES: 007b EDI: 00000020 GS: 00e0 CS: 0060 EIP: c06f6fad ERR: 00000020 EFLAGS: 00200046 #4 [f70e7ef4] intel_idle at c06f6fad #5 [f70e7f18] cpuidle_enter at c087ace1 #6 [f70e7f28] cpuidle_wrap_enter at c087b4eb #7 [f70e7f44] cpuidle_enter_tk at c087b55d #8 [f70e7f50] cpuidle_enter_state at c087b1bb #9 [f70e7f5c] cpuidle_idle_call at c087b27d #10 [f70e7f80] cpu_idle at c0419cb5 PID: 0 TASK: f70e9980 CPU: 2 COMMAND: "swapper/2" #0 [f70f1e70] crash_nmi_callback at c0431936 #1 [f70f1e74] nmi_handle.isra.0 at c09aa7f6 #2 [f70f1e94] do_nmi at c09aa903 #3 [f70f1ec0] nmi at c09a9f32 EAX: 00000020 EBX: 00000008 ECX: 00000001 EDX: 00000000 EBP: f70f1f14 DS: 007b ESI: 00000004 ES: 007b EDI: 00000020 GS: 00e0 CS: 0060 EIP: c06f6fad ERR: 00000020 EFLAGS: 00200046 #4 [f70f1ef4] intel_idle at c06f6fad #5 [f70f1f18] cpuidle_enter at c087ace1 #6 [f70f1f28] cpuidle_wrap_enter at c087b4eb #7 [f70f1f44] cpuidle_enter_tk at c087b55d #8 [f70f1f50] cpuidle_enter_state at c087b1bb #9 [f70f1f5c] cpuidle_idle_call at c087b27d #10 [f70f1f80] cpu_idle at c0419cb5 PID: 0 TASK: f70ea640 CPU: 3 COMMAND: "swapper/3" #0 [f70f3e70] crash_nmi_callback at c0431936 #1 [f70f3e74] nmi_handle.isra.0 at c09aa7f6 #2 [f70f3e94] do_nmi at c09aa903 #3 [f70f3ec0] nmi at c09a9f32 EAX: 00000020 EBX: 00000008 ECX: 00000001 EDX: 00000000 EBP: f70f3f14 DS: 007b ESI: 00000004 ES: 007b EDI: 00000020 GS: 00e0 CS: 0060 EIP: c06f6fad ERR: 00000020 EFLAGS: 00200046 #4 [f70f3ef4] intel_idle at c06f6fad #5 [f70f3f18] cpuidle_enter at c087ace1 #6 [f70f3f28] cpuidle_wrap_enter at c087b4eb #7 [f70f3f44] cpuidle_enter_tk at c087b55d #8 [f70f3f50] cpuidle_enter_state at c087b1bb #9 [f70f3f5c] cpuidle_idle_call at c087b27d #10 [f70f3f80] cpu_idle at c0419cb5 crash>exit I do not see 9p being in the forefront, it seems to be in the macvtap stuff. I can dig more on the vmcore (provided you help me), or I can put it on a ftp server, after all, this happens when an ftp client tries to access my server! regards Laurent Looks like 954181 I agree with Josh this is a very similar bactrace and similar weird crashes. The vmcore I captured is availaible at dev.lutty.net through ftp with fedora as both usr and pwd Agreed. Dupping. *** This bug has been marked as a duplicate of bug 954181 *** |