Bug 1046344
| Summary: | kernel 3.12.6-200 keeps rebooting itself | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | H.J. Lu <hongjiu.lu> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 19 | CC: | gansalmon, hongjiu.lu, itamar, jonathan, kernel-maint, madhu.chinakonda, michele |
| Target Milestone: | --- | Flags: | jforbes: needinfo? |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-23 14:40:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
H.J. Lu
2013-12-24 18:30:27 UTC
Odd. Are there any clues in /var/log/messages? Does the machine oops and then get rebooted? There are not all too many patches between the two kernels:

git lg v3.12.5..v3.12.6 |wc -l
119

Any chance you could bisect this?

Created attachment 841682
/var/log/messages

Here is /var/log/messages during the rebooting period.
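The reboot pattern discussed in the next comment (how long each boot lasted before the next one) can be pulled out of a log such as the attached /var/log/messages mechanically. The Python sketch below is illustrative only and rests on stated assumptions: lines use rsyslog's traditional "Mon DD HH:MM:SS host tag: message" format, each boot logs exactly one kernel "Linux version" banner, and the whole excerpt falls within one calendar year that the caller supplies. `boot_times` and `uptimes` are hypothetical helper names, and the sample lines are rebuilt from the Dec 23 boot timestamps quoted in this report, not copied from the attachment.

```python
from datetime import datetime


def boot_times(lines, year):
    """Return the timestamp of every kernel boot banner in a syslog excerpt.

    Assumes (hypothetically) the classic rsyslog line format
    "Mon DD HH:MM:SS host tag: message", which omits the year, and that
    each boot logs exactly one kernel "Linux version ..." line.
    """
    boots = []
    for line in lines:
        if "kernel:" in line and "Linux version" in line:
            # The first three whitespace-separated fields form the timestamp,
            # e.g. "Dec 23 16:39:13"; split() also copes with "Jan  5".
            stamp = " ".join(line.split()[:3])
            boots.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return boots


def uptimes(boots):
    """Seconds between each boot and the next one, i.e. how long it stayed up."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(boots, boots[1:])]


if __name__ == "__main__":
    # Illustrative lines rebuilt from the boot times quoted in this report;
    # the host name and banner text are placeholders, not real log content.
    stamps = ["12:22:44", "16:36:47", "16:39:13", "16:41:54", "16:44:42", "16:47:31"]
    sample = [f"Dec 23 {t} gnu-mic-2 kernel: [    0.000000] Linux version "
              "3.12.6-200.0.fc19.x86_64" for t in stamps]
    for seconds in uptimes(boot_times(sample, 2013)):
        print(f"up {seconds // 60} min {seconds % 60} s before the next boot")
```

Applied to the six Dec 23 boot times, this gives gaps of 15243, 146, 161, 168 and 169 seconds: one run of about 4 h 14 min, then reboots roughly every two and a half minutes. A log spanning a year boundary (Dec 31 into Jan 1) would need extra handling, because classic syslog timestamps carry no year.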
(In reply to Michele Baldessari from comment #1)
> There are not all too many patches between the two kernels:
> git lg v3.12.5..v3.12.6 |wc -l
> 119
>
> Any chance you could bisect this?

I will see what I can do after New Year.

I've taken a short look. Interestingly after the first start with 3.12.6 kernel the system lasted a couple of hours. Afterwards we only get 2/3 minutes:

Dec 23 12:22:44 version 3.12.6-200.0.fc19.x86_64 <-- First boot
Dec 23 16:36:47 version 3.12.6-200.0.fc19.x86_64 <-- >2 hours
Dec 23 16:39:13 version 3.12.6-200.0.fc19.x86_64 <-- 2/3 minutes from now on
Dec 23 16:41:54 version 3.12.6-200.0.fc19.x86_64
Dec 23 16:44:42 version 3.12.6-200.0.fc19.x86_64
Dec 23 16:47:31 version 3.12.6-200.0.fc19.x86_64

The messages before the reboot are all similar. Chances are that either:
- systemd[1]: Started Load Kernel Modules.
- systemd[1]: Starting Apply Kernel Variables...
Have something to do with this.

The ioatdma warning is present with 3.12.5 as well so can probably be ignored.

Either we can try to bisect this or we can collect a crashdump (https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes) in order to see what is going on here.

I took another look at the commits between 3.12.5 and 3.12.6 but nothing stood out.
Let me know if you need a hand with crash or via bisecting (crash is probably quicker to at least isolate the issue)

regards,
Michele

(In reply to Michele Baldessari from comment #4)
> I've taken a short look. Interestingly after the first start with 3.12.6
> kernel
> the system lasted a couple of hours. Afterwards we only get 2/3 minutes:
> Dec 23 12:22:44 version 3.12.6-200.0.fc19.x86_64 <-- First boot
> Dec 23 16:36:47 version 3.12.6-200.0.fc19.x86_64 <-- >2 hours

Machine was under heavy load just before the first reboot.

> Dec 23 16:39:13 version 3.12.6-200.0.fc19.x86_64 <-- 2/3 minutes from now on
> Dec 23 16:41:54 version 3.12.6-200.0.fc19.x86_64
> Dec 23 16:44:42 version 3.12.6-200.0.fc19.x86_64
> Dec 23 16:47:31 version 3.12.6-200.0.fc19.x86_64
>
> The messages before the reboot are all similar. Chances are that either:
> - systemd[1]: Started Load Kernel Modules.
> - systemd[1]: Starting Apply Kernel Variables...

Can I find out what exactly they are doing?

> Have something to do with this.
>
> The ioatdma warning is present with 3.12.5 as well so can probably be
> ignored.

Yes, it can be ignored.

> Either we can try to bisect this or we can collect a crashdump
> (https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes) in
> order to see what is going on here.

I am not sure if kdump will help here since there was no kernel message when it happened.

> I took another look at the commits between 3.12.5 and 3.12.6 but nothing
> stood out.
> Let me know if you need a hand with crash or via bisecting (crash is
> probably quicker to at least isolate
> the issue)

I will try to bisect after New Year. But it may take a while to trigger the first reboot.

(In reply to H.J. Lu from comment #5)
> (In reply to Michele Baldessari from comment #4)
> > I've taken a short look. Interestingly after the first start with 3.12.6
> > kernel
> > the system lasted a couple of hours. Afterwards we only get 2/3 minutes:
> > Dec 23 12:22:44 version 3.12.6-200.0.fc19.x86_64 <-- First boot
> > Dec 23 16:36:47 version 3.12.6-200.0.fc19.x86_64 <-- >2 hours
>
> Machine was under heavy load just before the first reboot.

Ah ok. Could be related yes.

> > Dec 23 16:39:13 version 3.12.6-200.0.fc19.x86_64 <-- 2/3 minutes from now on
> > Dec 23 16:41:54 version 3.12.6-200.0.fc19.x86_64
> > Dec 23 16:44:42 version 3.12.6-200.0.fc19.x86_64
> > Dec 23 16:47:31 version 3.12.6-200.0.fc19.x86_64
> >
> > The messages before the reboot are all similar.
> > Chances are that either:
> > - systemd[1]: Started Load Kernel Modules.
> > - systemd[1]: Starting Apply Kernel Variables...
>
> Can I find out what exactly they are doing?

Now that I looked more closely the first one is for loading modules statically (man modules-load.d and man systemd-modules-load). And at least here it is unconfigured.

The second one parses /etc/sysctl.d (man sysctl.d).

> > Have something to do with this.
> >
> > The ioatdma warning is present with 3.12.5 as well so can probably be
> > ignored.
>
> Yes, it can be ignored.
>
> > Either we can try to bisect this or we can collect a crashdump
> > (https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes) in
> > order to see what is going on here.
>
> I am not sure if kdump will help here since there was no kernel
> message when it happened.

Oh so also on screen there was no visual feedback whatsoever? Unless for some reason it is not shown, then I agree a crash dump will do little.

> > I took another look at the commits between 3.12.5 and 3.12.6 but nothing
> > stood out.
> > Let me know if you need a hand with crash or via bisecting (crash is
> > probably quicker to at least isolate
> > the issue)
>
> I will try to bisect after New Year. But it may take a while to trigger
> the first reboot.

Ok, thanks for your help here.

(In reply to Michele Baldessari from comment #6)
> > >
> > > The messages before the reboot are all similar. Chances are that either:
> > > - systemd[1]: Started Load Kernel Modules.
> > > - systemd[1]: Starting Apply Kernel Variables...
> >
> > Can I find out what exactly they are doing?
>
> Now that I looked more closely the first one is for loading modules
> statically (man modules-load.d and man systemd-modules-load). And at least
> here it is unconfigured.
>
> The second one parses /etc/sysctl.d (man sysctl.d)

Why does OS do that after running for more than 2 hours?

>
> Oh so also on screen there was no visual feedback whatsoever?

I couldn't tell since machine was rebooted :-(.

(In reply to H.J. Lu from comment #7)
> (In reply to Michele Baldessari from comment #6)
> > > >
> > > > The messages before the reboot are all similar. Chances are that either:
> > > > - systemd[1]: Started Load Kernel Modules.
> > > > - systemd[1]: Starting Apply Kernel Variables...
> > >
> > > Can I find out what exactly they are doing?
> >
> > Now that I looked more closely the first one is for loading modules
> > statically (man modules-load.d and man systemd-modules-load). And at least
> > here it is unconfigured.
> >
> > The second one parses /etc/sysctl.d (man sysctl.d)
>
> Why does OS do that after running for more than 2 hours?

Not sure I saw that it was run after two hours. Do you have a specific timestamp in mind? There are some slight oddities in the timings but I think that is dmesg vs syslog vs journal.

> >
> > Oh so also on screen there was no visual feedback whatsoever?
>
> I couldn't tell since machine was rebooted :-(.

Ah ok. If you are in front of the machine and see it reboot and it is an oops, feel free to take a picture and upload it here.

I will try to get more info after New Year.

Just adding the needinfo flag as a reminder.

Several machines share one monitor/USB keyboard set. When it was connected to the monitor/USB keyboard, kernel 3.12.6 ran fine. After I unplugged the monitor/USB keyboard to use them on another machine, the headless machine rebooted itself almost immediately, without any messages:

Jan 10 14:52:07 gnu-mic-2 systemd-logind[662]: Removed session 19.
Jan 10 14:55:59 gnu-mic-2 kernel: [111751.236673] usb 6-1: USB disconnect, device number 2
Jan 10 14:55:59 gnu-mic-2 kernel: [111751.236679] usb 6-1.1: USB disconnect, device number 3
Jan 10 14:55:59 gnu-mic-2 acpid: input device has been disconnected, fd 10
Jan 10 14:55:59 gnu-mic-2 acpid: input device has been disconnected, fd 11
Jan 10 14:58:57 gnu-mic-2 rsyslogd: [origin software="rsyslogd" swVersion="7.2.6" x-pid="649" x-info="http://www.rsyslog.com"] start
Jan 10 14:58:57 gnu-mic-2 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Jan 10 14:58:57 gnu-mic-2 kernel: [ 0.000000] Initializing cgroup subsys cpu
Jan 10 14:58:57 gnu-mic-2 kernel: [ 0.000000] Initializing cgroup subsys cpuacct

It kept rebooting until I plugged in the monitor/USB keyboard. Will 3.11/3.12 kernels reboot without a monitor/USB keyboard in some cases?

It seems to be a nouveau driver bug which is fixed in kernel 3.12.9-200.fc19.x86_64. Now, instead of a reboot, I got the following when I unplugged the USB keyboard:

[ 8410.941640] usb 6-1: USB disconnect, device number 2
[ 8410.941647] usb 6-1.1: USB disconnect, device number 3
[ 8426.407362] pci_pm_runtime_suspend(): nouveau_pmops_runtime_suspend+0x0/0xd0 [nouveau] returns -22

Can someone confirm that it is a known bug fixed in 3.12.9?

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.13.5-100.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks.
If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.