Bug 1255074
Summary: | ipmi SOL console is not responsive during installation on Power8 baremetal machine
---|---
Product: | [Fedora] Fedora
Reporter: | Jakub Čajka <jcajka>
Component: | kernel
Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED INSUFFICIENT_DATA
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified
Docs Contact: |
Priority: | unspecified
Version: | 24
CC: | dan, gansalmon, hannsj_uhl, itamar, jcajka, jonathan, kernel-maint, madhu.chinakonda, mchehab, menantea, ovasik
Target Milestone: | ---
Target Release: | ---
Hardware: | ppc64le
OS: | Linux
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2016-10-26 16:53:25 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1071880, 1051573, 1251889
Attachments: |

Description
Jakub Čajka
2015-08-19 14:30:22 UTC
For info, I don't have this problem when I install the same ISO on a bare-metal Power8 machine, also using IPMI. The known differences between Jakub's environment and mine are:
1. I do not install from DVD (I use a network install).
2. I used an 8247-22L with FW810.21 (TV810_108), whereas Jakub used an 8247-21L with FW810.20 (SV810_101).

For the record, no change with FW level FW810.30 (SV810_124) on 8247-21L (one socket, 10 cores) with ppc64le Alpha RC2.

This also seems to affect ppc64 Alpha RC2.

Aug 26 13:15:30 generic-02 kernel: genirq: Flags mismatch irq 26. 00000000 (hvc_console) vs. 00000000 (hvc_console)
Aug 26 13:15:30 generic-02 kernel: CPU: 23 PID: 11049 Comm: (echo) Not tainted 4.2.0-0.rc5.git0.2.fc23.ppc64 #1
Aug 26 13:15:30 generic-02 kernel: Call Trace:
Aug 26 13:15:30 generic-02 kernel: [c000000feb717560] [c00000000098a9b4] .dump_stack+0x88/0xb4 (unreliable)
Aug 26 13:15:30 generic-02 kernel: [c000000feb7175e0] [c00000000013acd8] .__setup_irq+0x6a8/0x6f0
Aug 26 13:15:30 generic-02 kernel: [c000000feb7176a0] [c00000000013af48] .request_threaded_irq+0x128/0x270
Aug 26 13:15:30 generic-02 kernel: [c000000feb717750] [c00000000059f224] .notifier_add_irq+0x74/0xa0
Aug 26 13:15:30 generic-02 kernel: [c000000feb7177d0] [c00000000059e6a4] .hvc_open+0xe4/0x1b0
Aug 26 13:15:30 generic-02 kernel: [c000000feb717860] [c000000000577f48] .tty_open+0x188/0x740
Aug 26 13:15:30 generic-02 kernel: [c000000feb717950] [c0000000002ccca0] .chrdev_open+0x110/0x270
Aug 26 13:15:30 generic-02 kernel: [c000000feb717a00] [c0000000002c2404] .do_dentry_open+0x2b4/0x420
Aug 26 13:15:30 generic-02 kernel: [c000000feb717ab0] [c0000000002d96e8] .path_openat+0x518/0x1520
Aug 26 13:15:30 generic-02 kernel: [c000000feb717c00] [c0000000002dc5dc] .do_filp_open+0xec/0x160
Aug 26 13:15:30 generic-02 kernel: [c000000feb717d70] [c0000000002c4780] .do_sys_open+0x1a0/0x300
Aug 26 13:15:30 generic-02 kernel: [c000000feb717e30] [c000000000009360] system_call+0x38/0xd0
Aug 26 13:15:30 generic-02 kernel: hvc_open: request_irq failed with rc -16.
Aug 26 13:15:30 generic-02 kernel: genirq: Flags mismatch irq 26. 00000000 (hvc_console) vs. 00000000 (hvc_console)
Aug 26 13:15:30 generic-02 kernel: CPU: 16 PID: 1 Comm: systemd Not tainted 4.2.0-0.rc5.git0.2.fc23.ppc64 #1
Aug 26 13:15:30 generic-02 kernel: Call Trace:
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107560] [c00000000098a9b4] .dump_stack+0x88/0xb4 (unreliable)
Aug 26 13:15:30 generic-02 kernel: [c000001ffc1075e0] [c00000000013acd8] .__setup_irq+0x6a8/0x6f0
Aug 26 13:15:30 generic-02 kernel: [c000001ffc1076a0] [c00000000013af48] .request_threaded_irq+0x128/0x270
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107750] [c00000000059f224] .notifier_add_irq+0x74/0xa0
Aug 26 13:15:30 generic-02 kernel: [c000001ffc1077d0] [c00000000059e6a4] .hvc_open+0xe4/0x1b0
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107860] [c000000000577f48] .tty_open+0x188/0x740
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107950] [c0000000002ccca0] .chrdev_open+0x110/0x270
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107a00] [c0000000002c2404] .do_dentry_open+0x2b4/0x420
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107ab0] [c0000000002d96e8] .path_openat+0x518/0x1520
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107c00] [c0000000002dc5dc] .do_filp_open+0xec/0x160
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107d70] [c0000000002c4780] .do_sys_open+0x1a0/0x300
Aug 26 13:15:30 generic-02 kernel: [c000001ffc107e30] [c000000000009360] system_call+0x38/0xd0
Aug 26 13:15:30 generic-02 kernel: hvc_open: request_irq failed with rc -16.
Aug 26 13:15:30 generic-02 kernel: genirq: Flags mismatch irq 26. 00000000 (hvc_console) vs. 00000000 (hvc_console)
Aug 26 13:15:30 generic-02 kernel: CPU: 32 PID: 11061 Comm: (tmux) Not tainted 4.2.0-0.rc5.git0.2.fc23.ppc64 #1
Aug 26 13:15:30 generic-02 kernel: Call Trace:
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13560] [c00000000098a9b4] .dump_stack+0x88/0xb4 (unreliable)
Aug 26 13:15:30 generic-02 kernel: [c000001e46d135e0] [c00000000013acd8] .__setup_irq+0x6a8/0x6f0
Aug 26 13:15:30 generic-02 kernel: [c000001e46d136a0] [c00000000013af48] .request_threaded_irq+0x128/0x270
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13750] [c00000000059f224] .notifier_add_irq+0x74/0xa0
Aug 26 13:15:30 generic-02 kernel: [c000001e46d137d0] [c00000000059e6a4] .hvc_open+0xe4/0x1b0
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13860] [c000000000577f48] .tty_open+0x188/0x740
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13950] [c0000000002ccca0] .chrdev_open+0x110/0x270
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13a00] [c0000000002c2404] .do_dentry_open+0x2b4/0x420
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13ab0] [c0000000002d96e8] .path_openat+0x518/0x1520
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13c00] [c0000000002dc5dc] .do_filp_open+0xec/0x160
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13d70] [c0000000002c4780] .do_sys_open+0x1a0/0x300
Aug 26 13:15:30 generic-02 kernel: [c000001e46d13e30] [c000000000009360] system_call+0x38/0xd0
Aug 26 13:15:30 generic-02 kernel: hvc_open: request_irq failed with rc -16.

Hm... disabling the shell on tty2 (according to the anaconda docs) with "inst.noshell" seems to "unfreeze" the SOL console. It seems that the setup of another shell "messes" things up. Seen on both BE/LE with the Alpha_RC2 Server DVD. Also, it only occurs with the DVD as the installation source (a remote tree, i.e. inst.repo, works fine).

I have observed a frozen console even with inst.repo. Some sort of race condition seems to trigger this, as a remote installation repository might introduce a considerable time delay.
Also, I have occasionally seen the trace appear once in the logs (while using inst.repo). In this case the console stayed usable.

Fedora 23 Beta ppc64le still has the problem, but the IPMI session is not frozen. We can switch between the different anaconda consoles; only the main console no longer responds.

Created attachment 1075512
anaconda.log
Created attachment 1075513
X log
Created attachment 1075515
syslog
Created attachment 1075517
lspci
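
The "hvc_open: request_irq failed with rc -16" message quoted earlier in this report (rc -16 corresponds to -EBUSY on Linux) can be counted in a saved kernel log to compare a frozen run with a working one. The following is only a minimal sketch: the function name `count_hvc_irq_failures` is made up for illustration, and the log path is whatever file the caller saved.

```shell
#!/bin/sh
# Count hvc_console IRQ request failures in a saved kernel log.
# $1: path to a log file chosen by the caller (hypothetical; not tied
#     to the attachments on this bug).
count_hvc_irq_failures() {
    # grep -c prints the number of matching lines. It exits with
    # status 1 when that number is 0, so "|| true" keeps the function
    # usable in scripts that run under "set -e".
    grep -c -- 'hvc_open: request_irq failed with rc -16' "$1" || true
}
```

For example, `count_hvc_irq_failures /path/to/saved/syslog` prints the number of failed hvc_open attempts recorded in that file. A single occurrence was reported above with a console that stayed usable, so the count alone does not prove a freeze.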
The problem is a consequence of the X server start returning OK in anaconda. The X server loads a video driver because an ATI Radeon device is present. Please have a look at the previous logs.

I did a few tests, and Tyan doesn't suffer from this issue. I tried this on P8 with a real CD-ROM and with CD emulation. The IPMI SOL got frozen in both cases.

(In reply to Menanteau Guy from comment #11)
> Problem is a consequence of X server start returning ok in anaconda. X
> server load a video driver due to ATI Radeon device presence.. Please have a
> look to previous logs.

In our case there's no VGA card installed. That means it's probably unrelated.

The issue is reproducible on IBM Power7R2 with Sapphire firmware too. I couldn't reproduce it with PowerVM and a standard serial port.

The console gets frozen in ~80% of retries. The following appears on the console only when it gets frozen:
[ 84.116318] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
It doesn't appear in cases when the console works OK.

One more note: the console freeze is triggered by the first keypress.

I have a case where I observed a frozen console on every try using inst.repo. The OS files are stored on a machine on the same network as the victim, and the link seems fast enough to always hit the problem with f23 ppc64le TC1 (see comment 5). I use these commands to trigger the problem:
cd /tmp
wget http://server_machine/pub/linux/fedora/23/ppc64/os/ppc/ppc64/vmlinuz
wget http://server_machine/pub/linux/fedora/23/ppc64/os/ppc/ppc64/initrd.img
kexec --load vmlinuz --initrd initrd.img --command-line " inst.repo=http://server_machine/pub/linux/fedora/23/ppc64/os/ "
kexec --exec

Then I tried the same test, but loading vmlinuz and initrd from the old f22 while still pointing at the f23 files.
I used the following commands:
cd /tmp
wget http://server_machine/pub/linux/fedora/22/ppc64/os/ppc/ppc64/vmlinuz
wget http://server_machine/pub/linux/fedora/22/ppc64/os/ppc/ppc64/initrd.img
kexec --load vmlinuz --initrd initrd.img --command-line " inst.repo=http://server_machine/pub/linux/fedora/23/ppc64/os/ "
kexec --exec

This time I didn't have any problem. Also note that, as expected, in this last case kernel-4.0.4-301 did not report any problem like "genirq: Flags mismatch irq 31" (see comment 2).

Victim machine details:
Server-8247-22L-SN1010CEA
FW840.00 (TV840_036)
Machine type-model: 8247-22L
Serial number: 1010CEA
Date: 2015-10-20
Time: 9:44:48 UTC
Service Processor: Primary (Location: U78CB.001.WZS008F-P1)

The problem is still present on ppc64/ppc64le Fedora 24 beta 1.6. As noted in comment 3, disabling the shell on tty2 (according to the anaconda docs) with "inst.noshell" seems to "unfreeze" the SOL console.

*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs. Fedora 24 has now been rebased to 4.7.4-200.fc24. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 25, and are still experiencing this issue, please change the version to Fedora 25. If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
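
For anyone retesting on a newer kernel, the kexec reproduction steps and the "inst.noshell" workaround discussed in this report could be combined roughly as follows. This is a sketch under assumptions, not a verified procedure: the function name `print_kexec_commands` is hypothetical, `server_machine` is the placeholder host used in the comments above, and the function only prints the commands so they can be reviewed before running them by hand.

```shell
#!/bin/sh
# Print (do not run) the kexec commands for a network install test.
# $1: base URL of the os/ tree, with trailing slash (placeholder host)
# $2: extra anaconda boot options, e.g. "inst.noshell" (may be empty)
print_kexec_commands() {
    repo_url=$1
    extra_opts=$2
    cmdline="inst.repo=${repo_url}"
    # Append the extra options only when some were given.
    if [ -n "$extra_opts" ]; then
        cmdline="${cmdline} ${extra_opts}"
    fi
    printf 'wget %sppc/ppc64/vmlinuz\n' "$repo_url"
    printf 'wget %sppc/ppc64/initrd.img\n' "$repo_url"
    printf 'kexec --load vmlinuz --initrd initrd.img --command-line "%s"\n' "$cmdline"
    printf 'kexec --exec\n'
}

# Example: same tree as in the comments, with the tty2 shell disabled.
print_kexec_commands "http://server_machine/pub/linux/fedora/23/ppc64/os/" "inst.noshell"
```

Passing an empty second argument prints the original reproduction commands without the workaround, which makes it easy to compare both variants on the same machine.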