Bug 1131765

Summary: Soft hangs on an i686 machine with 3.17-rc1
Product: [Fedora] Fedora Reporter: Bruno Wolff III <bruno>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: bruno, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2014-08-22 04:12:33 UTC Type: Bug
Attachments:
Picture of alt-sysrq-c traceback during hang (flags: none)

Description Bruno Wolff III 2014-08-20 04:00:24 UTC
Created attachment 928594
Picture of alt-sysrq-c traceback during hang

Description of problem:
I am getting a soft hang (Ctrl-Alt-Del still starts a reboot) while booting 3.17-rc1. It does not happen with 3.16, even after rebuilding the initramfs to match 3.17-rc1.
This doesn't happen on my F21-based x86_64 machine.

I am still having trouble collecting netconsole output when the problem occurs early in boot, so I took a picture of the alt-sysrq-c traceback while the hang was in progress. I don't know whether it shows what is hanging.

Version-Release number of selected component (if applicable):
kernel-PAE-core-3.17.0-0.rc1.git0.1.fc22.i686

Comment 1 Josh Boyer 2014-08-20 11:53:17 UTC
Yeah, that picture doesn't really tell us anything other than you forced a sysrq-c.

If you can get netconsole working, that would be helpful. The only other avenue is bisection, I suppose.

Comment 2 Bruno Wolff III 2014-08-20 12:02:58 UTC
I don't think netconsole is going to provide much help. The boot appeared fairly normal up to the point where it hung, which was before the prompt for the LUKS passwords. USB devices were being detected at the time.
I'll plan on going the bisect route.

The netconsole behavior is odd: I get nothing from the 3.17-rc1 boot, but I seem to get everything from the following 3.16 boot. It looks like the output gets queued up and then sent all at once when netconsole comes up.

I do see netconsole errors early in boot, typically because the network device isn't valid yet. (I tried eth0 and eth1 instead of p7p1 in the kernel parameters, but that didn't give me any output when the system hangs.)
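For reference, netconsole sends kernel log messages as plain UDP datagrams to the target host and port given in the netconsole= kernel parameter (the kernel's netconsole documentation lists 6666 as the default target port). Below is a minimal Python sketch of a receiver to run on the target machine, assuming that default port; the helper names open_listener and receive_netconsole are hypothetical. It only records what arrives, so it cannot capture messages the hanging machine never manages to send.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket on the netconsole target port (6666 assumed as default)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive_netconsole(sock: socket.socket, max_messages: int,
                       timeout: float = 5.0) -> list[str]:
    """Collect up to max_messages datagrams, stopping early if none arrive
    within `timeout` seconds. Each datagram is decoded as text with the
    trailing newline stripped; undecodable bytes are replaced, not dropped."""
    sock.settimeout(timeout)
    lines: list[str] = []
    try:
        while len(lines) < max_messages:
            data, _addr = sock.recvfrom(65535)
            lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    except socket.timeout:
        pass  # no more traffic within the timeout window
    return lines
```

For example, call open_listener() on the machine named as the netconsole target and print the result of receive_netconsole in a loop while the test machine boots.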

I have another system I can test on that I have been holding off on, but I'll do a quick test on it. It has a different USB setup, so if it boots, that might point to some sort of USB problem. I have seen issues in the past with USB 2 devices connected to USB 1.1 ports on the motherboard.

Comment 3 Bruno Wolff III 2014-08-20 12:08:28 UTC
My other i686 machine, which runs F21 with rawhide nodebug kernels, booted normally.
So I wouldn't be surprised if the issue is USB-related, but it's too early to tell for sure. I'll start going down the bisect route.

Comment 4 Bruno Wolff III 2014-08-21 13:01:06 UTC
I verified the vanilla kernels work the same way (v3.16 is OK and v3.17-rc1 is broken), so I should be able to bisect this. It will probably take a week.
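Bisection is a binary search over the commits between a known-good kernel (v3.16) and a known-bad one (v3.17-rc1): each test boot halves the remaining range, so N candidate commits need only about log2(N) builds, which is why it is feasible even though each step requires a build and a reboot. The Python sketch below models that search. The function first_bad and the is_bad callback (standing in for "build, boot, and check for the hang") are hypothetical, and the sketch assumes a linear history with a single good-to-bad transition; git bisect itself also copes with merges and skipped commits.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history is linear with exactly one good->bad transition.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # one build + test boot per call
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

For example, with 100 commits in the range, first_bad calls is_bad at most 7 times before naming the first bad commit.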

I'll also be testing the new Fedora kernels in case the problem gets fixed independently.

Comment 5 Bruno Wolff III 2014-08-22 04:12:33 UTC
This appears to be fixed in 3.17.0-0.rc1.git1.2.fc22.1.i686+PAE, which is going to save me a lot of trouble.