200382 – 32-bit dom0 will not boot due to do_general_protection loop

Summary: 32-bit dom0 will not boot due to do_general_protection loop

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel-xen
Sub Component:
Version:	rawhide
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Juan Quintela
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	200124

Reported:	2006-07-27 11:57 UTC by Stephen Tweedie
Modified:	2007-11-30 22:11 UTC
CC List:	1 user
Fixed In Version:	kernel-xen-2.6.17-1.2462.fc6
Clone Of:
Environment:
Last Closed:	2006-07-28 01:35:28 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
alt-sysrq-p logs showing stack traces from the hanging loop (18.53 KB, text/plain) 2006-07-27 12:00 UTC, Stephen Tweedie	no flags
Fix hang looping in do_general_protection() (1.16 KB, patch) 2006-07-27 14:15 UTC, Stephen Tweedie	no flags

Description Stephen Tweedie 2006-07-27 11:57:36 UTC

Description of problem:
Booting a 32-bit dom0 makes good progress, but hangs after init is started.  The
last lines logged are

INIT: version 2.86 booting
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
		Welcome to Fedora Core
		Press 'I' to enter interactive startup.
Setting clock  (utc): Wed Jul 26 20:01:55 BST 2006 [  OK  ]
Starting udev: [  OK  ]
Setting hostname ghost.scot.redhat.com:  [  OK  ]

Version-Release number of selected component (if applicable):
2.6.17-1.2439

How reproducible:
100%

Steps to Reproduce:
1. Install on LVM
2. Boot

Actual results:
Hangs in lvm.static while setting up lvm volumes.  Stack shows various call
traces indicating a loop in do_general_protection().

Expected results:
No hang!

Additional info:
Stack traces to follow as an attachment.

Comment 1 Stephen Tweedie 2006-07-27 12:00:55 UTC

Created attachment 133144
alt-sysrq-p logs showing stack traces from the hanging loop

Comment 2 Jeremy Katz 2006-07-27 12:05:32 UTC

Interestingly, the kernel boots fine for me although I'm running on a pretty
outdated install.  Now that we're making some progress, I'll reinstall it when I
get in this morning.

[root@dhcp113 ~]# uname -a
Linux dhcp113.install.boston.redhat.com 2.6.17-1.2439.fc6xen #1 SMP Sun Jul 23 22:44:37 EDT 2006 i686 i686 i386 GNU/Linux

Comment 3 Stephen Tweedie 2006-07-27 12:37:21 UTC

Then I wonder what the difference is.  1.2449 still shows the problem (as long
as I boot with a home-built HV --- without that, it doesn't even get into the
main kernel boot as it loops with HV timekeeping debug messages.)

I'm booting off LVM on current rawhide, and the PAE kernel works fine.  The box
has 1G memory and a dual-core Dempsey.

Comment 4 Stephen Tweedie 2006-07-27 14:15:00 UTC

Created attachment 133152
Fix hang looping in do_general_protection()

For some reason, Xen is not setting user_cs exactly as we request it, but
rather is setting a _wider_ cs in the GDT.  That's perhaps a little insecure
but should work; except that check_lazy_exec_limit() sees that the segment
doesn't match exactly what it should be, and so tries to fix it.

Now, we've just *narrowed* cs:, which can never cure a GPF.  Yet we return 1
to do_general_protection(), which therefore thinks it has made progress, so it
doesn't try any of the other fixups which might be needed.  And this repeats
forever, as we keep trying to set the same narrowed CS and we keep on actually
getting a slightly wider one, resulting in an infinite loop of:

#GPF fixup (0[seg:0]) at 080c76e1, CPU#0.
 exec_limit: ffffffff, user_cs: 0000ffff/00cffb00, CPU_cs: 000067ff/00cffb00.

fixups.

The easy fix is simply to return 1 from check_lazy_exec_limit() only if we
widened the segment in either direction, never if we purely narrowed it.  That
way do_general_protection() will still try to resolve the GPF in other ways
when this occurs.
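
As a rough illustration of that rule (not the attached patch itself), here is a
minimal sketch.  It assumes a simplified model in which a segment is described
only by its byte limit; struct seg_model and check_lazy_exec_limit_model() are
hypothetical names made up for this example and do not exist in the kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for a code segment: only its byte limit is modelled. */
struct seg_model {
    uint32_t limit;
};

/*
 * Sketch of the proposed rule.  Sync the CPU's segment with the limit we
 * want, but report progress (return 1) only if the limit grew.  A pure
 * narrowing cannot make a faulting instruction succeed, so return 0 and
 * let the caller go on to try its other fixups.
 */
static int check_lazy_exec_limit_model(struct seg_model *cpu_cs,
                                       uint32_t wanted_limit)
{
    uint32_t old_limit = cpu_cs->limit;

    if (old_limit == wanted_limit)
        return 0;                   /* already in sync, nothing to fix */

    cpu_cs->limit = wanted_limit;   /* models reloading the descriptor */
    return wanted_limit > old_limit;
}
```

With this shape, a caller that retries the faulting instruction whenever the
helper returns 1 can no longer spin on a reload that only shrinks the segment;
it falls through to its remaining fixups instead.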

Verified to cure the hang on my own box.
