Bug 442949 - F-9 xen pv_ops : unimplemented failsafe_callback() called while running prelink
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: rawhide
Hardware: x86_64 Linux
Priority: high
Severity: medium
Assigned To: Eduardo Habkost
QA Contact: Virtualization Bugs
Blocks: PvOpsTracker
Reported: 2008-04-17 16:03 EDT by Stephen Tweedie
Modified: 2009-12-14 15:37 EST
Fixed In Version: kernel-xen-2.6.25.2-2.fc10
Doc Type: Bug Fix
Last Closed: 2008-05-11 18:04:56 EDT


Attachments:
dmesg log of oopses (4.98 KB, text/plain), 2008-04-17 16:05 EDT, Stephen Tweedie
clear %fs when loading new TLS descriptors (1/2) (1.96 KB, patch), 2008-04-24 12:09 EDT, Eduardo Habkost
clear %fs when loading new TLS descriptors (2/2) (2.03 KB, patch), 2008-04-24 12:10 EDT, Eduardo Habkost

Description Stephen Tweedie 2008-04-17 16:03:16 EDT
Description of problem:
Oops observed on rawhide running 2.6.25-1.fc9.x86_64.xen:

Kernel BUG at ffffffff80465fc0 [verbose debug info unavailable]
invalid opcode: 0000 [1] 
Pid: 3699, comm: prelink Not tainted 2.6.25-1.fc9.x86_64.xen #1
RIP: e030:[<ffffffff80465fc0>]  [<ffffffff80465fc0>] xen_failsafe_callback+0x0/0x10
RSP: e02b:ffff88001459be00  EFLAGS: 00010002

and no backtrace was observed.  (Backtrace was obtained for a second oops
shortly afterwards.)  Full dmesg to be attached.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.25-1.fc9.x86_64
prelink-0.4.0-3.x86_64

How reproducible:
Unknown: only observed once so far.

Steps to Reproduce:
Unknown: was observed during automatic daily background prelink.
Comment 1 Stephen Tweedie 2008-04-17 16:05:06 EDT
Created attachment 302798
dmesg log of oopses
Comment 2 Mark McLoughlin 2008-04-18 03:11:43 EDT
So, here's our invalid opcode:

    ENTRY(xen_failsafe_callback)
            /*FIXME: implement me! */
            ud2a
    ENDPROC(xen_failsafe_callback)

Next thing, of course, is to find out what's going wrong that the failsafe
callback is invoked.

(Note: "invalid opcode" is generally just a BUG(), which is implemented using
ud2 ... that caught me out before. Interesting that report_bug() continues to
claim that it's a BUG() even if it can't find the faulting IP in the bug table,
as in this case.)
Comment 3 Stephen Tweedie 2008-04-18 09:26:30 EDT
Seems to be reproducible: running /etc/cron.daily/prelink manually just resulted
in the same error in under a minute for me.
Comment 4 Mark McLoughlin 2008-04-18 10:54:56 EDT
(Removing from F9Blocker again - this isn't reproducible for me on a fresh
install and it doesn't cause problems during installation. At this point it
doesn't look like it would warrant holding up the release)
Comment 5 Stephen Tweedie 2008-04-18 12:44:39 EDT
It seems to happen every time for me, at least if I force the prelink with

touch /var/lib/misc/prelink.force

I also noticed that the prelink job itself errors out with:
>>>>>
/etc/cron.daily/prelink: line 47:  2738 Segmentation fault     
/usr/sbin/prelink -av $PRELINK_OPTS >> /var/log/prelink/prelink.log 2>&1
/usr/bin/ldd: line 161: /lib/ld-linux.so.2: cannot execute binary file
>>>>>
where /lib/ld-linux.so.2 is the old 32-bit glibc.  Did you have this installed
on your test-case install that completed prelink without error?
Comment 6 Mark McLoughlin 2008-04-21 12:10:40 EDT
(In reply to comment #5)
> It seems to happen every time for me, at least if I force the prelink with
> 
> touch /var/lib/misc/prelink.force

Yeah, had tried that and variations of e.g. "prelink -au" followed by "prelink -avf"

> I also noticed that the prelink job itself errors out with:
> >>>>>
> /etc/cron.daily/prelink: line 47:  2738 Segmentation fault     
> /usr/sbin/prelink -av $PRELINK_OPTS >> /var/log/prelink/prelink.log 2>&1
> /usr/bin/ldd: line 161: /lib/ld-linux.so.2: cannot execute binary file
> >>>>>
> where /lib/ld-linux.so.2 is the old 32-bit glibc.  Did you have this installed
> on your test-case install that completed prelink without error?

Yep, have that.

Comment 7 Eduardo Habkost 2008-04-22 17:59:14 EDT
I couldn't reproduce it here, either. Maybe running it with 'kstack=64' on the 
kernel command-line could reveal other useful kernel addresses on the stack.
Comment 8 Stephen Tweedie 2008-04-23 12:39:01 EDT
'kstack=64' makes no difference.
Comment 9 Eduardo Habkost 2008-04-23 12:57:14 EDT
(In reply to comment #8)
> 'kstack=64' makes no difference.

Didn't it show more data after the "Stack:" line of the oops?
Comment 10 Eduardo Habkost 2008-04-23 14:30:39 EDT
'rpm -qa' output may help me to reproduce the bug. I bet there is a specific
file that triggers the bug when loaded by the prelink script, so maybe having
the exact set of packages installed will make the bug reproducible.
Comment 11 Eduardo Habkost 2008-04-24 12:09:39 EDT
Created attachment 303656
clear %fs when loading new TLS descriptors (1/2)

__switch_to() is on the backtrace before failsafe_callback(). Probably it is
being triggered when returning from a hypercall at
paravirt_leave_lazy_cpu_mode().

The two attached patches were an attempt to fix this, but I haven't tested them
enough to make sure they are correct.
Comment 12 Eduardo Habkost 2008-04-24 12:10:37 EDT
Created attachment 303657
clear %fs when loading new TLS descriptors (2/2)
Comment 13 Mark McLoughlin 2008-05-09 11:56:05 EDT
Here's where this shows up on kerneloops.org:

  http://www.kerneloops.org/oops.php?number=9341
Comment 14 Mark McLoughlin 2008-05-11 18:04:56 EDT
Should be fixed with kernel-xen-2.6.25-4.fc9 and kernel-xen-2.6.25.2-2.fc10

* Sun May 11 2008 Mark McLoughlin <markmc@redhat.com>
- Fix oops during prelink (ehabkost, #442949)
Comment 15 Fedora Update System 2008-05-12 08:41:24 EDT
kernel-xen-2.6-2.6.25-4.fc9 has been submitted as an update for Fedora 9
