894360 – starting a F18 install as a CentOS5 xen guest

Summary: starting a F18 install as a CentOS5 xen guest

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	5.10
Assignee:	Andrew Jones
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	514489

Reported:	2013-01-11 14:54 UTC by Florian La Roche
Modified:	2013-09-30 23:45 UTC
CC List:	10 users
Fixed In Version:	kernel-2.6.18-360.el5
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-30 23:45:55 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Installation progress screenshot (41.94 KB, image/png) 2013-01-23 08:47 UTC, Wei Shi

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2013:1348	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Linux 5 kernel update	2013-10-01 00:41:39 UTC

Description Florian La Roche 2013-01-11 14:54:21 UTC

Description of problem:

[    3.534209] async_tx: api initialized (async)
[    3.535048] xor: automatically using best checksumming function:
[    3.536056] invalid opcode: 0000 [#1] SMP 
[    3.536060] Modules linked in: xor(+) async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs
[    3.536073] CPU 0 
[    3.536075] Pid: 211, comm: modprobe Not tainted 3.6.10-4.fc18.x86_64 #1  
[    3.536079] RIP: e030:[<ffffffffa006bbfc>]  [<ffffffffa006bbfc>] xor_avx_2+0x4c/0x250 [xor]
[    3.536085] RSP: e02b:ffff880003871cc0  EFLAGS: 00010282
[    3.536088] RAX: 0000000000000000 RBX: ffff880003874000 RCX: 0000000000000050
[    3.536091] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000008005023b
[    3.536094] RBP: ffff880003871d98 R08: 0000000000000000 R09: 00000000000000f0
[    3.536097] R10: 0000000000007ff0 R11: 0720073a076e076f R12: ffff880003877000
[    3.536099] R13: 0000000000000008 R14: 000000008005003b R15: ffff880003874000
[    3.536105] FS:  00007f20ca066740(0000) GS:ffff88002fc00000(0000) knlGS:0000000000000000
[    3.536108] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[    3.536110] CR2: 00007fa0d58f39f0 CR3: 00000000277a2000 CR4: 0000000000000620
[    3.536113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.536116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[    3.536119] Process modprobe (pid: 211, threadinfo ffff880003870000, task ffff88002a441710)
[    3.536122] Stack:
[    3.536124]  ffffffff8105e3f8 0000000000000000 0000000000000006 00000000ffffffff
[    3.536129]  0000000000000000 0000000000000034 ffffffff8162ccab ffff880003874000
[    3.536133]  ffff880003877000 ffffffffa006e000 00000000fffb79ef ffffffff816242ca
[    3.648235] Call Trace:
[    3.648242]  [<ffffffff8105e3f8>] ? console_unlock+0x1e8/0x440
[    3.648247]  [<ffffffff8162ccab>] ? xen_hypervisor_callback+0x1b/0x20
[    3.648252]  [<ffffffff816242ca>] ? error_exit+0x2a/0x60
[    3.648255]  [<ffffffff81623df8>] ? retint_restore_args+0x5/0x6
[    3.648259]  [<ffffffffa006c97a>] do_xor_speed+0x6e/0xc8 [xor]
[    3.648263]  [<ffffffffa0077075>] calibrate_xor_blocks+0x75/0x1000 [xor]
[    3.648274]  [<ffffffffa0077000>] ? 0xffffffffa0076fff
[    3.648279]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[    3.648284]  [<ffffffff810be400>] sys_init_module+0x140/0x21f0
[    3.648289]  [<ffffffff812fb480>] ? ddebug_proc_open+0xd0/0xd0
[    3.648292]  [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b
[    3.648295] Code: 49 89 d4 65 48 8b 04 25 28 00 00 00 48 89 84 24 98 00 00 00 31 c0 49 c1 ed 09 e8 30 7a f9 e0 66 90 49 89 c6 e8 c6 89 f9 e0 66 90 <c5> fc 29 04 24 c5 fc 29 4c 24 20 c5 fc 29 54 24 40 c5 fc 29 5c 
[    3.648345] RIP  [<ffffffffa006bbfc>] xor_avx_2+0x4c/0x250 [xor]
[    3.648349]  RSP <ffff880003871cc0>
[    3.648352] ---[ end trace eb2b88f802238019 ]---
dracut-pre-udev[191]: //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line 32:   211 Segmentation fault      modprobe $m &>/dev/null
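
Editor's note on reading the trace above: in the "Code:" line of an oops, the byte in angle brackets is the one at the faulting RIP. Here it is c5, which in 64-bit mode is a VEX prefix, i.e. the start of an AVX-encoded instruction, consistent with the fault being reported inside xor_avx_2. A minimal Python sketch of that check follows; the helper names faulting_byte and faults_on_vex are hypothetical, and the sample line is a shortened excerpt of the trace above.

```python
import re

# In 64-bit mode, 0xC4 and 0xC5 are the 3-byte and 2-byte VEX prefixes.
VEX_PREFIXES = {0xC4, 0xC5}


def faulting_byte(code_line):
    """Return the byte marked <xx> in an oops 'Code:' line, or None."""
    match = re.search(r"<([0-9a-fA-F]{2})>", code_line)
    return int(match.group(1), 16) if match else None


def faults_on_vex(code_line):
    """True if the faulting instruction starts with a VEX prefix."""
    return faulting_byte(code_line) in VEX_PREFIXES


# Shortened excerpt of the Code: line from the trace above.
sample = "Code: 66 90 49 89 c6 e8 c6 89 f9 e0 66 90 <c5> fc 29 04 24"
print(faults_on_vex(sample))  # True
```

This only classifies the first byte; it does not disassemble the instruction.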


Comment 1 Andrew Jones 2013-01-14 12:43:56 UTC

Hi,
do F17 installs on the same host work?

Comment 2 Florian La Roche 2013-01-14 15:43:26 UTC

Cannot test F17 here, but F16 did work. Also F18 continues to install
ok, but not sure how much I can trust this system.

best regards,

Florian La Roche

Comment 3 Andrew Jones 2013-01-22 17:00:54 UTC

This is likely due to

commit 841e3604d35aa70d399146abdc526d8c89a2c2f5
Author: Suresh Siddha <suresh.b.siddha>
Date:   Fri Aug 24 14:13:00 2012 -0700

    x86, fpu: always use kernel_fpu_begin/end() for in-kernel FPU usage
    
    use kernel_fpu_begin/end() instead of unconditionally accessing cr0 and
    saving/restoring just the few used xmm/ymm registers.

which is in the F18 kernel. RHEL5 Xen (and its clones) relies on cr0 changes to keep FPU state consistent. The patch above removes those changes as an optimization and, unfortunately, provides no alternative path whenever the AVX cpufeature is present. RHEL5 Xen exposes AVX to guests (until now that has been harmless, and may have given guests a small performance boost).

Upstream Xen has enhanced its FPU save/restore, so it's possible that running on a later Xen wouldn't have this problem. Either way, PV guests there wouldn't have the problem, because I see in upstream code that AVX is masked for PV guests when the domain can't use XSAVE. No domain running on RHEL5 Xen can use XSAVE, as it's not supported and is already masked. So for the resolution we should also mask AVX from the guests in the hypervisor.
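
Editor's sketch of the masking rule described in this comment, in Python. This is an illustration only, not the RHEL5 hypervisor patch: the function name mask_guest_ecx and the guest_can_use_xsave flag are hypothetical. The bit positions are those of CPUID leaf 1, register ECX (XSAVE = bit 26, OSXSAVE = bit 27, AVX = bit 28).

```python
# Feature bits in CPUID leaf 1, ECX.
XSAVE = 1 << 26
OSXSAVE = 1 << 27
AVX = 1 << 28


def mask_guest_ecx(ecx, guest_can_use_xsave):
    """Return the leaf-1 ECX value a guest should see.

    Hypothetical helper: if the guest cannot use XSAVE, AVX is hidden
    as well, since the guest has no way to manage the ymm state.
    """
    if not guest_can_use_xsave:
        ecx &= ~(XSAVE | OSXSAVE | AVX)
    return ecx & 0xFFFFFFFF


# On RHEL5 Xen no domain can use XSAVE, so AVX would always be hidden.
host_ecx = AVX | XSAVE | 0x1
print(hex(mask_guest_ecx(host_ecx, guest_can_use_xsave=False)))  # 0x1
```

With AVX hidden, a guest kernel would fall back to its non-AVX xor routines instead of selecting xor_avx_2.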

Comment 4 Andrew Jones 2013-01-22 17:08:34 UTC

A workaround for installing F18 and other distros using kernels >= v3.7-rc1 is to add the following parameter to the guest's kernel command line:

clearcpuid=156

e.g. with virt-install use '-x clearcpuid=156'
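
Editor's note on where 156 comes from: the Linux kernel numbers x86 CPU features as word * 32 + bit, and AVX is bit 28 of word 4 (the word holding CPUID leaf 1, ECX), so 4 * 32 + 28 = 156. A small Python sketch of that arithmetic follows; the helper names are hypothetical, and the numbering is an internal detail of kernels of this era, so it should be checked against the guest kernel's cpufeature.h before being reused elsewhere.

```python
def cpufeature_number(word, bit):
    """Kernel-style feature number: 32 feature bits per capability word."""
    return word * 32 + bit


def clearcpuid_arg(word, bit):
    """Build the kernel command-line parameter that clears one feature."""
    return "clearcpuid=%d" % cpufeature_number(word, bit)


# AVX: capability word 4 (CPUID leaf 1, ECX), bit 28.
AVX_WORD, AVX_BIT = 4, 28
print(clearcpuid_arg(AVX_WORD, AVX_BIT))  # clearcpuid=156
```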

Comment 5 Wei Shi 2013-01-23 08:45:32 UTC

(In reply to comment #4)
> A workaround for installing F18 and other distros using kernels >= v3.7-rc1
> is to add the following parameter to the guest's kernel command line
> 
> clearcpuid=156
> 
> e.g. with virt-install use '-x clearcpuid=156'

After launching a guest from the F18 ISO with the xm command and adding the parameter "clearcpuid=156", the "Segmentation fault" disappears, but the installation stalls at one step for 235s (see attachment); after that, it successfully launches anaconda.

Comment 6 Wei Shi 2013-01-23 08:47:31 UTC

Created attachment 685709
Installation progress screenshot

After 235s, the installation continues and finally launches anaconda.

Comment 7 Andrew Jones 2013-01-23 08:59:08 UTC

(In reply to comment #5)
> 
> Launch a guest with F18 iso using xm command, after added parameter
> "clearcpuid=156", the "Segmentation fault" disappear but the installation
> progress will stopped at one step for 235s(see attachment), after that, it
> successfully launch anaconda.

That's xenbus waiting for devices. I'm not sure what it's waiting for, but I would guess it's a different problem (possibly config related). Please open a new bug and attach your guest config file.

Comment 8 Wei Shi 2013-01-23 10:45:43 UTC

(In reply to comment #7)
> (In reply to comment #5)
> > 
> > Launch a guest with F18 iso using xm command, after added parameter
> > "clearcpuid=156", the "Segmentation fault" disappear but the installation
> > progress will stopped at one step for 235s(see attachment), after that, it
> > successfully launch anaconda.
> 
> That's xenbus waiting for devices. I'm not sure what it's waiting for, but I
> would guess it's a different problem (possibly config related). Please open
> a new bug and attach your guest config file.

Sorry, my fault: I made a mistake with the disk in the config file (using tap:qcow for a raw image). With that fixed, the stall no longer occurs; the "clearcpuid" parameter is a workaround for this bug.

Comment 9 RHEL Program Management 2013-05-01 06:40:54 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 10 Andrew Jones 2013-05-02 07:17:11 UTC

PM,

kernel-xen is the kernel, so the component is scheduled to be updated and we need the pm_ack.

drew

Comment 11 RHEL Program Management 2013-06-07 14:30:27 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 13 Phillip Lougher 2013-06-11 17:51:23 UTC

Patch(es) available in kernel-2.6.18-360.el5
You can download this test kernel (or newer) from http://people.redhat.com/plougher/el5/
Detailed testing feedback is always welcomed.
If you require guidance regarding testing, please ask the bug assignee.

Comment 15 Wei Shi 2013-06-14 05:41:01 UTC

Reproduced:
 kernel-xen-2.6.18-359.el5

Booting a 64-bit HVM guest from RHEL-7.0-20130606.0-Server-x86_64-dvd1-ks.iso leads to a guest crash:
[    6.021764] Call Trace:
[    6.023606]  [<ffffffffa00b5071>] do_xor_speed+0x71/0xc2 [xor]
[    6.027143]  [<ffffffffa00b512d>] calibrate_xor_blocks+0x6b/0xf3e [xor]
[    6.031164]  [<ffffffffa00b50c2>] ? do_xor_speed+0xc2/0xc2 [xor]
[    6.037366]  [<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[    6.040833]  [<ffffffff810c5717>] load_module+0xf47/0x1400
[    6.044384]  [<ffffffff81307600>] ? ddebug_proc_write+0xf0/0xf0
[    6.048178]  [<ffffffff810c1e34>] ? copy_module_from_fd.isra.42+0x44/0x140
[    6.053059]  [<ffffffff810c5d66>] SyS_finit_module+0x86/0xb0
[    6.057388]  [<ffffffff8160f399>] system_call_fastpath+0x16/0x1b
[    6.061935] Code: 89 d4 53 48 89 f3 e8 80 a3 f6 e0 84 c0 0f 84 b9 01 00 00 e8 63 a4 f6 e0 4d 85 ed 49 8d 45 ff 0f 84 9b 01 00 00 66 0f 1f 44 00 00 <c4> c1 7d 6f 04 24 c5 fc 57 03 c5 fd 7f 03 c4 c1 7d 6f 4c 24 20 
[    6.096859] RIP  [<ffffffffa00afc60>] xor_avx_2+0x40/0x210 [xor]
[    6.101594]  RSP <ffff88003f83fd28>
[    6.106464] ---[ end trace 85ff96b28d97c5f0 ]---
dracut-pre-udev[200]: //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line 32:   231 Segmentation fault      modprobe $m &>/dev/null

Verified:
 kernel-xen-2.6.18-360.el5

Guest boots up successfully.

Comment 17 errata-xmlrpc 2013-09-30 23:45:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1348.html