Bug 209910

Summary: Xen crashes after SELinux modules loading and switch to enforcing mode
Product: [Fedora] Fedora
Reporter: Aleksander Adamowski <bugs-redhat>
Component: kernel-xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 5
CC: ckjohnson, eparis, jmorris, kmacmill, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-13 11:42:06 UTC
Attachments:
  var/log/messages excerpt from attempt to boot kernel-xen0.x86_64 2.6.18-1.2189.fc5 (flags: none)
  Excerpt of xend.log from xenU crash on 2187 and selinux permissive (flags: none)
  Excerpt from our log (flags: none)

Description Aleksander Adamowski 2006-10-07 23:21:02 UTC
Description of problem:

We have a system running 6 virtual servers under Xen on x86_64 architecture (HP
ProLiant 385 server with 2 dual core Opteron processors).

Recently we decided to start using SELinux on the guest machines.

On most of them we prepared custom binary policy modules (based on AVC
entries from the kernel log) and loaded them into the kernel with the semodule
tool.

On two of them, we switched SELinux to enforcing mode (one has the custom module
loaded, one is running vanilla policy).

After those changes, the servers started crashing, all at the same time
(together with the zero domain).

We've been able to capture a kernel stack trace once:

Kernel BUG at drivers/xen/netfront/netfront.c:663
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ipv6 xennet ip_conntrack_netbios_ns ipt_REJECT ipt_LOG ipt_recent xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables dm_mirror dm_mod
Pid: 8, comm: xenwatch Not tainted 2.6.17-1.2187_FC5xenU #1
RIP: e030:[<ffffffff88055b57>] <ffffffff88055b57>{:xennet:network_alloc_rx_buffers+507}
RSP: e02b:ffff880000573df8  EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff880000eb6980 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000001040
RBP: ffff880008da0580 R08: 0000000000000010 R09: 00000000000000ac
R10: ffff880008da06c8 R11: 0000000000000000 R12: 0000000000000208
R13: ffff880008da06c8 R14: 00000000000000ff R15: ffff880008da5a50
FS:  00002aaaaaab78b0(0000) GS:ffffffff80524000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process xenwatch (pid: 8, threadinfo ffff880000572000, task ffff88000056c100)
Stack: ffff880000573e18 ffff880008da0000 0000010000000001 ffffffff8026b51c
       00000000000000cc 000000348024d92e ffffffff8036db0a 000000000056c100
       000000000000000f ffff880008da0000
Call Trace: <ffffffff8026b51c>{_spin_unlock_irqrestore+9}
       <ffffffff8036db0a>{xenwatch_thread+169} <ffffffff88056130>{:xennet:backend_changed+668}
       <ffffffff8036da61>{xenwatch_thread+0} <ffffffff802975d1>{keventd_create_kthread+0}
       <ffffffff8036d251>{xenwatch_handle_callback+21} <ffffffff8036dbed>{xenwatch_thread+396}
       <ffffffff802977d4>{autoremove_wake_function+0} <ffffffff802975d1>{keventd_create_kthread+0}
       <ffffffff80237e0d>{kthread+212} <ffffffff802670ae>{child_rip+8}
       <ffffffff802975d1>{keventd_create_kthread+0} <ffffffff80237d39>{kthread+0}
       <ffffffff802670a6>{child_rip+0}

Code: 0f 0b 68 88 7c 05 88 c2 97 02 8b 85 f0 00 00 00 4c 63 f2 48
RIP <ffffffff88055b57>{:xennet:network_alloc_rx_buffers+507} RSP <ffff880000573df8>
 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace: <ffffffff80291b41>{blocking_notifier_call_chain+31}
       <ffffffff80217f42>{do_exit+32} <ffffffff802706bd>{kernel_math_error+0}
       <ffffffff80270fcc>{do_invalid_op+163} <ffffffff88055b57>{:xennet:network_alloc_rx_buffers+507}
       <ffffffff80210cf6>{__alloc_pages+271} <ffffffff80266daf>{error_exit+0}
       <ffffffff88055b57>{:xennet:network_alloc_rx_buffers+507}
       <ffffffff88055be4>{:xennet:network_alloc_rx_buffers+648}
       <ffffffff8026b51c>{_spin_unlock_irqrestore+9} <ffffffff8036db0a>{xenwatch_thread+169}
       <ffffffff88056130>{:xennet:backend_changed+668} <ffffffff8036da61>{xenwatch_thread+0}
       <ffffffff802975d1>{keventd_create_kthread+0} <ffffffff8036d251>{xenwatch_handle_callback+21}
       <ffffffff8036dbed>{xenwatch_thread+396} <ffffffff802977d4>{autoremove_wake_function+0}
       <ffffffff802975d1>{keventd_create_kthread+0} <ffffffff80237e0d>{kthread+212}
       <ffffffff802670ae>{child_rip+8} <ffffffff802975d1>{keventd_create_kthread+0}
       <ffffffff80237d39>{kthread+0} <ffffffff802670a6>{child_rip+0}
BUG: xenwatch/8, lock held at task exit time!
 [ffffffff80474400] {xenwatch_mutex}
.. held by:          xenwatch:    8 [ffff88000056c100, 110]
... acquired at:               xenwatch_thread+0xa9/0x1a5
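
Console captures of an oops are often wrapped at a fixed width, which can split a frame in the middle of a symbol name. The following is a minimal Python sketch, not part of the original report, that glues such lines back together and lists the frames of an old-style x86_64 call trace. The names `FRAME_RE` and `parse_frames` are hypothetical, and the sketch assumes the offsets are decimal, which is consistent with the trace showing `xenwatch_thread+169` where the lock report shows `xenwatch_thread+0xa9`.

```python
import re

# Matches one frame, e.g. <ffffffff88055b57>{:xennet:network_alloc_rx_buffers+507}
# The ":module:" prefix is optional; frames without it belong to the core kernel.
FRAME_RE = re.compile(
    r"<(?P<addr>[0-9a-f]{16})>"
    r"\{(?::(?P<module>\w+):)?(?P<symbol>\w+)\+(?P<offset>\d+)\}"
)


def parse_frames(lines):
    """Return (module, symbol, offset) tuples from a possibly wrapped trace.

    Lines are concatenated without a separator so that a frame split across
    two lines by the terminal becomes contiguous again.
    """
    joined = "".join(line.rstrip("\r\n") for line in lines)
    frames = []
    for match in FRAME_RE.finditer(joined):
        module = match.group("module") or "kernel"
        frames.append((module, match.group("symbol"), int(match.group("offset"))))
    return frames


if __name__ == "__main__":
    sample = [
        "Call Trace: <ffffffff8026b51c>{_spin_unlock_irqrestore+9}",
        "       <ffffffff8036db0a>{xenwatch_thread+169}",
        "<ffffffff88056130>{:xennet:backen",
        "d_changed+668}",
    ]
    for frame in parse_frames(sample):
        print(frame)
```

Grouping the frames by module makes it easier to see that the failing code path runs through the `xennet` (netfront) driver rather than through SELinux.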

Version-Release number of selected component (if applicable):
2.6.17-1.2187_FC5xen0


How reproducible:

The systems usually crash every 2 to 3 days.

Comment 1 James Morris 2006-10-09 15:08:47 UTC
It's difficult to imagine how SELinux might be able to cause this:

Kernel BUG at drivers/xen/netfront/netfront.c:663
invalid opcode: 0000 [1] SMP

Have you seen anything corresponding in the audit logs?

Comment 2 Aleksander Adamowski 2006-10-09 16:25:34 UTC
The audit logs on Domain0 show the following entries at the time of crash:

time->Mon Oct  9 02:24:34 2006
type=PATH msg=audit(1160353474.279:1269): item=0 name="/var/xen/someguestdomain.img" inode=48037918 dev=68:03 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:var_t:s0
type=CWD msg=audit(1160353474.279:1269):  cwd="/"
type=SYSCALL msg=audit(1160353474.279:1269): arch=c000003e syscall=21 success=yes exit=0 a0=5f45e0 a1=4 a2=1 a3=5f45e0 items=1 pid=1931 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) comm="python" exe="/usr/bin/python" subj=system_u:system_r:xend_t:s0
type=AVC msg=audit(1160353474.279:1269): avc:  denied  { read } for  pid=1931 comm="python" name="someguestdomain.img" dev=cciss/c0d0p3 ino=48037918 scontext=system_u:system_r:xend_t:s0 tcontext=root:object_r:var_t:s0 tclass=file
----
time->Mon Oct  9 02:24:36 2006
type=SYSCALL msg=audit(1160353476.127:1270): arch=c000003e syscall=16 success=yes exit=0 a0=3 a1=89a3 a2=7ffffd471fb0 a3=0 items=0 pid=20816 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) comm="brctl" exe="/usr/sbin/brctl" subj=system_u:system_r:udev_t:s0-s0:c0.c255
type=ANOM_PROMISCUOUS msg=audit(1160353476.127:1270): dev=vif8.0 prom=0 old_prom=256 auid=4294967295
----
time->Mon Oct  9 02:24:50 2006
type=PATH msg=audit(1160353490.148:1271): item=0 name="/var/xen/someguestdomain.img" inode=48037918 dev=68:03 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:var_t:s0
type=CWD msg=audit(1160353490.148:1271):  cwd="/"
type=SYSCALL msg=audit(1160353490.148:1271): arch=c000003e syscall=21 success=yes exit=0 a0=6972d0 a1=4 a2=1 a3=6972d0 items=1 pid=1931 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) comm="python" exe="/usr/bin/python" subj=system_u:system_r:xend_t:s0
type=AVC msg=audit(1160353490.148:1271): avc:  denied  { read } for  pid=1931 comm="python" name="someguestdomain.img" dev=cciss/c0d0p3 ino=48037918 scontext=system_u:system_r:xend_t:s0 tcontext=root:object_r:var_t:s0 tclass=file
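
To make the denials above easier to read, here is a small Python sketch that pulls the key fields out of a single `type=AVC` record. It is only an illustration under simple assumptions: the record is on one line, no field value contains whitespace, and the helper names `parse_avc` and `context_type` are hypothetical rather than part of any audit tool.

```python
import re

# Captures the permission list and the trailing key=value fields of a denial.
AVC_RE = re.compile(r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}\s+for\s+(?P<fields>.*)")


def context_type(context):
    """Return the type part of a user:role:type[:level] security context."""
    return context.split(":")[2]


def parse_avc(record):
    """Parse one AVC denial line into a dict, or return None if it is not one."""
    match = AVC_RE.search(record)
    if match is None:
        return None
    fields = dict(
        pair.split("=", 1) for pair in match.group("fields").split() if "=" in pair
    )
    return {
        "perms": match.group("perms").split(),
        "comm": fields.get("comm", "").strip('"'),
        "name": fields.get("name", "").strip('"'),
        "source_type": context_type(fields["scontext"]),
        "target_type": context_type(fields["tcontext"]),
        "tclass": fields.get("tclass"),
    }


if __name__ == "__main__":
    line = (
        'type=AVC msg=audit(1160353474.279:1269): avc:  denied  { read } for  '
        'pid=1931 comm="python" name="someguestdomain.img" dev=cciss/c0d0p3 '
        "ino=48037918 scontext=system_u:system_r:xend_t:s0 "
        "tcontext=root:object_r:var_t:s0 tclass=file"
    )
    print(parse_avc(line))
```

For the records in this comment, the result says that a process in the `xend_t` domain was denied `read` on a `file` labeled `var_t`. Because Domain0 runs in permissive mode, the denial is logged but the access still succeeds, as the `success=yes` in the matching SYSCALL record shows.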


Also, at the moment of the crash domain0 has logged the following in the kernel log:

Oct  9 02:24:35 domzero kernel: xenbr0: port 8(vif8.0) entering disabled state
Oct  9 02:24:36 domzero kernel: device vif8.0 left promiscuous mode
Oct  9 02:24:36 domzero kernel: xenbr0: port 8(vif8.0) entering disabled state

Domain0 has continued running in some broken state, and kept logging e.g.
netfilter messages, but the guest domains were dead after this event.

We always run SELinux in permissive mode on Domain0.

Comment 3 James Morris 2006-10-09 16:36:45 UTC
The file, someguestdomain.img, needs to be labeled as something that xend can
read.  If you need assistance with that, please advise.
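
As a rough illustration of the relabeling suggested here, the Python sketch below builds (and optionally runs) a `chcon -t` command for the image file. The target type `xen_image_t` is an assumption about the installed policy and is not named anywhere in this report, so check which type your policy grants `xend_t` read access to before using it. The function names are hypothetical.

```python
import subprocess


def relabel_command(path, selinux_type):
    """Build the chcon invocation that sets only the type of a file's label."""
    return ["chcon", "-t", selinux_type, path]


def relabel(path, selinux_type, dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    command = relabel_command(path, selinux_type)
    if not dry_run:
        # Raises CalledProcessError if chcon exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # xen_image_t is an assumed type name; adjust it to match your policy.
    print(" ".join(relabel("/var/xen/someguestdomain.img", "xen_image_t")))
```

A label set with `chcon` can be lost on a full filesystem relabel, so a persistent file-context rule in the policy is usually the more durable fix.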

Just curious, why do you use permissive mode?


Comment 4 Herbert Xu 2006-10-10 03:27:54 UTC
This is a known problem with the 2.6.17 kernel in FC5.  It should be fixed with
the 2.6.18 kernel in FC5-testing (2189 or later).  So please give that a try.

Comment 5 Christopher Johnson 2006-10-11 13:58:09 UTC
Created attachment 138238 [details]
var/log/messages excerpt from attempt to boot kernel-xen0.x86_64 2.6.18-1.2189.fc5

The attachment is from an attempt to boot kernel-xen0.x86_64 2.6.18-1.2189.fc5.
The boot was exceedingly slow, and after several minutes the system had not
started sshd, had not enabled console logon, and did not respond to
ctrl-alt-del, so we had to force a power-off.

Like the original poster's machine, my server is a Compaq DL385, with a single
Opteron 280 and 1GB of memory.

I had been running one xen guest with SELINUX=enforcing on build 2122 kernel
(xen0 and xenU) for months with no instability.  After updating to 2187 and
updating to xen-3.0.2-3.FC5 this weekend, I experienced multiple xenU crashes
in a 24 hour period.  And in each case it was not possible to create the guest
successfully until after rebooting the xen0 system.

The 2189 kernel did not result in a usable xen0 system.

Changing the xenU SELinux config to SELINUX=permissive and booting xen0 on 2187
has kept both xen0 and xenU operational for 17 hours so far.  Time will tell
whether this definitively avoids the crash conditions.

Comment 6 Christopher Johnson 2006-10-11 15:11:26 UTC
SELINUX=permissive on xen0 and xenU is not sufficient to avoid xenU crashing.  I
did not make 18 hours.

Name                              ID Mem(MiB) VCPUs State  Time(s)
Domain-0                           0      712     2 r-----   754.3
Zombie-www                         2      256     1 ----cd    27.8
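
The `----cd` state above is the signal to look for: in `xm list` output the state column is a string of flag letters in which `c` marks a crashed domain and `d` a dying one. The Python sketch below shows one way to spot such domains in captured output. It assumes the six-column layout shown in this comment, and the function names are hypothetical.

```python
def parse_xm_list(output):
    """Parse `xm list` text into a list of dicts, skipping the header row."""
    rows = [line for line in output.splitlines() if line.strip()]
    domains = []
    for line in rows[1:]:
        parts = line.split()
        if len(parts) < 6:
            continue  # not a domain row in the assumed layout
        # The last five columns are fixed; anything before them is the name.
        dom_id, mem, vcpus, state, time_s = parts[-5:]
        domains.append(
            {
                "name": " ".join(parts[:-5]),
                "id": int(dom_id),
                "mem_mib": int(mem),
                "vcpus": int(vcpus),
                "state": state,
                "time_s": float(time_s),
            }
        )
    return domains


def crashed_domains(domains):
    """Return the names of domains whose state flags include 'c' (crashed)."""
    return [dom["name"] for dom in domains if "c" in dom["state"]]


if __name__ == "__main__":
    sample = (
        "Name                              ID Mem(MiB) VCPUs State  Time(s)\n"
        "Domain-0                           0      712     2 r-----   754.3\n"
        "Zombie-www                         2      256     1 ----cd    27.8\n"
    )
    print(crashed_domains(parse_xm_list(sample)))
```

Running a check like this periodically against live `xm list` output would record when a guest entered the crashed state, which helps correlate the crash with the xend.log and kernel log entries.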


Comment 7 Christopher Johnson 2006-10-11 15:15:28 UTC
Created attachment 138242 [details]
Excerpt of xend.log from xenU crash on 2187 and selinux permissive

Comment 8 Aleksander Adamowski 2006-10-11 20:21:46 UTC
Created attachment 138274 [details]
Excerpt from our log

Same here, attaching xend.log excerpt. SELinux was set to permissive on all
domains.

Comment 9 Aleksander Adamowski 2006-10-11 20:26:30 UTC
I have the following line in "xm list" output too:

Name                              ID Mem(MiB) VCPUs State  Time(s)
Domainname-zacheta                     8      512     1 ----cd   632.5


Comment 10 Herbert Xu 2006-10-13 11:42:06 UTC
Well, the 2187 kernel is known to contain broken xen networking code that
results in the crashes that you are all seeing.  The 2189 kernel contains fixes
for that problem.  If you are having problems installing the 2189 kernel, please
report them separately.  Thanks.

*** This bug has been marked as a duplicate of 206630 ***

Comment 11 Herbert Xu 2006-10-13 12:01:29 UTC

*** This bug has been marked as a duplicate of 199944 ***

Comment 12 Chris Langlands 2006-10-13 12:56:23 UTC
The 2189 kernel is not compatible with any currently available FC5 xen userspace
packages.  Not sure how it can be considered to be a fix.

Comment 13 Herbert Xu 2006-10-16 10:19:44 UTC
Please note that bug 209910 was closed not because it has been resolved, but
because it is a duplicate of 199944, which is still open.  If you're having
problems using the 2189 kernel, please file bugs on that so that they can be
resolved.  Thanks.