Bug 1329264 - lxc + lvm triggers a bug on container shutdown only with kernel 4.4.x
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-21 13:34 UTC by Davide Repetto
Modified: 2017-04-29 11:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-28 17:22:39 UTC
Type: Bug



Description Davide Repetto 2016-04-21 13:34:25 UTC
Description of problem:
=======================
An LXC container configured to mount LVM volumes as its root and /home filesystems triggers a kernel bug on shutdown. My guess is that, while the LVM volume is being unmounted, something also destroys the LVM device, or something of the sort.
Afterwards the *host* machine cannot reboot or even complete a "ps ax" command, and remote logins via SSH no longer work.

This is a configuration I've been using successfully for years (in order to support user quotas inside LXC containers) and up to kernel-4.3.5-300.fc23.x86_64 things have worked quite well indeed.

This is how the filesystems are declared in the container's LXC fstab:
======================================================================
/dev/mapper/machine_root      /lxc/machine/rootfs            ext4        defaults,noatime                              1 1
/dev/mapper/machine_home      /lxc/machine/rootfs/home       ext4        defaults,usrquota,grpquota,noatime,data=writeback,commit=60,barrier=0            1 2
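
For readers less familiar with the fstab layout, the two entries above can be split into their six fields as in the Python sketch below. The names FstabEntry and parse_fstab_line are hypothetical helpers written for this illustration and are not part of LXC. The sketch only shows which mount options apply to each volume; note that the quota options appear on the /home volume only.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]
    dump: int
    passno: int


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one whitespace-separated fstab line into its six fields."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    return FstabEntry(device, mountpoint, fstype,
                      options.split(","), int(dump), int(passno))


# The two entries quoted in the report, copied verbatim.
FSTAB = """\
/dev/mapper/machine_root      /lxc/machine/rootfs            ext4        defaults,noatime                              1 1
/dev/mapper/machine_home      /lxc/machine/rootfs/home       ext4        defaults,usrquota,grpquota,noatime,data=writeback,commit=60,barrier=0            1 2
"""

entries = [parse_fstab_line(line) for line in FSTAB.splitlines() if line.strip()]
for entry in entries:
    quota = [o for o in entry.options if o in ("usrquota", "grpquota")]
    print(entry.mountpoint, entry.fstype, "quota options:", quota)
```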

Version-Release number of selected component (if applicable):
=============================================================
Tested with:
kernel.x86_64 4.4.4-301.fc23
kernel.x86_64 4.4.6-301.fc23

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. lxc-stop -n container

Actual results:
===============
Something like this (in the logs) with the HOST OS left in a crippled state:
----------------------------------------------------------------------------
[mar apr 19 18:06:08 2016] VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds.  Have a nice day...
[mar apr 19 18:06:08 2016] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f8
[mar apr 19 18:06:08 2016] IP: [<ffffffff812c19bd>] ext4_evict_inode+0x2d/0x4e0
[mar apr 19 18:06:08 2016] PGD 0
[mar apr 19 18:06:08 2016] Oops: 0000 [#1] SMP
[mar apr 19 18:06:08 2016] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xt_state binfmt_misc veth bridge stp llc nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack raid10 intel_rapl iosf_mbi iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal mgag200 coretemp ttm drm_kms_helper kvm_intel ipmi_devintf ppdev drm kvm joydev ipmi_ssif int3403_thermal lpc_ich irqbypass crct10dif_pclmul crc32_pclmul int3402_thermal int340x_thermal_zone video winbond_cir rc_core parport_pc shpchp parport i2c_i801 ie31200_edac ipmi_si ipmi_msghandler edac_core tpm_tis tpm int3400_thermal acpi_thermal_rel nfsd auth_rpcgss nfs_acl lockd grace sunrpc raid1 igb crc32c_intel i2c_algo_bit dca ptp pps_core fjes
[mar apr 19 18:06:08 2016] CPU: 1 PID: 1513 Comm: lxc-autostart Not tainted 4.4.6-301.fc23.x86_64 #1
[mar apr 19 18:06:08 2016] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.02.0003.070120151022 07/01/2015
[mar apr 19 18:06:08 2016] task: ffff88082680bc00 ti: ffff88082203c000 task.ti: ffff88082203c000
[mar apr 19 18:06:08 2016] RIP: 0010:[<ffffffff812c19bd>]  [<ffffffff812c19bd>] ext4_evict_inode+0x2d/0x4e0
[mar apr 19 18:06:08 2016] RSP: 0018:ffff88082203fd30  EFLAGS: 00010202
[mar apr 19 18:06:08 2016] RAX: ffff88081f855800 RBX: ffff8808185244c0 RCX: 0000000000000034
[mar apr 19 18:06:08 2016] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffff8808185244c0
[mar apr 19 18:06:08 2016] RBP: ffff88082203fd40 R08: ffff8808185245e0 R09: 0000000000000000
[mar apr 19 18:06:08 2016] R10: 0000000000000019 R11: 0000000000850000 R12: ffff8808185244c0
[mar apr 19 18:06:08 2016] R13: ffffffff818336e0 R14: ffff880818524548 R15: ffffffff81d3af80
[mar apr 19 18:06:08 2016] FS:  00007f42a1988840(0000) GS:ffff88082f440000(0000) knlGS:0000000000000000
[mar apr 19 18:06:08 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mar apr 19 18:06:08 2016] CR2: 00000000000001f8 CR3: 000000081f4bd000 CR4: 00000000001406e0
[mar apr 19 18:06:08 2016] Stack:
[mar apr 19 18:06:08 2016]  ffff8808185244c0 ffff8808185245e0 ffff88082203fd68 ffffffff81248a1a
[mar apr 19 18:06:08 2016]  ffff88081f855800 ffff8808185244c0 ffffffff818336e0 ffff88082203fda0
[mar apr 19 18:06:08 2016]  ffffffff81248cee ffff88082b411800 ffff880829ea9800 0000000000000000
[mar apr 19 18:06:08 2016] Call Trace:
[mar apr 19 18:06:08 2016]  [<ffffffff81248a1a>] evict+0xaa/0x170
[mar apr 19 18:06:08 2016]  [<ffffffff81248cee>] iput+0x1be/0x240
[mar apr 19 18:06:08 2016]  [<ffffffff812b316d>] devpts_del_ref+0x2d/0x40
[mar apr 19 18:06:08 2016]  [<ffffffff8149e166>] pty_unix98_shutdown+0x36/0x50
[mar apr 19 18:06:08 2016]  [<ffffffff81495037>] release_tty+0x37/0xf0
[mar apr 19 18:06:08 2016]  [<ffffffff814954fd>] tty_release+0x40d/0x560
[mar apr 19 18:06:08 2016]  [<ffffffff8122fc3c>] __fput+0xdc/0x1e0
[mar apr 19 18:06:08 2016]  [<ffffffff8122fd7e>] ____fput+0xe/0x10
[mar apr 19 18:06:08 2016]  [<ffffffff810c0b13>] task_work_run+0x73/0x90
[mar apr 19 18:06:08 2016]  [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
[mar apr 19 18:06:08 2016]  [<ffffffff81003d31>] syscall_return_slowpath+0xa1/0xb0
[mar apr 19 18:06:08 2016]  [<ffffffff817a070c>] int_ret_from_sys_call+0x25/0x8f
[mar apr 19 18:06:08 2016] Code: 44 00 00 55 48 89 e5 41 54 53 49 89 fc 0f 1f 44 00 00 41 8b 44 24 48 85 c0 0f 84 b9 00 00 00 49 8b 44 24 28 48 8b 90 58 04 00 00 <48> 8b ba f8 01 00 00 48 85 ff 74 25 41 0f b7 04 24 89 c1 66 81
[mar apr 19 18:06:08 2016] RIP  [<ffffffff812c19bd>] ext4_evict_inode+0x2d/0x4e0
[mar apr 19 18:06:08 2016]  RSP <ffff88082203fd30>
[mar apr 19 18:06:08 2016] CR2: 00000000000001f8
[mar apr 19 18:06:08 2016] ---[ end trace 91f7697a53f02b20 ]---
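
When comparing oopses across kernel versions it can help to reduce a log like the one above to the faulting function and the chain of callers. The following Python sketch does that for log text in the format shown here (timestamp in square brackets, then the frame). The function names call_trace and faulting_function are hypothetical, and the regular expressions are an assumption about this particular log format rather than a general oops parser. The SAMPLE text is an excerpt of lines copied from the log above.

```python
import re
from typing import List, Optional

# A call-trace frame as printed above, e.g.
#   [mar apr 19 18:06:08 2016]  [<ffffffff81248a1a>] evict+0xaa/0x170
FRAME_RE = re.compile(
    r"^(?:\[[^\]]*\])?\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# The summary line after the trace, e.g.
#   [...] RIP  [<ffffffff812c19bd>] ext4_evict_inode+0x2d/0x4e0
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")


def call_trace(log: str) -> List[str]:
    """Return the function names listed after 'Call Trace:', innermost first."""
    frames: List[str] = []
    in_trace = False
    for line in log.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.match(line)
            if match is None:
                break  # first non-frame line ends the trace
            frames.append(match.group(1))
    return frames


def faulting_function(log: str) -> Optional[str]:
    """Return the function named on the trailing 'RIP' line, if present."""
    match = RIP_RE.search(log)
    return match.group(1) if match else None


SAMPLE = """\
[mar apr 19 18:06:08 2016] Call Trace:
[mar apr 19 18:06:08 2016]  [<ffffffff81248a1a>] evict+0xaa/0x170
[mar apr 19 18:06:08 2016]  [<ffffffff81248cee>] iput+0x1be/0x240
[mar apr 19 18:06:08 2016]  [<ffffffff812b316d>] devpts_del_ref+0x2d/0x40
[mar apr 19 18:06:08 2016]  [<ffffffff8149e166>] pty_unix98_shutdown+0x36/0x50
[mar apr 19 18:06:08 2016]  [<ffffffff81495037>] release_tty+0x37/0xf0
[mar apr 19 18:06:08 2016]  [<ffffffff814954fd>] tty_release+0x40d/0x560
[mar apr 19 18:06:08 2016] RIP  [<ffffffff812c19bd>] ext4_evict_inode+0x2d/0x4e0
"""

print(faulting_function(SAMPLE), "<-", " <- ".join(call_trace(SAMPLE)))
```

Read this way, the log shows the NULL pointer dereference in ext4_evict_inode being reached from tty_release through devpts_del_ref, iput and evict, immediately after the "Busy inodes after unmount of dm-0" message.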


Expected results:
=================
The container shuts down cleanly and the host OS is still in tip-top shape.

Additional info:
================
The guest OS inside the LXC containers in use on that system is either CentOS release 5.11 (Final) or CentOS release 6.7 (Final).

Relevant bits of the LXC config files of one of the containers:
---------------------------------------------------------------
lxc.mount = /lxc/machine/fstab

lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m

lxc.cgroup.devices.allow = c 7:23 rwm
lxc.cgroup.devices.allow = c 253:1 rwm
lxc.cgroup.devices.allow = c 253:0 rwm

lxc.cgroup.devices.allow = b 7:23 rwm
lxc.cgroup.devices.allow = b 253:1 rwm
lxc.cgroup.devices.allow = b 253:0 rwm

lxc.autodev = 1
lxc.kmsg = 0
lxc.start.auto = 1
lxc.start.delay = 5
lxc.start.order = 10

lxc.cap.drop =
lxc.cap.drop = mac_admin mac_override
lxc.cap.drop = sys_module sys_nice sys_pacct
lxc.cap.drop = sys_rawio sys_time
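
The lxc.cgroup.devices.allow lines above follow the cgroup devices whitelist format: a device type (a for all, b for block, c for character), a major:minor pair where * is a wildcard, and an access string built from r (read), w (write) and m (mknod). The Python sketch below decodes the rules from this config; parse_allow_rules is a hypothetical helper written for illustration, not an LXC API.

```python
import re
from typing import List, Tuple

RULE_RE = re.compile(
    r"^lxc\.cgroup\.devices\.allow\s*=\s*([abc])\s+(\*|\d+):(\*|\d+)\s+([rwm]+)$"
)
TYPES = {"a": "all", "b": "block", "c": "char"}
ACCESS = {"r": "read", "w": "write", "m": "mknod"}

Rule = Tuple[str, str, str, List[str]]


def parse_allow_rules(config: str) -> List[Rule]:
    """Decode every lxc.cgroup.devices.allow line; other lines are ignored."""
    rules: List[Rule] = []
    for line in config.splitlines():
        match = RULE_RE.match(line.strip())
        if match is None:
            continue
        dev_type, major, minor, access = match.groups()
        rules.append((TYPES[dev_type], major, minor, [ACCESS[a] for a in access]))
    return rules


# Device rules copied from the config above, plus one unrelated key.
CONFIG = """\
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
lxc.cgroup.devices.allow = c 7:23 rwm
lxc.cgroup.devices.allow = c 253:1 rwm
lxc.cgroup.devices.allow = c 253:0 rwm
lxc.cgroup.devices.allow = b 7:23 rwm
lxc.cgroup.devices.allow = b 253:1 rwm
lxc.cgroup.devices.allow = b 253:0 rwm
lxc.autodev = 1
"""

rules = parse_allow_rules(CONFIG)
for dev_type, major, minor, access in rules:
    print(f"{dev_type:5} {major}:{minor}  {', '.join(access)}")
```

In this config the wildcard rules grant mknod only, while full read/write/mknod access is limited to the three listed major:minor pairs for both block and character devices.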

Comment 1 Josh Boyer 2016-05-27 13:24:26 UTC
Is this still happening with a 4.5.y kernel?

Comment 2 Davide Repetto 2016-06-09 09:58:42 UTC
Yes, it still happens with kernel 4.5.6-300.

Sorry it took so long, Josh, but I couldn't test on production machines, so I had to take the time to put together a test machine in my lab.

Comment 3 Laura Abbott 2016-09-23 19:34:02 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 4 Davide Repetto 2016-10-05 17:46:14 UTC
In the meantime I upgraded the machine to Fedora 24 and the bug still happens, now with kernel-4.7.5-200.fc24.x86_64.

The last working kernel is kernel-4.3.5-300.fc23.x86_64 (which I'm still using).

Comment 5 Justin M. Forbes 2017-04-11 14:47:33 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 25 has now been rebased to 4.10.9-100.fc24.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 6 Justin M. Forbes 2017-04-28 17:22:39 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 7 Davide Repetto 2017-04-29 11:43:13 UTC
The bug is currently fixed with kernel 4.10.12-200.fc25.
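
As a summary of the kernels mentioned in this thread, the Python sketch below orders the reported releases numerically and picks out the last one reported as affected and the first one reported as working again. The REPORTS table is assembled from the comments above; kernel_key is a hypothetical helper. Numeric comparison matters here because a plain string sort would place 4.10.x before 4.3.x. The result only bounds where the fix landed: kernels between the two releases were not tested in this report.

```python
import re
from typing import Tuple


def kernel_key(release: str) -> Tuple[int, ...]:
    """Turn '4.10.12-200.fc25' into (4, 10, 12, 200) for numeric comparison."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


# Kernel releases and outcomes as stated in the report and its comments.
REPORTS = {
    "4.3.5-300.fc23": "ok",
    "4.4.4-301.fc23": "oops",
    "4.4.6-301.fc23": "oops",
    "4.5.6-300": "oops",
    "4.7.5-200.fc24": "oops",
    "4.10.12-200.fc25": "ok",
}

affected = [r for r, outcome in REPORTS.items() if outcome == "oops"]
last_affected = max(affected, key=kernel_key)
first_fixed = min(
    (r for r, outcome in REPORTS.items()
     if outcome == "ok" and kernel_key(r) > kernel_key(last_affected)),
    key=kernel_key,
)

print("last kernel reported affected:", last_affected)
print("first kernel reported fixed:  ", first_fixed)
```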

