Bug 886030

Summary: Device in use by a jbd2 process after unmounting an ext3 filesystem
Product: [Fedora] Fedora
Reporter: Federico Simoncelli <fsimonce>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17
CC: esandeen, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, obasan, owasserm
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-01-03 16:39:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Federico Simoncelli 2012-12-11 10:55:04 UTC
Description of problem:
Unmounting an ext3 filesystem results in a stuck jbd2 process that keeps the device in use and prevents the same filesystem from being checked (fsck) or mounted again.

Version-Release number of selected component (if applicable):
kernel-3.6.9-2.fc17.x86_64

Steps to Reproduce:
1. mount -t ext3 /dev/bc9201b2-d1db-442c-b7fd-74777ba3596f/master /mnt/tmp
2. umount /mnt/tmp
3. fsck -p /dev/bc9201b2-d1db-442c-b7fd-74777ba3596f/master
  
Actual results:
fsck.ext3: Device or resource busy while trying to open /dev/mapper/bc9201b2--d1db--442c--b7fd--74777ba3596f-master

root      1658  0.0  0.0      0     0 ?        S    03:49   0:00 [jbd2/dm-12-8]


Expected results:
The filesystem should be successfully unmounted and the device closed.
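
The ps line above can be tied back to the busy device by reading the device name out of the journal thread's name. This is a minimal sketch, not part of the original report. It assumes the thread name has the form jbd2/<device>-<journal inode>, as in jbd2/dm-12-8, which holds for a filesystem with an internal journal. The function names jbd2_thread_device and list_journaled_devices are made up for illustration.

```shell
#!/bin/bash
# Print the block device name embedded in a jbd2 kernel thread name.
# Assumption: the name looks like jbd2/<device>-<journal inode>
# (internal journal); an external journal may be named differently.
jbd2_thread_device() {
    local name=${1#jbd2/}   # drop the "jbd2/" prefix
    echo "${name%-*}"       # drop the trailing "-<inode>" suffix
}

# List the devices that currently have a journal thread running.
list_journaled_devices() {
    ps -eo comm= | grep '^jbd2/' | while read -r comm; do
        jbd2_thread_device "$comm"
    done
}
```

Running list_journaled_devices after the umount would be expected to still print dm-12 on the affected system, matching the "Device or resource busy" error from fsck.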

Comment 1 Orit Wasserman 2012-12-11 13:12:52 UTC
Where did you execute the command? On the host?

Was the guest running at the time?

Comment 2 Eric Sandeen 2012-12-11 14:30:59 UTC
How do you know jbd is stuck?  Can you do sysrq-w to see where things are stuck, if anywhere?

What does blkid /dev/bc9201b2-d1db-442c-b7fd-74777ba3596f/master say?

And maybe ls -l /dev/bc9201b2-d1db-442c-b7fd-74777ba3596f/master

What kind of device is that, is it multipath?

Anything interesting in dmesg?

Comment 3 Eric Sandeen 2012-12-13 20:29:09 UTC
We can reproduce this by:

mount something on /mnt/test
service ntpd restart
umount /mnt/test

systemtap shows us that umount exits w/ 0, but all the fs daemons are still running on it, and the blkdev is still open & busy.

If we do

service ntpd restart

once more after the umount (i.e. a 2nd restart), *then* the fs truly unmounts, down this sort of path:

process name: ntpd
process traceback:
  ntpd(24591)
Returning from:  0xffffffff8125b910 : jbd2_journal_destroy+0x0/0x250 [kernel]
Returning to  :  0xffffffff8123107c : ext4_put_super+0x6c/0x340 [kernel]
 0xffffffff81191ff1 : generic_shutdown_super+0x61/0xe0 [kernel]
 0xffffffff811920a0 : kill_block_super+0x30/0x80 [kernel]
 0xffffffff81192487 : deactivate_locked_super+0x57/0x90 [kernel]
 0xffffffff8119303e : deactivate_super+0x4e/0x70 [kernel]
 0xffffffff811ad9f7 : mntput_no_expire+0xd7/0x130 [kernel]
 0xffffffff811ada76 : mntput+0x26/0x40 [kernel]
 0xffffffff811ae655 : release_mounts+0x85/0xa0 [kernel]
 0xffffffff811b0136 : put_mnt_ns+0x86/0xa0 [kernel]
 0xffffffff810855bf : free_nsproxy+0x1f/0xb0 [kernel]
 0xffffffff81085800 : switch_task_namespaces+0x50/0x60 [kernel]
 0xffffffff81085820 : exit_task_namespaces+0x10/0x20 [kernel]
 0xffffffff81062966 : do_exit+0x466/0x8c0 [kernel]
 0xffffffff8106310f : do_group_exit+0x3f/0xa0 [kernel]
 0xffffffff81063187 : sys_exit_group+0x17/0x20 [kernel]
 0xffffffff81626fe9 : system_call_fastpath+0x16/0x1b [kernel]
jbd2_journal_destroy exits retval 0

A few observations:

1) This is not unique to ext3, or to the mountpoint in question
2) This only happens if PrivateTmp is set to true in /lib/systemd/system/ntpd.service

Comment 4 Eric Sandeen 2012-12-13 20:57:21 UTC
and

3) ntpd is the only service on the system with PrivateTmp=true

However, if I set it to true on, say, ypbind, I don't see the same behavior.
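
Observation 3 can be checked on another system with a small grep over the unit files. This is only a sketch: the function name list_private_tmp_units is hypothetical, it looks only at *.service files in a single directory (defaulting to the /lib/systemd/system path mentioned above), and it does not consider overrides installed elsewhere.

```shell
#!/bin/bash
# Print the service unit files in a directory that enable PrivateTmp.
# Usage: list_private_tmp_units [UNIT_DIR]
list_private_tmp_units() {
    local dir=${1:-/lib/systemd/system}
    # -l prints only the names of matching files. The pattern accepts
    # the lowercase spellings of a true boolean (true, yes, on, 1).
    grep -lE '^[[:space:]]*PrivateTmp[[:space:]]*=[[:space:]]*(true|yes|on|1)[[:space:]]*$' \
        "$dir"/*.service 2>/dev/null
}
```

Called with no argument, it lists the units under /lib/systemd/system; on the system described here that would be expected to show only ntpd.service.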

Comment 5 Eric Sandeen 2012-12-13 21:09:27 UTC
Steps to reproduce, with systemd-44-20.fc17.x86_64 and ntp-4.2.6p5-2.fc17.x86_64:

#!/bin/bash

umount mnt &>/dev/null
truncate --size 1g fsfile
mkfs.ext4 -F fsfile &>/dev/null
mkdir -p mnt

echo "=== loopback mount & check daemons/mnts ==="
mount -o loop fsfile mnt
ps aux | grep loop | grep jbd
mount | grep fsfile

echo "=== restart ntpd ==="
service ntpd restart
echo "=== umount loopback & check daemons/mnts again ==="
umount mnt
ps aux | grep loop | grep jbd
mount | grep fsfile

echo "=== restart ntpd again ==="
service ntpd restart
echo "=== check daemons/mnts again ==="
ps aux | grep loop | grep jbd
mount | grep fsfile

------------

After the first umount, you'll see that the mount has disappeared but the jbd daemon continues.

After the 2nd ntpd restart, the daemon on the loop disappears.
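
The trace in comment 3 shows the final unmount happening from put_mnt_ns when ntpd exits, which suggests the mount was still referenced from ntpd's private mount namespace. One way to look for such a holder, sketched here under that assumption, is to scan every process's mountinfo for the device's major:minor number. The function name find_mount_holders is hypothetical, and the second argument exists only so the function can be pointed at a fake /proc tree for testing.

```shell
#!/bin/bash
# List PIDs whose mount namespace still contains a mount of the given
# block device, identified by its "major:minor" number.
# Usage: find_mount_holders MAJ:MIN [PROC_ROOT]
find_mount_holders() {
    local devno=$1 proc_root=${2:-/proc} f pid
    for f in "$proc_root"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # Field 3 of each mountinfo line is the device's major:minor.
        if awk -v d="$devno" '$3 == d { found = 1 } END { exit !found }' \
               "$f" 2>/dev/null; then
            pid=${f%/mountinfo}     # strip the file name
            echo "${pid##*/}"       # keep only the PID directory name
        fi
    done
}
```

For the loopback reproducer, running find_mount_holders 7:0 as root after the umount (7:0 being /dev/loop0, if that is the loop device in use) would be expected to print the ntpd PID if the mount is indeed still held in its namespace.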

Comment 6 Eric Sandeen 2012-12-13 21:11:36 UTC
This bug probably needs to go to systemd or to ntp, not sure which.

Comment 7 Eric Sandeen 2012-12-13 21:44:07 UTC
Same as bug 851970, which apparently has simply abandoned F17 for this problem.

Comment 8 Josh Boyer 2013-01-03 16:39:37 UTC

*** This bug has been marked as a duplicate of bug 851970 ***