857463 – Netfs will unmount NFS partitions leaving orphaned loopback devices when mounting /dev/loop devices over iSCSI



Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.3
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	nfs-maint
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-14 13:48 UTC by Kyle Squizzato
Modified:	2018-11-30 21:39 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-11-26 12:05:59 UTC
Target Upstream Version:
Embargoed:


Description Kyle Squizzato 2012-09-14 13:48:23 UTC

Description of problem:
When a loopback device is created against a file or image on an NFS share and the netfs service is then stopped, the NFS filesystem is unmounted (it is no longer visible via mount or /proc/mounts), but the /dev/loop0 device remains, orphaned.

Version-Release number of selected component (if applicable):
initscripts-9.03.31-2.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a system with a filesystem mounted over NFS from fstab, for example /mnt/nfs
2. Create a file on that filesystem (dd if=/dev/zero bs=1024 count=5 of=/mnt/nfs/share.img)
3. Create a loopback device over that file (losetup /dev/loop0 /mnt/nfs/share.img)
4. Manually start tgtd (not from the init script)
5. Use tgtadm to set up an iSCSI device from the loopback device
6. Stop netfs (service netfs stop)
7. Run "mount" - observe that the filesystem is unmounted
8. Run "losetup --all" - observe that the loopback device is still present
9. Optional - Reboot the machine; a hang will occur.
  
Actual results:
/dev/loopX devices are orphaned.

Expected results:
/dev/loopX devices should be cleaned up when the netfs service is stopped.

Additional info:
Here's a quick example of the orphaned devices:

# service netfs stop
Unmounting NFS filesystems:  umount.nfs: /mnt/nfs: device is busy
umount.nfs: /mnt/nfs: device is busy
                                                           [FAILED]
Unmounting NFS filesystems (retry):                        [  OK  ]
# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
# losetup --all
/dev/loop0: [0016]:259301 (/mnt/nfs/share.img)

Comment 3 Kyle Squizzato 2012-09-14 14:17:41 UTC

Here's a resource that may prove useful as well. You might find this strace of vgs helpful:

https://eucalyptus.atlassian.net/secure/attachment/12009/strace-of-vgs-during-shutdown-public.png

It is attached to this bug:

https://eucalyptus.atlassian.net/browse/EUCA-3183

Comment 6 Dave Wysochanski 2012-11-20 21:46:28 UTC

That vgs strace in Comment #3 has nothing to do with this problem as far as I know.  That hang is a side-effect simply because LVM was not configured with a proper filter to reject the loop device.  You should always configure an LVM filter which rejects any device that does not contain LVM metadata.  As far as I know, the loop device is associated with an NFS file, and exported via tgtd, but does not have LVM metadata on it - please correct me if I'm wrong.

Comment 7 Dave Wysochanski 2012-11-21 23:27:47 UTC

Actually, I take that back.  The original reproducer did include LVM, but the one in this bz does not, hence the reference to vgs failing.  I'm not sure it's relevant to the bug, though, and I think that is why Kyle removed it from the reproducer to simplify.

Comment 8 Jeff Layton 2012-11-26 11:59:54 UTC

I don't think this is a kernel bug at all. If anything, it might be an initscripts bug, but it looks even more like NOTABUG to me.

Making a loop device on a file essentially just has the kernel hold it open. So it's quite true that the file is busy when the netfs script tries to unmount the filesystem the first time. The (retry) attempt then does a lazy umount. That just detaches the superblock from the filesystem hierarchy, but leaves it in place until its refcount drops. Once you destroy the loop device, then the superblock should get torn down.

So, this all looks like it's working as expected. If you create a loop device manually, why should the netfs script or kernel be responsible for cleaning that up? You should ensure that you do that before running "netfs stop".

I'm going to go ahead and NAK this bug, but feel free to reopen it if you want to discuss it further.
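
For anyone who lands here with the same setup, the cleanup described above could be scripted roughly as follows. This is only a sketch, not part of initscripts: the function names loops_backed_by and detach_loops_under are made up for illustration, and the parsing assumes the "losetup --all" output format shown in the description (device, then the backing file in parentheses). It also assumes that anything else holding the loop device open, such as the tgtd logical unit in the reproducer, has already been removed; otherwise "losetup -d" is expected to fail with a busy error.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from initscripts).

# Read `losetup --all` style lines on stdin, for example
#   /dev/loop0: [0016]:259301 (/mnt/nfs/share.img)
# and print each loop device whose backing file is under mount point $1.
loops_backed_by() {
    mnt=${1%/}    # tolerate a trailing slash on the mount point
    awk -v prefix="(${mnt}/" '{
        dev = $1
        sub(/:$/, "", dev)             # "/dev/loop0:" becomes "/dev/loop0"
        start = index($0, " (")        # backing file follows in parentheses
        if (start && index(substr($0, start + 1), prefix) == 1)
            print dev
    }'
}

# Detach every loop device backed by a file under mount point $1.
# Needs root; a device that is still in use will not be detached.
detach_loops_under() {
    losetup --all | loops_backed_by "$1" | while read -r dev; do
        losetup -d "$dev" || echo "could not detach $dev" >&2
    done
}

# Intended use, before stopping the service:
#   detach_loops_under /mnt/nfs
#   service netfs stop
```

With the loop devices detached first, the first umount attempt in "service netfs stop" should no longer find the file held open by the loop driver, so the lazy-umount retry would not be needed for that reason.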

Comment 9 RHEL Program Management 2012-11-26 12:05:59 UTC

Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
