RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1362667 - python fails when removing a directory tree located on nfs
Summary: python fails when removing a directory tree located on nfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.4
Hardware: Unspecified
OS: Linux
high
medium
Target Milestone: rc
: ---
Assignee: Pavel Moravec
QA Contact: Miroslav Hradílek
URL: https://github.com/sosreport/sos/pull...
Whiteboard:
Depends On:
Blocks: 1298243 1346768 1481910
TreeView+ depends on / blocked
 
Reported: 2016-08-02 19:48 UTC by Terry Bowling
Modified: 2020-12-14 07:39 UTC (History)
12 users (show)

Fixed In Version: sos-3.4-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1481910 (view as bug list)
Environment:
Last Closed: 2017-08-01 23:08:12 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2824591 0 None None None 2017-07-12 06:53:13 UTC
Red Hat Product Errata RHBA-2017:2203 0 normal SHIPPED_LIVE sos bug fix and enhancement update 2017-08-01 19:41:56 UTC

Description Terry Bowling 2016-08-02 19:48:19 UTC
Description of problem:

The problem occurs when python tries to remove a directory tree located on nfs, and happens as the result of how nfs deals with the deletion of in-use files.  If a file or directory is removed while it is still held open, the nfs client keeps a placeholder for the deleted file, as long as the file is still open.  This file is referred to as a 'sillyrenamed' file, has a name that begins with '.nfs', followed by a string of numbers, for example: .nfs000000000851330800000069  Once the file is closed, it disappears automatically.

When python recursively removes a directory tree, it gets a directory listing, sees these sillyrenamed files, and tries to delete them.  Because they're not regular files to be deleted, the nfs client returns an error when it's attempted, causing python to see a failure.

Version-Release number of selected component (if applicable):

RHEL 7.2
python-2.7.5-34.el7.x86_64
sos-3.2-35.el7_2.3.noarch

How reproducible:


Steps to Reproduce:
1.  mount /tmp/sosreport to a NFS share with RW access
2.  sosreport --tmp-dir=/tmp/sosreport
3.

Actual results:

Your sosreport has been generated and saved in:
  /tmp/sosreport/sosreport-hostname50-20160520104455.tar.xz

The checksum is: 35a03eb45632af8e194e338555977341

Please send this file to your support representative.

Traceback (most recent call last):
  File "/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1593, in main
    sos.execute()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1579, in execute
    self.archive.cleanup()
  File "/usr/lib/python2.7/site-packages/sos/archive.py", line 238, in cleanup
    shutil.rmtree(self._archive_root)
  File "/usr/lib64/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/tmp/sosreport/sos.kN3Ivz/sosreport-hostname50-20160520104455'

Expected results:

sosreport file would be created successfully using the NFS share for its temp files so as not to fill local disk.

Additional info:

Comment 1 Terry Bowling 2016-08-02 20:40:15 UTC
Further details of the NFS/python errors.

The problem occurs when python tries to remove a directory tree located on nfs, and happens as the result of how nfs deals with the deletion of in-use files.  If a file or directory is removed while it is still held open, the nfs client keeps a placeholder for the deleted file, as long as the file is still open.  This file is referred to as a 'sillyrenamed' file, has a name that begins with '.nfs', followed by a string of numbers, for example: .nfs000000000851330800000069  Once the file is closed, it disappears automatically.

When python recursively removes a directory tree, it gets a directory listing, sees these sillyrenamed files, and tries to delete them.  Because they're not regular files to be deleted, the nfs client returns an error when it's attempted, causing python to see a failure.

In the case of 'sosreport', the error handling restarts the cleanup process, some of which has already completed.  Because some of the directories have already been removed, and because some of the open files are now closed, python sees a _different_ error this second time through the cleanup.  In this case, it (correctly) complains because it was unable to find the directory to remove, since the directory is in fact not there.

Here are the operations just prior to completion when the temp directory is on nfs (the relevant ones, anyway):

17908 17:57:39.651326 rmdir("/tmp/sosreport/sos.uD3xcU/sosreport-asdf-20160628175736") = 0 <0.000661>

17925 17:57:41.660863 close(3</tmp/sosreport/sos.uD3xcU/tmp8D6vS6>) = 0 <0.043728>
17925 17:57:41.704689 close(4</tmp/sosreport/sos.uD3xcU/tmp8D6vS6>) = 0 <0.000037>
17925 17:57:41.704822 close(5</tmp/sosreport/sos.uD3xcU/tmp7vtkI8>) = 0 <0.000040>

17925 17:57:41.705208 close(8</tmp/sosreport/sos.uD3xcU/tmpFFEtUh>) = 0 <0.000053>
17925 17:57:41.705331 close(9</tmp/sosreport/sos.uD3xcU/tmpFFEtUh>) = 0 <0.000041>
17925 17:57:41.705439 close(10</tmp/sosreport/sos.uD3xcU/tmpWA9HP3>) = 0 <0.000045>
17925 17:57:41.705548 close(11</tmp/sosreport/sos.uD3xcU/tmpWA9HP3>) = 0 <0.000044>

17908 17:57:42.850021 unlink("/tmp/sosreport/sos.uD3xcU/tmp8D6vS6") = 0 <0.001250>
17908 17:57:42.851443 unlink("/tmp/sosreport/sos.uD3xcU/tmp7vtkI8") = 0 <0.000837>

17908 17:57:42.852442 close(9</tmp/sosreport/sos.uD3xcU/tmpFFEtUh>) = 0 <0.000018>
17908 17:57:42.852563 unlink("/tmp/sosreport/sos.uD3xcU/tmpFFEtUh") = 0 <0.000841>

17908 17:57:42.853576 close(11</tmp/sosreport/sos.uD3xcU/tmpWA9HP3>) = 0 <0.000020>
17908 17:57:42.853670 unlink("/tmp/sosreport/sos.uD3xcU/tmpWA9HP3") = 0 <0.000994>

17908 17:57:43.856727 openat(AT_FDCWD, "/tmp/sosreport/sos.uD3xcU", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 7 <0.000132>

17908 17:57:43.856967 getdents(7</tmp/sosreport/sos.uD3xcU>, {{d_ino=139539204, d_off=1, d_reclen=24, d_name=".", d_type=DT_UNKNOWN} {d_ino=778186, d_off=2, d_reclen=24, d_name="..", d_type=DT_UNKNOWN} {d_ino=139539207, d_off=3, d_reclen=48, d_name=".nfs000000000851330700000068", d_type=DT_REG} {d_ino=139539208, d_off=4, d_reclen=48, d_name=".nfs000000000851330800000069", d_type=DT_REG} {d_ino=139539215, d_off=5, d_reclen=48, d_name=".nfs000000000851330f0000006a", d_type=DT_REG} {d_ino=139539217, d_off=6, d_reclen=48, d_name=".nfs00000000085133110000006b", d_type=DT_REG}}, 32768) = 240 <0.000198>

^^^^ note the sillyrename files

17908 17:57:43.857228 getdents(7</tmp/sosreport/sos.uD3xcU>, {}, 32768) = 0 <0.000008>
17908 17:57:43.857270 close(7</tmp/sosreport/sos.uD3xcU>) = 0 <0.000009>
17908 17:57:43.857413 unlink("/tmp/sosreport/sos.uD3xcU/.nfs000000000851330700000068") = -1 EBUSY (Device or resource busy) <0.000012>

^^^^ these can't be removed

17908 17:57:43.857846 lstat("/tmp/sosreport/sos.uD3xcU/sosreport-asdf-20160628175736", 0x7ffc44ff6860) = -1 ENOENT (No such file or directory) <0.000106>
17908 17:57:43.858029 openat(AT_FDCWD, "/tmp/sosreport/sos.uD3xcU/sosreport-asdf-20160628175736", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.000243>

^^^^ the directory was removed previously

17908 17:57:43.864222 close(4</tmp/sosreport/sos.uD3xcU/.nfs000000000851330700000068>) = 0 <0.000016>

^^^^ this now gets closed, so the system will clean up the fake directory entry, and the '/tmp/sosreport/sos.uD3xcU' directory will exist, but will be empty when listed.

Comment 2 Tomas Orsava 2016-10-19 13:48:39 UTC
Hi Terry,
Thank you for a very thorough report!

I believe this is not a Python issue. NFS disks don't support deleting directories that contain currently opened files, even if the files themselves have been deleted and thus remain only in the form of the sillyrenamed .nfsXXXXX… files. Therefore the behaviour of Python's `shutil.rmtree` function is correct.

What might be possible is to handle this eventuality in `sosreport` itself, thus I'm reassigning this bug to the `sos` component. However, it remains to be seen if using NFS disks for the temp directory is a supported feature, as it is often not the case.

Comment 3 Pavel Moravec 2016-10-19 14:18:11 UTC
So the sequence is:

- sosreport calls shutil.rmtree on its working directory just after creating tar.xz archive (see /usr/lib/python2.7/site-packages/sos/sosreport.py:1453 and methods finalize & cleanup in /usr/lib/python2.7/site-packages/sos/archive.py)

- some file from that directory is still held open, thus the directory contains the .nfs* file(s)  (why this can happen that a file is still open now?)

- shutil.rmtree fails to remove them and raises exception

- in error handling, a cleanup attempt (sosreport.py:1579) is made to delete the same directory - that is gone, however - but how it can be removed if the shutil.rmtree failed on some files within this dir?

Can somebody explain the two "why?" Or did I misunderstand something?

Comment 4 Bryn M. Reeves 2016-10-19 16:34:13 UTC
> -  some file from that directory is still held open, thus the directory 
> contains the .nfs* file(s)  (why this can happen that a file is still open now?)

It's the temporary log files (pre copying to sos_logs):

17908 17:57:43.857270 close(7</tmp/sosreport/sos.uD3xcU>) = 0 <0.000009>
17908 17:57:43.857413 unlink("/tmp/sosreport/sos.uD3xcU/.nfs000000000851330700000068") = -1 EBUSY (Device or resource busy) <0.000012>

This is currently expected behaviour since the log files remain open at the point the archive is created and the temporary directory is cleaned up; this is fine on all POSIX-like local file systems, which we recommend for sos temporary storage.

There are some clear minor bugs here, from the backtrace in comment #0, and those should be fixed but we do not currently have any clear commitment to formally support NFS mounts as temporary storage for sosreport.

Comment 6 Pavel Moravec 2016-12-15 11:43:16 UTC
The traceback from #c0 should be fixed - we will do so within the scope of this BZ, can be in RHEL 7.4.

For the support of NFS filesystem on --tmp-dir, please raise separate RFE.

Comment 8 Donald Douwsma 2016-12-21 06:35:38 UTC
This is a regression in the behaviour of sos on both RHEL6 and RHEL7.

  [root@rhel7 ~]# echo |/usr/sbin/sosreport --name $(hostname) --tmp-dir  /nfs_sos/sosreport/

  sosreport (version 3.2)
  ...
  Running 78/83: vhostmd...        
  Running 79/83: x11...        
  Running 80/83: xen...        
  Running 81/83: xfs...        
  Running 82/83: xinetd...        
  Running 83/83: yum...        

  Creating compressed archive...

  Your sosreport has been generated and saved in:
    /nfs_sos/sosreport/sosreport-rhel7.localdomain.localdomain-20161221012707.tar.xz

  The checksum is: 72fcd6105c8509c73d09f79dfa265a28

  Please send this file to your support representative.

[root@rhel7 ~]# rpm -q sos
sos-3.2-35.el7.noarch

RHEL7
  sos-3.2-35.el7.noarch     OK
  sos-3.2-35.el7_2.3.noarch FAIL
  sos-3.3-4.el7.noarch      FAIL
  sos-3.3-5.el7_3.noarch    FAIL

RHEL6
  sos-2.2-68.el6.noarch     OK
  sos-3.2-28.el6.noarch     OK
  sos-3.2-28.el6_7.2.noarch FAIL 
    Note comments in https://rhn.redhat.com/errata/RHSA-2016-0152.html

Comment 10 Pavel Moravec 2016-12-21 11:13:00 UTC
I think

https://github.com/sosreport/sos/commit/4a9b919a7f1b9542a23982e49cc9035e84551e13#diff-599b5408ed192a10242eeaa5663803f0R1604

causes the regression.

Could you please (in private comment) share with me an access to some RHEL[6-7] machine with a NFS mount, where I can play with this BZ?

Comment 11 Bryn M. Reeves 2016-12-21 16:52:24 UTC
It's because the temporary log files are now inside the tmpnam directory (that was added in commit 4a9b919): previously these were in /tmp, and outside of any path that sos would call rmtree() for.

Comment 19 Pavel Moravec 2017-01-01 18:25:24 UTC
And here is a fix :)

https://github.com/sosreport/sos/pull/909

Comment 20 Pavel Moravec 2017-02-20 21:36:41 UTC
Closed #909 via 3f79120.

Comment 24 errata-xmlrpc 2017-08-01 23:08:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2203


Note You need to log in before you can comment on or make changes to this bug.