Bug 1823247

Summary: [abrt] findutils: leave_dir(): find killed by SIGABRT
Product: [Fedora] Fedora
Reporter: Peter Larsen <plarsen>
Component: findutils
Assignee: Kamil Dudka <kdudka>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 31
CC: kdudka, svashisht, vcrhonek
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Unspecified
URL: https://retrace.fedoraproject.org/faf/reports/bthash/d9b60e1f9773ed880b2bb65bfb576133efe924e0
Whiteboard: abrt_hash:d5864717700e558dab70b44a1e827a182ad549da;VARIANT_ID=workstation;
Fixed In Version: findutils-4.7.0-4.fc33 findutils-4.7.0-4.fc32 findutils-4.6.0-25.fc31
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-28 02:31:16 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
File: backtrace          none
File: core_backtrace     none
File: cpuinfo            none
File: dso_list           none
File: environ            none
File: limits             none
File: maps               none
File: mountinfo          none
File: open_fds           none
File: proc_pid_status    none

Description Peter Larsen 2020-04-13 01:36:45 UTC
Description of problem:
Running find in a "watch" loop to monitor changes. 

Version-Release number of selected component:
1:findutils-4.6.0-24.fc31

Additional info:
reporter:       libreport-2.12.0
backtrace_rating: 4
cgroup:         0::/user.slice/user-1601400001.slice/user/gnome-terminal-server.service
cmdline:        find . -type f
crash_function: leave_dir
executable:     /usr/bin/find
journald_cursor: s=0c3752e789814a58940002600f8f501b;i=436d;b=da731d6eaffa435b8e2ac8126a7e3112;m=467bab6d3;t=5a321997c16f4;x=10a195077aa90357
kernel:         5.5.15-200.fc31.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            1601400001

Truncated backtrace:
Thread no. 1 (5 frames)
 #2 leave_dir at ../../../gl/lib/fts-cycle.c:136
 #4 fts_build at ../../../gl/lib/fts.c:1390
 #5 fts_read at ../../../gl/lib/fts.c:968
 #6 find at ../../find/ftsfind.c:576
 #7 process_all_startpoints at ../../find/ftsfind.c:638

Potential duplicate: bug 1558249

Comment 1 Peter Larsen 2020-04-13 01:36:47 UTC
Created attachment 1678354
File: backtrace

Comment 2 Peter Larsen 2020-04-13 01:36:48 UTC
Created attachment 1678355
File: core_backtrace

Comment 3 Peter Larsen 2020-04-13 01:36:49 UTC
Created attachment 1678356
File: cpuinfo

Comment 4 Peter Larsen 2020-04-13 01:36:50 UTC
Created attachment 1678357
File: dso_list

Comment 5 Peter Larsen 2020-04-13 01:36:51 UTC
Created attachment 1678358
File: environ

Comment 6 Peter Larsen 2020-04-13 01:36:52 UTC
Created attachment 1678359
File: limits

Comment 7 Peter Larsen 2020-04-13 01:36:53 UTC
Created attachment 1678360
File: maps

Comment 8 Peter Larsen 2020-04-13 01:36:54 UTC
Created attachment 1678361
File: mountinfo

Comment 9 Peter Larsen 2020-04-13 01:36:55 UTC
Created attachment 1678362
File: open_fds

Comment 10 Peter Larsen 2020-04-13 01:36:56 UTC
Created attachment 1678363
File: proc_pid_status

Comment 11 Kamil Dudka 2020-04-14 11:24:51 UTC
Is the crash reproducible?  If yes, could you please try to run find with the -noleaf option?

Comment 12 Peter Larsen 2020-04-14 15:24:53 UTC
(In reply to Kamil Dudka from comment #11)
> Is the crash reproducible?  If yes, could you please try to run find with
> the -noleaf option?

Not easily. It was just a `watch -n1 "find . -type f | wc -l"` loop. It ran for over an hour, and only one of those runs resulted in the crash.  I've used find for more than two decades and cannot remember ever seeing it dump core or segfault before.

I suspect a race condition, i.e. find may have been using a directory inode as that inode was being removed by another process. Pure guesswork, though.
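
Because only one run in roughly an hour crashed, a loop that records how each run ended may be more useful than watching the output. The following is a minimal sketch, not something used in this report: the helper name `run_until_signal` and its parameters are hypothetical. It relies on Python's `subprocess` module reporting a child killed by signal N as return code -N.

```python
import signal
import subprocess
import time


def run_until_signal(cmd, max_runs=3600, delay=1.0, cwd=None):
    """Run cmd repeatedly (hypothetical helper).

    Returns (run_number, signal_name) for the first run that was killed
    by a signal, or None if no run was killed within max_runs.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(cmd, cwd=cwd,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        # On POSIX, a negative return code -N means "killed by signal N".
        if proc.returncode < 0:
            return run, signal.Signals(-proc.returncode).name
        time.sleep(delay)
    return None


# Example call, mirroring the command line from the report:
#   run_until_signal(["find", ".", "-type", "f"], cwd="/path/to/installdir")
# To test the suggested option, use ["find", ".", "-noleaf", "-type", "f"].
```

A result such as `(N, "SIGABRT")` would indicate that run N aborted; `None` means no crash was observed within `max_runs`.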

Comment 13 Kamil Dudka 2020-04-14 16:08:33 UTC
(In reply to Peter Larsen from comment #12)
> I suspect a race condition - ie. find may have been using a directory inode
> as that inode was being removed by another process. Pure guess work though.

This should not result in a crash under normal circumstances.

What is known to break is the leaf optimization on less commonly used file systems.  The optimization allows find to traverse a directory tree recursively without calling stat() on all its nodes to check whether they are directories or not.  But the optimization works properly only on file systems that report link counts properly, which CIFS, for example, does not.  find (more precisely gnulib's FTS module) maintains a white-list and a black-list of file system types to enable or disable the leaf optimization automatically.  The -noleaf option of find can be used on Fedora to explicitly disable the optimization to ease debugging of such cases.  See bug #1558249 for an example of a similar bug that was fixed recently.
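
To illustrate the idea behind the leaf optimization: on file systems that follow the traditional Unix convention, a directory's link count is 2 plus the number of its subdirectories, so a traversal can stop looking for subdirectories once it has seen that many. The sketch below is a simplified model under that assumption, not gnulib's FTS code; all function names are hypothetical. It compares the count implied by `st_nlink` with the count obtained by calling `lstat()` on every entry.

```python
import os
import stat


def subdirs_implied_by_link_count(nlink):
    """Subdirectory count implied by a directory's link count.

    Assumes the traditional convention nlink == 2 + subdirectories.
    Returns None when nlink < 2, i.e. the count carries no usable hint.
    """
    return nlink - 2 if nlink >= 2 else None


def count_subdirs_with_lstat(path):
    """Slow path: lstat() every entry and count the directories."""
    count = 0
    for name in os.listdir(path):
        if stat.S_ISDIR(os.lstat(os.path.join(path, name)).st_mode):
            count += 1
    return count


def link_count_matches(path):
    """True or False if the link count agrees or disagrees with the real
    number of subdirectories; None if the link count gives no hint."""
    implied = subdirs_implied_by_link_count(os.lstat(path).st_nlink)
    if implied is None:
        return None
    return implied == count_subdirs_with_lstat(path)
```

A `False` result on a quiescent directory would suggest a file system whose link counts cannot be relied on for this shortcut. On a tree that is being modified concurrently, a mismatch can also simply mean the directory changed between the two measurements.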

Comment 14 Peter Larsen 2020-04-14 18:09:12 UTC
(In reply to Kamil Dudka from comment #13)
> (In reply to Peter Larsen from comment #12)
> > I suspect a race condition - ie. find may have been using a directory inode
> > as that inode was being removed by another process. Pure guess work though.
> 
> This should not result in a crash under normal circumstances.
> 
> What is known to break is the leaf optimization on less commonly used file
> systems.  The optimization allows to traverse a directory tree recursively
> without calling stat() on all its nodes to check whether they are
> directories or not.  But the optimization works properly only on file
> systems that report link counts properly, which for example CIFS is not. 
> find (more precisely gnulib's FTS module) maintains a white-list and a
> black-list of file system types to enable or disable the leaf optimization
> automatically.  The -noleaf option of find can be used on Fedora to
> explicitly disable the optimization to ease debugging of such cases.  See
> bug #1558249 for an example of similar bug that was fixed recently.

Ok - this was done on XFS - standard /home LVM volume on a clean Fedora 31 install. The process running was creating/altering about 65000 files under a base directory of the home directory. Since it's a long process I had "find" list all the files and counted them so I could trace what was happening.  Definitely not using CIFS.  Note the install program ran from an NFS share, but find only monitored the destination directory.

I'll try this install again this weekend on a new VM. Unfortunately this software requires about 150GB of disk space to install, so I first have to see if I have that space somewhere. If -noleaf works, that solves the issue as far as I am concerned. You're welcome to try this yourself: the software in question can be downloaded for free here: https://www.xilinx.com/member/forms/download/xef.html?filename=Xilinx_Unified_2019.2_1106_2127_Lin64.bin but it's obviously not part of Fedora. I give it a destination directory under $HOME and then run find in a separate terminal against this directory while the install is ongoing.

Comment 15 Kamil Dudka 2020-04-15 11:00:54 UTC
Did find operate on a single file system?  If not, was there any automount involved?

I remember a similar bug report where a recursive bind mount triggered by automount crashed find: bug #1188498

Comment 16 Peter Larsen 2020-04-15 14:10:57 UTC
(In reply to Kamil Dudka from comment #15)
> Did find operate on a single file system?  If not, was there any automount
> involved?
> 
> I remember a similar bug report where a recursive bind mount triggered by
> automount crashed find: bug #1188498

Single file system $HOME/installdir - not using FUSE or anything like that.  The install process (not find) read from NFS and wrote to $HOME/installdir.  Note, investigating the directory shows a ton of symbolic and hard links - it looks like all the links point to files within this structure, and none point to system-wide locations on different file systems.
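
One way to double-check the single-file-system answer is to collect the device numbers (`st_dev`) of all directories in the tree; more than one value would mean a mount point is crossed somewhere. This is a minimal sketch with a hypothetical helper name, not a tool used in this report.

```python
import os


def directory_devices(root):
    """Return the set of st_dev values seen on root and on every directory
    entry below it (symlinks to directories are lstat()ed, not followed)."""
    devices = {os.lstat(root).st_dev}
    for dirpath, dirnames, _ in os.walk(root, followlinks=False):
        for name in dirnames:
            try:
                devices.add(os.lstat(os.path.join(dirpath, name)).st_dev)
            except FileNotFoundError:
                # Entry vanished between listing and lstat(); expected while
                # another process is still modifying the tree.
                pass
    return devices
```

For example, `len(directory_devices(os.path.expanduser("~/installdir")))` evaluating to 1 would be consistent with the tree living on a single file system.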

Comment 17 Kamil Dudka 2020-04-16 13:26:33 UTC
There was a similar bug report upstream this week and they have an experimental patch for it:

    https://lists.gnu.org/archive/html/bug-gnulib/2020-04/msg00069.html

I will build test packages with the above patch applied...

Comment 18 Kamil Dudka 2020-04-16 16:53:56 UTC
Experimental packages of findutils with the upstream patch applied are available in the following copr:

    https://copr.fedorainfracloud.org/coprs/kdudka/findutils-rhbz1823247/

Comment 19 Kamil Dudka 2020-04-17 16:28:02 UTC
dist-git commit: https://src.fedoraproject.org/rpms/findutils/c/d84db4f6

Comment 20 Fedora Update System 2020-04-23 14:43:53 UTC
FEDORA-2020-b1c7f64b0b has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-b1c7f64b0b

Comment 21 Fedora Update System 2020-04-23 14:43:55 UTC
FEDORA-2020-4ff071d8e5 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4ff071d8e5

Comment 22 Fedora Update System 2020-04-23 20:46:17 UTC
FEDORA-2020-4ff071d8e5 has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4ff071d8e5`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4ff071d8e5

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 Fedora Update System 2020-04-25 04:18:46 UTC
FEDORA-2020-b1c7f64b0b has been pushed to the Fedora 31 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-b1c7f64b0b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-b1c7f64b0b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 24 Fedora Update System 2020-04-28 02:31:16 UTC
FEDORA-2020-4ff071d8e5 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 25 Fedora Update System 2020-05-10 04:50:39 UTC
FEDORA-2020-b1c7f64b0b has been pushed to the Fedora 31 stable repository.
If the problem still persists, please make note of it in this bug report.