Bug 2086693

Summary: nfs-utils-2.6.1-2.rc4.fc36: busyloop in nfsrahead
Product: [Fedora] Fedora Reporter: Damian Wrobel <dwrobel>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 36CC: bfields, luk.claes, mironov.ivan, steved, tbecker
Target Milestone: ---   
Target Release: ---   
Hardware: armv7hl   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-20 13:05:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Patch for fixing busy loop in nfsrahead none

Description Damian Wrobel 2022-05-16 13:20:54 UTC
Created attachment 1880162 [details]
Patch for fixing busy loop in nfsrahead

On a fresh install of Fedora 36 systemd-udev-settle.service takes much time in comparison to earlier Fedora releases:

$ systemd-analyze blame | head -n2
2min 48ms systemd-udev-settle.service
  53.688s NetworkManager-wait-online.service

Further analyses shown that systemd-udev-settle.service invokes nfsrahead:

$ ps aux | grep nfs
root        79  0.0  0.0      0     0 ?        I<   21:21   0:00 [nfsiod]
root       585 91.3  0.0   1872   744 ?        R    21:25   0:11 /usr/libexec/nfsrahead btrfs-1
dw         856  0.0  0.1   7860  1856 pts/0    S+   21:25   0:00 grep --color=auto nfs

which seems to consume 100% of cpu:

$ top
top - 21:27:50 up 6 min,  2 users,  load average: 4.42, 4.74, 2.38
Tasks: 168 total,   2 running, 165 sleeping,   1 stopped,   0 zombie
%Cpu0  : 78.0 us,  3.4 sy,  0.0 ni,  1.7 id, 16.9 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 98.4 us,  1.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 69.1 us, 16.4 sy,  0.0 ni, 14.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    920.4 total,      8.2 free,    413.9 used,    498.2 buff/cache
MiB Swap:    920.0 total,    919.5 free,      0.5 used.    490.3 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
  631 openhab   20   0  456896 235780  15848 S 242.6  25.0   4:31.53 java       
  585 root      20   0    1872    744    684 R 100.0   0.1   2:41.06 nfsrahead  

Under gdb it looks like it busy loops in getopt():

$ gdb /usr/libexec/nfsrahead 
GNU gdb (GDB) Fedora 11.2-3.fc36
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7hl-redhat-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/nfsrahead...
Reading symbols from /usr/lib/debug/usr/libexec/nfsrahead-2.6.1-2.rc4.fc36.arm.debug...
(gdb) bt
No stack.
(gdb) r
Starting program: /usr/libexec/nfsrahead 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGINT, Interrupt.
0x76f16e8c in _getopt_internal (argc=1, argv=0x7efffa14, optstring=0x4006c4 "dF", longopts=longopts@entry=0x0, longind=longind@entry=0x0, long_only=long_only@entry=0, posixly_correct=posixly_correct@entry=0) at getopt.c:708
708	  getopt_data.optind = optind;
(gdb) bt
#0  0x76f16e8c in _getopt_internal (argc=1, argv=0x7efffa14, 
    optstring=0x4006c4 "dF", longopts=longopts@entry=0x0, 
    longind=longind@entry=0x0, long_only=long_only@entry=0, 
    posixly_correct=posixly_correct@entry=0) at getopt.c:708
#1  0x76f16f38 in getopt (argc=<optimized out>, argv=<optimized out>, 
    optstring=<optimized out>) at getopt.c:735
#2  0x004004b8 in main (argc=1, argv=0x7efffa14) at main.c:145
(gdb) n
709	  getopt_data.opterr = opterr;
(gdb) 
711	  result = _getopt_internal_r (argc, argv, optstring, longopts,
(gdb) 
715	  optind = getopt_data.optind;
(gdb) 
716	  optarg = getopt_data.optarg;
(gdb) 
717	  optopt = getopt_data.optopt;
(gdb) 
719	  return result;
(gdb) 
main (argc=1, argv=0x7efffa14) at main.c:146
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 
145		while((opt = getopt(argc, argv, "dF")) != -1) {
(gdb) 
146			switch (opt) {
(gdb) 

with the attached patch the boot time becomes again comparable to earlier Fedora releases:

$ systemd-analyze blame | head -n4
11.213s grafana-server.service
 8.106s firewalld.service
 7.033s NetworkManager-wait-online.service
 5.326s systemd-udev-settle.service

More precisely the systemd-udev-settle.service is executed by ~2min faster during the boot.

Versions involved:
$ rpm -qv systemd nfs-utils gcc
systemd-250.3-8.fc36.armv7hl
nfs-utils-2.6.1-2.rc4.fc36.armv7hl
gcc-12.1.1-1.fc36.armv7hl

Comment 1 Damian Wrobel 2022-05-16 13:26:53 UTC
Based on [1], nfsrahead is a tool for configuring "NFS mounts", so the question is why it is being invoked with "btrfs-1" parameter?
Also the machine doesn't have any nfs mounts configured, so one might expect that there won't be any need to execute such a tool during a boot process.

[1] http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfsrahead/nfsrahead.man;h=5488f63359399e7e3e2896b0f1d1f7253693ef00;hb=18ac9c85d49d081dcfac8ba6ca63b4adc07c8ac6#l15

Comment 2 Ivan Mironov 2022-05-16 17:03:02 UTC
Seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2083926

Comment 3 Thiago Rafael Becker 2022-05-20 13:05:08 UTC
Indeed it is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2083926.

getopt returns an int and we were using a char for the return type. This is usually harmless, unless we are compiling to arm architectures in which char is unsigned. When we hit this combination, the compiler optimizes the test condition `getopt(argc, argv, "dF") != -1` to true, which results in the infinite loop. 

Thanks for the analysis. Marking it as duplicate of 2083926.

*** This bug has been marked as a duplicate of bug 2083926 ***