Bug 1912106 - Using -hosts option does not resolve host from /etc/hosts and mount fails
Summary: Using -hosts option does not resolve host from /etc/hosts and mount fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: autofs
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Ian Kent
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On:
Blocks: 1948956
Reported: 2021-01-03 14:27 UTC by Lukas Herbolt
Modified: 2021-11-10 07:14 UTC (History)

Fixed In Version: autofs-5.1.4-66.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1948956 (view as bug list)
Environment:
Last Closed: 2021-11-09 19:32:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
coredump -d (4.70 MB, application/x-lz4)
2021-01-11 08:52 UTC, Lukas Herbolt
coredump without -d flag (4.74 MB, application/x-lz4)
2021-01-11 08:53 UTC, Lukas Herbolt
sosreport (11.54 MB, application/x-xz)
2021-01-11 15:05 UTC, Lukas Herbolt


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:4372 0 None None None 2021-11-09 19:32:58 UTC

Description Lukas Herbolt 2021-01-03 14:27:09 UTC
Description of problem: When using -hosts in auto.master the hosts are not resolved if they are specified in /etc/hosts. 


Version-Release number of selected component (if applicable):
[root@fastvm-rhel-8-3-22 ~]# rpm -q autofs
autofs-5.1.4-43.el8.x86_64


How reproducible:
Every time

Steps to Reproduce:
1. Create a record in /etc/hosts.
2. Set up automount using -hosts.
3. cd <automounted directory>

Actual results:
cd hangs 

Expected results:
name is resolved and records are created

Additional info:

My settings for reproducer:
automount should be using just files in /etc 

[root@fastvm-rhel-8-3-22 ~]# grep auto /etc/nsswitch.conf
automount:  files

[root@fastvm-rhel-8-3-22 ~]# grep filer /etc/hosts 
192.168.11.20 filer filer.example.com

# NOTE: mounts done from a hosts map will be mounted with the
/net	-hosts



Autofs debug logs:
===================
[root@fastvm-rhel-8-3-22 ~]# ping filer
PING filer (192.168.11.20) 56(84) bytes of data.
64 bytes from filer (192.168.11.20): icmp_seq=1 ttl=64 time=0.382 ms
^C
--- filer ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.382/0.382/0.382/0.000 ms
[root@fastvm-rhel-8-3-22 ~]# journalctl -f -u autofs.service
-- Logs begin at Sun 2021-01-03 15:06:28 CET. --
Jan 03 15:18:13 fastvm-rhel-8-3-22 automount[1989]: st_expire: state 1 path /net
Jan 03 15:18:13 fastvm-rhel-8-3-22 automount[1989]: expire_proc: exp_proc = 139890338412288 path /net
Jan 03 15:18:13 fastvm-rhel-8-3-22 automount[1989]: expire_cleanup: got thid 139890338412288 path /net stat 1
Jan 03 15:18:13 fastvm-rhel-8-3-22 automount[1989]: expire_cleanup: sigchld: exp 139890338412288 finished, switching from 2 to 1
Jan 03 15:18:13 fastvm-rhel-8-3-22 automount[1989]: st_ready: st_ready(): state = 2 path /net
Jan 03 15:18:14 fastvm-rhel-8-3-22 automount[1989]: st_expire: state 1 path /misc
Jan 03 15:18:14 fastvm-rhel-8-3-22 automount[1989]: expire_proc: exp_proc = 139890338412288 path /misc
Jan 03 15:18:14 fastvm-rhel-8-3-22 automount[1989]: expire_cleanup: got thid 139890338412288 path /misc stat 0
Jan 03 15:18:14 fastvm-rhel-8-3-22 automount[1989]: expire_cleanup: sigchld: exp 139890338412288 finished, switching from 2 to 1
Jan 03 15:18:14 fastvm-rhel-8-3-22 automount[1989]: st_ready: st_ready(): state = 2 path /misc
Jan 03 15:18:37 fastvm-rhel-8-3-22 automount[1989]: handle_packet: type = 3
Jan 03 15:18:37 fastvm-rhel-8-3-22 automount[1989]: handle_packet_missing_indirect: token 2, name filer, request pid 2052
Jan 03 15:18:37 fastvm-rhel-8-3-22 automount[1989]: attempting to mount entry /net/filer
Jan 03 15:18:37 fastvm-rhel-8-3-22 automount[1989]: lookup_mount: lookup(hosts): filer -> (null)
Jan 03 15:18:37 fastvm-rhel-8-3-22 automount[1989]: get_exports: lookup(hosts): fetchng export list for filer
Jan 03 15:18:37 fastvm-rhel-8-3-22 systemd[1]: autofs.service: Main process exited, code=killed, status=11/SEGV
Jan 03 15:18:37 fastvm-rhel-8-3-22 systemd[1]: autofs.service: Failed with result 'signal'.

Comment 1 Ian Kent 2021-01-05 00:04:50 UTC
I can't reproduce this with the steps above.

But I don't think the 8.3 iso I used was the final release.
I'll try again with that.

Comment 2 Ian Kent 2021-01-11 02:42:58 UTC
(In reply to Ian Kent from comment #1)
> I can't reproduce this with the steps above.
> 
> But I don't think the 8.3 iso I used was the final release.
> I'll try again with that.

I can't reproduce this on a released RHEL-8.3 install either.

I think the only thing we can do is for you to get a gdb backtrace
for me to have a look at.

That means, to get a meaningful backtrace, the autofs debuginfo and
debugsource packages would need to be installed and the gdb bt command
run on the core.

Assuming gdb is installed, those two debug packages are also
installed, and the most recent crash was automount, you should be
able to do:

coredumpctl debug

and then issue the gdb "bt" command to get the backtrace.
Could you try this please?

Comment 3 Lukas Herbolt 2021-01-11 08:51:43 UTC
It's surprising you cannot reproduce it; I had the same results on 8.3 and 7.8.
It looks like mounting over -hosts never worked.


Running autofs with debuginfo installed and the -d option results in a coredump:

Jan 11 09:27:44 fastvm-rhel-8-3-22 systemd[1]: systemd-coredump: Succeeded.
Jan 11 09:27:44 fastvm-rhel-8-3-22 systemd-coredump[9158]: Process 9146 (automount) of user 0 dumped core.
                                                           
                                                           Stack trace of thread 9156:
                                                           #0  0x00007f054bd93b29 _int_malloc (libc.so.6)
                                                           #1  0x00007f054bd96076 __libc_calloc (libc.so.6)
                                                           #2  0x00007f054d5820d0 xdr_string (libtirpc.so.3)
                                                           #3  0x00007f054810f082 xdr_name (lookup_hosts.so)
                                                           #4  0x00007f054810f165 xdr_groupnode (lookup_hosts.so)
                                                           #5  0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #6  0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #7  0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #8  0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #9  0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #10 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #11 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #12 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #13 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #14 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #15 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #16 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #17 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #18 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #19 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #20 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #21 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #22 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #23 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #24 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #25 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #26 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #27 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #28 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #29 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #30 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #31 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #32 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #33 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #34 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #35 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #36 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #37 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #38 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #39 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #40 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #41 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #42 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #43 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #44 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #45 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #46 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #47 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #48 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #49 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #50 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #51 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #52 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
                                                           #53 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #54 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #55 0x00007f054810f139 xdr_groups (lookup_hosts.so)
                                                           #56 0x00007f054810f1f4 xdr_exportnode (lookup_hosts.so)
                                                           #57 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #58 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #59 0x00007f054810f1a9 xdr_exports (lookup_hosts.so)
                                                           #60 0x00007f054810f204 xdr_exportnode (lookup_hosts.so)
                                                           #61 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
                                                           #62 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
                                                           #63 0x00007f054810f1a9 xdr_exports (lookup_hosts.so)
                                                           
                                                           Stack trace of thread 9152:
                                                           #0  0x00007f054be00ca1 __poll (libc.so.6)
                                                           #1  0x00005627deaf16b5 poll (automount)
                                                           #2  0x00005627deaf33c6 handle_mounts (automount)
                                                           #3  0x00007f054d79e14a start_thread (libpthread.so.0)
                                                           #4  0x00007f054be0bf23 __clone (libc.so.6)
                                                           
                                                           Stack trace of thread 9155:
                                                           #0  0x00007f054be00ca1 __poll (libc.so.6)
                                                           #1  0x00005627deaf16b5 poll (automount)
                                                           #2  0x00005627deaf33c6 handle_mounts (automount)
                                                           #3  0x00007f054d79e14a start_thread (libpthread.so.0)
                                                           #4  0x00007f054be0bf23 __clone (libc.so.6)
                                                           
                                                           Stack trace of thread 9148:
                                                           #0  0x00007f054d7a46e8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                                                           #1  0x00005627deb0aa60 alarm_handler (automount)
                                                           #2  0x00007f054d79e14a start_thread (libpthread.so.0)
                                                           #3  0x00007f054be0bf23 __clone (libc.so.6)
                                                           
                                                           Stack trace of thread 9149:
                                                           #0  0x00007f054d7a42fc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                           #1  0x00005627deafe95f st_queue_handler (automount)
                                                           #2  0x00007f054d79e14a start_thread (libpthread.so.0)
                                                           #3  0x00007f054be0bf23 __clone (libc.so.6)
                                                           
                                                           Stack trace of thread 9146:
                                                           #0  0x00007f054bd475cc __sigtimedwait (libc.so.6)
                                                           #1  0x00007f054d7a86ac sigwait (libpthread.so.0)
                                                           #2  0x00005627deaefd86 statemachine (automount)
                                                           #3  0x00007f054bd327b3 __libc_start_main (libc.so.6)
                                                           #4  0x00005627deaf042e _start (automount)
Jan 11 09:27:43 fastvm-rhel-8-3-22 systemd[1]: autofs.service: Failed with result 'signal'.
Jan 11 09:27:43 fastvm-rhel-8-3-22 systemd[1]: autofs.service: Main process exited, code=killed, status=11/SEGV
Jan 11 09:27:43 fastvm-rhel-8-3-22 systemd[1]: Started Process Core Dump (PID 9157/UID 0).
Jan 11 09:27:43 fastvm-rhel-8-3-22 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Jan 11 09:27:43 fastvm-rhel-8-3-22 kernel: Code: 00 4c 89 f7 e8 f8 da ff ff e9 d7 fc ff ff 0f 1f 00 4c 89 f8 4c 89 fb 4c 89 ff 4c 89 f9 48 c1 e8 06 48 c1 eb 09 ba 02 00 00 00 <44> 89 64 24 28 48 89 44 24 48 48 c1 ef 0c 83 c0 30 48 c1 e9 0f 89
Jan 11 09:27:43 fastvm-rhel-8-3-22 kernel: automount[9156]: segfault at 7f05435befc8 ip 00007f054bd93b29 sp 00007f05435befa0 error 6 in libc-2.28.so[7f054bd0f000+1b9000]
Jan 11 09:27:42 fastvm-rhel-8-3-22 automount[9146]: get_exports: lookup(hosts): fetchng export list for filer
Jan 11 09:27:42 fastvm-rhel-8-3-22 automount[9146]: lookup_mount: lookup(hosts): filer -> (null)
Jan 11 09:27:42 fastvm-rhel-8-3-22 automount[9146]: attempting to mount entry /net/filer
Jan 11 09:27:42 fastvm-rhel-8-3-22 automount[9146]: handle_packet_missing_indirect: token 1, name filer, request pid 8543
Jan 11 09:27:42 fastvm-rhel-8-3-22 automount[9146]: handle_packet: type = 3

The same coredump happened when running without -d.
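
The repetition in the trace of thread 9156 is easier to see when the frames are counted per function. The following is a minimal sketch and not part of the original report: `FRAME_RE` and `summarize_frames` are hypothetical names, and the regular expression assumes the systemd-coredump frame layout shown above (`#N address function (module)`). Note that the listing above ends at frame #63 while still inside xdr_exports, so the counts describe only the visible part of the stack.

```python
import re
from collections import Counter

# Assumed frame layout, taken from the systemd-coredump output above:
#   "#3  0x00007f054810f082 xdr_name (lookup_hosts.so)"
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+)\)")


def summarize_frames(trace_text):
    """Count how often each (function, module) pair appears in a trace."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            _, func, module = match.groups()
            counts[(func, module)] += 1
    return counts


if __name__ == "__main__":
    # A few frames copied from the trace above; paste the full text to
    # summarize the whole thread.
    sample = """
    #3  0x00007f054810f082 xdr_name (lookup_hosts.so)
    #4  0x00007f054810f165 xdr_groupnode (lookup_hosts.so)
    #5  0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
    #6  0x00007f054d583619 xdr_pointer (libtirpc.so.3)
    #7  0x00007f054810f139 xdr_groups (lookup_hosts.so)
    #8  0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
    """
    for (func, module), n in summarize_frames(sample).most_common():
        print(f"{n:3d}  {func} ({module})")
```

Running it over the full trace should show the same four functions (xdr_groupnode, xdr_reference, xdr_pointer, xdr_groups) accounting for most of the frames.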

Comment 4 Lukas Herbolt 2021-01-11 08:52:19 UTC
Created attachment 1746191 [details]
coredump -d

Comment 5 Lukas Herbolt 2021-01-11 08:53:19 UTC
Created attachment 1746192 [details]
coredump without -d flag

Comment 6 Ian Kent 2021-01-11 13:15:43 UTC
(In reply to Lukas Herbolt from comment #3)
> It's surprising you cannot reproduce it; I had the same results on 8.3 and 7.8.
> It looks like mounting over -hosts never worked.

You gave me coredumps without enough information to construct a system
that I can use to look at them.

That's why I asked you for the backtrace.

> 
> 
> Running autofs with debuginfo installed and the -d option results in a coredump:

Which isn't really useful.

This is more or less within libtirpc and I'm not sure what it's doing TBH.

There's no line number information at all, it looks like backtraces
are within libtirpc so the libtirpc debug packages would need to be
installed as well.

> 
> Jan 11 09:27:44 fastvm-rhel-8-3-22 systemd[1]: systemd-coredump: Succeeded.
> Jan 11 09:27:44 fastvm-rhel-8-3-22 systemd-coredump[9158]: Process 9146 (automount) of user 0 dumped core.
>
> Stack trace of thread 9156:
> #0  0x00007f054bd93b29 _int_malloc (libc.so.6)
> #1  0x00007f054bd96076 __libc_calloc (libc.so.6)
> #2  0x00007f054d5820d0 xdr_string (libtirpc.so.3)
> #3  0x00007f054810f082 xdr_name (lookup_hosts.so)
> #4  0x00007f054810f165 xdr_groupnode (lookup_hosts.so)
> #5  0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #6  0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #7  0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #8  0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #9  0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #10 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #11 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #12 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #13 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #14 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #15 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #16 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #17 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #18 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #19 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #20 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #21 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #22 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #23 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #24 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #25 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #26 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #27 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #28 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #29 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #30 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #31 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #32 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #33 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #34 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #35 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #36 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #37 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #38 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #39 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #40 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #41 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #42 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #43 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #44 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #45 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #46 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #47 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #48 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #49 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #50 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #51 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #52 0x00007f054810f175 xdr_groupnode (lookup_hosts.so)
> #53 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #54 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #55 0x00007f054810f139 xdr_groups (lookup_hosts.so)
> #56 0x00007f054810f1f4 xdr_exportnode (lookup_hosts.so)
> #57 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #58 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #59 0x00007f054810f1a9 xdr_exports (lookup_hosts.so)
> #60 0x00007f054810f204 xdr_exportnode (lookup_hosts.so)
> #61 0x00007f054d5834f4 xdr_reference (libtirpc.so.3)
> #62 0x00007f054d583619 xdr_pointer (libtirpc.so.3)
> #63 0x00007f054810f1a9 xdr_exports (lookup_hosts.so)

All that automount does is set up an information structure with
the host name and various connection parameters and then call
the MOUNTPROC_EXPORT RPC. The functions above are standard
libtirpc xdr functions called by code generated from an RPC
definition. They are essentially called by RPC callbacks
used by libtirpc.

That libtirpc xdr_string function preceding the crash shouldn't
fault; it should return a failure.

How large is the exports list and are there any netgroups and
if so how many?

To progress further on this I will need more information about
the system it's failing on so I can construct an equivalent one to
look at the core dumps.

Comment 8 Ian Kent 2021-01-11 13:49:48 UTC
(In reply to Ian Kent from comment #6)
> 
> How large is the exports list and are there any netgroups and
> if so how many?

Actually, now that I think of it, if there are netgroups here, how
many entries are there in each of the netgroup groups?

IIRC there is a limit on the number of entries in each and if there
are too many entries in the netgroup it needs to be divided into
smaller groups and the subgroups added to the main group obeying
the limit in every group.

It was a long time ago now but that limit might have been around
16 ... not sure now.

Comment 9 Lukas Herbolt 2021-01-11 15:05:01 UTC
The NFS server is exporting about 31k mounts. I'm not sure what you mean by netgroup.

RHEL 8.3 NFS server:

[root@fastvm-rhel-8-3-20 ~]# head -2 /etc/exports; echo ...; tail -2 /etc/exports
/mnt/nfs-00000 192.168.0.0/24(rw) 192.168.1.0/24(rw) 192.168.2.0/24(rw) 192.168.3.0/24(rw) 192.168.4.0/24(rw) 192.168.5.0/24(rw) 192.168.6.0/24(rw) 192.168.7.0/24(rw) 192.168.8.0/24(rw) 192.168.9.0/24(rw) 192.168.10.0/24(rw) 192.168.11.0/24(rw) 192.168.12.0/24(rw) 
/mnt/nfs-00001 192.168.0.0/24(rw) 192.168.1.0/24(rw) 192.168.2.0/24(rw) 192.168.3.0/24(rw) 192.168.4.0/24(rw) 192.168.5.0/24(rw) 192.168.6.0/24(rw) 192.168.7.0/24(rw) 192.168.8.0/24(rw) 192.168.9.0/24(rw) 192.168.10.0/24(rw) 192.168.11.0/24(rw) 192.168.12.0/24(rw) 
...
/mnt/nfs-30999 192.168.0.0/24(rw) 192.168.1.0/24(rw) 192.168.2.0/24(rw) 192.168.3.0/24(rw) 192.168.4.0/24(rw) 192.168.5.0/24(rw) 192.168.6.0/24(rw) 192.168.7.0/24(rw) 192.168.8.0/24(rw) 192.168.9.0/24(rw) 192.168.10.0/24(rw) 192.168.11.0/24(rw) 192.168.12.0/24(rw) 
/mnt/nfs-31000 192.168.0.0/24(rw) 192.168.1.0/24(rw) 192.168.2.0/24(rw) 192.168.3.0/24(rw) 192.168.4.0/24(rw) 192.168.5.0/24(rw) 192.168.6.0/24(rw) 192.168.7.0/24(rw) 192.168.8.0/24(rw) 192.168.9.0/24(rw) 192.168.10.0/24(rw) 192.168.11.0/24(rw) 192.168.12.0/24(rw) 


When I tried with just 10, 1000, 10 exports it worked, but with 20k it segfaults.
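
For anyone rebuilding a similar test server, an exports file of this shape can be generated rather than written by hand. This is a minimal sketch and not the script used for the report: it assumes the numbering (nfs-00000 through nfs-31000) and the 13 client subnets visible in the excerpt above continue unchanged through the elided middle of the file, and `export_line` and `make_exports` are illustrative names.

```python
def export_line(index, subnets=13):
    """Build one /etc/exports line in the format shown in the excerpt."""
    # Assumption: every export lists 192.168.0.0/24 .. 192.168.12.0/24 (rw).
    clients = " ".join(f"192.168.{n}.0/24(rw)" for n in range(subnets))
    return f"/mnt/nfs-{index:05d} {clients}"


def make_exports(count):
    """Return `count` export lines, numbered from 00000."""
    return [export_line(i) for i in range(count)]


if __name__ == "__main__":
    # 31001 entries covers nfs-00000 through nfs-31000.
    lines = make_exports(31001)
    print(lines[0])
    print("...")
    print(lines[-1])
```

Varying the count passed to `make_exports` makes it straightforward to look for the export count at which the failure starts. The exported directories would still need to exist on the server before the file is usable.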


>>> You gave me coredumps without enough information to construct a system
>>> that I can use to look at them.

>>> That's why I asked you for the backtrace.

I cannot collect the backtrace because, at the moment of cd /net/<servername>, autofs dumps core.


bt run on the core:


Reading symbols from /usr/sbin/automount...Reading symbols from /usr/lib/debug/usr/sbin/automount-5.1.4-43.el8.x86_64.debug...done.
done.

warning: Ignoring non-absolute filename: <linux-vdso.so.1>
Missing separate debuginfo for linux-vdso.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/27/2562dce40ef1eb37130134a8186f225a6973cf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/automount --systemd-service --dont-check-daemon'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd64cd33b29 in _int_malloc (av=av@entry=0x7fd640000020, bytes=bytes@entry=15) at malloc.c:3711
3711	        malloc_consolidate (av);
[Current thread is 1 (Thread 0x7fd648894700 (LWP 1601))]
(gdb) bt
#0  0x00007fd64cd33b29 in _int_malloc (av=av@entry=0x7fd640000020, bytes=bytes@entry=15) at malloc.c:3711
#1  0x00007fd64cd36076 in __libc_calloc (n=n@entry=1, elem_size=elem_size@entry=15) at malloc.c:3444
#2  0x00007fd64e5220d0 in xdr_string (xdrs=xdrs@entry=0x7fd6400047e8, cpp=cpp@entry=0x7fd640c8b180, maxsize=maxsize@entry=255) at xdr.c:808
#3  0x00007fd6488ae082 in xdr_name (xdrs=xdrs@entry=0x7fd6400047e8, objp=objp@entry=0x7fd640c8b180) at mount_xdr.c:85
#4  0x00007fd6488ae165 in xdr_groupnode (xdrs=0x7fd6400047e8, objp=0x7fd640c8b180) at mount_xdr.c:129
#5  0x00007fd64e5234f4 in xdr_reference (xdrs=xdrs@entry=0x7fd6400047e8, pp=pp@entry=0x7fd640c8b148, size=size@entry=16, proc=proc@entry=0x7fd6488ae150 <xdr_groupnode>) at xdr_reference.c:88
#6  0x00007fd64e523619 in xdr_pointer (xdrs=xdrs@entry=0x7fd6400047e8, objpp=objpp@entry=0x7fd640c8b148, obj_size=obj_size@entry=16, xdr_obj=xdr_obj@entry=0x7fd6488ae150 <xdr_groupnode>) at xdr_reference.c:135
#7  0x00007fd6488ae139 in xdr_groups (xdrs=xdrs@entry=0x7fd6400047e8, objp=objp@entry=0x7fd640c8b148) at mount_xdr.c:119
#8  0x00007fd6488ae175 in xdr_groupnode (objp=0x7fd640c8b140, xdrs=0x7fd6400047e8) at mount_xdr.c:131
#9  xdr_groupnode (xdrs=0x7fd6400047e8, objp=0x7fd640c8b140) at mount_xdr.c:125
#10 0x00007fd64e5234f4 in xdr_reference (xdrs=xdrs@entry=0x7fd6400047e8, pp=pp@entry=0x7fd640c8b108, size=size@entry=16, proc=proc@entry=0x7fd6488ae150 <xdr_groupnode>) at xdr_reference.c:88
#11 0x00007fd64e523619 in xdr_pointer (xdrs=xdrs@entry=0x7fd6400047e8, objpp=objpp@entry=0x7fd640c8b108, obj_size=obj_size@entry=16, xdr_obj=xdr_obj@entry=0x7fd6488ae150 <xdr_groupnode>) at xdr_reference.c:135
#12 0x00007fd6488ae139 in xdr_groups (xdrs=xdrs@entry=0x7fd6400047e8, objp=objp@entry=0x7fd640c8b108) at mount_xdr.c:119
#13 0x00007fd6488ae175 in xdr_groupnode (objp=0x7fd640c8b100, xdrs=0x7fd6400047e8) at mount_xdr.c:131
#14 xdr_groupnode (xdrs=0x7fd6400047e8, objp=0x7fd640c8b100) at mount_xdr.c:125
...
---

So probably I am over some limit but there is no limitation mentioned in the docs.

While I was testing the same with the old showmount implementation (auto.net) it was working fine with up to 51k mounts.
Also attaching sosreport from the server.

Comment 10 Lukas Herbolt 2021-01-11 15:05:48 UTC
Created attachment 1746297 [details]
sosreport

Comment 11 Ian Kent 2021-01-12 00:45:09 UTC
(In reply to Lukas Herbolt from comment #9)
> The nfs server is exporting about 31k mounts. Not sure what do you mean by
> netgroup.

That's a lot of exports, ;)

So it sounds like it's the rpcgen generated code that's causing this.
We might be running out of stack space because of the way it's done
in that code.

To change that I will need to change autofs to not generate that RPC
code during the package build by adding the generated code to autofs
and manually optimizing and maintaining it.

I'll see what I can do.

Comment 12 Ian Kent 2021-01-12 02:05:06 UTC
Mmm ... that's interesting.

At first glance it looks like showmount(8) uses the same generated
code and yet it worked for you.

I'll bet the automount stack size is smaller ... joy to come for
nfs-utils perhaps, ;)

Comment 15 Ian Kent 2021-02-02 10:13:47 UTC
Could you give this build a try please:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34668456

Comment 16 Ian Kent 2021-02-04 00:50:16 UTC
(In reply to Ian Kent from comment #15)
> Could you give this build a try please:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34668456

There is an updated build at:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34729130

Could you check if this resolves the problem please?

Comment 17 Lukas Herbolt 2021-02-04 10:11:32 UTC
Tested with 50k mounts: it still hangs, but at least it gets the list of exports.

I have debug logs along with gcore: 
http://file.emea.redhat.com/~lherbolt/bz1912106/

Comment 18 Ian Kent 2021-02-05 01:07:11 UTC
(In reply to Lukas Herbolt from comment #17)
> Tested on 50k mounts it still hangs but it at least get the list of the
> exports.

What does "hang" mean? I can't work that out from the posted
debug log.

Did you run multiple tests with progressively more exports?
If so what happened in those cases?

> 
> I have debug logs along with gcore: 
> http://file.emea.redhat.com/~lherbolt/bz1912106/

Only the debug log will be useful at this stage but what you
posted looks more like you had started autofs with a lot of
mounts already present and doesn't really show what is happening
other than a whole bunch of offset mounts possibly being mounted
or being reconnected to at startup.

A full debug log, from starting autofs and without any existing
mounts, ie. a clean environment, is needed for me to try and
understand what's happening. If that is what you did you need to
say so.

With this number of offsets (basically the exports) belonging to
a mount entry it is probably going to get very slow because
some aspects of the underlying handling are not designed for this
many offsets.

From this debug log I can't tell if it's the number of offsets
that's a problem or there's something else causing a problem.

Comment 19 Ian Kent 2021-02-05 01:26:57 UTC
(In reply to Lukas Herbolt from comment #17)
> Tested on 50k mounts it still hangs but it at least get the list of the
> exports.

And there are other implications of having such a large number of
exports from a host.

In RHEL-8 the expire processing still reads the system mount table
and these offsets must each be mounted as autofs trigger mounts so
there will be more than 50k mounts in that table.

This will have a significant impact on autofs, particularly the
expire, and will also have a significant impact on other system
applications as well, systemd for example will suffer quite a bit
with such a large mount table.
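
To gauge how large the mount table has become, one could count its entries by filesystem type. This is only a minimal sketch, assuming the usual /proc/self/mounts layout in which the third field is the filesystem type; count_mounts is a hypothetical helper, not part of autofs.

```shell
# Hypothetical helper (not part of autofs): count entries of a given
# filesystem type in a mount table laid out like /proc/self/mounts,
# where the third whitespace-separated field is the filesystem type.
count_mounts() {
    fstype=$1
    table=${2:-/proc/self/mounts}
    awk -v t="$fstype" '$3 == t { n++ } END { print n + 0 }' "$table"
}

# Example use on a live system (commented out so the sketch has no
# side effects when sourced):
#   wc -l < /proc/self/mounts    # total entries in the mount table
#   count_mounts autofs          # autofs mounts, including offset triggers
```

With tens of thousands of offsets mounted, the autofs count alone would be expected to be in the same range as the number of exports.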

Comment 20 Lukas Herbolt 2021-02-05 10:08:52 UTC
Hi,
The autofs was started with:

[root@fastvm-rhel-8-3-22 ~]# grep -v "^#" /etc/sysconfig/autofs 
USE_MISC_DEVICE="yes"
OPTIONS="-d"

so it is debug logs.

The steps were:

1: start autofs
2: cd /hosts/filer 

The cd never finished, but the logs show that autofs gets the export
list from the NFS server. I can try it with a smaller number of exports,
30k or so. I think it's hanging or waiting to create the 50k directories, but
I will need to re-check it.


>>>  full debug log, from starting autofs
Is there some option other than -d?

Setting up the reproducer is not hard:
First I create the nfs dirs:

# for i in $(seq -f "%05g" 0 50000); do  mkdir -p /mnt/nfs-$i;  done;

# for i in $(seq  1 11); do echo -n "192.168.$i.0/24(rw) "; done > exp.list ; echo  >> exp.list; > /etc/exports; for i in $(seq -f "%05g" 0 50000); do  echo -n  "/mnt/nfs-$i " >> /etc/exports; cat exp.list  >> /etc/exports;  done; systemctl restart nfs-server.service
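
The one-liner above can be sketched in a more readable form. This version is an illustration only: gen_exports is a hypothetical helper, it writes to a scratch file rather than /etc/exports, it numbers exports from 0 to COUNT-1, and it neither creates the directories nor restarts nfs-server.

```shell
# Hypothetical helper: write COUNT export lines to OUTFILE, each export
# being offered to NETS client networks (192.168.1.0/24 upwards).
# Nothing here touches /etc/exports or the NFS server.
gen_exports() {
    count=$1 nets=$2 outfile=$3

    # Build the client list once, e.g. "192.168.1.0/24(rw) 192.168.2.0/24(rw)".
    clients=$(seq 1 "$nets" |
        awk '{ printf "%s192.168.%d.0/24(rw)", (NR > 1 ? " " : ""), $1 }')

    # One line per export: zero-padded path followed by the client list.
    seq -f "%05g" 0 $((count - 1)) |
        awk -v clients="$clients" '{ print "/mnt/nfs-" $1 " " clients }' > "$outfile"
}

# Example (commented out): 50000 exports with 11 client networks each.
#   gen_exports 50000 11 /tmp/exports.test
```

Generating the file in a single pass avoids reopening the output file once per export, which matters little for correctness but makes the reproducer quicker to rebuild.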

Comment 21 Ian Kent 2021-02-05 12:01:46 UTC
(In reply to Lukas Herbolt from comment #20)
> Hi,
> The autofs was started with:
> 
> [root@fastvm-rhel-8-3-22 ~]# grep -v "^#" /etc/sysconfig/autofs 
> USE_MISC_DEVICE="yes"

No need, the miscellaneous device will be used if it exists.
And I'm not even sure this is checked any more.


> OPTIONS="-d"

Same as setting "logging = debug" in /etc/autofs.conf.

> 
> so it is debug logs.
> 
> The steps were:
> 
> 1: start autofs
> 2: cd /hosts/filer 
> 
> The cd never finished, but the logs shows that autofs get the export
> list from the NFS server. I can try to do it with smaller amount of exports 
> 30k or so. I think it's hanging or waiting to create the 50k directories, but
> I will need to re-check it.

No need, I'm not even up to using a larger number of exports yet.
I'm working with just 5k exports to see what takes the time and see if I can
improve it.

There are some places where the list of offsets of the mount is traversed
when searching for a mount and that's slow when there are a large number
of offsets, it needs to be changed to use a hash table lookup instead.

Even so there are places where the linear traversal cannot be avoided.

> 
> 
> >>>  full debug log, from starting autofs
> Is there some more option that -d? 

No, but what you provided didn't look like what I expected.

> 
> Setting up the reproducer is not hard:

I have already set up an environment to use.
I'm investigating what's going on.

What are your expectations?

I can tell you now if you expect to be able to mount 30k+ offset mount triggers
in a time that's anywhere near reasonable for interactive use I'm pretty sure
you will be disappointed.

I've started looking at optimising the most costly operations but I fear that
won't be nearly enough.

Ian

Comment 22 Ian Kent 2021-02-08 04:24:00 UTC
Can you give this build a try please:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34813296

If this is going to work at all the expire improvements from release 5.1.7
are going to be needed. I've back ported them in this build.

I've also made some improvements to the handling.

Let me know how it goes and we can talk about the difficulties further.

Comment 30 Ian Kent 2021-02-25 06:25:33 UTC
Can you give this build a try please:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35151944

With 40k exports it takes about 1 minute to set up the offsets and at
expire or shutdown (if the offsets are mounted) it takes about 1:45.

During these times paths under the mount (the host in this case) will
not be accessible and will block (appear to hang) since allowing
processes to walk into under construction automounts leads to
unresolvable and quite severe problems.

Also be aware that working with this many mounts has a significant
impact on other system processes; udisksd seems to behave particularly
badly at times and you will usually see about three systemd processes
consuming a CPU each. Essentially, if you have fewer than about 8 CPUs,
the system will be heavily (probably fully) loaded.

This is about as far as I can go without redesigning the offset
handling and I'm not sure how much improvement we would actually
get from it either. It probably still won't be suitable for
interactive use though.

I may spend a little time on doing that re-design to see just how
much it would get us. At that point though it wouldn't be suitable
for general use since I just want to know what difference it would
make.

Ian

Comment 32 Ian Kent 2021-03-03 00:32:56 UTC
(In reply to Ian Kent from comment #30)
> 
> I may spend a little time on doing that re-design to see just how
> much it would get us. At that point though it wouldn't be suitable
> for general use since I just want to know what difference it would
> make.

So I have done an initial implementation of extending the new tree
implementation in 5.1.7 to see what difference it would make.

Now, I haven't back ported this to the RHEL rpm yet and I haven't
done much testing either so the results might not be what we end
up with but the results look sensible.

With 40k exports it takes about 40 seconds to return to a prompt
on initial access. Of this 40 seconds about 20 seconds are spent
getting the exports list, I'm not sure I can squeeze much more
out of that.

Shutting down autofs with all the mounts present takes a bit
under 20 seconds (about the same as the initial access but
there's no reading of exports).

So the times are better, and they make sense, and I appear to
have resolved the problems resulting from the original code
assumption of there being a small number of offsets.

I'll spend a little more time checking the patches and tweaking
them and then back port them to the RHEL rpm and we'll see how
that goes.

What are your thoughts on this?

Comment 33 Ian Kent 2021-03-03 08:12:59 UTC
Can you give this build a try please:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35234632

I have back ported the mapent tree conversion changes to RHEL-8 autofs.

Test results are ...

- 40k server exports, autofs debug logging enabled.

Initial access 40k mounts.

[raven@localhost ~]$ time ls /net/f32/exports/nfs-1000

real    0m40.466s
user    0m0.000s
sys     0m0.002s

Read of exports takes 20 seconds.
Update (add) of map entries takes 8 seconds.
Mount of offsets takes about 12 seconds.

Expire takes about 12 seconds.
Offset entry delete takes about 3 seconds

Shutdown time with all offsets mounted.

Redirecting to /bin/systemctl stop autofs.service

real    0m17.909s
user    0m0.025s
sys     0m0.028s

- 40k server exports, autofs debug logging disabled.

Initial access 40k mounts.

[raven@localhost ~]$ time ls /net/f32/exports/nfs-1000

real    0m33.141s
user    0m0.001s
sys     0m0.001s

Shutdown time with all offsets mounted.

Redirecting to /bin/systemctl stop autofs.service

real    0m11.879s
user    0m0.029s
sys     0m0.024s


- 20k server exports, debug logging enabled.

Initial access 20k mounts.

[raven@localhost ~]$ time ls /net/f32/exports/nfs-1000

real    0m12.479s
user    0m0.000s
sys     0m0.002s

Read of exports takes 5 seconds.
Update (add) of map entries takes 3 seconds.
Mount of offsets takes about 5 seconds.

Expire takes about 4 seconds.
Offset entry delete takes about 2 seconds

[root@localhost SPECS]# time service autofs stop

Shutdown time with all offsets mounted.

Redirecting to /bin/systemctl stop autofs.service

real    0m6.066s
user    0m0.019s
sys     0m0.030s

- 20k server exports, debug logging disabled.

Initial access 20k mounts.

[raven@localhost ~]$ time ls /net/f32/exports/nfs-1000

real    0m8.916s
user    0m0.001s
sys     0m0.002s

Shutdown time with all offsets mounted.

[root@localhost SPECS]# time service autofs stop
Redirecting to /bin/systemctl stop autofs.service

real    0m4.369s
user    0m0.019s
sys     0m0.029s

From this I see the read of the exports increases non-linearly as the 
number of exports increases, from about 5 seconds for 20k exports to
20 seconds for 40k exports. Not really much I can do about that I
think.

The increase in time for creation of the map entries is not quite
linear at about 3 seconds for 20k exports and 8 seconds for 40k
exports.

Again the mounting of the offsets is not quite linear either at about
5 seconds for 20k exports and about 12 seconds for 40k exports.

For 20k exports the expire (umount) takes about 4 seconds and the
offset map entry delete takes about 2 seconds while the 40k case
takes about 12 and 3 seconds respectively.

That expiration is distinctly not a linear increase and is a bit
surprising. This might be due to the map entry cache hash chain
length increasing as the number of exports grows.
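
One way to put a number on "not quite linear" is to estimate the exponent k in time ~ exports^k from the two measurements of each phase. The sketch below does that for the timings quoted above (debug logging enabled); growth_exponent is a hypothetical helper, and two data points give only a rough indication.

```shell
# Hypothetical helper: given two (exports, seconds) measurements, print the
# exponent k in "time ~ exports^k". k = 1 would be linear scaling.
growth_exponent() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", log(t2 / t1) / log(n2 / n1) }'
}

# Phase timings in seconds for 20k and 40k exports, as reported above.
while read -r phase t20k t40k; do
    printf '%-16s %s\n' "$phase" "$(growth_exponent 20000 "$t20k" 40000 "$t40k")"
done <<'EOF'
read-exports 5 20
add-map-entries 3 8
mount-offsets 5 12
expire 4 12
delete-offsets 2 3
EOF
```

For the exports read (5 seconds to 20 seconds for twice the exports) the exponent is 2.00, that is, roughly quadratic over this range, which matches the observation that it grows much faster than the other phases.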

This was all done with the autofs configuration option
map_hash_table_size set to 8192, which means that the average length
of hash chains will be around 8 for 64k entries. So for 40k entries
it might be around 5 or 6, which will slow things down a bit and,
as I have seen, when the hash chain length grows the performance
penalty does not increase linearly.
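
The chain length estimate is simple division of entries by buckets. A small sketch, assuming entries hash evenly across buckets and taking "64k" as 65536 (both assumptions); avg_chain_len is a hypothetical helper.

```shell
# Hypothetical helper: average hash chain length when ENTRIES map entries
# are spread evenly over BUCKETS buckets (the map_hash_table_size value).
avg_chain_len() {
    awk -v n="$1" -v b="$2" 'BEGIN { printf "%.1f\n", n / b }'
}

avg_chain_len 65536 8192   # "64k" entries, taken as 65536 (assumption)
avg_chain_len 40000 8192   # the 40k export test
avg_chain_len 20000 8192   # the 20k export test
```

With an even spread this gives 8.0 for 65536 entries and about 4.9 for 40000; real chains would be somewhat uneven, so some buckets will hold longer chains than the average.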

Overall I think further improvements will require much more time
with only small improvements. The only obvious thing that might be
worth looking at is getting the exports which is clearly increasing
much more quickly as the number of exports grows but I think that's
probably outside the scope of autofs.

Comment 34 Ian Kent 2021-03-03 08:27:36 UTC
One more thing about the reason I have been so keen to change the
autofs offset handling code.

I always knew the existing offset handling code was horrible.

When I first looked at this I looked at the core code for traversing
the offset list and I had difficulty understanding what I had done.

The code is simply not easy to maintain.

The tree implementation is much simpler and I think much better from
an ongoing maintenance POV.

And that's the reason I was so keen to work on this.

Comment 35 Ian Kent 2021-03-04 07:29:19 UTC
I have yet another build, it gives a slight improvement getting
the exports list (maybe 20% or so):
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35259612

Please give this one a try and let me know how it goes.

Comment 37 Ian Kent 2021-03-06 03:13:30 UTC
I have yet another build,
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35292059

There are quite a few changes due to spot checking various cases and a
surprising improvement to getting the exports list from a server.

You might want to give this one a try.

Comment 38 Ian Kent 2021-03-10 12:28:09 UTC
I have another build:
http://brew-task-repos.usersys.redhat.com/repos/scratch/ikent/autofs/5.1.4/60.el8/

I've done quite a bit of testing now and fixed a number of problems.
I think this is getting close to being done.

It's pretty quick with that large number of exports, still slow from an
interactive POV, but quite good nevertheless.

Let me know how it goes with your testing.

Comment 39 Ian Kent 2021-03-12 10:00:02 UTC
Another build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35401459

You need to set "map_hash_table_size = 8192" (an average of 4 entries per hash
table bucket) for such a large number of mounts.

With this the bottleneck is in the kernel which would be extremely hard to
improve.
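
Setting the option could be scripted along these lines. This is a sketch under assumptions: set_conf_option is a hypothetical helper, it assumes the configuration file (for example /etc/autofs.conf) already contains a "key = value" line for the option, possibly commented out with "#", and it relies on GNU sed for in-place editing.

```shell
# Hypothetical helper: set "KEY = VALUE" in an autofs.conf-style FILE by
# rewriting an existing (possibly commented-out) assignment for KEY.
# Returns non-zero and leaves FILE untouched if no such line exists, so
# the option is never appended to the wrong section by accident.
set_conf_option() {
    file=$1 key=$2 value=$3
    pattern="^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*=.*"
    grep -Eq "$pattern" "$file" || return 1
    sed -E -i "s|${pattern}|${key} = ${value}|" "$file"
}

# Example (commented out): try it on a scratch copy first.
#   cp /etc/autofs.conf /tmp/autofs.conf.test
#   set_conf_option /tmp/autofs.conf.test map_hash_table_size 8192
```

Refusing to append keeps the helper from guessing where in the file the option belongs; if the line is missing it is safer to add it by hand.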

Comment 40 Ian Kent 2021-03-15 13:36:13 UTC
I have set this bug target for 8.5 with ITM of 4.

It seems this bug is too late for 8.4, but 8.5 looks too far
off to get this to the customer in a sensible time.

Depending on customer impact it may be worthwhile to issue
a hotfix release to the customer in the interim.

The changes for this bug are significant so considering an
exception to include it in 8.4 is probably not the right
thing to do but the changes have been well tested so a
hotfix should certainly be stable and worthwhile for the
customer.

Lukas, can you consult the customer and offer your thoughts
on whether a hotfix should be made available please?

Once we work the above out I will commit the changes to the
source repository and clone the bug for RHEL-9. It needs to
be resolved in RHEL-9, firstly to avoid a regression when
upgrading to RHEL-9, but also because it resolves a licensing
problem with the current RHEL-9 autofs package.

My final scratch build can be found at:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35430311

Comment 41 Ian Kent 2021-03-15 13:41:53 UTC
(In reply to Ian Kent from comment #40)
> 
> My final scratch build can be found at:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35430311

Lukas,

It would be good if you can test this out before I commit the changes
as well, just in case I have missed something.

Comment 42 Ian Kent 2021-03-16 09:11:16 UTC
(In reply to Ian Kent from comment #41)
> (In reply to Ian Kent from comment #40)
> > 
> > My final scratch build can be found at:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35430311
> 
> Lukas,
> 
> It would be good if you can test this out before I commit the changes
> as well, just in case I have missed something.

One further build, minor clean ups.
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35461868

Comment 43 Lukas Herbolt 2021-03-16 11:16:52 UTC
It looks fine to me, and the customer is fine with having it in 8.5; he is now
using a workaround with a modified /etc/auto.net script.

Comment 52 errata-xmlrpc 2021-11-09 19:32:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4372

