Description of problem:
Using autofs to access a NFSv3 share over /net leads to a segfault. The very same seems to work just fine in Fedora 16 (autofs-5.0.6-4.fc16.x86_64). The /etc/auto.{master,net} configuration has not been changed.

[root@slabstb250 ~]# mount -t nfs nas-id-api:/ID/api/stud/fs1/quota/ /mnt/
[root@slabstb250 ~]# umount /mnt
[root@slabstb250 ~]# /usr/sbin/automount --pid-file /run/autofs.pid --verbose --debug --foreground
Starting automounter version 5.0.6-13.fc17, master map auto.master
using kernel protocol version 5.02
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
lookup_read_master: lookup(file): read entry /misc
lookup_read_master: lookup(file): read entry /net
lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d
lookup_nss_read_master: reading master dir /etc/auto.master.d
lookup_read_master: lookup(dir): scandir: /etc/auto.master.d
lookup_read_master: lookup(file): read entry +auto.master
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup(file): failed to read included master map auto.master
master_do_mount: mounting /misc
automount_path_to_fifo: fifo name /run/autofs.fifo-misc
lookup_nss_read_map: reading map file /etc/auto.misc
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
remount_active_mount: trying to re-connect to mount /misc
mounted indirect on /misc with timeout 300, freq 75 seconds
remount_active_mount: re-connected to mount /misc
st_ready: st_ready(): state = 0 path /misc
master_do_mount: mounting /net
automount_path_to_fifo: fifo name /run/autofs.fifo-net
lookup_nss_read_map: reading map hosts (null)
parse_init: parse(sun): init gathered global options: (null)
remount_active_mount: trying to re-connect to mount /net
mounted indirect on /net with timeout 300, freq 75 seconds
remount_active_mount: re-connected to mount /net
st_ready: st_ready(): state = 0 path /net

<in another shell: ls /net/nas-id-api/ID/api/stud/fs1/quota/>

handle_packet: type = 3
handle_packet_missing_indirect: token 12, name nas-id-api, request pid 23359
attempting to mount entry /net/nas-id-api
lookup_mount: lookup(hosts): fetchng export list for nas-id-api
rpc_get_exports_proto
Segmentation fault (core dumped)

[root@slabstb250 ~]# rpm -qa autofs
autofs-5.0.6-13.fc17.x86_64
Looks like the source of the issue was a misconfiguration. /etc/auto.master read: /net -hosts instead of: /net /etc/auto.net Not sure if that configuration was wrong out-of-the-box or someone messed with the system. Either way, this should lead to an error report in the log file, not to a segfault.
(In reply to comment #1) > Looks like the source of the issue was a misconfiguration. /etc/auto.master > read: > /net -hosts > instead of: > /net /etc/auto.net The former is the recommended and the default installed configuration. You are correct in that there is a problem with a recent change. Mind you this doesn't happen on F16. It appears to be a combination of a passed stack variable being non-null and getaddrinfo(3) not returning a lookup failure on a name that obviously has no valid translation. Try this build: https://koji.fedoraproject.org/koji/buildinfo?buildID=319139
Okay, I now updated to that koji build of autofs. Using -hosts, I get no more segfault, but it still doesn't work either:

Starting automounter version 5.0.6-17.fc17, master map auto.master
using kernel protocol version 5.02
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
lookup_read_master: lookup(file): read entry /home
lookup_read_master: lookup(file): read entry /net
master_do_mount: mounting /home
automount_path_to_fifo: fifo name /run/autofs.fifo-home
lookup_nss_read_map: reading map program /usr/local/bin/auto_home_dfs
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
mounted indirect on /home with timeout 60, freq 15 seconds
st_ready: st_ready(): state = 0 path /home
master_do_mount: mounting /net
automount_path_to_fifo: fifo name /run/autofs.fifo-net
lookup_nss_read_map: reading map hosts (null)
parse_init: parse(sun): init gathered global options: (null)
mounted indirect on /net with timeout 300, freq 75 seconds
st_ready: st_ready(): state = 0 path /net
handle_packet: type = 3
handle_packet_missing_indirect: token 41, name nas-id-api, request pid 3225
attempting to mount entry /net/nas-id-api
lookup_mount: lookup(hosts): fetchng export list for nas-id-api
rpc_get_exports_proto
lookup_mount: exports lookup failed for nas-id-api
key "nas-id-api" not found in map source(s).
dev_ioctl_send_fail: token = 41
failed to mount /net/nas-id-api
handle_packet: type = 3
handle_packet_missing_indirect: token 42, name nas-id-api, request pid 3225
attempting to mount entry /net/nas-id-api
dev_ioctl_send_fail: token = 42
failed to mount /net/nas-id-api

Using (i.e. specifying) /etc/auto.net still works.
I suspect there's something wrong with the name resolution but I can't duplicate that so I can't work out what it is. The best I can do is feed autofs an invalid name and work from there, which is what I've done. Once again this all works fine when the rpm is built on F16, so something else has changed as well. Mind you the bulk of the changes in this part of the autofs code have been in place since early December. There have been some recent changes but that still can't account for getaddrinfo(3) not functioning properly and that's been used for a lot longer than that.
So, what can we do to further debug and eventually fix this? Anything we can try? Anything we can provide?
(In reply to comment #5)
> So, what can we do to further debug and eventually fix this? Anything we can
> try? Anything we can provide?

What was the outcome of checking your name resolution setup?
How is your site set up wrt. resolving names?

The only main difference I can see between showmount and the autofs rpc code is the getaddrinfo(3) call, and AFAICT it is that call which is not returning what is expected.

If I can't reproduce it then I can't debug it, so we need to work out what it is at your site that is causing the failure so I can reproduce the problem.
(In reply to comment #6) > What was he outcome of checking your name resolution settup? > How is your site setup wrt. resolving names? Not sure what kind of information you're looking for. Name resolution works normally, I can ping/dig/nslookup the hostname. dig only works with the FQDN specified, all others add the search domain (actually the secondary) to the hostname themselves. autofs works neither with the hostname nor with the FQDN. resolv.conf features the search domains and our on-site dns servers (AD). No further magic involved.
Created attachment 585779 [details] Test getaddrinfo(3) lookup program We could try using this to check if getaddrinfo(3) is working. Use "gcc -o gai-test gai-test.c" and then ./gai-test <host name> and post the result.
(In reply to comment #8)
> ./gai-test <host name>

sock_dgram query addrinfo 0x1bf8b90
sock_dgram query ai_addr is non-null
sock_stream query addrinfo 0x1bf8b90
sock_stream query ai_addr is non-null

Same output both with the hostname and the FQDN.
(In reply to comment #9)
> (In reply to comment #8)
> > ./gai-test <host name>
>
> sock_dgram query addrinfo 0x1bf8b90
> sock_dgram query ai_addr is non-null
> sock_stream query addrinfo 0x1bf8b90
> sock_stream query ai_addr is non-null
>
> Same output both with the hostname and the FQDN.

That's a bit of a puzzle then. I'll have another look at the autofs code and see if I can see anything wrong.
Created attachment 585787 [details] Test getaddrinfo(3) lookup program (updated)

This also returns the protocol field. Can you run this and post the result, please?
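For readers without access to the attachment, here is a minimal sketch of what such a lookup test might look like. It is not the attached gai-test.c: the helper name gai_query() and its signature are assumptions made for illustration, and only the output format is modelled on the results posted in this report.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from the attachment): resolve `host`
 * for one socket type and print what getaddrinfo(3) handed back.
 * Returns 0 on success or the getaddrinfo(3) error code on failure.
 * If `proto` is non-NULL, the first result's ai_protocol is stored there.
 */
int gai_query(const char *host, int socktype, const char *label, int *proto)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL; /* explicit NULL: never inspect stack garbage */
    int ret;

    assert(host != NULL && label != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;

    ret = getaddrinfo(host, NULL, &hints, &res);
    if (ret != 0) {
        /* Trust the return code, not the state of `res`. */
        printf("%s query failed: %s\n", label, gai_strerror(ret));
        return ret;
    }

    printf("%s query addrinfo %p\n", label, (void *)res);
    printf("%s query ai_addr is %s\n", label,
           res->ai_addr ? "non-null" : "null");
    printf("%s query ai_protocol %d\n", label, res->ai_protocol);

    if (proto != NULL)
        *proto = res->ai_protocol;

    freeaddrinfo(res);
    return 0;
}
```

A small main() could call gai_query(argv[1], SOCK_DGRAM, "sock_dgram", NULL) followed by the same call with SOCK_STREAM and "sock_stream". The point of the sketch is that success is decided by the return value alone, and the result pointer starts out as NULL.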
sock_dgram query addrinfo 0x692b90
sock_dgram query ai_addr is non-null
sock_dgram query ai_protocol 17
sock_stream query addrinfo 0x692b90
sock_stream query ai_addr is non-null
sock_stream query ai_protocol 6
Does this build make a difference? https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225
(In reply to comment #3)
> Okay, I now updated to that koji build of autofs. Using -hosts, I get no
> more segfault, but it still doesn't work either:

You say you updated but did that actually work?

>
> Starting automounter version 5.0.6-17.fc17, master map auto.master

Says you have ..... but ....

> using kernel protocol version 5.02
> lookup_nss_read_master: reading master files auto.master
> parse_init: parse(sun): init gathered global options: (null)
> spawn_mount: mtab link detected, passing -n to mount
> spawn_umount: mtab link detected, passing -n to mount
> lookup_read_master: lookup(file): read entry /home
> lookup_read_master: lookup(file): read entry /net
> master_do_mount: mounting /home
> automount_path_to_fifo: fifo name /run/autofs.fifo-home
> lookup_nss_read_map: reading map program /usr/local/bin/auto_home_dfs
> parse_init: parse(sun): init gathered global options: (null)
> spawn_mount: mtab link detected, passing -n to mount
> spawn_umount: mtab link detected, passing -n to mount
> mounted indirect on /home with timeout 60, freq 15 seconds
> st_ready: st_ready(): state = 0 path /home
> master_do_mount: mounting /net
> automount_path_to_fifo: fifo name /run/autofs.fifo-net
> lookup_nss_read_map: reading map hosts (null)
> parse_init: parse(sun): init gathered global options: (null)
> mounted indirect on /net with timeout 300, freq 75 seconds
> st_ready: st_ready(): state = 0 path /net
> handle_packet: type = 3
> handle_packet_missing_indirect: token 41, name nas-id-api, request pid 3225
> attempting to mount entry /net/nas-id-api
> lookup_mount: lookup(hosts): fetchng export list for nas-id-api
> rpc_get_exports_proto

This line is not printed anywhere in the source, so it shouldn't be in the log.

I think it would be wise to "rpm -e autofs" and check that there are no autofs package files remaining, remove any that are left, then install autofs. Mostly that means looking in /usr/lib/autofs or /usr/lib64/autofs or both.

Ian
(In reply to comment #13)
> Does this build make a difference?
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225

It does, works perfectly.

(In reply to comment #14)
> (In reply to comment #3)
> > Okay, I now updated to that koji build of autofs. Using -hosts, I get no
> > more segfault, but it still doesn't work either:
>
> You say you updated but did that actually work?

What in "it still doesn't work" is hard to understand? :)

> > rpc_get_exports_proto
>
> This line is not printed anywhere in the source, it souldn't
> be in the log.

I think I got that line in all versions. Can't tell where it's coming from, though.

> I think it would be wise to "rpm -e autofs" and check that there
> are no autofs package files remaining, remove them if there is
> then install autofs. Mostly that means looking in /usr/lib/autofs
> or /usr/lib64/autofs or both.

Is that still necessary now that you have a working version?
(In reply to comment #15)
> (In reply to comment #13)
> > Does this build make a difference?
> >
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225
>
> It does, works perfectly.

That's good to hear.

>
> (In reply to comment #14)
> > (In reply to comment #3)
> > > Okay, I now updated to that koji build of autofs. Using -hosts, I get no
> > > more segfault, but it still doesn't work either:
> >
> > You say you updated but did that actually work?
>
> What in "it still doesn't work" is hard to understand? :)

If you read between the lines, what I'm getting at is that the log message below is not supposed to be present. I just wanted to make sure there wasn't some odd rpm problem where the autofs shared libraries were not properly updated.

> > > > rpc_get_exports_proto

And this is in fact the clue that led me to what is probably the root cause of the original SEGV. See the second patch below (which I'll post shortly) for a description of what I found. The first patch is needed too because I believe it resolves another subtle issue caused by the recent changes to the autofs rpc error handling.

Please also try this build (in case I've messed something else up):
https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225

Ian
Created attachment 585984 [details] Patch - fix initialization in rpc create_client()
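The patch itself lives in the attachment and is not reproduced here. As a generic illustration of the class of bug described earlier in this report (a caller's stack variable that happens to be non-null, combined with a callee that returns early without touching it), here is a minimal sketch. struct client_sketch and create_client_sketch() are hypothetical stand-ins invented for this example, not autofs or libtirpc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an RPC client handle. */
struct client_sketch {
    char host[64];
};

/*
 * Hypothetical constructor with an output parameter.
 * Returns 0 and stores a new handle in *out on success,
 * returns -1 and leaves *out == NULL on every failure path.
 */
int create_client_sketch(const char *host, struct client_sketch **out)
{
    struct client_sketch *c;

    assert(out != NULL);

    /*
     * Initialise the output first. Without this line, an early return
     * below would leave whatever the caller's stack variable happened
     * to contain, and a later "if (client)" test in the caller would
     * treat that garbage as a valid handle.
     */
    *out = NULL;

    if (host == NULL || *host == '\0' || strlen(host) >= sizeof(c->host))
        return -1;

    c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;

    strcpy(c->host, host); /* length was checked above */
    *out = c;
    return 0;
}
```

With the output cleared up front, a caller that passes in a stale pointer and hits a failure path sees NULL afterwards instead of the stale value, so a subsequent null check behaves as intended.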
Created attachment 585985 [details] Patch - fix libtirpc name clash Actually I'll need to change the bug number reference in these patches and mark that bug a dup of this, since we resolved it here.
(In reply to comment #16) > (In reply to comment #15) > > (In reply to comment #13) > > > Does this build make a difference? > > > > > > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225 > Please also try this build (in case I've messed something else > up): > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225 Uhm...that's the same build :) Happy to test another one, though.
(In reply to comment #19) > (In reply to comment #16) > > (In reply to comment #15) > > > (In reply to comment #13) > > > > Does this build make a difference? > > > > > > > > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225 > > > Please also try this build (in case I've messed something else > > up): > > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225 > > Uhm...that's the same build :) Happy to test another one, though. Oops! Try this: https://koji.fedoraproject.org/koji/taskinfo?taskID=4093588
(In reply to comment #20) > Oops! > Try this: > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093588 Works.
(In reply to comment #21) > (In reply to comment #20) > > Oops! > > Try this: > > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093588 > > Works. Great. I think that about resolves it, I'll get onto sorting out the bug references and pushing out an update. Thanks Ian
*** Bug 821660 has been marked as a duplicate of this bug. ***
autofs-5.0.6-19.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/autofs-5.0.6-19.fc17
Package autofs-5.0.6-19.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing autofs-5.0.6-19.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8269/autofs-5.0.6-19.fc17
then log in and leave karma (feedback).
autofs-5.0.6-19.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.