Bug 821847 - Using /net to access a NFS share leads to a segfault
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: autofs
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Ian Kent
QA Contact: Fedora Extras Quality Assurance
Duplicates: 821660
Reported: 2012-05-15 10:58 EDT by Sandro Mathys
Modified: 2012-05-30 20:52 EDT (History)
Doc Type: Bug Fix
Last Closed: 2012-05-30 20:52:46 EDT
Type: Bug

Attachments
Test getaddrinfo(3) lookup program (1.71 KB, text/plain)
2012-05-21 06:57 EDT, Ian Kent
Test getaddrinfo(3) lookup program (updated) (1.83 KB, text/plain)
2012-05-21 07:28 EDT, Ian Kent
Patch - fix initialization in rpc create_client() (1.89 KB, patch)
2012-05-22 06:03 EDT, Ian Kent
Patch - fix libtirpc name clash (1.40 KB, patch)
2012-05-22 06:05 EDT, Ian Kent

Description Sandro Mathys 2012-05-15 10:58:50 EDT
Description of problem:
Using autofs to access an NFSv3 share over /net leads to a segfault. The very same setup works fine in Fedora 16 (autofs-5.0.6-4.fc16.x86_64).

The /etc/auto.{master,net} configuration has not been changed.

[root@slabstb250 ~]# mount -t nfs nas-id-api:/ID/api/stud/fs1/quota/ /mnt/
[root@slabstb250 ~]# umount /mnt
[root@slabstb250 ~]# /usr/sbin/automount --pid-file /run/autofs.pid --verbose --debug --foreground
Starting automounter version 5.0.6-13.fc17, master map auto.master
using kernel protocol version 5.02
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
lookup_read_master: lookup(file): read entry /misc
lookup_read_master: lookup(file): read entry /net
lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d
lookup_nss_read_master: reading master dir /etc/auto.master.d
lookup_read_master: lookup(dir): scandir: /etc/auto.master.d
lookup_read_master: lookup(file): read entry +auto.master
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup(file): failed to read included master map auto.master
master_do_mount: mounting /misc
automount_path_to_fifo: fifo name /run/autofs.fifo-misc
lookup_nss_read_map: reading map file /etc/auto.misc
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
remount_active_mount: trying to re-connect to mount /misc
mounted indirect on /misc with timeout 300, freq 75 seconds
remount_active_mount: re-connected to mount /misc
st_ready: st_ready(): state = 0 path /misc
master_do_mount: mounting /net
automount_path_to_fifo: fifo name /run/autofs.fifo-net
lookup_nss_read_map: reading map hosts (null)
parse_init: parse(sun): init gathered global options: (null)
remount_active_mount: trying to re-connect to mount /net
mounted indirect on /net with timeout 300, freq 75 seconds
remount_active_mount: re-connected to mount /net
st_ready: st_ready(): state = 0 path /net


<in another shell: ls /net/nas-id-api/ID/api/stud/fs1/quota/>        


handle_packet: type = 3
handle_packet_missing_indirect: token 12, name nas-id-api, request pid 23359
attempting to mount entry /net/nas-id-api
lookup_mount: lookup(hosts): fetchng export list for nas-id-api
rpc_get_exports_proto
Segmentation fault (core dumped)
[root@slabstb250 ~]# rpm -qa autofs
autofs-5.0.6-13.fc17.x86_64
Comment 1 Sandro Mathys 2012-05-16 07:52:34 EDT
Looks like the source of the issue was a misconfiguration. /etc/auto.master read:
/net -hosts
instead of:
/net /etc/auto.net

Not sure if that configuration was wrong out-of-the-box or someone messed with the system. Either way, this should lead to an error report in the log file, not to a segfault.
Comment 2 Ian Kent 2012-05-16 08:41:03 EDT
(In reply to comment #1)
> Looks like the source of the issue was a misconfiguration. /etc/auto.master
> read:
> /net -hosts
> instead of:
> /net /etc/auto.net

The former is the recommended configuration and the one installed by default.

You are correct that there is a problem with a recent change.
Mind you, this doesn't happen on F16.

It appears to be a combination of a passed stack variable being
non-null and getaddrinfo(3) not returning a lookup failure on a
name that obviously has no valid translation.
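
For illustration only, the failure mode described above can be sketched in C as follows. This is not the autofs code: `lookup_first_addr` and its out-parameter convention are hypothetical names chosen for the example. The idea is that a pointer living on the caller's stack is set to NULL before the lookup and the getaddrinfo(3) return value is checked, so an error path can never hand back whatever the stack happened to hold.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo(3) under strict -std= modes */
#endif
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from autofs: resolve `host` and hand the
 * result list back through `*res`. On any failure `*res` is guaranteed to
 * be NULL, so the caller can test it safely.
 */
static int lookup_first_addr(const char *host, int socktype,
                             struct addrinfo **res)
{
    struct addrinfo hints;
    int ret;

    assert(res != NULL);
    *res = NULL;                       /* never leave the out-parameter stale */

    memset(&hints, 0, sizeof(hints));  /* hints must be zeroed before use */
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;

    ret = getaddrinfo(host, NULL, &hints, res);
    if (ret != 0) {
        *res = NULL;                   /* do not trust the pointer after a failure */
        return ret;
    }

    /* Treat "success" without a usable address as a lookup failure too. */
    if (*res == NULL || (*res)->ai_addr == NULL) {
        if (*res != NULL)
            freeaddrinfo(*res);
        *res = NULL;
        return EAI_NONAME;
    }
    return 0;
}
```

A caller would check the return value first and only then touch the list, releasing it with freeaddrinfo(3) when done.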

Try this build:
https://koji.fedoraproject.org/koji/buildinfo?buildID=319139
Comment 3 Sandro Mathys 2012-05-16 09:09:39 EDT
Okay, I have now updated to that koji build of autofs. Using -hosts, the segfault is gone, but it still doesn't work:

Starting automounter version 5.0.6-17.fc17, master map auto.master
using kernel protocol version 5.02
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
lookup_read_master: lookup(file): read entry /home
lookup_read_master: lookup(file): read entry /net
master_do_mount: mounting /home
automount_path_to_fifo: fifo name /run/autofs.fifo-home
lookup_nss_read_map: reading map program /usr/local/bin/auto_home_dfs
parse_init: parse(sun): init gathered global options: (null)
spawn_mount: mtab link detected, passing -n to mount
spawn_umount: mtab link detected, passing -n to mount
mounted indirect on /home with timeout 60, freq 15 seconds
st_ready: st_ready(): state = 0 path /home
master_do_mount: mounting /net
automount_path_to_fifo: fifo name /run/autofs.fifo-net
lookup_nss_read_map: reading map hosts (null)
parse_init: parse(sun): init gathered global options: (null)
mounted indirect on /net with timeout 300, freq 75 seconds
st_ready: st_ready(): state = 0 path /net
handle_packet: type = 3
handle_packet_missing_indirect: token 41, name nas-id-api, request pid 3225
attempting to mount entry /net/nas-id-api
lookup_mount: lookup(hosts): fetchng export list for nas-id-api
rpc_get_exports_proto
lookup_mount: exports lookup failed for nas-id-api
key "nas-id-api" not found in map source(s).
dev_ioctl_send_fail: token = 41
failed to mount /net/nas-id-api
handle_packet: type = 3
handle_packet_missing_indirect: token 42, name nas-id-api, request pid 3225
attempting to mount entry /net/nas-id-api
dev_ioctl_send_fail: token = 42
failed to mount /net/nas-id-api

Using (i.e. specifying) /etc/auto.net still works.
Comment 4 Ian Kent 2012-05-16 09:43:21 EDT
I suspect there's something wrong with the name resolution
but I can't duplicate that so I can't work out what it is.
The best I can do is feed autofs an invalid name and work
from there, which is what I've done.

Once again this all works fine when the rpm is built on F16,
so something else has changed as well.

Mind you the bulk of the changes in this part of the autofs
code have been in place since early December. There have been
some recent changes but that still can't account for
getaddrinfo(3) not functioning properly and that's been
used for a lot longer than that.
Comment 5 Sandro Mathys 2012-05-21 04:38:15 EDT
So, what can we do to further debug and eventually fix this? Anything we can try? Anything we can provide?
Comment 6 Ian Kent 2012-05-21 05:23:35 EDT
(In reply to comment #5)
> So, what can we do to further debug and eventually fix this? Anything we can
> try? Anything we can provide?

What was the outcome of checking your name resolution setup?
How is your site set up wrt. resolving names?

The only main difference I can see between showmount and the
autofs rpc code is the getaddrinfo(3) call, and AFAICT it is that
call which is not returning what is expected.

If I can't reproduce it then I can't debug it, so we need to
work out what it is at your site that is causing the failure
so I can reproduce the problem.
Comment 7 Sandro Mathys 2012-05-21 06:11:46 EDT
(In reply to comment #6)
> What was the outcome of checking your name resolution setup?
> How is your site set up wrt. resolving names?

Not sure what kind of information you're looking for. Name resolution works normally: I can ping/dig/nslookup the hostname. dig only works with the FQDN specified; the other tools add the search domain (actually the secondary one) to the hostname themselves. autofs works neither with the hostname nor with the FQDN.

resolv.conf features the search domains and our on-site dns servers (AD).

No further magic involved.
Comment 8 Ian Kent 2012-05-21 06:57:47 EDT
Created attachment 585779 [details]
Test getaddrinfo(3) lookup program

We could try using this to check if getaddrinfo(3) is working.

Use "gcc -o gai-test gai-test.c" and then

./gai-test <host name>

and post the result.
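
The attachment itself is not reproduced in this report. As a rough sketch of what such a test might look like, assuming it simply calls getaddrinfo(3) once per socket type and prints what comes back (which matches the shape of the output posted in the following comments), something along these lines would do. The function name `gai_query` and the exact message wording are assumptions, not the contents of either attachment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo(3) under strict -std= modes */
#endif
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for the attached test, not the attachment itself.
 * Runs one getaddrinfo(3) query for `host` with the given socket type,
 * prints the first result, and returns the getaddrinfo(3) status
 * (0 on success, an EAI_* code otherwise).
 */
static int gai_query(const char *label, const char *host, int socktype)
{
    struct addrinfo hints;
    struct addrinfo *ai = NULL;
    int ret;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;

    ret = getaddrinfo(host, NULL, &hints, &ai);
    if (ret != 0) {
        printf("%s query failed: %s\n", label, gai_strerror(ret));
        return ret;
    }

    printf("%s query addrinfo %p\n", label, (void *) ai);
    printf("%s query ai_addr is %s\n", label,
           ai->ai_addr ? "non-null" : "null");
    printf("%s query ai_protocol %d\n", label, ai->ai_protocol);

    freeaddrinfo(ai);
    return 0;
}
```

Calling `gai_query("sock_dgram", name, SOCK_DGRAM)` and then `gai_query("sock_stream", name, SOCK_STREAM)` from `main()` gives one group of lines per socket type. Protocol number 17 is IPPROTO_UDP and 6 is IPPROTO_TCP.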
Comment 9 Sandro Mathys 2012-05-21 07:07:18 EDT
(In reply to comment #8)
> ./gai-test <host name>

sock_dgram query addrinfo 0x1bf8b90
sock_dgram query ai_addr is non-null
sock_stream query addrinfo 0x1bf8b90
sock_stream query ai_addr is non-null

Same output both with the hostname and the FQDN.
Comment 10 Ian Kent 2012-05-21 07:20:39 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > ./gai-test <host name>
> 
> sock_dgram query addrinfo 0x1bf8b90
> sock_dgram query ai_addr is non-null
> sock_stream query addrinfo 0x1bf8b90
> sock_stream query ai_addr is non-null
> 
> Same output both with the hostname and the FQDN.

That's a bit of a puzzle then.
I'll have another look at the autofs code and see if I can
see anything wrong.
Comment 11 Ian Kent 2012-05-21 07:28:43 EDT
Created attachment 585787 [details]
Test getaddrinfo(3) lookup program (updated)

This also returns the protocol field. Can you run this
and post the result, please?
Comment 12 Sandro Mathys 2012-05-21 09:19:39 EDT
sock_dgram query addrinfo 0x692b90
sock_dgram query ai_addr is non-null
sock_dgram query ai_protocol 17
sock_stream query addrinfo 0x692b90
sock_stream query ai_addr is non-null
sock_stream query ai_protocol 6
Comment 13 Ian Kent 2012-05-22 03:02:12 EDT
Does this build make a difference?

https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225
Comment 14 Ian Kent 2012-05-22 03:51:44 EDT
(In reply to comment #3)
> Okay, I now updated to that koji build of autofs. Using -hosts, I get no
> more segfault, but it still doesn't work either:

You say you updated but did that actually work?

> 
> Starting automounter version 5.0.6-17.fc17, master map auto.master

Says you have ..... but ....

> using kernel protocol version 5.02
> lookup_nss_read_master: reading master files auto.master
> parse_init: parse(sun): init gathered global options: (null)
> spawn_mount: mtab link detected, passing -n to mount
> spawn_umount: mtab link detected, passing -n to mount
> lookup_read_master: lookup(file): read entry /home
> lookup_read_master: lookup(file): read entry /net
> master_do_mount: mounting /home
> automount_path_to_fifo: fifo name /run/autofs.fifo-home
> lookup_nss_read_map: reading map program /usr/local/bin/auto_home_dfs
> parse_init: parse(sun): init gathered global options: (null)
> spawn_mount: mtab link detected, passing -n to mount
> spawn_umount: mtab link detected, passing -n to mount
> mounted indirect on /home with timeout 60, freq 15 seconds
> st_ready: st_ready(): state = 0 path /home
> master_do_mount: mounting /net
> automount_path_to_fifo: fifo name /run/autofs.fifo-net
> lookup_nss_read_map: reading map hosts (null)
> parse_init: parse(sun): init gathered global options: (null)
> mounted indirect on /net with timeout 300, freq 75 seconds
> st_ready: st_ready(): state = 0 path /net
> handle_packet: type = 3
> handle_packet_missing_indirect: token 41, name nas-id-api, request pid 3225
> attempting to mount entry /net/nas-id-api
> lookup_mount: lookup(hosts): fetchng export list for nas-id-api
> rpc_get_exports_proto

This line is not printed anywhere in the source, it shouldn't
be in the log.

I think it would be wise to "rpm -e autofs" and check that there
are no autofs package files remaining, remove any that are left,
then install autofs again. Mostly that means looking in /usr/lib/autofs
or /usr/lib64/autofs or both.

Ian
Comment 15 Sandro Mathys 2012-05-22 05:06:20 EDT
(In reply to comment #13)
> Does this build make a difference?
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225

It does, works perfectly.

(In reply to comment #14)
> (In reply to comment #3)
> > Okay, I now updated to that koji build of autofs. Using -hosts, I get no
> > more segfault, but it still doesn't work either:
> 
> You say you updated but did that actually work?

What in "it still doesn't work" is hard to understand? :)

> > rpc_get_exports_proto
> 
> This line is not printed anywhere in the source, it shouldn't
> be in the log.

I think I got that line in all versions. Can't tell where it's coming from, though.

> I think it would be wise to "rpm -e autofs" and check that there
> are no autofs package files remaining, remove any that are left,
> then install autofs again. Mostly that means looking in /usr/lib/autofs
> or /usr/lib64/autofs or both.

Is that still necessary now that you have a working version?
Comment 16 Ian Kent 2012-05-22 06:01:42 EDT
(In reply to comment #15)
> (In reply to comment #13)
> > Does this build make a difference?
> > 
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225
> 
> It does, works perfectly.

That's good to hear.

> 
> (In reply to comment #14)
> > (In reply to comment #3)
> > > Okay, I now updated to that koji build of autofs. Using -hosts, I get no
> > > more segfault, but it still doesn't work either:
> > 
> > You say you updated but did that actually work?
> 
> What in "it still doesn't work" is hard to understand? :)

If you read between the lines what I'm getting at is the log
message below is not supposed to be present. I just wanted to
make sure there wasn't some odd rpm problem where the autofs
shared libraries were not properly updated.

> 
> > > rpc_get_exports_proto

And this is in fact the clue that led me to what is probably
the root cause of the original SEGV. See the second patch below
(which I'll post shortly) for a description of what I found.

The first patch is needed too because I believe it resolves
another subtle issue caused by the recent changes to the autofs
rpc error handling.

Please also try this build (in case I've messed something else
up):
https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225

Ian
Comment 17 Ian Kent 2012-05-22 06:03:06 EDT
Created attachment 585984 [details]
Patch - fix initialization in rpc create_client()
Comment 18 Ian Kent 2012-05-22 06:05:35 EDT
Created attachment 585985 [details]
Patch - fix libtirpc name clash

Actually I'll need to change the bug number reference in these
patches and mark that bug a dup of this, since we resolved it
here.
Comment 19 Sandro Mathys 2012-05-22 06:23:59 EDT
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #13)
> > > Does this build make a difference?
> > > 
> > > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225

> Please also try this build (in case I've messed something else
> up):
> https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225

Uhm...that's the same build :) Happy to test another one, though.
Comment 20 Ian Kent 2012-05-22 06:45:43 EDT
(In reply to comment #19)
> (In reply to comment #16)
> > (In reply to comment #15)
> > > (In reply to comment #13)
> > > > Does this build make a difference?
> > > > 
> > > > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225
> 
> > Please also try this build (in case I've messed something else
> > up):
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093225
> 
> Uhm...that's the same build :) Happy to test another one, though.

Oops!
Try this:
https://koji.fedoraproject.org/koji/taskinfo?taskID=4093588
Comment 21 Sandro Mathys 2012-05-22 06:52:44 EDT
(In reply to comment #20)
> Oops!
> Try this:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=4093588

Works.
Comment 22 Ian Kent 2012-05-22 07:04:07 EDT
(In reply to comment #21)
> (In reply to comment #20)
> > Oops!
> > Try this:
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=4093588
> 
> Works.

Great.
I think that about resolves it, I'll get onto sorting out
the bug references and pushing out an update.

Thanks
Ian
Comment 23 Ian Kent 2012-05-22 23:10:11 EDT
*** Bug 821660 has been marked as a duplicate of this bug. ***
Comment 24 Fedora Update System 2012-05-22 23:19:36 EDT
autofs-5.0.6-19.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/autofs-5.0.6-19.fc17
Comment 25 Fedora Update System 2012-05-24 11:32:52 EDT
Package autofs-5.0.6-19.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing autofs-5.0.6-19.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8269/autofs-5.0.6-19.fc17
then log in and leave karma (feedback).
Comment 26 Fedora Update System 2012-05-30 20:52:46 EDT
autofs-5.0.6-19.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
