Bug 1527815 - automount[1979]: segfault at 55f5101d30e8 ip 000055f50f177668 sp 00007ffffa85fdd0 error 4 in automount[55f50f16d000+48000]
Summary: automount[1979]: segfault at 55f5101d30e8 ip 000055f50f177668 sp 00007ffffa85...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: autofs
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Ian Kent
QA Contact: xiaoli feng
URL:
Whiteboard:
Depends On:
Blocks: 1504145 1509088 1527213
 
Reported: 2017-12-20 08:01 UTC by xiaoli feng
Modified: 2018-04-10 18:18 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 18:18:20 UTC
Target Upstream Version:
Embargoed:


Attachments
core dump (1.29 MB, application/x-core)
2017-12-21 05:31 UTC, xiaoli feng
no flags Details
Patch - fix use after free in do_master_list_reset() (1.51 KB, patch)
2017-12-21 09:46 UTC, Ian Kent
no flags Details | Diff
Patch - fix deadlock in dumpmaps (1.04 KB, patch)
2017-12-22 07:04 UTC, Ian Kent
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0977 0 None None None 2018-04-10 18:18:30 UTC

Description xiaoli feng 2017-12-20 08:01:42 UTC
Description of problem:
When using an NIS master map, restarting autofs fails with a segfault:

 automount[1979]: segfault at 55f5101d30e8 ip 000055f50f177668 sp 00007ffffa85fdd0 error 4 in automount[55f50f16d000+48000]

Version-Release number of selected component (if applicable):
autofs-5.0.7-79.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Configure an NIS master map and restart the autofs service

Actual results:
automount cannot be started after restarting the autofs service

Expected results:
automount starts successfully after restarting the autofs service

Additional info:

Comment 6 Murphy Zhou 2017-12-20 08:52:24 UTC
I see Beta compose RHEL-7.5-20171215.0 has autofs-5.0.7-79.el7

If it reproduces on autofs-5.0.7-79.el7, we should propose this bug as a Beta
blocker.

Xiaoli, can you help identify the first version that hit this issue since 69.el7?

Thanks!

Comment 7 xiaoli feng 2017-12-20 09:15:41 UTC
(In reply to xzhou from comment #6)
> I see Beta compose RHEL-7.5-20171215.0 has autofs-5.0.7-79.el7
> 
> If it reproduces on autofs-5.0.7-79.el7, we should propose this bug as a Beta
> blocker.
> 
> Xiaoli, can you help identify the first version that hit this issue since
> 69.el7?
> 
> Thanks!

The first version that hit this issue is autofs-5.0.7-77.el7.x86_64.

Comment 9 Ian Kent 2017-12-21 02:25:33 UTC
This crash doesn't happen on my local fedora machine with the
current upstream or with RHEL-7 revision 80 built from source.

It does happen on a local centos7 vm I have.

But it "should not happen" at all.

From a core of the crash:
Program terminated with signal 11, Segmentation fault.
#0  do_master_list_reset (master=0x55788ecf0e60) at automount.c:2083
2083		list_for_each(p, head) {

and we know that in autofs:
#define list_for_each(pos, head) \
        for (pos = (head)->next; pos != (head); pos = pos->next)

also from the core:
(gdb) l
2078		struct list_head *head, *p;
2079	
2080		master_mutex_lock();
2081	
2082		head = &master->mounts;
2083		list_for_each(p, head) {
2084			struct master_mapent *entry;
2085	
2086			entry = list_entry(p, struct master_mapent, list);
2087	

(gdb) p head
$1 = (struct list_head *) 0x55788ecf0e90

(gdb) p master
$2 = (struct master *) 0x55788ecf0e60
(gdb) p *$2
$3 = {name = 0x55788ecf0460 "auto.master", recurse = 0, depth = 0,
     reading = 0, read_fail = 0, default_ghost = 0,
     default_logging = 1, default_timeout = 300, logopt = 1,
     nc = 0x55788ecf3500,
     mounts = {next = 0x55788ecf0e90, prev = 0x55788ecf0e90},
     completed = {next = 0x55788ecf0ea0, prev = 0x55788ecf0ea0}}

(gdb) p $1->next
$4 = (struct list_head *) 0x55788ecf0e90

which means that "pos != (head)" in the list_for_each() macro
is true after the initialization "pos = (head)->next" and since
this is a for loop it should do nothing and return.

That's just plain bizarre.

Comment 10 Ian Kent 2017-12-21 02:32:53 UTC
(In reply to Ian Kent from comment #9)
> This crash doesn't happen on my local fedora machine with the
> current upstream or with RHEL-7 revision 80 built from source.
> 
> It does happen on a local centos7 vm I have.
> 
> But it "should not happen" at all.
> 
> From a core of the crash:
> Program terminated with signal 11, Segmentation fault.
> #0  do_master_list_reset (master=0x55788ecf0e60) at automount.c:2083
> 2083		list_for_each(p, head) {
> 
> and we know that in autofs:
> #define list_for_each(pos, head) \
>         for (pos = (head)->next; pos != (head); pos = pos->next)
> 
> also from the core:
> (gdb) l
> 2078		struct list_head *head, *p;
> 2079	
> 2080		master_mutex_lock();
> 2081	
> 2082		head = &master->mounts;
> 2083		list_for_each(p, head) {
> 2084			struct master_mapent *entry;
> 2085	
> 2086			entry = list_entry(p, struct master_mapent, list);
> 2087	
> 
> (gdb) p head
> $1 = (struct list_head *) 0x55788ecf0e90
> 
> (gdb) p master
> $2 = (struct master *) 0x55788ecf0e60
> (gdb) p *$2
> $3 = {name = 0x55788ecf0460 "auto.master", recurse = 0, depth = 0,
>      reading = 0, read_fail = 0, default_ghost = 0,
>      default_logging = 1, default_timeout = 300, logopt = 1,
>      nc = 0x55788ecf3500,
>      mounts = {next = 0x55788ecf0e90, prev = 0x55788ecf0e90},
>      completed = {next = 0x55788ecf0ea0, prev = 0x55788ecf0ea0}}
> 
> (gdb) p $1->next
> $4 = (struct list_head *) 0x55788ecf0e90
> 
> which means that "pos != (head)" in the list_for_each() macro
> is true after the initialization "pos = (head)->next" and since
> this is a for loop it should do nothing and return.

Obviously that's ""pos != (head)" in the list_for_each() macro
is false after ..."
 
> That's just plain bizarre.

Comment 11 xiaoli feng 2017-12-21 05:31:14 UTC
Created attachment 1370725 [details]
core dump

Comment 12 Ian Kent 2017-12-21 08:38:44 UTC
(In reply to xiaoli feng from comment #11)
> Created attachment 1370725 [details]
> core dump

As I mentioned above I was able to reproduce the crash.
I also verified the master list is actually fine.

It looks like this crash is a plain old use after free
due to stupidity on my part.

Your usage scenario also demonstrates another false negative
return in the retry-on-startup-failure logic, which essentially
causes the master map read retry to be done unnecessarily.

The recursive plus map include of /etc/auto.master within
auto.master (when auto.master is in fact /etc/auto.master)
is the cause.

Not sure how to handle that in a sane way just yet but I
really want to avoid these false negatives if possible.

But I'm tempted to just fix the use after free for now and
work out how best to handle the false negative later when
I have time to do it properly.

Ian

Comment 13 Ian Kent 2017-12-21 09:43:56 UTC
Could you get build autofs-5.0.7-81.el7 from either the Brew web
interface or the Brew build tree and give it a try, please?

I think this will resolve the SEGV and autofs will start OK,
albeit with a 10 second delay at startup and associated
log noise due to the false negative failure.

Ian

Comment 14 Ian Kent 2017-12-21 09:46:09 UTC
Created attachment 1370770 [details]
Patch - fix use after free in do_master_list_reset()

Comment 15 xiaoli feng 2017-12-22 02:03:50 UTC
(In reply to Ian Kent from comment #13)
> Could you get build autofs-5.0.7-81.el7 from either the Brew web
> interface or the Brew build tree and give it a try, please?
> 
> I think this will resolve the SEGV and autofs will start OK,
> albeit with a 10 second delay at startup and associated
> log noise due to the false negative failure.
> 
> Ian

Yes. Now the test is running. I will post the result after finishing it.

Comment 16 xiaoli feng 2017-12-22 05:58:03 UTC
Now the autofs service can be started successfully, but it blocks when executing "automount -m".

[21:28:26 root@ ~~]# automount -m
lookup_nss_read_master: reading master files auto.master
100000000|do_init: parse(sun): init gathered global options: (null)
100000000|spawn_mount: mtab link detected, passing -n to mount

block here~

Comment 17 Ian Kent 2017-12-22 06:36:36 UTC
(In reply to xiaoli feng from comment #16)
> Now the autofs service can be started successfully, but it blocks when
> executing "automount -m".
> 
> [21:28:26 root@ ~~]# automount -m
> lookup_nss_read_master: reading master files auto.master
> 100000000|do_init: parse(sun): init gathered global options: (null)
> 100000000|spawn_mount: mtab link detected, passing -n to mount
> 
> block here~

Right.

That's due to the changes for bug 1509088.
I did check for this but I see there is a case I missed.

I'll re-check and fix it up.

LOL, both of these are good catches, well done.

Thanks
Ian

Comment 18 Ian Kent 2017-12-22 07:04:57 UTC
Created attachment 1371207 [details]
Patch - fix deadlock in dumpmaps

fyi - the commit id in the patch description is the upstream
      commit id.

Comment 19 Ian Kent 2017-12-22 07:10:21 UTC
I probably should have added these two changes to the related
bugs but I didn't do that.

I've added a "Blocks" for the two related bugs in an attempt
to ensure that these changes are not missed by people looking
at those bugs.

Comment 20 Ian Kent 2017-12-22 07:12:07 UTC
As with revision 81 could you test revision 82 please.

Comment 21 xiaoli feng 2017-12-22 07:28:28 UTC
(In reply to Ian Kent from comment #20)
> As with revision 81 could you test revision 82 please.

I have tested autofs-5.0.7-82.el7, and these two issues are gone.

Comment 22 Ian Kent 2017-12-22 08:14:45 UTC
(In reply to xiaoli feng from comment #21)
> (In reply to Ian Kent from comment #20)
> > As with revision 81 could you test revision 82 please.
> 
> I have tested autofs-5.0.7-82.el7, and these two issues are gone.

Thanks, I'll add this bug and build revision 82 to the errata.

Comment 24 Ian Kent 2017-12-22 08:45:25 UTC
Oh joy, an RPMDiff test failure: Execshield reports "lost GNU_RELRO
security protection on ppc64 ppc64le" for all the binaries.

It is not something I have done, and I don't know what action
I need to take.

Comment 36 errata-xmlrpc 2018-04-10 18:18:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0977

