Bug 1527815

Summary:

automount[1979]: segfault at 55f5101d30e8 ip 000055f50f177668 sp 00007ffffa85fdd0 error 4 in automount[55f50f16d000+48000]

Product:

Red Hat Enterprise Linux 7

Reporter:

xiaoli feng <xifeng>

Component:

autofs

Assignee:

Ian Kent <ikent>

Status:

CLOSED ERRATA

QA Contact:

xiaoli feng <xifeng>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

7.5

CC:

lwang, rhandlin, xzhou

Target Milestone:

beta

Keywords:

Regression

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2018-04-10 18:18:20 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1504145, 1509088, 1527213

Attachments:

Description	Flags
core dump	none
Patch - fix use after free in do_master_list_reset()	none
Patch - fix deadlock in dumpmaps	none

Description xiaoli feng 2017-12-20 08:01:42 UTC

Description of problem:
When using a NIS map, restarting autofs fails.

 automount[1979]: segfault at 55f5101d30e8 ip 000055f50f177668 sp 00007ffffa85fdd0 error 4 in automount[55f50f16d000+48000]

Version-Release number of selected component (if applicable):
autofs-5.0.7-79.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Use a NIS map and restart autofs.

Actual results:
automount can't be started after restarting the autofs service.

Expected results:
automount can be started after restarting the autofs service.

Additional info:

Comment 6 Murphy Zhou 2017-12-20 08:52:24 UTC

I see Beta compose RHEL-7.5-20171215.0 has autofs-5.0.7-79.el7

If it reproduces on autofs-5.0.7-79.el7, we should propose this bug as a Beta
blocker.

Xiaoli, Can you help identify the first version that hit this issue since 69.el7? 

Thanks!

Comment 7 xiaoli feng 2017-12-20 09:15:41 UTC

(In reply to xzhou from comment #6)
> I see Beta compose RHEL-7.5-20171215.0 has autofs-5.0.7-79.el7
> 
> If it reproduces on autofs-5.0.7-79.el7, we should propose this bug as a Beta
> blocker.
> 
> Xiaoli, Can you help identify the first version that hit this issue since
> 69.el7? 
> 
> Thanks!

The first version that hit this issue is autofs-5.0.7-77.el7.x86_64.

Comment 9 Ian Kent 2017-12-21 02:25:33 UTC

This crash doesn't happen on my local fedora machine with the
current upstream or with RHEL-7 revision 80 built from source.

It does happen on a local centos7 vm I have.

But it "should not happen" at all.

From a core of the crash:
Program terminated with signal 11, Segmentation fault.
#0  do_master_list_reset (master=0x55788ecf0e60) at automount.c:2083
2083		list_for_each(p, head) {

and we know that in autofs:
#define list_for_each(pos, head) \
        for (pos = (head)->next; pos != (head); pos = pos->next)

also from the core:
(gdb) l
2078		struct list_head *head, *p;
2079	
2080		master_mutex_lock();
2081	
2082		head = &master->mounts;
2083		list_for_each(p, head) {
2084			struct master_mapent *entry;
2085	
2086			entry = list_entry(p, struct master_mapent, list);
2087	

(gdb) p head
$1 = (struct list_head *) 0x55788ecf0e90

(gdb) p master
$2 = (struct master *) 0x55788ecf0e60
(gdb) p *$2
$3 = {name = 0x55788ecf0460 "auto.master", recurse = 0, depth = 0,
     reading = 0, read_fail = 0, default_ghost = 0,
     default_logging = 1, default_timeout = 300, logopt = 1,
     nc = 0x55788ecf3500,
     mounts = {next = 0x55788ecf0e90, prev = 0x55788ecf0e90},
     completed = {next = 0x55788ecf0ea0, prev = 0x55788ecf0ea0}}

(gdb) p $1->next
$4 = (struct list_head *) 0x55788ecf0e90

which means that "pos != (head)" in the list_for_each() macro
is true after the initialization "pos = (head)->next" and since
this is a for loop it should do nothing and return.

That's just plain bizarre.

Comment 10 Ian Kent 2017-12-21 02:32:53 UTC

(In reply to Ian Kent from comment #9)
> This crash doesn't happen on my local fedora machine with the
> current upstream or with RHEL-7 revision 80 built from source.
> 
> It does happen on a local centos7 vm I have.
> 
> But it "should not happen" at all.
> 
> From a core of the crash:
> Program terminated with signal 11, Segmentation fault.
> #0  do_master_list_reset (master=0x55788ecf0e60) at automount.c:2083
> 2083		list_for_each(p, head) {
> 
> and we know that in autofs:
> #define list_for_each(pos, head) \
>         for (pos = (head)->next; pos != (head); pos = pos->next)
> 
> also from the core:
> (gdb) l
> 2078		struct list_head *head, *p;
> 2079	
> 2080		master_mutex_lock();
> 2081	
> 2082		head = &master->mounts;
> 2083		list_for_each(p, head) {
> 2084			struct master_mapent *entry;
> 2085	
> 2086			entry = list_entry(p, struct master_mapent, list);
> 2087	
> 
> (gdb) p head
> $1 = (struct list_head *) 0x55788ecf0e90
> 
> (gdb) p master
> $2 = (struct master *) 0x55788ecf0e60
> (gdb) p *$2
> $3 = {name = 0x55788ecf0460 "auto.master", recurse = 0, depth = 0,
>      reading = 0, read_fail = 0, default_ghost = 0,
>      default_logging = 1, default_timeout = 300, logopt = 1,
>      nc = 0x55788ecf3500,
>      mounts = {next = 0x55788ecf0e90, prev = 0x55788ecf0e90},
>      completed = {next = 0x55788ecf0ea0, prev = 0x55788ecf0ea0}}
> 
> (gdb) p $1->next
> $4 = (struct list_head *) 0x55788ecf0e90
> 
> which means that "pos != (head)" in the list_for_each() macro
> is true after the initialization "pos = (head)->next" and since
> this is a for loop it should do nothing and return.

Obviously that should be "pos != (head)" in the list_for_each() macro
is false after ...
 
> That's just plain bizarre.
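
To illustrate the point about the loop condition, here is a minimal sketch in C. Only the list_for_each() macro is taken from the comment above; the struct layout and the helpers init_list_head(), list_add_tail() and count_entries() are stand-ins written for this example and are not copied from autofs. On an empty list (head->next == head) the condition is false on its first test, so the body should never run.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a kernel-style circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Macro as quoted in the comment above. */
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* An empty list is a head whose next and prev point back at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical helper: count how many times the loop body runs. */
static size_t count_entries(struct list_head *head)
{
	struct list_head *p;
	size_t n = 0;

	assert(head != NULL);
	list_for_each(p, head)
		n++;
	return n;
}
```

With a head in the same state as the one in the core (next and prev both equal to the head's own address), count_entries() returns 0, which is why a fault on that line cannot be explained by the list contents alone.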

Comment 11 xiaoli feng 2017-12-21 05:31:14 UTC

Created attachment 1370725 [details]
core dump

Comment 12 Ian Kent 2017-12-21 08:38:44 UTC

(In reply to xiaoli feng from comment #11)
> Created attachment 1370725 [details]
> core dump

As I mentioned above I was able to reproduce the crash.
I also verified the master list is actually fine.

It looks like this crash is a plain old use after free
due to stupidity on my part.

Your usage scenario also demonstrates another false negative
return with the retry on startup failure logic which essentially
causes the master map read retry to be done unnecessarily.

The recursive plus map include of /etc/auto.master within
auto.master (when auto.master is in fact /etc/auto.master)
is the cause.

Not sure how to handle that in a sane way just yet but I
really want to avoid these false negatives if possible.

But I'm tempted to just fix the use after free for now and
work out how best to handle the false negative later when
I have time to do it properly.

Ian

Comment 13 Ian Kent 2017-12-21 09:43:56 UTC

Could you get build autofs-5.0.7-81.el7, either from brew web
or the brew build tree, and give it a try please.

I think this will resolve the SEGV and autofs will start ok,
albeit with a 10 second delay at startup with associated
log noise due to the false negative failure.

Ian

Comment 14 Ian Kent 2017-12-21 09:46:09 UTC

Created attachment 1370770 [details]
Patch - fix use after free in do_master_list_reset()
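
The patch is attached rather than quoted, so the following is only a generic sketch of this class of bug, not the actual change. It assumes (this is not shown in the report) that the loop body unlinks and frees each entry. In that case a plain list_for_each() evaluates "pos = pos->next" on memory that has just been freed, even though the list head itself ends up looking like a perfectly valid empty list. The struct entry type and the helpers add_entry() and free_all_entries() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for an entry that embeds a list node. */
struct entry {
	int id;
	struct list_head list;
};

/* Recover the containing struct from a pointer to its list member. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Allocate an entry and link it at the tail of the list. */
static struct entry *add_entry(struct list_head *head, int id)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->id = id;
	e->list.prev = head->prev;
	e->list.next = head;
	head->prev->next = &e->list;
	head->prev = &e->list;
	return e;
}

/*
 * Unsafe pattern, shown only as a comment:
 *
 *	list_for_each(p, head) {
 *		e = list_entry(p, struct entry, list);
 *		list_del(&e->list);
 *		free(e);	(the loop step then reads p->next from freed memory)
 *	}
 *
 * Safe variant: fetch the successor before the entry is released.
 */
static size_t free_all_entries(struct list_head *head)
{
	struct list_head *p = head->next;
	size_t freed = 0;

	while (p != head) {
		struct entry *e = list_entry(p, struct entry, list);

		p = p->next;		/* advance first */
		list_del(&e->list);
		free(e);
		freed++;
	}
	return freed;
}
```

Whether the unsafe variant crashes depends on what the allocator does with the freed block, which may be one reason a bug of this kind can reproduce on one system and not on another.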

Comment 15 xiaoli feng 2017-12-22 02:03:50 UTC

(In reply to Ian Kent from comment #13)
> Could you get build autofs-5.0.7-81.el7, either from brew web
> or the brew build tree, and give it a try please.
> 
> I think this will resolve the SEGV and autofs will start ok,
> albeit with a 10 second delay at startup with associated
> log noise due to the false negative failure.
> 
> Ian

Yes. Now the test is running. I will post the result after finishing it.

Comment 16 xiaoli feng 2017-12-22 05:58:03 UTC

Now the autofs service can be started successfully, but it blocks when executing "automount -m".

[21:28:26 root@ ~~]# automount -m
lookup_nss_read_master: reading master files auto.master
100000000|do_init: parse(sun): init gathered global options: (null)
100000000|spawn_mount: mtab link detected, passing -n to mount

It blocks here.

Comment 17 Ian Kent 2017-12-22 06:36:36 UTC

(In reply to xiaoli feng from comment #16)
> Now the autofs service can be started successfully, but it blocks when
> executing "automount -m".
> 
> [21:28:26 root@ ~~]# automount -m
> lookup_nss_read_master: reading master files auto.master
> 100000000|do_init: parse(sun): init gathered global options: (null)
> 100000000|spawn_mount: mtab link detected, passing -n to mount
> 
> It blocks here.

Right.

That's due to the changes for bug 1509088.
I did check for this but I see there is a case I missed.

I'll re-check and fix it up.

LOL, both of these are good catches, well done.

Thanks
Ian

Comment 18 Ian Kent 2017-12-22 07:04:57 UTC

Created attachment 1371207 [details]
Patch - fix deadlock in dumpmaps

FYI: the commit ID in the patch description is the upstream
commit ID.

Comment 19 Ian Kent 2017-12-22 07:10:21 UTC

I probably should have added these two changes to the related
bugs but I didn't do that.

I've added a "Blocks" for the two related bugs in an attempt
to ensure that these changes are not missed by people looking
at those bugs.

Comment 20 Ian Kent 2017-12-22 07:12:07 UTC

As with revision 81 could you test revision 82 please.

Comment 21 xiaoli feng 2017-12-22 07:28:28 UTC

(In reply to Ian Kent from comment #20)
> As with revision 81 could you test revision 82 please.

I have tested autofs-5.0.7-82.el7, and these two issues are gone.

Comment 22 Ian Kent 2017-12-22 08:14:45 UTC

(In reply to xiaoli feng from comment #21)
> (In reply to Ian Kent from comment #20)
> > As with revision 81 could you test revision 82 please.
> 
> I have tested autofs-5.0.7-82.el7, and these two issues are gone.

Thanks, I'll add this bug and build revision 82 to the errata.

Comment 24 Ian Kent 2017-12-22 08:45:25 UTC

Oh joy, RPMDiff test failure, Execshield "lost GNU_RELRO
security protection on ppc64 ppc64le" for all the binaries.

It is not something I have done and I don't know what action
I need to take.

Comment 36 errata-xmlrpc 2018-04-10 18:18:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0977