Bug 1527815
| Summary: | automount[1979]: segfault at 55f5101d30e8 ip 000055f50f177668 sp 00007ffffa85fdd0 error 4 in automount[55f50f16d000+48000] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | xiaoli feng <xifeng> |
| Component: | autofs | Assignee: | Ian Kent <ikent> |
| Status: | CLOSED ERRATA | QA Contact: | xiaoli feng <xifeng> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.5 | CC: | lwang, rhandlin, xzhou |
| Target Milestone: | beta | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-10 18:18:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1504145, 1509088, 1527213 | | |
Description
xiaoli feng
2017-12-20 08:01:42 UTC
I see Beta compose RHEL-7.5-20171215.0 has autofs-5.0.7-79.el7

If it reproduces on autofs-5.0.7-79.el7, we should propose this bug as Beta blocker.

Xiaoli, Can you help identify the first version that hit this issue since 69.el7?

Thanks!

(In reply to xzhou from comment #6)
> Xiaoli, Can you help identify the first version that hit this issue since
> 69.el7?

The first version that hit this issue is autofs-5.0.7-77.el7.x86_64.

This crash doesn't happen on my local fedora machine with the
current upstream or with RHEL-7 revision 80 built from source.
It does happen on a local centos7 vm I have.
But it "should not happen" at all.
From a core of the crash:
Program terminated with signal 11, Segmentation fault.
#0 do_master_list_reset (master=0x55788ecf0e60) at automount.c:2083
2083 list_for_each(p, head) {
and we know that in autofs:
#define list_for_each(pos, head) \
for (pos = (head)->next; pos != (head); pos = pos->next)
also from the core:
(gdb) l
2078 struct list_head *head, *p;
2079
2080 master_mutex_lock();
2081
2082 head = &master->mounts;
2083 list_for_each(p, head) {
2084 struct master_mapent *entry;
2085
2086 entry = list_entry(p, struct master_mapent, list);
2087
(gdb) p head
$1 = (struct list_head *) 0x55788ecf0e90
(gdb) p master
$2 = (struct master *) 0x55788ecf0e60
(gdb) p *$2
$3 = {name = 0x55788ecf0460 "auto.master", recurse = 0, depth = 0,
reading = 0, read_fail = 0, default_ghost = 0,
default_logging = 1, default_timeout = 300, logopt = 1,
nc = 0x55788ecf3500,
mounts = {next = 0x55788ecf0e90, prev = 0x55788ecf0e90},
completed = {next = 0x55788ecf0ea0, prev = 0x55788ecf0ea0}}
(gdb) p $1->next
$4 = (struct list_head *) 0x55788ecf0e90
which means that "pos != (head)" in the list_for_each() macro
is true after the initialization "pos = (head)->next" and since
this is a for loop it should do nothing and return.
That's just plain bizarre.
(In reply to Ian Kent from comment #9)
> which means that "pos != (head)" in the list_for_each() macro
> is true after the initialization "pos = (head)->next" and since
> this is a for loop it should do nothing and return.

Obviously that's: "pos != (head)" in the list_for_each() macro is false after ...

> That's just plain bizarre.

Created attachment 1370725 [details]
core dump
(In reply to xiaoli feng from comment #11)
> Created attachment 1370725 [details]
> core dump

As I mentioned above I was able to reproduce the crash.

I also verified the master list is actually fine.

It looks like this crash is a plain old use after free due to stupidity on my part.

Your usage scenario also demonstrates another false negative return with the retry on startup failure logic, which essentially causes the master map read retry to be done unnecessarily.

The recursive plus map include of /etc/auto.master within auto.master (when auto.master is in fact /etc/auto.master) is the cause.

Not sure how to handle that in a sane way just yet but I really want to avoid these false negatives if possible.

But I'm tempted to just fix the use after free for now and work out how best to handle the false negative later when I have time to do it properly.

Ian

Could you get build autofs-5.0.7-81.el7 either from brew web or the brew build tree and give it a try please.

I think this will resolve the SEGV and autofs will start ok, albeit with a 10 second delay at startup with associated log noise due to the false negative failure.

Ian

Created attachment 1370770 [details]
Patch - fix use after free in do_master_list_reset()
(In reply to Ian Kent from comment #13)
> Could you get build autofs-5.0.7-81.el7 either from brew web
> or the brew build tree and give it a try please.

Yes. The test is running now. I will post the result once it finishes.

Now the autofs service can be started successfully. But it blocks when executing "automount -m".

[21:28:26 root@ ~~]# automount -m
lookup_nss_read_master: reading master files auto.master
100000000|do_init: parse(sun): init gathered global options: (null)
100000000|spawn_mount: mtab link detected, passing -n to mount

It blocks here.

(In reply to xiaoli feng from comment #16)
> Now the autofs service can be started successfully. But it blocks when
> executing "automount -m".

Right. That's due to the changes for bug 1509088.

I did check for this but I see there is a case I missed. I'll re-check and fix it up.

LOL, both of these are good catches, well done.

Thanks
Ian

Created attachment 1371207 [details]
Patch - fix deadlock in dumpmaps
fyi - the commit id in the patch description is the upstream
commit id.
I probably should have added these two changes to the related bugs but I didn't do that.

I've added a "Blocks" for the two related bugs in an attempt to ensure that these changes are not missed by people looking at those bugs.

As with revision 81 could you test revision 82 please.

(In reply to Ian Kent from comment #20)
> As with revision 81 could you test revision 82 please.

I have tested autofs-5.0.7-82.el7, and these two issues are gone.

(In reply to xiaoli feng from comment #21)
> I have tested autofs-5.0.7-82.el7, and these two issues are gone.

Thanks, I'll add this bug and build revision 82 to the errata.

Oh joy, RPMDiff test failure, Execshield "lost GNU_RELRO security protection on ppc64 ppc64le" for all the binaries.

It is not something I have done and I don't know what action I need to take.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0977