Bug 1089576
| Field | Value |
|---|---|
| Summary | segfault in automount |
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Greg Earle <bugzilla.redhat.com> |
| Component | autofs |
| Assignee | Ian Kent <ikent> |
| Status | CLOSED ERRATA |
| QA Contact | JianHong Yin <jiyin> |
| Severity | low |
| Priority | unspecified |
| Version | 6.3 |
| CC | eguan, ikent, rcritten, rmainz |
| Target Milestone | rc |
| Hardware | i686 |
| OS | Linux |
| Fixed In Version | autofs-5.0.5-94.el6 |
| Doc Type | Bug Fix |
| Clone Of | 783303 |
| Last Closed | 2014-10-14 08:18:21 UTC |
Description
Greg Earle
2014-04-21 02:32:38 UTC
(In reply to Greg Earle from comment #0)
> I think I'm hitting something very similar to now-closed bug #783303 on RHEL
> 6.3.
>
> I first saw this happen with autofs-5.0.5-54.el6.i686 as shipped with RHEL
> 6.3.
>
> I just updated autofs to autofs-5.0.5-88.el6.i686 and the bug is still there.
>
> On a RHEL 6.3 system that uses LDAP for authentication, I can reliably crash
> "automount" with a SIGSEGV just by SIGHUP'ing it. And here I thought that
> was just going to make it re-read its maps ;-)
>
> I started up the automounter and it spits out this error to the "messages"
> file:
>
> automount[5622]: syntax error in map near [ master \
> ldap:ldap.fltops.jpl.nasa.gov:nisMapName=auto.home,ou=Services,ou=MER,
> ou=Projects,dc=dir,dc=jpl,dc=nasa,dc=gov ]

Right, so you have a syntax error in the master map: it looks like the leading "/" is missing from "master".

>
> but it comes up. Then I send it a HUP and it results in
>
> automount[5913]: segfault at 0 ip 0023cd2f sp b5dfafbc error 4 in
> libc-2.12.so[110000+190000]
>
> Core was generated by `automount --pid-file /var/run/autofs.pid'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0023cd2f in __strcmp_sse4_2 () from /lib/libc.so.6
> Missing separate debuginfos, use: debuginfo-install autofs-5.0.5-88.el6.i686
> (gdb) where
> #0 0x0023cd2f in __strcmp_sse4_2 () from /lib/libc.so.6
> #1 0x00cf796c in master_find_mapent ()
> #2 0x00cf38df in master_parse_entry ()
> #3 0x002abf6f in lookup_read_master () from /usr/lib/autofs/lookup_file.so
> #4 0x00ce6af1 in ?? ()
> #5 0x00ce6c1b in ?? ()
> #6 0x00ce84a7 in lookup_nss_read_master ()
> #7 0x00cf9102 in master_read_master ()
> #8 0x00cd8cf6 in _start ()

Leaving out the "/", I can duplicate this, including with the current upstream. I'll see what I can find.

Ian

Created attachment 888341
Patch - fix reset flex scan buffer on init
Cheers Ian - I will notify the people that control the LDAP maps and let them know that map has a problem.

Ultimately I am trying to solve this problem: I have 3 machines exhibiting bizarre behavior. At 8:45 PM PDT every night (despite my inability to find any 'cron' job referencing this particular time), each of them tries to automount *every single host* in our NIS "hosts" map:

automount[7066]: lookup_mount: exports lookup failed for [NIS hostname1]
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname2]
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname3]
etc.

It's driving me batty and no amount of SIGHUPs has solved it.

When I saw a similar message on these LDAP-using hosts, I thought "Aha, I can try to debug it over there, since I'm only getting one of those "exports lookup failed" messages on those machines", as I mentioned previously:

Apr 20 14:10:49 machine3 automount[2194]: rpc_get_exports_proto
Apr 20 14:10:49 machine3 automount[2194]: lookup_mount: exports lookup failed for machine2
Apr 20 14:10:49 machine3 automount[2194]: key "machine2" not found in map source(s).

Anyway, I'll get back to trying to figure out this head-scratcher and will wait in hopes that your 5.0.5-94 patch will float downstream to us RHEL 6.x users where x < 5 :-)

That's what led me to discover this current bug I'm reporting.

Oops, sorry - that last comment should've more accurately said:

automount[7066]: lookup_mount: exports lookup failed for [NIS hostname1]
automount[7066]: update_negative_cache: key "NIS hostname1" not found in map.
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname2]
automount[7066]: update_negative_cache: key "NIS hostname2" not found in map.
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname3]
automount[7066]: update_negative_cache: key "NIS hostname3" not found in map.
[... etc. to the end of the NIS "hosts" map entries ...]
I think I may have found the problem with the update_negative_cache issue (typos in our "netgroup" file/map). So I guess it's good that it existed to set up the circumstances under which I found this bug, but mentioning it here is just noise now. Unfortunately I can't edit out Comments 5 & 6.

Anyway, Ian, thanks for the patch, and I'll look forward to the forthcoming -94 version RPMs.

(In reply to Greg Earle from comment #5)
> Cheers Ian - I will notify the people that control the LDAP maps and let
> them know that map has a problem.

There's often a bit of confusion about this, so it might be worth pointing out that there are three types of autofs format map they can encounter (an amd map format parser is being added to autofs, but I won't talk about that).

First is the master map, whose first field is the mount point for the autofs managed file system and so will always be a full path.

Each entry in the master map then refers to an autofs mount map, which can be either an indirect or a direct map. The first field of an indirect map is a single path component, so it has no path separators, but the first field of a direct map is always the full path to the automount point, so it will always contain path separators.

I expect you know this already but had to say it for completeness.

Ian

(In reply to Greg Earle from comment #5)
>
> Anyway, I'll get back to trying to figure out this head-scratcher and will
> wait in hopes that your 5.0.5-94 patch will float downstream to us RHEL 6.x
> users where x < 5 :-)

That's usually not the way things work: bugs get fixed in the current release only. If there's a business case to back port a change, then a back port needs to be requested through support, and then it needs to be approved by project management. If approved, it is usually only for the release previous to the current release. The further back you need to go, the harder it is to get approval.
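Ian's rule for the first field of each map type is simple enough to check mechanically. The sketch below is a hypothetical Python lint written only for illustration, not autofs's own parser: `MapType`, `key_is_valid` and `check_line` are made-up names, and it ignores line continuations and everything after the first field. Applied to the master-map line from the reporter's log, it flags `master` as a key that is not a full path.

```python
from enum import Enum
from typing import Optional


class MapType(Enum):
    MASTER = "master"      # key: the mount point, always a full path
    DIRECT = "direct"      # key: full path to the automount point
    INDIRECT = "indirect"  # key: a single path component


def key_is_valid(map_type: MapType, key: str) -> bool:
    """Return True if `key` has the shape Ian describes for `map_type`."""
    if map_type in (MapType.MASTER, MapType.DIRECT):
        # Full paths start with "/"; this also accepts "/-", the master-map
        # key that introduces a direct map.
        return key.startswith("/")
    # Indirect keys are a single path component, so no "/" at all.
    return "/" not in key


def check_line(map_type: MapType, line: str) -> Optional[str]:
    """Return an error message for a malformed entry, or None if it looks OK.

    Blank lines, '#' comments and '+' include lines are skipped; line
    continuations and everything after the first field are not checked.
    """
    stripped = line.strip()
    if not stripped or stripped[0] in "#+":
        return None
    key = stripped.split()[0]
    if key_is_valid(map_type, key):
        return None
    return f"{map_type.value} map: bad key {key!r}"


# The master-map entry from the reporter's log (missing its leading "/").
bad = ("master ldap:ldap.fltops.jpl.nasa.gov:nisMapName=auto.home,"
       "ou=Services,ou=MER,ou=Projects,dc=dir,dc=jpl,dc=nasa,dc=gov")
print(check_line(MapType.MASTER, bad))  # master map: bad key 'master'
```

A fixed entry would start with an absolute mount point instead (for example a hypothetical `/home`); the checker then returns `None` for it.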
FYI, bugs logged directly in the public Bugzilla are the lowest priority for engineering and are handled on a best-effort basis, as time permits, since GSS is the way customers are supposed to report problems.

Ian

Thanks for the clarification Ian.

I'm a little puzzled in that sometimes I will see updated RPMs that are tagged with a name showing the current OS build (e.g. ".el6_5.${arch}.rpm" in the name) but then a newer RPM will come along that's apparently been back-ported and it goes back to having a ".el6.${arch}.rpm" name - which I assume means it's been compiled on a previous version.

I realize that might not happen to automount but just in case it does I'll be ready ;-)

(I also realize there's no business case for anything other than "bugs get fixed in the current release only" but some of us work in places *cough*GovernmentLabs*cough* that don't let us run the latest & greatest - which in the case of the current OS release 6.5 means that I avoided Heartbleed. I didn't file this via GSS because it didn't seem important enough to do so.)

Anyway you can close this bug out. I found out that fixing the typo problem didn't seem to fix my issue (the automounters on 3 systems deciding simultaneously every day at 8:45 PM to try and automount every host in the NIS "hosts" map, for no obvious reason), so I have to get back to focussing on that oddity.

(In reply to Greg Earle from comment #10)
> Thanks for the clarification Ian.
>
> I'm a little puzzled in that sometimes I will see updated RPMs that are
> tagged with a name showing the current OS build (e.g. ".el6_5.${arch}.rpm"
> in the name) but then a newer RPM will come along that's apparently been
> back-ported and it goes back to having a ".el6.${arch}.rpm" name - which I
> assume means it's been compiled on a previous version.

Upon each release (6.3, 6.4, 6.5, etc.), a branch is created in our code repository at the point of the release, and work continues on the main branch toward the next release.
These releases always have the ".el6.${arch}" suffix. Now, if for some reason a bug is approved for back porting to the previous release (usually not further), then the changes get applied to the specific release branch and get a revision like ".el6_x.${arch}[.x]". The scheme is meant to ensure that revisions on the current release branch always have a higher revision number (from rpm's POV) than those on previous release branches.

There's a fair bit more to this relating to how the revision is specified, what is appropriate for back porting, etc., but I won't go into that.

>
> I realize that might not happen to automount but just in case it does I'll
> be ready ;-)

Looking at the changelog of each ("rpm -q[p] --changelog <package>") should tell the story of what has changed if you need to know.

>
> (I also realize there's no business case for anything other than "bugs get
> fixed in the current release only" but some of us work in places
> *cough*GovernmentLabs*cough* that don't let us run the latest & greatest -
> which in the case of the current OS release 6.5 means that I avoided
> Heartbleed. I didn't file this via GSS because it didn't seem important
> enough to do so.)

Sometimes there is a business case for back porting a change, but it's a fair amount of effort for the reporter, so the change has to be important enough for them to consider requesting the back port. Then there's the question of the magnitude of the change, etc.

>
> Anyway you can close this bug out. I found out that fixing the typo problem
> didn't seem to fix my issue (the automounters on 3 systems deciding
> simultaneously every day at 8:45 PM to try and automount every host in the
> NIS "hosts" map, for no obvious reason), so I have to get back to focussing
> on that oddity.

I think the resolution of this bug is fairly clear, and we have a bank of regression tests that attempt to ensure changes don't introduce regressions.
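The release-numbering scheme Ian describes depends on how rpm orders release strings. Below is a simplified Python approximation of rpm's segment-by-segment comparison, for illustration only: the real logic lives in rpm's `rpmvercmp()`, and this sketch ignores epochs and the `~`/`^` operators supported by newer rpm releases. Separators such as `.` and `_` are dropped, numeric segments compare as integers, and a string with extra trailing segments sorts higher. The `88.el6_5.1` release used below is a hypothetical back-port revision.

```python
import re

# Runs of ASCII digits or ASCII letters; everything else is a separator.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def vercmp(a: str, b: str) -> int:
    """Simplified rpm-style comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    sa, sb = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num != y_num:
            # rpm treats a numeric segment as newer than an alphabetic one.
            return 1 if x_num else -1
        if x_num:
            # Compare as integers, so "94" > "89" and "010" == "10".
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x != y:
            # Alphabetic segments compare as plain strings.
            return 1 if x > y else -1
    # Every shared segment matched: whichever has segments left is newer.
    if len(sa) != len(sb):
        return 1 if len(sa) > len(sb) else -1
    return 0


# Release strings for autofs-5.0.5; "88.el6_5.1" is a hypothetical back port.
print(vercmp("88.el6_5.1", "88.el6"))  # 1: the back port supersedes its base
print(vercmp("94.el6", "88.el6_5.1"))  # 1: the next main-branch build wins
```

The second comparison shows why the scheme works: the main branch bumps the leading release number, so its next build outranks any `.el6_x` back port of an older base.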
We'll leave this open while we work through the QA process for the release it's included in and close it when that's complete.

Ian

(In reply to Greg Earle from comment #10)
>
> Anyway you can close this bug out. I found out that fixing the typo problem
> didn't seem to fix my issue (the automounters on 3 systems deciding
> simultaneously every day at 8:45 PM to try and automount every host in the
> NIS "hosts" map, for no obvious reason), so I have to get back to focussing
> on that oddity.

That has to be something doing a scan of the directory tree. There are a couple of things that can affect this. If the autofs mount has the browse option, then each host directory will be created within the autofs mount point, and a directory scan will cause everything to be mounted. But if the browse option isn't present, then there won't necessarily be a bunch of directories for such a scan to find (see BROWSE_MODE in the configuration for the default setting).

If you enable debug logging, the pid of the process that is requesting the mount will be logged when the daemon receives the request. Perhaps you could use that, together with a frequent ps list capture around the time it occurs, to identify the errant process.

Ian

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1587.html
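Ian's last suggestion before the bug was closed (match the requesting pid that automount logs in debug mode against frequent process listings) could be scripted roughly as follows. This is a hypothetical, Linux-only Python sketch that reads `/proc` directly instead of running `ps -ef`; `snapshot`, `ancestry` and `capture` are illustrative names, and the exact wording of automount's debug log line is not reproduced here.

```python
import os
import time


def snapshot() -> dict:
    """Map pid -> (ppid, command) for the processes visible in /proc (Linux)."""
    procs = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        pid = int(name)
        try:
            with open(f"/proc/{pid}/stat", "rb") as f:
                stat = f.read().decode(errors="replace")
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # stat looks like "pid (comm) state ppid ..."; comm may contain spaces
        # or ")", so split on the last ")".
        close = stat.rfind(")")
        comm = stat[stat.find("(") + 1:close]
        fields = stat[close + 2:].split()
        ppid = int(fields[1])  # fields[0] is the state letter
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        procs[pid] = (ppid, cmd or f"[{comm}]")  # kernel threads: no cmdline
    return procs


def ancestry(procs: dict, pid: int) -> list:
    """Follow parent links from `pid` until the chain leaves the snapshot."""
    chain, seen = [], set()
    while pid in procs and pid not in seen:
        seen.add(pid)
        ppid, cmd = procs[pid]
        chain.append((pid, cmd))
        pid = ppid
    return chain


def capture(count: int, interval: float = 1.0) -> list:
    """Take `count` timestamped snapshots, `interval` seconds apart."""
    shots = []
    for i in range(count):
        shots.append((time.time(), snapshot()))
        if i + 1 < count:
            time.sleep(interval)
    return shots
```

Started shortly before 8:45 PM, `capture()` keeps a timestamped history; passing the logged pid to `ancestry()` on the snapshot nearest that time should show which job spawned the directory scan, even if the scanner itself exits quickly.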