Created attachment 522912 [details]
crash from abrt

Description of problem:
Segmentation fault when searching for a Samba printer.

Version-Release number of selected component (if applicable):
system-config-printer-1.1.16-22.el6.s390x

How reproducible:
1 in 5 attempts

Steps to Reproduce:
1. Run s-c-printer.
2. New, for a new printer.
3. Network printer.
4. Windows Printer via SAMBA.
5. Browse...
6. Click on any machine in the newly opened window (see screenshot).

Actual results:
$ system-config-printer
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf": No such file or directory
...
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf.append": No such file or directory
Segmentation fault (core dumped)

Expected results:
No segmentation fault.

Additional info:
The Samba server is running on a RHEL 4 machine.
From the backtrace, this seems to be due to libsmbclient.
Hm, from the backtrace it looks more like a crash happening within /lib64/nss_nis.so. Reassigning to glibc in case it sounds familiar.
It crashes in the prologue, perhaps a stack overflow.
Could you run it with valgrind?
Created attachment 523567 [details]
valgrind --log-file=/tmp/valgr --trace-children=yes system-config-printer

The valgrind log is in the attachment.
Wow.

==56986== If you believe this happened as a result of a stack
==56986== overflow in your program's main thread (unlikely but
==56986== possible), you can try to increase the size of the
==56986== main thread stack using the --main-stacksize= flag.
==56986== The main thread stack size used in this run was 10485760.

Looking at the Samba code I don't see any stack variables big enough to use this much memory. Could you run with a 16M stack size and, if the error still occurs, with 32M? This looks more like a Python bug to me.
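If it's easier than driving valgrind, another way to test with a bigger main stack is to raise the stack rlimit before the process starts. A sketch, not from this report; it assumes the Linux behaviour that the main thread's stack grows on demand up to RLIMIT_STACK, and uses the wrapper path mentioned later in this bug:

import os, resource

# Raise the soft stack limit to 32M before exec'ing the app; the main
# thread's stack can then grow on demand up to the new limit.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
resource.setrlimit(resource.RLIMIT_STACK, (32 * 1024 * 1024, hard))
os.execvp("python", ["python",
                     "/usr/share/system-config-printer/system-config-printer.py"])

The shell equivalent would be "ulimit -s 32768" (the argument is in KB) in the shell that launches system-config-printer.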
(In reply to comment #10)
> Looking at the Samba code I don't see any stack variables big enough to
> use this much memory. Could you run with a 16M stack size and, if the
> error still occurs, with 32M? This looks more like a Python bug to me.

Could you please tell me how to do that? Should I run it under valgrind with --main-stacksize=16M --max-stackframe=16M?
Created attachment 523805 [details]
valgr1, --main-stacksize=16000000

Here it is:

valgrind --log-file=/tmp/valgr1 --trace-children=yes --main-stacksize=16000000 system-config-printer
Hi Petr,

I'm new and a bit confused. This bug is reported against RHEL 6.2, but in comment #2 you say this is Samba version 3.0.33. Isn't it version 3.5.x in RHEL 6.2?

Could you please provide the following information describing the environment where this is happening:

* architecture
* RHEL version
* Samba version (rpm -qi samba)

To answer comment #11, I think the valgrind command line should look like this:

valgrind --tool=memcheck -v --num-callers=20 --main-stacksize=16M --max-stackframe=16M <application>

or

valgrind --tool=memcheck -v --num-callers=20 --main-stacksize=16777216 --max-stackframe=16777216 <application>

Thanks,

-- andreas
Looking at the valgrind log from comment #12, it looks like it didn't segfault, so we really do have a stack overflow here.
Petr, this looks like a stack overflow caused by the Python code. Following the code path in Samba, I don't see any stack variables consuming several megabytes of stack space.
Reassigning to system-config-printer, which calls libsmb.
It's all on RHEL 6.1, s390x.

.qa.[root@s390x-6s-v1 tps]# rpm -qi samba
Name        : samba                        Relocations: (not relocatable)
Version     : 3.5.6                        Vendor: Red Hat, Inc.
Release     : 86.el6_1.4                   Build Date: Fri 05 Aug 2011 10:10:38 AM EDT
Install Date: Sun 18 Sep 2011 11:39:52 AM EDT
Build Host  : s390-006.build.bos.redhat.com
Group       : System Environment/Daemons   Source RPM: samba-3.5.6-86.el6_1.4.src.rpm
Size        : 19450265                     License: GPLv3+ and LGPLv3+
Signature   : RSA/8, Thu 18 Aug 2011 01:45:54 AM EDT, Key ID 199e2f91fd431d51
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://www.samba.org/
Summary     : Server and Client software to interoperate with Windows machines
Description :
Samba is the suite of programs by which a lot of PC-related machines
share files, printers, and other information (such as lists of
available files and printers). The Windows NT, OS/2, and Linux
operating systems support this natively, and add-on packages can
enable the same thing for DOS, Windows, VMS, UNIX of all kinds, MVS,
and more. This package provides an SMB/CIFS server that can be used to
provide network services to SMB/CIFS clients. Samba uses NetBIOS over
TCP/IP (NetBT) protocols and does NOT need the NetBEUI (Microsoft Raw
NetBIOS frame) protocol.

.qa.[root@s390x-6s-v1 tps]# uname -a
Linux s390x-6s-v1.ss.eng.bos.redhat.com 2.6.32-131.12.1.el6.s390x #1 SMP Sun Jul 31 16:49:40 EDT 2011 s390x s390x s390x GNU/Linux

.qa.[root@s390x-6s-v1 tps]# rpm -q system-config-printer
system-config-printer-1.1.16-22.el6.s390x

---

I browse for printers on the Samba network from s-c-printer on RHEL 6 (s390x), and s-c-printer crashes on RHEL 6. The Samba server is on RHEL 4, running samba-3.0.33-0.34.el4.x86_64.
Could you please run it again, but under gdb?

gdb --args /usr/bin/python /usr/share/system-config-printer/system-config-printer.py

Just "run" until it crashes as before, then do this:

while 1
 info f
 info locals
 up
end

What output do you get?
Created attachment 524016 [details]
output from gdb

(In reply to comment #18)
> Could you please run it again, but under gdb?

The requested output is in the attachment.
Thanks. I can't see anything nearly so big on the stack. Stack frame addresses:

 0: 0x3ffff8855c0
52: 0x3ffff888ea0

which is a difference of just 0x38e0, i.e. 14KB.

Could you try running this small program? Does it crash?

python <<EOF
import smbc
c = smbc.Context ()
d = c.opendir ("smb://I386-4AS-M1/")
print d.getdents ()
EOF
(In reply to comment #20)
> Could you try running this small program? Does it crash?

python <<EOF
import smbc
c = smbc.Context ()
d = c.opendir ("smb://I386-4AS-M1/")
print d.getdents ()
EOF

params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf": No such file or directory
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf.append": No such file or directory
[<smbc.Dirent object "CVE-2010-0787" (File share) at 0x2000003f120>, <smbc.Dirent object "IPC$" (IPC share) at 0x2000003fc10>]
Hmm, so that doesn't trigger it.

How about if you run valgrind repeatedly as in comment #9: is another thread always in _dl_close_worker() as below? I wonder if that's a clue.

==56986== Process terminating with default action of signal 11 (SIGSEGV)
==56986==  Access not within mapped region at address 0xFF43000
==56986==    at 0x4017232: _dl_close_worker (dl-close.c:271)
==56986==    by 0x4017DFF: _dl_close (dl-close.c:754)
==56986==    by 0x401065B: _dl_catch_error (dl-error.c:178)
==56986==    by 0x42A581F: __libc_dlclose (dl-libc.c:47)
==56986==    by 0x42AF05D: free_mem (in /lib64/libc-2.12.so)
==56986==    by 0x42AED1F: __libc_freeres (in /lib64/libc-2.12.so)
==56986==    by 0x40266E9: _vgnU_freeres (vg_preloaded.c:62)
==56986==    by 0xCB24B43: cups_array_add (string3.h:59)
==56986==    by 0xCB4D2B7: _cupsStrAlloc (string.c:166)
==56986==    by 0xCB39513: ippReadIO (ipp.c:1475)
==56986==    by 0xCB39BE9: ippRead (ipp.c:1035)
==56986==    by 0xCB484CD: cupsGetResponse (request.c:401)
I could be on completely the wrong track here, but based on bits from various attachments...

==56986== Process terminating with default action of signal 11 (SIGSEGV)
==56986==  Access not within mapped region at address 0xFF3E8E8
==56986==    at 0xFF3E8E8: ???
==56986==    by 0x423B017: gaih_inet (getaddrinfo.c:715)
==56986==    by 0x423DF59: getaddrinfo (getaddrinfo.c:2159)
==56986==    by 0xCD3D4E7: internal_resolve_name (namequery.c:1213)
==56986==    by 0xCD3E6CD: resolve_name (namequery.c:1586)
==56986==    by 0xCC52AB7: SMBC_opendir_ctx (libsmb_dir.c:687)
==56986==    by 0xCACE7FB: Dir_init (dir.c:78)
==56986==    by 0xCACE2A1: Context_opendir (context.c:231)

(Note, I may not be using precisely the source for the S390 code that's running.)

I think that's saying 0xff3e8e8 is valid but the access done there isn't. Since gaih_inet() jumps to a function pointer that's a dlsym-lookup of gethostbyname4_r at #715, 0xff3e8e8 is probably in the dynamically loaded library.

sysdeps/posix/getaddrinfo.c:

269 static int
270 gaih_inet (const char *name, const struct gaih_service *service,
271            const struct addrinfo *req, struct addrinfo **pai,
272            unsigned int *naddrs)
273 {
...
700   size_t tmpbuflen = 1024;
701   char *tmpbuf = alloca (tmpbuflen);
702
703   while (!no_more)
704     {
705       no_data = 0;
706       nss_gethostbyname4_r fct4
707         = __nss_lookup_function (nip, "gethostbyname4_r");
708       if (fct4 != NULL)
709         {
710           int herrno;
711
712           while (1)
713             {
714               rc = 0;
715               status = DL_CALL_FCT (fct4, (name, pat, tmpbuf,
716                                            tmpbuflen, &rc, &herrno,
717                                            NULL));
718               if (status == NSS_STATUS_SUCCESS)
719                 break;
720               if (status != NSS_STATUS_TRYAGAIN
721                   || rc != ERANGE || herrno != NETDB_INTERNAL)
722                 {
723                   if (status == NSS_STATUS_TRYAGAIN
724                       && herrno == TRY_AGAIN)
725                     no_data = EAI_AGAIN;
726                   else
727                     no_data = herrno == NO_DATA;
728                   break;
729                 }
730               tmpbuf = extend_alloca (tmpbuf,
731                                       tmpbuflen, 2 * tmpbuflen);
732             }

It's in a loop at this point, and every time around it the alloca'd tmpbuf is doubled (#730). To continue around the loop, the looked-up gethostbyname4_r() has to return NSS_STATUS_TRYAGAIN with an rc of ERANGE and a herrno of NETDB_INTERNAL.

Taking _nss_nis_gethostbyname4_r() as a possible resolution of the dlsym(), since it's mentioned in another attachment:

nis/nss_nis/nis-hosts.c:

452 enum nss_status
453 _nss_nis_gethostbyname4_r (const char *name, struct gaih_addrtuple **pat,
454                            char *buffer, size_t buflen, int *errnop,
455                            int *herrnop, int32_t *ttlp)
456 {
...
491   if (*pat == NULL)
492     {
493       uintptr_t pad = (-(uintptr_t) buffer
494                        % __alignof__ (struct gaih_addrtuple));
495       buffer += pad;
496       buflen = buflen > pad ? buflen - pad : 0;
497
498       if (__builtin_expect (buflen < sizeof (struct gaih_addrtuple), 0))
499         {
500         erange:
501           free (result);
502           *errnop = ERANGE;
503           *herrnop = NETDB_INTERNAL;
504           return NSS_STATUS_TRYAGAIN;
505         }
506
507       *pat = (struct gaih_addrtuple *) buffer;
508       buffer += sizeof (struct gaih_addrtuple);
509       buflen -= sizeof (struct gaih_addrtuple);
510     }
...
522   int parse_res = parse_line (result, &host, data, buflen, errnop, AF_UNSPEC,
523                               0);
524   if (__builtin_expect (parse_res < 1, 0))
525     {
526       if (parse_res == -1)
527         {
528           *herrnop = NETDB_INTERNAL;
529           return NSS_STATUS_TRYAGAIN;
530         }

#504 meets those criteria. The erange label at #500 is there because it's jumped to from other points in the function.

Could these things be combining to consume lots of stack in some circumstances?
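To put rough numbers on that loop: starting from the 1024-byte tmpbuf at #700/#701, each pass around #730 doubles the buffer, and (my assumption) each old alloca'd buffer stays live on the stack until gaih_inet() returns. A quick back-of-the-envelope check, plain arithmetic rather than anything from the attachments:

# How many trips around the retry loop before cumulative alloca'd
# space exceeds the 10MB valgrind main stack, if every old buffer
# stays live on the stack.
total, size = 0, 1024
iterations = 0
while total < 10 * 1024 * 1024:
    total += size
    size *= 2
    iterations += 1
print iterations, total   # -> 14 16776192, i.e. ~16MB after 14 doublings

So a misbehaving NSS module that kept returning NSS_STATUS_TRYAGAIN/ERANGE would only need around 14 iterations to blow through a 10MB stack.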
Separately, one of the attachments does have

host = {h_name = 0x80d4fbb0 "I386-4AS-M1", h_aliases = 0x20006063264,
        h_addrtype = 1023, h_length = -6658768,
        h_addr_list = 0x3ffff9a663c}

in _nss_nis_gethostbyname4_r(), in which the h_length looks odd. -6658768 as a 42-bit two's complement value is 0x3ffff9a6530, a stack address. host is set up by #522's parse_line().
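That reading is easy to double-check with a one-liner (the 42-bit width is my assumption, based on the stack addresses above):

$ python -c 'print hex((-6658768) % (1 << 42))'
0x3ffff9a6530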
Ralph: great analysis. I think that in this case tmpbuflen is only 1024 at the point of the crash (see the gdb log), so I'm not sure that goes far enough to explain it. I'm starting to wonder whether valgrind is telling the truth about the 10MB stack use -- there doesn't seem to be any evidence of it.
Petr, could you attach /etc/hosts from that machine? More to "eliminate it from the enquiry" than in expectation it will show an issue.
(In reply to comment #25)
> Petr, could you attach /etc/hosts from that machine? More to "eliminate it
> from the enquiry" than in expectation it will show an issue.

/etc/hosts is almost empty:

# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
Created attachment 524232 [details]
valgrind output, --main-stacksize=67108864 --max-stackframe=67108864

Having had a look remotely, I still see it segfaulting in exactly the same way even with a 64M stack size (see log).

I'm also looking at this line in a new light now:

==34270== The main thread stack size used in this run was 67108864.

I actually think that's telling us the static size of the available stack at the beginning of the run (i.e. the number I told it to set it to), not how much was in fact consumed by the program. This is how valgrind reports a stack overflow:

==34563== Stack overflow in thread 1: can't grow stack to 0x7fe601000

So I think we can rule out stack overflow as the cause of this.

We can also rule out threading issues, so far as concurrency is concerned: I have observed this with only a single thread (i.e. just wait a while before hitting 'Browse' so the other worker thread fetching PPDs has a chance to finish).

The dlclose() thing I mentioned in comment #22 is a red herring as well: _vgnU_freeres() is just valgrind closing down the traced process.

From stepping through the program, it looks as though it might be relevant that we go through an auth callback just before this point, i.e.:

python (browse_smb_hosts)
 -> pysmbc (Dir_init)
  -> libsmbclient (SMBC_opendir_ctx)
   -> pysmbc (auth_fn)
    -> python (pysmb.AuthContext.callback)

However, there was some weirdness even when eliminating that (a segfault on exit).
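If the auth callback matters, a variant of the comment #20 reproducer that forces that path might narrow it down. A sketch, assuming pysmbc's auth_fn keyword argument and the (server, share, workgroup, username, password) callback signature from the pysmbc examples; the credentials are placeholders:

python <<EOF
import smbc

# Stand-in for pysmb.AuthContext.callback: libsmbclient calls this to
# obtain (workgroup, username, password) for the connection.
def auth_fn (server, share, workgroup, username, password):
    return (workgroup, "guest", "")

c = smbc.Context (auth_fn=auth_fn)
d = c.opendir ("smb://I386-4AS-M1/")
print d.getdents ()
EOF

If that crashes where the plain version from comment #20 didn't, the python -> C -> python callback transition is worth a closer look.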
Could we have the output from `system-config-printer --debug', just to see if that shows anything up? No valgrind, etc., just the plain command. Thanks.
Created attachment 524855 [details]
--debug output before segmentation fault

system-config-printer --debug &> file
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not appear to meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com