Bug 737914

Summary: segmentation fault when searching for samba printer
Product: Red Hat Enterprise Linux 6 Reporter: Petr Sklenar <psklenar>
Component: system-config-printer Assignee: Zdenek Dohnal <zdohnal>
Status: CLOSED WONTFIX QA Contact: qe-baseos-daemons
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2 CC: asn, gdeschner, ohudlick, prc, ralph-fedora453, sbose, thozza
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-06 09:33:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1356054, 1373253    
Attachments (Description | Flags):
crash from abrt | none
valgrind --log-file=/tmp/valgr --trace-children=yes system-config-printer | none
valgr1, --main-stacksize=16000000 | none
output from gdb | none
valgrind output, --main-stacksize=67108864 --max-stackframe=67108864 | none
--debug before segmentation fault | none

Description Petr Sklenar 2011-09-13 11:50:54 UTC
Created attachment 522912
crash from abrt

Description of problem:
segmentation fault when searching for samba printer

Version-Release number of selected component (if applicable):
system-config-printer-1.1.16-22.el6.s390x

How reproducible:
1 in 5 attempts

Steps to Reproduce:
1. s-c-printer
2. New, for new printer
3. Network printer
4. Windows Printer via SAMBA
5. Browse...
6. Click any machine in the newly opened window (see screenshot)
  
Actual results:
$ system-config-printer
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf":
	No such file or directory
...
...
...
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf.append":
	No such file or directory
Segmentation fault (core dumped)


Expected results:
no segmentation fault

Additional info:
Samba is running on a RHEL 4 machine.

Comment 3 Tim Waugh 2011-09-13 14:00:16 UTC
From backtrace, seems to be due to libsmbclient.

Comment 4 Guenther Deschner 2011-09-14 10:36:06 UTC
Hm, from the backtrace it looks more like a crash happening within /lib64/nss_nis.so. Reassigning to glibc; maybe it sounds familiar there.

Comment 5 Andreas Schwab 2011-09-15 15:35:51 UTC
It crashes in the function prologue; perhaps this is a stack overflow.

Comment 8 Andreas Schneider 2011-09-16 09:18:42 UTC
Could you run it with valgrind?

Comment 9 Petr Sklenar 2011-09-16 15:04:51 UTC
Created attachment 523567
valgrind --log-file=/tmp/valgr --trace-children=yes system-config-printer

The valgrind log is in the attachment.

Comment 10 Andreas Schneider 2011-09-19 08:07:48 UTC
Wow.

==56986==  If you believe this happened as a result of a stack
==56986==  overflow in your program's main thread (unlikely but
==56986==  possible), you can try to increase the size of the
==56986==  main thread stack using the --main-stacksize= flag.
==56986==  The main thread stack size used in this run was 10485760.

Looking at the Samba code, I don't see any stack variables big enough to use this much memory. Could you run it with a 16M stack size and, if the error still occurs, with 32M? This looks more like a Python bug to me.

Comment 11 Petr Sklenar 2011-09-19 08:15:21 UTC
(In reply to comment #10)

> Looking at the Samba code I don't see any stack variables which are big enough
> to use this big amount of memory. Could you run with 16M stacksize and if the
> error still occurs with 32M. This looks more like a python bug to me.

Could you please tell me how to do that?
Should I run it under valgrind with --main-stacksize=16M --max-stackframe=16M?

Comment 12 Petr Sklenar 2011-09-19 08:34:02 UTC
Created attachment 523805
valgr1, --main-stacksize=16000000

Here it is:
valgrind --log-file=/tmp/valgr1 --trace-children=yes --main-stacksize=16000000 system-config-printer

Comment 13 Andreas Schneider 2011-09-19 08:43:50 UTC
Hi Petr,

I'm new and a bit confused.

This bug is reported against RHEL 6.2, but in comment #2 you say this is Samba version 3.0.33. Isn't Samba 3.5.x shipped in RHEL 6.2?

Could you please provide the following information describing the environment where this is happening:

* architecture
* rhel version
* samba version (rpm -qi samba)

To answer comment #11, I think the valgrind command line should look like this:

valgrind --tool=memcheck -v --num-callers=20 --main-stacksize=16M --max-stackframe=16M <application>

or

valgrind --tool=memcheck -v --num-callers=20 --main-stacksize=16777216 --max-stackframe=16777216 <application>
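Here is a minimal Python sketch that builds the second form of the command line, the one with plain byte counts, and spells out the sizes mentioned in this bug. It relies on a hypothetical helper named `valgrind_argv()`, which is not part of valgrind or system-config-printer. The flags themselves are the ones already used in comments #9, #12 and #13.

```python
MIB = 1024 * 1024


def valgrind_argv(app_argv, stack_bytes, log_file=None):
    """Build a valgrind command line like the one suggested above.

    Hypothetical helper: the options are real valgrind flags taken from
    this bug, but the function is only an illustration.
    """
    argv = [
        "valgrind",
        "--tool=memcheck",
        "-v",
        "--num-callers=20",
        # Plain byte counts, as in the second form above.
        "--main-stacksize=%d" % stack_bytes,
        "--max-stackframe=%d" % stack_bytes,
    ]
    if log_file is not None:
        # Options used for the runs in comments #9 and #12.
        argv += ["--log-file=%s" % log_file, "--trace-children=yes"]
    return argv + list(app_argv)


if __name__ == "__main__":
    # Stack sizes that appear in this bug, shown in MiB.
    for size in (10485760, 16000000, 16 * MIB, 64 * MIB):
        print("%10d bytes = %6.2f MiB" % (size, size / float(MIB)))
    print(" ".join(valgrind_argv(["system-config-printer"], 16 * MIB)))
```

The output shows that the 16000000 bytes used in comment #12 come to about 15.26 MiB, a little under the 16777216 bytes (16 MiB) suggested here. The values 10485760 and 67108864 are exactly 10 MiB and 64 MiB.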

Thanks,


 -- andreas

Comment 14 Andreas Schneider 2011-09-19 08:45:46 UTC
Looking at the valgrind log from comment #12, it didn't segfault, so we really do have a stack overflow here.

Comment 15 Andreas Schneider 2011-09-19 08:54:11 UTC
Petr, this looks like a stack overflow caused by the Python code. Following the code path in Samba, I don't see any stack variables consuming several megabytes of stack space.

Comment 16 Sumit Bose 2011-09-19 09:26:03 UTC
Reassigning to system-config-printer, which calls libsmb.

Comment 17 Petr Sklenar 2011-09-19 10:13:31 UTC
It's all on RHEL 6.1, s390x:

.qa.[root@s390x-6s-v1 tps]# rpm -qi samba
Name        : samba                        Relocations: (not relocatable)                                                     
Version     : 3.5.6                             Vendor: Red Hat, Inc.                                                         
Release     : 86.el6_1.4                    Build Date: Fri 05 Aug 2011 10:10:38 AM EDT                                       
Install Date: Sun 18 Sep 2011 11:39:52 AM EDT      Build Host: s390-006.build.bos.redhat.com                                  
Group       : System Environment/Daemons    Source RPM: samba-3.5.6-86.el6_1.4.src.rpm                                        
Size        : 19450265                         License: GPLv3+ and LGPLv3+                                                    
Signature   : RSA/8, Thu 18 Aug 2011 01:45:54 AM EDT, Key ID 199e2f91fd431d51                                                 
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>                                                             
URL         : http://www.samba.org/                                                                                           
Summary     : Server and Client software to interoperate with Windows machines                                                
Description :                                                                                                                 
                                                                                                                              
Samba is the suite of programs by which a lot of PC-related machines                                                          
share files, printers, and other information (such as lists of                                                                
available files and printers). The Windows NT, OS/2, and Linux                                                                
operating systems support this natively, and add-on packages can                                                              
enable the same thing for DOS, Windows, VMS, UNIX of all kinds, MVS,                                                          
and more. This package provides an SMB/CIFS server that can be used to                                                        
provide network services to SMB/CIFS clients.                                                                                 
Samba uses NetBIOS over TCP/IP (NetBT) protocols and does NOT                                                                 
need the NetBEUI (Microsoft Raw NetBIOS frame) protocol.                                                                      
.qa.[root@s390x-6s-v1 tps]# uname -a                                                                                          
Linux s390x-6s-v1.ss.eng.bos.redhat.com 2.6.32-131.12.1.el6.s390x #1 SMP Sun Jul 31 16:49:40 EDT 2011 s390x s390x s390x GNU/Linux                                                                                                                           
.qa.[root@s390x-6s-v1 tps]# rpm -q system-config-printer                                                                      
system-config-printer-1.1.16-22.el6.s390x 

---
I search for printers on the Samba network from s-c-printer on RHEL 6 (s390x), and s-c-printer crashes.

The Samba server is on RHEL 4, running samba-3.0.33-0.34.el4.x86_64.

Comment 18 Tim Waugh 2011-09-20 10:51:23 UTC
Could you please run it again, but under gdb?

gdb --args /usr/bin/python /usr/share/system-config-printer/system-config-printer.py

Just "run" until it crashes as before, then do this:

while 1
info f
info locals
up
end

What output do you get?

Comment 19 Petr Sklenar 2011-09-20 11:56:19 UTC
Created attachment 524016
output from gdb

(In reply to comment #18)
> Could you please run it again but under gdb?:

The requested output is in the attachment.

Comment 20 Tim Waugh 2011-09-20 12:20:56 UTC
Thanks.  I can't see anything nearly so big on the stack.

Stack frame addresses:
 0: 0x3ffff8855c0
52: 0x3ffff888ea0

which is a difference of just 0x38e0 bytes, i.e. about 14 KB.
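The arithmetic can be checked in a few lines of Python. The two addresses are copied from the gdb output above. The name `stack_span()` is illustrative only. The subtraction assumes the stack grows downwards, so the outer frame (52) sits at the higher address.

```python
# Frame addresses copied from the gdb output above (frames 0 and 52).
FRAME_0 = 0x3ffff8855c0
FRAME_52 = 0x3ffff888ea0


def stack_span(inner, outer):
    """Bytes between an inner and an outer frame on a downward-growing stack."""
    return outer - inner


span = stack_span(FRAME_0, FRAME_52)
print("0x%x bytes = %d bytes = %.1f KiB" % (span, span, span / 1024.0))
# Compare with the 10485760-byte main thread stack valgrind reported.
print("share of a 10 MiB stack: %.2f%%" % (100.0 * span / 10485760))
```

The span comes to 14560 bytes (about 14.2 KiB). That is roughly 0.14% of the 10485760-byte stack valgrind reported in comment #10.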

Could you try running this small program?  Does that crash?

python <<EOF
import smbc
c = smbc.Context ()
d = c.opendir ("smb://I386-4AS-M1/")
print d.getdents ()
EOF

Comment 21 Petr Sklenar 2011-09-20 13:36:58 UTC
(In reply to comment #20)
> Could you try running this small program?  Does that crash?

python <<EOF
import smbc
c = smbc.Context ()
d = c.opendir ("smb://I386-4AS-M1/")
print d.getdents ()
EOF
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf":
        No such file or directory
params.c:OpenConfFile() - Unable to open configuration file "/root/.smb/smb.conf.append":
        No such file or directory
[<smbc.Dirent object "CVE-2010-0787" (File share) at 0x2000003f120>, <smbc.Dirent object "IPC$" (IPC share) at 0x2000003fc10>]

Comment 22 Tim Waugh 2011-09-20 14:19:42 UTC
Hmm, so that doesn't trigger it.

How about if you run valgrind repeatedly as in comment #9: is another thread always in _dl_close_worker() as below?  I wonder if that's a clue.

==56986== Process terminating with default action of signal 11 (SIGSEGV)
==56986==  Access not within mapped region at address 0xFF43000
==56986==    at 0x4017232: _dl_close_worker (dl-close.c:271)
==56986==    by 0x4017DFF: _dl_close (dl-close.c:754)
==56986==    by 0x401065B: _dl_catch_error (dl-error.c:178)
==56986==    by 0x42A581F: __libc_dlclose (dl-libc.c:47)
==56986==    by 0x42AF05D: free_mem (in /lib64/libc-2.12.so)
==56986==    by 0x42AED1F: __libc_freeres (in /lib64/libc-2.12.so)
==56986==    by 0x40266E9: _vgnU_freeres (vg_preloaded.c:62)
==56986==    by 0xCB24B43: cups_array_add (string3.h:59)
==56986==    by 0xCB4D2B7: _cupsStrAlloc (string.c:166)
==56986==    by 0xCB39513: ippReadIO (ipp.c:1475)
==56986==    by 0xCB39BE9: ippRead (ipp.c:1035)
==56986==    by 0xCB484CD: cupsGetResponse (request.c:401)
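Checking this by eye across many runs is tedious. A small Python sketch can count, for each valgrind log, the innermost function of the fatal stack. The regular expression assumes the `==PID==    at 0xADDR: function (file:line)` layout quoted above. The helpers `crash_frame()` and `tally()` are hypothetical and not part of valgrind.

```python
import re
import sys
from collections import Counter

# Matches valgrind stack lines such as
# "==56986==    at 0x4017232: _dl_close_worker (dl-close.c:271)"
AT_LINE = re.compile(r"^==\d+==\s+at 0x[0-9A-Fa-f]+: (\S+)")


def crash_frame(log_text):
    """Return the innermost function after "Process terminating", or None."""
    seen_termination = False
    for line in log_text.splitlines():
        if "Process terminating" in line:
            seen_termination = True
            continue
        if seen_termination:
            m = AT_LINE.match(line)
            if m:
                return m.group(1)
    return None


def tally(logs):
    """Count innermost crash frames over the text of several valgrind logs."""
    return Counter(crash_frame(text) for text in logs)


if __name__ == "__main__":
    texts = []
    for path in sys.argv[1:]:
        with open(path) as f:
            texts.append(f.read())
    for frame, count in tally(texts).most_common():
        print("%4d  %s" % (count, frame))
```

If you pass several log files on the command line, the script prints each innermost function together with the number of runs that ended there.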

Comment 23 Ralph Corderoy 2011-09-20 20:30:50 UTC
I could be on completely the wrong track here, but based on bits from
various attachments...

    ==56986== Process terminating with default action of signal 11 (SIGSEGV)
    ==56986==  Access not within mapped region at address 0xFF3E8E8
    ==56986==    at 0xFF3E8E8: ???
    ==56986==    by 0x423B017: gaih_inet (getaddrinfo.c:715)
    ==56986==    by 0x423DF59: getaddrinfo (getaddrinfo.c:2159)
    ==56986==    by 0xCD3D4E7: internal_resolve_name (namequery.c:1213)
    ==56986==    by 0xCD3E6CD: resolve_name (namequery.c:1586)
    ==56986==    by 0xCC52AB7: SMBC_opendir_ctx (libsmb_dir.c:687)
    ==56986==    by 0xCACE7FB: Dir_init (dir.c:78)
    ==56986==    by 0xCACE2A1: Context_opendir (context.c:231)

(Note, I may not be using precisely the source for the S390 code that's
running.)

I think that's saying 0xff3e8e8 is valid but the access done there
isn't.  Since gaih_inet() jumps to a function pointer that's a
dlsym-lookup of gethostbyname4_r at #715, 0xff3e8e8 is probably in the
dynamically loaded library.

sysdeps/posix/getaddrinfo.c:
    269 static int
    270 gaih_inet (const char *name, const struct gaih_service *service,
    271            const struct addrinfo *req, struct addrinfo **pai,
    272            unsigned int *naddrs)
    273 {
    ...
    700           size_t tmpbuflen = 1024;
    701           char *tmpbuf = alloca (tmpbuflen);
    702
    703           while (!no_more)
    704             {
    705               no_data = 0;
    706               nss_gethostbyname4_r fct4
    707                 = __nss_lookup_function (nip, "gethostbyname4_r");
    708               if (fct4 != NULL)
    709                 {
    710                   int herrno;
    711
    712                   while (1)
    713                     {
    714                       rc = 0;
    715                       status = DL_CALL_FCT (fct4, (name, pat, tmpbuf,
    716                                                    tmpbuflen, &rc, &herrno,
    717                                                    NULL));
    718                       if (status == NSS_STATUS_SUCCESS)
    719                         break;
    720                       if (status != NSS_STATUS_TRYAGAIN
    721                           || rc != ERANGE || herrno != NETDB_INTERNAL)
    722                         {
    723                           if (status == NSS_STATUS_TRYAGAIN
    724                               && herrno == TRY_AGAIN)
    725                             no_data = EAI_AGAIN;
    726                           else
    727                             no_data = herrno == NO_DATA;
    728                           break;
    729                         }
    730                       tmpbuf = extend_alloca (tmpbuf,
    731                                               tmpbuflen, 2 * tmpbuflen);
    732                     }

It's in a loop at this point, and every time around it the alloca'd
tmpbuf is doubled;  #730.  To continue around it, the looked-up
gethostbyname4_r() has to return NSS_STATUS_TRYAGAIN with an rc of
ERANGE and a herrno of NETDB_INTERNAL.  Taking
_nss_nis_gethostbyname4_r() as a possible resolution of the dlsym()
since it's mentioned in another attachment:

nis/nss_nis/nis-hosts.c:
    452 enum nss_status
    453 _nss_nis_gethostbyname4_r (const char *name, struct gaih_addrtuple **pat,
    454                            char *buffer, size_t buflen, int *errnop,
    455                            int *herrnop, int32_t *ttlp)
    456 {
    ...
    491   if (*pat == NULL)
    492     {
    493       uintptr_t pad = (-(uintptr_t) buffer
    494                        % __alignof__ (struct gaih_addrtuple));
    495       buffer += pad;
    496       buflen = buflen > pad ? buflen - pad : 0;
    497
    498       if (__builtin_expect (buflen < sizeof (struct gaih_addrtuple), 0))
    499         {
    500         erange:
    501           free (result);
    502           *errnop = ERANGE;
    503           *herrnop = NETDB_INTERNAL;
    504           return NSS_STATUS_TRYAGAIN;
    505         }
    506
    507       *pat = (struct gaih_addrtuple *) buffer;
    508       buffer += sizeof (struct gaih_addrtuple);
    509       buflen -= sizeof (struct gaih_addrtuple);
    510     }
    ...
    522   int parse_res = parse_line (result, &host, data, buflen, errnop, AF_UNSPEC,
    523                               0);
    524   if (__builtin_expect (parse_res < 1, 0))
    525     {
    526       if (parse_res == -1)
    527         {
    528           *herrnop = NETDB_INTERNAL;
    529           return NSS_STATUS_TRYAGAIN;
    530         }

#504 meets those criteria.  The erange label at #500 is because it's
jumped to from other points in the function.
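The padding arithmetic on lines 493-498 can be mirrored in Python to see when that early check actually jumps to `erange`. This sketch assumes an LP64 target. It takes `sizeof (struct gaih_addrtuple)` as 40 and its alignment as 8, for a struct of two pointers, an `int`, a four-element `uint32_t` array and a `uint32_t`. Those values are assumptions, not read from the s390x headers. The helper name `nis_first_tuple_fits()` is hypothetical.

```python
UINTPTR_BITS = 64  # assumption: LP64 target such as s390x


def nis_first_tuple_fits(buffer_addr, buflen, tuple_size=40, tuple_align=8):
    """Mirror lines 493-498 of _nss_nis_gethostbyname4_r().

    tuple_size and tuple_align are assumed LP64 values for
    struct gaih_addrtuple. Returns (fits, pad, remaining); fits=False
    corresponds to taking the erange path.
    """
    # In C, -(uintptr_t) buffer wraps modulo 2**64.
    neg = (-buffer_addr) & ((1 << UINTPTR_BITS) - 1)
    pad = neg % tuple_align
    remaining = buflen - pad if buflen > pad else 0
    return remaining >= tuple_size, pad, remaining


if __name__ == "__main__":
    # A misaligned 1024-byte buffer, like gaih_inet's initial tmpbuf.
    print(nis_first_tuple_fits(0x1003, 1024))
```

Under these assumptions the padding is at most 7 bytes, so a 1024-byte buffer always passes the check on line 498. An ERANGE result with such a buffer would therefore have to come from one of the other jumps to `erange`.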

Could these things be combining to consume lots of stack in some
circumstances?
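The interaction can be modelled in Python to see how quickly the doubling could exhaust a stack. This is a sketch, not glibc, and it rests on several assumptions:

- `gaih_retry()`, `needs()` and `always_erange()` are hypothetical.
- The status, errno and herrno constants mirror glibc's values on Linux.
- Stack use is counted pessimistically as the sum of every buffer, as if `extend_alloca` could never grow the previous block in place.
- The 10485760-byte limit is the stack size valgrind reported in comment #10.

```python
# Values as defined by glibc's <nss.h>, <netdb.h> and Linux <errno.h>.
NSS_STATUS_TRYAGAIN = -2
NSS_STATUS_SUCCESS = 1
ERANGE = 34
NETDB_INTERNAL = -1


def gaih_retry(lookup, tmpbuflen=1024, stack_limit=10485760):
    """Model the while (1) loop at lines 712-732 of gaih_inet().

    lookup(buflen) stands in for the dlsym'd gethostbyname4_r and returns
    (status, rc, herrno). Each TRYAGAIN/ERANGE/NETDB_INTERNAL result
    doubles the buffer, as extend_alloca does at line 730.
    Returns (status, final_buflen, attempts, stack_used, overflowed).
    """
    stack_used = tmpbuflen
    attempts = 0
    while True:
        attempts += 1
        status, rc, herrno = lookup(tmpbuflen)
        if status == NSS_STATUS_SUCCESS:
            return status, tmpbuflen, attempts, stack_used, False
        if (status != NSS_STATUS_TRYAGAIN or rc != ERANGE
                or herrno != NETDB_INTERNAL):
            return status, tmpbuflen, attempts, stack_used, False
        tmpbuflen *= 2
        # Pessimistic: assume the doubled buffer is a fresh allocation.
        stack_used += tmpbuflen
        if stack_used > stack_limit:
            # The real loop has no such check; it would run off the stack.
            return status, tmpbuflen, attempts, stack_used, True


def needs(nbytes):
    """Hypothetical lookup that succeeds once the buffer is big enough."""
    def lookup(buflen):
        if buflen >= nbytes:
            return NSS_STATUS_SUCCESS, 0, 0
        return NSS_STATUS_TRYAGAIN, ERANGE, NETDB_INTERNAL
    return lookup


def always_erange(buflen):
    """Hypothetical lookup that rejects every buffer size."""
    return NSS_STATUS_TRYAGAIN, ERANGE, NETDB_INTERNAL


if __name__ == "__main__":
    print(gaih_retry(needs(10000)))
    print(gaih_retry(always_erange))
```

With a lookup that never accepts any size, the modelled stack use passes 10 MiB after 13 failed attempts, by which point the buffer has grown to 8 MiB. A buffer still at its initial 1024 bytes at the time of a crash, by contrast, corresponds to the first attempt.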

Separately, one of the attachments does have a

    host = {h_name = 0x80d4fbb0 "I386-4AS-M1", h_aliases = 0x20006063264,
        h_addrtype = 1023, h_length = -6658768, h_addr_list = 0x3ffff9a663c}

in _nss_nis_gethostbyname4_r() in which the h_length looks odd.
-6658768 as a 42-bit two's complement is 0x3ffff9a6530, a stack address.
host is set up by #522's parse_line().
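The conversion is easy to reproduce in Python. `as_unsigned()` is an illustrative helper, and the values come from the struct dump above. Because `h_length` is a 32-bit `int`, the same field can also be read as the low 32 bits of that stack address interpreted as a signed value. The sketch checks that reading too.

```python
H_LENGTH = -6658768            # h_length from the struct dump above
H_ADDR_LIST = 0x3ffff9a663c    # h_addr_list from the same struct


def as_unsigned(value, bits):
    """Reinterpret an integer as a bits-wide two's complement pattern."""
    return value & ((1 << bits) - 1)


as42 = as_unsigned(H_LENGTH, 42)
as32 = as_unsigned(H_LENGTH, 32)
print("42-bit: 0x%x" % as42)                        # 42-bit: 0x3ffff9a6530
print("32-bit: 0x%x" % as32)                        # 32-bit: 0xff9a6530
print("below h_addr_list by 0x%x bytes" % (H_ADDR_LIST - as42))  # 0x10c
```

The resulting address lies 0x10c (268) bytes below `h_addr_list` in the same struct. That at least fits the reading that the field holds a stack value.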

Comment 24 Tim Waugh 2011-09-21 09:32:08 UTC
Ralph: great analysis.  I think that in this case tmpbuflen is only 1024 at the point of the crash (see gdb log), so I'm not sure that goes far enough to explain it.

I'm starting to wonder whether valgrind is telling the truth about the 10MB of stack use; there doesn't seem to be any evidence of it.

Comment 25 Ralph Corderoy 2011-09-21 09:57:38 UTC
Petr, could you attach /etc/hosts from that machine?  More to "eliminate it from the enquiry" than in expectation it will show an issue.

Comment 26 Petr Sklenar 2011-09-21 10:57:56 UTC
(In reply to comment #25)
> Petr, could you attach /etc/hosts from that machine?  More to "eliminate it
> from the enquiry" than in expectation it will show an issue.
/etc/hosts is almost empty:

# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Comment 29 Tim Waugh 2011-09-21 15:34:23 UTC
Created attachment 524232
valgrind output, --main-stacksize=67108864 --max-stackframe=67108864

Having had a look remotely, I still see it segfaulting in exactly the same way even with a 64M stack size (see log).

I'm also looking at this line in a new light now:
==34270==  The main thread stack size used in this run was 67108864.

I actually think that's telling us the static size of the available stack at the beginning of the run (i.e. the number I told it to set it to), not how much was in fact consumed by the program.

This is how valgrind reports a stack overflow:

==34563== Stack overflow in thread 1: can't grow stack to 0x7fe601000

So I think we can rule out stack overflow as the cause of this.
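Following this reading of the two messages, a Python sketch can sort a valgrind log into the two cases and report the configured stack size separately. The message patterns are copied from the log lines quoted in this bug, and other valgrind versions may word them differently. `classify()` is a hypothetical helper.

```python
import re

STACK_OVERFLOW = re.compile(
    r"Stack overflow in thread (\d+): can't grow stack to (0x[0-9a-fA-F]+)")
BAD_ACCESS = re.compile(
    r"Access not within mapped region at address (0x[0-9a-fA-F]+)")
STACK_SIZE = re.compile(
    r"The main thread stack size used in this run was (\d+)")


def classify(log_text):
    """Summarise the fatal-signal part of a valgrind log."""
    result = {"kind": "no crash found", "address": None,
              "configured_stack": None}
    m = STACK_SIZE.search(log_text)
    if m:
        # The configured limit, not the number of bytes actually consumed.
        result["configured_stack"] = int(m.group(1))
    m = STACK_OVERFLOW.search(log_text)
    if m:
        result["kind"] = "stack overflow"
        result["address"] = m.group(2)
        return result
    m = BAD_ACCESS.search(log_text)
    if m:
        result["kind"] = "invalid access"
        result["address"] = m.group(1)
    return result
```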

We can also rule out threading issues, so far as concurrency is concerned: I have observed this with only a single thread (i.e. just wait a while before hitting 'Browse' so the other worker thread fetching PPDs has a chance to finish).

The dlclose() thing I mentioned in comment #22 is a red herring as well: _vgnU_freeres() is just valgrind closing down the traced process.

From stepping through the program, it looks as though it might be relevant that we go through an auth callback just before this point, i.e.:

python (browse_smb_hosts)
-> pysmbc (Dir_init)
   -> libsmbclient (SMBC_opendir_ctx)
      -> pysmbc (auth_fn)
         -> python (pysmb.AuthContext.callback)

However, there was some weirdness even after eliminating that (segfault on exit).
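Tim's small program from comment #20 did not crash, but it also never entered an auth callback. A variant that installs one would exercise the path described above. This is an untested sketch built on several assumptions:

- pysmbc's `smbc.Context` accepts an `auth_fn` keyword.
- pysmbc calls the callback as `(server, share, workgroup, username, password)` and expects a `(workgroup, username, password)` tuple back.
- The anonymous credentials and the script name are placeholders.

```python
import sys


def auth_fn(server, share, workgroup, username, password):
    """Anonymous credentials, loosely modelled on pysmb.AuthContext.callback.

    Assumption: pysmbc invokes this from inside SMBC_opendir_ctx and
    expects (workgroup, username, password) in return.
    """
    return (workgroup, "", "")


def browse(url):
    # Imported here so this module loads even where pysmbc is missing.
    import smbc
    ctx = smbc.Context(auth_fn=auth_fn)
    return ctx.opendir(url).getdents()


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python repro_auth.py smb://I386-4AS-M1/   (hypothetical name)
    print(browse(sys.argv[1]))
```

If this variant crashes where comment #20's program did not, that would support the idea that the callback path matters.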

Comment 31 Ralph Corderoy 2011-09-24 14:24:40 UTC
Could we have the output from `system-config-printer --debug' just to see if that shows anything up.  No valgrind, etc., just the plain command.  Thanks.

Comment 32 Petr Sklenar 2011-09-26 08:39:22 UTC
Created attachment 524855
--debug before segmentation fault

system-config-printer --debug &> file

--debug before segmentation fault

Comment 33 RHEL Program Management 2011-10-07 16:07:16 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 36 Suzanne Logcher 2012-02-14 23:15:32 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 40 Tomáš Hozza 2017-09-06 09:33:44 UTC
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017.  During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:
http://redhat.com/rhel/lifecycle

This issue does not appear to meet the inclusion criteria for the Production Phase 3 and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification.  Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com