Bug 604806

Summary: Double free in libblkid
Product: [Fedora] Fedora Reporter: Jerry James <loganjerry>
Component: util-linux-ngAssignee: Karel Zak <kzak>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 13CC: kzak
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-07-20 07:50:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jerry James 2010-06-16 18:47:16 UTC
Description of problem:
I have a homegrown application that uses libblkid.  Every once in awhile, the app will crash on take-off with glibc complaining about "double free or corruption", and a backtrace that shows execution several function calls deep into libblkid.  I finally managed to catch it while running under a debugger.  Here's a partial backtrace.  Please excuse me for censoring parts that correspond to proprietary software.

#0  0x00000036682329a5 in raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003668234185 in abort () at abort.c:92
#2  0x000000366826fe0b in __libc_message (do_abort=2, 
    fmt=0x3668343ab8 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x0000003668275726 in malloc_printerr (action=3, 
    str=0x3668343d60 "double free or corruption (!prev)", 
    ptr=<value optimized out>) at malloc.c:6283
#4  0x0000003673209288 in blkid_probe_reset_buffer (pr=0x7f35fc000d50)
    at probe.c:543
#5  0x00000036732093be in blkid_reset_probe (pr=0x7f35fc000d50) at probe.c:342
#6  0x000000367320b855 in blkid_verify (cache=0x1ceb490, 
    dev=<value optimized out>) at verify.c:194
#7  0x0000003673206593 in blkid_get_dev (cache=0x1ceb490, 
    devname=0x7f35fc0090a0 "/dev/sdb2", flags=<value optimized out>)
    at devname.c:81
#8  0x000000367320a80d in blkid_get_tag_value (cache=0x1ceb490, 
    tagname=0x414736 "UUID", devname=0x7f35fc0090a0 "/dev/sdb2")
    at resolve.c:45
[censored]

Mount says:

/dev/sdb2 on /home type ext4 (rw)

Here are the local variable values at various levels of the call stack, as well as the top-level innards of the cache structure.

(gdb) frame 4
#4  0x0000003673209288 in blkid_probe_reset_buffer (pr=0x7f35fc000d50)
    at probe.c:543
543			free(bf);
(gdb) info locals
bf = <value optimized out>
read_ct = 1
len_ct = 1024
(gdb) up
#5  0x00000036732093be in blkid_reset_probe (pr=0x7f35fc000d50) at probe.c:342
342		blkid_probe_reset_buffer(pr);
(gdb) info locals
i = <value optimized out>
(gdb) up
#6  0x000000367320b855 in blkid_verify (cache=0x1ceb490, 
    dev=<value optimized out>) at verify.c:194
194		blkid_reset_probe(cache->probe);
(gdb) info locals
st = {st_dev = 5, st_ino = 6071, st_nlink = 1, st_mode = 25008, st_uid = 0, 
  st_gid = 6, __pad0 = 0, st_rdev = 2066, st_size = 0, st_blksize = 4096, 
  st_blocks = 0, st_atim = {tv_sec = 1276703401, tv_nsec = 349981850}, 
  st_mtim = {tv_sec = 1276703400, tv_nsec = 202921625}, st_ctim = {
    tv_sec = 1276703400, tv_nsec = 202921625}, __unused = {0, 0, 0}}
diff = <value optimized out>
now = <value optimized out>
fltr = {0x1cecbc0 "ext4", 0x0}
fd = 9
(gdb) up
#7  0x0000003673206593 in blkid_get_dev (cache=0x1ceb490, 
    devname=0x7f35fc0090a0 "/dev/sdb2", flags=<value optimized out>)
    at devname.c:81
81			dev = blkid_verify(cache, dev);
(gdb) info locals
dev = 0x1cecab0
tmp = <value optimized out>
p = <value optimized out>
pnext = <value optimized out>
(gdb) up
#8  0x000000367320a80d in blkid_get_tag_value (cache=0x1ceb490, 
    tagname=0x414736 "UUID", devname=0x7f35fc0090a0 "/dev/sdb2")
    at resolve.c:45
45		if ((dev = blkid_get_dev(c, devname, BLKID_DEV_NORMAL)) &&
(gdb) info locals
found = <value optimized out>
dev = <value optimized out>
c = 0x1ceb490
ret = 0x0
(gdb) print *cache
$1 = {bic_devs = {next = 0x1ceb760, prev = 0x1cecab0}, bic_tags = {
    next = 0x1ceb870, prev = 0x1ceb950}, bic_time = 0, bic_ftime = 1276710630, 
  bic_flags = 4, bic_filename = 0x1ceb500 "/etc/blkid/blkid.tab", 
  probe = 0x7f35fc000d50}


Version-Release number of selected component (if applicable):
libblkid-2.17.2-5.fc13.x86_64

How reproducible:
Occasionally.  It mostly works, but fails in this way once in awhile.

Steps to Reproduce:
1. Run my app.  If it doesn't crash, run it again.
  
Actual results:
A double free error in libblkid.

Expected results:
No error.

Additional info:
Unfortunately, I cannot share the app with you, as it is proprietary (owned by my employer).

Comment 1 Karel Zak 2010-06-16 22:01:02 UTC
Jerry, run the application with BLKID_DEBUG= env. variable, for example:

  BLKID_DEBUG=0xffff /the/application

and send my the debug output. Please.

Comment 2 Jerry James 2010-06-23 20:40:46 UTC
It hasn't happened again yet.  However, I did trigger a segfault in libblkid while trying to catch the double free.  I'll file a separate bug report on that one.  Meanwhile, I'll keep trying to catch this with BLKID_DEBUG on.

Comment 3 Jerry James 2010-06-23 20:57:54 UTC
Wait, maybe the problem is that I'm an idiot.  Is libblkid thread safe?  I just noticed that 2 different threads can call into it concurrently.  I'll just wrap those calls with a mutex and see if it starts behaving better.

Comment 4 Karel Zak 2010-06-23 21:30:07 UTC
The library should be thread safe if you don't share the same blkid_cache or blkid_probe handlers between more threads. If you use mutex to access the cache or if you have cache per thread then there should not be a problem.

Unfortunately, BLKID_DEBUG is probably not thread safe. There is strerror() used in the debug messages... I'll fix it ASAP.

Comment 5 Jerry James 2010-06-24 01:23:01 UTC
There's the problem then: the blkid_cache is shared by the two threads.  I now have it protected with a mutex.  Hopefully that will resolve the issue.  Thanks for your help!

Comment 6 Karel Zak 2010-07-20 07:50:59 UTC
Closing, problem resolved (comment #5).