Bug 1326497

Summary: libdevmapper: dm_task_get_info() backwards compatibility
Product: Red Hat Enterprise Linux 7 Reporter: Steve Schremmer <sschremm>
Component: lvm2Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: libdevmapper QA Contact: cluster-qe <cluster-qe>
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified CC: agk, heinzm, jbrassow, lvm-team, msnitzer, ng-hsg-engcustomer-iop-bz, prajnoha, prockai, sschremm, zkabelac
Version: 7.2   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1356957 (view as bug list) Environment:
Last Closed: 2016-04-13 08:10:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1356957    

Description Steve Schremmer 2016-04-12 19:56:47 UTC
Description of problem:
The NetApp sanlun command-line utility crashes when using libdevmapper under RHEL7.2.
  *** Error in `sanlun': free(): invalid next size (fast): 0x00007f06040009c0 ***

sanlun dynamically loads libdevmapper.so to get access to a few calls, including dm_task_get_info().

I believe I've tracked the issue back to the addition of 2 new fields to struct dm_info. 'int deferred_remove' and 'int internal_suspend' were added to the struct, causing the struct to grow, but the sanlun binary doesn't know this and only allocates enough memory for the older struct dm_info. When dm_task_get_info() memset's dm_info to zero, it uses the new larger size and steps on memory past the allocated region. 

libdevmapper is intended to be backwards compatible using versioned dm_task_get_info() functions, but it may not be working correctly. Does this version of libdevmapper contain the fix discussed here: https://www.redhat.com/archives/lvm-devel/2015-September/msg00000.html

readelf show the following symbols in libdevmapper:
# readelf -Ws /lib64/libdevmapper.so | grep dm_task_get_info | awk '{print $8}'
dm_task_get_info@Base
dm_task_get_info@@DM_1_02_97
dm_task_get_info_with_deferred_remove@@Base

Version-Release number of selected component (if applicable):
Name        : device-mapper-libs
Arch        : x86_64
Epoch       : 7
Version     : 1.02.107
Release     : 5.el7
 and also tested with
Version     : 1.02.107
Release     : 5.el7_2.1

Steps to Reproduce:
run 'sanlun lun show -pv' with NetApp storage attached.

Additional info:
deferred_removed added here: https://git.fedorahosted.org/cgit/lvm2.git/commit/?h=v2_02_130&id=42e07d2bceaad21f197de5dde9ead69742425159
internal_suspend added here: https://git.fedorahosted.org/cgit/lvm2.git/commit/?h=v2_02_149&id=797c18d543947f4c2777b4dcf3ceff57cb55352b
some discussion of fixing the symbol export control: https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=82a27a85b54b68277e79de2fa9b4fadb01330497

Comment 1 Zdenek Kabelac 2016-04-12 20:15:37 UTC
Hi

Is your applicatoin properly requesting fully qualified 'dm_task_get_info@Base' (if you want the old one) ?

Please show us sample application code and which symbol you are requesting ?

If you are writing dynamically loadable plugin - you are unfortunately 'on your own boat' - since you can't use luxury provided by 'gcc' when linking versioned symbols automatically right.

The further question is - why you are is actually even trying to dynamically load
libdevmapper ?

If you have your own 'code' using libdevmapper - you should link this plugin with libdevmapper linked (and then you do not need to resolve magic yourself).

Otherwise you should just 'link' the library to have it portable - there is not much point to open 'library' dynamically and then use it as you will not save memory since unused code is not paged-in and will just complicate your code in major way.

Comment 3 Steve Schremmer 2016-04-12 20:57:35 UTC
I'm not sure the existing application code *knew* to ask for the old function.

It opens the library like this:
  if ((libdm_handle = dlopen(LIBDEVMAPPER_PATH, RTLD_NOW | RTLD_GLOBAL)) == NULL){ /* snipped out error path */ }

And requests the symbol like this:
  if ((lib_dm_task_get_info = (int (*)(struct dm_task *, struct dm_info *)) dlsym(libdm_handle, "dm_task_get_info")) == NULL){ /* error path */

I can't answer the "why" questions, as I wasn't involved in the original development of the application, and this issue has only shown itself in RHEL7.

Which of these is the "correct" symbol?
dm_task_get_info@Base
dm_task_get_info@@Base

Comment 4 Alasdair Kergon 2016-04-12 23:15:30 UTC
dlvsym() lets you specify the version

Comment 5 Alasdair Kergon 2016-04-13 01:08:40 UTC
If I understand this correctly:

After the code was compiled, we added 2 fields to the end of struct dm_info.

The code has not been changed since that happened.

The code expects to use the original version of the function viz. dm_task_get_info@Base.

dlsym() is actually picking up the new (now default) version, dm_task_get_info@@DM_1_02_97.

Our symbols have always been versioned, but it's only recently that we introduced more than one version of the same symbol.  Until dm version 1.02.97 everything had the version string "Base" and code got away with not specifying which version it needed to use because there was only one.  But strictly speaking, all code should have been using dlvsym(..., "Base") instead of dlsym() to make it future proof.

Symbol versioning is very poorly documented.  (And we have some code of our own that uses dlsym() so also needs making future proof.)

Comment 6 Alasdair Kergon 2016-04-13 01:17:14 UTC
(In reply to Steve Schremmer from comment #3)

> Which of these is the "correct" symbol?
> dm_task_get_info@Base
> dm_task_get_info@@Base

@@ simply indicates which version is the default.  Newly-compiled code will pick up that one which will be consistent with the current version of libdevmapper.h.

Comment 7 Zdenek Kabelac 2016-04-13 08:09:28 UTC
I'd suggest to redesign your code to 'dlopen' only just 'your' plugins.

Those plugins should be then correctly linked objects with full .so lib deps.
As example - see our 'dmeventd' our plugins - we have full control over them.
(yes we could switch dlvsym())

Those plugins are using i.e. device mapper library:

$ ldd ./libdevmapper-event-lvm2mirror.so 
	linux-vdso.so.1 (0x00007ffcac94e000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fe8d14db000)
	libdevmapper-event-lvm2.so.2.02 => /var/tmp/lvm/lib/libdevmapper-event-lvm2.so.2.02 (0x00007fe8d12d7000)
	libdevmapper.so.1.02 => /var/tmp/lvm/lib/libdevmapper.so.1.02 (0x00007fe8d1068000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe8d0ca1000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe8d0a83000)
	liblvm2cmd.so.2.02 => /var/tmp/lvm/lib/liblvm2cmd.so.2.02 (0x00007fe8d06a6000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fe8d047f000)
	libsepol.so.1 => /lib64/libsepol.so.1 (0x00007fe8d01e6000)
	libudev.so.1 => /lib64/libudev.so.1 (0x00007fe8d01c5000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fe8cfebb000)
	/lib64/ld-linux-x86-64.so.2 (0x000055ac34e7c000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fe8cfcb6000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fe8cfa73000)
	libdevmapper-event.so.1.02 => /var/tmp/lvm/lib/libdevmapper-event.so.1.02 (0x00007fe8cf86b000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fe8cf5f9000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fe8cf3f4000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fe8cf1d9000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe8cefc1000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fe8cedbc000)
	libattr.so.1 => /lib64/libattr.so.1 (0x00007fe8cebb6000)


Using libdevmapper with dlopen is more complicated and user has to always use full function name (with version).

Suggested links:

https://github.com/apitrace/apitrace/issues/258
https://www.akkadia.org/drepper/symbol-versioning
http://www.akkadia.org/drepper/dsohowto.pdf