+++ This bug was initially created as a clone of Bug #1162640 +++

Description of problem:

supervdsm encounters a segfault in libgfapi, causing all VMs on that host to enter the paused state and remain stuck there forever, even though supervdsm is restarted automatically (by systemd). High-availability VMs are neither migrated nor restarted. The paused VMs cannot be manually un-paused; they must be manually killed and then restarted.

Version-Release number of selected component (if applicable):

vdsm-4.16.7-1.gitdb83943.fc20.x86_64
vdsm-gluster-4.16.7-1.gitdb83943.fc20.noarch
glusterfs-api-3.5.2-1.fc20.x86_64

How reproducible:

Consistently, since the upgrade to oVirt release 3.5. The oVirt nodes remain operational for anywhere from 8 hours down to as little as 30 minutes between instances of the above-described segfaults.

The issue appears to occur when supervdsm is looping over all gluster volumes and retrieving their details. This happens regularly (every 5 minutes, it seems), usually without crashing, but then randomly does crash. It should also be noted that the volume being queried when the crash occurs is not consistent. It looks like each volume is queried twice, once plain (third argument of the call to volumeStatus() is empty ('')) and then again with detail (third argument of the call to volumeStatus() is 'detail'). The crash always comes during one of the 'detail' queries.

Steps to Reproduce:
1. VM storage domain on glusterfs
2. run some VMs
3. wait

Actual results:

Here are the relevant log extracts that I can find:

/var/log/messages:

Nov 5 21:46:29 big kernel: supervdsmServer[11638]: segfault at 18 ip 00007f652c41115a sp 00007f5fd3b7cf40 error 4 in libgfapi.so.0.0.0[7f652c40a000+19000]

/var/log/vdsm/supervdsm.log:

(looping through getting information about each gluster volume)

MainProcess|Thread-792::DEBUG::2014-11-05 21:46:29,015::supervdsmServer::101::SuperVdsm.ServerCallback::(wrapper) call wrapper with ('web', '', 'detail') {}
MainProcess|Thread-792::DEBUG::2014-11-05 21:46:29,015::utils::738::root::(execCmd) /usr/sbin/gluster --mode=script volume status web detail --xml (cwd None)
MainProcess|Thread-792::DEBUG::2014-11-05 21:46:29,323::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-792::DEBUG::2014-11-05 21:46:29,324::supervdsmServer::108::SuperVdsm.ServerCallback::(wrapper) return wrapper with {'bricks': [{'sizeTotal': '5722917.992', 'mntOptions': 'rw,relatime,ssd,space_cache', 'hostuuid': '6956bfd3-7a64-4789-9fda-d83bbea93924', 'device': '/dev/bcache0', 'blockSize': '4096', 'brick': 'big:/mnt/export/r0/web', 'sizeFree': '5624569.770', 'fsName': 'btrfs'}, {'sizeTotal': '5722917.992', 'mntOptions': 'rw,relatime,ssd,space_cache', 'hostuuid': 'e2a5191a-4d72-430f-9227-feb150163051', 'device': '/dev/bcache0', 'blockSize': '4096', 'brick': 'bang:/mnt/export/r0/web', 'sizeFree': '5619114.816', 'fsName': 'btrfs'}], 'name': 'web'}
MainProcess|Thread-792::DEBUG::2014-11-05 21:46:29,325::supervdsmServer::101::SuperVdsm.ServerCallback::(wrapper) call wrapper with ('web',) {}

[*** supervdsm crashed here and is restarted ***]

MainThread::DEBUG::2014-11-05 22:55:21,655::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2014-11-05 22:55:21,655::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2014-11-05 22:55:21,722::supervdsmServer::411::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm
MainThread::DEBUG::2014-11-05 22:55:21,722::supervdsmServer::420::SuperVdsm.Server::(main) Parsing cmd args
MainThread::DEBUG::2014-11-05 22:55:21,722::supervdsmServer::423::SuperVdsm.Server::(main) Cleaning old socket /var/run/vdsm/svdsm.sock
MainThread::DEBUG::2014-11-05 22:55:21,723::supervdsmServer::427::SuperVdsm.Server::(main) Setting up keep alive thread
MainThread::DEBUG::2014-11-05 22:55:21,723::supervdsmServer::433::SuperVdsm.Server::(main) Creating remote object manager
MainThread::DEBUG::2014-11-05 22:55:21,724::fileUtils::192::Storage.fileUtils::(chown) Changing owner for /var/run/vdsm/svdsm.sock, to (36:36)
MainThread::DEBUG::2014-11-05 22:55:21,725::supervdsmServer::444::SuperVdsm.Server::(main) Started serving super vdsm object

/var/log/vdsm/vdsm.log:

Thread-792::DEBUG::2014-11-05 21:46:28,579::BindingXMLRPC::1132::vds::(wrapper) client [192.168.21.36]::call volumeStatus with ('web', '', '') {} flowID [3d9b0b87]
Thread-792::DEBUG::2014-11-05 21:46:29,002::BindingXMLRPC::1139::vds::(wrapper) return volumeStatus with {'volumeStatus': {'bricks': [{'status': 'ONLINE', 'brick': 'big:/mnt/export/r0/web', 'pid': '4531', 'port': '50154', 'hostuuid': '6956bfd3-7a64-4789-9fda-d83bbea93924'}, {'status': 'ONLINE', 'brick': 'bang:/mnt/export/r0/web', 'pid': '20996', 'port': '50154', 'hostuuid': 'e2a5191a-4d72-430f-9227-feb150163051'}], 'nfs': [{'status': 'OFFLINE', 'hostname': '192.168.1.83', 'pid': '-1', 'port': 'N/A', 'hostuuid': '6956bfd3-7a64-4789-9fda-d83bbea93924'}, {'status': 'ONLINE', 'hostname': 'bang', 'pid': '21004', 'port': '2049', 'hostuuid': 'e2a5191a-4d72-430f-9227-feb150163051'}], 'shd': [{'status': 'ONLINE', 'hostname': '192.168.1.83', 'pid': '4547', 'hostuuid': '6956bfd3-7a64-4789-9fda-d83bbea93924'}, {'status': 'ONLINE', 'hostname': 'bang', 'pid': '21011', 'hostuuid': 'e2a5191a-4d72-430f-9227-feb150163051'}], 'name': 'web'}, 'status': {'message': 'Done', 'code': 0}}
Thread-792::DEBUG::2014-11-05 21:46:29,014::BindingXMLRPC::1132::vds::(wrapper) client [192.168.21.36]::call volumeStatus with ('web', '', 'detail') {} flowID [3d9b0b87]

[*** expecting "BindingXMLRPC::1139::vds::(wrapper) return volumeStatus with ..." to come next, but of course it doesn't come, because supervdsm crashed. ***]

Expected results:

The volumeStatus() query should return successfully (no crash, duh). Also, running VMs should not be affected by this random failed volume-information query.

Additional info:

I have two identically configured compute nodes that have both experienced this issue since upgrading to oVirt 3.5. Either one may suffer the crash at any given time, though not at the same time, but eventually both will suffer the crash and wind up with all VMs paused.

A core dump found in /var/log/core/core.538.1415241989.dump appears to be truncated and cannot produce a stack trace:

# gdb core.538.1415241989.dump
...
BFD: Warning: /var/log/core/core.538.1415241989.dump is truncated: expected core file size >= 19657957376, found: 14539485184.
...
Failed to read a valid object file image from memory.
Core was generated by `/usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f652c41115a in ?? ()

--- Additional comment from Dan Kenigsberg on 2014-11-12 10:09:17 EST ---

Ian, can you attach vdsm.log? It may help me understand why the VMs paused, and why unpausing them failed.
How often are the crashes? Can you reproduce them when gdb is attached to supervdsmd, and thus provide the backtrace?

Darshan, I'm 99.9% sure that the segfault occurs in libgfapi, and not in vdsm or Python. Are you aware of such a bug in libgfapi? If it cannot be solved there, we may need to disable this new feature in Vdsm.

--- Additional comment from Ian Morgan on 2014-11-12 10:20:29 EST ---

Dan,

Here's the log you requested, surrounding the previously described instance of the failure. This is a 16 MB single-file extract (524K compressed) from the rotated log set.

--- Additional comment from Ian Morgan on 2014-11-12 10:22:32 EST ---

Take two: I accidentally selected the type text/plain last time, but the file is XZ compressed.

--- Additional comment from Dan Kenigsberg on 2014-11-12 16:26:05 EST ---

> Ian, how often are the crashes? Can you reproduce them when gdb is attached to
> supervdsmd, and thus provide the backtrace?

--- Additional comment from Ian Morgan on 2014-11-12 18:13:29 EST ---

Dan,

The crashes were frequent, at least a couple of times per day, but never on cue (not intentionally reproducible), and never while I was watching. It's been about two weeks since I did the upgrade, and I have been cursing at it since, trying to find any plausible cause and channelling my deepest Google-fu to find anyone experiencing the same issue. Each of the nodes had been restarted many times since the upgrade while trying to diagnose.

I eventually gave up a few days ago and shut the whole cluster down (I actually considered a switch to OpenStack, but that prospect made my scr*tum shrivel up real tight). Taking a deep breath, I started everything back up and, on a lark, stopped all gluster volumes, disconnected the peers, added a 3rd dummy gluster node to the stack (no bricks, just glusterd to help with quorum, even though there have never been any communication issues between the two nodes), re-probed the peers, started up the volumes, started vdsmd on the nodes, started the engine, and attached gdb to supervdsmd on one node.

And of course, now with gdb attached, it's been running all day without fail... sigh. The previous pattern was that it would never fail while I was working with it. Inevitably I would log out of the webadmin portal, and when I came back later I would find that it had crapped out shortly thereafter. I could never witness it failing in real time.

I will keep gdb attached (a watched pot/program never boils/crashes??) and hope for the best. Will report back when it finally decides to crash again, or if it doesn't. [fingers crossed]

Of course, if it doesn't crash now, it means that either something went haywire in my gluster volumes during the upgrade from 3.4 to 3.5, or some odd metadata that 3.4 handled/ignored is making 3.5 unhappy, and the teardown and rebuild of the gluster stack fixed it (maybe?).

--- Additional comment from Ian Morgan on 2014-11-12 22:03:33 EST ---

Backtrace showing crash in glfs_set_volfile_server

--- Additional comment from Ian Morgan on 2014-11-12 22:36:02 EST ---

Backtrace attached. It looks to me like supervdsm called into glfs_set_volfile_server() with a NULL fs argument, and then the segfault comes naturally when it tries to dereference &fs->ctx->cmd_args. That comes from gfapi.py's glfsInit():

def glfsInit(volumeId, host, port, protocol):
    fs = _glfs_new(volumeId)

    rc = _glfs_set_volfile_server(fs, protocol, host, port)

Symptomatic cause of the segfault: fs is generated from volumeId via _glfs_new() and then used without being error-checked for NULL.
Root cause: still unknown. I can see that glfs_new() is being passed a valid volumeId ('gv1'). glfs_new() seems to have 4 places where it could potentially return NULL...

Your insight is greatly appreciated.

--- Additional comment from Ian Morgan on 2014-11-13 09:02:17 EST ---

I've set up another run of gdb attached to supervdsmd, with breakpoints at each of the 4 places where glfs_new() can return NULL, to find out which one is the culprit. If you have any hints to help narrow down the cause more quickly, I'm all ears.

--- Additional comment from Ian Morgan on 2014-11-13 11:24:16 EST ---

I can't seem to get gdb to break where I ask it to. It keeps breaking at some line just before or after the desired breakpoint, so I cannot break at the point of an actual "return NULL" in order to inspect _which_ "return NULL" is causing the issue. Is this because of compiler optimization? What's the point of the -debuginfo packages if they aren't useful for debugging optimized code? Is there any way for me to do anything helpful now without having proper debug builds of all the relevant packages?

I'm stuck at this point and don't know what more I can do without your development expertise on this stuff. Your insight is greatly appreciated.

--- Additional comment from Dan Kenigsberg on 2014-11-13 14:18:22 EST ---

Vdsm must check the return value of glfs_new(). This would give us better hints regarding where it fails and why. Please try with something like the following patch:

diff --git a/vdsm/gluster/gfapi.py b/vdsm/gluster/gfapi.py
index 2aee49b..1359091 100644
--- a/vdsm/gluster/gfapi.py
+++ b/vdsm/gluster/gfapi.py
@@ -50,6 +50,8 @@ class StatVfsStruct(ctypes.Structure):

 def glfsInit(volumeId, host, port, protocol):
     fs = _glfs_new(volumeId)
+    if fs is None:
+        raise ge.GlfsInitException(rc=7, err=['glfs_new(%s) failed' % volumeId])

     rc = _glfs_set_volfile_server(fs, protocol,

--- Additional comment from Ian Morgan on 2014-11-13 14:31:51 EST ---

Implemented the above patch on both compute nodes. I should expect to see the 'glfs_new(<volumeId>) failed' message in the vdsm.log, right? So now when that happens, what would you like me to report?
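For reference, a minimal, self-contained sketch of the pattern the patch above introduces. This is not the actual vdsm or gluster code: the names _api and glfs_init_checked are purely illustrative, and it assumes the standard libgfapi signatures for glfs_new() and glfs_set_volfile_server().

import ctypes

# Load the C library that is segfaulting. glfs_new() returns a glfs_t*
# (NULL on failure); glfs_set_volfile_server() returns an int status code.
_api = ctypes.CDLL("libgfapi.so.0", use_errno=True)
_api.glfs_new.restype = ctypes.c_void_p          # a NULL handle surfaces as None
_api.glfs_new.argtypes = [ctypes.c_char_p]
_api.glfs_set_volfile_server.restype = ctypes.c_int
_api.glfs_set_volfile_server.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                         ctypes.c_char_p, ctypes.c_int]

def glfs_init_checked(volume_id, host, port, protocol):
    fs = _api.glfs_new(volume_id.encode())
    if fs is None:
        # Without this check, the NULL handle would be passed straight to
        # glfs_set_volfile_server(), which dereferences fs->ctx and segfaults.
        raise RuntimeError("glfs_new(%s) failed" % volume_id)
    rc = _api.glfs_set_volfile_server(fs, protocol.encode(), host.encode(), port)
    if rc != 0:
        raise RuntimeError("glfs_set_volfile_server() returned %d" % rc)
    return fs

With restype set to c_void_p, a NULL return from glfs_new() arrives in Python as None, which is exactly the condition the patch tests before the handle is handed to glfs_set_volfile_server().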
The arguments passed to glfs_set_volfile_server() have to be verified. If the arguments are not as expected, the function should return with a proper return code.
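To illustrate that contract from the caller's side, here is a hypothetical guard (again not the gluster or vdsm code; set_volfile_server_checked is an invented name) that rejects obviously bad arguments before they reach the C API and checks the return code afterwards, assuming the standard glfs_set_volfile_server() signature.

import ctypes

_api = ctypes.CDLL("libgfapi.so.0")
_api.glfs_set_volfile_server.restype = ctypes.c_int
_api.glfs_set_volfile_server.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                         ctypes.c_char_p, ctypes.c_int]

def set_volfile_server_checked(fs, protocol, host, port):
    # Refuse the inputs that would otherwise be used blindly by the C library.
    if not fs:
        raise ValueError("invalid glfs handle (NULL)")
    if not protocol or not host:
        raise ValueError("protocol and host must be non-empty")
    if not 0 < int(port) < 65536:
        raise ValueError("port out of range: %r" % port)
    rc = _api.glfs_set_volfile_server(fs, protocol.encode(), host.encode(), int(port))
    if rc != 0:
        raise RuntimeError("glfs_set_volfile_server() returned %d" % rc)
    return rc

The library-side change requested here is the equivalent in C: validate the arguments and return a proper error code instead of crashing.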
I think this is a security issue and should get a CVE assigned by Red Hat.
REVIEW: http://review.gluster.org/9128 (api: Perform input validation in all functions.) posted (#1) for review on master by Vijay Bellur (vbellur)
REVIEW: http://review.gluster.org/9128 (api: Perform input validation in all functions.) posted (#2) for review on master by Vijay Bellur (vbellur)
(In reply to Sven Kieske from comment #2)
> I think this is a security issue and should get a CVE assigned by Red Hat.

At a quick glance, this doesn't seem to be user-triggerable, is it? I think that is what would define whether or not this is a security issue.

If an unprivileged user can in some way pass some input to cause this scenario to happen (perhaps some user can add metadata to a glusterfs volume that causes this?), then it would be considered a security issue. While the impact is pretty bad, I'm not sure it's a security issue. For instance, a bug in a kernel driver that causes the kernel to panic at random times isn't a security issue, even though it takes the entire system down. It needs a way of crossing a trust boundary: if an unprivileged user can cause this, then it is a problem; but if an administrator on the host can make (whatever) changes to a glusterfs volume and can _also_ turn off virtual machines, then there is no gain to them, as they can already DoS those virtual machines to begin with. Does that make sense?

I'm not familiar enough with glusterfs to know for certain which is the case here, so if you can provide some input in that regard, then I can definitely let you know whether this is a pretty bad operational bug or in fact a security issue.

Can anyone provide any input to the above that would perhaps clarify?
(In reply to Vincent Danen from comment #5)
> (In reply to Sven Kieske from comment #2)
> > I think this is a security issue and should get a CVE assigned by Red Hat.
>
> At a quick glance, this doesn't seem to be user-triggerable is it?
>
> Can anyone provide any input to the above that would perhaps clarify?

While we have not determined a root cause to the failure in bz#1162640, we have a patch that mitigates the segfault. That being said, it never felt like a security issue to me. I am not aware of any unprivileged user input that could cause this.
(In reply to Ian Morgan from comment #6)
> (In reply to Vincent Danen from comment #5)
> > (In reply to Sven Kieske from comment #2)
> > > I think this is a security issue and should get a CVE assigned by Red Hat.
> >
> > At a quick glance, this doesn't seem to be user-triggerable is it?
> >
> > Can anyone provide any input to the above that would perhaps clarify?
>
> While we have not determined a root cause to the failure in bz#1162640, we
> have a patch that mitigates the segfault. That being said, it never felt
> like a security issue to me. I am not aware of any unprivileged user input
> that could cause this.

Thanks for the clarification. I just assumed users could craft these bad arguments in some way or another.
(In reply to Ian Morgan from comment #6)
> (In reply to Vincent Danen from comment #5)
> > (In reply to Sven Kieske from comment #2)
> > > I think this is a security issue and should get a CVE assigned by Red Hat.
> >
> > At a quick glance, this doesn't seem to be user-triggerable is it?
> >
> > Can anyone provide any input to the above that would perhaps clarify?
>
> While we have not determined a root cause to the failure in bz#1162640, we
> have a patch that mitigates the segfault. That being said, it never felt
> like a security issue to me. I am not aware of any unprivileged user input
> that could cause this.

Perfect, thank you for this, Ian. We'll leave it as-is then. It did not seem like a security issue to me either.
REVIEW: http://review.gluster.org/9128 (api: Perform input validation in all functions.) posted (#3) for review on master by Vijay Bellur (vbellur)
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.
I believe this bug was fixed a long time ago, and can be closed.
With the glusterfs-6.x series.