Bug 499205 - Review Request: ctdb - A Clustered Database based on Samba's Trivial Database (TDB)
Review Request: ctdb - A Clustered Database based on Samba's Trivial Database...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: Package Review (Show other bugs)
5.0
All Linux
medium Severity medium
: rc
: ---
Assigned To: nobody nobody
:
Depends On:
Blocks: 188271 499241
  Show dependency treegraph
 
Reported: 2009-05-05 11:26 EDT by Sumit Bose
Modified: 2014-03-07 09:45 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-07 09:45:35 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Sumit Bose 2009-05-05 11:26:34 EDT
Spec URL: http://cvs.fedoraproject.org/viewvc/rpms/ctdb/devel/ctdb.spec?revision=1.6
SRPM URL: http://kojipkgs.fedoraproject.org/packages/ctdb/1.0.79/2.fc12/src/ctdb-1.0.79-2.fc12.src.rpm
Description: CTDB is a cluster implementation of the TDB database used by Samba and other
projects to store temporary data. If an application is already using TDB for
temporary data it is very easy to convert that application to be cluster aware
and use CTDB instead.
Comment 1 Sumit Bose 2009-05-05 11:47:34 EDT
This package is needed to support clustered CIFS, see bz498691.
Comment 2 Jim Meyering 2009-05-19 09:21:40 EDT
I'm reviewing this now.
Comment 3 Jim Meyering 2009-05-19 10:22:36 EDT
This is building stock ctdb-1.0.79 on Fedora 10:
Parallel build fails some of the time:

  Compiling utils/smnotify/gen_smnotify.c
  Linking bin/smnotify
  gcc: poptparse.o: No such file or directory
  make: *** [bin/smnotify] Error 1

This is surely due to a missing dependency.
Comment 4 Jim Meyering 2009-05-19 10:24:25 EDT
Building on rawhide from git (ctdb-1.0.82-6-g92b5580), I see this:
(also with stock 1.0.79)

  checking linux/netfilter.h usability... no
  checking linux/netfilter.h presence... yes
  configure: WARNING: linux/netfilter.h: present but cannot be compiled
  configure: WARNING: linux/netfilter.h:     check for missing prerequisite headers?
  configure: WARNING: linux/netfilter.h: see the Autoconf documentation
  configure: WARNING: linux/netfilter.h:     section "Present But Cannot Be Compiled"
  configure: WARNING: linux/netfilter.h: proceeding with the compiler's result
  checking for linux/netfilter.h... no

may not be a big deal, but it should probably be fixed.
Comment 5 Jim Meyering 2009-05-19 10:25:53 EDT
FYI "make test" has failures (latest from git on rawhide, so perhaps not relevant to RHEL -- build/test on RHEL are coming up):

tests/simple/00_ctdb_init.sh                                             PASSED
tests/simple/00_ctdb_install_eventscript.sh                              PASSED
tests/simple/00_ctdb_onnode.sh                                           PASSED
tests/simple/01_ctdb_version.sh                                          PASSED
tests/simple/02_ctdb_listvars.sh                                         PASSED
tests/simple/03_ctdb_getvar.sh                                           PASSED
tests/simple/04_ctdb_setvar.sh                                           PASSED
tests/simple/05_ctdb_listnodes.sh                                        PASSED
tests/simple/06_ctdb_getpid.sh                                           PASSED
tests/simple/07_ctdb_process_exists.sh                                   FAILED
tests/simple/08_ctdb_isnotrecmaster.sh                                   PASSED
tests/simple/09_ctdb_ping.sh                                             PASSED
tests/simple/11_ctdb_ip.sh                                               FAILED
tests/simple/12_ctdb_getdebug.sh                                         PASSED
tests/simple/13_ctdb_setdebug.sh                                         PASSED
tests/simple/14_ctdb_statistics.sh                                       FAILED
tests/simple/15_ctdb_statisticsreset.sh                                  PASSED
tests/simple/16_ctdb_config_add_ip.sh                                    FAILED
tests/simple/17_ctdb_config_delete_ip.sh                                 PASSED
tests/simple/18_ctdb_freeze.sh                                           PASSED
tests/simple/19_ctdb_thaw.sh                                             PASSED
tests/simple/20_ctdb_getmonmode.sh                                       PASSED
tests/simple/21_ctdb_disablemonitor.sh                                   PASSED
tests/simple/22_ctdb_enablemonitor.sh                                    PASSED
tests/simple/23_ctdb_moveip.sh                                           PASSED
tests/simple/24_ctdb_getdbmap.sh                                         PASSED
tests/simple/25_dumpmemory.sh                                            PASSED
tests/simple/26_ctdb_config_check_error_on_unreachable_ctdb.sh           PASSED
tests/simple/31_ctdb_disable.sh                                          PASSED
tests/simple/32_ctdb_enable.sh                                           PASSED
tests/simple/41_ctdb_ban.sh                                              PASSED
tests/simple/42_ctdb_unban.sh                                            PASSED
tests/simple/51_ctdb_bench.sh                                            PASSED
tests/simple/52_ctdb_fetch.sh                                            FAILED
tests/simple/53_ctdb_transaction.sh                                      FAILED
tests/simple/61_ctdb_persistent_safe.sh                                  PASSED
tests/simple/62_ctdb_persistent_unsafe.sh                                PASSED
tests/simple/99_ctdb_uninstall_eventscript.sh                            PASSED

32/38 tests passed
make: *** [test] Error 1
[Exit 2]
Comment 6 Jim Meyering 2009-05-19 10:42:34 EDT
on F10, "make test" with 1.0.79 has 3 failures:

tests/simple/00_ctdb_init.sh                                             PASSED
tests/simple/00_ctdb_install_eventscript.sh                              PASSED
tests/simple/00_ctdb_onnode.sh                                           PASSED
tests/simple/01_ctdb_version.sh                                          PASSED
tests/simple/02_ctdb_listvars.sh                                         PASSED
tests/simple/03_ctdb_getvar.sh                                           PASSED
tests/simple/04_ctdb_setvar.sh                                           PASSED
tests/simple/05_ctdb_listnodes.sh                                        PASSED
tests/simple/06_ctdb_getpid.sh                                           PASSED
tests/simple/07_ctdb_process_exists.sh                                   PASSED
tests/simple/08_ctdb_isnotrecmaster.sh                                   PASSED
tests/simple/09_ctdb_ping.sh                                             PASSED
tests/simple/11_ctdb_ip.sh                                               FAILED
tests/simple/12_ctdb_getdebug.sh                                         PASSED
tests/simple/13_ctdb_setdebug.sh                                         PASSED
tests/simple/14_ctdb_statistics.sh                                       PASSED
tests/simple/15_ctdb_statisticsreset.sh                                  PASSED
tests/simple/16_ctdb_config_add_ip.sh                                    FAILED
tests/simple/17_ctdb_config_delete_ip.sh                                 PASSED
tests/simple/18_ctdb_freeze.sh                                           PASSED
tests/simple/19_ctdb_thaw.sh                                             PASSED
tests/simple/20_ctdb_getmonmode.sh                                       PASSED
tests/simple/21_ctdb_disablemonitor.sh                                   PASSED
tests/simple/22_ctdb_enablemonitor.sh                                    PASSED
tests/simple/23_ctdb_moveip.sh                                           PASSED
tests/simple/24_ctdb_getdbmap.sh                                         PASSED
tests/simple/25_dumpmemory.sh                                            PASSED
tests/simple/26_ctdb_config_check_error_on_unreachable_ctdb.sh           PASSED
tests/simple/31_ctdb_disable.sh                                          PASSED
tests/simple/32_ctdb_enable.sh                                           PASSED
tests/simple/41_ctdb_ban.sh                                              PASSED
tests/simple/42_ctdb_unban.sh                                            PASSED
tests/simple/51_ctdb_bench.sh                                            PASSED
tests/simple/52_ctdb_fetch.sh                                            FAILED
tests/simple/53_ctdb_transaction.sh                                      PASSED
tests/simple/61_ctdb_persistent_safe.sh                                  PASSED
tests/simple/62_ctdb_persistent_unsafe.sh                                PASSED
tests/simple/99_ctdb_uninstall_eventscript.sh                            PASSED

35/38 tests passed
make: *** [test] Error 1
Comment 7 Jim Meyering 2009-05-19 10:54:26 EDT
building 1.0.79 on rawhide with selinux in enforcing mode, I'm seeing many test failures e.g.,

Waiting for cluster to become healthy...
<120|........................................................................................................................*TIMEOUT*
==========================================================================
TEST FAILED: tests/simple/11_ctdb_ip.sh (status 1) (duration: 122s)

While this is rawhide, and not RHEL5.3, and I don't know the cause of these failures, I wonder...

considering 1.0.82+ passed these tests, should we be considering using something newer than 1.0.79?
Comment 8 Jim Meyering 2009-05-19 10:57:37 EDT
On rawhide, I noticed this in /var/log/messages:

May 20 17:30:22 vu kernel: ctdbd uses obsolete (PF_INET,SOCK_PACKET)
Comment 10 Jim Meyering 2009-05-19 13:30:38 EDT
popt is probably imported, but this looks very suspicious:

lib/popt/popt.c:            t = realloc(t, tn);
lib/popt/popt.c-            te = t + strlen(t);
lib/popt/popt.c-            strncpy(te, a, alen); te += alen;

when realloc fails, strlen(NULL) will segfault
Comment 11 Jim Meyering 2009-05-19 13:38:07 EDT
While it's only in test code, ...

This code from tests/src/ctdb_persistent.c will segfault with a NULL deref upon failed talloc_realloc_size: (it also leaks the original memory in that case):

static void check_counters(struct ctdb_context *ctdb, TDB_DATA data)
{
        ...
        if (old_data.dsize != data.dsize) {
                old_data.dsize = data.dsize;
                old_data.dptr = talloc_realloc_size(ctdb, old_data.dptr, old_data.dsize);
        }

        memcpy(old_data.dptr, data.dptr, data.dsize);
}
Comment 12 Jim Meyering 2009-05-19 13:44:24 EDT
in client/ctdb_client.c,

int ctdb_set_socketname(struct ctdb_context *ctdb, const char *socketname)
{
	ctdb->daemon.name = talloc_strdup(ctdb, socketname);
	return 0;
}

which can leave daemon.name NULL upon failed strdup,
yet there is an unprotected dereference of that member here:

int ctdb_socket_connect(struct ctdb_context *ctdb)
{
...
	strncpy(addr.sun_path, ctdb->daemon.name, sizeof(addr.sun_path));
Comment 13 Jim Meyering 2009-05-19 13:53:08 EDT
This is a similar problem (unchecked allocation failure)

int ctdb_parse_address(struct ctdb_context *ctdb,
		       TALLOC_CTX *mem_ctx, const char *str,
		       struct ctdb_address *address)
{
...
	address->address = talloc_strdup(mem_ctx, str);


because a caller of that function uses address.address without ensuring it's non-NULL:

from ctdb_server.c:

	if (ctdb_parse_address(ctdb, node, nstr, &node->address) != 0) {
		return -1;
	}
	node->ctdb = ctdb;
	node->name = talloc_asprintf(node, "%s:%u", 
				     node->address.address,
Comment 14 Jim Meyering 2009-05-19 15:02:56 EDT
This can deref NULL (ib/ibwrapper_test.c)

	tmp = talloc_strdup(tcx, optarg);
	/* hack to reuse the above ibw_initattr parser */
	if (ibwtest_parse_attrs(tcx, tmp, &attrs, &tcx->naddrs, op))

because the first thing the parsing function does is to deref tmp.

Here's another in the same file:

			tcx->opts = talloc_strdup(tcx, optarg);
			if (ibwtest_parse_attrs(tcx, tcx->opts, &tcx->attrs,

Another: in db_wrap.c:

	w->name = talloc_strdup(w, name);

when w->name becomes NULL, on a subsequent call, the preceding strcmp will dereference NULL.
Comment 15 Jim Meyering 2009-05-19 15:07:17 EDT
another potential NULL deref:
in server/ctdb_daemon.c

	domain_socket_name = talloc_strdup(talloc_autofree_context(), ctdb->daemon.name);
	talloc_set_destructor(domain_socket_name, unlink_destructor);	

if the strdup fails, unlink_destructor will call "unlink(NULL)"
Comment 16 Jim Meyering 2009-05-19 15:19:18 EDT
looking at ctdb_set_logfile,

		ctdb->log->logfile = talloc_strdup(ctdb, logfile);

I couldn't find any code that used that ->logfile member.
Trying to see if that use handled possibility of a NULL pointer.

Is it really used somewhere?
Comment 17 Jim Meyering 2009-05-19 15:42:33 EDT
in ctdb_set_logfile again, there is a potential deref of a possibly-NULL value returned by talloc_zero:

	ctdb->log = talloc_zero(ctdb, struct ctdb_log_state);

	log_state = ctdb->log;

	if (use_syslog) {
		do_debug_v = ctdb_syslog_log;
		ctdb->log->use_syslog = true;
	} else if (logfile == NULL || strcmp(logfile, "-") == 0) {
		do_debug_v = ctdb_logfile_log;
		ctdb->log->fd = 1;
Comment 18 Jim Meyering 2009-05-19 15:44:09 EDT

server/ctdb_server.c:   ctdb->transport = talloc_strdup(ctdb, transport);

later, the transport member may be dereferenced:

server/ctdb_daemon.c:   if (strcmp(ctdb->transport, "tcp") == 0) {
server/ctdb_daemon.c:   if (strcmp(ctdb->transport, "ib") == 0) {
server/ctdb_daemon.c:           DEBUG(DEBUG_ERR,("Failed to initialise transport '%s'\n", ctdb->transport));
Comment 19 Jim Meyering 2009-05-19 15:48:27 EDT
same here:

int ctdb_set_recovery_lock_file(struct ctdb_context *ctdb, const char *file)
{
	ctdb->recovery_lock_file = talloc_strdup(ctdb, file);
	return 0;
}

yet there are unprotected dereferences:

  ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file,
Comment 20 Jim Meyering 2009-05-19 15:56:22 EDT
node_list_file is set, but never used
Comment 21 Jim Meyering 2009-05-19 16:00:52 EDT
there are a few like this in server/ctdb_takeover.c

		ctdb_event_script(ctdb, "releaseip %s %s %u",
				  vnn->iface, 
				  talloc_strdup(ctdb, ctdb_addr_to_str(&vnn->public_address)),
				  vnn->public_netmask_bits);

where the NULL pointer can end up as parameter corresponding to %s format directive.  While that's ok with glibc's printf-style functions, it will cause a segfault (NULL-deref) on other systems.
Comment 22 Jim Meyering 2009-05-19 16:10:16 EDT
server/ctdb_takeover.c: data.dptr = (uint8_t *)talloc_strdup(state, ctdb_addr_to
_str(state->addr));
server/ctdb_takeover.c- data.dsize = strlen((char *)data.dptr)+1;
---------------
this looks suspicious:

server/ctdb_takeover.c: vnn->iface = talloc_strdup(vnn, iface);
server/ctdb_takeover.c- vnn->public_address      = *addr;
server/ctdb_takeover.c- vnn->public_netmask_bits = mask;
server/ctdb_takeover.c- vnn->pnn                 = -1;
server/ctdb_takeover.c- 
server/ctdb_takeover.c- DLIST_ADD(ctdb->vnn, vnn);

contrast with this one that *is* checked:

server/ctdb_takeover.c: arp->iface = talloc_strdup(arp, gratious_arp->iface);
server/ctdb_takeover.c- CTDB_NO_MEMORY(ctdb, arp->iface);
----------------------------
suspicious.  I'll let you investigate:

server/eventscript.c:   script->name  = talloc_strdup(script, name);
----------------------
very suspicious:

server/eventscript.c:           trbt_insert32(tree, (num<<16)|count++, talloc_st
rdup(tree, de->d_name));
------------------------
suspicious:

tools/ctdb.c:           pnn_node->addr = talloc_strdup(pnn_node, node);

tools/ctdb.c:           natgw_node->addr = talloc_strdup(natgw_node, node);


That completes the strdup pass.
Comment 23 Jim Meyering 2009-05-19 17:01:46 EDT
in tools/ctdb.c control_backupdb I see this:

	write(fh, &dbhdr, sizeof(dbhdr));
	write(fh, bd->records, bd->len);

	close(fh);

any of those calls may fail, and so they all should be checked.


In that same function, while most error conditions are handled like this:

	ret = tdb_transaction_start(ctdb_db->ltdb->tdb);
	if (ret == -1) {
		DEBUG(DEBUG_ERR,("Failed to start transaction\n"));
		talloc_free(tmp_ctx);
		return -1;
	}

one omits the talloc_free(tmp_ctx);

	ctdb_db = ctdb_attach(ctdb, argv[0], dbmap->dbs[i].persistent, 0);
	if (ctdb_db == NULL) {
		DEBUG(DEBUG_ERR,("Unable to attach to database '%s'\n", argv[0]));
		return -1;
	}

Since there are so many of those conditions, you might want to use a goto and a single exit point, rather than duplicating the talloc_free/return -1 sequence 8 or 9 times.
Comment 24 Jim Meyering 2009-05-19 17:04:42 EDT
there are many deref-NULL after failed talloc_zero, e.g.,

client/ctdb_client.c:   *ips = talloc_zero_size(mem_ctx, len);
client/ctdb_client.c-   (*ips)->num = ipsv4->num;

client/ctdb_client.c:   ctdb = talloc_zero(ev, struct ctdb_context);
client/ctdb_client.c-   ctdb->ev  = ev;

ib/ibw_ctdb.c:  buf2 = talloc_zero_size(conn, n);
ib/ibw_ctdb.c-  memcpy(buf2, buf, n);

ib/ibw_ctdb_init.c:             struct ctdb_ibw_msg *p = talloc_zero(cn, struct 
ctdb_ibw_msg);
ib/ibw_ctdb_init.c-             p->data = talloc_memdup(p, data, length);


please audit all uses
Comment 25 Jim Meyering 2009-05-19 17:11:49 EDT
There are uses of asprintf-like functions that are not checked for failure:

lib/tdb/tools/tdbtorture.c:             asprintf(&ptr,"xterm -e gdb /proc/%d/exe
 %d", getpid(), getpid());
lib/tdb/tools/tdbtorture.c-             system(ptr);
lib/tdb/tools/tdbtorture.c-             free(ptr);

when asprintf fails, it leaves "ptr" undefined, so you must not use it in any way.
---------------
similar:

lib/util/debug.c:       vasprintf(&s, format, ap);
...
lib/util/debug.c-       fprintf(stderr, "%s.%06u [%5u]: %s", tbuf, (unsigned)t.t
v_usec, (unsigned)getpid(), s);


I've stopped with those.  Please check the remaining uses.
Comment 26 Jim Meyering 2009-05-19 17:14:04 EDT
stumbled across these.
While dup2 doesn't often fail, when it does, you really should diagnose it, so that the primary cause is obvious:

server/ctdb_logging.c:          dup2(1, 2);
server/ctdb_logging.c:          dup2(p[1], 1);
server/ctdb_logging.c:  dup2(1, 2);
Comment 27 Jim Meyering 2009-05-20 01:51:27 EDT
Regarding comment #26, <https://bugzilla.redhat.com/499205#c26>, note that dup2 fails with EMFILE even less often when you have it reuse stdout and stderr, as is done in ctdb_logging.c (there is a race, so it is still possible).  However, even when there are plenty of file descriptors, it can fail with EINTR.
Comment 28 Jim Meyering 2009-05-20 01:56:20 EDT
utils/ping_pong/ping_pong.c

several functions can fail here, yet are not checked:
mmap and calloc can return NULL, ftruncate and fflush can fail.  e.g.,

  p = mmap(NULL, num_locks+1, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  ...
  c = p[i];


  val = (unsigned char *)calloc(num_locks+1, sizeof(unsigned char));
  ...
  val[i] = c;
Comment 29 Jim Meyering 2009-05-20 01:59:45 EDT
please add -Wp,-D_FORTIFY_SOURCE=2 -O2 to CFLAGS.

On new enough systems, that will make gcc warn whenever syscalls like read, write, ftruncate, asprintf, etc. are called but their return values are ignored.
Comment 30 Sumit Bose 2009-05-20 03:55:02 EDT
Regarding comment #10 (popt), would you mind reporting this to popt upstream. The current version still has no check if realloc fails.
Comment 31 Jim Meyering 2009-05-20 04:33:21 EDT
NULL-deref upon calloc failure:

lib/tdb/tools/tdbtorture.c:     pids = (pid_t *)calloc(sizeof(pid_t), num_procs);
lib/tdb/tools/tdbtorture.c-     pids[0] = getpid();

btw, that "(pid_t *)" cast is not needed or useful.
Comment 32 Jim Meyering 2009-05-20 06:20:28 EDT
I've reported the popt problems to upstream:

  http://marc.info/?l=popt-devel&m=124281329104232&w=2
Comment 33 Jim Meyering 2009-05-20 11:22:01 EDT
can you find a way to avoid the socket symlink in /tmp?

This looks like a minor security issue.
Currently, /tmp/ctdb.socket is the default "daemon.name" (aka CTDB_PATH), and the code can call chmod and chown on that name.  Since it's a well-known name, if ever someone manages to create it before the daemon, they can make it point anywhere they want, and the daemon will happily invoke chmod 0700 and/or chown (to euid+egid) through their symlink.
Comment 34 Jim Meyering 2009-05-20 11:42:36 EDT
in ibwrapper.c's ibw_process_init_attrs function, this line can cause a buffer overrun:

   sprintf(ibw_lasterr, "ibw_init: unknown name %s\n", name);

since I can create an attribute name:value pair with arbitrarily long "name" (created in ibwtest_parse_attrs), and that will overflow the fixed-size ibw_lasterr:

#define IBW_LASTERR_BUFSIZE 512
static char ibw_lasterr[IBW_LASTERR_BUFSIZE];

moral: use snprintf, not sprintf
and every use of snprintf should be sure to check for failure.
Otherwise, the resulting buffer will not be NUL-terminated, and using it may read beyond the end of the buffer.  There are a few cases of existing *nprintf uses like that.
Comment 35 Jim Meyering 2009-05-20 11:44:06 EDT
while looking at the parsing code above, I spotted this:

	attrs = (struct ibw_initattr *)talloc_size(tcx,
		n * sizeof(struct ibw_initattr));
	for(p = optext; *p!='\0'; p++) {
		if (porcess_next) {
			attrs[i].name = p;

If that talloc_size call fails, the in-loop attrs[i] will dereference NULL.
Please audit all talloc_size uses.
Comment 36 Jim Meyering 2009-05-20 12:11:30 EDT
memory leak?

in daemon_call_from_client_callback(struct ctdb_call_state *state)

There are two return points in addition to the fall-through one at the end.
Should there be a "talloc_free(dstate);" just before each of the explicit return statements, like there is at the end?
Comment 38 RHEL Product and Program Management 2014-03-07 08:40:04 EST
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Note You need to log in before you can comment on or make changes to this bug.