Description of problem:
Customer upgraded rsyslog and rsyslog-elasticsearch to rsyslog-8.24.0-34.el7.x86_64 and rsyslog-elasticsearch-8.24.0-34.el7.x86_64, after which rsyslog segfaults on start. This appears to be due to a NULL pointer dereference in the rsyslog-elasticsearch (omelasticsearch) module that has already been fixed upstream.
Version-Release number of selected component (if applicable):
rsyslog-8.24.0-34.el7.x86_64
rsyslog-elasticsearch-8.24.0-34.el7.x86_64
How reproducible:
I do not have an Elasticsearch environment, but the customer can reproduce this every time they run rsyslog.
Steps to Reproduce:
1. Upgrade rsyslog and rsyslog-elasticsearch to version 8.24.0-34.
2. Enable the rsyslog module with 'module(load="omelasticsearch")' and forward logs to an Elasticsearch server (see the minimal config sketch after these steps).
3. Start rsyslog; it segfaults.
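For context, a minimal bulk-mode configuration along these lines should exercise the affected code path. The hostname, port, and index name below are illustrative assumptions, not the customer's actual settings:

# Minimal sketch, not the customer's real config; server and index name are made up.
module(load="omelasticsearch")
action(type="omelasticsearch"
       server="elasticsearch.example.com"
       serverport="9200"
       searchIndex="test_index"
       bulkmode="on")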
Actual results:
rsyslog segfaults on start.
Expected results:
rsyslog doesn't segfault.
Additional info:
(gdb) bt
#0 0x00007f920b1d4901 in __memcmp_ssse3 () at ../sysdeps/x86_64/multiarch/memcmp-ssse3.S:1949
#1 0x00007f920b851eb0 in fjson_tokener_parse_ex (tok=tok@entry=0x7f91f804f360, str=str@entry=0x0, len=len@entry=-1)
at json_tokener.c:250
#2 0x00007f920b853d57 in fjson_tokener_parse_verbose (str=0x0, error=error@entry=0x7f9203318b84) <<<<---str is null
at json_tokener.c:174
#3 0x00007f920b853dbe in fjson_tokener_parse (str=<optimized out>) at json_tokener.c:162
#4 0x00007f9209a1afd5 in checkResult (
reqmsg=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\" patrol : TTY=unknown ; PWD=/home/pa"..., pWrkrData=0x7f91f8004350) at omelasticsearch.c:1375
#5 curlPost (pWrkrData=pWrkrData@entry=0x7f91f8004350,
message=message@entry=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\" patrol : TTY=unknown ; PWD=/home/pa"..., msglen=2825, tpls=tpls@entry=0x0, nmsgs=8) at omelasticsearch.c:1449
#6 0x00007f9209a1b313 in submitBatch (pWrkrData=0x7f91f8004350) at omelasticsearch.c:1465
#7 0x0000562f6d426556 in actionTryCommit (pWti=0x562f6db41640, pThis=0x562f6dae8720) at ../action.c:1246
#8 actionCommit (pThis=pThis@entry=0x562f6dae8720, pWti=pWti@entry=0x562f6db41640) at ../action.c:1332
#9 0x0000562f6d42771b in processBatchMain (pVoid=0x562f6dae8720, pBatch=0x562f6db41670, pWti=0x562f6db41640)
at ../action.c:1449
#10 0x0000562f6d41be9f in ConsumerReg (pThis=0x562f6db41150, pWti=0x562f6db41640) at queue.c:2008
#11 0x0000562f6d41891e in wtiWorker (pThis=pThis@entry=0x562f6db41640) at wti.c:365
#12 0x0000562f6d417a8d in wtpWorker (arg=0x562f6db41640) at wtp.c:410
#13 0x00007f920c06fdd5 in start_thread (arg=0x7f9203319700) at pthread_create.c:308
#14 0x00007f920b163ead in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f9203319700, max=<optimized out>)
at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:154
#15 0x0000000000000000 in ?? ()
(gdb) frame 5
#5 curlPost (pWrkrData=pWrkrData@entry=0x7f91f8004350,
message=message@entry=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\" patrol : TTY=unknown ; PWD=/home/pa"..., msglen=2825, tpls=tpls@entry=0x0, nmsgs=8) at omelasticsearch.c:1449
1449 CHKiRet(checkResult(pWrkrData, message));
(gdb) p pWrkrData
$1 = (wrkrInstanceData_t *) 0x7f91f8004350
(gdb) p pWrkrData.reply
$2 = 0x0 <<<<---null
1413 static rsRetVal
1414 curlPost(wrkrInstanceData_t *pWrkrData, uchar *message, int msglen, uchar **tpls, int nmsgs)
1415 {
1416         CURLcode code;
1417         CURL *curl = pWrkrData->curlPostHandle;
1418         DEFiRet;
1419
1420         pWrkrData->reply = NULL;
1421         pWrkrData->replyLen = 0;
1422
1423         CHKiRet(checkConn(pWrkrData));
1424         CHKiRet(setPostURL(pWrkrData, pWrkrData->pData, tpls));
1425
1426         curl_easy_setopt(curl, CURLOPT_WRITEDATA, pWrkrData);
1427         curl_easy_setopt(curl, CURLOPT_POSTFIELDS, (char *)message);
1428         curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, msglen);
1429         code = curl_easy_perform(curl);
1430         if ( code == CURLE_COULDNT_RESOLVE_HOST
1431           || code == CURLE_COULDNT_RESOLVE_PROXY
1432           || code == CURLE_COULDNT_CONNECT
1433           || code == CURLE_WRITE_ERROR
1434         ) {
1435                 STATSCOUNTER_INC(indexHTTPReqFail, mutIndexHTTPReqFail);
1436                 indexHTTPFail += nmsgs;
1437                 DBGPRINTF("omelasticsearch: we are suspending ourselfs due "
1438                           "to failure %lld of curl_easy_perform()\n",
1439                           (long long) code);
1440                 ABORT_FINALIZE(RS_RET_SUSPENDED);
1441         }
1442
1443         DBGPRINTF("omelasticsearch: pWrkrData replyLen = '%d'\n", pWrkrData->replyLen);
1444         if(pWrkrData->replyLen > 0) {
1445                 pWrkrData->reply[pWrkrData->replyLen] = '\0'; /* Append 0 Byte if replyLen is above 0 - byte has been reserved in malloc */
1446         }
1447         DBGPRINTF("omelasticsearch: pWrkrData reply: '%s'\n", pWrkrData->reply);
1448
1449         CHKiRet(checkResult(pWrkrData, message)); <<<<---pWrkrData->reply isn't ever checked to see if it's null
It looks like this issue may have been introduced by the patch from bz https://bugzilla.redhat.com/show_bug.cgi?id=1565214, which replaced cJSON with libfastjson. I also found an upstream issue with a similar backtrace: https://github.com/rsyslog/rsyslog/issues/1885. The fix appears to be this patch: https://github.com/rsyslog/rsyslog/commit/3d2740dd758aad86e857d9e7730bb8db2a23eeb8. I'm not 100% sure the regression came from the JSON bz mentioned earlier, but it's possible that cJSON handled the NULL pointer in a way that did not cause a segfault in previous versions.
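For illustration only, the kind of guard that would avoid handing a NULL reply to the JSON tokenizer might look roughly like the sketch below. It reuses the macros already visible in curlPost above; this is an assumption about the shape of the fix, not the verbatim upstream patch:

        /* Sketch only: bail out before checkResult() parses a NULL reply.
         * The real upstream fix may handle this case differently. */
        if(pWrkrData->reply == NULL) {
                DBGPRINTF("omelasticsearch: no reply received, suspending action\n");
                ABORT_FINALIZE(RS_RET_SUSPENDED);
        }
        CHKiRet(checkResult(pWrkrData, message));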
In trying to triage this issue, I'm wondering how it is possible for reply to be NULL. Here is the code that sets reply:
static size_t
curlResult(void *ptr, size_t size, size_t nmemb, void *userdata)
{
        char *p = (char *)ptr;
        wrkrInstanceData_t *pWrkrData = (wrkrInstanceData_t*) userdata;
        char *buf;
        size_t newlen;
        newlen = pWrkrData->replyLen + size*nmemb;
        if((buf = realloc(pWrkrData->reply, newlen + 1)) == NULL) {
                DBGPRINTF("omelasticsearch: realloc failed in curlResult\n");
                return 0; /* abort due to failure */
        }
        memcpy(buf+pWrkrData->replyLen, p, size*nmemb);
        pWrkrData->replyLen = newlen;
        pWrkrData->reply = buf;
        return size*nmemb;
}
I think an OOM condition is the only way for reply == NULL? Can you confirm that you are seeing rsyslog hit OOM? If not, can you shed some light on how it is possible for reply to be NULL?
Hmm - looking at the sosreport in the linked case, I don't see anything like an OOM kill - but the problem is that rsyslog is what writes /var/log/messages, so if rsyslog is having problems, those messages will not be written. Could you do a scan with journalctl to look for memory or OOM-kill messages (something like the command below)?
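Something along these lines should work, reading the kernel messages from the journal rather than /var/log/messages (the exact grep pattern is only a suggestion):

journalctl -k --no-pager | grep -iE 'out of memory|oom'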
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:1000