Bug 1312146 - sd_journal_sendv() seems to only allow 511 bytes for binary data
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: systemd-maint
Status: Reopened
Depends On:
Reported: 2016-02-25 16:14 EST by mulhern
Modified: 2016-12-05 06:56 EST
CC: 5 users

See Also:
Fixed In Version: systemd-219-30.el7_3.6.x86_64
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-12-05 06:56:01 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
Simple annotated reproducer (755 bytes, text/plain)
2016-02-25 16:21 EST, mulhern
Simple python reproducer (4.45 KB, text/x-python)
2016-04-18 04:38 EDT, Michal Sekletar
strace results when writing 512 (15.36 KB, text/plain)
2016-04-19 09:50 EDT, mulhern
strace when writing 511 bytes (15.36 KB, text/plain)
2016-04-19 09:52 EDT, mulhern
The exact graph.bytes file used by the reproducer. (822 bytes, application/octet-stream)
2016-04-19 09:57 EDT, mulhern
An example corrupted journal (8.00 MB, application/octet-stream)
2016-04-19 10:12 EDT, mulhern

Description mulhern 2016-02-25 16:14:12 EST
Description of problem:

When writing to the journal using sd_journal_sendv(), the combined size of the binary data, the field name, and the '=' separator appears to be limited to 511 bytes.

Version-Release number of selected component (if applicable):

Installed Packages
Name        : systemd
Arch        : x86_64
Version     : 219
Release     : 19.el7
Size        : 21 M
Repo        : installed
From repo   : RHEL-7-RTT

and also:

Installed Packages
Name        : systemd
Arch        : x86_64
Epoch       : 0
Version     : 219
Release     : 27.fc22
Size        : 24 M

How reproducible:

As far as I can tell, always.

Steps to Reproduce:
1. Use sd_journal_sendv to write an entry to the journal.
2. Ensure that, for one field with binary data, the combined length of the
field name and the field value and the '=' in between the two exceeds 511.
3. Look in the journal using "journalctl --reverse --output=verbose" and notice that the log entry is simply not there. In release 27, a message is printed to stderr: "Failed to get source realtime timestamp: Bad message".
4. Notice that the message does get put in the journal if the combined length is no greater than 511. Below is a successfully constructed log entry, with the GRAPH field showing its 505 bytes of binary data. 'GRAPH=' has length 6, and 505 + 6 = 511.

Thu 2016-02-25 15:46:47.168702 EST [s=fc1c7f6dbefc4d2995b2dd850949824c;i=23fb43;
    GRAPH=[505B blob data]
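The boundary arithmetic from step 4 can be sketched as follows. This is a minimal illustration, not part of any reproducer; the 505/506-byte blob sizes are taken from this report.

```python
# Sketch of the length arithmetic in the report: the failure appears once
# len(field name) + len('=') + len(binary value) exceeds 511 bytes.
FIELD = b"GRAPH"

def payload_len(blob: bytes) -> int:
    """Length of the serialized 'GRAPH=<blob>' field as journald sees it."""
    return len(FIELD) + 1 + len(blob)   # name + '=' + value

ok_blob = bytes(505)    # 5 + 1 + 505 = 511 -> entry is logged
bad_blob = bytes(506)   # 5 + 1 + 506 = 512 -> entry is lost

assert payload_len(ok_blob) == 511
assert payload_len(bad_blob) == 512
```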

Actual results:

Size of binary data for fields is uncomfortably limited.

Expected results:

My understanding is that it should be possible to put much larger binary values into the journal. Core dumps are mentioned somewhere in the documentation, and those are generally a good deal bigger than 511 bytes.

Additional info:

This originally came to my notice while using python-systemd. python-systemd's journal.sendv function behaves in exactly the same way as described above, so it is just passing on the behavior of the underlying systemd library call.
Comment 1 mulhern 2016-02-25 16:21 EST
Created attachment 1130662 [details]
Simple annotated reproducer

To work, it requires an additional file, graph.bytes, containing the characters "GRAPH=" followed by more than 505 bytes of binary data.
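A minimal sketch of generating such a graph.bytes file (the 600-byte payload size is arbitrary; any blob over 505 bytes crosses the limit described in this report):

```python
import os

# Build "GRAPH=" followed by a random binary payload, as the annotated
# reproducer expects. os.urandom output will almost certainly contain a
# 0x0a byte, which matters later in this thread.
blob = os.urandom(600)
data = b"GRAPH=" + blob

with open("graph.bytes", "wb") as f:
    f.write(data)

assert len(data) > 511   # name + '=' + value exceeds the 511-byte threshold
```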
Comment 3 mulhern 2016-02-25 17:39:00 EST
After writing a bad message to the file:

[mulhern@dhcp-25-209 reproducers]$ journalctl --reverse --output=verbose 
Failed to get source realtime timestamp: Bad message
-- Logs begin at Mon 2014-04-14 07:37:02 EDT, end at Thu 2016-02-25 17:34:39 EST
lines 1-1/1 END

i.e., journalctl thinks there are no entries in the log.

But without the --reverse flag, journalctl --output=verbose locates many entries.

This seems to happen in release 27, but maybe not in release 19.
Comment 4 Michal Sekletar 2016-04-18 04:36:43 EDT
I wrote myself a quick reproducer using the Python bindings for journald, and it returns the expected results on both RHEL 7 and Fedora 22.

I also tried to reproduce using the reproducer you attached (thanks!), and again I was able to attach a 1K binary blob to the journal entry without any issues.

Mon 2016-04-18 04:26:03.236431 EDT [s=cbc60fbd8c684625aef3d1327d4dbe06;i=f82;b=95843d23dd7f4831840a3bcb4bc6ac19;m=198d6
    GRAPH=[1.0K blob data]
Comment 5 Michal Sekletar 2016-04-18 04:38 EDT
Created attachment 1148129 [details]
Simple python reproducer
Comment 6 Michal Sekletar 2016-04-18 04:40:45 EDT
Closing for now; feel free to reopen in case you are still experiencing the problem.
Comment 7 mulhern 2016-04-18 14:25:29 EDT
I think I will have to reopen.

I ran your python script and it worked fine. Then, in trying to reproduce the original bug, I completely corrupted my logs, presumably with the binary data I had written. Below you can see the entry from your script the first time I ran it, and nothing more.

[mulhern-journal@megadeth pydevDAG]$ journalctl --user
-- Logs begin at Mon 2016-04-18 08:25:29 EDT, end at Mon 2016-04-18 08:28:22 EDT. --
Apr 18 08:25:29 megadeth.lab.bos.redhat.com su[13323]: pam_unix(su:session): session closed for user root
Apr 18 08:28:22 megadeth.lab.bos.redhat.com python[2088]: binary data
[mulhern-journal@megadeth pydevDAG]$ systemd-cat echo "2016-04-18T14:16:24-0400"
[mulhern-journal@megadeth pydevDAG]$ echo $?
[mulhern-journal@megadeth pydevDAG]$ journalctl --user
-- Logs begin at Mon 2016-04-18 08:25:29 EDT, end at Mon 2016-04-18 08:28:22 EDT. --
Apr 18 08:25:29 megadeth.lab.bos.redhat.com su[13323]: pam_unix(su:session): session closed for user root
Apr 18 08:28:22 megadeth.lab.bos.redhat.com python[2088]: binary data
[mulhern-journal@megadeth pydevDAG]$ journalctl --verify --user
3910a8: invalid object                                                                                                                        
File corruption detected at /var/log/journal/c2a1a2c9555048fab5453a08361c43dc/user-1002.journal:3910a8 (of 8388608 bytes, 44%).               
FAIL: /var/log/journal/c2a1a2c9555048fab5453a08361c43dc/user-1002.journal (Cannot assign requested address)

I will see if I can reproduce it again once I've cleaned up.
Comment 8 mulhern 2016-04-18 15:30:26 EDT
My C reproducer works exactly as before but I think it is a bit flawed because it reads the entire file as bytes, including the text part.

The original motivation for writing binary data was to write zipped textual data, as the textual data was being cut off (bz#1306009), and because it would be annoying to users to have to scroll through large amounts of textual data in the journal. I will go back to the original problem, and try to reproduce the issue that way.
Comment 9 mulhern 2016-04-19 09:50 EDT
Created attachment 1148606 [details]
strace results when writing 512

This is what strace looks like when writing 512 bytes. This invariably ends up corrupting the journal for this user. On RHEL 7, none of the entries this user subsequently writes to the log are available in it.
Comment 10 mulhern 2016-04-19 09:52 EDT
Created attachment 1148608 [details]
strace when writing 511 bytes

This is much like the 512-byte case, but everything is OK: the log entry is in the journal for this user, and it remains possible to see the entries the user logs.
Comment 11 mulhern 2016-04-19 09:57 EDT
Created attachment 1148610 [details]
The exact graph.bytes file used by the reproducer.

This is the exact bytes file used by the reproducer.
Comment 12 mulhern 2016-04-19 10:12 EDT
Created attachment 1148613 [details]
An example corrupted journal

A very small journal file, only one log entry available via the usual mechanisms.
Comment 13 mulhern 2016-04-19 11:06:31 EDT
I decided there was no point in going back to the original goal when the reproducer seemed so consistent. So I ran the reproducer and got a bit more data.

I have one insight. In the strace, the sendmsg call is different for your python test than for mine:

sendmsg(3, {msg_name(29)={sa_family=AF_LOCAL, sun_path="/run/systemd/journal/socket"}, msg_iov(7)=[{"MESSAGE=binary data", 19}, {"\n", 1}, {"DATA=\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 1029}, {"\n", 1}, {"SYSLOG_IDENTIFIER=", 18}, {"python", 6}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL)

Note how the DATA message consists of "DATA=.....", whereas for my reproducer the binary data field is chopped up into three parts. The reason is that my binary data is random and happens to contain a '\n'. Put a '\n' in your own test data (hex 0a) and you'll find that your message also gets split into three parts in the sendmsg call. I still do not know why the change from 511 to 512 bytes in total length made a difference for my data and not for yours, since the 1024 bytes still ended up in the journal, but it is a start.
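The splitting described above matches the framing the journald native protocol prescribes for values containing a newline: instead of the simple NAME=value form, the field is sent as the field name, a newline, a 64-bit little-endian length, the raw value, and a trailing newline. A minimal Python sketch of that framing (serialize_field is a hypothetical helper for illustration, not part of any systemd API):

```python
import struct

def serialize_field(name: bytes, value: bytes) -> bytes:
    """Frame one journal field per the journald native protocol:
    values containing '\n' use the length-prefixed binary form."""
    if b"\n" in value:
        # name, '\n', 64-bit little-endian length, raw value, '\n'
        return name + b"\n" + struct.pack("<Q", len(value)) + value + b"\n"
    # simple textual form: NAME=value\n
    return name + b"=" + value + b"\n"

# A value without 0x0a stays in one NAME=value piece, as in the strace above...
assert serialize_field(b"DATA", b"\xff" * 4) == b"DATA=\xff\xff\xff\xff\n"

# ...but a single 0x0a byte switches to the length-prefixed framing, which is
# why a random blob shows up as several parts in the sendmsg iovec.
framed = serialize_field(b"GRAPH", b"ab\nc")
assert framed == b"GRAPH\n" + struct.pack("<Q", 4) + b"ab\nc" + b"\n"
```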
Comment 14 Michal Sekletar 2016-04-19 11:23:07 EDT
Thanks for getting back with this valuable information! At the same time, I am sorry I tried to sweep this under the rug too early.

We were dealing with journal corruption in another bug, and I prepared some test RPMs for RHEL 7. It would be great if you could test with those.

Comment 15 mulhern 2016-04-20 11:19:22 EDT
I installed your updates and tested with the original reproducer, against my original description of the bug.

Things look good. The entry was properly logged this time and there was no sign of the journal being corrupted after the entry was logged.
Comment 16 Michal Sekletar 2016-04-22 06:32:25 EDT
I am contemplating whether we should close this as a duplicate of bug 1292447. I guess we should, but on the other hand this bug has a clear reproducer, which the other bug lacks, so maybe moving to POST and then adding it to the erratum separately would be better.
