Bug 1312146
Summary: | sd_journal_sendv() seems to only allow 511 bytes for binary data | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | mulhern <amulhern> | ||||||||||||||
Component: | systemd | Assignee: | systemd-maint | ||||||||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | qe-baseos-daemons | ||||||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||||||
Priority: | unspecified | ||||||||||||||||
Version: | 7.2 | CC: | amulhern, bblaskov, jsynacek, msekleta, systemd-maint-list | ||||||||||||||
Target Milestone: | rc | Keywords: | Reopened | ||||||||||||||
Target Release: | --- | ||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||
OS: | Unspecified | ||||||||||||||||
Whiteboard: | |||||||||||||||||
Fixed In Version: | systemd-219-30.el7_3.6.x86_64 | Doc Type: | Bug Fix | ||||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||||
Clone Of: | Environment: | ||||||||||||||||
Last Closed: | 2016-12-05 11:56:01 UTC | Type: | Bug | ||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||
Embargoed: | |||||||||||||||||
Attachments: |
|
Description
mulhern
2016-02-25 21:14:12 UTC
Created attachment 1130662 [details]
Simple annotated reproducer
To work, requires an additional file, graph.bytes containing the characters "GRAPH=" followed by a bunch of binary data, more than 505 bytes worth.
After writing a bad message to the file: mulhern@dhcp-25-209 reproducers]$ journalctl --reverse --output=verbose Failed to get source realtime timestamp: Bad message -- Logs begin at Mon 2014-04-14 07:37:02 EDT, end at Thu 2016-02-25 17:34:39 EST lines 1-1/1 END i.e, journalctl thinks there are no entries in the log. but, without --reverse flag, journalctl --output=verbose locates many entries. This seems to happen in release 27, but maybe not in release 19. I wrote my self quick reproducer using python bindings for journald and it returns expected results on both RHEL7 and Fedora 22. I also tried to reproduce using reproducer you've attached (Thanks!) and again I was able to attach 1k binary blob to the journal entry without any issues. Mon 2016-04-18 04:26:03.236431 EDT [s=cbc60fbd8c684625aef3d1327d4dbe06;i=f82;b=95843d23dd7f4831840a3bcb4bc6ac19;m=198d6 _UID=0 _GID=0 _BOOT_ID=95843d23dd7f4831840a3bcb4bc6ac19 _MACHINE_ID=d20fb34accd74a158e40457bf1d641bf _TRANSPORT=journal _CAP_EFFECTIVE=1fffffffff _SYSTEMD_CGROUP=/ _AUDIT_LOGINUID=0 _SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 _HOSTNAME=qeos-4.lab.eng.rdu2.redhat.com _AUDIT_SESSION=3 MESSAGE=TESTING BYTES CODE_FILE=repro.c CODE_LINE=29 CODE_FUNC=main SYSLOG_IDENTIFIER=a.out _COMM=a.out GRAPH=[1.0K blob data] _PID=7693 _SOURCE_REALTIME_TIMESTAMP=1460967963236431 Created attachment 1148129 [details]
Simple python reproducer
Closing for now, feel free to reopen in case you are still experiencing problem. I think I will have to reopen. I ran your python script and it worked fine, then...in trying to reproduce original bug, I completely corrupted my logs, presumably with the binary data I had written. Below you can see the entry from your script the first time I ran it, and nothing more. [mulhern-journal@megadeth pydevDAG]$ journalctl --user -- Logs begin at Mon 2016-04-18 08:25:29 EDT, end at Mon 2016-04-18 08:28:22 EDT. -- Apr 18 08:25:29 megadeth.lab.bos.redhat.com su[13323]: pam_unix(su:session): session closed for user root Apr 18 08:28:22 megadeth.lab.bos.redhat.com python[2088]: binary data [mulhern-journal@megadeth pydevDAG]$ systemd-cat echo "2016-04-18T14:16:24-0400" [mulhern-journal@megadeth pydevDAG]$ echo $? 0 [mulhern-journal@megadeth pydevDAG]$ journalctl --user -- Logs begin at Mon 2016-04-18 08:25:29 EDT, end at Mon 2016-04-18 08:28:22 EDT. -- Apr 18 08:25:29 megadeth.lab.bos.redhat.com su[13323]: pam_unix(su:session): session closed for user root Apr 18 08:28:22 megadeth.lab.bos.redhat.com python[2088]: binary data [mulhern-journal@megadeth pydevDAG]$ journalctl --verify --user 3910a8: invalid object File corruption detected at /var/log/journal/c2a1a2c9555048fab5453a08361c43dc/user-1002.journal:3910a8 (of 8388608 bytes, 44%). FAIL: /var/log/journal/c2a1a2c9555048fab5453a08361c43dc/user-1002.journal (Cannot assign requested address) I will see if I can reproduce it again once I've cleaned up. My C reproducer works exactly as before but I think it is a bit flawed because it reads the entire file as bytes, including the text part. The original motivation for writing binary data was to write zipped textual data, as the textual data was being cut off (bz#1306009), and because it would be annoying to users to have to scroll through large amounts of textual data in the journal. I will go back to the original problem, and try to reproduce the issue that way. Created attachment 1148606 [details]
strace results when writing 512
This is what strace looks like when writing 512 bytes. This invariably ends up corrupting the journal for this user. On RHEL7, all subsequent entries written to the log by this user are not available in the log.
Created attachment 1148608 [details]
strace when writing 511 bytes
This is much like with 512 bytes, but everything is OK, the log entry is in the journal for this user, and it continues to be possible to see the entries the user logs.
Created attachment 1148610 [details]
The exact graph.bytes file used by the reproducer.
This is the exact bytes file used by the reproducer.
Created attachment 1148613 [details]
An example corrupted journal
A very small journal file, only one log entry available via the usual mechanisms.
I decided there was no point in going back up to the original goal when the reproducer seemed so consistent. So, I ran the reproducer and got a bit more data. I have one insight. In the strace, the sendmsg call is different for your python test than for mine: sendmsg(3, {msg_name(29)={sa_family=AF_LOCAL, sun_path="/run/systemd/journal/socket"}, msg_iov(7)=[{"MESSAGE=binary data", 19}, {"\n", 1}, {"DATA=\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 1029}, {"\n", 1}, {"SYSLOG_IDENTIFIER=", 18}, {"python", 6}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) Note how the DATA message consists of "DATA=.....", whereas for my reproducer, the binary data field is chopped up into three parts. The reason for this is that my actually random binary data contains a '\n'. Put a '\n' in your own test data (hex is 0a), and you'll find that your message gets split into three parts in the sendmsg call. I still do not know why the change from 511 to 512 in total length made a difference for my data and not for yours, since the 1024 bytes still ended up in the journal, but it is a start. Thanks for getting back with valuable information! At the same time, I am sorry I tried to sweep this under the rug too early. We were dealing with journal corruptions in some other bug and I prepared some test rpms for RHEL7. Would be great if you could test with those. http://people.redhat.com/~msekleta/systemd-219-19.el7.0.bz1292447.0.src.rpm/ I installed your updates and tested with the original reproducer, against my original description of bug. Things look good. The entry was properly logged this time and there was no sign of the journal being corrupted after the entry was logged. I am contemplating whether we should close this as dup of 1292447. I guess we should, but OTOH this bug has clear reproducer which is lacking in other bug, so maybe moving to POST and then adding it to the erratum separately would be better. |