Bug 839605
Summary: | event message corrupted (may be because of valgrind: socketcall.sendto(msg) points to unaddressable byte(s))
---|---
Product: | [Fedora] Fedora
Component: | libqb
Version: | 17
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | unspecified
Reporter: | Jan Friesse <jfriesse>
Assignee: | Angus Salkeld <asalkeld>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | asalkeld, fdinitto, lhh, sdake
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2012-07-26 22:23:12 UTC
Description
Jan Friesse
2012-07-12 12:25:50 UTC
Also, please note that without valgrind this causes data corruption in the message (on the client side), so it will either end up with an incorrect SHA1 hash or (most likely) a segfault, because NSS is not able to process the message (the message length in the structure, not the one in the callback, is corrupted and claims a HUGE message). This is a BLOCKER for corosync.
Created attachment 598007 [details]
Patch to ensure problem is really in libqb event
Angus,
attached is a patch to confirm that the problem is really in libqb and not in corosync itself.
The problem is not 100% reproducible, but with the following few commands it is very easy to reproduce in a short time (seconds):
1. corosync -f
2. while true;do ./cpg_test_agent;done
3. while true;do (echo "1:0:cpg_initialize:" ; sleep 0.2; echo "2:0:cpg_join:0:cts_group:"; sleep 1; echo "1:0:record_messages:" ; sleep 0.1; echo "2:0:msg_blaster:0:9000:"; sleep 1; for i in `seq 1 1`;do echo "2:0:read_messages:0:50:" ; sleep 0.01;done) | nc 127.0.0.1 9034;done
Result (in cpg_test_agent):
ERR: nid = 1797661194, pid = 12508, seq = 2081, size = (15046755946816602112 0xd0d0d0d000000000) msg_len = 532
This is followed by "Segmentation fault" (because NSS tries to compute a SHA1 over HUGE unallocated data).
In other words, the main problem is DATA CORRUPTION of events.
As you can see, msg_pt->len is totally corrupted (in my test environment usually with the pattern 0xd0d0d0d000000000).
I'm able to reproduce the problem independently on multiple computers (all with FB-DIMM ECC memory) and on VMs, on both RHEL 6.3 and FC17.
This problem must be solved ASAP (Corosync 2.0 can't be used in production with this problem).
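To make the corruption pattern easier to read, here is a minimal C sketch that splits the reported 64-bit size into its 32-bit halves. It shows that the upper half of 15046755946816602112 is 0xD0D0D0D0, the same value as the QB_RB_CHUNK_MAGIC_DEAD constant quoted in the libqb discussion in this report. The helper names (high_word, low_word, looks_like_dead_chunk) are hypothetical and not part of libqb or corosync; why the magic lands in the upper half of the field is not established here.

```c
#include <stdint.h>

/* Value quoted in the libqb discussion in this report. */
#define QB_RB_CHUNK_MAGIC_DEAD 0xD0D0D0D0u

/* Upper 32 bits of a 64-bit length field (hypothetical helper). */
static uint32_t high_word(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits of a 64-bit length field (hypothetical helper). */
static uint32_t low_word(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/*
 * Hypothetical diagnostic: returns 1 when either half of the length
 * equals the "dead chunk" magic, which suggests the length was read
 * from ring buffer space that had already been reclaimed.
 */
static int looks_like_dead_chunk(uint64_t len)
{
    return high_word(len) == QB_RB_CHUNK_MAGIC_DEAD ||
           low_word(len) == QB_RB_CHUNK_MAGIC_DEAD;
}
```

A check like this is only a debugging aid: a legitimate length could in principle contain the same bit pattern, so it can flag suspicious values but cannot prove corruption.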
Lon added to CC because this bug blocks me.

The valgrind error might be an irritation, but will not cause any issues. The socket is only used as a notifier (1 byte means 1 message) so that the client can put a socket in a poll loop. The data is never used. That said, I'll sort it out.
The real issue is that there is no actual message in the ringbuffer.
#define QB_RB_CHUNK_MAGIC 0xA1A1A1A1
#define QB_RB_CHUNK_MAGIC_DEAD 0xD0D0D0D0
#define QB_RB_CHUNK_MAGIC_ALLOC 0xA110CED0
QB_RB_CHUNK_MAGIC_DEAD indicates that the space being looked at has already been reclaimed (like freed).
I'll have a look on Monday.

This is now fixed upstream:
https://github.com/asalkeld/libqb/commit/e5be0396a7510d24b7e5e7a315c7f2f955e31452
I'll work to get it into Fedora.

libqb-0.14.1-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/libqb-0.14.1-1.fc17

Package libqb-0.14.1-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libqb-0.14.1-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-10851/libqb-0.14.1-1.fc17
then log in and leave karma (feedback).

libqb-0.14.1-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
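The notifier scheme described in the comments (one byte on a socket means one message is waiting, and the byte's value is never used) can be sketched in C as follows. This is an illustration of the general pattern only, not libqb's actual code: notify_one and wait_notifications are hypothetical names, and sending a byte from an initialized local variable is just one plausible way to keep a memory checker from complaining about the send buffer.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: signal "one message is ready" by sending one byte. */
static int notify_one(int fd)
{
    /* The value is never used by the receiver, but the byte is
     * initialized and addressable, so the kernel reads valid memory. */
    const char token = 0;
    ssize_t n = send(fd, &token, sizeof token, 0);
    return n == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: wait up to timeout_ms for notifications.
 * Returns the number of notification bytes consumed (one per message),
 * 0 on timeout, or -1 on error.
 */
static int wait_notifications(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0) {
        return rc; /* 0 = timeout, -1 = poll error */
    }

    char buf[64]; /* contents are discarded; only the count matters */
    ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
    return n < 0 ? -1 : (int)n;
}
```

With this split, the socket carries only the wake-up signal and the message payload stays in shared memory, so a client that is woken up still has to find a valid chunk in the ring buffer. That matches the observation above that the real problem was a missing message in the ringbuffer rather than the socket data.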