Bug 1080165
| Summary: | [qpid] Changing system time might cause heartbeat-enabled connections to drop | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Rodrigo A B Freire <rfreire> | ||||||||
| Component: | qpid-cpp | Assignee: | Andrew Stitcher <astitcher> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Michal Toth <mtoth> | ||||||||
| Severity: | medium | Docs Contact: | |||||||||
| Priority: | medium | ||||||||||
| Version: | 2.4 | CC: | acavalla, astitcher, esammons, freznice, iboverma, jross, messaging-bugs, mtoth, pmoravec, zkraus | ||||||||
| Target Milestone: | 3.1 | Keywords: | Documentation, Improvement, Reproducer, WorkAround | ||||||||
| Target Release: | --- | ||||||||||
| Hardware: | All | ||||||||||
| OS: | Linux | ||||||||||
| URL: | http://comments.gmane.org/gmane.comp.java.openjdk.hotspot.runtime.devel/9155 | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | qpid-cpp-0.30-2 | Doc Type: | Enhancement | ||||||||
| Doc Text: |
Under some circumstances the qpid broker could get confused and drop connections if the system clock was changed. The Qpid broker timekeeping is now changed so that changing the system clock does not confuse internal timekeeping. The qpid broker timekeeping is now separated into two types: Wallclock time (as determined by the system time), which no longer affects the internal workings of the broker except to print timestamps; and Relative time (usually relative to the system boot time), which is used for internal timekeeping purposes.
|
Story Points: | --- | ||||||||
| Clone Of: | |||||||||||
| : | 1200545 (view as bug list) | Environment: | |||||||||
| Last Closed: | 2015-04-14 13:47:41 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
Rodrigo A B Freire
2014-03-24 19:37:42 UTC
-------------------
Trivial reproducer:
1) qpid-receive -a "testQueue; {create:always}" -f --connection-options="{heartbeat:1, reconnect:true}"
2) date -s "<any time in future>"
step 2 bounces connection
-------------------
Trivial workaround:
do OS time changes only when broker is stopped
-------------------
Some investigation:
Per tcpdump, broker - at the time of new keepalive probe to be sent - drops the TCP connection, sometimes preceded by connection.heartbeat being sent, sometimes not.
----------------
Potential patch:
I tested patch on upstream:
Index: cpp/src/qpid/sys/posix/Time.cpp
===================================================================
--- cpp/src/qpid/sys/posix/Time.cpp (revision 1580817)
+++ cpp/src/qpid/sys/posix/Time.cpp (working copy)
@@ -53,7 +53,7 @@
AbsTime AbsTime::now() {
struct timespec ts;
- ::clock_gettime(CLOCK_REALTIME, &ts);
+ ::clock_gettime(CLOCK_MONOTONIC, &ts);
AbsTime time_now;
time_now.timepoint = toTime(ts).nanosecs;
return time_now;
prevents this flaw, though I dont know what potential impact it can have elsewhere.
Well, one side-effect it has, unit_tests fail (TimerTest.cpp and valgrind reports "Conditional jump or move depends on uninitialised value(s)").
----------------
Raising proper mrg-* flags (as not sure if the customer plans to upgrade to 3.0 or stay on 2.*, I raised boths and drop the not-relevant later on).
(In reply to Pavel Moravec from comment #1) > ------------------- > Trivial reproducer: > 1) qpid-receive -a "testQueue; {create:always}" -f > --connection-options="{heartbeat:1, reconnect:true}" > 2) date -s "<any time in future>" > > step 2 bounces connection > > > ------------------- > Trivial workaround: > do OS time changes only when broker is stopped > > > ------------------- > Some investigation: > Per tcpdump, broker - at the time of new keepalive probe to be sent - drops > the TCP connection, sometimes preceded by connection.heartbeat being sent, > sometimes not. > > > ---------------- > Potential patch: > I tested patch on upstream: > > Index: cpp/src/qpid/sys/posix/Time.cpp > =================================================================== > --- cpp/src/qpid/sys/posix/Time.cpp (revision 1580817) > +++ cpp/src/qpid/sys/posix/Time.cpp (working copy) > @@ -53,7 +53,7 @@ > > AbsTime AbsTime::now() { > struct timespec ts; > - ::clock_gettime(CLOCK_REALTIME, &ts); > + ::clock_gettime(CLOCK_MONOTONIC, &ts); > AbsTime time_now; > time_now.timepoint = toTime(ts).nanosecs; > return time_now; > > prevents this flaw, though I dont know what potential impact it can have > elsewhere. I believe to fix this completely you need to have qpid::sys::Condition use CLOCK_MONOTONIC as well. By default pthread_cont_t uses the REALTIME clock so even if you generate AbsTime's with CLOCK_MONOTONIC, pthread_cond_timedwait may compare them with times taken from CLOCK_REALTIME so there may still be a problem. The other thing to consider is I think that not all systems support CLOCK_MONOTONIC so we may need to check that it's available and fall back to CLOCK_REALTIME if not. > Well, one side-effect it has, unit_tests fail (TimerTest.cpp and valgrind > reports "Conditional jump or move depends on uninitialised value(s)"). Its important to fix that but probably not very difficult. I think the major issue with using CLOCK_MONOTONIC is that we use the AbsTime class to generate date/times in a few places. I think we could no longer do that reliably. CLOCK_MONOTONIC is defined by POSIX to be present if _POSIX_MONOTONIC_CLOCK is defined so we could easily switch between CLOCK_MONOTONIC and CLOCK_REALTIME. In practice I think that POSIX systems have it nowadays. A fix for this issue has been checked into the qpid trunk: https://svn.apache.org/r1607140 Created attachment 998762 [details]
Qpid broker debug log
Created attachment 998763 [details]
Qpid broker trace log
Created attachment 999460 [details]
Qpid cpp,java clients trace logs
Looking at those logs it seems that the client is closing the connection normally, rather than missing a heartbeat and force closing the connection.
Viz -
2015-03-09 10:18:40 [Network] trace SENT [[[::1]:32871-localhost:5672]]: Frame[BEbe; channel=0; {ConnectionCloseBody: reply-code=200; reply-text=OK; }]
This frame sent by the client is asking for a regular connection close with no error.
Incidentally, the java logs are probably not related to any C++ bug.
I would therefore suggest this may be a bug in the qc2_connector program that makes it think the duration of 20s is finished when the second time change occurs.
As extra support for comment #25: Whe I use the qpid-receive program I find no problem: QPID_LOG_ENABLE="debug+" src/tests/qpid-receive --connection-options "{ heartbeat:'1' }" -a "test_heartbeat_connection_on_system_time_change_future;{'node': {'x-declare': {'auto-delete': True}, 'type': 'queue'}, 'create': 'always'}" --timeout 30 This causes the same behaviour in terms of queue creations etc and heartbeats. Ok, I see your point. Thank you very much. Marking as verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-0805.html |