Bug 465964 - pads loses connection to prelude-manager
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: pads
Version: rawhide
Hardware: All, OS: Linux
Priority: medium, Severity: medium
Assigned To: Steve Grubb
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-10-07 09:43 EDT by Dominick Grift
Modified: 2008-10-28 14:14 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-10-28 14:14:41 EDT
Description Dominick Grift 2008-10-07 09:43:50 EDT
Description of problem:

When I start my system, pads has a status of "online" in prewikka (prelude-manager).
But after just a few heartbeats its status becomes "missing".

I then have to restart pads for its status to become online again, after which it sends a heartbeat or two and then becomes a missing analyzer again.

prelude-notify reports: missing agent: no heartbeat received for analyzer pads.
Comment 1 Steve Grubb 2008-10-15 12:56:21 EDT
Next time it does this, can you use strace -p to attach to the threads and see where it's hanging? There should be 2 threads. Thanks.
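
A minimal sketch of the suggestion above, assuming a Linux system with /proc mounted. The helper name list_threads and the use of pidof are illustrative assumptions and are not part of the original report.

```shell
#!/bin/sh
# List the thread IDs (TIDs) of a process by reading /proc/<pid>/task (Linux only).
list_threads() {
  ls "/proc/$1/task"
}

# Hypothetical usage (needs permission to ptrace the target, typically root):
#   pid=$(pidof pads)          # assumes a single running pads process
#   list_threads "$pid"        # comment 1 expects two thread IDs here
#   strace -f -tt -p "$pid"    # with -p, -f attaches to every thread of the process
```

The -tt option adds timestamps to each traced call, which may help line up any stall with the time the analyzer is reported missing.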
Comment 2 Dominick Grift 2008-10-15 16:50:56 EDT
I tried it, but it's hard to tell. strace -p <pid> just seems to keep going, even when pads becomes a missing analyzer to prelude-manager.
Comment 3 Steve Grubb 2008-10-15 17:18:28 EDT
Hmm... that is interesting. So I guess the problem needs to be broken into two somehow: either pads is not sending the heartbeats, or prelude-manager is not seeing them. I wonder if you can use tcpdump to find traffic between pads and prelude-manager? I think you can isolate it by telling tcpdump something like this:

  tcpdump -i lo ip host 127.0.0.1 and port 4690

You would need everything else that talks to prelude-manager to be stopped so you don't get traffic from other sensors.
Comment 4 Dominick Grift 2008-10-16 04:33:56 EDT
sh-3.2# tcpdump -i lo dst 127.0.0.1 and tcp dst port 4690 -v
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
10:31:19.154911 IP (tos 0x0, ttl 64, id 39475, offset 0, flags [DF], proto TCP (6), length 729) localhost.localdomain.35109 > localhost.localdomain.prelude: P 463071581:463072258(677) ack 459819453 win 286 <nop,nop,timestamp 62289025 62159053>
10:31:19.155457 IP (tos 0x0, ttl 64, id 39476, offset 0, flags [DF], proto TCP (6), length 217) localhost.localdomain.35109 > localhost.localdomain.prelude: P 677:842(165) ack 1 win 286 <nop,nop,timestamp 62289026 62289025>
10:31:19.155762 IP (tos 0x0, ttl 64, id 39477, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35109 > localhost.localdomain.prelude: F, cksum 0x7d68 (correct), 842:842(0) ack 215 win 301 <nop,nop,timestamp 62289026 62289026>
10:31:22.282716 IP (tos 0x0, ttl 64, id 61593, offset 0, flags [DF], proto TCP (6), length 60) localhost.localdomain.35110 > localhost.localdomain.prelude: S, cksum 0x686c (correct), 2534961184:2534961184(0) win 32792 <sackOK,timestamp 62292153 0,mss 16396,nop,wscale 7>
10:31:22.282769 IP (tos 0x0, ttl 64, id 61594, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x3df2 (correct), ack 2534602531 win 257 <nop,nop,timestamp 62292153 62292153>
10:31:22.283360 IP (tos 0x0, ttl 64, id 61595, offset 0, flags [DF], proto TCP (6), length 145) localhost.localdomain.35110 > localhost.localdomain.prelude: P 0:93(93) ack 1 win 257 <nop,nop,timestamp 62292154 62292153>
10:31:22.283586 IP (tos 0x0, ttl 64, id 61596, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x3d44 (correct), ack 80 win 257 <nop,nop,timestamp 62292154 62292154>
10:31:22.283615 IP (tos 0x0, ttl 64, id 61597, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x3b4c (correct), ack 576 win 265 <nop,nop,timestamp 62292154 62292154>
10:31:22.290939 IP (tos 0x0, ttl 64, id 61598, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x39a3 (correct), ack 978 win 273 <nop,nop,timestamp 62292162 62292161>
10:31:22.291013 IP (tos 0x0, ttl 64, id 61599, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x396c (correct), ack 1032 win 273 <nop,nop,timestamp 62292162 62292162>
10:31:22.291605 IP (tos 0x0, ttl 64, id 61600, offset 0, flags [DF], proto TCP (6), length 559) localhost.localdomain.35110 > localhost.localdomain.prelude: P 93:600(507) ack 1032 win 273 <nop,nop,timestamp 62292162 62292162>
10:31:22.330841 IP (tos 0x0, ttl 64, id 61601, offset 0, flags [DF], proto TCP (6), length 517) localhost.localdomain.35110 > localhost.localdomain.prelude: P 600:1065(465) ack 1032 win 273 <nop,nop,timestamp 62292202 62292202>
10:31:22.379866 IP (tos 0x0, ttl 64, id 61602, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x3510 (correct), ack 1038 win 273 <nop,nop,timestamp 62292251 62292211>
10:31:22.379921 IP (tos 0x0, ttl 64, id 61603, offset 0, flags [DF], proto TCP (6), length 52) localhost.localdomain.35110 > localhost.localdomain.prelude: ., cksum 0x3365 (correct), ack 1416 win 282 <nop,nop,timestamp 62292251 62292251>
10:31:22.381831 IP (tos 0x0, ttl 64, id 61604, offset 0, flags [DF], proto TCP (6), length 297) localhost.localdomain.35110 > localhost.localdomain.prelude: P 1065:1310(245) ack 1416 win 282 <nop,nop,timestamp 62292252 62292251>
10:31:22.420853 IP (tos 0x0, ttl 64, id 61605, offset 0, flags [DF], proto TCP (6), length 729) localhost.localdomain.35110 > localhost.localdomain.prelude: P 1310:1987(677) ack 1416 win 282 <nop,nop,timestamp 62292292 62292292>
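
A capture like the one above can be condensed per source port and TCP flag, which makes it easier to see whether a connection was closed and a new one opened. This is a minimal sketch, assuming the capture was saved as text in the same tcpdump -v line format shown above. The helper name summarize_flags is hypothetical.

```shell
#!/bin/sh
# Count packets per (source port, TCP flag) from saved tcpdump -v text output.
# Reads the files given as arguments, or standard input if none are given.
summarize_flags() {
  awk '{
    for (i = 2; i < NF - 1; i++) {
      if ($i == ">") {
        # The field before ">" is the source, e.g. localhost.localdomain.35109
        n = split($(i - 1), parts, /\./)
        port = parts[n]
        # Two fields after ">" is the flag token, e.g. P, "F," or "S,"
        flag = $(i + 2)
        sub(/,$/, "", flag)
        count[port " " flag]++
        break
      }
    }
  }
  END { for (k in count) print k, count[k] }' "$@" | sort
}

# Hypothetical usage:
#   tcpdump -i lo dst 127.0.0.1 and tcp dst port 4690 -v > capture.txt
#   summarize_flags capture.txt
```

In the capture above, a FIN (F) is sent from port 35109 and a SYN (S) from port 35110 about three seconds later. A summary of this kind only makes such events easier to spot and does not by itself say which side is at fault.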
Comment 5 Steve Grubb 2008-10-23 11:18:26 EDT
Need to add a call to libprelude-config --pthread-cflags in configure to get libprelude's pthread compiler flags.
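
The actual change made to pads is not shown in this report. As a rough sketch only, a configure-time shell step could query libprelude-config and append its output to CFLAGS as below. The function name and the overridable tool argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Append the flags reported by `libprelude-config --pthread-cflags` to CFLAGS.
# $1 optionally overrides the tool name (hypothetical parameter, handy for testing).
append_prelude_pthread_cflags() {
  config_tool=${1:-libprelude-config}
  if ! command -v "$config_tool" >/dev/null 2>&1; then
    echo "warning: $config_tool not found; CFLAGS unchanged" >&2
    return 1
  fi
  pthread_cflags=$("$config_tool" --pthread-cflags) || return 1
  # Keep any existing CFLAGS and add the reported flags after them.
  CFLAGS="${CFLAGS:+$CFLAGS }$pthread_cflags"
  export CFLAGS
}

# Hypothetical usage before compiling:
#   append_prelude_pthread_cflags && echo "CFLAGS is now: $CFLAGS"
```

Which flags the tool prints depends on how libprelude was built, so the sketch passes them through without assuming their content.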
Comment 6 Steve Grubb 2008-10-28 09:56:03 EDT
pads-1.2-2 was built to hopefully solve this problem. Please check it from koji and let me know if that solves the problem. Thanks!

http://koji.fedoraproject.org/koji/buildinfo?buildID=67903
Comment 7 Dominick Grift 2008-10-28 13:21:21 EDT
Yes, looks like this solved it. Thank you.
Comment 8 Steve Grubb 2008-10-28 14:14:41 EDT
Closing. Thanks for reporting the bug.
