Bug 1210480
| Summary: | ESXi host smartstate analysis fails | ||
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Jan Krocil <jkrocil> |
| Component: | SmartState Analysis | Assignee: | Joe Rafaniello <jrafanie> |
| Status: | CLOSED WORKSFORME | QA Contact: | Dave Johnson <dajohnso> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.4.0 | CC: | dajohnso, drieden, jhardy, jkrocil, obarenbo, tcarlin |
| Target Milestone: | GA | ||
| Target Release: | 5.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-04-30 17:35:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Jan Krocil
2015-04-09 20:04:42 UTC
This does not happen with 5.3.z running on the same provider / in the same network.

Thinking this is a symptom of bug 1207018, need to retest when we have a fix for it. Assigning to Joe Rafaniello who is investigating bug 1207018.

Jan, I have identified the most common symptoms in bug 1207018. That bug we're still tracking down but I've seen the broker function normally for several hours before it starts leaking. As long as you don't have these symptoms, you can run your test scenario and be sure it's not that bug causing your problem.

Symptoms: CLOSE_WAIT TCP connections on the MiqVimBrokerWorker's DRb port.

To get the DRb port of the broker:

    # bin/rake evm:status |grep Broker
    MiqVimBrokerWorker | started | 3554 | 20903 | 21028 | druby://127.0.0.1:47577 | 2015-04-23T17:36:15Z | 2015-04-23T17:39:53Z

The port is 47577 in this case. As long as lsof is only showing ESTABLISHED or LISTEN, it's fine to do your test:

    # lsof -iTCP | grep 47577
    ruby 20820 root 22u IPv4 5671690 0t0 TCP localhost:46273->localhost:47577 (ESTABLISHED)
    ruby 20820 root 23u IPv4 5672454 0t0 TCP localhost:46441->localhost:47577 (ESTABLISHED)
    ruby 20824 root 22u IPv4 5672425 0t0 TCP localhost:46439->localhost:47577 (ESTABLISHED)
    ruby 20824 root 23u IPv4 5671721 0t0 TCP localhost:46282->localhost:47577 (ESTABLISHED)
    ruby 20843 root 22u IPv4 5670427 0t0 TCP localhost:46066->localhost:47577 (ESTABLISHED)
    ruby 20843 root 23u IPv4 5670435 0t0 TCP localhost:46068->localhost:47577 (ESTABLISHED)
    ruby 20903 root 20u IPv4 5670056 0t0 TCP localhost:47577 (LISTEN)
    ruby 20903 root 23u IPv4 5670428 0t0 TCP localhost:47577->localhost:46066 (ESTABLISHED)
    ruby 20903 root 24u IPv4 5672426 0t0 TCP localhost:47577->localhost:46439 (ESTABLISHED)
    ruby 20903 root 25u IPv4 5670436 0t0 TCP localhost:47577->localhost:46068 (ESTABLISHED)
    ruby 20903 root 26u IPv4 5672455 0t0 TCP localhost:47577->localhost:46441 (ESTABLISHED)
    ruby 20903 root 28u IPv4 5671691 0t0 TCP localhost:47577->localhost:46273 (ESTABLISHED)
    ruby 20903 root 29u IPv4 5671722 0t0 TCP localhost:47577->localhost:46282 (ESTABLISHED)

Dave, see comment 5...

Note, comment 5 forgot to mention that lsof showing CLOSE_WAIT TCP connections on the broker's DRb (druby) port is the clear sign that you hit the bug 1207018. As long as you don't have this, you should be able to recreate the "ESXi host smartstate analysis fails" issue, provide logs and get it fixed without concern of the broker bug.

Additionally, I have only seen bug 120718 occur if you have vmware capacity and utilization enabled so if you disable cap & u and do your smartstate analysis, you should be able to track down this issue in this bug... I am very confident the "broker is unavailable" would not be related to the CLOSE_WAIT/drb bug if you disable the cap and u for your test.

typo, bug 120718, should have been bug 1207018

Working in 5.4.0.0.24.20150427192818_1fd9e49 for vSphere 5, 5.5.

I believe this is due to the leaky file descriptor bug

Clearing needinfo

Awesome, thanks Dave/Thom!
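
For anyone repeating the check described earlier in this thread (find the broker's DRb port, then look for CLOSE_WAIT connections on it), the two steps could be scripted roughly as below. This is only a sketch: the function names `broker_drb_port` and `count_close_wait` are made up for illustration, and it assumes the `evm:status` and `lsof` output looks like the samples quoted in this bug.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the product). Both read command output
# from stdin so they can be tried against saved output as well as live.

# Print the DRb port of the MiqVimBrokerWorker, given `rake evm:status`
# output on stdin. Assumes a druby://host:port column as shown in this bug.
broker_drb_port() {
    grep 'MiqVimBrokerWorker' \
        | sed -n 's#.*druby://[^:]*:\([0-9][0-9]*\).*#\1#p' \
        | head -n 1
}

# Print how many connections on the given port are in CLOSE_WAIT, given
# `lsof -iTCP` output on stdin. A non-zero count is the symptom described
# for bug 1207018; only ESTABLISHED/LISTEN lines yield 0.
count_close_wait() {
    port="$1"
    grep -E ":${port}([^0-9]|\$)" | grep -c 'CLOSE_WAIT' || true
}

# Example use on an appliance (commands taken from the comments above):
#   port=$(bin/rake evm:status | broker_drb_port)
#   lsof -iTCP | count_close_wait "$port"
```

A result of 0 from the second helper would correspond to the "only ESTABLISHED or LISTEN" case, where it should be fine to run the smartstate analysis test.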