Bug 1415290

Summary: A critical section read of the worker's heartbeat information was not protected with a mutex
Product: Red Hat CloudForms Management Engine
Reporter: Joe Rafaniello <jrafanie>
Component: Appliance
Assignee: Joe Rafaniello <jrafanie>
Status: CLOSED CURRENTRELEASE
QA Contact: Tasos Papaioannou <tpapaioa>
Severity: unspecified
Priority: unspecified
Version: 5.7.0
CC: abellott, akarol, jhardy, obarenbo, simaishi, tpapaioa
Target Milestone: GA
Keywords: TestOnly, ZStream
Target Release: 5.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: worker
Fixed In Version: 5.8.0.0
Doc Type: If docs needed, set a value
Clones: 1415332
Last Closed: 2017-06-12 16:54:04 UTC
Type: Bug
Bug Blocks: 1415332    

Description Joe Rafaniello 2017-01-20 18:52:58 UTC
Description of problem:

Summary:
Reads of the @workers hash, a critical section, were not protected with a read lock, so a read could occur while another thread was writing to the hash. Workers reach @workers through a DRb thread running in the server process, while the server's main thread also accesses it.
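A minimal sketch of the general pattern, in Ruby since ManageIQ is a Ruby codebase: readers of the shared hash take a shared (read) lock and writers take an exclusive (write) lock, so a read from a DRb request thread cannot interleave with a write from the main thread. The names SimpleReadWriteLock, WorkerRegistry, update_heartbeat and last_heartbeat are hypothetical and are not ManageIQ's actual code; the lock is hand-rolled from core Mutex and ConditionVariable purely for illustration.

```ruby
# Hypothetical reader/writer lock built only from Ruby's core
# Mutex and ConditionVariable. Not the lock ManageIQ uses.
class SimpleReadWriteLock
  def initialize
    @mutex   = Mutex.new
    @cond    = ConditionVariable.new
    @readers = 0     # threads currently holding the shared (read) lock
    @writer  = false # true while one thread holds the exclusive (write) lock
  end

  # Shared lock: any number of readers at once, never alongside a writer.
  def with_read_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writer
      @readers += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @readers -= 1
        @cond.broadcast if @readers.zero? # let a waiting writer proceed
      end
    end
  end

  # Exclusive lock: waits until no reader and no writer holds the lock.
  def with_write_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writer || @readers.positive?
      @writer = true
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @writer = false
        @cond.broadcast # wake any waiting readers and writers
      end
    end
  end
end

# Hypothetical stand-in for the server's @workers bookkeeping.
class WorkerRegistry
  def initialize
    @workers = {} # pid => { last_heartbeat: Time }
    @lock    = SimpleReadWriteLock.new
  end

  # Writer side (e.g. the server's main thread): exclusive lock.
  def update_heartbeat(pid, time)
    @lock.with_write_lock do
      @workers[pid] ||= {}
      @workers[pid][:last_heartbeat] = time
    end
  end

  # Reader side (e.g. a DRb request thread): shared lock, so the read
  # never sees the hash while another thread is mid-way through changing it.
  def last_heartbeat(pid)
    @lock.with_read_lock { @workers.dig(pid, :last_heartbeat) }
  end
end

registry = WorkerRegistry.new
writer = Thread.new { 100.times { |i| registry.update_heartbeat(42, Time.at(i)) } }
reader = Thread.new { 100.times { registry.last_heartbeat(42) } }
[writer, reader].each(&:join)
puts registry.last_heartbeat(42).to_i # prints 99
```

Readers share the lock with one another, so concurrent DRb reads do not block each other; only writes are serialized. This simple version can starve a writer under a constant stream of readers, and a plain Mutex around both reads and writes is a simpler alternative when read concurrency does not matter.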

Note: this is probably not a serious problem, since a slightly incorrect last_heartbeat value is unlikely to be noticed by anyone.

43e9676 Wrap critical @workers reads with a read lock
fbf7007 last_heartbeat is already nil, don't check it again
18a938f Add explanation of the missing stopping workers.
caf0e77 Initial worker heartbeat wasn't saving the record.
271856d Extract a method to read the last_heartbeat.

This is only partly related to this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1395736.
This PR doesn't resolve that BZ. I'll let @gtanzillo and @Fryguy decide whether we need euwe/yes on this or whether a new BZ is needed...

Comment 2 Joe Rafaniello 2017-01-20 18:53:36 UTC
The description above comes from the upstream pull request that fixes this:
https://github.com/ManageIQ/manageiq/pull/12932

Comment 4 Tasos Papaioannou 2017-06-05 14:22:39 UTC
Verified in 5.8.0.17.