Bug 998366

Summary: Power service provider - Possible race condition in code
Product: Red Hat Enterprise Linux 7 Reporter: Robin Hack <rhack>
Component: openlmi-providersAssignee: Radek Novacek <rnovacek>
Status: CLOSED CANTFIX QA Contact: qe-baseos-daemons
Severity: low Docs Contact:
Priority: low    
Version: 7.0CC: jsafrane, jscotka, ovasik, psklenar, sgallagh
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-05 14:49:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 922084    

Description Robin Hack 2013-08-19 07:24:12 UTC
Description of problem:
Possible race condition - unclean shutdown.

Step to reproduce:

In file power.c in function state_change_thread (on line 126)
173:             succeeded =  system("reboot --force &") == 0;
Initializes system reboot.

From line 210:
    MUTEX_LOCK(powerStateChangeJob->power);
    powerStateChangeJob->power->transitioningToPowerState = LMI_AssociatedPowerManagementService_TransitioningToPowerState_No_Change;

to line 230
is code, which should not be reached if restart is faster (SSD disks?).

Version-Release number of selected component (if applicable):
Last git commit:
commit 6532d453d6d25b816c3e0c08de3d3cea46dce543

How reproducible:
always

Actual results:
Small window for race condition.

Expected results:
Maybe.. no window for race condition? :)

Comment 2 Robin Hack 2013-08-20 08:38:21 UTC
Maybe you should use:
http://www.freedesktop.org/wiki/Software/systemd/inhibit/(In reply to Robin Hack from comment #0)
> Description of problem:
> Possible race condition - unclean shutdown.
> 
> Step to reproduce:
> 
> In file power.c in function state_change_thread (on line 126)
> 173:             succeeded =  system("reboot --force &") == 0;
> Initializes system reboot.
> 
> From line 210:
>     MUTEX_LOCK(powerStateChangeJob->power);
>     powerStateChangeJob->power->transitioningToPowerState =
> LMI_AssociatedPowerManagementService_TransitioningToPowerState_No_Change;
> 
> to line 230
> is code, which should not be reached if restart is faster (SSD disks?).
> 
> Version-Release number of selected component (if applicable):
> Last git commit:
> commit 6532d453d6d25b816c3e0c08de3d3cea46dce543
> 
> How reproducible:
> always
> 
> Actual results:
> Small window for race condition.
> 
> Expected results:
> Maybe.. no window for race condition? :)

Maybe you should use for systemd:
http://www.freedesktop.org/wiki/Software/systemd/inhibit/

Comment 3 Jan Safranek 2013-11-01 12:21:20 UTC
Maybe I miss point here, if system reboots, I don't see any possibility that the provider reaches line 230 and the machine is already restarted. The machine must turn off itself first, i.e. clearing all memory and loading fresh kernel, CIMOM and the provider.

Comment 4 Robin Hack 2013-11-01 12:33:35 UTC
1) 
system("reboot --force &") == 0; is nonsense and need to find solution.
Developers should solve this in time.

2)
This is theoretical possibility of race condition.
It's not urgent (maybe wont fix).

Comment 5 Robin Hack 2013-11-01 12:38:06 UTC
3) What about suspend? What if machine run to sleep before provider sends ACK to client?

Comment 6 Jan Safranek 2013-11-01 12:48:23 UTC
1) agreed

2) reboot is graceful, i.e. systemd stops the CIMOM and the CIMOM stops the provider, finishing all operations which are in progress.

3) sleep is not graceful, i.e. the machine could really shut down itself before sending a response, I am not sure if there is a sane way out of this problem.

Comment 7 Radek Novacek 2014-01-20 12:27:59 UTC
I'm not sure what would be the proper solution of this bug. The provider should first reply to the client (via CIMOM) and then execute the operation. But I don't see any possibility for the provider to know when CIMOM finish processing the response.

In general, there are couple of cases that should be OK:
* graceful reboot
* graceful poweroff
Problematic cases are these:
* force reboot
* force poweroff
* sleep/hibernation

We can add some sleep to the working thread before executing the action, but is just workaround that will minimize the issue.

I'll move this to next release since it's not that urgent.

Comment 9 Radek Novacek 2014-09-05 14:49:38 UTC
I would say that not getting a reply when one want to execute 'reboot --force' is correct behaviour. If you want to know if the shutdown/reboot was successfully executed, don't use the '--force'.

Marking this as wontfix, it's not possible to fix this bug properly.