Bug 1328842

Summary: undercloud upgrade failed due to failed docker service
Product: Red Hat OpenStack Reporter: Ola Pavlenko <opavlenk>
Component: rhosp-director Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED NOTABUG QA Contact: Arik Chernetsky <achernet>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty) CC: aschultz, dbecker, jpeeler, mburns, mcornea, morazi, opavlenk, rhel-osp-director-maint, sathlang
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-30 09:56:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
after_undercloud_upgrade_inerrupt.log none

Description Ola Pavlenko 2016-04-20 12:07:16 UTC
Description of problem:

"openstack undercloud upgrade" fails to complete after simulating a power outage in the middle of the undercloud upgrade.
It fails to start the "docker-registry" service.
/usr/lib/systemd/system/docker-registry.service is empty.
Workaround: reinstall the docker-registry rpm and re-run the upgrade.
Error: Could not start Service[docker-registry]: Execution of '/bin/systemctl start docker-registry' returned 1: Failed to start docker-registry.service: Unit docker-registry.service is masked.
Error: Could not start Service[docker-registry]: Execution of '/bin/systemctl start docker-registry' returned 1: Failed to start docker-registry.service: Unit docker-registry.service is masked.
Even after successful completion of the undercloud upgrade, the service is down.

Version-Release number of selected component (if applicable):

[stack@instack ~]$ rpm -qa | grep rhos
rhos-release-1.0.39-1.noarch
[stack@instack ~]$ rpm -qa | grep docker
docker-registry-0.9.1-7.el7.x86_64


How reproducible:


Steps to Reproduce:
1. Install OSPd 7 GA, deploy the overcloud and populate it.
2. Run rhos-release -P 8-director and yum update on the undercloud node.
3. Run openstack undercloud upgrade.
4. During step 3, shut down the undercloud node.
5. Start the undercloud node and repeat step 3.

Actual results:
undercloud upgrade fails 

Expected results:
undercloud upgrade succeeds

Additional info:
Reinstalled the docker-registry rpm and ran the upgrade command again, with a successful result.

Comment 2 Jeff Peeler 2016-04-20 13:54:58 UTC
Did you actually reproduce this error? I would bet that something else breaks the next time a shutdown is attempted during an upgrade. I wish the output of "rpm -qV docker-registry" was collected before reinstalling the RPM.

If the docker-registry.service file was actually empty, then there's no way for the service to know how to start. And I wouldn't think this could ever happen other than interrupting the upgrade process violently.

(It appears that systemd masks the service with an empty service file, so that would explain that particular odd detail.)
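
The masking behaviour described above can be checked directly on disk. The following is a minimal sketch, not part of the original report: it relies on systemd treating a unit file that is empty (size 0) or symlinked to /dev/null as masked, and the helper name `unit_file_looks_masked` is hypothetical.

```python
import os


def unit_file_looks_masked(path):
    """Return True if the unit file at `path` is empty or points at /dev/null.

    systemd reports such a unit with the load state "masked", which matches
    the "Unit docker-registry.service is masked" error in this report.
    """
    real = os.path.realpath(path)  # follow symlinks, e.g. one to /dev/null
    if real == os.devnull:
        return True
    # A regular file of size 0 is what the interrupted upgrade left behind.
    return os.path.isfile(real) and os.path.getsize(real) == 0
```

On the affected undercloud, `unit_file_looks_masked("/usr/lib/systemd/system/docker-registry.service")` would be expected to return True before the rpm is reinstalled.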

My opinion is that if power failure is a scenario that is supposed to be handled gracefully, some sort of RPM transaction verification would need to be done for everything installed on the system.
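
A rough sketch of what such a verification pass could look like for a single package follows. It is an illustration under assumptions, not something taken from this bug: it assumes Python 3 and the documented `rpm -V` file-line layout (a 9-character flag string or the word "missing", an optional attribute marker, then the path), and the names `parse_verify_line`, `missing_or_resized` and `verify_package` are hypothetical. Non-file lines such as dependency messages are not handled.

```python
import subprocess
from collections import namedtuple

VerifyEntry = namedtuple("VerifyEntry", ["flags", "attr", "path"])

# Attribute markers rpm may print between the flags and the path.
ATTR_MARKERS = {"c", "d", "g", "l", "r"}


def parse_verify_line(line):
    """Split one `rpm -V` file line into flags, attribute marker and path."""
    line = line.strip()
    parts = line.split(None, 2)
    if len(parts) < 2:
        return None  # blank or unrecognised line
    if len(parts) == 3 and parts[1] in ATTR_MARKERS:
        return VerifyEntry(parts[0], parts[1], parts[2])
    flags, path = line.split(None, 1)
    return VerifyEntry(flags, None, path)


def missing_or_resized(lines):
    """Return paths that rpm reports as missing or with a changed size (S)."""
    hits = []
    for line in lines:
        entry = parse_verify_line(line)
        if entry and (entry.flags == "missing" or entry.flags.startswith("S")):
            hits.append(entry.path)
    return hits


def verify_package(name):
    """Run `rpm -V <name>`; rpm exits non-zero when it finds differences."""
    proc = subprocess.run(["rpm", "-V", name],
                          stdout=subprocess.PIPE, universal_newlines=True)
    return missing_or_resized(proc.stdout.splitlines())
```

Calling `verify_package("docker-registry")` after the interrupted upgrade would list the files whose size no longer matches the package or that are absent, which is the information asked for earlier in this comment.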

Comment 3 Mike Burns 2016-04-20 14:05:39 UTC
I'd try a yum-complete-transaction to see if that fixes things.

If it does, then I think this is notabug since this is really just a "how to recover from a power outage" action.

Comment 4 Ola Pavlenko 2016-04-21 09:11:45 UTC
(In reply to Mike Burns from comment #3)
> I'd try a yum-complete-transaction to see if that fixes things.
> 
> If it does, then I think this is notabug since this is really just a "how to
> recover from a power outage" action.

Unfortunately I've reprovisioned the env already.
I'll reproduce the issue and will try the yum-complete-transaction

Comment 5 Ola Pavlenko 2016-04-26 14:42:01 UTC
Created attachment 1150987
after_undercloud_upgrade_inerrupt.log

Error during rerun of the upgrade:
Error: Could not start Service[docker-registry]: Execution of '/bin/systemctl start docker-registry' returned 1: Failed to start docker-registry.service: Unit docker-registry.service is masked.
Wrapped exception:
Execution of '/bin/systemctl start docker-registry' returned 1: Failed to start docker-registry.service: Unit docker-registry.service is masked.
Error: /Stage[main]/Main/Service[docker-registry]/ensure: change from stopped to running failed: Could not start Service[docker-registry]: Execution of '/bin/systemctl start docker-registry' returned 1: Failed to start docker-registry.service: Unit docker-registry.service is masked.



The undercloud upgrade was interrupted by shutting down the instack node.

Rerunning the undercloud upgrade ends with:

+ echo 'puppet apply exited with exit code 6'
puppet apply exited with exit code 6
+ '[' 6 '!=' 2 -a 6 '!=' 0 ']'
+ exit 6
[2016-04-26 09:51:53,773] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 6]
 
[2016-04-26 09:51:53,774] (os-refresh-config) [ERROR] Aborting...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 815, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 699, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 370, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)
RuntimeError: os-refresh-config failed. See log for details.
Command 'instack-install-undercloud' returned non-zero exit status 1
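
For readers unfamiliar with the exit code in the trace above: the shell test `'[' 6 '!=' 2 -a 6 '!=' 0 ']'` treats anything other than 0 or 2 as a failure. The sketch below mirrors that test and assumes puppet apply was run with `--detailed-exitcodes`, which exit code 6 suggests but the log does not show. The function and table names are hypothetical.

```python
# Meanings of `puppet apply --detailed-exitcodes` return values.
DETAILED_EXITCODES = {
    0: "no changes, no failures",
    1: "run failed",
    2: "changes applied, no failures",
    4: "failures, no changes",
    6: "changes applied and failures",
}


def puppet_run_failed(exit_code):
    """Mirror the shell test from the log: only 0 and 2 count as success."""
    return exit_code not in (0, 2)


def describe_exit_code(exit_code):
    """Return a short description, or a fallback for unlisted codes."""
    return DETAILED_EXITCODES.get(exit_code, "unknown exit code")
```

Under that assumption, exit code 6 means some resources changed while others failed, here the Service[docker-registry] resource.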



Tried running yum-complete-transaction:
[stack@instack ~]$ sudo yum-complete-transaction 
No unfinished transactions left.
Rerunning the upgrade afterwards ends with the same output.

Attached file: after_undercloud_upgrade_inerrupt.log
/usr/lib/systemd/system/docker-registry.service was empty.

Comment 6 Jeff Peeler 2016-04-26 15:41:54 UTC
When exactly is the undercloud node being shutdown? I'm curious how much (if any) running a "sync" before shutting down would help.

I'm unsure how an RPM transaction is listed as complete yet, as you have shown, the files are not properly present on disk. By my count it looks like 147 of 164 files didn't make it.

Comment 7 Ola Pavlenko 2016-04-27 06:54:25 UTC
(In reply to Jeff Peeler from comment #6)
> When exactly is the undercloud node being shutdown? I'm curious how much (if
> any) running a "sync" before shutting down would help.
> 
> I'm unsure how an RPM transaction is listed as complete yet, as you have
> shown, the files are not properly present on disk. By my count it looks like
> 147 of 164 files didn't make it.

The shutdown is done during the "openstack undercloud upgrade". 
During step#3 http://etherpad.corp.redhat.com/osp-d-upgrade-ohochman

I let it run for a couple of seconds and then shut down the node using virsh.

Comment 8 Sofer Athlan-Guyot 2017-01-30 09:56:27 UTC
Hi Ola,

I'm closing this issue as it hasn't had any activity for a while.  Don't hesitate to re-open it if the issue is still relevant.

Regards,