Bug 1224672

Summary: httpd fails to start due to partially-rolled-back pki by engine-setup
Product: [Retired] oVirt
Reporter: Yedidyah Bar David <didi>
Component: ovirt-engine-installer
Assignee: Yedidyah Bar David <didi>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Novotny <pnovotny>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.5
CC: bugs, ecohen, gklein, lsurette, pnovotny, rbalakri, sbonazzo, yeylon, ylavi
Target Milestone: ---
Target Release: 3.5.5
Hardware: Unspecified
OS: Unspecified
Whiteboard: integration
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-09 09:12:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2015-05-25 09:50:50 UTC
Description of problem:

PKI handling in engine-setup is not fully transactional. This means that on rollback, some of the PKI changes remain applied.

Under certain conditions, this leaves the system in a bad state.

Version-Release number of selected component (if applicable):

Current 3.5 branch (prior to 3.5.3)

How reproducible:

Always

Steps to Reproduce:
1. Install and setup 3.4
2. Install 3.5 setup
3. Run engine-setup, and cause it to fail "cleanly", so that it manages to rollback
4. Run engine-setup again

Actual results:

It emits an error:

[ ERROR ] Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute

httpd/ssl_error_log has:

[Mon May 25 12:33:42 2015] [error] Unable to configure RSA server private key
[Mon May 25 12:33:42 2015] [error] SSL Library Error: 185073780 error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch
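
The "key values mismatch" error above means that the private key httpd loads does not belong to the certificate it is configured with. A minimal sketch for checking this by hand, assuming bash and the openssl command-line tool: compare the public key embedded in the certificate with the one derived from the private key. The function name key_matches_cert is made up for illustration, and the paths in the usage comment are assumptions about a default engine layout, not taken from this report.

```shell
#!/bin/bash
# Hypothetical helper, not part of engine-setup.
# Usage: key_matches_cert CERT_FILE KEY_FILE
# Returns 0 if the key belongs to the certificate, 1 if it does not,
# and 2 if openssl cannot read either file.
key_matches_cert() {
    local cert_pub key_pub
    # Public key embedded in the certificate, in PEM form.
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    # Public key derived from the private key, in the same PEM form.
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "${cert_pub}" = "${key_pub}" ]
}

# Example call (paths are assumptions; use the files httpd is configured with):
# key_matches_cert /etc/pki/ovirt-engine/certs/apache.cer \
#                  /etc/pki/ovirt-engine/keys/apache.key.nopass \
#     && echo "key and certificate match" \
#     || echo "mismatch or unreadable file"
```

A non-zero result for the pair that httpd uses would be consistent with the error in ssl_error_log.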

Expected results:

Should succeed

Additional info:

Currently, reproducing this requires the fix for bug 1224656, which should be in the 3.5.3 final build.

Comment 1 Yedidyah Bar David 2015-05-25 10:21:31 UTC
See also bug 1134219.

We might decide to close the current bug and add it there as a requirement for the new PKI.

Comment 2 Yedidyah Bar David 2015-05-25 13:30:13 UTC
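Workaround:

for key in /etc/pki/ovirt-engine/keys/*.key.nopass; do
        # Each unencrypted key has a PKCS#12 bundle with the same base name.
        p12="${key%.key.nopass}.p12"
        echo "Extracting ${p12} key to ${key}"
        # Re-extract the private key from the bundle, overwriting the stale key file.
        # Adjust pass:mypass if your PKCS#12 files use a different password.
        openssl pkcs12 -in "${p12}" -passin pass:mypass -passout pass: -nocerts -out "${key}" -nodes
done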
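
Because the loop above overwrites the existing *.key.nopass files in place, it may be worth keeping a copy of them first. A minimal sketch, assuming bash; the helper name backup_keys and the timestamped .bak suffix are arbitrary choices for illustration, not something engine-setup provides:

```shell
#!/bin/bash
# Hypothetical helper: copy every *.key.nopass file in a directory to
# <name>.<timestamp>.bak, preserving mode and timestamps (and ownership
# when run as root).
# Usage: backup_keys KEYS_DIR
backup_keys() {
    local dir=$1 stamp key
    stamp=$(date +%Y%m%d%H%M%S)
    for key in "${dir}"/*.key.nopass; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "${key}" ] || continue
        cp -p "${key}" "${key}.${stamp}.bak" || return 1
        echo "Saved ${key}.${stamp}.bak"
    done
}

# Example call before running the workaround:
# backup_keys /etc/pki/ovirt-engine/keys
```

The backup names end in .bak, so they do not match the *.key.nopass pattern and the workaround loop will not try to process them.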

Comment 3 Yedidyah Bar David 2015-06-04 12:28:39 UTC
I didn't verify, but I think this bug is solved by Alon's recent patch [1], which was filed against bug 1214860. Do we still want to consider the current patch [2], or just close this as a duplicate of bug 1134219, or move to MODIFIED (if [1] indeed solves it)?

[1] https://gerrit.ovirt.org/#/q/Idd564f30ddc9c9bbfd79f70e0f89337ec0a65d61,n,z
[2] http://gerrit.ovirt.org/41642

Comment 4 Sandro Bonazzola 2015-06-08 07:07:25 UTC
(In reply to Yedidyah Bar David from comment #3)
> Didn't verify, but I think that this bug is solved by Alon's recent patch
> [1] which was filed against bug 1214860. Do we still want to consider
> current patch [2], or just close duplicate bug 1134219, or move to modified
> (if indeed [1] solves)?
> 
> [1]
> https://gerrit.ovirt.org/#/q/Idd564f30ddc9c9bbfd79f70e0f89337ec0a65d61,n,z
> [2] http://gerrit.ovirt.org/41642

Let's verify whether [1] fixes this, and move to MODIFIED if it does.

Comment 5 Yaniv Lavi 2015-08-20 11:26:14 UTC
Can you try this on the latest 3.6 build and let us know if it works?

Comment 6 Pavel Novotny 2015-08-26 11:50:36 UTC
(In reply to Yaniv Dary from comment #5)
> Can you try this on 3.6 latest build and let us know if this works?

At this moment I cannot test this scenario in 3.6 because of bug 1254639.

Comment 7 Yaniv Lavi 2015-08-30 09:03:44 UTC
Restoring needinfo until tested.

Comment 8 Pavel Novotny 2015-09-16 16:09:00 UTC
I didn't reproduce this during upgrade rhevm-3.5.4.2-1.3.el6ev -> rhevm-3.6.0-0.15.master.el6

I followed steps from comment#0:
1. Install and setup 3.5
2. Install 3.6 setup
3. Run engine-setup, and cause it to fail "cleanly", so that it manages to rollback
4. Run engine-setup again


The first time, I aborted the setup process (step 3) before the "[ INFO  ] Upgrading CA" step, and the second time after it.
In both cases the subsequent engine-setup run (step 4) finished successfully.

Comment 9 Yedidyah Bar David 2015-09-17 05:47:30 UTC
(In reply to Pavel Novotny from comment #8)
> I didn't reproduce this during upgrade rhevm-3.5.4.2-1.3.el6ev ->
> rhevm-3.6.0-0.15.master.el6
> 

Sorry for not mentioning: IIRC this specific flow isn't reproducible on any
released version.

Affected versions are probably only those between [1] or [2] or [3] and [4].

You might be able to reproduce using upgrade from <=3.5.2 to one of vt15,
vt15.1, vt15.2, vt15.3.

[1] http://gerrit.ovirt.org/#/q/Ia975aad12e97ce0287cd6414e7ab91ea2ca6c0f9,n,z
[2] http://gerrit.ovirt.org/#/q/I9ca57f8a9b4e97cfbb2bd4877adbc2a87e6348fc,n,z
[3] https://gerrit.ovirt.org/#/q/I6383b29e131d65cece96e298bb2c7fe6ae305afe,n,z
[4] https://gerrit.ovirt.org/#/q/Idd564f30ddc9c9bbfd79f70e0f89337ec0a65d61,n,z

> I followed steps from comment#0:
> 1. Install and setup 3.5
> 2. Install 3.6 setup

For this you'll need to build yourself, because I do not think we had any
internal 3.6 builds at the relevant time, and nightly builds are rotated.

> 3. Run engine-setup, and cause it to fail "cleanly", so that it manages to
> rollback
> 4. Run engine-setup again
> 
> 
> First I aborted the setup process (step 3.) before step "[ INFO  ] Upgrading
> CA" and second time after it.
> In both cases the subsequent engine-setup command (step 4.) finished
> successfully.

Please try again to reproduce using upgrade from 3.4 to vt15, to make sure
that the verification is valid. Moving to QA for now.

Thanks!

Comment 10 Pavel Novotny 2015-09-17 11:23:53 UTC
(In reply to Yedidyah Bar David from comment #9)
> (In reply to Pavel Novotny from comment #8)
> > I didn't reproduce this during upgrade rhevm-3.5.4.2-1.3.el6ev ->
> > rhevm-3.6.0-0.15.master.el6
> > 
> 
> Sorry for not mentioning: IIRC this specific flow isn't reproducible on any
> released version.
> 
> Affected versions are probably only those between [1] or [2] or [3] and [4].
> 
> You might be able to reproduce using upgrade from <=3.5.2 to one of vt15,
> vt15.1, vt15.2, vt15.3.

Upgrading to vt15.x is not relevant now, because at this moment the latest released 3.5 build is vt16.9, which is what customers get.

Thus I can verify this bug by upgrading from rhevm-3.4.5-0.3 (av14.2) to rhevm-3.5.4.2-1.3 (vt16.9).

Comment 11 Pavel Novotny 2015-10-07 17:46:07 UTC
Verified in rhevm-3.5.5-0.1.el6ev.noarch (vt17.1).

Scenario #1:
Upgrade from rhevm-3.4.5-0.3.el6ev.noarch (av14.2) to vt17.1.

Scenario #2:
Upgrade from rhevm-3.5.1-0.4.el6ev.noarch (vt14.3) to vt17.1.


Verification steps (according to comment 0):
1. Install and setup
  #1: 3.4.5 (av14.2)
  #2: 3.5.1 (vt14.3)
2. Install 3.5.5 setup (vt17.1)
3. Run engine-setup, and cause it to fail "cleanly", so that it manages to rollback
4. Run engine-setup again

Result: Second execution of rhevm-setup succeeds without errors. RHEVM is updated, web UI is accessible (of course, after removing the old invalid certificate from browser cert. storage first).