Bug 1327983

Summary: Mon install hangs forever
Product: Red Hat Storage Console
Reporter: Nishanth Thomas <nthomas>
Component: ceph-installer
Assignee: Alfredo Deza <adeza>
Status: CLOSED ERRATA
QA Contact: Rachana Patel <racpatel>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 2
CC: adeza, ceph-eng-bugs, ceph-qe-bugs, kdreyer, mkudlej, nthomas, racpatel, sankarshan, vsarmila
Target Milestone: ---
Target Release: 2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-installer-1.0.5-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 19:49:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  Description          Flags
  ceph installer logs  none

Description Nishanth Thomas 2016-04-18 08:12:40 UTC
Created attachment 1148124
ceph installer logs

Description of problem:

Invoking /api/mon/install/ hangs forever.

Input:
curl -d "{\"calamari\": true, \"hosts\": [\"dhcp46-139.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp46-65.lab.eng.blr.redhat.com:8181/api/mon/install/
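For anyone reproducing this from a script, the curl call above can be sketched in Python roughly as follows. This is only an illustration: the helper name build_mon_install_request is made up for this example, the JSON content type header is an assumption (curl -d does not send it by default), and the 30 second timeout in the trailing comment is an arbitrary value chosen so that a client does not block indefinitely if the endpoint never answers.

```python
import json
import urllib.request


def build_mon_install_request(installer_url, mon_hosts):
    """Build (but do not send) a POST request for the mon install endpoint.

    Hypothetical helper for illustration; not part of ceph-installer.
    """
    # Same fields as the curl payload in the report.
    payload = {
        "calamari": True,
        "hosts": list(mon_hosts),
        "redhat_storage": False,
        "redhat_use_cdn": True,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        installer_url.rstrip("/") + "/api/mon/install/",
        data=body,
        # Assumption: the endpoint accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_mon_install_request(
    "http://dhcp46-65.lab.eng.blr.redhat.com:8181",
    ["dhcp46-139.lab.eng.blr.redhat.com"],
)

# To actually send it with a client-side timeout (in seconds):
# urllib.request.urlopen(request, timeout=30)
```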

It looks like the fix for bug 1322907 is causing this issue.
Logs are attached.

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-3.el7.noarch.rpm                
ceph-installer-1.0.4-1.el7.noarch.rpm  
calamari-server-1.4.0-0.5.rc8.el7cp

How reproducible:

Always

Comment 2 Alfredo Deza 2016-04-18 14:57:13 UTC
In /var/log/messages for dhcp46-139.lab.eng.blr.redhat.com I can see:

Apr 18 19:15:23 dhcp46-139 salt-minion: [ERROR   ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
Apr 18 19:15:33 dhcp46-139 salt-minion: [ERROR   ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate

That seems like a potential issue on the salt-master (dhcp46-65).

Comment 3 Christina Meno 2016-04-18 22:56:27 UTC
Shubhendu reproduced this for us. Thank you!

Analysis:
The issue seems to be caused by the ceph-installer task model failing to handle non-ASCII (8-bit) output from ansible stdout when saving it to SQLite.

Excerpt from /var/log/messages:
Apr 19 01:46:49 dhcp47-78 celery: [2016-04-19 01:46:49,529: ERROR/MainProcess] Task ceph_installer.tasks.call_ansible[b0fff403-a785-4fc9-939c-2acd0be7e608] raised unexpected: InvalidRequestError('This Session\'s transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. u\'UPDATE tasks SET stderr=?, stdout=?, ended=?, succeeded=?, exit_code=? WHERE tasks.id = ?\' (\'\', \'\\nPLAY [mons] 


So "hung" is somewhat incorrect. It seems that after this error, requests fail to respond even though the request appears to succeed.
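To illustrate the ProgrammingError in the excerpt above: the general remedy the message itself recommends is to hand SQLite unicode text rather than raw 8-bit bytestrings. The sketch below shows that idea only. It is not the code from the linked pull request, the helper names to_text and save_task_output are hypothetical, and the table is a cut-down stand-in for the tasks table named in the log. It is written for Python 3, where sqlite3 would store raw bytes as a BLOB instead of raising, so the decoding step here is shown for clarity rather than to reproduce the original Python 2 failure.

```python
import sqlite3


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes if necessary.

    Undecodable bytes are replaced rather than raising, so saving
    task output never fails on unexpected input.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value


def save_task_output(conn, task_id, stdout, stderr):
    """Store stdout/stderr for a task as text (hypothetical helper)."""
    conn.execute(
        "UPDATE tasks SET stderr=?, stdout=? WHERE id=?",
        (to_text(stderr), to_text(stdout), task_id),
    )
    conn.commit()


# In-memory stand-in for the installer's task table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, stdout TEXT, stderr TEXT)")
conn.execute("INSERT INTO tasks (id) VALUES (?)", ("demo-task",))

# Simulated command output containing a non-ASCII character, as raw bytes.
raw_stdout = "\nPLAY [mons] caf\u00e9\n".encode("utf-8")
save_task_output(conn, "demo-task", raw_stdout, b"")

stored_stdout = conn.execute(
    "SELECT stdout FROM tasks WHERE id=?", ("demo-task",)
).fetchone()[0]
```

Decoding at the boundary where subprocess output enters the application keeps the database layer dealing with one string type only.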


https://github.com/ceph/ceph-installer/pull/132/files

Comment 9 Rachana Patel 2016-07-28 23:00:10 UTC
Verified with:

ceph-ansible-1.0.5-23.el7scon.noarch
ceph-installer-1.0.12-3.el7scon.noarch

Working as expected, hence moving to verified.

Comment 11 errata-xmlrpc 2016-08-23 19:49:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754