Bug 1331680 - Mon install fails with new build ceph-installer-1.0.5-1.el7.noarch.rpm
Summary: Mon install fails with new build ceph-installer-1.0.5-1.el7.noarch.rpm
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: ceph-installer
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 2
Assignee: Christina Meno
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1291304
Reported: 2016-04-29 09:02 UTC by Shubhendu Tripathi
Modified: 2016-05-07 01:29 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-07 01:29:20 UTC
Embargoed:



Description Shubhendu Tripathi 2016-04-29 09:02:43 UTC
Description of problem:
Installation of the mon node fails with an error.

Version-Release number of selected component (if applicable):
ceph-installer-1.0.5-1.el7.noarch.rpm  

How reproducible:
Always

Steps to Reproduce:
1. Run the /setup API to bootstrap the storage node.
2. Run the /setup/agent API to install the rhscon agent bits.
3. Run the /api/mon/install API for the node.

Actual results:
Fails with the following error:

TASK: [ceph.ceph-common | create ceph conf directory] *************************
failed: [dhcp47-100.lab.eng.blr.redhat.com] => {"failed": true, "gid": 0, "group": "root", "mode": "0755", "owner": "root", "path": "/etc/ceph", "secontext": "unconfined_u:object_r:etc_t:s0", "size": 6, "state": "directory", "uid": 0}
msg: chown failed: failed to look up user ceph

FATAL: all hosts have already failed -- aborting
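The chown failure above means the ceph account could not be resolved when the "create ceph conf directory" task tried to set ownership on /etc/ceph. That fits the later finding that the ceph packages were never installed. This assumes, as in the Jewel-era packaging, that the ceph user is created when the ceph packages are installed. The following is a hypothetical pre-flight check for that condition. It uses only the Python standard library and is not part of ceph-installer:

```python
import pwd


def user_exists(name: str) -> bool:
    """Return True if a local account called `name` can be resolved.

    Hypothetical pre-flight check: it performs the same kind of user lookup
    that failed in the log ("failed to look up user ceph").
    """
    try:
        pwd.getpwnam(name)
    except KeyError:
        # getpwnam raises KeyError when the account does not exist.
        return False
    return True


if __name__ == "__main__":
    # On a node where the ceph packages never got installed this prints False.
    print("ceph user present:", user_exists("ceph"))
```

Running this on the node before calling /api/mon/install would separate "packages missing" from a genuine permission problem.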


Expected results:
The mon install should succeed.

Additional info:

Comment 2 Nishanth Thomas 2016-04-29 15:28:40 UTC
Actually, no packages are getting installed other than:
ceph-release-1-1.el7.noarch
calamari-server-1.4.0-0.6.rc9.el7cp.x86_64

The same issue is seen on both OSD nodes as well.

Comment 3 Alfredo Deza 2016-04-29 15:43:28 UTC
This will require at least some logs from the installer and the actual requests that went out to the installer so we can try to understand this better.

Comment 4 Nishanth Thomas 2016-04-29 16:26:35 UTC
Request:

curl -d "{\"calamari\": true, \"hosts\": [\"dhcp47-48.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp47-4.lab.eng.blr.redhat.com:8181/api/mon/install/


Server - dhcp47-4.lab.eng.blr.redhat.com
Node - dhcp47-48.lab.eng.blr.redhat.com

user/passwd - root/redhat
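For reference, the same request can be built with Python's standard library. This is a minimal sketch. The helper name is hypothetical, and the defaults reproduce the flags in the curl command above. The JSON Content-Type header is an addition, because curl -d sends form encoding by default. The request is only constructed here, not sent:

```python
import json
import urllib.request

# Installer address taken from the curl command above.
INSTALLER = "http://dhcp47-4.lab.eng.blr.redhat.com:8181"


def build_mon_install_request(hosts, calamari=True, redhat_storage=False,
                              redhat_use_cdn=True):
    """Build (but do not send) a POST to /api/mon/install/ (hypothetical helper)."""
    body = {
        "calamari": calamari,
        "hosts": list(hosts),
        "redhat_storage": redhat_storage,
        "redhat_use_cdn": redhat_use_cdn,
    }
    return urllib.request.Request(
        INSTALLER + "/api/mon/install/",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: declaring JSON explicitly; the original curl call did not.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_mon_install_request(["dhcp47-48.lab.eng.blr.redhat.com"])
# urllib.request.urlopen(req) would actually send it.
```

Keeping the flags as explicit keyword arguments makes it easy to switch combinations, for example redhat_storage=True with redhat_use_cdn=False.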

Comment 5 Alfredo Deza 2016-04-29 17:57:20 UTC
I looked at the logs. For the failure I saw, there are timeouts while yum is executing. (There seems to be one /mon/install/ call for dhcp47-48 that worked: 9989721b-1833-4fd1-b0e3-6de1f9175c7e.)

Identifier is: a641b20b-9e6c-46f1-b94f-f701985ce042

And the last bits of stderr output shows:

msg: https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/cups-libs-1.6.3-22.el7.x86_64.rpm: [Errno 14] curl#35 - "TCP connection reset by peer"
Trying other mirror.
https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/ed-1.9-4.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed connect to cdn.redhat.com:443; Connection refused"
Trying other mirror.
https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/cups-client-1.6.3-22.el7.x86_64.rpm: [Errno 12] Timeout on https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/cups-client-1.6.3-22.el7.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
Trying other mirror.


Error downloading packages:
  1:cups-libs-1.6.3-22.el7.x86_64: [Errno 256] No more mirrors to try.



FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/var/lib/ceph-installer/site.sample.retry

dhcp47-48.lab.eng.blr.redhat.com : ok=10   changed=1    unreachable=0    failed=1
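When scripting around the installer, the PLAY RECAP block is a quick way to tell which hosts actually failed. The following is a rough sketch. It assumes the recap format shown above: a host, a colon, then key=value counters. The function names are hypothetical, and this is not part of ceph-installer:

```python
import re

# Matches recap lines such as
# "dhcp47-48.lab.eng.blr.redhat.com : ok=10   changed=1    unreachable=0    failed=1"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counts>(?:\w+=\d+\s*)+)$")
COUNT = re.compile(r"(\w+)=(\d+)")


def parse_play_recap(text):
    """Return {host: {"ok": n, "changed": n, ...}} for every recap line found."""
    results = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            counts = {key: int(value)
                      for key, value in COUNT.findall(match.group("counts"))}
            results[match.group("host")] = counts
    return results


def failed_hosts(text):
    """Sorted list of hosts whose recap reports failed or unreachable tasks."""
    return sorted(
        host
        for host, counts in parse_play_recap(text).items()
        if counts.get("failed", 0) or counts.get("unreachable", 0)
    )
```

Other output lines, such as "msg: ..." or the "to retry" hint, do not match the pattern and are ignored.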

Comment 7 Nishanth Thomas 2016-05-01 16:20:28 UTC
We are hitting this issue intermittently; one case or the other fails.

Comment 8 Shubhendu Tripathi 2016-05-02 09:13:28 UTC
I have consistently hit this issue: on the storage nodes only ceph-release-1-1.el7.noarch gets installed. The mon gets calamari installed, but no other ceph bits get installed.

The task always returns success, saying the installation is done on the node.

Comment 9 Shubhendu Tripathi 2016-05-02 09:14:19 UTC
The older builds 

http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-25.1/
http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-22.2/

do work fine though.

Comment 10 Alfredo Deza 2016-05-02 11:14:29 UTC
Can you provide me access to the installer and the ID of the task that failed so I can debug this further?

Comment 14 Shubhendu Tripathi 2016-05-02 12:41:11 UTC
We do pass redhat_storage=false because, if we set this flag to true, it installs ceph 0.94.x, which is available by default in rhel_ceph_osd_rpms and rhel_ceph_mon_rpms, and it does not consider the added puddle repos.

The same thing used to work earlier. Not sure why it is broken now.

Please let us know the exact parameter values to pass. We have wasted enough time trying different combinations.

Comment 15 Alfredo Deza 2016-05-02 13:56:20 UTC
(In reply to Shubhendu Tripathi from comment #14)
> We do pass redhat_storage=false because, if we set this flag to true, it
> installs ceph 0.94.x, which is available by default in rhel_ceph_osd_rpms and
> rhel_ceph_mon_rpms, and it does not consider the added puddle repos.
> 
> The same thing used to work earlier. Not sure why it is broken now.
> 
> Please let us know the exact parameter values to pass. We have wasted
> enough time trying different combinations.

I can't specify which flags a client needs to pass, because that depends on which source the client wants to use: upstream, a custom repo, the RH CDN, etc.

In this particular case it looks like a custom repository (the puddle), so I would think the client should use:

    "redhat_storage": true,
    "redhat_use_cdn": false,

The docs should be the primary source of truth on what each option entails, so the requester should always assume that whatever the docs say is how it should work. If that is not the case, then that is a bug we must fix right away.

From the docs (http://docs.ceph.com/ceph-installer/docs/#ceph-versions):

    "The default for the /api/*/install endpoints is to install the latest upstream
    stable version of ceph. If you’d like to install the latest Red Hat Ceph
    Storage ensure that the node being provisioned is correctly entitled and that
    the redhat_storage option is set to True in the json body you send to the
    install endpoint."
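The mapping described in this comment can be written down as a small lookup. This is only a sketch, and the enum and function names are hypothetical. The UPSTREAM and CUSTOM_REPO rows come from this comment and comment 17. The RH_CDN row is an assumption based on the docs quoted above (an entitled node plus redhat_storage set to true). It is not confirmed in this bug:

```python
from enum import Enum


class PackageSource(Enum):
    UPSTREAM = "upstream"        # upstream ceph packages (the documented default)
    CUSTOM_REPO = "custom_repo"  # repos already registered on the node, e.g. the puddles
    RH_CDN = "rh_cdn"            # Red Hat CDN on an entitled node (assumed mapping)


def install_flags(source):
    """Return the JSON flags for an /api/*/install call (hypothetical helper)."""
    if source is PackageSource.UPSTREAM:
        return {"redhat_storage": False}
    if source is PackageSource.CUSTOM_REPO:
        return {"redhat_storage": True, "redhat_use_cdn": False}
    if source is PackageSource.RH_CDN:
        # Assumption: not spelled out in this bug.
        return {"redhat_storage": True, "redhat_use_cdn": True}
    raise ValueError("unknown package source: %r" % (source,))
```

The result can be merged into the request body, for example {"calamari": True, "hosts": [...], **install_flags(PackageSource.CUSTOM_REPO)}.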

Comment 16 Shubhendu Tripathi 2016-05-02 15:28:11 UTC
We register the puddle repos using yum-config-manager --add, as below:

yum-config-manager --add http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-29.2/RHSCON-2.repo
yum-config-manager --add http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-28.1/CEPH-2.repo

With these registrations, the older builds

http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-25.1/
http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-22.2/

work perfectly fine with redhat_storage=false and redhat_use_cdn=true.

The same thing is broken with new puddles now.
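One way to see what changed between the old and new puddles is to compare the repositories each .repo file registers. yum .repo files are INI-style, so Python's configparser can read them. The sample content and the list_repos name are hypothetical; the actual RHSCON-2.repo and CEPH-2.repo contents are not included in this bug:

```python
import configparser

# Hypothetical .repo content; the real puddle .repo files are not reproduced here.
SAMPLE_REPO = """
[example-ceph-2]
name=Example Ceph 2 puddle
baseurl=http://repo.example.com/ceph/2/latest/x86_64/os/
enabled=1
gpgcheck=0
"""


def list_repos(repo_text):
    """Map each repo id in a yum .repo file to (baseurl, enabled)."""
    # Interpolation off so values containing '%' are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    repos = {}
    for repo_id in parser.sections():
        section = parser[repo_id]
        # yum treats a repo as enabled unless enabled= says otherwise.
        enabled = section.getboolean("enabled", fallback=True)
        repos[repo_id] = (section.get("baseurl"), enabled)
    return repos
```

Running list_repos on the old and new files side by side shows whether a repo id, baseurl, or enabled flag differs.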

My point is that if we set redhat_storage=true, it tries to install RPMs from rhel_ceph_osd_rpms and rhel_ceph_mon_rpms, where it finds ceph version 0.94.x.

Since we have registered the puddles using yum-config-manager, we should pass redhat_storage=false, as we were doing earlier.
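To confirm which repository actually supplied the packages after an install, one could check the version reported on the node. Red Hat Ceph Storage 2 is based on the Jewel release (10.2.x), so a 0.94.x (Hammer) result means the older packages were picked up. The following is a minimal sketch. It assumes the output of ceph --version has already been captured as a string; how it is collected is left out. The helper names are hypothetical:

```python
import re

# `ceph --version` prints e.g. "ceph version 0.94.5-9.el7cp (<sha1>)".
VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def ceph_version(output):
    """Extract (major, minor, patch) from `ceph --version` output."""
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError("no ceph version found in %r" % (output,))
    return tuple(int(part) for part in match.groups())


def is_hammer(version):
    """True for the 0.94.x (Hammer) series."""
    return version[:2] == (0, 94)
```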

Comment 17 Alfredo Deza 2016-05-02 15:47:05 UTC
(In reply to Shubhendu Tripathi from comment #16)
> We register the puddle repos using yum-config-manager --add as below 
> 
> yum-config-manager --add
> http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-29.2/RHSCON-2.repo
> yum-config-manager --add
> http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-28.1/CEPH-2.repo
> 
> With these registrations, the older builds
> 
> http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-25.1/
> http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-22.2/
> 
> work perfectly fine with redhat_storage=false and redhat_use_cdn=true.
> 
> The same thing is broken with new puddles now.
> 
> My point is that if we set redhat_storage=true, it tries to install RPMs from
> rhel_ceph_osd_rpms and rhel_ceph_mon_rpms, where it finds ceph version 0.94.x.

To verify, can you try with:

    "redhat_storage": true,
    "redhat_use_cdn": false,

And see how that works?

> 
> Since we have registered the puddles using yum-config-manager, we should pass
> redhat_storage=false, as we were doing earlier.

That will install the upstream packages, which you don't want here. Not sure how that used to be different. Please try with the flags I suggested and see how that works.

Comment 18 Alfredo Deza 2016-05-03 17:31:16 UTC
Shubhendu, it seems that you were able to install the Mon correctly. Can you update this BZ to reflect that?

Comment 19 Nishanth Thomas 2016-05-04 12:03:47 UTC
Ack. It works properly with this configuration.

Comment 20 Shubhendu Tripathi 2016-05-05 04:25:48 UTC
I also tested with a fresh set of nodes, and it works fine.

