Description of problem:
Fails to install the mon node with an error.

Version-Release number of selected component (if applicable):
ceph-installer-1.0.5-1.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Run the API /setup to bootstrap the storage node.
2. Run the API /setup/agent to install the rhscon agent bits.
3. Run the API /api/mon/install for the node.

Actual results:
Fails with error:

TASK: [ceph.ceph-common | create ceph conf directory] *************************
failed: [dhcp47-100.lab.eng.blr.redhat.com] => {"failed": true, "gid": 0, "group": "root", "mode": "0755", "owner": "root", "path": "/etc/ceph", "secontext": "unconfined_u:object_r:etc_t:s0", "size": 6, "state": "directory", "uid": 0}
msg: chown failed: failed to look up user ceph

FATAL: all hosts have already failed -- aborting

Expected results:
mon install should be successful.

Additional info:
Actually, no packages other than ceph-release-1-1.el7.noarch and calamari-server-1.4.0-0.6.rc9.el7cp.x86_64 are getting installed. The same issue is seen on both OSD nodes as well.
This will require at least some installer logs and the actual requests sent to the installer, so we can try to understand this better.
Request:
curl -d "{\"calamari\": true, \"hosts\": [\"dhcp47-48.lab.eng.blr.redhat.com\"],\"redhat_storage\":false,\"redhat_use_cdn\":true}" http://dhcp47-4.lab.eng.blr.redhat.com:8181/api/mon/install/

Server: dhcp47-4.lab.eng.blr.redhat.com
Node: dhcp47-48.lab.eng.blr.redhat.com
user/passwd: root/redhat
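For reference, the same request can be built programmatically so the JSON body does not need hand-escaped quotes. This is a minimal Python sketch using only the standard library; the host names and flag values mirror the curl command in this comment, the helper name is hypothetical, and the `Content-Type: application/json` header is an assumption (the original curl call does not set one). It only constructs the request; sending it is left to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Installer endpoint and target node taken from the request in this comment.
INSTALLER = "http://dhcp47-4.lab.eng.blr.redhat.com:8181"
NODE = "dhcp47-48.lab.eng.blr.redhat.com"


def build_mon_install_request(hosts, calamari, redhat_storage, redhat_use_cdn):
    """Build (but do not send) a POST to /api/mon/install/.

    Hypothetical helper: the body mirrors the fields used in the curl call,
    and json.dumps handles quoting, so no shell escaping is needed.
    """
    body = {
        "calamari": calamari,
        "hosts": list(hosts),
        "redhat_storage": redhat_storage,
        "redhat_use_cdn": redhat_use_cdn,
    }
    return urllib.request.Request(
        INSTALLER + "/api/mon/install/",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the installer accepts an explicit JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same flag values as the original curl request.
req = build_mon_install_request([NODE], calamari=True,
                                redhat_storage=False, redhat_use_cdn=True)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```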
I looked at the logs. There seems to be one /mon/install/ call for dhcp47-48 that worked (9989721b-1833-4fd1-b0e3-6de1f9175c7e), but for the failure I saw there are timeouts while yum is executing. The identifier is a641b20b-9e6c-46f1-b94f-f701985ce042, and the last bits of stderr output show:

msg: https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/cups-libs-1.6.3-22.el7.x86_64.rpm: [Errno 14] curl#35 - "TCP connection reset by peer"
Trying other mirror.
https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/ed-1.9-4.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed connect to cdn.redhat.com:443; Connection refused"
Trying other mirror.
https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/cups-client-1.6.3-22.el7.x86_64.rpm: [Errno 12] Timeout on https://cdn.redhat.com/content/aus/rhel/server/7/7Server/x86_64/os/Packages/cups-client-1.6.3-22.el7.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
Trying other mirror.

Error downloading packages:
1:cups-libs-1.6.3-22.el7.x86_64: [Errno 256] No more mirrors to try.

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
to retry, use: --limit @/var/lib/ceph-installer/site.sample.retry

dhcp47-48.lab.eng.blr.redhat.com : ok=10 changed=1 unreachable=0 failed=1
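When triaging a failed task, it can help to separate network or mirror failures like these from real packaging problems. The following is a hypothetical Python helper (not part of ceph-installer) that scans a task's stderr for the yum error markers visible in this log: `[Errno 14]`, `[Errno 12] Timeout`, and `No more mirrors to try`. The patterns come from this one log only and are not an exhaustive list of yum errors.

```python
import re

# Error markers copied from the yum output in this failure; not exhaustive.
PATTERNS = {
    "connection_error": re.compile(r"\[Errno 14\]"),
    "timeout": re.compile(r"\[Errno 12\] Timeout"),
    "mirrors_exhausted": re.compile(r"No more mirrors to try"),
}


def classify_yum_stderr(stderr):
    """Return a dict mapping each marker name to its number of matches."""
    return {name: rx.findall(stderr).__len__() for name, rx in PATTERNS.items()}


def looks_like_download_failure(stderr):
    """True if yum gave up after every mirror failed with a connection or
    timeout error, i.e. a network problem rather than a missing package."""
    counts = classify_yum_stderr(stderr)
    return counts["mirrors_exhausted"] > 0 and (
        counts["connection_error"] > 0 or counts["timeout"] > 0
    )


# Excerpt of the stderr quoted in this comment.
sample = (
    "ed-1.9-4.el7.x86_64.rpm: [Errno 14] curl#7 - \"Failed connect to "
    "cdn.redhat.com:443; Connection refused\" Trying other mirror.\n"
    "Error downloading packages: 1:cups-libs-1.6.3-22.el7.x86_64: "
    "[Errno 256] No more mirrors to try.\n"
)
print(classify_yum_stderr(sample))
print(looks_like_download_failure(sample))
```

A failure classified this way points at connectivity to cdn.redhat.com rather than at the installer itself.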
We are hitting this issue intermittently; one case or the other fails.
I have consistently hit this issue: on the storage nodes only ceph-release-1-1.el7.noarch gets installed, and the mon gets calamari installed but no other ceph bits. The task always returns success, saying the installation is done on the node.
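Because the task reports success even when only `ceph-release` is installed, a client could cross-check the node's package list after the task finishes. A minimal sketch, assuming the list comes from something like `rpm -qa --qf '%{NAME}\n'` run on the node; the expected package names per role (`ceph-common`, `ceph-mon`, `ceph-osd`) and the helper name are assumptions for illustration, not what the installer is documented to install.

```python
# Hypothetical expected package sets per role; adjust to the actual build.
EXPECTED = {
    "mon": {"ceph-common", "ceph-mon"},
    "osd": {"ceph-common", "ceph-osd"},
}


def missing_packages(role, installed_names):
    """Return the expected package names for `role` that are not installed.

    `installed_names` is an iterable of package names, e.g. the lines of
    `rpm -qa --qf '%{NAME}\\n'` output. Blank lines are ignored.
    """
    installed = {name.strip() for name in installed_names if name.strip()}
    return sorted(EXPECTED[role] - installed)


# A package list resembling what this comment describes for a storage node.
rpm_output = "ceph-release\nbash\n"
print(missing_packages("osd", rpm_output.splitlines()))
```

A non-empty result after a "successful" task would flag exactly the situation described here.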
The older builds http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-25.1/ http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-22.2/ do work fine though.
Can you provide me access to the installer and the ID of the task that failed so I can debug this further?
We pass redhat_storage=false because if we set this flag to true, it installs ceph 0.94.x, which is available by default in rhel_ceph_osd_rpms and rhel_ceph_mon_rpms, and it does not consider the puddle repos we added. The same thing used to work earlier; I'm not sure how it is broken now. Please let us know the exact parameter values to pass. We have wasted enough time trying different combinations.
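To confirm which build actually landed on a node, a client could parse the output of `ceph --version` and flag the 0.94.x series described here. A sketch, assuming the output has the usual `ceph version X.Y.Z ... (<sha>)` form; the helper names are hypothetical, and the `10.2.0` string in the usage note is only an illustrative newer version, not the puddle's actual version.

```python
import re

# Assumed format of `ceph --version` output: "ceph version 0.94.5 (<sha>)".
VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def parse_ceph_version(output):
    """Return (major, minor, patch) from `ceph --version` output, or None."""
    m = VERSION_RE.search(output)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())


def is_094_series(output):
    """True if the reported version belongs to the 0.94.x series."""
    version = parse_ceph_version(output)
    return version is not None and version[:2] == (0, 94)


print(is_094_series("ceph version 0.94.5 (<sha>)"))
print(is_094_series("ceph version 10.2.0 (<sha>)"))  # illustrative newer version
```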
(In reply to Shubhendu Tripathi from comment #14)
> We do pass redhat_storage=false because if we set this flag to true it
> installs ceph 0.94.x which is available default in rhel_ceph_osd_rpms and
> rhel_ceph_mon_rpms and it does not consider the puddle repos added.
>
> the same thing used to work earlier. Not sure how this is broken now..
>
> plz let us know the exact values of parameters to be passed. We have wasted
> enough time trying different combinations.

I can't specify which flags a client needs to pass, because that depends on which source the client wants to use: upstream, a custom repo, the RH CDN, etc. In this particular case it looks like a custom repository (the puddle), so I would think the client should use:

"redhat_storage": true,
"redhat_use_cdn": false,

The docs should be the primary source of truth on what each option entails, so the requester should always assume that whatever the docs say is how it should work. If that is not the case, then that is a bug we must fix right away.

From the docs (http://docs.ceph.com/ceph-installer/docs/#ceph-versions):

"The default for the /api/*/install endpoints is to install the latest upstream stable version of ceph. If you’d like to install the latest Red Hat Ceph Storage ensure that the node being provisioned is correctly entitled and that the redhat_storage option is set to True in the json body you send to the install endpoint."
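The flag guidance in this comment can be summarised as a small lookup from the intended package source to the two install flags. This is a hedged sketch: the `upstream` and `custom_repo` rows follow this comment and the quoted docs, while the `rh_cdn` row (both flags true) is an assumption inferred from the flag names. The source labels and helper name are hypothetical.

```python
# Hypothetical mapping from intended package source to install-endpoint flags.
FLAGS_BY_SOURCE = {
    # Docs: the default (redhat_storage false) installs upstream stable ceph.
    "upstream": {"redhat_storage": False},
    # Suggested in this comment for a custom repository such as a puddle.
    "custom_repo": {"redhat_storage": True, "redhat_use_cdn": False},
    # Assumption: an entitled node pulling Red Hat Ceph Storage from the CDN.
    "rh_cdn": {"redhat_storage": True, "redhat_use_cdn": True},
}


def install_flags(source):
    """Return a copy of the flags for `source`; raise ValueError if unknown."""
    try:
        return dict(FLAGS_BY_SOURCE[source])
    except KeyError:
        raise ValueError("unknown package source: %r" % (source,)) from None


print(install_flags("custom_repo"))
```

The returned dict can be merged into the JSON body (alongside `hosts` and `calamari`) sent to the install endpoint.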
We register the puddle repos using yum-config-manager --add as below:

yum-config-manager --add http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-29.2/RHSCON-2.repo
yum-config-manager --add http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-28.1/CEPH-2.repo

With these registrations, the older builds

http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-25.1/
http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-22.2/

work perfectly fine with redhat_storage=false and redhat_use_cdn=true. The same thing is broken with the new puddles now.

My point is that if we set redhat_storage=true, it tries to install rpms from rhel_ceph_osd_rpms and rhel_ceph_mon_rpms, which finds ceph 0.94.x. As we have registered the puddles using yum-config-manager, we should pass redhat_storage=false as we were doing earlier.
(In reply to Shubhendu Tripathi from comment #16)
> We register the puddle repos using yum-config-manager --add as below
>
> yum-config-manager --add
> http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-29.2/RHSCON-2.repo
> yum-config-manager --add
> http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-28.1/CEPH-2.repo
>
> with these registrations older builds
>
> http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-25.1/
> http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-22.2/
>
> work perfectly fine with redhat_storage=false and redhat_use_cdn: true
>
> The same thing is broken with new puddles now.
>
> My point is if we say redhat_storage=true, it tries to install rpms from
> rhel_ceph_osd_rpms and rhel_ceph_mon_rpms which finds ceph 0.94.x version.

To verify, can you try with:

"redhat_storage": true,
"redhat_use_cdn": false,

and see how that works?

>
> As we have registered puddles using yum-config-manager, we should pass
> redhat_storage=false as we were doing earlier.

That will install the upstream packages, which you don't want here. I'm not sure how that used to be different. Please try with the flags I suggested and see how that works.
Shubhendu, it seems you were able to install the mon correctly. Can you update this BZ to reflect that?
Ack. It works properly with this configuration.
I also tested with a fresh set of nodes, and it works fine.