Description of problem:
Currently, we do not export /var/log/ceph to the containers, and we remove the containers once they are stopped, so we have no way of getting the logs after, e.g., an OSD daemon crashes. The logs are simply lost, and we have no idea why the daemon in the container stopped.

Version-Release number of selected component (if applicable):
ceph-ansible-2.2.6-1.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Look for the logs in /var/log/ceph on the host

Actual results:
No logs.

Expected results:
The logs are available on the host.
This makes sense and would require changes in both ceph-ansible and ceph-docker, because even if we bind-mount /var/log/ceph, the daemons won't log anything there: they are configured to log to stderr.
I think this could be seen to block GA of containers on the basis that "Basic functionality of a new or legacy feature not working." Logs are an important element in supporting any component in production. If we can document an acceptable workaround, I would be happy. I'm worried that the best we can do without a change is:

1. Set 'DEBUG=stayalive'.
2. Re-invoke docker run, hoping that the condition repeats.
3. Collect the logs: open a shell with "sudo docker exec -i -t $HOSTNAME /bin/bash" and run journalctl.

This seems like it will add to the burden of supporting the product. What do you think?
I'm tempted to say yes, and I generally agree that users should keep a consistent experience. However, if we really do this, we can hardly implement any log rotation, which at some point will cause issues. That's why I think we should rely on Docker's logging capabilities instead. In Docker, the logging driver is journald, so journald is responsible for collecting and storing the logs. So, e.g., for a monitor, just run "journalctl -u ceph-mon" to get the full history of all the logs. In the end, I think this is the best approach, although it will require some documentation on how to access the log history of a particular container. What do you think?
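To make the journald approach concrete, here is a minimal sketch of a wrapper around the "journalctl -u ceph-mon" call mentioned above. It assumes the containerized daemon runs under a systemd unit named ceph-<daemon>, as in that example. The function name ceph_history and the JOURNALCTL override variable are hypothetical and exist only for illustration; they are not part of ceph-ansible or ceph-docker.

```shell
# Sketch only. JOURNALCTL is a hypothetical override hook (handy for dry
# runs); by default the real journalctl binary is used.
JOURNALCTL="${JOURNALCTL:-journalctl}"

# ceph_history DAEMON [SINCE]
# Print the journal of the ceph-DAEMON unit. If SINCE is given, only show
# entries newer than it (any value "journalctl --since" accepts, e.g. "today").
ceph_history() {
    unit="ceph-$1"
    if [ -n "${2:-}" ]; then
        "$JOURNALCTL" --no-pager -u "$unit" --since "$2"
    else
        "$JOURNALCTL" --no-pager -u "$unit"
    fi
}
```

For example, "ceph_history mon" would print everything journald holds for the monitor unit, and "ceph_history mon today" only today's entries.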
Discussed at the program meeting. We need doc approval from Gregory, which we should have by the end of today; that will resolve this bug.
Ok
Just added a couple of comments.
The only difference between the logs from "journalctl" and the "daemon logs inside the container" is that journalctl shows the output of the container entrypoint AND the daemon logs. Normally you should see the same log once the daemon starts (inside the container and from journalctl). I tend to disagree with the statement "we lose the logs from journalctl if the container dies/restarts"; this is not what I observed. Even if a new container is created after each restart, the unit file remains the same, so the log entries in journalctl remain. Running "journalctl -u ceph-osd" gives you ALL the logs of every container process that ever existed under that unit. If you look at the dates, you are comparing the creation of the first container (the start of the journalctl output), May 19 16:58:49, with the last logs of your last container, 2017-06-07. That comparison is irrelevant. Run "journalctl -fu ceph-osd" (the unit name must directly follow -u) and you will see the latest logs.
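Since the confusion above came from reading the oldest entries instead of the newest ones, a small sketch of how to jump straight to the end of the journal may help. It assumes the unit is named ceph-<daemon> as in this comment; ceph_recent, ceph_follow and the JOURNALCTL override are hypothetical names used only for this illustration.

```shell
# Sketch only. JOURNALCTL is a hypothetical override hook; by default the
# real journalctl binary is used.
JOURNALCTL="${JOURNALCTL:-journalctl}"

# ceph_recent DAEMON [COUNT]
# Show only the newest COUNT journal entries (default 20) of ceph-DAEMON,
# so timestamps from the first container do not get in the way.
ceph_recent() {
    "$JOURNALCTL" --no-pager -u "ceph-$1" -n "${2:-20}"
}

# ceph_follow DAEMON
# Keep printing new entries of ceph-DAEMON as they arrive (stop with Ctrl-C).
ceph_follow() {
    "$JOURNALCTL" -u "ceph-$1" -f
}
```

"ceph_recent osd 50" would show the 50 most recent entries across all containers that ran under the unit, and "ceph_follow osd" would stay attached to the live output.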
Based on Comment #21, I verified the doc text. Looks good to me. One suggestion: for all commands in the section 'Viewing Log Files of Containerized Ceph Daemons', we can drop the 'service' written at the end, e.g. "journalctl -u ceph-[daemon]@[ID].service" can be "journalctl -u ceph-[daemon]@[ID]". Both commands are valid, so if you don't want to change it and prefer to move this to verified as is, no issues.
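As a small illustration of the two equivalent spellings, the sketch below builds the short unit name from a daemon type and ID and strips a trailing ".service" when one is present. The helper names ceph_unit_name and short_unit_name are hypothetical, and the ID 3 used in the examples is just a placeholder value.

```shell
# Sketch only: pure string helpers, nothing here talks to systemd.

# ceph_unit_name DAEMON ID
# Build the short unit name ceph-DAEMON@ID (no ".service" suffix).
ceph_unit_name() {
    printf 'ceph-%s@%s\n' "$1" "$2"
}

# short_unit_name UNIT
# Drop a trailing ".service" if present; other names pass through unchanged.
short_unit_name() {
    printf '%s\n' "${1%.service}"
}
```

With these, "journalctl -u $(ceph_unit_name osd 3)" and "journalctl -u $(short_unit_name ceph-osd@3.service)" would both query ceph-osd@3.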
lgtm