Description of problem: After setup an app to threaddump mode, It will prompt our the wrong log file. Version-Release number of selected component (if applicable): Puddle: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-13.1/ How reproducible: alaws Steps to Reproduce: 1.Given a ruby-1.9 app created # rhc app create r19 ruby-1.9 -predhat 2.Threaddump the app # rhc threaddump r19 -predhat Application event 'thread-dump' successful Success The thread dump file will be available via: rhc tail r19 -f ruby-1.9/logs/error_log-20130314-000000-EDT -o '-n 250' 3. Saw thread dump file # rhc tail r19 -f ruby-1.9/logs/error_log-20130314-000000-EDT -o '-n 250' -predhat Actual Result: Could not find any files matching glob Expect Result: successed Additional info: I found there is "error_log-20130314-000000-EST" file exist when I login to app. and run command blow successed: # rhc tail r19 -f ruby-1.9/logs/error_log-20130314-000000-EST -o '-n 250' -predhat But the timezone of node's system is EDT. [root@node2 openshift]# date Thu Mar 14 07:42:42 EDT 2013
I know we saw a bug like this before. I think it was fixed such that the broker's timezone is used and we assume the node's is the same. Are both broker and node timezone EDT? I wonder if it's just that the app creating the log hasn't updated its idea of what the timezone is since last week. A restart might fix it but that's not a great customer solution.... Perhaps we should recommend using GMT.
Yeah, switching to GMT/UTC would probably be the best thing going forward. Is the timezone in the logfiles coming from the OS?
It is coming from the OS. Specifically this bit in /usr/libexec/openshift/cartridges/ruby-1.9/info/hooks/threaddump: DATE=`date -u '+%Y%m%d'` ZONE=`date +%Z` client_result "Success" client_result "" client_result "The thread dump file will be available via: rhc tail ${application} -f ${cartridge_type}/logs/error_log-$DATE-000000-$ZONE -o '-n 250'" So... very mysterious that this didn't work. It works fine for me. I wonder if there's some way in which date can get confused about the TZ having changed. Perhaps an env var it reads?
Verify it using manually. 1. Inquiry the pid which created the file named error_log-*-000000-EST: [root@node1 openshift]# lsof /var/lib/openshift/8878e0f829d64346b3766cf96cae0f1d//ruby-1.9/logs/error_log-20130315-000000-EST COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME rotatelog 27887 8878e0f829d64346b3766cf96cae0f1d 3w REG 253,0 5231 279483 /var/lib/openshift/8878e0f829d64346b3766cf96cae0f1d//ruby-1.9/logs/error_log-20130315-000000-EST found pid is 27887 2. Inquiry the command execused by the pid 27887 [root@node1 openshift]# ps -ef |grep 27887 1617 27887 27884 0 02:09 ? 00:00:00 /usr/sbin/rotatelogs /var/lib/openshift/8878e0f829d64346b3766cf96cae0f1d/ruby-1.9/logs/error_log-%Y%m%d-%H%M%S-%Z 86400 Found generate "error_log-*-000000-EST" command: /usr/sbin/rotatelogs $file $rotate_rules 3.Exec this command manually: /usr/sbin/rotatelogs /var/log/redhat-%Y%m%d-%H%M%S-%Z 86400 4.check the logfile gereted by last command: [root@node1 openshift]# ls /var/log/redhat-* /var/log/redhat-20130315-000000-EST 5.show local date: [root@node1 openshift]# date Fri Mar 15 06:48:27 EDT 2013 So, I think it caused by httpd log tools rotatelogs.
Looks like a bug in rotatelogs. I filed one: bug 922844 - and BTW it's the same whether using JBoss or RHEL httpd. Seems to me we could work around this a couple ways: 1) Configure the system with UTC or another timezone that doesn't change with DST. 2) Edit the cartridges so that configs don't include the TZ in the logfile name. I don't see that it really helps anything to have it. But the change would be rather tedious and we should also check whether upstream changes make this moot after the rebase.
I'm a bit confused regarding the severity of this bug. When do the logs get rotated? daily? If someone performs a thread dump is the problem only reproduced if the wait until after the logs are rotated?
Agree, lowering severity. The bug doesn't really have anything to do with rotation, it's just that rotatelogs is setting the file name on the log, and it's using the wrong timezone, so the name that the node thinks the log should be according to node TZ (blah-log-EDT) is not the name rotatelogs is actually giving it (blah-log-EST). We wouldn't notice while DST is not in effect.
Interesting, now I understand how rotatelogs is being used. Interestingly enough Online seems to have this problem too. I just logged into my app and 'rhc tail' didn't work because it couldn't find any matching files. However I do see access_log-20130318-000000-EST instead of EDT.
In the past this was actually hardcoded to use EST. Obviously that was wrong and we had an enterprise bug 855264 and fixed it to use the actual timezone, upstream too. Only now it's wrong for 7 months out of the year. Ideally the rotatelogs bug would be fixed but no idea when to expect that.
Actually even with rotatelogs correct, I think we would see a mismatch on the two days a year where DST changes. So I lean toward either moving to UTC or just removing the TZ. I think including UTC in the name is clearer.
Actually now I lean away from trying to fix the name, and toward making the logic for finding the name less fragile. Either have the node logic find the actual file name, or supply it as a glob that won't be broken by timezone.
Problem really only existed in the ruby cartridges, where the thread dump goes to rotated server logs. I decided it was simplest to just supply a glob that should uniquely find the file without being TOO specific.
[jenkins2@qe-auto-slave ~]$ rhc threaddump r19 -predhat Application event 'thread-dump' successful Success The thread dump file will be available via: rhc tail r19 -f ruby-1.9/logs/error_log-20130322-* -o '-n 250' [jenkins2@qe-auto-slave ~]$ rhc tail r19 -f ruby-1.9/logs/error_log-20130322-* -o '-n 250' Password: ****** [Fri Mar 22 02:53:29 2013] [notice] SELinux policy enabled; httpd running as context system_u:system_r:openshift_t:s0:c1,c387 [Fri Mar 22 02:53:29 2013] [notice] mod_bw : Memory Allocated 32 bytes (each conf takes 32 bytes) [Fri Mar 22 02:53:29 2013] [notice] mod_bw : Version 0.8 - Initialized [1 Confs] [Fri Mar 22 02:53:29 2013] [notice] Digest: generating secret for digest authentication ... ... Prompt is ok on puddle 2013-03-21.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0694.html