Wednesday, April 19, 2017

Why is it OK to catch AND rethrow java.lang.Throwable at the top level code (entrypoint) of your Java application?

There is only one place where it is completely OK to catch java.lang.Throwable in Java.

It is the 'top-level piece of code', in the app or in the app module. Namely:
- main method or another kind of app entrypoint
- top-level exception handler in the code (e.g., for Spring MVC app it may be a 'globalExceptionHandler' function)

This is especially true for critical production-grade apps, because they have stricter requirements for predictable app behaviour, clean and easy-to-debug app logs, and reliable log-based alerting.

Here are a couple of reasons for catching java.lang.Throwable in 'top-level pieces of code':

0). You can never fully trust the 3rd party code (libraries, frameworks and their transitive deps) your app uses! Any piece of 3rd party code may throw ANY error, e.g. a RuntimeException (the most common example here is NPE), or even a java.lang.Error (examples are AssertionError, OutOfMemoryError, ExceptionInInitializerError). So unless you write all your code yourself and use no 3rd party libs at all, you need to catch java.lang.Throwable at the top level of your app, so that no 3rd party code can make your app behave unexpectedly (logging only to stderr and/or crashing).
1). If the app doesn't catch java.lang.Throwable, the error is reported only on stderr by the JVM's default handler, outside the app's own logging, and in some cases without a usable stack trace. Good examples are OOME and StackOverflowError. This makes investigating such errors much harder, because the app logs and log-based alerts contain no stack trace and no error context (line of code, class name).
2). Without catching java.lang.Error, the app has no way to perform any 'graceful shutdown' actions. Namely:
- clean up resources (e.g. close connections with a proper response)
- log the ERROR with additional context (add the current class, line number, and values of some variables useful for investigation)
- finish current data processing. On OOME there is still a chance that the app will be able to finish processing, so client data will not be lost.
3). Somebody may throw java.lang.Error in the code by mistake because of a lack of programming experience. Examples:
a). A developer may accidentally use asserts in the code, which throw AssertionError.
This may easily happen by copy-paste from unit tests.
b). A newbie developer may decide to throw a java.lang.Error subclass like ExceptionInInitializerError just because 'its class name seemed to be fine for my purpose'
4). Catching java.lang.Throwable in *top-level app code* is a 5-minute fix which should not introduce any problems, but gives the app a last chance to log the error properly, clean up used resources and perform a graceful shutdown.
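
The reasons above boil down to a small 'catch, log, clean up, rethrow' barrier around the entrypoint. Below is a minimal sketch of that idea, not a definitive implementation. The names TopLevelBarrier, runApp, runWithBarrier and cleanedUp are made up for illustration, and java.util.logging stands in for whatever logging framework the app really uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class TopLevelBarrier {
    private static final Logger LOG = Logger.getLogger(TopLevelBarrier.class.getName());

    // Hypothetical flag, only here so the cleanup step is observable.
    static boolean cleanedUp = false;

    // Hypothetical stand-in for the real app logic and the 3rd party code it calls.
    static void runApp(String[] args) {
        if (args.length > 0 && args[0].equals("fail")) {
            throw new AssertionError("simulated error from 3rd party code");
        }
    }

    // The barrier: log any Throwable with its stack trace, clean up, then rethrow.
    static void runWithBarrier(String[] args) {
        try {
            runApp(args);
        } catch (Throwable t) {
            // Goes through the app logger, so log-based alerting can see it.
            LOG.log(Level.SEVERE, "Unhandled error at top level, shutting down", t);
            // Rethrow so the failure is not swallowed; runApp declares no checked
            // exceptions, so only unchecked exceptions and Errors can arrive here.
            throw t;
        } finally {
            // Place for 'graceful shutdown' actions: close connections, flush buffers.
            cleanedUp = true;
        }
    }

    public static void main(String[] args) {
        runWithBarrier(args);
    }
}
```

Running it with the argument "fail" logs the AssertionError with its stack trace through the logger and then still terminates with that error, which is the 'catch AND rethrow' behaviour from the title.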

Links:

1. From http://stackoverflow.com/questions/2679330/catching-java-lang-outofmemoryerror :
There is only one good reason to catch an OutOfMemoryError and that is to close down gracefully, cleanly releasing resources and logging the reason for the failure best you can (if it is still possible to do so).
2. From http://stackoverflow.com/questions/14376924/should-i-catch-outofmemoryerror
There is perfect sense in having an exception barrier at the very top of your call stack, catching and logging all Throwables. In server-side code this is in fact the norm

Here is also an interesting link to the blog of Roman Ivanov (Chief Java developer with many years of experience, developer and owner of the Checkstyle library), which contains more examples and explanations on the topic: http://roman-ivanov.blogspot.ru/2015/03/catch-exception-or-catch-throwable-when.html

P.S. This article only talks about catching Throwables in one very specific place: the top-level 'edge' between the app and the outside system. Do not take it as an 'always catch Throwable' rule; that was not intended at all! Never catch java.lang.Throwable (or java.lang.Error) anywhere below the top-level (entrypoint) code of your app or app module. Especially avoid catching Throwables in the middle of app business logic; inside business logic always catch specific Exception classes instead. Also handle InterruptedException properly (but that is a topic for another article).

Friday, November 25, 2016

Jenkins: how to build Gerrit changeset without Gerrit plugin


1. Go to the job config
2. Update the Git plugin settings:
  - in Advanced --> Refspec set `refs/changes/*:refs/changes/*`
  - in 'branch' set the Gerrit changeset ref, such as `refs/changes/17/417/2` (you can pick it from 'Download' dropdown menu in Gerrit changeset window)

Tuesday, February 16, 2016

Tuesday, November 24, 2015

Useful Docker oneliners


Display image virtual size in human-readable format:

sudo docker inspect <image> | grep -i VirtualSize | awk '{$2/=1000000; printf "%.2f MB\n",$2}'

Check which command produced a specific Docker image layer:

docker inspect --format '{{ index .ContainerConfig.Cmd 0 }}' <layer_id>

Useful Docker-related links:

https://imagelayers.io/
https://github.com/larsks/dockerize

Monday, September 14, 2015

Show the number of open files per process


This one also displays process name, pid and number of open files:
lsof | awk '{ print $2 " " $1; }' | sort -rn | uniq -c | sort -rn | head -20

Monday, August 10, 2015

Backup Jenkins to S3

This way we back up everything except SCM folders and jars. All Jenkins configs, jobs, logs and builds are preserved; only built artifacts are skipped.

Backup command:
export GZIP=-9 && tar czfh - /var/lib/jenkins/ --exclude-vcs --exclude "archive" --exclude "target" --exclude "/var/lib/jenkins/docker" --exclude "/var/lib/jenkins/.m2/repository" | /usr/local/bin/aws s3 --region us-west-2 cp - s3://<bucket_name>/jenkins-backup.tar

Restore command:
aws s3 cp s3://<bucket_name>/jenkins-backup.tar - | tar -C / -zxf -


Wednesday, June 24, 2015

Example Logstash config to parse Java / Scala multiline logs (e.g. stacktraces) into ES


Java/Scala stack traces span multiple lines. With the filter below, any line that does not start with '[' is joined to the previous line that does start with '['.

E.g. this works with Logstash 1.4.0+:

if [type] == "app_logs" {
    multiline {
      pattern => "^[^\[]"
      what => "previous"
    }
    grok {
      match => { "message" => "\[(?<app_log_timestamp>.+)] \[%{WORD:app_name}\] \[(?<thread_name>.+)\] \[(?<class_name>.+)\] \[(?<marker>[a-zA-Z]*)\] \[(?<transaction_id>.*)\] \[%{WORD:log_level}\]: ?%{GREEDYDATA:msg}" }
    }
    date {
      match => ["app_log_timestamp", "MM/dd HH:mm:ss:SSS", "ISO8601"]
      target => "@timestamp"
      add_tag => [ "timestamp_updated_w_log_value" ]
      remove_field => [ "app_log_timestamp" ]
    }
}

This works for all Java multiline logs; the only rule is that continuation lines of a multiline log record must not start with '['.
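
To make the joining rule concrete, here is a minimal Java sketch that applies the same regex to a list of lines. It only mimics the rule and is not how Logstash implements the filter; the class name MultilineJoin, the method join and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultilineJoin {
    // Same regex as in the 'multiline' filter: a line whose first char is not '['.
    private static final Pattern CONTINUATION = Pattern.compile("^[^\\[]");

    // Appends every continuation line to the event started by the previous '[' line.
    static List<String> join(List<String> lines) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            boolean continuation = CONTINUATION.matcher(line).find();
            if (continuation && !events.isEmpty()) {
                int last = events.size() - 1;
                events.set(last, events.get(last) + "\n" + line);
            } else {
                // Starts with '[' (or nothing to join to yet): begin a new event.
                events.add(line);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the log format used above.
        List<String> lines = Arrays.asList(
            "[06/24 16:43:51:393] [app_name] [pool-99-thread-999] [ClassName] [smth] [bar] [ERROR]: Cannot load XXX",
            "java.lang.RuntimeException: Exception while executing statement",
            "\tat com.example.Dao.load(Dao.java:42)",
            "[06/24 16:43:52:001] [app_name] [main] [ClassName] [smth] [bar] [INFO]: Next record");
        for (String event : join(lines)) {
            System.out.println("EVENT: " + event);
        }
    }
}
```

With these four sample lines the sketch produces two events: the ERROR record with the whole stack trace attached, and the INFO record on its own.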

Useful link: http://logstash.net/docs/1.4.0.rc1/filters/multiline

Also, starting from Logstash 1.2 there is a 'multiline' codec (http://logstash.net/docs/1.2.2/codecs/multiline). But I couldn't get it to work properly with Logstash 1.4. Here is what I tried:

input {
  file {
      codec => multiline {
        pattern => "^\s"
        what => "previous"
      }
    ..... file path and so on
  }
}

The issue I hit with the 'multiline' codec: Java stack traces were parsed without the very first line. E.g. in ES I was getting:

java.lang.RuntimeException: Exception while executing statement : An I/O error occurred while sending to the backend. errorCode: 0, sqlState: 08006 at ... [other stacktrace lines omitted]

Instead of expected:

[06/24 16:43:51:393] [app_name] [pool-99-thread-999] [ClassName] [smth0] [bar] [ERROR]: Cannot load XXX java.lang.RuntimeException: Exception while executing statement : An I/O error occurred while sending to the backend. errorCode: 0, sqlState: 08006 at ... [other stacktrace lines omitted]
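
A likely explanation, assuming the exception class line starts at column 0 as in standard Java stack trace output: the codec pattern "^\s" only treats lines that begin with whitespace as continuations, so the 'java.lang.RuntimeException: ...' line starts a new event and the log line before it is left in a separate one. The small Java sketch below just checks which lines the pattern would treat as continuations; the class name CodecPatternCheck and the sample stack frame are made up for illustration.

```java
import java.util.regex.Pattern;

class CodecPatternCheck {
    // Regex from the 'multiline' codec attempt above: line begins with whitespace.
    private static final Pattern CODEC_PATTERN = Pattern.compile("^\\s");

    // True if the line would be appended to the previous event ("what => previous").
    static boolean isContinuation(String line) {
        return CODEC_PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the stack frame is invented.
        String[] lines = {
            "[06/24 16:43:51:393] [app_name] [pool-99-thread-999] [ClassName] [smth0] [bar] [ERROR]: Cannot load XXX",
            "java.lang.RuntimeException: Exception while executing statement",
            "\tat com.example.Dao.load(Dao.java:42)"
        };
        for (String line : lines) {
            System.out.println((isContinuation(line) ? "continuation: " : "new event:    ") + line);
        }
    }
}
```

Only the indented 'at ...' line is a continuation here, so the exception line and its frames form one event while the '[06/24 ...' line stays in another. The '^[^\[]' pattern used in the filter config avoids this because the exception line does not start with '['.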