Write code for people, not for computers: April 2017

Thursday, April 27, 2017

What is better for AWS entities management: Ansible or Terraform?

Ansible is a configuration management tool that can provision any AWS entity and then run any deployment or configuration action on it.
To manage AWS entities, Ansible uses boto. To configure instances, Ansible uses SSH and various official modules written in Python (Ansible copies a module to the host via SSH and then executes it there via SSH).

Example Ansible modules are:
1. https://docs.ansible.com/ansible/docker_container_module.html - module for spinning up Docker containers.
With it, it takes 1-2 days to build a flexible Ansible-based tool that acts like docker-compose but targets many remote hosts (rather than the single local host docker-compose works with).
2. http://docs.ansible.com/ansible/ec2_module.html - module for spinning up EC2 instances.
With it, it takes 1-2 days to build a flexible automation script able to spin up any number of hosts in an existing VPC with related volumes, security groups, Elastic IPs, and so on.
3. http://docs.ansible.com/ansible/ec2_vpc_module.html, https://docs.ansible.com/ansible/ec2_vpc_peer_module.html - modules for spinning up Amazon VPCs and peering them for cross-VPC connectivity.

All these Ansible modules are included out of the box.

Most Ansible modules allow declarative setup and support the "infrastructure as code" pattern, but it is more accurate to call Ansible a procedural tool rather than a declarative one.
Each Ansible task is a procedure that is aware of the existing host state (via Ansible "fact gathering"). It is then the programmer's responsibility to check this state in the code and perform only those high-level steps that are needed.

Example: if you need an Ansible script that adds swap to hosts, you write several declarative steps, namely: 'ensure swapfile', 'ensure swapfile is formatted', and 'ensure swap file is added to fstab'.
Then you add a check in the code that reads the 'ansible_swap_total' fact (which Ansible gathers from the host) and decides whether these declarative steps need to be (re)applied.
If the host already has swap configured, all steps are skipped. During our call, I am going to show this example in detail via screenshare.
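The fact-gated flow described above can be modelled in a few lines of Java. This is only a language-neutral sketch of the decision logic, not Ansible syntax: the class name, the `plan` method and the idea of passing facts as a map are assumptions made for illustration, and the fact name is copied from the text as-is.

```java
import java.util.List;
import java.util.Map;

public class SwapSetupSketch {

    // Step names taken from the example in the text.
    static final List<String> STEPS = List.of(
            "ensure swapfile",
            "ensure swapfile is formatted",
            "ensure swap file is added to fstab");

    /**
     * Returns the steps that would run on a host with the given facts.
     * The facts map is a hypothetical stand-in for what Ansible gathers from the host.
     */
    static List<String> plan(Map<String, Long> facts) {
        long swapTotal = facts.getOrDefault("ansible_swap_total", 0L);
        if (swapTotal > 0) {
            // Swap is already configured: skip every step.
            return List.of();
        }
        return STEPS;
    }

    public static void main(String[] args) {
        System.out.println("fresh host:      " + plan(Map.of("ansible_swap_total", 0L)));
        System.out.println("configured host: " + plan(Map.of("ansible_swap_total", 2048L)));
    }
}
```

The point of the sketch is that the check lives in the programmer's code: the individual steps stay declarative, while the decision to run them is procedural.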

Using Ansible, it is possible to automate any action on the host: configuration management, complicated deployments, spinning up and linking Docker containers.
All Ansible modules are consistent and reliable, but for better reliability with Ansible it is better to:
- Dockerize Ansible, so that the same version of Ansible and its deps is used across runs in different environments
- Avoid any manual actions on hosts, thus reducing the "configuration drift" effects
- In case of complicated changes, always prefer re-creating instances over re-configuring them

Terraform positions itself as an orchestration tool, but this name can be misleading. Terraform is actually a tool for spinning up resources (instances, networks, load balancers, autoscaling groups, etc.) that supports various cloud providers, including AWS.
Terraform itself does nothing to provision the inside of instances after they are created; it can only call third-party 'provisioners' (shell, Ansible, Chef, Puppet, etc.) as a post-instance-creation hook.
Terraform is not as flexible as Ansible in resource creation. With Terraform, you usually can't write code in between resource creation steps, e.g. Terraform does not even support if statements.
Killer features of Terraform are:
- it is written to be fully declarative
- it is better aware of the system state. If you run 2 subnets and provision 20 instances in each, and then want to scale up to 30 instances per subnet, you just tell Terraform the new count and it adds the missing 10 instances where needed
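The scale-up example can be illustrated with a small Java sketch of desired-state reconciliation. It is not how Terraform is implemented and uses no Terraform API: the class name, the `delta` method and the subnet names are hypothetical, and the sketch only shows the idea of comparing recorded state with a desired count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DesiredCountSketch {

    /**
     * Compares the recorded state (instances per subnet) with the desired count
     * and returns how many instances must be added (positive) or removed (negative).
     */
    static Map<String, Integer> delta(Map<String, Integer> currentPerSubnet, int desiredPerSubnet) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : currentPerSubnet.entrySet()) {
            result.put(entry.getKey(), desiredPerSubnet - entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical recorded state: 2 subnets with 20 instances each.
        Map<String, Integer> state = new LinkedHashMap<>();
        state.put("subnet-a", 20);
        state.put("subnet-b", 20);

        // The user only declares the new count; the tool derives the actions.
        System.out.println(delta(state, 30)); // prints {subnet-a=10, subnet-b=10}
    }
}
```

The user declares only the target number; the difference is computed from the known state rather than scripted step by step.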

Wednesday, April 19, 2017

Why is it OK to catch AND rethrow java.lang.Throwable at the top level code (entrypoint) of your Java application?

There is only one place where it is completely OK to catch the java.lang.Throwable in Java.

It is the 'top-level piece of code', in the app or in the app module. Namely:

- main method or another kind of app entrypoint

- top-level exception handler in the code (e.g., for Spring MVC app it may be a 'globalExceptionHandler' function)
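A minimal sketch of such a top-level barrier for a plain `main` entrypoint might look like this. The class name, `runApp` and `runGuarded` are hypothetical, and `java.util.logging` is used only to keep the example free of dependencies; a real app would use its own logging framework.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class TopLevelBarrierSketch {
    private static final Logger LOG = Logger.getLogger(TopLevelBarrierSketch.class.getName());

    /** Hypothetical stand-in for the real application logic. */
    static void runApp(String[] args) {
        System.out.println("app started with " + args.length + " argument(s)");
    }

    /**
     * Runs the app behind a catch-all barrier: logs whatever escapes,
     * then rethrows it so the failure is not silently swallowed.
     */
    static void runGuarded(Runnable app) {
        try {
            app.run();
        } catch (Throwable t) {
            LOG.log(Level.SEVERE, "Unhandled error reached the top level", t);
            // Precise rethrow: Runnable.run() declares no checked exceptions,
            // so no 'throws Throwable' clause is needed here.
            throw t;
        }
    }

    public static void main(String[] args) {
        runGuarded(() -> runApp(args));
    }
}
```

Rethrowing after logging keeps the original failure visible to whatever runs the app (for example, a non-zero exit from `main`), while the log entry gets the timestamp and format of the app's own logs.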

This statement is especially true for critical production-grade apps, because they have stricter requirements for things like predictable app behaviour, clean and easy-to-debug app logs, and log-based alerting integrity.

Here are a couple of reasons for catching java.lang.Throwable in 'top-level pieces of code':

0). You can never trust all the 3rd party code (libraries, frameworks and their transitive deps) your app uses! Any piece of 3rd party code may throw ANY error, e.g. a RuntimeException (the most common example here is NPE), or even a java.lang.Error (examples are AssertionError, OutOfMemoryError, ExceptionInInitializerError). So unless you write all your code yourself and do not use 3rd party libs at all, you need to catch java.lang.Throwable at the top level of your app, so that no 3rd party code can make your app behave unexpectedly (logging only to stderr and/or crashing).

1). If the app doesn't catch java.lang.Throwable, the error is reported only by the JVM's default uncaught-exception handler, which prints it to stderr and bypasses the app's logging. Good examples are NPE, OOME and StackOverflowError. This makes investigating such errors much harder, because the resulting logs and log-based alerts contain no timestamp and no error context, and the stack trace (line of code, class name) exists only in stderr, where it is easily lost.

2). Without catching java.lang.Throwable, there is no way for the app to perform any 'graceful shutdown' actions at the top level. Namely:

- clear resources (e.g. close connections with proper response)

- log the ERROR with additional context (add the current class, line number, and values for some variables useful for investigation)

- finish current data processing. On OOME there is still a chance that the app will be able to finish processing, so client data will not be lost.

3). Somebody may throw java.lang.Error in the code by mistake, due to lack of programming experience. Examples:

a). A developer may accidentally use asserts in code, which throw AssertionError when assertions are enabled.

This can easily happen through copy-paste from unit tests.

b). A newbie developer may decide to throw a java.lang.Error subclass like ExceptionInInitializerError just because 'its classname seemed to be fine for my purpose'.

4). Catching java.lang.Throwable in *top-level app code* is a 5-minute fix which is guaranteed not to introduce any problems, but gives the app a last chance to log the error properly, release used resources and perform a graceful shutdown.
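The graceful-shutdown actions listed under reason 2 (log with context, release resources) can be sketched as follows. Everything here is illustrative: `Connection`, `process` and `runBatch` are hypothetical names, and the failing item is simulated with an AssertionError to show that a java.lang.Error is handled the same way as an exception.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GracefulShutdownSketch {
    private static final Logger LOG = Logger.getLogger(GracefulShutdownSketch.class.getName());

    /** Hypothetical resource that records whether it was closed. */
    static class Connection implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical unit of work that fails on one specific input. */
    static void process(String item) {
        if (item.equals("bad")) {
            throw new AssertionError("unexpected state for item " + item);
        }
    }

    /** Top-level loop: logs the failing item, always closes the connection, then rethrows. */
    static void runBatch(Connection conn, List<String> items) {
        String current = null;
        try {
            for (String item : items) {
                current = item;
                process(item);
            }
        } catch (Throwable t) {
            // Add context that the bare stack trace does not carry.
            LOG.log(Level.SEVERE, "Batch failed while processing item '" + current + "'", t);
            throw t;
        } finally {
            conn.close();
        }
    }

    public static void main(String[] args) {
        Connection conn = new Connection();
        runBatch(conn, List.of("a", "b", "c"));
        System.out.println("connection closed: " + conn.closed);
    }
}
```

Whether it is safe to keep processing after an Error depends on the error; the sketch only logs, cleans up and rethrows.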

Links:

1. From http://stackoverflow.com/questions/2679330/catching-java-lang-outofmemoryerror :

There is only one good reason to catch an OutOfMemoryError and that is to close down gracefully, cleanly releasing resources and logging the reason for the failure best you can (if it is still possible to do so).

2. From http://stackoverflow.com/questions/14376924/should-i-catch-outofmemoryerror

There is perfect sense in having an exception barrier at the very top of your call stack, catching and logging all Throwables. In server-side code this is in fact the norm

Here is also an interesting link to the blog of Roman Ivanov (Chief Java developer with many years of experience, developer and owner of the Checkstyle library), which contains more examples and explanations on the topic: http://roman-ivanov.blogspot.ru/2015/03/catch-exception-or-catch-throwable-when.html

Another good argument is that most API frameworks (e.g. Jersey) give you a catch-all hookpoint for Throwable, because regardless of whether an error is recoverable or unrecoverable, it is about to flow out to your client, and you want to at least log it before it does. Apache Spark also catches Throwables at the top level of the Executor code to log the error and pass it to the driver, if possible.
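The JDK itself offers a comparable catch-all hook for threads: `Thread.setDefaultUncaughtExceptionHandler`. The article does not mention it, so treat the following as a complementary sketch rather than the author's approach; the class name and `loggingHandler` are hypothetical, and `java.util.logging` stands in for the app's real logging framework.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class UncaughtHandlerSketch {

    /** Builds a handler that sends uncaught errors to the given logger instead of bare stderr. */
    static Thread.UncaughtExceptionHandler loggingHandler(Logger log) {
        return (thread, error) ->
                log.log(Level.SEVERE, "Uncaught error in thread '" + thread.getName() + "'", error);
    }

    public static void main(String[] args) throws InterruptedException {
        Logger log = Logger.getLogger(UncaughtHandlerSketch.class.getName());

        // Applies to every thread that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(loggingHandler(log));

        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom in worker");
        }, "worker-1");
        worker.start();
        worker.join();
    }
}
```

This covers threads that a try/catch in `main` cannot reach, since an error thrown in a worker thread never passes through `main`.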

P.S. This article only talks about catching Throwables at a very specific place: the top-level 'edge' between the app and the outside system. Do not take it as an 'always catch Throwable' rule; that was not intended at all! Never catch java.lang.Throwable (or java.lang.Error) anywhere below the top-level (entrypoint) code of your app or app module. In particular, do not catch Throwables in the middle of app business logic; inside business logic always catch specific Exception classes instead. Also make sure InterruptedException is handled properly (but that is a topic for another article).
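For contrast, here is a small sketch of what 'catch specific Exception classes' can look like inside business logic. The class, the method `parsePort` and the fallback value are hypothetical and serve only to illustrate the rule.

```java
public class SpecificCatchSketch {

    /**
     * Parses a port number, falling back to a default on malformed input.
     * Only the expected failure (NumberFormatException) is handled here;
     * anything else, such as a NullPointerException for a null argument,
     * propagates up to the top-level barrier.
     */
    static int parsePort(String raw, int fallback) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("8080", 80)); // prints 8080
        System.out.println(parsePort("oops", 80)); // prints 80
    }
}
```

A `catch (Throwable t)` in this method would also hide the null-argument bug, which is exactly the kind of problem that should surface at the top level instead.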