AKA: How to know when stuff’s fracked and how to do something about it.
In my previous post, Provisioning Consul with Puppet, I covered a first step towards ephemeral nirvana for your stack. In this post I’ll talk a little about integrating Spring Boot to further this goal.
Specifically, I’ll cover:
You can read my justification for writing these posts in part 1; here I’m going to jump right into the example.
I’m using the same Vagrant stack and demo app hosted on GitHub, so go get your copy now.
The Stack consists of:
git clone https://github.com/benschw/consul-cluster-puppet.git
cd consul-cluster-puppet
./deps.sh # clone some puppet modules we'll be using
./build.sh # build the Spring Boot demo application we'll need
vagrant up
Same steps as before (same example as before…) but I’ll point out a couple things this time.
demo.jar is used for this example, but it exposes two endpoints: /demo and /foo. (It was just easier to set things up this way, but you can pretend it is two different jars built from two source trees, each with one endpoint.)
build.sh just runs “./gradlew” on this source root to build our app into a jar.
The IPs are specified in the Vagrantfile, so these links will take you to your stack:
Feeling déjà vu? Good, you read the last post. But I promise that’s all over; now I’ll talk you through:
I’ve included a sample “LoadBalancer” implementation in the demo to abstract out working with SRV records so that you can get the address of a service as follows:
LoadBalancingStrategy strategy = new RoundRobinLoadBalancingStrategy();

DnsSrvResolver resolver = DnsSrvResolvers.newBuilder()
    .cachingLookups(true)
    .retainingDataOnFailures(true)
    .metered(new CodahaleSpringBootReporter(metricsRegistry))
    .dnsLookupTimeoutMillis(1000)
    .build();

loadBalancer = new LoadBalancer(strategy, resolver);

HostAndPort node = loadBalancer.getAddress("foo");

// http://foo1.node.dc1.consul:8080
String address = LoadBalancer.AddressString("http", node) + "/foo";

RestTemplate restTemplate = new RestTemplate();
Foo foo = restTemplate.getForObject(address, Foo.class);
(See the demo app using it to look up foo here.)
The output of http://172.20.20.20:8080/demo:
{
    fooResponse: {
        message: "Hello from foo1"
    },
    selectedAddress: {
        port: 8080,
        hostText: "foo1.node.dc1.consul."
    }
}
fooResponse shows the output from the foo service, which reports back its hostname.
selectedAddress shows what the Consul LoadBalancer gave us back when we asked for an address in demo.
Our demo app uses the LoadBalancer library to look up the SRV addresses from the Consul DNS server (supplied by the local Consul agent). It then selects one based on our strategy (round robin by default). Finally, it forms the address (A record plus port) to complete its request (this address is also resolved through the Consul DNS server).
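To make the select-then-format step concrete, here is a minimal, dependency-free sketch of round-robin selection over a list of SRV answers. SrvTarget, RoundRobinPicker, and addressString are hypothetical names made up for this illustration; they are not the classes in the demo's LoadBalancer library, and stripping the trailing dot from the host name is an assumption based on the sample output above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one SRV answer: a target host name plus a port. */
final class SrvTarget {
    final String host;
    final int port;

    SrvTarget(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

/** Minimal round-robin picker; an illustration, not the demo's strategy class. */
final class RoundRobinPicker {
    private final AtomicInteger next = new AtomicInteger();

    /** Returns the next target in rotation, or null when no service is available. */
    SrvTarget pick(List<SrvTarget> targets) {
        if (targets == null || targets.isEmpty()) {
            return null;
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), targets.size());
        return targets.get(index);
    }

    /** Builds "scheme://host:port", dropping the trailing dot of a fully qualified name. */
    static String addressString(String scheme, SrvTarget target) {
        String host = target.host.endsWith(".")
                ? target.host.substring(0, target.host.length() - 1)
                : target.host;
        return scheme + "://" + host + ":" + target.port;
    }
}
```

Calling pick repeatedly with the same two-element list alternates between the two targets, and a null result is what a health check can key off when no foo is registered.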
Spring Boot exposes a “Health” endpoint that allows us to query our service to see if it is running and healthy. By default this runs alongside our app on the same port, but for our demo we’ve routed it to port 8081 (to keep the admin functions separate and to support keeping these endpoints private in the future if we wish).
One last piece: consul checks health by executing a script and looking at the return value, so I’ve included health.py to parse our health endpoint JSON into a return code.
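For script checks, Consul treats exit code 0 as passing, 1 as warning, and anything else as critical. The demo does this translation in health.py; the sketch below only illustrates the same idea in Java to match the rest of the examples. HealthExitCode is a hypothetical helper, and the exact status-to-code mapping (UP to 0, WARN to 1, everything else to 2) is an assumption rather than a copy of the script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the demo does this in health.py, this is only an illustration. */
final class HealthExitCode {
    // Matches the first "status" value in a body such as {"status":"UP"}.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Za-z_]+)\"");

    static final int PASSING = 0;   // Consul: check is passing
    static final int WARNING = 1;   // Consul: check is warning
    static final int CRITICAL = 2;  // Consul: any other code is critical

    /** Maps a health endpoint response body to a Consul script-check exit code. */
    static int fromHealthJson(String body) {
        if (body == null) {
            return CRITICAL; // no response at all, e.g. the app is down
        }
        Matcher m = STATUS.matcher(body);
        if (!m.find()) {
            return CRITICAL; // unparseable response is treated as a failure
        }
        String status = m.group(1);
        if ("UP".equals(status)) {
            return PASSING;
        }
        if ("WARN".equals(status)) {
            return WARNING;
        }
        return CRITICAL; // DOWN, OUT_OF_SERVICE, or anything unexpected
    }
}
```

Treating a missing or unparseable response as critical keeps a crashed app from looking healthy.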
Typically, everything just works. So if we hit the health endpoint (http://172.20.20.20:8081/health), all we see is “UP”:
{
    status: "UP"
}
If the app goes down, this page obviously won’t be here any more, but additionally we can register specific checks that can be tripped even if the app doesn’t crash (e.g. the node ran out of disk space or we can’t connect to the database.)
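import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.AbstractHealthIndicator;
import org.springframework.boot.actuate.health.Health;
import org.springframework.stereotype.Component;

// LoadBalancer and HostAndPort are the same types used in the lookup example above.
@Component
public class FooServiceHealthyIndicator extends AbstractHealthIndicator {

    @Autowired
    private LoadBalancer loadBalancer;

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        // Ask Consul (via the load balancer) for any available foo instance.
        HostAndPort node = loadBalancer.getAddress("foo");

        if (node != null) {
            builder.up();
        } else {
            // No foo instances are registered and healthy.
            builder.down();
        }
    }
}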
Sometimes we want to know about a problem but don’t necessarily want to take the application out of load balance. For example, if demo needs foo in order to function fully, but it can still run in a degraded state without it, we wouldn’t want all instances of demo to go away just because Consul is reporting that no foo instances are available. In this case we could instead “warn” that something isn’t right, but leave the demo service available for discovery.
To modify the above class, just replace builder.down() with builder.status("WARN").
This is the way we have it implemented in the demo we’re running, so if we shut down both foo services (vagrant halt foo0 foo1) we’ll see the following from our demo health endpoint:
{
    status: "WARN"
}
Notice that our copy of demo is only in a warn state (not critical), so if something tries to resolve it through consul, it will still be available.
Actuator comes stock with some metrics being collected, but for additional insight like timers, we need to integrate something more. Like Codahale Metrics.
Wiring Codahale Metrics into Spring Boot isn’t a big deal with the help of ryantenney/metrics-spring. My implementation for this example is nestled in its own package and wired up in our application config.
This allows for the use of the @Timed annotation on resource methods to time all endpoints by name.
It also allows us to explicitly tap into the MetricRegistry and collect additional metrics:
lookups = metrics.timer(MetricRegistry.name(CodahaleSpringBootReporter.class, "srvlookup"));

Timer.Context context = lookups.time();
try {
    // do stuff
} finally {
    context.stop(); // record the elapsed time even if the work throws
}
{
    demo.meter.mean.DemoController.demo: 0.004174263352818219,
    demo.meter.one-minute.DemoController.demo: 0.029097939451186338,
    demo.meter.five-minute.DemoController.demo: 0.009890715011966878,
    demo.meter.fifteen-minute.DemoController.demo: 0.0051311229462762085,
    demo.timer.min.DemoController.demo: 2930355,
    demo.timer.max.DemoController.demo: 214477268,
    demo.timer.median.DemoController.demo: 9271362,
    demo.timer.mean.DemoController.demo: 39808608.875,
    demo.timer.standard-deviation.DemoController.demo: 72246286.59474245,
    demo.meter.mean.CodahaleSpringBootReporter.srvlookup: 0.2003580661363458,
    demo.meter.one-minute.CodahaleSpringBootReporter.srvlookup: 0.23859104066267098,
    demo.meter.five-minute.CodahaleSpringBootReporter.srvlookup: 0.2090802942145022,
    demo.meter.fifteen-minute.CodahaleSpringBootReporter.srvlookup: 0.17895849086506815,
    demo.timer.min.CodahaleSpringBootReporter.srvlookup: 1604057,
    demo.timer.max.CodahaleSpringBootReporter.srvlookup: 125254192,
    demo.timer.median.CodahaleSpringBootReporter.srvlookup: 2939579,
    demo.timer.mean.CodahaleSpringBootReporter.srvlookup: 3894239.9505208335,
    demo.timer.standard-deviation.CodahaleSpringBootReporter.srvlookup: 6640297.013684536,
    demo.gauge.gauge.response.**.favicon.ico: 9,
    demo.gauge.gauge.response.demo: 33,
    demo.counter.CodahaleSpringBootReporter.srvlookupempty: 320,
    demo.counter.CodahaleSpringBootReporter.srvlookupfailures: 2,
    demo.counter.counter.status.200.**.favicon.ico: 1,
    demo.counter.counter.status.200.demo: 8,
    demo.counter.counter.status.304.**.favicon.ico: 2
}
Oh Yeah! I forgot to mention that we are timing our SRV address lookups. There are also counters for lookup failures and occurrences of empty result sets (successful query, but no available services to connect to.)
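If you want a feel for what that instrumentation amounts to, here is a rough, dependency-free sketch that wraps a lookup with a timer and the two counters. LookupStats is a hypothetical class written for this post's explanation; the demo itself uses Codahale timers and counters, and returning an empty list when the lookup throws is an assumption, not a description of the demo's behavior.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, dependency-free stand-in for the timer and counters described above. */
final class LookupStats {
    final AtomicLong lookups = new AtomicLong();    // lookup attempts, including failed ones
    final AtomicLong totalNanos = new AtomicLong(); // summed duration in nanoseconds
    final AtomicLong empty = new AtomicLong();      // successful query, no services returned
    final AtomicLong failures = new AtomicLong();   // query threw an exception

    /** Runs the lookup, recording duration and outcome; a failure yields an empty list. */
    <T> List<T> timed(Callable<List<T>> lookup) {
        long start = System.nanoTime();
        try {
            List<T> result = lookup.call();
            if (result == null || result.isEmpty()) {
                empty.incrementAndGet();
                return Collections.emptyList();
            }
            return result;
        } catch (Exception e) {
            failures.incrementAndGet();
            return Collections.emptyList();
        } finally {
            // Runs on every path, so failed lookups are timed as well.
            lookups.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }
}
```

Keeping "empty" separate from "failures" matters: the first means Consul answered but nothing healthy is registered, the second means the DNS query itself went wrong.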
This stuff is crazy useful and gives you data you can aggregate without babysitting your nodes.
So you’ve got Consul running and now you can make use of it (also see Service Discovery for Golang with DNS for my writeup on using Consul in Go).
A glaring omission from this post is how to actually do anything with your health and metrics data. Consul will operate on your health checks, but how can you incorporate this feedback with other things (like low memory and disk space, nodes reaching CPU capacity, or whatever else has traditionally been a pain in your ass)? Additionally, we haven’t really talked about how to capture or aggregate all those metrics we’re providing now.
We’ve run out of time… but here are some ideas…
Pull my leg and I might show you how to build that out too.