Saturday, July 4, 2015

HTTP/2 explained in 5 minutes

After reading about and playing with HTTP/2 for a few days, this is a high-level summary of my understanding.

HTTP/2 is all about reducing the latency of accessing web applications.  It keeps the semantics (GET, POST and the other methods, headers, content) and URL schemes of existing HTTP, and it is based on the improvements proposed by Google as part of its SPDY protocol, which HTTP/2 finally replaces.

The three most important changes introduced in HTTP/2 in my opinion are:
1) Reduced overhead of HTTP headers thanks to binary framing and header compression (HPACK).
2) Ability to use a single TCP connection for multiple HTTP requests without application-level head-of-line blocking (responses can be sent in a different order than the requests).
3) Support for pushing content from the server to the client without a previous request (for example, the server can send the images that the browser will need as soon as it receives the request for the HTML file referencing them).

The most controversial "feature" of HTTP/2 was making TLS mandatory.  In the end the requirement was relaxed, but some browsers (Firefox, for example) plan to require it anyway.

Most of the relevant browsers (at least Chrome, Firefox and some versions of IE) already include support for HTTP/2, as do the most popular open source servers (nginx and Apache).  So you should be able to take advantage of the new version of the protocol right now.

HTTP/2 support is negotiated using the same HTTP Upgrade mechanism used for WebSockets (or, over TLS, via the ALPN extension), and it should be transparent for users and for elements in the middle (proxies).

Application developers should benefit automatically, without any change in their apps, but they can get even more benefits with some extra changes:
* Some tricks in use today, like spriting or inlining resources, are not needed any more, so the build/deploy pipeline can be simplified.
* Server push could be automatic in some cases, but in general it will require developers to declare the resources to be pushed for each request.  This feature requires support in the web framework being used.

I made a hello world test pushing the JavaScript referenced in an HTML page automatically and it improved the latency as expected.  I plan to repeat the test with a real page with tens or hundreds of referenced js/css/img files and publish the results.
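
As an illustration of how push looks in code, this is a minimal sketch using Node's built-in http2 module (added in later Node releases, so not what I used for the test above); the certificate and content file names are just placeholders:

var fs = require('fs');
var http2 = require('http2');

var server = http2.createSecureServer({
  key: fs.readFileSync('localhost-key.pem'),    // placeholder self-signed cert for local testing
  cert: fs.readFileSync('localhost-cert.pem')
});

server.on('stream', function(stream, headers) {
  if (headers[':path'] === '/') {
    // Push app.js before the browser has even parsed the HTML that references it
    stream.pushStream({ ':path': '/app.js' }, function(err, pushStream) {
      if (err) return;
      pushStream.respondWithFile('app.js', { 'content-type': 'application/javascript' });
    });
    stream.respondWithFile('index.html', { 'content-type': 'text/html' });
  }
});

server.listen(8443);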


Monday, December 8, 2014

Static type checking for Javascript (TypeScript vs Flow)

I've never been a big fan of Javascript for large applications (nothing beyond proxies and simple services), and that is partially because in my experience the lack of static typing ends up making it very easy to make mistakes and very difficult to refactor code.

Because of that I was very excited when I discovered TypeScript some months ago (disclaimer: I'm not a JS expert), and I was very curious about the differences between TypeScript and Flow when a colleague pointed me to Flow today.   So I tried to play "find the seven differences", but I'm lazy and I stopped after finding one.

Apart from cosmetic differences and tooling availability, both TypeScript and Flow support type definitions based on annotations, type inference, and classes/modules based on EcmaScript 6 syntax.    The relevant difference I found after reading about and playing with them (for half an hour) is that, because of the way it implements type inference, Flow can track type changes of a variable after its initial declaration, making it more appropriate for legacy Javascript code where adding annotations may not be possible.

This is some code I used to play with them, with some inline comments:

var s = "hello";
s.length  // Both TS and Flow infer that s is a string and check that it has a length property


var s: string = null;
s = "hello";
s.length  // Both TS and Flow know that s is a string and check that it has a length property

var s = null;
s = "hello";

s.length // TS doesn't know that this is a string, but Flow does and checks that it has a length property
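
For completeness, this is what explicit annotations look like; this small made-up example (not part of my original test) is valid in both TypeScript and Flow:

function greet(name: string): string {
  return "hello " + name;
}

greet(42);  // both TS and Flow flag this call because 42 is not a string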

Sunday, October 26, 2014

Service discovery and getting started with etcd

After playing with some Twitter open source components recently (mostly Finagle) I became very interested in the concept of service discovery as a way to implement load balancing and failure recovery in the interconnection between the internal services of your infrastructure.   This is especially critical if you have a microservices architecture.

Basically, the idea behind service discovery solutions is to have a shared repository with an up-to-date list of the existing instances of service A, plus mechanisms to retrieve, update and subscribe to that list, allowing other components to distribute their requests to service A in an automated and reliable way.


The traditional solution is ZooKeeper (based on a Paxos-like consensus protocol, with code open-sourced by Yahoo and maintained as part of the Hadoop project), but other alternatives have appeared and look very promising in the near future.  This post summarizes the available alternatives very well.

One of the most interesting solutions is etcd (simpler than Zookeeper, implemented in Go and supported by the CoreOS project).  In this post I explain how to do some basic testing with it.

etcd is a simple key/value store with support for key expiration and watching keys, which makes it ideal for service discovery.   You can think of it as a Redis server, but distributed (with consistency and partition tolerance) and with a simple HTTP interface supporting GET, SET, DEL and LIST.

Installation

The first step is to install etcd and the command-line tool etcdctl.
You can easily download and install them from here, or if you are using a Mac you can just run "brew install etcd etcdctl".

Registering a service instance

When a new service instance in your infrastructure starts, it should register itself in etcd by sending a SET request with all the information that you want to store for that instance.

In this example we store the hostname and port of the service instance, and we use a key schema like /services/SERVICE/DATACENTER/INSTANCE_ID.   In addition, we set a TTL (60 seconds in the code below, refreshed every 10 seconds) to make sure the information expires if it stops being refreshed because the instance is no longer available.

var path = require('path'),
    uuid = require('node-uuid'),
    Etcd = require('node-etcd');

var etcd = new Etcd();

// Key schema: /services/SERVICE/DATACENTER/INSTANCE_ID
var p = path.join('/', 'services', 'service_a', 'datacenter_x', uuid.v4());

function register() {
  // Store the instance data with a 60-second TTL so the key expires
  // automatically if this instance stops refreshing it.
  etcd.set(p,
    JSON.stringify({
      hostname: '127.0.0.1',
      port: '3000'
    }), {
      ttl: 60
    });

  console.log('Registered with etcd as ' + p);
}

// Refresh the registration every 10 seconds (well below the TTL).
setInterval(register, 10000);
register();

Discovering service instances

When a service in your infrastructure needs to use another service, it sends a GET request to retrieve all the available instances and subscribes (WATCH) to receive notifications when nodes go down or new nodes come up.

var path = require('path'),
    Etcd = require('node-etcd');

var etcd = new Etcd();
var p = path.join('/', 'services', 'service_a', 'datacenter_x');

// Local cache of the known instances of service_a, keyed by their etcd key.
var instances = {};
function processData(data) {
  if (data.action == 'set') {
    // A new instance registered itself (or an existing one refreshed its key).
    instances[data.node.key] = data.node.value;
  } else if (data.action == 'expire') {
    // An instance stopped refreshing its key, so it is considered down.
    delete instances[data.node.key];
  }
  console.log(instances);
}

// Watch the whole directory to get notified of changes in any instance.
var watcher = etcd.watcher(p, null, {recursive: true});
watcher.on("change", processData);

// Initial snapshot of the currently registered instances.
etcd.get(p, {recursive: true}, function(err, data) {
  data.node.nodes.forEach(function(node) {
    instances[node.key] = node.value;
  });
  console.log(instances);
});
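
With that local cache in place, a trivial client-side load-balancing strategy is to pick a random instance for every request. A minimal sketch (pickInstance is a made-up helper, not part of the example above):

function pickInstance() {
  var keys = Object.keys(instances);
  if (keys.length === 0) return null;  // no instance of service_a available right now
  var key = keys[Math.floor(Math.random() * keys.length)];
  // Values were stored as JSON strings on registration: { hostname: ..., port: ... }
  return JSON.parse(instances[key]);
}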


Conclusions

Service discovery solutions are becoming a central piece of many server infrastructures because of the increasing complexity of those infrastructures, especially with the rise of microservices-like architectures.  etcd is a very simple approach that you can understand, deploy and start using in less than an hour, and it looks more actively maintained and future-proof than ZooKeeper.

I tend to think that if Redis is able to have a good clustering solution soon it could replace specialized service discovery/configuration solutions in some cases (but I'm far from an expert in this domain).

The other thing I found missing is good frameworks that build on these technologies and integrate them with connection pool management, load-balancing strategies, failure detection, retries...    Kind of what Finagle does for Twitter; maybe that can be my next project :)

Thursday, March 20, 2014

Actor Model

The more concurrent applications I write, the more I hate writing them.    Typically you end up with code full of locks, queues, threads and thread pools where it is somewhere between difficult and impossible to know whether it is correct or only appears to work.

Because of that I decided to do a little research on the Actor Model, which powers platforms like Erlang and makes them a very good fit for highly concurrent communication systems (like Facebook Chat or WhatsApp).

These are the slides I prepared; there is not much explanation in them, so feel free to ask me any question and give it a try!   The Actor Model is fun and will simplify your life whether you use a framework for it or just keep the concept in mind in your future designs.
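
To make the idea a bit more concrete, this is a minimal toy sketch in JavaScript (not from the slides) of what an actor boils down to: private state, a mailbox, and messages processed strictly one at a time, so no locks are needed.

// A very small "actor": state is only touched while handling one message at a time.
function createActor(handler) {
  var mailbox = [];
  var processing = false;

  function processNext() {
    if (mailbox.length === 0) { processing = false; return; }
    processing = true;
    handler(mailbox.shift());
    setImmediate(processNext);  // yield back to the event loop between messages
  }

  return {
    send: function(message) {
      mailbox.push(message);
      if (!processing) processNext();
    }
  };
}

// Usage: a counter whose state can only be changed by sending it messages.
var counter = (function() {
  var count = 0;
  return createActor(function(msg) {
    if (msg.type === 'inc') count += msg.amount;
    if (msg.type === 'print') console.log('count =', count);
  });
})();

counter.send({ type: 'inc', amount: 2 });
counter.send({ type: 'print' });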




Wednesday, February 12, 2014

Scientific way of estimating the cost of a feature in your project



I'm a fan of estimations as long as they are not used to try to figure out when a feature will be done.    I like estimations and I think they are critical when they are used to decide which features should be done and which ones shouldn't.

So, if they are so important, what is the best way to make them?   I'm going to share my secret formula, based on what I have read and on my personal experience throughout my professional career (where I have to admit that my estimations today are completely different from what they were 15 years ago).

There are two key concepts that we need to understand before digging into the actual formula:
  • One feature working doesn't mean the feature is complete or ready.   Instrumentation, thread safety, unit tests, error handling, documentation, automation, unexpected problems, bug fixing...  most of the time these take much more time than the implementation of the basic functionality.
  • Once you write something you usually have to maintain it, and not break it, forever.   Making sure that new features, refactors or any minor change doesn't break existing code is a really big deal in any project with enough complexity.

Based on those key concepts we can split the cost of a feature in 3 buckets:
  • Cost to have something working (the usual engineer's initial estimation): X
  • Cost to have something ready to be shipped: Y
  • Cost to keep it working for the life of the product: Z
For a total cost for adding a feature to a product of X + Y + Z 

And now is when the scientific part comes in.   Based on my experience and the thousands (well, maybe 3 or 4) of articles I have read, I think the Pareto Principle has a perfect application in this case.

In any project, implementing the basic functionality (X) accounts for 20% of the effort, versus the 80% needed to implement the rest of the functionality required to ship the product (Y).   So Y = 4 * X.

The Circular Estimation Conjecture: You should always multiply your estimates by pi.
I've seen a similar estimate of X + Y = PI * X, which is a bit optimistic in my opinion.  I recommend reading the visual demonstration of what is called the circular estimation conjecture.







For the second part (the maintainability cost Z) we can apply the same Pareto Principle to get Z = 4 * (X + Y).

With all those numbers in place the conclusion is easy.   The total cost of having a feature in a product is X + 4 * X + 4 * (X + 4 * X) = 5 * X + 20 * X = 25 * X.

Take your initial guess (or ask any engineer) to get X; the cost you should use to decide whether the feature is worth spending your time on is exactly 25 * X.   For example, if the initial guess is one week, budget 25 weeks.

As a corollary and final demonstration of the theorem: I thought this post was going to take me 5 minutes to write, it took me 25 minutes, and I suspect I will have to spend more than an hour discussing it with other people.






Tuesday, January 21, 2014

Writing sequential test scripts with node

Today I was trying to create a node.js script to test an HTTP service, but the test required multiple steps.   I gave it a try using the async module to "simplify" the code, and this is the ugly code I came up with.

I'm not an expert in js/node, so feel free to comment if I'm doing something wrong; I'm more than happy to learn.

(inflightSession and create are two helper functions that I have)

Test using node + jasmine: 

it("should accept valid sessionId", function(done) {
  async.waterfall([
    inflightSession,

    function(sessionId, callback) {
      create({ 'sessionId': sessionId }, callback);
    }
  ], function(error, response) {
    expect(response.statusCode).to.equal(200);
    done();
  });
});


Same test using python + unittest:

def create_ok_test():
    session_id = inflight_session()
    response = create({ 'sessionId': session_id })

    assert_equals(200, response.status_code)


Same test using node ES6 generators (yield keyword):

it("should accept valid sessionId", function*() {
      var sessionId = yield inflightSession();

      var response = yield create({ 'sessionId': sessionId });
      expect(response.statusCode).to.equal(200);
  });


Honestly, the python code is way more readable than the existing node code, and it is still better even when compared with the new node generators.    Anyway, generators definitely look like a promising way forward for the node community.   Some comments:

ES6 generators are available under a flag in node 0.11 and are supposed to be included in 0.12.

yield is a common keyword in other languages (e.g. python, C#) to exit from a function while keeping its state, so that you can resume its execution later.

function* is the syntax to define a generator function (a function using yield inside).

You need a runner that supports those generator functions (in this example jasmine would need to add support for it): basically something that calls generator.next(), waits for the result (which should be a promise or similar object) and then calls generator.next() again.
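
As an illustration, a minimal runner could look like the sketch below; it assumes every yielded value is a promise (real libraries like co also accept thunks and propagate errors back into the generator with generator.throw).

function run(generatorFunction) {
  var generator = generatorFunction();

  function step(previousValue) {
    var result = generator.next(previousValue);
    if (result.done) return Promise.resolve(result.value);
    // Wait for the yielded promise, then resume the generator with its resolved value
    return Promise.resolve(result.value).then(step);
  }

  return step();
}

// Hypothetical usage, assuming promise-returning versions of the helpers:
// run(function*() {
//   var sessionId = yield inflightSession();
//   var response = yield create({ 'sessionId': sessionId });
// });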

UPDATE: As I'm somehow forced to use node, I ended up creating a helper function, and my tests now look like this:

itx("should accept valid sessionId", inflightSession, function(sessionId, done) {
  create({ 'sessionId': sessionId }, function(error, response) {
    expect(response.statusCode).to.equal(200);
    done();
  });
});
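
(The helper itself is not shown here; one simplified way to implement something like itx would be:)

function itx(description, setup, testBody) {
  it(description, function(done) {
    // Run the setup step (e.g. inflightSession) and pass its result to the test body
    setup(function(error, result) {
      if (error) return done(error);
      testBody(result, done);
    });
  });
}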


Friday, January 17, 2014

Distributed Load Testing: Conclusions (5/5)

Let's recap what we have done in this series and try to draw some conclusions.   The steps or achievements were these:
  1. Find and test a distributed load testing tool in python: locust.
  2. Extend locust for custom (non-HTTP) protocol testing.
  3. Use Instant Servers to run the locust master and slaves.
  4. Implement a simple way to auto-stop the machines when they are not being used, based on the locust logs and the Instant Servers stop API.
  5. Create a template for the slaves so they can be easily cloned.   Use Instant Servers tags to define groups.
  6. Fix the python Instant Servers SDK and extend it with new authentication and clone features.
  7. Extend the locust interface, adding a button to spawn machines in Instant Servers directly from the locust web interface.
Today I like python, and testing tools based on scripting instead of complex UIs, even more.  This project gave me the opportunity to discover locust and Instant Servers, and I highly recommend them for this kind of use case; it was very easy and a lot of fun to use and combine these technologies.  Hopefully I can get more time for a deeper integration of virtual machines in locust (with a good control UI and perhaps support for other providers).