Wednesday, April 19, 2017

Multiplatform Travis Projects (Android, iOS, Linux in the same build)

Using Travis to build and test your code is usually a piece of cake and highly recommended, but last week I tried to use Travis for a not-so-conventional project and it ended up being more challenging than expected.

The project was a C library with Java and Swift wrappers, and my goal was to generate Android, iOS and Linux versions of that library using Travis.   The main problem with my plan was that you have to define the "language" of the project in your .travis.yml file, and in my case... should it be an android, objective-c or cpp project?

It would be great if Travis supported multi-language projects [1] or multiple YAML files per project [2], but apparently neither of those is going to happen in the short term.

I decided to build the Linux part using Docker to make sure I could use the same environment locally, in Travis and in production.

Given that the only way to build an iOS project is using OSX images, and that there is no Docker support in Travis for OSX, I had to use the multiple-operating-systems capability of Travis [3].

This ended up being the most challenging part.  Android projects require a lot of packages (tools, SDKs, NDKs, Gradle...), so I decided to use Docker for this too, to make sure I had the same environment locally and in Travis.    There were some existing Docker images for this and I took many ideas from them, but I decided to generate my own [4].

To keep the .travis.yml file from getting too crazy, I put all the steps to install prerequisites and launch the build process in shell scripts (two scripts per platform).  That simplifies the Travis configuration and also lets me reuse the steps if I eventually want to build locally or in Jenkins.   My project folder looks like this:
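Roughly something like this (a sketch reconstructed from the script paths referenced later in the post, so treat the exact names as approximate):

```
.
├── .travis.yml
├── samples/
│   └── android/
└── scripts/
    ├── android/
    │   ├── before_install.sh
    │   └── script.sh
    ├── ios/
    │   ├── before_install.sh
    │   └── script.sh
    └── linux/
        ├── before_install.sh
        └── script.sh
```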


The most interesting scripts (if any) are the Android and iOS ones.

scripts/ios/before_install.sh:

    echo "no additional requirements needed"

scripts/ios/script.sh:

    xcodebuild build -workspace ./project.xcworkspace -scheme 'MyLibrary' -destination 'platform=iOS Simulator,name=iPhone 6,OS=10.3'

scripts/android/before_install.sh:

    docker pull ggarber/android-dev

scripts/android/script.sh:

    docker run --rm -it --volume=$(pwd):/opt/workspace --workdir=/opt/workspace/samples/android ggarber/android-dev gradle build

With that structure and those scripts, the resulting .travis.yml file is very simple:

language: cpp

sudo: required
dist: xenial

os:
  - linux
  - osx

osx_image: xcode8.3

services:
  - docker

before_install:
  - if [[ "$TRAVIS_OS_NAME" != "osx" ]]; then ./scripts/linux/before_install.sh   ; fi
  - if [[ "$TRAVIS_OS_NAME" != "osx" ]]; then ./scripts/android/before_install.sh ; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then ./scripts/ios/before_install.sh     ; fi

script:
  - if [[ "$TRAVIS_OS_NAME" != "osx" ]]; then ./scripts/linux/script.sh   ; fi
  - if [[ "$TRAVIS_OS_NAME" != "osx" ]]; then ./scripts/android/script.sh ; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then ./scripts/ios/script.sh     ; fi

This is working fine, although the build process is a little slow, so here are some ideas to explore to improve it in the future:
  • Linux and Android builds could run in parallel.
  • Android Docker images are very big (not only mine but all the ones I found); according to Docker Hub, mine is a 2 GB compressed image.  There are probably ways to strip that down.
  • I'm not caching the Android packages downloaded during the build process inside the Docker container.
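One way to get the parallelism of the first bullet is a Travis build matrix keyed on an environment variable; a sketch (not tested against this project, names assumed):

```yaml
# Each "include" entry becomes an independent job, so the Linux and Android
# builds run in parallel instead of sequentially inside a single job.
matrix:
  include:
    - os: linux
      env: TARGET=linux
    - os: linux
      env: TARGET=android
    - os: osx
      osx_image: xcode8.3
      env: TARGET=ios

before_install:
  - ./scripts/$TARGET/before_install.sh

script:
  - ./scripts/$TARGET/script.sh
```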

[1] https://github.com/travis-ci/travis-ci/issues/4090
[2] https://github.com/travis-ci/travis-ci/issues/3540
[3] https://docs.travis-ci.com/user/multi-os/
[4] https://github.com/ggarber/docker-android-dev

Monday, February 6, 2017

Using Kafka as the backbone for your microservices architecture

Disclaimer: I only use the word microservices here to get your attention.  Otherwise I would say your platform, your infrastructure or your services.

In many cases, when your application and/or your team starts growing, the only way to maintain a fast development and deployment pace is to split the application and the teams into smaller units.   In the case of teams/people that creates some interesting (and not necessarily easier to solve) challenges, but this post is focused on the problems and complexity created on the software/architecture side.

When you split your solution into many components there are at least two problems to solve:
  • How to pass information from one component to another (e.g. how do you notify all the sub-components when a user signs up, so that you can send him notifications, start billing him, generate recommendations...)
  • How to maintain the consistency of the partially overlapping data stored in the different components (e.g. how do you remove all the user data from every sub-component when the user decides to leave your service)

Inter component communication

At a very high level there are two communication models needed in most architectures:
  • Synchronous request/response communications.  This has its own challenges, and I recommend using gRPC together with some best practices around load balancing, service discovery, circuit breakers... (find here my slides for TEFCON 2016), but it is usually a well understood model.
  • Asynchronous event-based communications, where a component generates an event and one or many components receive it and implement some logic in response to that event.
The elegant way to solve this second requirement is to have a bus or a queue in the middle (depending on the reliability guarantees required by the use case) where producers send events and consumers read them.    There are many solutions implementing this pattern, but when you have to handle heterogeneous consumers (consuming events at different rates or with different guarantees), or a massive number of events or consumers, the solution is not so obvious.

Data consistency

The biggest problem to solve in pure microservices architectures is probably how to ensure data consistency.   Once you split your application into different modules with data that is not completely independent (at the very least they all have information about the same users), you have to figure out how to keep that information in sync.

Obviously you should try to keep these dependencies and duplicated data as small as possible, but usually you at least have to solve the problem of having the same users created in all of them.

To solve it you need a way to sync data changes between components, so that data that is duplicated gets updated everywhere it lives.  Basically, you need a data replication mechanism that ensures eventual consistency.

The Unified Log solution

If you look at those two problems, they can be reduced to a single one: having a real-time, reliable unified log that you can use to distribute events among components with different needs and capabilities.   That's exactly the problem LinkedIn had, and what they built Kafka to solve.   The post "The Log: What every software engineer should know about real-time data's unifying abstraction" is a highly recommended read.

Kafka decouples producers from consumers, including the ability to have slow consumers without affecting the rest of the consumers.  It does that while supporting very high event rates (hundreds of thousands per second is common) with very low latencies (<20 ms easily).  And it provides all of this while remaining a very simple solution, with some advanced features like organizing events in topics, preserving the ordering of events and handling consumer groups.
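The core abstraction is simple enough to sketch in a few lines of Python.  This is a toy in-memory model (nothing like Kafka's real API), just to illustrate why per-consumer offsets let slow and fast consumers coexist:

```python
class UnifiedLog:
    """Toy append-only log: producers append, consumers track their own offset."""

    def __init__(self):
        self.events = []   # the ordered, append-only sequence of events
        self.offsets = {}  # consumer name -> index of the next event to read

    def append(self, event):
        self.events.append(event)

    def poll(self, consumer, max_events=10):
        """Return up to max_events unread events for this consumer and advance it."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch


log = UnifiedLog()
log.append({"type": "user_signed_up", "user": "alice"})
log.append({"type": "user_signed_up", "user": "bob"})

fast = log.poll("billing")                         # reads both events
slow = log.poll("recommendations", max_events=1)   # lags behind, nobody waits for it

print("billing read %d, recommendations read %d" % (len(fast), len(slow)))
```

The slow consumer simply picks up where it left off on its next poll; the producer and the fast consumer never block on it.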

Those characteristics make Kafka suitable for most inter-component communication use cases, including event distribution, log processing and data replication/synchronization.  All with a single, simple solution: modeling all these communications as an infinite list of ordered events accessible to multiple consumers through a centralized unified log.

This post was about Kafka, but most of it is equally applicable to Amazon's similar service, Kinesis.

You can follow me on Twitter if you are interested in Software and Real Time Communications.

Sunday, January 15, 2017

Starting to love gRPC for interprocess communication (1/2)

In the context of a discussion around programming languages and static typing, a colleague said that when you get older you stop caring about fancy technologies and realize that it is way better to just use safe and well-proven solutions.

I'm kind of tired of using loosely defined JSON-over-HTTP interfaces after many years, and when I discovered gRPC last year it looked like exactly what I was looking for.  I would love to start using it in production as soon as possible, so I decided to play with it for a while first and explain how it went.

I will split my comments about gRPC into two posts: this first one about what gRPC is and what advantages it provides, and the next one about how to use it in our applications.

gRPC embraces the RPC paradigm, where APIs are defined as actions receiving some arguments and replying with a response.  Initially it feels like going back 10 years to when we started using SOAP and similar technologies, but we have to admit that it is much simpler to map those primitives to our client and server code (for example, no URL path mapping), and it is more strict and explicit about what can and cannot be done for each operation, which usually makes the system more robust.

In gRPC you define your interfaces (methods, arguments and results) in an IDL using the protocol buffers format.   This definition is used to generate the server and client code automatically.   The serialization of the calls is done using the binary protobuf format too.   This makes the communication efficient and the protocol extensible, since you can use all the features available in protobuf (for example composition or enum types).
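To give an idea of the format, a hypothetical service definition (all names are made up):

```protobuf
syntax = "proto3";

// Hypothetical user service: this single file is enough to generate
// client stubs and server skeletons in all supported languages.
service UserService {
  rpc GetUser (GetUserRequest) returns (User);
}

message GetUserRequest {
  string user_id = 1;
}

message User {
  string user_id = 1;
  string name = 2;
  UserStatus status = 3;  // enums (and message composition) come from protobuf
}

enum UserStatus {
  USER_STATUS_UNKNOWN = 0;
  USER_STATUS_ACTIVE = 1;
  USER_STATUS_SUSPENDED = 2;
}
```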

Two of the advantages of this approach are automatic code generation and schema validation.  Both can also be achieved with "traditional" REST interfaces, but it is more tedious, less efficient and, in my experience, much easier to make mistakes when you add new features or refactor the code.

Communication in gRPC is based on the HTTP/2 transport.  This provides all the advantages of the new HTTP version (multiplexing, streaming, compression) while allowing you to keep using existing HTTP infrastructure (nginx or other load balancers, for example).

Another special feature of gRPC is streaming support, which is very convenient for some APIs these days.    With gRPC you are able to send a (potentially infinite) sequence of arguments to the server and receive a sequence of results from it.   That is very useful to make applications more responsive, since data can be processed and displayed even if part of it is not ready yet.    It is also very useful for notification-based APIs, like a chat application for example.
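In the IDL, streaming is declared with the `stream` keyword on the arguments, the results, or both; a hypothetical chat service showing the three shapes:

```protobuf
syntax = "proto3";

// Hypothetical chat service illustrating gRPC streaming (names made up).
service ChatService {
  rpc SendMessages (stream ChatMessage) returns (SendSummary);    // client streaming
  rpc Subscribe (SubscribeRequest) returns (stream ChatMessage);  // server streaming
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);     // bidirectional
}

message ChatMessage {
  string sender = 1;
  string text = 2;
}

message SendSummary {
  int32 received = 1;
}

message SubscribeRequest {
  string room = 1;
}
```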

Compared with other IPC frameworks like Finagle (disclaimer: I'm a fan of it), gRPC is still missing important features like client-side load balancing (although it is a work in progress) and some other goodies like circuit breakers, retries or service discovery.   In the meantime, people are implementing those features on top of the framework.

The other missing piece is browser support.  Even though there is support for many languages including JavaScript, browser limitations make it impossible to implement a gRPC-compatible web client nowadays.   The community is working on an extension of the protocol to support browsers, and in the meantime the only solution seems to be the grpc-gateway proxy, which generates a JSON-HTTP to gRPC proxy based on the IDL of the service plus some extra annotations.

Sunday, November 6, 2016

Adding metrics/monitoring to the Mac menu bar

In the past I used to have an extra screen close to my desk where I could show different dashboards with metrics to monitor the health of our services.    Depending on the service, we used things like Graphite, CloudWatch or Google Analytics.

These days I'm finding it challenging to keep using that approach, so I decided to explore the option of showing those metrics in the menu bar of my Mac.

The first thing I needed was an app that would let me put custom stuff in the menu bar.  I explored a couple of options and ended up using BitBar:

BitBar is a free app that can execute almost any script (bash, python, ruby...) and put its output in the menu bar, with many customizable options for icons, images and format.
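A BitBar plugin is just an executable script: the first line of stdout becomes the menu bar text, and lines after a `---` separator become the dropdown menu.  A minimal sketch in Python (the metric value is hard-coded for illustration; a real plugin would fetch it from Graphite, CloudWatch, etc.):

```python
#!/usr/bin/env python
# Minimal BitBar-style plugin.  BitBar runs the script periodically and shows
# the first stdout line in the menu bar; lines after "---" form the dropdown.

def render(requests_per_sec):
    return [
        "req/s: %d" % requests_per_sec,                    # menu bar text
        "---",
        "Open dashboard | href=https://example.com/dash",  # dropdown item (URL is a placeholder)
    ]

for line in render(42):
    print(line)
```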

Right now I wanted to monitor a couple of services using Graphite, another one from CloudWatch, show some statistics from Google Analytics, and ideally monitor a Heroku app.

- Graphite: It was trivial to write a one-line bash script curl-ing the Graphite endpoint with &format=json and parsing the output with jq.

- CloudWatch: I used a Python script and boto3 to get CloudWatch statistics for AWS Firehose, and after a couple of trial-and-error iterations the final script was very simple too.

- Google Analytics: This was the most challenging part, especially because of the authentication.   I ended up using this sample from Google, tuned to print the exact information I wanted: https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py

- Heroku: I was not able to figure out how to programmatically access the requests/sec metrics that are shown in the web dashboard :(
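For reference, the Graphite script above essentially does this (shown in Python with the network call stubbed out; the endpoint and metric name are placeholders):

```python
# A real plugin would fetch something like (URL and target are placeholders):
#   https://graphite.example.com/render?target=app.requests&format=json&from=-5min
# and then pull the latest datapoint out of the JSON response (e.g. with jq).
import json

sample_response = '[{"target": "app.requests", "datapoints": [[12.5, 1492600000], [15.5, 1492600060]]}]'
series = json.loads(sample_response)
latest_value = series[0]["datapoints"][-1][0]  # Graphite datapoints are [value, timestamp]
print("app.requests: %s" % latest_value)
```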

The end result is this one; you can even use emojis (:mushroom:) to make the information more colorful:

Wednesday, September 28, 2016

How much plumbing is required to build and deploy a server exposing the simplest HTTP API

I got into an interesting discussion today about the future of development & deployment, and one of the premises was that today there is too much plumbing involved in building and deploying everything.

I argued that there is not that much plumbing with modern frameworks or with project templates (like the yeoman ones), and that deployment has been heavily simplified in environments like Heroku.

So, in this quick & dirty post I will try to prove my point by building a simple HTTP server exposing a Hello-World HTTP API.

Creating the app:

➜  echo "import os
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

app.run(host='', port=int(os.environ.get('PORT', 5000)))" > app.py
➜  echo "flask" > requirements.txt
➜  echo "web: python app.py" > Procfile

Initializing the source control (git) and committing the changes:

➜  git init
Initialized empty Git repository in /Users/ggb/projects/rgb/.git/
➜  git add *
➜  git commit -m "First version"

Deploying to production:

➜  heroku create rgb-ggb
Creating ⬢ rgb-ggb... done
https://rgb-ggb.herokuapp.com/ | https://git.heroku.com/rgb-ggb.git
➜  git push heroku master

Try It!: https://rgb-ggb.herokuapp.com/


Code: 7 LoC (4 of them are the plumbing of starting the server, which has to be written only once)
Deployment: 1 file (Procfile) to tell Heroku how to start your app (this is not needed for node.js apps and could be autogenerated with a yeoman template) + 1 git push command in the console (and a "heroku create" command the first time)

Saturday, October 17, 2015

You need a corporate framework

If you are working in a big enough software development team, you probably agree that consistency in the code and development practices is very important.  Consistency is what saves you time when joining a new project or reviewing somebody else's code, what saves the ops team time when they deploy a new module and have to figure out how to monitor it, and what saves the analytics team time when they have to understand and use the logs & metrics generated by a new component.

To achieve a certain degree of consistency (and quality at the same time), it is very common these days to have coding guidelines, technical plans, training plans, code reviews...  All those practices are very important and help a lot, but in my opinion they don't solve some of the most important problems, and in addition they depend a lot on human responsibility (bad, very bad: you shouldn't trust any human).

So let's try to figure out some of the problems we have today.  Do any of these sound familiar?
  • You start a new project and you don't know what folders to create (should I create doc and test folders?), how to name things (is it test or tests, src or lib?), whether to use jasmine or mocha for testing, whether to put the design of the component in a wiki page, a gdoc or a .txt file in a folder, where to put configuration, whether to mention the third-party licenses somewhere...
  • Each component logs different things, with different names and in different formats.  Do all your components log every request and response? Do they use WARN and ERROR consistently?  Do you always use the same format for logging?  I've seen teams using as many logging libraries as they have components.  The cost of not having good, consistent logging can easily make a company waste hundreds of thousands of dollars very quickly.
  • Half of the components don't have a health or monitoring endpoint, and if they do, the amount of information shown or its format is totally inconsistent.   One service exposes the average response time, another the P99, another only counters...  That makes it hard (if not impossible) to monitor the components, so in the end nobody pays attention to them until a customer complains.
  • My retry strategy sucks.   Do you always retry when you make requests to third-party components (very common with the popularization of "microservices" architectures)?  Do all your components do the same number of retries?  Is the timeout before retrying always the same?  Do you retry against a different server instance?
  • The configuration of each component is different.    One uses XML, another JSON, another environment variables.   In some components it can be changed on the fly, in others it can't.  In some components the config lives in git, in others in chef recipes, in others in external configuration servers.
  • Do you have any service registration and service discovery solution?  Or are some services registered in a database, others in a config file, others in the load balancer configuration file?
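Retries are a good example of a policy a corporate framework can standardize once instead of letting every component reinvent it.  A minimal sketch in Python (names and parameters are made up) of retry with exponential backoff and jitter:

```python
import random
import time

def retry(fn, attempts=3, base_delay=0.1, max_delay=2.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, propagate the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids thundering herds

# A fake flaky dependency that fails twice and then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(retry(flaky))  # succeeds on the third attempt and prints "ok"
```

When every component calls the same `retry` helper, the number of attempts, the backoff curve and the jitter policy are consistent across the whole platform.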

Use the force Luke!

What you need is a corporate framework and a corporate project template.

You don't even need to create your own framework.  The best example of this kind of framework I know of is Finagle from Twitter, which other teams like Tumblr, Pinterest or Foursquare are reusing.

Finagle enforces a design for building Scala services (Futures-based); it provides a TwitterServer class that automatically exposes a stats endpoint and reads configuration properties from command-line arguments; it includes support for distributed logging; it provides lots of clients (MySQL, HTTP, Redis...) exposing a consistent API and automatically generating logs and statistics; and it integrates with ZooKeeper for seamless registration and discovery of services.    If you don't know it, I highly recommend taking a look.

I tried to implement my own framework some months ago (https://github.com/ggarber/snap).  It is very rudimentary (the maturity level of a hackathon project), but I'm using it in production to test whether it is really helpful, and even at its current level of immaturity I have found it very helpful (I don't need to care much about consistency anymore, and it also saves me time).

The other piece I think is mandatory is a project template.   It saves you from having to make decisions, and it should have a reasonable amount of tools integrated to automatically run tests, check styles, initiate a pull request... and maybe even deploy.

This project template can be an Eclipse plugin, a yeoman generator or something else, but if you don't have one I don't understand why :)  As an example, for node.js projects I like this one created by a friend: https://github.com/luma/generator-tok

Hopefully I have convinced you of how important it is to have a corporate framework and a project template that you use for all your components.    Feedback is more than welcome.     And contributors for the snap framework (https://github.com/ggarber/snap) even more! :)

Saturday, July 4, 2015

HTTP/2 explained in 5 minutes

After reading about and playing with HTTP/2 for some days, this is a summary of my understanding at a very high level.

HTTP/2 is all about reducing the latency of accessing web applications.  It maintains the semantics (GET, POST... methods, headers, content) and URL schemes of existing HTTP, and it is based on the improvements proposed by Google as part of its SPDY protocol, which is finally being replaced by HTTP/2.

The three most important changes introduced in HTTP/2 in my opinion are:
1) Reduced overhead of HTTP Headers by using binary fields and header compression (HPACK).
2) Ability to use a single TCP connection for multiple HTTP requests without head-of-line blocking at the HTTP level (responses can be sent in a different order than the requests).
3) Support for pushing contents from server to client without previous request (for example the server could send some images that the browser will need when it receives the request for the HTML file referencing those images).

The most controversial "feature" of HTTP/2 was making TLS mandatory.  In the end the requirement was relaxed, but some browsers (Firefox) plan to make it mandatory anyway.

Most of the relevant browsers (at least Chrome and Firefox and some versions of IE) already include support for HTTP/2, as do the most popular open source servers (nginx and Apache).  So you should be able to take advantage of the new version of the protocol right now.

HTTP/2 support is negotiated via ALPN during the TLS handshake or, for cleartext connections, using the same HTTP Upgrade mechanism used for websockets, and it should be transparent for users and for elements in the middle (proxies).
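For cleartext connections, the negotiation looks like a regular websocket-style upgrade (fragment per RFC 7540; the settings payload is elided):

```
GET / HTTP/1.1
Host: example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: <base64url-encoded SETTINGS payload>

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c
```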

Application developers should benefit for free, automatically, without any change in their apps, but they can get even more benefits with some extra changes:
* Some tricks in use today, like spriting or inlining resources, are not needed anymore, so the build/deploy pipeline can be simplified.
* Server push could be automatic in some cases, but in general it will require developers to declare the resources to be pushed in each request.  This feature requires support in the web framework being used.
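One common convention for declaring those resources is a preload Link header on the response, which HTTP/2-aware servers and proxies can translate into a push (an illustrative response, not tied to any particular framework):

```
HTTP/1.1 200 OK
Content-Type: text/html
Link: </static/app.js>; rel=preload; as=script
```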

I made a hello-world test pushing the JavaScript referenced in an HTML page automatically, and it improved the latency as expected.  I plan to repeat the test with a real page with tens/hundreds of referenced js/css/img files and publish the results.