Saturday, September 15, 2018

Apache Kafka

Introduction to Kafka using NodeJs


This is a short article intended for Node.js developers who want to start implementing a distributed messaging system using Kafka.

I am planning to write a series of articles demonstrating the usage of Kafka and Storm. This article is the first of that series. So let's begin.

1.1 What is Kafka?

Kafka is a distributed messaging system providing fast, highly scalable and redundant messaging through a pub-sub model. Kafka’s distributed design gives it several advantages. First, Kafka allows a large number of permanent or ad-hoc consumers. Second, Kafka is highly available and resilient to node failures and supports automatic recovery. In real world data systems, these characteristics make Kafka an ideal fit for communication and integration between components of large scale data systems.

The Kafka Documentation has done an excellent job in explaining the entire architecture.

Before moving ahead, I would suggest the reader go through the following link. It is very important to understand the architecture.

https://kafka.apache.org/intro

1.2 Installing & Running Zookeeper and Kafka

Kafka can be downloaded from the following link. I am using the current stable release i.e. 0.10.1.1.

https://kafka.apache.org/downloads

Download the tar. Un-tar it and then follow the steps below:

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. Run the following command to start ZooKeeper:

$ bin/zookeeper-server-start.sh config/zookeeper.properties

Now, to start Kafka, run the following command:

$ bin/kafka-server-start.sh config/server.properties

1.3 Creating Kafka Topic and playing with it

Let's create a topic and play with it. Below is the command to create a topic:

$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Posts

Once you create the topic, you can see the available topics with below command:

$ bin/kafka-topics.sh --list --zookeeper localhost:2181

For testing Kafka, we can use the kafka-console-producer to send a message:

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Posts

We can consume all the messages of the same topic by creating a consumer as below:

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Posts --from-beginning


1.4 Integrating Kafka with NodeJS

Let's create an API in NodeJS which will act as a producer to Kafka. We will then create another consumer in NodeJS which will consume the topic we created above.

We will be using the kafka-node and express modules for our producer.

var express = require('express');
var kafka = require('kafka-node');
var app = express();

Let's add the code to handle JSON in our API.

var bodyParser = require('body-parser')
app.use( bodyParser.json() );       // to support JSON-encoded bodies
app.use(bodyParser.urlencoded({     // to support URL-encoded bodies
  extended: true
}));

Now, in order to create a Kafka producer with a non-keyed partition, you can simply add the following code:

var Producer = kafka.Producer,
    client = new kafka.Client(),
    producer = new Producer(client);

Now let's add some event handlers for our producer. These will help us know the state of the producer.

producer.on('ready', function () {
    console.log('Producer is ready');
});

producer.on('error', function (err) {
    console.log('Producer is in error state');
    console.log(err);
})

Now, before producing a message to a Kafka topic, let us first create a simple route and test our API. Add the code below:

app.get('/',function(req,res){
    res.json({greeting:'Kafka Producer'})
});

app.listen(5001,function(){
    console.log('Kafka producer running at 5001')
});

So now the entire code looks like below:

var express = require('express');
var kafka = require('kafka-node');
var app = express();

var bodyParser = require('body-parser')
app.use( bodyParser.json() );       // to support JSON-encoded bodies
app.use(bodyParser.urlencoded({     // to support URL-encoded bodies
  extended: true
}));

var Producer = kafka.Producer,
    client = new kafka.Client(),
    producer = new Producer(client);

producer.on('ready', function () {
    console.log('Producer is ready');
});

producer.on('error', function (err) {
    console.log('Producer is in error state');
    console.log(err);
})


app.get('/',function(req,res){
    res.json({greeting:'Kafka Producer'})
});

app.listen(5001,function(){
    console.log('Kafka producer running at 5001')
})

So let's run the code and test our API in Postman.
Now let's create a route which can post a message to the topic.

For the NodeJS client, kafka-node provides a producer.send() method which takes two arguments: the first is "payloads", an array of ProduceRequest objects, and the second is a callback invoked once the send completes. A ProduceRequest is a JSON object like:

{
   topic: 'topicName',
   messages: ['message body'], // multi messages should be a array, single message can be just a string or a KeyedMessage instance
   key: 'theKey', // only needed when using keyed partitioner (optional)
   partition: 0, // default 0 (optional)
   attributes: 2 // default: 0 used for compression (optional)
}
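For instance, a payloads array for the 'Posts' topic we created earlier can be built as plain JavaScript before handing it to producer.send() (the message bodies here are just illustrative examples):

```javascript
// Build a ProduceRequest payload for producer.send().
// 'Posts' is the topic created earlier; the message bodies are examples.
var payloads = [
    {
        topic: 'Posts',
        messages: ['first post', 'second post'], // an array carries multiple messages
        partition: 0
    }
];

console.log(JSON.stringify(payloads));
```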

Add the code below to get the topic and the message to be sent from the request:

app.post('/sendMsg', function (req, res) {
    var sentMessage = JSON.stringify(req.body.message);
    var payloads = [
        { topic: req.body.topic, messages: sentMessage, partition: 0 }
    ];
    producer.send(payloads, function (err, data) {
        if (err) {
            return res.status(500).json(err);
        }
        res.json(data);
    });
});

Now let's run the code and hit our API with a payload. Once the producer pushes the message to the topic, we can see the message being consumed in the shell consumer we created earlier.

Now let's create a simple consumer for this in NodeJS.

In NodeJS, Kafka consumers can be created in multiple ways. The following is the simplest of them all:

Consumer(client, payloads, options)

It takes the 3 arguments shown above. "client" is the one which keeps the connection with the Kafka server, and payloads is an array of FetchRequest objects. A FetchRequest is a JSON object like:

{
   topic: 'topicName',
   offset: 0, //default 0
}

All the possible options for the consumer are as below:

{
    groupId: 'kafka-node-group',//consumer group id, default `kafka-node-group`
    // Auto commit config
    autoCommit: true,
    autoCommitIntervalMs: 5000,
    // The max wait time is the maximum amount of time in milliseconds to block waiting if insufficient data is available at the time the request is issued, default 100ms
    fetchMaxWaitMs: 100,
    // This is the minimum number of bytes of messages that must be available to give a response, default 1 byte
    fetchMinBytes: 1,
    // The maximum bytes to include in the message set for this partition. This helps bound the size of the response.
    fetchMaxBytes: 1024 * 1024,
    // If set true, consumer will fetch message from the given offset in the payloads
    fromOffset: false,
    // If set to 'buffer', values will be returned as raw buffer objects.
    encoding: 'utf8'
}

So let's add the code below to create a simple consumer.

var kafka = require('kafka-node'),
    Consumer = kafka.Consumer,
    client = new kafka.Client(),
    consumer = new Consumer(client,
        [{ topic: 'Posts', offset: 0}],
        {
            autoCommit: false
        }
    );

Let us add some simple event handlers, one of which notifies us when a message is consumed. To keep the article simple, we will just use console.log:

consumer.on('message', function (message) {
    console.log(message);
});

consumer.on('error', function (err) {
    console.log('Error:',err);
})

consumer.on('offsetOutOfRange', function (err) {
    console.log('offsetOutOfRange:',err);
})

The entire code of the consumer looks like below:

var kafka = require('kafka-node'),
    Consumer = kafka.Consumer,
    client = new kafka.Client(),
    consumer = new Consumer(client,
        [{ topic: 'Posts', offset: 0}],
        {
            autoCommit: false
        }
    );

consumer.on('message', function (message) {
    console.log(message);
});

consumer.on('error', function (err) {
    console.log('Error:',err);
})

consumer.on('offsetOutOfRange', function (err) {
    console.log('offsetOutOfRange:',err);
})

Before testing this consumer, let us first kill the shell consumer and then hit our producer API.


This is the end of this article, but in future articles I am planning to showcase some more complicated usages of Kafka.

Hope this article helps!




Monday, September 10, 2018

Describe Node.js

Introduction to Node.js

The modern web application has really come a long way over the years with the introduction of many popular frameworks such as Bootstrap, AngularJS, etc. All of these frameworks are built on the popular JavaScript language.
But when it came to developing server-based applications there was just kind of a void, and this is where Node.js came into the picture.
Node.js is also based on JavaScript, but it is used for developing server-based applications. Going through this tutorial, we will look into Node.js in detail and see how we can use it to develop server-based applications.



What is Node.js?

Node.js is an open-source, cross-platform runtime environment used for development of server-side web applications. Node.js applications are written in JavaScript and can be run on a wide variety of operating systems.
Node.js is based on an event-driven architecture and a non-blocking Input/Output API that is designed to optimize an application's throughput and scalability for real-time web applications.
Over a long period of time, the frameworks available for web development were all based on a stateless model. A stateless model is one where the data generated in one session (such as information about user settings and events that occurred) is not maintained for usage in the next session with that user.
A lot of work had to be done to maintain session information between requests for a user. But with Node.js there is finally a way for web applications to have real-time, two-way connections, where both the client and server can initiate communication, allowing them to exchange data freely.


Why use Node.js?

We will have a look at the real worth of Node.js in the coming chapters, but what is it that makes this framework so popular? Over the years, most applications were based on a stateless request-response framework. In this sort of application, it was up to the developer to put the right code in place to ensure the state of the web session was maintained while the user was working with the system.
But with Node.js web applications, you can now work in real time and have two-way communication. The state is maintained, and either the client or the server can start the communication.


Features of Node.js

Let's look at some of the key features of Node.js
1. Asynchronous, event-driven IO helps concurrent request handling – This is probably the biggest selling point of Node.js. This feature basically means that if a request is received by Node for some Input/Output operation, it will execute the operation in the background and continue processing other requests.
This is quite different from other programming languages. A simple example of this is given in the code below:

var fs = require('fs');
fs.readFile("Sample.txt", function (error, data) {
    console.log("Reading Data completed");
});

The above code snippet looks at reading a file called Sample.txt. In other programming languages, the next line of processing would only happen once the entire file is read.
But in the case of Node.js the important fraction of code to notice is the declaration of the function ('function(error,data)'). This is known as a callback function.
So what happens here is that the file reading operation will start in the background. And other processing can happen simultaneously while the file is being read. Once the file read operation is completed, this anonymous function will be called and the text "Reading Data completed" will be written to the console log.

2. Node uses the V8 JavaScript runtime engine, the same engine used by Google Chrome. V8 compiles JavaScript directly to native machine code, which makes execution fast and hence processing of requests within Node also becomes faster.
3. Handling of concurrent requests – Another key functionality of Node is the ability to handle concurrent connections with a very minimal overhead on a single process.
4. The Node.js library uses JavaScript – This is another important aspect of development in Node.js. A major part of the development community is already well versed in JavaScript, and hence development in Node.js becomes easier for a developer who knows JavaScript.
5. There is an active and vibrant community around the Node.js framework. Because of the active community, key updates are always made available to the framework. This helps keep the framework up-to-date with the latest trends in web development.


Who uses Node.js

Node.js is used by a variety of large companies. Below is a list of a few of them.
PayPal – A lot of sites within PayPal have started the transition to Node.js.
LinkedIn – LinkedIn uses Node.js to power their mobile servers, which drive the iPhone, Android, and mobile web products.
Mozilla – Mozilla has implemented Node.js to support browser APIs, which have half a billion installs.
eBay – eBay hosts their HTTP API service in Node.js.

When to Use Node.js

Node.js is best used in streaming or event-based real-time applications like:
1. Chat applications
2. Game servers – Fast, high-performance servers that need to process thousands of requests at a time; this is an ideal framework for them.
3. Collaborative environments – Node.js is good for environments that manage documents, where multiple people post documents and make constant changes by checking documents out and in. The event loop in Node.js can be triggered whenever documents are changed.
4. Advertisement servers – Again, here you could have thousands of requests to pull advertisements from a central server, and Node.js can be an ideal framework to handle this.
5. Streaming servers – Another ideal scenario for Node.js is multimedia streaming servers, where clients issue requests to pull different multimedia content from the server.
Node.js is good when you need high levels of concurrency but little dedicated CPU time.
Best of all, since Node.js is built on JavaScript, it is best suited for applications where the client side is built on the same JavaScript stack.


When to not use Node.js

Node.js can be used for a lot of applications with various purposes; the only scenario where it should not be used is when an application requires long processing times.
Node is structured to be single-threaded. If an application has to carry out long-running calculations, the server will be busy doing that calculation and won't be able to process any other requests. As discussed above, Node.js is best when processing needs less dedicated CPU time.
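A small experiment makes this concrete: the timer below is due after 10 ms, but because a synchronous loop occupies the single thread for roughly 200 ms, the callback cannot fire until that work is done (the durations are arbitrary choices for the demo):

```javascript
var start = Date.now();
var elapsed;

// This timer is scheduled to fire 10 ms from now...
setTimeout(function () {
    elapsed = Date.now() - start;
    console.log('Timer fired after ' + elapsed + ' ms');
}, 10);

// ...but this synchronous, CPU-bound loop blocks the event loop
// for about 200 ms, so the timer cannot fire on time.
while (Date.now() - start < 200) { /* busy-waiting */ }
```

The timer ends up firing only after the loop releases the thread, around 200 ms late, which is exactly why long-running CPU-bound work is a poor fit for Node.js.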