
Thursday, 21 May 2020

Kafka to ELK Message Manipulation

2020-05-21T05:58:11+01:00



Introduction

We will create a message flow that starts at Kafka and ends in Kibana. The flow will be:

console app (to send json message) -> kafka -> logstash -> elasticSearch -> kibana

We will use a Logstash ruby filter to manipulate the messages, and Docker to set up the environment.

Setup

mkdir kafka-elk
cd kafka-elk
wget http://apache.mirror.anlx.net/kafka/2.5.0/kafka_2.13-2.5.0.tgz
tar xvf kafka_2.13-2.5.0.tgz

Note: We need the Kafka binaries for the console script that can send JSON messages to Kafka from the command line.

Populate a docker-compose.yml file with the following contents:

version: '2'

services:
  zookeeper:
    image: 'bitnami/zookeeper:3'
    container_name: zookeeper
    ports:
      - '2181:2181'
    volumes:
      - 'zookeeper_data:/bitnami'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      - elastic

  kafka:
    image: 'bitnami/kafka:2.5.0'
    container_name: kafka
    ports:
      - '9092:9092'
      - '29092:29092'
    volumes:
      - 'kafka_data:/bitnami'
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,PLAINTEXT_HOST://:29092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
    depends_on:
      - zookeeper
    networks:
      - elastic

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    container_name: elasticsearch
    environment:
      - node.name=elasticsearch
      - cluster.name=es-docker-cluster
      - cluster.initial_master_nodes=elasticsearch
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elastic_data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic

  kibana:
    image: docker.elastic.co/kibana/kibana:7.6.2
    container_name: kibana
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: http://elasticsearch:9200
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    networks:
      - elastic

  logstash:
    image: docker.elastic.co/logstash/logstash:7.7.0
    container_name: logstash
    ports:
      - 5000:5000
      - 5000:5000/udp
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash.yml:/usr/share/logstash/config/logstash.yml
      - ./manipulate_msg.rb:/etc/logstash/manipulate_msg.rb
    networks:
      - elastic

networks:
  elastic:
    driver: bridge

volumes:
  zookeeper_data:
    driver: local
  kafka_data:
    driver: local
  elastic_data:
    driver: local

Create the logstash.conf file:

cat <<'EOF' > logstash.conf
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    client_id => "transform-text"
    group_id => "transform-text"
    consumer_threads => 3
    topics => ["transform-text"]

    # Following multiline json codec may not work on all the
    # possible multiline json records.
    # codec => multiline {
    #  pattern => "^\{"
    #  negate => true
    #  what => previous
    # }

    # use json record with no newline in between.
    codec => json
    tags => ["transformed-text", "kafka_source"]
    type => "kafka-test-messages"
  }

  # to test logstash via netcat
  # e.g. cat some.json | nc localhost 5000
  tcp {
    port => 5000
    type => syslog
    codec => multiline {
      pattern => "^\{$"
      negate => true
      what => previous
    }
  }

  # to test logstash via netcat (udp)
  # e.g. cat some.json | nc -u localhost 5000
  udp {
    port => 5000
    type => syslog
    codec => multiline {
      pattern => "^\{$"
      negate => true
      what => previous
    }
  }

}

filter {
  ruby {
    path => "/etc/logstash/manipulate_msg.rb"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
EOF
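The multiline codec above starts a new event at each line matching the pattern ^\{$ and folds every non-matching line into the previous event (negate => true, what => previous). The trigger can be previewed outside Logstash, with grep standing in for the codec's pattern match (an illustration sketch only):

```shell
# print the line numbers that would start a new event: every bare "{" line
printf '{\n"a": 1\n}\n{\n"b": 2\n}\n' | grep -n '^{$'
# prints 1:{ and 4:{  -- two events; all other lines get appended to the previous one
```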

Create logstash.yml with the following contents:

http.host: "0.0.0.0"
config.support_escapes: true

Create manipulate_msg.rb with the following contents:

def filter(event)
  # fetch the original message sent by kafka (or any other source, e.g. syslog);
  # it is available here for any further manipulation
  message = event.get("message")
  # add an extra field/value pair to the event
  event.set("newField", "newValue")
  return [event]
end

Note: We are manipulating the message coming from Kafka by adding an extra field/value pair to it in the Logstash ruby filter; the new field will be visible in Kibana.
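Outside Logstash, the effect of the filter on a one-line record can be emulated with sed for a quick sanity check (sed here merely stands in for the ruby filter; illustration only):

```shell
# splice "newField":"newValue" in before the record's final closing brace
echo '{"menu":{"id":"file"}}' | sed 's/}$/,"newField":"newValue"}/'
# prints {"menu":{"id":"file"},"newField":"newValue"}
```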

Note: Kafka is not pushing messages to Logstash. It is Logstash that pulls messages from Kafka (acting as a Kafka consumer).

Create a sample.json file with the following contents:

{
  "menu": {
    "id": "file with space",
    "value": "File",
    "popup": {
      "menuitem": [
        {"value": "New", "onclick": "CreateNewDoc()"},
        {"value": "Open", "onclick": "OpenDoc()"},
        {"value": "Close", "onclick": "CloseDoc()"}
      ]
    }
  }
}

Note: We need to put the above JSON input on one line, as Logstash cannot ingest a multiline JSON record. You can use the multiline codec in the logstash.conf input plugin (to ingest multiline JSON), but then Kibana will show that record as a single string, and the JSON record's fields/keys will not appear as individual fields in Kibana.

The multiline codec turns the JSON record into just a series of characters (one long string), so the record's keys/fields will not be recognized in Kibana as separate fields. You would need to change the ruby filter so that Kibana can show the record's keys/fields as individual searchable fields.

To keep things simple, we will convert our JSON into a flat (single-line) structure and use the json codec in the kafka input plugin of logstash.conf.

Flatten the JSON record:

cat sample.json |  perl -wp -e 's/\n+//g' > flat_sample.json

Start the whole setup

docker-compose up -d

Note: You can bring the whole setup down by running docker-compose down -v

Setup kibana

Wait a minute for the above setup to come up fully, then open the Kibana URL: http://localhost:5601/ . You need to create an index pattern with logstash-* as the index pattern (under Management). But before you can create the index pattern, you need to send some data to Elasticsearch; that can be done by running the kafka-console-producer.sh command mentioned below. Once you have sent the JSON, you should be able to create the index pattern. Now click on "Discover" to view the sent data. Try sending more data.

Send JSON data via Kafka

kafka_2.13-2.5.0/bin/kafka-console-producer.sh --topic "transform-text" --bootstrap-server localhost:29092 < flat_sample.json
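To confirm the record reached Elasticsearch with the field added by the ruby filter, the logstash-* indices can be queried directly (a sketch; it assumes the docker-compose stack above is up and listening on localhost):

```shell
# search the logstash-* indices for documents carrying the filter-added field
curl -s 'http://localhost:9200/logstash-*/_search?q=newField:newValue&pretty'
```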

Note: The kafka input plugin uses the json codec.

Note: You should now see the JSON data in Kibana.

Note: You can watch data being ingested by Logstash in its logs: docker logs -f logstash

Send JSON data via syslog port 5000

cat flat_sample.json | nc localhost 5000

Note: The tcp input plugin listens on port 5000, which can also be used by syslog.

Note: The above command sends data straight to Logstash (skipping Kafka).

Note: The tcp input plugin uses the multiline codec.

Please note the difference in representation, in Kibana, of the two records sent above, in order to understand the difference between the multiline and json codecs.

Tuesday, 7 June 2016

Docker Logging with Rsyslog

2019-07-27T12:15:28+01:00



Introduction

This document describes a simple logging strategy for Docker containers using Rsyslog. Often we have to run multiple containers on a single machine, and may require logging for different containers in different directories or files. This can be achieved using Rsyslog. The approach below is very generic and flexible, and can easily be modified as required.

Running Docker with syslog logging

I use the following command to run a docker container:

docker run --rm --log-driver=syslog  --log-opt tag="{{.ImageName}}/{{.Name}}/{{.ID}}" ubuntu echo atestoutput

The above command will append a line similar to the following to the /var/log/syslog file:

Jun  7 15:58:27 machine docker/ubuntu/trusting_dubinsky/b7d1373fccaf[20642]: atestoutput

What we want:

We want each container's logging to go into its own directory, e.g. each docker container should log into /var/log/company/dockerapps/containerName/docker.log. For a few containers we could hard-code this into the rsyslog.conf file, but for a large number of containers we need a more generic rsyslog configuration.

Take a look at the pattern in the above output, i.e. docker/ubuntu/trusting_dubinsky/b7d1373fccaf[20642]. This is called the syslog tag, and rsyslog stores this pattern in a property variable which can be accessed by referring to syslogtag in the rsyslog.conf file. We will exploit this property to generate dynamic filenames and directory structures.

Version of Rsyslog used

$ rsyslogd -v
        rsyslogd 8.16.0, compiled with:
        PLATFORM:                               x86_64-pc-linux-gnu
        PLATFORM (lsb_release -d):
        FEATURE_REGEXP:                         Yes
        GSSAPI Kerberos 5 support:              Yes
        FEATURE_DEBUG (debug build, slow code): No
        32bit Atomic operations supported:      Yes
        64bit Atomic operations supported:      Yes
        memory allocator:                       system default
        Runtime Instrumentation (slow code):    No
        uuid support:                           Yes
        Number of Bits in RainerScript integers: 64

Configure rsyslog:

Create a regular file /etc/rsyslog.d/40-dockerapp.conf; we will put our configuration in this file. rsyslog comes with various defaults in /etc/rsyslog.conf and /etc/rsyslog.d/50-default.conf. We named our file 40-dockerapp.conf because we want it executed before 50-default.conf. Populate /etc/rsyslog.d/40-dockerapp.conf with the following contents:

# To create logging directories/filenames dynamically.
template(name="Dockerlogfiles" type="string" string="/var/log/company/dockerapps/%syslogtag:R,ERE,2,FIELD:docker/(.*)/(.*)/(.*)\\[--end%/%syslogtag:R,ERE,3,FIELD:docker/(.*)/(.*)/(.*)\\[--end%/docker.log")

# To format message that will be forwarded to remote syslog server
template (name="LongTagForwardFormat" type="string" string="<%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %syslogtag:::sp-if-no-1st-sp%%syslogtag%%msg:::sp-if-no-1st-sp%%msg%")

if $syslogtag startswith "docker" then {

    # Local logging
    action(name="localFiles" type="omfile" DynaFile="Dockerlogfiles")

    # Remote logging to remote syslog server. 
    action(name="forwardToOtherSyslog" type="omfwd" Target="IP ADDRESS of Target Syslog server" Port="514" Protocol="udp" template="LongTagForwardFormat")
    &~
}

The above template generates filenames dynamically based on the syslogtag pattern shown in the output above; in this case we are using regular expressions. &~ stops the processing of further rules for the message. If &~ is missing, 50-default.conf will pick up the message, process it, and send it to the /var/log/syslog file as well, so we would eventually end up with duplicate syslog entries in two different files.

Details of regular expressions used above:

regex details

%syslogtag:R,ERE,1,FIELD:docker/(.*)/(.*)/(.*)\\[--end%  == will generate ==> ImageName
%syslogtag:R,ERE,2,FIELD:docker/(.*)/(.*)/(.*)\\[--end%  == will generate ==> ContainerName
%syslogtag:R,ERE,3,FIELD:docker/(.*)/(.*)/(.*)\\[--end%  == will generate ==> ContainerID
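The same ERE can be exercised outside rsyslog with sed, to confirm which capture group yields which part of the tag (an emulation sketch; rsyslog's R,ERE,n,FIELD property-replacer options themselves are dropped here):

```shell
tag='docker/ubuntu/trusting_dubinsky/b7d1373fccaf[20642]:'
echo "$tag" | sed -E 's|docker/(.*)/(.*)/(.*)\[.*|\1|'   # group 1: ubuntu (ImageName)
echo "$tag" | sed -E 's|docker/(.*)/(.*)/(.*)\[.*|\2|'   # group 2: trusting_dubinsky (ContainerName)
echo "$tag" | sed -E 's|docker/(.*)/(.*)/(.*)\[.*|\3|'   # group 3: b7d1373fccaf (ContainerID)
```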

Note: more accurately, the above statements are called property replacers using regex.

NOTE1:

Rsyslog limits the length of the syslogtag to 32 characters when a message is sent to a remote rsyslog server (the 32-character limit does NOT apply to local logging). My tags are longer than 32 characters, therefore I used another template that does not restrict the length to 32 when sending the message to the remote server. This template is taken from http://www.rsyslog.com/sende-messages-with-tags-larger-than-32-characters/; however, I have modified it.

NOTE2:

I have modified the above "do not restrict the syslogtag length to 32 chars" template a bit, as stated earlier. The reason for doing this is very subtle. My syslogtag is longer than 32 characters and has a format like "docker/A_very_long_docker_image_name:A_very_long_docker_tag_name/containerName/ContainerID:[somePID]". Take a careful look at the ":" colons in the middle and near the end of the above syslogtag (before somePID). While forwarding the above syslogtag (i.e. a message containing this syslogtag) to the remote rsyslog server, the destination rsyslog server was inserting a space after the middle colon. In order to prevent this, I added the %syslogtag:::sp-if-no-1st-sp% construct to the above-mentioned template.

Restart rsyslog:

Restart rsyslog and test by starting a container with the command shown above. You will see a docker.log file at /var/log/company/dockerapps/containerName/containerID/docker.log

Note: Rsyslog creates the directory structure automatically.

Other ways:

Dynamic filename generation can be done in other ways too. The following are alternative ways to write the above template.

Using Latest template syntax, String type and Field values.

template syntax

# Using Latest template syntax (String type and Field values)
#        /%syslogtag:F,47:1% => represents "docker" in $syslogtag. 47 is the ASCII decimal value of the / character
#        /%syslogtag:F,47:2% => represents ImageName.
#        /%syslogtag:F,47:3% => represents ContainerName.
#        /%syslogtag:F,47:4% => represents ContainerID.
template(name="Dockerlogfiles" type="string" string="/var/log/company/dockerapps/%syslogtag:F,47:2%/%syslogtag:F,47:3%/docker.log")
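As 47 is the ASCII decimal code of the / character, the F,47 field extraction amounts to plain splitting on /, which can be mimicked with cut (illustration only):

```shell
tag='docker/ubuntu/trusting_dubinsky/b7d1373fccaf[20642]:'
echo "$tag" | cut -d/ -f2   # F,47:2 -> ubuntu (the image name)
echo "$tag" | cut -d/ -f3   # F,47:3 -> trusting_dubinsky (the container name)
```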

Using old template syntax and Using field values.

old template syntax

#        /%syslogtag:F,47:1% => represents "docker" in $syslogtag. 47 is the ASCII decimal value of the / character
#        /%syslogtag:F,47:2% => represents ImageName.
#        /%syslogtag:F,47:3% => represents ContainerName.
#        /%syslogtag:F,47:4% => represents ContainerID.
# $template Dockerlogfiles, "/var/log/company/dockerapps/%syslogtag:F,47:2%/docker.log"
$template Dockerlogfiles, "/var/log/company/dockerapps/%syslogtag:F,47:2%/%syslogtag:F,47:3%/docker.log"

Using latest template syntax and list type

Template syntax list type

#     regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="1") == will generate ==> imagename
#     regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="2") == will generate ==> containername
#     regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="3") == will generate ==> containerid
template(name="Dockerlogfiles" type="list") {
   constant(value="/var/log/company/dockerapps/")
   property(name="syslogtag" securepath="replace" \
            regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="2")\
   constant(value="/docker.log")
}

A consolidated file describing all the above scenarios is below:

Full file

# We assume that a syslogtag generated by a docker container is of the following format
#        docker/ImageName/ContainerName/ContainerID
# An example docker command to generate the above tag is
#        docker run --rm --log-driver=syslog  --log-opt tag="{{.ImageName}}/{{.Name}}/{{.ID}}" ubuntu echo atestwithouttag
#    Above command will log a message similar to the following
#        Jun  7 15:58:27 machine docker/ubuntu/trusting_dubinsky/b7d1373fccaf[20642]: atestwithouttag



# A very Simple way to filter messages
#:syslogtag, ereregex, "docker/ubuntu" /var/log/docker-syslog/syslog.log


# To create logging directories/filenames dynamically.
# Using latest template syntax (string type and regex). Below will create a log file like: /var/log/company/dockerapps/<containerName>/<containerID>/docker.log
#     %syslogtag:R,ERE,1,FIELD:docker/(.*)/(.*)/(.*)\\[--end%  == will generate ==> imageName
#     %syslogtag:R,ERE,2,FIELD:docker/(.*)/(.*)/(.*)\\[--end%  == will generate ==> containerName
#     %syslogtag:R,ERE,3,FIELD:docker/(.*)/(.*)/(.*)\\[--end%  == will generate ==> containerID
# template(name="Dockerlogfiles" type="string" string="/var/log/company/dockerapps/%syslogtag:R,ERE,2,FIELD:docker/(.*)/(.*)/(.*)\\[--end%/docker.log")
template(name="Dockerlogfiles" type="string" string="/var/log/company/dockerapps/%syslogtag:R,ERE,2,FIELD:docker/(.*)/(.*)/(.*)\\[--end%/%syslogtag:R,ERE,3,FIELD:docker/(.*)/(.*)/(.*)\\[--end%/docker.log")

# To format message that will be forwarded to remote syslog server
template (name="LongTagForwardFormat" type="string" string="<%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %syslogtag:::sp-if-no-1st-sp%%syslogtag%%msg:::sp-if-no-1st-sp%%msg%")

# Using Latest template syntax (String type and Field values)
#        /%syslogtag:F,47:1% => represents "docker" in $syslogtag. 47 is the ASCII decimal value of the / character
#        /%syslogtag:F,47:2% => represents ImageName.
#        /%syslogtag:F,47:3% => represents ContainerName.
#        /%syslogtag:F,47:4% => represents ContainerID.
# template(name="Dockerlogfiles" type="string" string="/var/log/company/dockerapps/%syslogtag:F,47:2%/%syslogtag:F,47:3%/docker.log")



# Using old template syntax and Using field values.
#        /%syslogtag:F,47:1% => represents "docker" in $syslogtag. 47 is the ASCII decimal value of the / character
#        /%syslogtag:F,47:2% => represents ImageName.
#        /%syslogtag:F,47:3% => represents ContainerName.
#        /%syslogtag:F,47:4% => represents ContainerID.
# $template Dockerlogfiles, "/var/log/company/dockerapps/%syslogtag:F,47:2%/docker.log"
# $template Dockerlogfiles, "/var/log/company/dockerapps/%syslogtag:F,47:2%/%syslogtag:F,47:3%/docker.log"




# Using latest template method (list type)
#     regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="1") == will generate ==> imageName
#     regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="2") == will generate ==> containerName
#     regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="3") == will generate ==> containerID
#template(name="Dockerlogfiles" type="list") {
#   constant(value="/var/log/company/dockerapps/")
#   property(name="syslogtag" securepath="replace" \
#            regex.expression="docker/\\(.*\\)/\\(.*\\)/\\(.*\\)\\[" regex.submatch="2")\
#   constant(value="/docker.log")
#
#}

if $syslogtag startswith "docker" then {
    # Local logging
    action(name="localFiles" type="omfile" DynaFile="Dockerlogfiles")

    # Remote logging to remote syslog server. 
    action(name="forwardToOtherSyslog" type="omfwd" Target="IP ADDRESS of Target Syslog server" Port="514" Protocol="udp" template="LongTagForwardFormat")
    &~
}

Wednesday, 10 February 2016

Firefox in Docker

Host OS : Fedora 22
Docker Version: 1.7.0-1

Steps:
$ mkdir firefox

$ cd firefox; touch Dockerfile

$ ( cat <<-EOF
FROM fedora
MAINTAINER Spare Slant "spareslant@gmail.com"
ENV REFRESHED_AT 2015-11-21
RUN dnf -y install firefox
RUN dnf -y install dejavu-sans-fonts dejavu-serif-fonts
RUN useradd --shell /bin/bash --uid 1000 -m testuser
USER testuser
ENV HOME /home/testuser
CMD ["/usr/bin/firefox", "--no-remote"]
EOF
) > Dockerfile

$ docker build -t="spareslant/firefox:v2" .
To run the above docker image:
docker run -it -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/machine-id:/etc/machine-id -e DISPLAY=$DISPLAY --name firefox spareslant/firefox:v2
A Firefox window will pop up. If you close Firefox, the docker process will also shut down. To start it again, run: docker start firefox