Wednesday, 2 December 2015

Find repeated group pattern matches: a case study of Ruby, Python and Go

Sometimes I have to find a group of patterns occurring in a log file. A plain grep -f <filename> does that job, but it does not print a delimiter that separates one occurrence of the group from the next.

I needed something that tells me it has found patterns from the group in sequence and that the last pattern has been found. At that point it should draw a delimiter and start searching for the group again in the rest of the file.

I first wrote it in Ruby. I had huge files to parse, and the Ruby script was taking quite long to finish. As a case study, I decided to write the same utility in Python and Go.

Needless to say, Go was much faster.

Following is the code.

Following are the sample data files.

Pattern file:
hotmail
yahoo
google
gmail

Subject file:
This is hotmail test
But this is going to be a yahoo
Now is google
Another one is gmail
Junk lines
more junk lines
MORRRRRRRRRR
This is hotmail test
But this is going to be a yahoo
Another one is gmail
Now is google
Another one is gmail
But this is going to be a yahoo
This is hotmail test
Junk lines
more junk lines
MORRRRRRRRRR
This is hotmail test
But this is going to be a yahoo
Now is google
Another one is gmail
Junk lines
more junk lines
MORRRRRRRRRR

Following is the output:
$ go run atest.go pattern subjectfile
This is hotmail test
But this is going to be a yahoo
Now is google
Another one is gmail
===========
This is hotmail test
But this is going to be a yahoo
Another one is gmail
===========
Now is google
Another one is gmail
===========
But this is going to be a yahoo
This is hotmail test
This is hotmail test
But this is going to be a yahoo
Now is google
Another one is gmail
===========