Writing an effective GROK pattern

May 5, 2021

Grok is one of the most popular Logstash filters. It parses unstructured log data into structured, named fields.

Logstash ships with about 120 built-in patterns. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Some of the patterns are also collected at https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns
I personally prefer this second link when constructing a grok pattern.

There will be cases where none of these built-in patterns fit. For those, grok accepts regular expressions written for Oniguruma, the regex library it is built on, and these can be combined with the built-in patterns to create powerful patterns.

Grok Syntax

%{SYNTAX:SEMANTIC}

SYNTAX is the name of the built-in grok pattern that should match the text
SEMANTIC is the key (field name) under which the matched text is stored
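
To make the substitution concrete, here is a minimal Python sketch of how a %{SYNTAX:SEMANTIC} reference could be expanded into a named regex group. The PATTERNS table and the expand helper are hypothetical and hold simplified definitions, not the ones Logstash ships. Python spells named groups (?P<name>...) instead of Oniguruma's (?<name>...).

```python
import re

# Hypothetical, simplified pattern table for illustration only; the real
# definitions in logstash-patterns-core are more thorough.
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
}

# Matches %{SYNTAX} or %{SYNTAX:SEMANTIC}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(grok):
    """Replace each %{SYNTAX:SEMANTIC} reference with a regex group."""
    def repl(m):
        syntax, semantic = m.group(1), m.group(2)
        body = PATTERNS[syntax]
        if semantic:
            # A SEMANTIC was given, so capture the text under that key
            return f"(?P<{semantic}>{body})"
        # No SEMANTIC: match the text but do not store it
        return f"(?:{body})"
    return GROK_REF.sub(repl, grok)

regex = expand("%{WORD:method} %{INT:status}")
match = re.fullmatch(regex, "GET 200")
print(regex)
print(match.groupdict())
```

Every captured value comes back as a string, which is also how grok stores fields unless a type conversion is requested.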

Oniguruma Syntax

(?<field_name>regex pattern)

field_name is the key under which the matched text is stored
regex pattern is a placeholder for your own regular expression
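
If you want to try an Oniguruma-style named group outside Logstash, the sketch below rewrites it into Python's spelling and runs it. The oniguruma_to_python helper, the request_id field and the sample string are all made up for illustration, and the rewrite is deliberately naive: it would also touch an escaped \(?< sequence.

```python
import re

def oniguruma_to_python(pattern):
    """Rewrite (?<name>...) as (?P<name>...).

    Lookbehinds, (?<= and (?<!, are left untouched.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)

# Hypothetical field: an 8-character lowercase hex id
onig = r"(?<request_id>[0-9a-f]{8})"
py = oniguruma_to_python(onig)

m = re.search(py, "id=deadbeef done")
print(py)
print(m.group("request_id"))
```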

How to use?

Let’s try to create a pattern to parse unstructured log data.

Sample Log Data

09:33:45,416 (metrics-logger-reporter-1-thread-1) type=GAUGE, name=notifications.received, value=2

Required fields from log data

timestamp, logthread, type, name, value

Grok Pattern

We will use Grok Debugger to test our pattern to match the log data.

Let’s break the log data down field by field and find a pattern that matches each one:

The logthread field (the thread name inside the parentheses) is a combination of letters, digits, and hyphens.

So we use Oniguruma syntax for logthread. Following that syntax, we need a regex pattern that matches the value of the field.

Constructing Regex Pattern

We now use Regex Checker to construct and test the regex pattern for the value of the logthread field.

The (?:[()a-zA-Z\d-]+) non-capturing group matches one or more characters from the list below:

+ greedy match, i.e. matches the previous token between one and unlimited times, as many times as possible
() matches a literal ( or ) character
a-z matches a single character in the range between a and z
A-Z matches a single character in the range between A and Z
\d matches a digit
- matches the character -
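
A quick way to sanity-check the character class outside a regex checker is a few lines of Python. This is only a sketch: the second input, worker_7, is a made-up value that shows what happens when a character falls outside the class.

```python
import re

# Same character class as above; this syntax is valid in Python as well
thread_re = re.compile(r"(?:[()a-zA-Z\d-]+)")

value = "metrics-logger-reporter-1-thread-1"
full = thread_re.fullmatch(value)

# The underscore is not in the class, so matching stops in front of it
partial = thread_re.match("worker_7")

print(full.group(0))
print(partial.group(0))
```

If your thread names can contain underscores or dots, the class needs those characters added.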

Oniguruma

The final Oniguruma pattern for the field logthread:

(?<logthread>(?:[()a-zA-Z\d-]+))

Grok Pattern + Oniguruma (Final Pattern)

The final pattern that will match the log data:

%{TIME:timestamp} \((?<logthread>(?:[()a-zA-Z\d-]+))\) type=%{DATA:type}, name=%{DATA:name}, value=%{POSINT:value}
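
For readers who want to check the pattern without Logstash, here is a rough Python translation. TIME, DATA and POSINT below are simplified stand-ins I am assuming for this sketch, not the exact definitions from logstash-patterns-core, and the named groups use Python's (?P<name>...) spelling.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions for this sketch)
TIME = r"\d{2}:\d{2}:\d{2}(?:[:.,]\d+)?"
DATA = r".*?"
POSINT = r"[1-9]\d*"

LINE_RE = re.compile(
    rf"(?P<timestamp>{TIME}) "
    r"\((?P<logthread>(?:[()a-zA-Z\d-]+))\) "
    rf"type=(?P<type>{DATA}), name=(?P<name>{DATA}), value=(?P<value>{POSINT})"
)

line = (
    "09:33:45,416 (metrics-logger-reporter-1-thread-1) "
    "type=GAUGE, name=notifications.received, value=2"
)

fields = LINE_RE.match(line).groupdict()
print(fields)
```

Note that logthread is greedy and its class includes ")", so the regex engine first swallows the closing parenthesis and then backtracks one character to let the literal \) match.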

Output of the pattern

{
  "timestamp": [
    [
      "09:33:45,416"
    ]
  ],
  "HOUR": [
    [
      "09"
    ]
  ],
  "MINUTE": [
    [
      "33"
    ]
  ],
  "SECOND": [
    [
      "45,416"
    ]
  ],
  "logthread": [
    [
      "metrics-logger-reporter-1-thread-1"
    ]
  ],
  "type": [
    [
      "GAUGE"
    ]
  ],
  "name": [
    [
      "notifications.received"
    ]
  ],
  "value": [
    [
      "2"
    ]
  ]
}
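
The debugger wraps every value in nested arrays and also lists the sub-patterns of TIME (HOUR, MINUTE, SECOND). As far as I know, the Logstash grok filter stores only named captures by default (its named_captures_only option), so the event would carry just the five named fields. The Python sketch below mimics that flattening; NAMED_FIELDS and the int cast are my own additions for illustration (in grok, a suffix such as %{POSINT:value:int} requests the conversion).

```python
import json

# Debugger output from above, compacted onto fewer lines
debugger_output = json.loads("""
{
  "timestamp": [["09:33:45,416"]],
  "HOUR": [["09"]],
  "MINUTE": [["33"]],
  "SECOND": [["45,416"]],
  "logthread": [["metrics-logger-reporter-1-thread-1"]],
  "type": [["GAUGE"]],
  "name": [["notifications.received"]],
  "value": [["2"]]
}
""")

# Keys that were given a SEMANTIC or an Oniguruma group name in the pattern
NAMED_FIELDS = ["timestamp", "logthread", "type", "name", "value"]

# Unwrap the nested arrays and drop the unnamed sub-pattern captures
event = {field: debugger_output[field][0][0] for field in NAMED_FIELDS}

# Captures are strings; cast the gauge value so it can be used as a number
event["value"] = int(event["value"])
print(event)
```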

Conclusion

Grok patterns and Oniguruma work well together: the built-in patterns cover the common fields, and a custom regex covers whatever they miss. The pairing can turn most complex logs into structured data. Give Grok Pattern + Oniguruma a try in Logstash!

Let me know in the comments if you know a better way of doing this or if you run into any problem with the above example.

Originally published at https://dev.to on May 5, 2021.