Monday, 12 July 2021

Using capture groups in grep in Linux

Introduction

Let me start by saying that this article isn't about capture groups in grep per se. What we are going to do here with grep is something similar to capture groups in other programming languages: print only the text that matched, not the entire line containing it.
I'm writing this up because I was trying to accomplish a task with a Perl one-liner, and when I couldn't get that to work I turned back to grep. I was pleasantly surprised to get the desired result with relative ease.

The scenario

I needed to filter the output of the df command and then print just the matched text. Here's a quick grep showing the lines I was interested in.

df -hTP | grep 'db'

/dev/mapper/ethoss0db_vg-lvol1                   ext4   126G   85G   35G  71% /ethoss0db/db
/dev/mapper/yuris0db_vg-lvol1                   ext4    32G   22G  8.3G  73% /yuris0db/db

The pieces of text I need are ethoss0 and yuris0, which I intended to use further down the line in another script.

GREP to the rescue

To use some enhanced regular expression magic with grep, we have to awaken the Perl within it, i.e. use the -oP flags with the grep command. Note that -P is a GNU grep option and needs a grep built with PCRE support.

-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

-P, --perl-regexp
Interpret PATTERN as a Perl regular expression.

Here is the final grep command I used to get the desired output.

df -hTP | grep -oP '% /\K(.*)(?=db/db$)'

ethoss0
yuris0

So little typing! Elegant, isn't it?
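Since the names were meant to feed another script, here is a minimal sketch of how that hand-off might look. The function name extract_db_names and the name= output format are hypothetical, made up for illustration, and two sample lines copied from the df output above stand in for a live df call.

```shell
# Hypothetical helper: reads df-style lines on stdin and prints one
# extracted name per line, using the same pattern as above.
extract_db_names() {
    grep -oP '% /\K.*(?=db/db$)'
}

# Sample lines copied from the df output shown earlier.
# In real use this would be: df -hTP | extract_db_names
sample='/dev/mapper/ethoss0db_vg-lvol1                   ext4   126G   85G   35G  71% /ethoss0db/db
/dev/mapper/yuris0db_vg-lvol1                   ext4    32G   22G  8.3G  73% /yuris0db/db'

# Process each extracted name; printing stands in for the real work.
report=$(printf '%s\n' "$sample" | extract_db_names | while IFS= read -r name; do
    printf 'name=%s\n' "$name"
done)

printf '%s\n' "$report"
```

Reading the names line by line with while read avoids relying on word splitting, which matters if a mount point ever contains spaces.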

Now let's break it down

  • The magic is in \K, which drops everything matched before it (here the literal "% /") from the reported match. The prefix still has to match, but \K tells the engine to pretend that the match attempt started at this position.
  • In the expression (?=db/db$), the (?=<expression>) construct is a zero-width positive lookahead. It asserts that the expression follows immediately after the matched text but does not include the expression itself in the match, so the greedy (.*) stops just before the trailing db/db. The parentheses around .* are optional here, since -o prints the whole match rather than a group.
  • \K is not a lookbehind in the strict sense, but it plays the same role as a zero-width lookbehind. Combining it with the zero-width lookahead, i.e. (?=<expr>), lets us print only the text in between rather than the whole line.
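To see the breakdown in action without depending on a particular machine's df output, the sketch below runs the pattern against the two sample lines shown earlier and compares \K with an explicit lookbehind. The variable names are made up for illustration, and GNU grep with -P support is assumed.

```shell
# Two sample lines copied from the df output shown earlier, so the
# example does not depend on the local filesystem layout.
sample='/dev/mapper/ethoss0db_vg-lvol1                   ext4   126G   85G   35G  71% /ethoss0db/db
/dev/mapper/yuris0db_vg-lvol1                   ext4    32G   22G  8.3G  73% /yuris0db/db'

# \K version: "% /" must match, but is dropped from the reported text.
with_k=$(printf '%s\n' "$sample" | grep -oP '% /\K.*(?=db/db$)')

# Lookbehind version: (?<=% /) asserts the same fixed-length prefix.
with_lookbehind=$(printf '%s\n' "$sample" | grep -oP '(?<=% /).*(?=db/db$)')

printf '%s\n' "$with_k"
printf '%s\n' "$with_lookbehind"
```

Both commands print the same two names for this input. The practical difference is that a PCRE lookbehind must have a fixed length, whereas the text before \K can be any pattern.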


Conclusion

I hope you found this article useful, and that you will consider exploring all of grep's options before resorting to something else when working with regular expressions in your future scripts.
