2011-12-01

Get the lines unique on the first field(s)

uniq is a great tool, since it returns the unique (adjacent) lines of the given input. But it has a limitation: it can't test for uniqueness on only the first N fields (while it does let you skip them with -f, weird).
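To make those two behaviours concrete, here is a small sketch with made-up sample input (the data is only an assumption for illustration): uniq collapses adjacent duplicates only, and -f skips leading fields instead of selecting them.

```shell
# uniq only collapses adjacent duplicates: the trailing "a" survives.
adjacent=$(printf '%s\n' a a b a | uniq)
printf '%s\n' "$adjacent"

# -f 1 skips the first field before comparing, so "x 1" and "y 1" count as equal.
skipped=$(printf '%s\n' 'x 1' 'y 1' 'z 2' | uniq -f 1)
printf '%s\n' "$skipped"
```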

So, what do you do if you have a long file whose lines have several fields, and you only want one line for each distinct combination of the first 2 fields (but with all the rest of the line's content)? awk to the rescue!

$ awk '!x[$1]++' file

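will print the (complete) first line of file seen for each distinct value of the first field; later lines with the same first field are dropped, and the input doesn't need to be sorted. Use x[$1,$2] to get lines unique on the first 2 fields, and so on. (Plain $1$2 also works, but it just glues the fields together, so "a bc" and "ab c" would count as the same key.) Thanks to this forum post; there are some other interesting articles too.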
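As a rough sketch of the two-field case, with made-up sample data and the array named seen instead of x (both are just illustrative choices): the comma in the subscript joins the fields with awk's SUBSEP, so keys that would look identical when simply concatenated stay distinct.

```shell
# Hypothetical sample input: three whitespace-separated fields per line.
# Keep the first line seen for each distinct ($1, $2) pair.
result=$(printf '%s\n' 'a b 1' 'a b 2' 'a c 3' 'ab c 4' 'a bc 5' |
  awk '!seen[$1,$2]++')
printf '%s\n' "$result"
```

Here only 'a b 2' is dropped, since it repeats the pair (a, b). With seen[$1$2] the last line would be dropped as well, because "ab" "c" and "a" "bc" both concatenate to "abc".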
