[olug] sort -u vs uniq

Matthew G. Marsh olug4mgm at paktronix.com
Mon Mar 13 10:50:28 CDT 2017

Hmmm - 2 replies in one year.

Also take a look at sort -k, which sorts on "keys" defined by fields. In 
sort you can set the field delimiter with -t and then use -k keys to sort 
on individual fields or ranges of fields.

Thus if you define the delimiter as "." (the dot in IPv4), you have four 
fields on which to sort AND you can sort on subsets of those fields (very 
handy for MAC addresses and IPv6 sorting, with ":" as the delimiter).
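A rough Python model of what that buys you for IPv4. This is my sketch, not sort's code. With GNU sort the command would be something like `sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -u`: split on ".", compare each octet numerically, and keep one line per distinct address. The helper name `sort_unique_ipv4` and the sample addresses are made up, and the sketch assumes every line is a well-formed dotted quad.

```python
def sort_unique_ipv4(lines):
    """Sort dotted-quad IPv4 strings octet by octet and drop duplicates.

    Models `sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -u`: split each line on
    ".", compare the four fields as integers, keep one line per key.
    Assumes every line is a well-formed IPv4 address (no validation here).
    """
    def key(line):
        # One integer per field, so 69 sorts before 108 and 1 before 42.
        return tuple(int(field) for field in line.strip().split("."))

    result = []
    prev = None
    for line in sorted(lines, key=key):
        k = key(line)
        if k != prev:  # like -u: emit only the first line of each equal-key run
            result.append(line.strip())
            prev = k
    return result


if __name__ == "__main__":
    addrs = ["108.78.42.145", "69.38.74.12", "108.78.1.1", "108.78.42.145"]
    for addr in sort_unique_ipv4(addrs):
        print(addr)
```

That prints 69.38.74.12, 108.78.1.1 and 108.78.42.145, one per line.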

BTW FWIW - most *nix man pages are very lame regarding these features as 
they want you to use *bleep* info files. I long ago converted info to man 
and then added in the POSIX spec man pages (.p custom extensions) in case 
I needed to know the full story.
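As for the original question, my reading of GNU sort (worth checking against your own version) is that Lou's guess is on the right track. -n compares only the leading number on each line, and with "." as the decimal point the leading number of 108.78.42.145 is 108.78. Add -u and every line whose key compares equal counts as a duplicate, so all addresses sharing the first two octets collapse into one. By contrast, uniq after sort -n only merges identical neighboring lines. The Python below is a simplified model of that, not sort's actual code: the helper names are made up, thousands separators and locale collation are ignored, and ties in the `sort -n | uniq` model are broken by plain string comparison (C-locale-like).

```python
import re

# Leading number as GNU `sort -n` reads it when the decimal point is ".":
# optional blanks, optional "-", digits, optional "." plus more digits.
# Simplification: thousands separators are not handled.
LEADING_NUMBER = re.compile(r"\s*(-?\d*(?:\.\d*)?)")


def numeric_key(line):
    """Approximate the value `sort -n` compares for a line (hypothetical helper)."""
    text = LEADING_NUMBER.match(line).group(1)
    try:
        return float(text)  # "108.78.42.145" -> 108.78
    except ValueError:
        return 0.0  # no leading number at all: treated as zero


def sort_nu(lines):
    """Model `sort -nu`: lines whose leading numbers are equal count as duplicates."""
    out, prev = [], None
    for line in sorted(lines, key=numeric_key):
        k = numeric_key(line)
        if k != prev:
            out.append(line)
            prev = k
    return out


def sort_n_uniq(lines):
    """Model `sort -n | uniq`: ties broken by the whole line, only identical lines merged."""
    out = []
    for line in sorted(lines, key=lambda s: (numeric_key(s), s)):
        if not out or line != out[-1]:
            out.append(line)
    return out


if __name__ == "__main__":
    addrs = ["108.78.42.145", "108.78.1.1", "69.38.74.12", "108.78.42.145"]
    print(sort_nu(addrs))      # only one 108.78.x.x address survives
    print(sort_n_uniq(addrs))  # both 108.78.x.x addresses survive
```

Splitting on the dot with -t and -k, as described above, sidesteps this because each octet becomes its own numeric key.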



On Mon, 13 Mar 2017, Lou Duchez wrote:

> This page might have some useful information:
> http://unix.stackexchange.com/questions/75341/specify-the-sort-order-with-lc-collate-so-lowercase-is-before-uppercase
> As to what you experienced, I know I was once surprised to see a PostgreSQL 
> statement sort data differently between a Windows server and a Linux server 
> -- it's a vague memory, but I think Linux was evaluating sort order by 
> looking for a numeric component that precedes the rest of the string (so 
> "2beornottobe" was sorting before "1234imdeclaringathumbwar" because "2" is 
> less than "1234").  Is that what Linux is doing for you?  With a string like 
> "108.78.42.145", maybe Linux sees that as "108.78" followed by ".42.145".
> Can you foil this nefarious behavior by sorting by a non-numeric character 
> prefixed to the IP addresses, somehow?  I bet not even "C" can mis-sort 
> "A108.78.42.145" and "A69.38.74.12".
>> I'm trying to get a list of uniq IP addresses from a log file. I have a 
>> list of ALL IP addresses. Using sort -nu and sort -n | uniq give me 2 
>> different lists.
>> A stare and compare makes me think that sort -nu only considers the first 2 
>> octets as significant. RTFM of the sort man page indicates sort honors 
>> <appear uninformed>
>> LC_COLLATE isn't in env, so I'm assuming it's set at build/compile time 
>> when building sort or in the C libraries someplace?
>> </appear uninformed -- hardly, stupid probably better tag... and not 
>> closed.>
>> Could this be why the sort -u and uniq return differing output? I don't see 
>> anyplace to specify "how much" to consider significant when running sort. 
>> Anyone care to offer thoughts?
>> Thanks.
>> Noel
>> _______________________________________________
>> OLUG mailing list
>> OLUG at olug.org
>> https://lists.olug.org/mailman/listinfo/olug

Matthew G. Marsh
Special Email Addr for OLUG ;-}
Phone: (402) 932-7250
Email: olug4mgm at paktronix.com
WWW:  http://www.paksecured.org