Quick bar-chart of disk usage

1 minute read

Today I was in search of a command that I had used a long time ago, but ran into a much more interesting one instead.  At the time, I must have needed to figure out which files were the biggest disk hogs and whether there was a long tail (i.e. how many of the 3.7M files in this directory--not my fault, by the way--were inconsequential).  That brings us to this wonderful "one-line" command:

find /dir/ -name "*.xml" -exec du -s {} \; | perl -ni -e 'if (/^(\d+)\s+(.*)/) { $h{$2} = $1; if ($max < $1) { $max = $1; } if (length($2) > $maxfname) { $maxfname = length($2); } } END { map { $barlen = ($h{$_} / $max) * 50; $bar = "*" x $barlen; printf ("%" . $maxfname . "s" . "(%5d): %s", $_, $h{$_}, $bar); print "\n"; } sort { $h{$b} <=> $h{$a} } keys %h }' 2> /dev/null > report.txt

What that does, specifically, is find every XML file under /dir/ and run the Linux du command on each one to get its size.  That list of sizes and filenames is piped to a hacky Perl script that pulls out each size, builds a horizontal histogram bar scaled against the maximum size (capped at 50 *s wide), and sorts the list from largest to smallest.  Lastly, the output is saved to report.txt.  An expanded, readable version of the same logic is sketched below.
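If you'd rather not squint at the one-liner, here is a rough sketch of the same Perl logic as a standalone script.  The filename du_histogram.pl is just a placeholder, and the script assumes du-style "size<whitespace>path" lines on standard input; it is an illustration of the idea, not the exact command above.

#!/usr/bin/perl
# du_histogram.pl (hypothetical name): read "size  path" lines from du on
# STDIN and print a bar chart, largest file first, widest bar = 50 asterisks.
#
# Example (assumed) usage:
#   find /dir/ -name "*.xml" -exec du -s {} \; | ./du_histogram.pl > report.txt
use strict;
use warnings;

my %size_of;                      # path => size reported by du
my ($max_size, $max_name_len) = (0, 0);

while (<STDIN>) {
    next unless /^(\d+)\s+(.*)/;  # du -s output: size, whitespace, path
    my ($size, $name) = ($1, $2);
    $size_of{$name} = $size;
    $max_size     = $size         if $size > $max_size;
    $max_name_len = length($name) if length($name) > $max_name_len;
}

exit 0 unless $max_size;          # nothing to chart (avoids divide-by-zero)

# Print largest first; each bar is scaled relative to the biggest file.
for my $name (sort { $size_of{$b} <=> $size_of{$a} } keys %size_of) {
    my $bar = '*' x (($size_of{$name} / $max_size) * 50);
    printf "%${max_name_len}s(%5d): %s\n", $name, $size_of{$name}, $bar;
}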

It's quite a quick-and-dirty trick, but it produces nice command-line output like this:

/dir/w6bz9whg.xml(36560): **************************************************
/dir/w6km312r.xml(31772): *******************************************
/dir/w68d03gz.xml(27728): *************************************
/dir/w6vt5fhv.xml(27076): *************************************
/dir/w6m07v80.xml(17420): ***********************
/dir/w68m0zj8.xml(15276): ********************
/dir/w6mq7qpz.xml(15052): ********************
/dir/w6vq30tq.xml(13808): ******************
/dir/w6tb51hr.xml(13160): *****************
...