Determine disk capacity and which files are consuming the most disk space

Author: KevinDKinsey

Reviewer: name contact BSD flavour


Concept

  • Be able to combine common Unix command line utilities to quickly determine which files are consuming the most disk space.

Introduction

As disk sizes have increased over the years, so has the amount of data that we seem to want to keep on them. At one time or another, you may be faced with the "too much data/not enough space" problem. How can you quickly find the "disk hogs"?

Use the tools!

The BSD systems are full of tools that can assist with this problem, including:

  • df(1) - "disk free"
  • du(1) - "disk usage"
  • find(1) - "walk a file hierarchy"

If you're using NetBSD, you can also get a df-type reading from systat(1). And with any BSD variant, using common "Unix-fu" (in particular, find and shell pipes), these commands can quickly produce useful information about disk usage.

df and du

For a quick summary of disk space, simply run df. Using "-c" with df adds an overall total. Using "-h" with either df or du produces "human-readable" output: sizes are scaled into K, M, G (kilobytes, megabytes, gigabytes) and so on, instead of being counted in blocks whose size is set by the BLOCKSIZE environment variable.
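
The following is a minimal sketch of those options, run against a scratch directory so the figures are predictable. It uses 768 KiB of data in two hypothetical files, a and b. It assumes a du(1) that honours the BLOCKSIZE environment variable, as the BSD du does, and gives the block size as a plain byte count.

```shell
#!/bin/sh
# Scratch directory with two files of known size (768 KiB in all).
tmp=$(mktemp -d)
dd if=/dev/urandom of="$tmp/a" bs=1024 count=512 2>/dev/null
dd if=/dev/urandom of="$tmp/b" bs=1024 count=256 2>/dev/null

# -h: human-readable sizes (K, M, G); -c: append a grand "total" line.
du -hc "$tmp/a" "$tmp/b"

# Without -h, du reports blocks: 1024-byte blocks with -k ...
k=$(du -sk "$tmp" | awk '{ print $1 }')
# ... or blocks of whatever size $BLOCKSIZE names.
b=$(BLOCKSIZE=1024 du -s "$tmp" | awk '{ print $1 }')
echo "du -sk: $k   BLOCKSIZE=1024 du -s: $b"

# With -c the last line is the grand total.
total=$(du -kc "$tmp/a" "$tmp/b" | tail -n 1)
echo "$total"

rm -rf "$tmp"
```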

Unlike df, you probably don't want to run du with no arguments. On its own, du lists the size of every subdirectory below your current working directory, recursively, in the order it walks the tree rather than by size. If you happen to be in "/", you'd be a long time reading the output. (Add "-a" if you want individual files listed as well.) Usually it's better to use du with "-s", possibly with a specific file or file "glob" argument, or with "-h" and maybe "-c", and pipe the output through sort(1). Look for a rather convoluted (yet effective) example below.
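
Here is that advice sketched against a scratch tree; the directory names big and small are hypothetical. Sizes are requested with -k rather than -h, so that sort -n compares plain numbers in a single unit.

```shell
#!/bin/sh
# Scratch tree: one large and one small subdirectory.
tmp=$(mktemp -d)
mkdir "$tmp/big" "$tmp/small"
dd if=/dev/urandom of="$tmp/big/file" bs=1024 count=2048 2>/dev/null
dd if=/dev/urandom of="$tmp/small/file" bs=1024 count=16 2>/dev/null

cd "$tmp" || exit 1
# -s: one line per argument; -k: plain kilobyte counts, so sort -n is exact.
report=$(du -sk -- * | sort -n)
printf '%s\n' "$report"

# After an ascending sort, the last line is the largest entry.
biggest=$(printf '%s\n' "$report" | tail -n 1 | awk '{ print $2 }')
echo "largest: $biggest"

cd / && rm -rf "$tmp"
```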

du reports on whatever file names it is given as arguments. That makes find, through its "-exec" primary or through xargs(1), a fairly useful "frontend" to du on occasion. (See the section on find below before you scratch your head too hard on this.)
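
Below is one way to hand find's results to du, sketched with hypothetical file names. find picks the candidates, and its "-exec ... {} +" form passes them to du in batches, much as xargs(1) would, but without trouble over spaces in file names. Here "-size +200" means more than 200 blocks of 512 bytes, that is, over 100 KiB.

```shell
#!/bin/sh
# Scratch tree with one large and one small file (names are hypothetical).
tmp=$(mktemp -d)
mkdir "$tmp/music" "$tmp/docs"
dd if=/dev/urandom of="$tmp/music/song.mp3" bs=1024 count=1024 2>/dev/null
dd if=/dev/urandom of="$tmp/docs/note.txt" bs=1024 count=4 2>/dev/null

# find selects regular files over 200 512-byte blocks (100 KiB);
# "-exec du -k {} +" runs du on them in batches, then sort orders by size.
hogs=$(find "$tmp" -type f -size +200 -exec du -k {} + | sort -n)
printf '%s\n' "$hogs"

rm -rf "$tmp"
```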

Note: under certain conditions, df and du may disagree somewhat about the amount of space in use on a filesystem. Generally, this occurs when a program holds an open file descriptor to a file that has been unlinked. In that case du no longer counts the file's size, but its blocks are still unavailable as "free blocks" (df = "disk free", remember?). In such cases, you can use fstat(1) to see the currently open files.
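
The following small demonstration reproduces that situation. It assumes a scratch file with the hypothetical name hog, which the shell itself keeps open on descriptor 3 after unlinking it. du stops counting the file at once. The data, and therefore the blocks df counts as used, survive until the descriptor is closed.

```shell
#!/bin/sh
tmp=$(mktemp -d)
dd if=/dev/urandom of="$tmp/hog" bs=1024 count=2048 2>/dev/null

exec 3< "$tmp/hog"    # keep the file open on descriptor 3
rm "$tmp/hog"         # unlink it: the name is gone ...

# ... so du, which walks names, no longer counts it.
after=$(du -sk "$tmp" | awk '{ print $1 }')
echo "du now reports ${after}K for $tmp"

# The 2 MiB of data is still allocated and readable through descriptor 3,
# so df keeps counting those blocks as used.
held=$(wc -c <&3 | awk '{ print $1 }')
echo "still readable via fd 3: $held bytes"
# On a BSD system, fstat -f "$tmp" would list open files on that
# filesystem, including this one.

exec 3<&-             # close it: only now are the blocks freed
rmdir "$tmp"
```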

find and the "size" primary

The complete use of find is beyond the scope of this section; please see "Find a file with a given set of attributes" for complete information. However, using the "size" primary with an expression representing a given file size, you can quickly produce a list of "disk hogs". See the Examples below.
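
The sketch below shows how the size primary counts, using a scratch file of exactly 1 MiB (1,048,576 bytes, or 2048 blocks of 512 bytes). A plain number is compared in 512-byte blocks, rounded up. A trailing "c" switches the comparison to bytes. A leading "+" means "more than" and a leading "-" means "less than". The helper name matches is hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
dd if=/dev/urandom of="$tmp/blob" bs=1024 count=1024 2>/dev/null

# Hypothetical helper: how many files match a given -size expression.
matches() { find "$tmp" -type f -size "$1" | awk 'END { print NR }'; }

bytes_over=$(matches +1048575c)   # 1: 1048576 bytes > 1048575
bytes_not=$(matches +1048576c)    # 0: "+" means strictly greater
blocks_eq=$(matches 2048)         # 1: exactly 2048 512-byte blocks
blocks_over=$(matches +2048)      # 0: not more than 2048 blocks
echo "$bytes_over $bytes_not $blocks_eq $blocks_over"

rm -rf "$tmp"
```

It prints "1 0 1 0".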

Examples

Are any partitions nearing "full"?

    $ df
    /dev/ad0s1a     1978    977    842    54%    /
    /dev/ad0s1e    67765  49502  12841    79%    /usr
    /dev/ad0s1d     3962   2182   1463    60%    /var

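To answer that question without eyeballing the columns, here is a sketch of a hypothetical helper named nearly_full. It asks df for the portable "-P" format, which gives one line per filesystem with the capacity in the fifth column and the mount point in the sixth. It then prints the filesystems above a percentage you choose. It assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: nearly_full LIMIT [FILE ...]
# Prints each filesystem whose capacity exceeds LIMIT percent.
nearly_full() {
    limit=$1
    shift
    df -P -k "$@" | awk -v limit="$limit" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)        # "79%" -> "79"
            if (pct + 0 > limit)
                print $6 ": " $5 " used"
        }'
}

# A limit of -1 always reports the filesystem holding "/".
root=$(nearly_full -1 /)
echo "$root"
```

With no file arguments, "nearly_full 90" would check every mounted filesystem against a 90% limit.
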
Display all the *.mp3 files in my home directory, and their sizes, with a total:

    $ du -sc "$HOME"/*.mp3

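List every subdirectory below the current directory, in order of size. Sizes are shown in kilobytes because sort -n compares only the leading number, so it would treat "900K" as larger than "2G":

    $ du -k | sort -n | more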

Here's a pretty wild set of pipes for du, showing the largest disk hogs (unless some are over 999MB; if so, change "M" to "G" in the regular expression). To see the smallest entries, use head rather than tail; for a complete listing, pipe the output to $PAGER instead of either. The "-n" option to sort(1) puts the sizes in numeric rather than alphabetical order. grep(1) then keeps only the lines whose sizes are in megabytes, so the lines that remain share one unit and their numeric order is meaningful:

    [root@server][/usr/src] # du -hc * | sort -n | grep "[0-9]M" | tail
     26M    crypto
     27M    contrib/binutils
     28M    release
     40M    sys/dev
     47M    contrib/gcc
    105M    sys
    204M    contrib
    458M    total

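The "M"/"G" juggling in that pipeline is needed only because sort -n cannot compare the suffixed figures that -h prints. The sketch below takes another approach, written as a hypothetical shell function named hogs. It asks du for plain kilobytes, sorts those, and converts to megabytes only for display, so entries of any size land in the right order.

```shell
#!/bin/sh
# Hypothetical helper: hogs [DIR [N]]
# Shows the N largest entries below DIR (defaults: . and 10), sizes in MB.
hogs() {
    (cd "${1:-.}" && du -k -- *) |
        sort -n |
        tail -n "${2:-10}" |
        awk '{
            kb = $1
            sub(/^[0-9]+[ \t]+/, "")   # what is left is the (possibly spaced) name
            printf "%8.1fM  %s\n", kb / 1024, $0
        }'
}

# Demo on a scratch tree (directory names are hypothetical).
tmp=$(mktemp -d)
mkdir "$tmp/big" "$tmp/small"
dd if=/dev/urandom of="$tmp/big/file" bs=1024 count=2048 2>/dev/null
dd if=/dev/urandom of="$tmp/small/file" bs=1024 count=16 2>/dev/null
out=$(hogs "$tmp" 1)
echo "$out"
rm -rf "$tmp"
```

For the /usr/src example, the call would be "hogs /usr/src 10".
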
But this brings us to the relative power of find(1). A similar report could be produced like this (find all files below the current directory larger than approximately 900MB):

    # find . -size +940000000c

The main differences between this command's output and that of the piped arrangement above are that find doesn't report the actual sizes, and the list isn't sorted. Note that if you're using FreeBSD, you can append a scale suffix ("[kMGTP]") to the size, thus: "find . -size +900M".
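
To get find's precision together with sizes and ordering, one option is to let find select the files and let ls(1) report on them. Below is a sketch, written as a hypothetical function named big_files, that sorts "ls -ln" lines on their fifth field (the size in bytes). The threshold uses the portable "c" (bytes) suffix; FreeBSD users could swap in the scale suffixes mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: big_files [DIR [BYTES]]
# Lists regular files larger than BYTES (default 940000000), largest last.
big_files() {
    find "${1:-.}" -type f -size +"${2:-940000000}"c -exec ls -ln {} + |
        sort -n -k 5,5
}

# Demo on a scratch directory with hypothetical file names.
tmp=$(mktemp -d)
dd if=/dev/urandom of="$tmp/one.iso" bs=1024 count=1024 2>/dev/null
dd if=/dev/urandom of="$tmp/two.iso" bs=1024 count=2048 2>/dev/null
dd if=/dev/urandom of="$tmp/tiny.txt" bs=1024 count=1 2>/dev/null
list=$(big_files "$tmp" 100000)
printf '%s\n' "$list"
rm -rf "$tmp"
```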

Practice Exercises

  1. Use df to see if your hard drives are nearing "full".
  2. Use find to find out who in /home/ is the biggest "disk hog". (Optional: use grep to see if any of these files are "mp3"s.)
  3. Use du along with sort(1) and grep(1) to produce lists of files by size.

More information

du(1), df(1), find(1), sort(1), and, for NetBSD, systat(1)