How to find huge files in Unix?

Published in Unix File Management 6 mins read

To find huge files in Unix, you can leverage several powerful command-line utilities, most notably find, du, and ls, to pinpoint large files and directories that might be consuming excessive disk space.

Identifying large files in Unix-like systems is crucial for managing disk space, troubleshooting performance issues, and maintaining system health. Here's a breakdown of the most effective methods.

1. Using find to Locate Large Files System-Wide

The find command is the most versatile tool for searching for files based on various criteria, including size, across your file system.

Key find Options for Size:

  • -size N[cwbkMG]: Specifies the file size.
    • c: bytes
    • w: two-byte words
    • b: 512-byte blocks (default)
    • k: Kilobytes (1024 bytes)
    • M: Megabytes (1024 * 1024 bytes)
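    • G: Gigabytes (1024 * 1024 * 1024 bytes)
    • Use +N for files larger than N units and -N for files smaller than N units; a bare N matches exactly N units. GNU find rounds sizes up to whole units before comparing, so -size -1M matches only empty files.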
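The suffixes and the +N comparison can be checked against scratch files of known size. The following is a minimal sketch, assuming a find that accepts the M suffix (GNU and BSD find do). The demo_dir variable and the file names are made up for the illustration.

```shell
# Scratch directory with three files of known size (names are illustrative).
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/small.bin" bs=1024 count=10   2>/dev/null  # 10 KiB
dd if=/dev/zero of="$demo_dir/exact.bin" bs=1024 count=1024 2>/dev/null  # exactly 1 MiB
dd if=/dev/zero of="$demo_dir/big.bin"   bs=1024 count=3072 2>/dev/null  # 3 MiB

# +1M is a strict comparison, so the file of exactly 1 MiB is not reported.
larger_than_1m=$(find "$demo_dir" -type f -size +1M | sort)

# The c suffix compares byte counts: only files above 10240 bytes are reported.
larger_than_10k=$(find "$demo_dir" -type f -size +10240c | sort)

printf 'Larger than 1M:\n%s\n' "$larger_than_1m"
printf 'Larger than 10240 bytes:\n%s\n' "$larger_than_10k"
```

Only big.bin passes the +1M test, while both exact.bin and big.bin pass +10240c. Remove the scratch directory afterwards with rm -r "$demo_dir".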

Examples:

  • Find all files larger than 100MB in the current directory and its subdirectories:

    find . -type f -size +100M -print0 | xargs -0 -r du -h | sort -rh
    • find . -type f -size +100M: Searches for regular files (-type f) larger than 100 Megabytes (+100M) starting from the current directory (.).
    • -print0: Prints the file names separated by a null character, which is safer for file names with spaces or special characters.
    • xargs -0 -r du -h: Reads the null-separated file names and passes them to du -h (disk usage in human-readable format). The -r option stops GNU xargs from running du with no arguments when nothing matches, which would otherwise report the size of the current directory.
    • sort -rh: Sorts the output in reverse (-r) human-readable (-h) order, placing the largest files at the top.
  • Find files larger than 500MB under the /var directory and display their details:

    sudo find /var -type f -size +500M -exec ls -lh {} \;
    • sudo find /var: Searches from the /var directory, using sudo to ensure permissions for system directories.
    • -exec ls -lh {} \;: Executes ls -lh for each found file. {} is a placeholder for the filename, and \; terminates the exec command.
  • Find files larger than 1GB, limiting the search depth to 2 levels:

    find /home/user -maxdepth 2 -type f -size +1G -print0 | xargs -0 -r du -h
    • -maxdepth 2: Descends at most two levels below /home/user, so only files directly in /home/user or in one of its immediate subdirectories are considered.
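The find pipelines above can be wrapped in a small reusable function. This is only a sketch: largest_files is a hypothetical helper name, and it swaps xargs for find's own -exec ... {} + form, which batches file names in the same way but runs nothing when no file matches. It also uses du -k with a numeric sort instead of du -h with sort -h, on the assumption that sort -h may not be available everywhere.

```shell
# Hypothetical helper: list the largest regular files above a size threshold.
# usage: largest_files DIR SIZE [COUNT]    e.g. largest_files /var +100M 5
# The body runs in a subshell (parentheses) so its variables stay local.
largest_files() (
    dir=$1
    min_size=$2
    count=${3:-10}
    # du -k prints "<size in KiB><TAB><path>", so a numeric sort orders by size.
    # Errors such as "Permission denied" are discarded.
    find "$dir" -type f -size "$min_size" -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
)
```

For example, largest_files /home/user +1G 5 would print up to five files larger than 1 GB, largest first, with sizes in KiB.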

For more details on the find command, you can refer to resources like Linux find command examples.

2. Analyzing Disk Usage with du

The du (disk usage) command is excellent for summarizing file space usage, especially for directories. It can help you identify which directories are the biggest culprits.

Key du Options:

  • -h: Human-readable format (e.g., 1K, 234M, 2G).
  • -s: Summarize total for each argument.
  • -a: Display counts for all files, not just directories.
  • --max-depth=N: Print the total for a directory (or file, with -a) only if it is N levels deep or less.

Examples:

  • Show the total size of the current directory:

    du -sh .
  • List the sizes of immediate subdirectories and files in human-readable format, sorted by size:

    du -ah --max-depth=1 | sort -rh
    • du -ah --max-depth=1: Shows disk usage for all files and directories (-a) up to one level deep (--max-depth=1) in human-readable format.
    • sort -rh: Sorts the output in reverse human-readable order.
  • Find the top 10 largest directories in a specific path:

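    du -h /var/log | sort -rh | head -n 10
    • du -h /var/log: Prints the cumulative size of /var/log and of every directory beneath it, so the first line of the sorted output is /var/log itself.
    • head -n 10: Displays only the first 10 lines of the sorted output.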
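The directory ranking above can also be written as a small function. This is a sketch under a few assumptions: top_dirs is a hypothetical helper name, sizes are reported in KiB (du -k) so that a plain numeric sort works where sort -h is unavailable, and -x is added to keep du on a single file system.

```shell
# Hypothetical helper: rank a directory and its subdirectories by disk usage.
# usage: top_dirs [DIR] [COUNT]    e.g. top_dirs /var/log 10
top_dirs() (
    dir=${1:-.}
    count=${2:-10}
    # -x: do not cross into other mounted file systems
    # -k: report sizes in KiB so sort -rn orders them correctly
    du -xk "$dir" 2>/dev/null | sort -rn | head -n "$count"
)
```

Because du reports cumulative totals, the first line is always the starting directory itself; the lines after it are the subdirectories worth inspecting.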

You can learn more about the du command from resources like TutorialsPoint du command.

3. Listing Files by Size in a Directory with ls

While find is for system-wide searches and du for directory summaries, ls is useful for quickly inspecting the contents of a specific directory and sorting them by size.

Key ls Options:

  • -l: Long listing format (shows permissions, owner, size, date, etc.).
  • -S: Sorts by file size, largest first.
  • -h: Human-readable sizes with -l.
  • -I PATTERN (GNU ls, long form --ignore=PATTERN): Do not list entries matching the shell PATTERN. This is an uppercase i, not a lowercase L: with GNU ls, -IS is read as "ignore entries named S" and does not sort by size at all, so use -lS for a long listing sorted by size.

Examples:

  • To list the directory contents in descending file size order, use the ls command with the -lS options. The larger files appear at the top of the list, descending to the smallest files at the bottom.

    ls -lS

    This lists the entries in the current directory in long format, sorted by size in descending order.

  • List files in the current directory with human-readable sizes, sorted by size (largest first), and ignore .log files:

    ls -lhS --ignore='*.log'
    • -l: Long format.
    • -h: Human-readable sizes.
    • -S: Sorts by size (largest first).
    • --ignore='*.log': Excludes entries ending with .log (a GNU ls option).
  • List all files in /tmp by size:

    ls -lSh /tmp
    • This is a common and practical way to list files in long format with human-readable sizes, sorted by size.
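When only the names of the biggest entries are needed, for example to feed them to another command, the size sort can be combined with head. A minimal sketch, where biggest_in_dir is a hypothetical helper name:

```shell
# Hypothetical helper: names of the largest entries directly inside a directory.
# usage: biggest_in_dir [DIR] [COUNT]    e.g. biggest_in_dir /tmp 5
biggest_in_dir() (
    dir=${1:-.}
    count=${2:-5}
    # -S: sort by size, largest first
    # -1: one name per line
    # -A: include dotfiles but skip . and ..
    ls -1AS "$dir" | head -n "$count"
)
```

ls sorts a directory by the size of the directory entry itself, not by the total size of what it contains, so this is best suited to directories that mostly hold plain files. File names containing newlines would also break the one-name-per-line output.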

For more detailed usage of ls, check out Linuxize ls command.

4. Interactive Disk Usage with ncdu

For a more interactive and visual approach, ncdu (NCurses Disk Usage) is a powerful utility that provides a curses-based interface to show disk usage. It's often not installed by default but is available in most package repositories.

Example:

ncdu /

This command will scan your root file system and present an interactive, sortable list of directories and files by size, allowing you to easily navigate and identify large consumers of space.

You can find more information about ncdu at its official project page.

Summary of Commands

Command | Primary Use Case | Key Options for Size
find | System-wide search for files based on criteria. | -size, -type f, -maxdepth, -exec, -print0
du | Summarize disk usage for directories and files. | -h, -s, -a, --max-depth
ls | List directory contents, can sort by size. | -l, -h, -S, -lS
ncdu | Interactive, visual disk usage analyzer. | (No direct size options; interactive)

Best Practices and Tips

  • Start with broader searches: Begin with high-level directories (e.g., /, /home, /var) using du or find to pinpoint large areas, then drill down.
  • Combine commands: Piping commands together (e.g., find ... -print0 | xargs -0 du -h | sort -rh) is highly effective for detailed analysis.
  • Be cautious: Before deleting any files, especially in system directories, always confirm their purpose and impact. Deleting critical system files can render your system unusable.
  • Name matching: find can combine -size with -name, which takes a shell glob pattern such as '*.log', or with -regex (available in GNU and BSD find), which matches a regular expression against the whole path.
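Combining a name pattern with a size test might look like the sketch below. The helper name large_by_name and the example arguments are hypothetical. It only prints the matches, so they can be reviewed before anything is removed.

```shell
# Hypothetical helper: regular files under DIR whose name matches a shell glob
# and whose size satisfies a find -size expression.
# usage: large_by_name DIR GLOB SIZE    e.g. large_by_name /var/log '*.log' +50M
large_by_name() (
    dir=$1
    glob=$2
    size=$3
    # The glob is passed quoted so that find, not the shell, expands it.
    find "$dir" -type f -name "$glob" -size "$size" -print 2>/dev/null
)
```

Quoting the pattern at the call site matters as well: an unquoted *.log would be expanded by the shell against the current directory before find ever sees it.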

By mastering these Unix commands, you can efficiently locate and manage huge files on your system, ensuring optimal disk utilization and system performance.