Provided by: agedu_9723-1build1_amd64 bug

NAME

       agedu - correlate disk usage with last-access times to identify large and disused data

SYNOPSIS

       agedu [ options ] action [action...]

DESCRIPTION

       agedu scans a directory tree and produces reports about how much disk space is used in each directory and
       subdirectory, and also how that usage of disk space corresponds to files with last-access  times  a  long
       time ago.

       In  other  words,  agedu  is  a  tool you might use to help you free up disk space. It lets you see which
       directories are taking up the most space, as du does; but unlike du, it also distinguishes between  large
       collections of data which are still in use and ones which have not been accessed in months or years - for
       instance, large archives downloaded, unpacked, used once, and never cleaned up. Where du helps  you  find
       what's using your disk space, agedu helps you find what's wasting your disk space.

       agedu  has several operating modes. In one mode, it scans your disk and builds an index file containing a
       data structure which allows it to efficiently retrieve any information  it  might  need.  Typically,  you
       would  use it in this mode first, and then run it in one of a number of `query' modes to display a report
       of the disk space usage of a particular directory and its subdirectories. Those reports can  be  produced
       as  plain  text  (much like du) or as HTML. agedu can even run as a miniature web server, presenting each
       directory's HTML report with hyperlinks to let you navigate around the file system to similar reports for
       other directories.

       So  you  would  typically  start  using agedu by telling it to do a scan of a directory tree and build an
       index. This is done with a command such as

       $ agedu -s /home/fred

       which will build a large data file called agedu.dat in your current directory. (If that current directory
       is inside /home/fred, don't worry - agedu is smart enough to discount its own index file.)

       Having  built  the index, you would now query it for reports of disk space usage. If you have a graphical
       web browser, the simplest and nicest way to query the index is by running agedu in web server mode:

       $ agedu -w

       which will print (among other messages) a URL on its standard output along the lines of

       URL: http://127.0.0.1:48638/

       (That URL will always begin with `127.', meaning that it's  in  the  localhost  address  space.  So  only
       processes  running  on  the  same  computer can even try to connect to that web server, and also there is
       access control to prevent other users from seeing it - see below for more detail.)

       Now paste that URL into your web browser, and you will be shown a graphical representation  of  the  disk
       usage  in  /home/fred  and its immediate subdirectories, with varying colours used to show the difference
       between disused and recently-accessed data. Click on any subdirectory to descend into it and see a report
       for  its  subdirectories  in  turn;  click  on  parts of the pathname at the top of any page to return to
       higher-level directories. When you've finished browsing, you can just press Ctrl-D to send an end-of-file
       indication to agedu, and it will shut down.

       After  that,  you  probably want to delete the data file agedu.dat, since it's pretty large. In fact, the
       command agedu -R will do this for you; and you can chain agedu commands on the same command line, so that
       instead of the above you could have done

       $ agedu -s /home/fred -w -R

       for  a single self-contained run of agedu which builds its index, serves web pages from it, and cleans it
       up when finished.

       If you don't have a graphical web browser,  you  can  do  text-based  queries  as  well.  Having  scanned
       /home/fred as above, you might run

       $ agedu -t /home/fred

       which  again  gives  a summary of the disk usage in /home/fred and its immediate subdirectories; but this
       time agedu will print it on standard output, in much the same format as du. If you then want to find  out
       how  much  old data is there, you can add the -a option to show only files last accessed a certain length
       of time ago. For example, to show only files which haven't been looked at in six months or more:

       $ agedu -t /home/fred -a 6m

       That's the essence of what agedu does. It has other modes of operation for more complex  situations,  and
       the  usual array of configurable options. The following sections contain a complete reference for all its
       functionality.

OPERATING MODES

       This section describes the operating modes supported by agedu. Each of these is in the form of a command-
       line  option, sometimes with an argument. Multiple operating-mode options may appear on the command line,
       in which case agedu will perform the specified actions one after another. For instance, as shown  in  the
       previous  section,  you  might  want  to  perform  a disk scan and immediately launch a web server giving
       reports from that scan.

       -s directory or --scan directory
              In this mode, agedu scans the file system starting at the specified  directory,  and  indexes  the
              results of the scan into a large data file which other operating modes can query.

              By  default,  the  scan  is restricted to a single file system (since the expected use of agedu is
              that you would probably use it because a particular disk partition was running low on space).  You
              can  remove that restriction using the --cross-fs option; other configuration options allow you to
              include or exclude files or entire subdirectories from the scan. See the  next  section  for  full
              details of the configurable options.

              The  index  file is created with restrictive permissions, in case the file system you are scanning
              contains confidential information in its structure.

              Index files are dependent on the characteristics of the CPU architecture you created them on.  You
              should not expect to be able to move an index file between different types of computer and have it
              continue to work. If you need to transfer the results of a  disk  scan  to  a  different  kind  of
              computer, see the -D and -L options below.

       -w or --web
              In  this  mode,  agedu expects to find an index file already written. It allocates a network port,
              and starts up a web server on that port which serves reports generated from  the  index  file.  By
              default it invents its own URL and prints it out.

              The web server runs until agedu receives an end-of-file event on its standard input. (The expected
              usage is that you run it from  the  command  line,  immediately  browse  web  pages  until  you're
              satisfied, and then press Ctrl-D.) To disable the EOF behaviour, use the --no-eof option.

              In  case  the  index  file  contains  any confidential information about your file system, the web
              server protects the pages it  serves  from  access  by  other  people.  On  Linux,  this  is  done
              transparently  by  means  of  using  /proc/net/tcp to check the owner of each incoming connection;
              failing that, the web server will require a password to view the reports, and agedu will print the
              password it invented on standard output along with the URL.

              Configurable  options for this mode let you specify your own address and port number to listen on,
              and also specify your own choice of authentication method (including  turning  authentication  off
              completely) and a username and password of your choice.

       -t directory or --text directory
              In  this  mode, agedu generates a textual report on standard output, listing the disk usage in the
              specified directory and all its subdirectories down to a given depth. By default that depth is  1,
              so  that  you  see  a report for directory itself and all of its immediate subdirectories. You can
              configure a different depth (or no depth limit) using -d, described in the next section.

              Used on its own, -t merely lists the total disk usage in  each  subdirectory;  agedu's  additional
              ability to distinguish unused from recently-used data is not activated. To activate it, use the -a
              option to specify a minimum age.

              The directory structure stored in agedu's index file is treated as a set of literal strings.  This
              means  that  you  cannot  refer to directories by synonyms. So if you ran agedu -s ., then all the
              path names you later pass to the -t option must be either  `.'  or  begin  with  `./'.  Similarly,
              symbolic  links  within  the  directory  you  scanned will not be followed; you must refer to each
              directory by its canonical, symlink-free pathname.

       -R or --remove
              In this mode, agedu deletes its index file.  Running  just  agedu  -R  on  its  own  is  therefore
              equivalent  to  typing  rm agedu.dat. However, you can also put -R on the end of a command line to
              indicate that agedu should delete its index file after it finishes performing other operations.

       -D or --dump
              In this mode, agedu reads an existing index file and produces a dump of its contents  on  standard
              output. This dump can later be loaded into a new index file, perhaps on another computer.

       -L or --load
              In  this  mode, agedu expects to read a dump produced by the -D option from its standard input. It
              constructs an index file from that dump, exactly as it would have if it had  read  the  same  data
              from a disk scan in -s mode.

       -S directory or --scan-dump directory
              In  this  mode,  agedu  will scan a directory tree and convert the results straight into a dump on
              standard output, without generating an index file at all. So running agedu -S /path should produce
              equivalent  output to that of agedu -s /path -D, except that the latter will produce an index file
              as a side effect whereas -S will not.

              (The output will not be exactly identical, due to a difference in treatment of  last-access  times
              on  directories.  However,  it  should  be  effectively  equivalent  for  most  purposes.  See the
              documentation of the --dir-atime option in the next section for further detail.)

       -H directory or --html directory
              In this mode, agedu will generate an HTML report of the disk usage in the specified directory  and
              its immediate subdirectories, in the same form that it serves from its web server in -w mode.

              By  default, a single HTML report will be generated and simply written to standard output, with no
              hyperlinks pointing to other similar pages. If you also specify the -d option (see  below),  agedu
              will  instead write out a collection of HTML files with hyperlinks between them, and call the top-
              level file index.html.

       --cgi  In this mode, agedu will run as the bulk of a CGI script which provides the same set of web  pages
              as the built-in web server would. It will read the usual CGI environment variables, and write CGI-
              style data to its standard output.

              The actual CGI program itself should be a tiny wrapper around agedu  which  passes  it  the  --cgi
              option, and also (probably) -f to locate the index file. agedu will do everything else.

              No  access  control  is performed in this mode: restricting access to CGI scripts is assumed to be
              the job of the web server.

       -h or --help
              Causes agedu to print some help text and terminate immediately.

       -V or --version
              Causes agedu to print its version number and terminate immediately.

OPTIONS

       This section describes the various configuration options that affect agedu's operation  in  one  mode  or
       another.

       The following option affects nearly all modes (except -S):

       -f filename or --file filename
              Specifies  the  location  of the index file which agedu creates, reads or removes depending on its
              operating mode. By default, this is  simply  `agedu.dat',  in  whatever  is  the  current  working
              directory when you run agedu.

       The following options affect the disk-scanning modes, -s and -S:

       --cross-fs and --no-cross-fs
              These configure whether or not the disk scan is permitted to cross between different file systems.
              The default is not to: agedu will normally skip over subdirectories  on  which  a  different  file
              system  is  mounted.  This makes it convenient when you want to free up space on a particular file
              system which is running low. However, in  other  circumstances  you  might  wish  to  see  general
              information about the use of space no matter which file system it's on (for instance, if your real
              concern is your backup media running out of space, and if your backups do not treat different file
              systems specially); in that situation, use --cross-fs.

              (Note that this default is the opposite way round from the corresponding option in du.)

       --prune wildcard and --prune-path wildcard
              These  cause particular files or directories to be omitted entirely from the scan. If agedu's scan
              encounters a file or directory whose name matches the wildcard provided to the --prune option,  it
              will not include that file in its index, and also if it's a directory it will skip over it and not
              scan its contents.

              Note that in most Unix shells, wildcards will probably need to be escaped on the command line,  to
              prevent the shell from expanding the wildcard before agedu sees it.

              --prune-path  is  similar  to  --prune,  except  that  the  wildcard is matched against the entire
              pathname instead of just the filename at the end of it. So whereas --prune *a*b*  will  match  any
              file  whose  actual  name contains an a somewhere before a b, --prune-path *a*b* will also match a
              file whose name contains b and which is inside a directory containing an a, or any file  inside  a
              directory of that form, and so on.

       --exclude wildcard and --exclude-path wildcard
              These  cause  particular files or directories to be omitted from the index, but not from the scan.
              If agedu's scan encounters a file or directory whose name matches the  wildcard  provided  to  the
              --exclude  option, it will not include that file in its index - but unlike --prune, if the file in
              question is a directory it will still scan its contents and index them if they are not  ruled  out
              themselves by --exclude options.

              As  above, --exclude-path is similar to --exclude, except that the wildcard is matched against the
              entire pathname.

       --include wildcard and --include-path wildcard
              These cause particular files or directories to be re-included in the index and the scan,  if  they
              had  previously  been  ruled  out by one of the above exclude or prune options. You can interleave
              include, exclude and prune options as you wish on the command line, and if more than one  of  them
              applies to a file then the last one takes priority.

              For example, if you wanted to see only the disk space taken up by MP3 files, you might run

              $ agedu -s . --exclude '*' --include '*.mp3'

              which will cause everything to be omitted from the scan, but then the MP3 files to be put back in.
              If you then wanted only a subset of those MP3s, you could then  exclude  some  of  them  again  by
              adding,  say, `--exclude-path './queen/*'' (or, more efficiently, `--prune ./queen') on the end of
              that command.

              As with the previous two options, --include-path is similar to --include except that the  wildcard
              is matched against the entire pathname.

       --progress, --no-progress and --tty-progress
              When  agedu is scanning a directory tree, it will typically print a one-line progress report every
              second showing where it has reached in the scan, so you can have some idea of how much  longer  it
              will take. (Of course, it can't predict exactly how long it will take, since it doesn't know which
              of the directories it hasn't scanned yet will turn out to be huge.)

              By default, those progress reports are displayed  on  agedu's  standard  error  channel,  if  that
              channel  points  to a terminal device. If you need to manually enable or disable them, you can use
              the above three options to do so: --progress unconditionally enables the progress  reports,  --no-
              progress  unconditionally disables them, and --tty-progress reverts to the default behaviour which
              is conditional on standard error being a terminal.

       --dir-atime and --no-dir-atime
              In normal operation, agedu ignores the atimes (last access times) on the directories it scans:  it
              only pays attention to the atimes of the files inside those directories. This is because directory
              atimes tend to be reset by a lot of system administrative tasks, such as cron jobs which scan  the
              file system for one reason or another - or even other invocations of agedu itself, though it tries
              to avoid modifying any atimes if possible. So the literal atimes on directories are typically  not
              representative of how long ago the data in question was last accessed with real intent to use that
              data in particular.

              Instead, agedu makes up a fake atime for every directory it scans, which is equal  to  the  newest
              atime of any file in or below that directory (or the directory's last modification time, whichever
              is newest). This is based on the  assumption  that  all  important  accesses  to  directories  are
              actually accesses to the files inside those directories, so that when any file is accessed all the
              directories on the path leading to it should be considered to have been accessed as well.

              In unusual cases it is possible that a directory itself  might  embody  important  data  which  is
              accessed  by  reading the directory. In that situation, agedu's atime-faking policy will misreport
              the directory as disused. In the unlikely event that such directories form a significant  part  of
              your disk space usage, you might want to turn off the faking. The --dir-atime option does this: it
              causes the disk scan to read the original atimes of the directories it scans.

              The faking of atimes on directories also requires a processing pass over the index file after  the
              main  disk  scan is complete. --dir-atime also turns this pass off. Hence, this option affects the
              -L option as well as -s and -S.

              (The previous section mentioned that there might be subtle differences between the output of agedu
              -s /path -D and agedu -S /path. This is why. Doing a scan with -s and then dumping it with -D will
              dump the fully faked atimes on the directories, whereas doing a scan-to-dump  with  -S  will  dump
              only  partially  faked  atimes - specifically, each directory's last modification time - since the
              subsequent processing pass will not have had a chance to take place. However,  loading  either  of
              the  resulting  dump  files  with -L will perform the atime-faking processing pass, leading to the
              same data in the index file in each case. In normal usage it should be safe to ignore all of  this
              complexity.)

       --mtime
              This  option  causes  agedu  to  index files by their last modification time instead of their last
              access time. You might want to use this if your last access times were completely useless for some
              reason: for example, if you had recently searched every file on your system, the system would have
              lost all the information about what files you hadn't recently accessed  before  then.  Using  this
              option is liable to be less effective at finding genuinely wasted space than the normal mode (that
              is, it will be more likely to flag things as disused when they're  not,  so  you  will  have  more
              candidates  to go through by hand looking for data you don't need), but may be better than nothing
              if your last-access times are unhelpful.

              Another use for this mode might be to find recently created large data.  If  your  disk  has  been
              gradually filling up for years, the default mode of agedu will let you find unused data to delete;
              but if you know your disk had plenty of space recently and now it's suddenly full, and you suspect
              that  some  rogue program has left a large core dump or output file, then agedu --mtime might be a
              convenient way to locate the culprit.

       The following option affects all the modes that generate reports: the web server mode -w, the stand-alone
       HTML generation mode -H and the text report mode -t.

       --files
              This option causes agedu's reports to list the individual files in each directory, instead of just
              giving a combined report for everything that's not in a subdirectory.

       The following option affects the text report mode -t.

       -a age or --age age
              This option tells agedu to report only files of at least the specified age. An age is specified as
              a number, followed by one of `y' (years), `m' (months), `w' (weeks) or `d' (days). (This syntax is
              also used by the -r option.) For example, -a 6m will produce a text  report  which  includes  only
              files at least six months old.

       The following options affect the stand-alone HTML generation mode -H and the text report mode -t.

       -d depth or --depth depth
              This  option  controls  the  maximum  depth to which agedu recurses when generating a text or HTML
              report.

              In text mode, the default is 1, meaning that the report will include the directory  given  on  the
              command  line and all of its immediate subdirectories. A depth of two includes another level below
              that, and so on; a depth of zero means only the directory on the command line.

              In HTML mode, specifying this option switches agedu from writing out a single HTML file to writing
              out  multiple files which link to each other. A depth of 1 means agedu will write out an HTML file
              for the given directory and also one for each of its immediate subdirectories.

              If you want agedu to recurse as deeply as possible, give the special word `max' as an argument  to
              -d.

       -o filename or --output filename
              This  option  is  used to specify an output file for agedu to write its report to. In text mode or
              single-file HTML mode, the argument is treated as the name of a file. In multiple-file HTML  mode,
              the  argument  is treated as the name of a directory: the directory will be created if it does not
              already exist, and the output HTML files will be created inside it.

       The following options affect the web server mode  -w,  and  in  some  cases  also  the  stand-alone  HTML
       generation mode -H:

       -r age range or --age-range age range
              The  HTML  reports produced by agedu use a range of colours to indicate how long ago data was last
              accessed, running from red (representing  the  most  disused  data)  to  green  (representing  the
              newest).  By  default, the lengths of time represented by the two ends of that spectrum are chosen
              by examining the data file to see what range of ages appears in it. However, you might want to set
              your own limits, and you can do this using -r.

              The  argument  to  -r consists of a single age, or two ages separated by a minus sign. An age is a
              number, followed by one of `y' (years), `m' (months), `w' (weeks) or `d' (days). (This  syntax  is
              also  used  by  the -a option.) The first age in the range represents the oldest data, and will be
              coloured red in the HTML; the second age represents the newest, coloured green. If the second  age
              is  not  specified, it will default to zero (so that green means data which has been accessed just
              now).

              For example, -r 2y will mark data in red if it has been unused for two years or more, and green if
              it has been accessed just now. -r 2y-3m will similarly mark data red if it has been unused for two
              years or more, but will mark it green if it has been accessed three months ago or later.

       --address addr[:port]
              Specifies the network address and port number on which agedu should listen when  running  its  web
              server. If you want agedu to listen for connections coming in from any source, specify the address
              as the special value ANY. If the port number is omitted, an arbitrary unused port will  be  chosen
              for you and displayed.

              If  you  specify  this  option,  agedu  will  not  print its URL on standard output (since you are
              expected to know what address you told it to listen to).

       --auth auth-type
              Specifies how agedu should control access to the web pages it serves. The options are as follows:

              magic  This option only works on Linux, and only when the incoming connection  is  from  the  same
                     machine  that agedu is running on. On Linux, the special file /proc/net/tcp contains a list
                     of network connections currently known to the operating system kernel, including which user
                     id  created  them.  So  agedu will look up each incoming connection in that file, and allow
                     access if it comes from the same user id under which agedu itself is running. Therefore, in
                     agedu's  normal web server mode, you can safely run it on a multi-user machine and no other
                     user will be able to read data out of your index file.

              basic  In this mode, agedu will use HTTP Basic authentication: the user will  have  to  provide  a
                     username  and  password  via  their  browser.  agedu  will  normally make up a username and
                     password for the purpose, but you can specify your own; see below.

              none   In this mode, the web server is unauthenticated: anyone connecting to it has full access to
                     the  reports generated by agedu. Do not do this unless there is nothing confidential at all
                     in your index file, or unless you are certain that nobody but you can run processes on your
                     computer.

              default
                     This  is  the default mode if you do not specify one of the above. In this mode, agedu will
                     attempt to use Linux  magic  authentication,  but  if  it  detects  at  startup  time  that
                     /proc/net/tcp  is  absent  or  non-functional  then  it  will fall back to using HTTP Basic
                     authentication and invent a user name and password.

       --auth-file filename or --auth-fd fd
              When agedu is using HTTP Basic authentication, these options allow you to specify  your  own  user
              name  and password. If you specify --auth-file, these will be read from the specified file; if you
              specify --auth-fd they will instead be read from a given file descriptor  which  you  should  have
              arranged  to  pass  to  agedu.  In  either  case, the authentication details should consist of the
              username, followed by a colon, followed by the password, followed immediately by end of  file  (no
              trailing newline, or else it will be considered part of the password).

       --title title
              Specify  the string that appears at the start of the <title> section of the output HTML pages. The
              default is `agedu'. This title is followed by a colon and then the path you're viewing within  the
              index  file.  You  might  use  this option if you were serving agedu reports for several different
              servers and wanted to make it clearer which one a user was looking at.

       --no-eof
              Stop agedu in web server mode from looking for end-of-file on standard input and treating it as  a
              signal to terminate.

LIMITATIONS

       The data file is pretty large. The core of agedu is the tree-based data structure it uses in its index in
       order to efficiently perform the queries it needs; this data structure requires O(N log N) storage.  This
       is  larger  than  you  might expect; a scan of my own home directory, containing half a million files and
       directories and about 20Gb of data, produced an index file over 60Mb in size. Furthermore, since the data
       file must be memory-mapped during most processing, it can never grow larger than available address space,
       so a really big filesystem may need to be indexed on a 64-bit computer.  (This  is  one  reason  for  the
       existence of the -D and -L options: you can do the scanning on the machine with access to the filesystem,
       and the indexing on a machine big enough to handle it.)

       The data structure also does not usefully permit access control within the data  file,  so  it  would  be
       difficult  -  even  given  the willingness to do additional coding - to run a system-wide agedu scan on a
       cron job and serve the right subset of reports to each user.

       In certain circumstances, agedu can report false positives (reporting files as disused which are in  fact
       in use) as well as the more benign false negatives (reporting files as in use which are not). This arises
       when a file is, semantically speaking, `read' without actually  being  physically  read.  Typically  this
       occurs  when  a  program checks whether the file's mtime has changed and only bothers re-reading it if it
       has; programs which do this include rsync(1) and make(1). Such programs will fail to update the atime  of
       unmodified  files  despite depending on their continued existence; a directory full of such files will be
       reported as disused by agedu even in situations where deleting them will cause trouble.

       Finally, of course, agedu's normal usage mode depends critically on the OS  providing  last-access  times
       which  are  at  least approximately right. So a file system mounted with Linux's `noatime' option, or the
       equivalent on any other OS, will not give useful results! (However, the Linux  mount  option  `relatime',
       which  distributions  now  tend  to  use  by  default, should be fine for all but specialist purposes: it
       reduces the accuracy of last-access times so that they might be wrong by up to 24 hours,  but  if  you're
       looking for files that have been unused for months or years, that's not a problem.)

LICENCE

       agedu  is  free software, distributed under the MIT licence. Type agedu --licence to see the full licence
       text.