Some thoughts about manpages in Debian

 2005 may 15

 by Siward de Groot


 Debian, like other OS-s, uses man system to manage and display system's manual pages.
 Nearly every package comes with a manpage,
   that it installs on system when package itself is installed.

 I read a lot of manpages, and often see ways in which they could be improved ;
   for individual manpages i can of course file a wishlist bug,
   but there are some more general things as well,
   which i talk about in section 'Manpage format'.

 On a larger scale, manpage system itself could be improved.
 'man 7 man' even says that it is due for an overhaul
   (and it has said that for a long time now).
 I talk about this in section 'Man system'.

 I do not claim to have a complete overview of what should be changed.
 If you have suggestions, they are welcome.
 My email address is

 In this document i use -s to indicate plural of nouns that don't have a standard plural,
   eg: 486-s means '80486 processors'.

Significance of man system to Debian

 Software needs documentation.
 Documentation needs to be findable.
 Man does both.

 There is documentation in other formats :
   GNU info (i wonder why it isn't called ginfo, or gnufo),

 Compared to these, manpages are crude and look ancient.
 But man system has one thing they can't provide :
   it is standard place to look for documentation.

 Fact that it is home base for documentation is already well established,
   there is a -doc policy that says every program must have a manpage
     (and i only saw one manpage that contained nothing but 'i need to be written'),
   and although i would like manpages for libraries too,
     i don't think that is realizable in foreseeable future.
     (but then again, how much of future is foreseeable ? :-)

 In rest of this document, i ramble about how man system could be improved.

The man system

 The man system keeps manual pages
   in /usr/share/man ,
   in a compressed format, to save disk space ;
 They are compressed with gzip,
   which is same format manpages are in when distributed in Debian packages.
 This is nice.

 When uncompressed, these files are roff sourcefiles
   (roff is a text-formatting format understood by programs nroff and troff).
 These can be converted to readable text by program groff (which uses troff).
 Gunzip and groff take time to convert a manpage to readable text,
   so system has ability to cache (temporarily store) already converted manpages ;
   these can be displayed much quicker than when they first need to be converted.
     On my 200 MHz Pentium :
     $ time man man > /tmp/dump
     real 0m0.470s
     user 0m0.292s
     sys 0m0.147s
 Debian Sarge requires at least an 80486, which can be 3 times as slow,
   so for low end systems it would be nice if this could be improved upon.
 Most of time is used by groff ; gunzip alone takes less than a tenth of it :
     # time gunzip -c /usr/share/man/man1/man.1.gz > /tmp/dump
     real 0m0.040s
     user 0m0.010s
     sys 0m0.014s

 Groff is a capable text formatter, but its capabilities are not used.
 All Debian's manpages are limited to using a small set of formatting elements :
   * Header (specify text to print at top and bottom of pages)
   * Section (specify name of a section, which is printed in bold uppercase)
   * Indent (always indents by multiples of 7 spaces,
       text is indented 7 spaces by default, but extra indents can be specified anywhere ;
       text that does not fit on a displayline is wrapped around to current indent)
   * Bold
   * Italic or underline (which one is used depends on capabilities of terminal,
       in an xterm, underline is used, which is most readable for small characters)
 These elements are described in 'man 7 man', which advises against using any others.

 Groff could be replaced by a much faster and simpler program, if
   - roff sourcefiles don't use any other elements than these (current manpages don't)
   - these format specifiers are in a standard format (currently many manpages use roff macros)
 Roff sourcefiles are reasonably human-readable,
   and for manpages that use macros it usually suffices to
   remove the macros (they are usually at start of file)
   and make some small changes to rest of file.
 There are a few manpages that are not as easy to convert to a simple format,
   for example, roff supports displaying tables, and iirc some manpages use it
   (if you know more cases in which nroff capabilities are advantageous, please let me know).
 It is always possible to convert tables to plain text, because manpage contents is constant,
   although it may mean replacing boxes around elements by ascii-art,
   but it is not trivial to write a script that does that,
   and it may even be useless to try to write one,
   because such a script would need to understand roff,
   and it might use same amount of time groff does.

 For manpages that don't use macros, a fast conversion program could fairly easily be written ;
   conversion itself would be trivial,
   but i don't know how easy it is to
     get terminal capabilities and
     output correct control sequences to produce bold and underline.
 This would save time on low-end systems ;
   i think on anything slower than my machine it is a nuisance right now,
   but as i haven't heard any complaints about it,
   i imagine Debian has few users with 486-s.
 If Debian moves on to use a more modern manpage format,
   time spent on writing such a script would be wasted.
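
 As a rough illustration of how small such a converter could be, here is a minimal sketch in Python.
 It is not a replacement for groff : it only knows macros .TH .SH .B .I .RS .RE ,
   it does not join source lines into filled paragraphs,
   and it hardcodes ANSI escape sequences instead of asking terminal for its capabilities
   (that is an assumption ; a real program would have to look them up).
 Function name 'render' is made up for this example.

```python
import textwrap

# Assumption: terminal understands ANSI escapes for bold and underline.
BOLD, UNDERLINE, RESET = "\033[1m", "\033[4m", "\033[0m"
INDENT = 7  # indentation step that man output uses


def render(source, width=78):
    """Render a small subset of man macros as text with ANSI attributes."""
    out = []
    level = 1  # body text is indented one step by default
    for line in source.splitlines():
        if line.startswith('.\\"'):  # roff comment line
            continue
        pad = " " * (INDENT * level)
        if line.startswith("."):
            macro, _, arg = line[1:].partition(" ")
            arg = arg.strip().strip('"')
            if macro == "TH":
                out.append(arg)
            elif macro == "SH":
                level = 1
                out.extend(["", BOLD + arg.upper() + RESET])
            elif macro == "B":
                out.append(pad + BOLD + arg + RESET)
            elif macro == "I":
                out.append(pad + UNDERLINE + arg + RESET)
            elif macro == "RS":
                level += 1
            elif macro == "RE":
                level = max(1, level - 1)
            continue  # any other macro is silently ignored
        text = line.replace("\\-", "-")  # roff spelling of a dash
        out.extend(textwrap.wrap(text, width=width,
                                 initial_indent=pad, subsequent_indent=pad))
    return "\n".join(out)
```

 Manpages that use tables or their own macros would fall outside this subset,
   which is exactly the limitation discussed above.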

 Debian's manpage system has possibility to
   convert a set of manpages and store them in text format (in /var/catman).
 This is useful on large multi-user systems,
   because often-used manpages need be converted only once,
   though it costs some diskspace.
 On my (single user) system, /usr/share/man contains 15 MB,
   when uncompressed these would take about 3 times as much,
   so it would cost ca 50 MB, which is negligible on modern disks.

 All manpages contain as first elements :
   name of program(s) this manpage documents, and
   a (very) short description of what they do.
 This description is known as a 'whatis'.
 It is this description that is shown when user types 'whatis <name>'.

 Man system keeps a .db database of all installed manpagenames and whatis-s (in /var/man ) ;
   it is searched when user invokes 'whatis' or 'apropos'.
   (On my system it takes 1.3 MB of diskspace, which is negligible).
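
 To make idea of a name&whatis index concrete, here is a small sketch in Python.
 It does not read real .db file ; it builds an in-memory index from already formatted NAME lines,
   assuming they look like 'name1, name2 - description'.
 Function names 'parse_name_line' and 'whatis' are invented for this example.

```python
def parse_name_line(line, section):
    """Split 'gzip, gunzip - compress files' into {(name, section): whatis}."""
    names, sep, description = line.partition(" - ")
    if not sep:  # no dash found: not a usable NAME line
        return {}
    return {(name.strip(), section): description.strip()
            for name in names.split(",") if name.strip()}


def whatis(index, name):
    """Return one formatted line per installed manpage called `name`."""
    return [f"{n} ({s}) - {d}"
            for (n, s), d in sorted(index.items()) if n == name]
```

 Because index is keyed on name and section together,
   several manpages of same name can coexist in it.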

 Man system also creates a set of directories under /var/man ,
   for each directory under /usr/share/man there is one with same name under /var/man ,
   and i wonder why, because on my system they are completely empty.
 This is not a problem, i just wish xemacs would show that a directory is empty,
   so i wouldn't have had to open and close all 15 of them.

 User can read manpages by giving command 'man <name>'
   on a terminal.
 This is not always most preferable way to read a manpage,
   because several manpages of same name can exist,
   and 'man' does not tell about others.
 User can see which manpages of a given name are installed on system with 'whatis <name>' .
 This produces a list of manpages, with their sectionnumber and whatis,
   so user can subsequently read a selected manpage with 'man' command.

 An alternative way to interact with user would be :
   Make 'man' always show list of available manpages if there is more than one,
     and make it wait for user to type a sectionnumber,
     after which it displays this manpage.
   If some users don't like this, it could be made user-settable with a shell variable.
 I think i would somewhat prefer this,
   but doubt it is significant enough to be worth implementing.

 Beside 'man' and 'whatis', man system also provides command 'apropos',
   which has syntax: apropos <string>
 This causes man to search its name&whatis database for strings that match <string> ,
   where <string> is considered to be a regular expression.
 'Man apropos' does not tell what kind of regular expression this is,
   it is probably not Perl, so it would be egrep or fgrep's kind of regexp,
   but which one of these it is, i do not know ;
   i'll file a bug asking to add that info.

 Default interpretation of this string as a regexp is useful,
   but in cases that this produces many matches,
   i often would like to limit them to matches that are whole words.
 I don't know how to do that with grep regexps,
   and even if i did, i would prefer to have a commandline option to specify that ;
   this would also make system easier to use for newbies.
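
 What such a whole-word option could do is sketched below in Python.
 This uses Python's own regexp syntax (where \b matches a word boundary), not that of apropos,
   and index is simply a dictionary of name to whatis ;
   function name and 'whole_word' parameter are hypothetical.

```python
import re


def apropos(index, keyword, whole_word=False):
    """Return names whose name or whatis matches `keyword` (a regexp)."""
    # With whole_word set, match must start and end at a word boundary.
    pattern = rf"\b(?:{keyword})\b" if whole_word else keyword
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(name for name, description in index.items()
                  if rx.search(name) or rx.search(description))
```

 With an index containing cat, zcat and catman,
   a plain search for 'cat' returns all three, a whole-word search only the first.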

 There are also cases where apropos does not find manpages i am looking for.
 For these cases i would like it to also search whole text of all manpages.
 It should not do this by default of course.
 I think only way to get results from such a search in a reasonable amount of time
   would be by using a database.
 I have done some preliminary experiments with mysql, but have not yet completed that ;
   if you have experience how well this works, please let me know.
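
 For illustration, a word index for such a full-text search could look like sketch below.
 It uses SQLite from Python's standard library instead of mysql, purely to keep example self-contained ;
   table layout and function names are assumptions, not a description of any existing tool.

```python
import re
import sqlite3


def build_index(pages):
    """Index {page name: full text} as one (word, page) row per distinct word."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE words (word TEXT, page TEXT)")
    db.execute("CREATE INDEX words_by_word ON words (word)")
    for page, text in pages.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        db.executemany("INSERT INTO words VALUES (?, ?)",
                       [(word, page) for word in words])
    db.commit()
    return db


def search(db, *terms):
    """Return sorted names of pages that contain every one of `terms`."""
    wanted = sorted({term.lower() for term in terms})
    if not wanted:
        return []
    marks = ",".join("?" * len(wanted))
    rows = db.execute(
        f"SELECT page FROM words WHERE word IN ({marks}) "
        "GROUP BY page HAVING COUNT(DISTINCT word) = ? ORDER BY page",
        wanted + [len(wanted)])
    return [page for (page,) in rows]
```

 Index is built once (for example when packages are installed),
   so a search only has to look up a few words instead of reading every manpage.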

 When user has told man to do so, man displays manpage.
 Because manpages are often longer than what fits in terminal window,
   man invokes a pager to display them.
 By default this pager is 'less'.
 It works well.
 A small improvement would be if it would tell how many lines manpage consists of,
   but for most manpages this is not really needed,
   because you can page-down to end to find that out.
 When looking at very long manpages, like man bash,
   it would be nice to have some warning that
   it is not something that can simply be read through to the end.

 Man can also be invoked with its output redirected to a file,
   and it detects this automatically, so that in that case it does not invoke pager.
   This is nice.

 Format produced in this way is not very human-readable however,
   because it contains special ways to indicate bold and underline/italic :
 Bold letter A is represented as A^HA , where ^H is character 8, backspace ;
   this is compatible with old teletype machines,
   which would backspace and print same character again,
   thus producing bold print.
 Italic/underscore letter A is represented as _^HA, also a simple overstrike.
 If produced output is meant to be human readable, then these encodings should not be in it,
   so i assume it is rather meant for printing on an ancient printer.
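
 Removing these overstrike encodings afterwards is simple, as this small Python sketch shows.
 It assumes input uses only the two encodings described above (X^HX and _^HX) ;
   function name is made up.

```python
import re


def strip_overstrike(text):
    """Drop every 'character + backspace' pair, keeping what was printed last."""
    return re.sub(r".\x08", "", text)
```

 Both bold (A^HA) and underline (_^HA) collapse to a plain A this way.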

 Output includes linebreaks that would be appropriate for 77-column wide text,
   so newlines are inserted where they would normally not appear.
 This makes it impossible to reliably detect what line looked like before it was wrapped.
 This format could be changed, to make such detection possible,
   by representing a level of indentation by a tab character,
   and representing indentation of a wrapped line by equivalent amount of spaces.
 Tab characters are not used in current output format, afaik.
 Indentationlevel is currently represented as 7 spaces ;
   maybe that is default for ancient printers.
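
 The tab proposal can be sketched in Python as follows.
 This is only an illustration of the idea, under assumption that
   every logical line has an indentation level of at least 1 (which is man's default),
   so that a line starting with a tab always begins a new logical line.
 Function names are invented.

```python
import textwrap

STEP = 7  # spaces per indentation level in displayed text


def wrap_line(text, level, width=77):
    """Wrap one logical line; first line gets tabs, continuations get spaces."""
    pad = " " * (STEP * level)
    lines = textwrap.wrap(text, width=width,
                          initial_indent=pad, subsequent_indent=pad,
                          break_long_words=False, break_on_hyphens=False)
    if lines:
        lines[0] = "\t" * level + lines[0][len(pad):]
    return lines


def unwrap(lines):
    """Rebuild (level, text) pairs from output of wrap_line."""
    logical = []
    for line in lines:
        if line.startswith("\t"):  # start of a new logical line
            body = line.lstrip("\t")
            logical.append((len(line) - len(body), body))
        elif logical and line.strip():  # continuation of previous one
            level, text = logical[-1]
            logical[-1] = (level, text + " " + line.strip())
    return logical
```

 A pager would have to expand each tab to 7 spaces for display ;
   file itself keeps information about where lines were wrapped.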

Manpage Display Format

 In this section i talk about what visual output is currently possible,
   and about what could be made possible.

 Manpages can currently be displayed with format specifiers mentioned above.

 It is possible to set foreground and background colors of terminal,
   and in that case manpage is displayed in those colors,
   but when 'man' command finishes, it sets colors to what it likes best (white on black)
   which is a bit brutal.
 This happens in an xterm as well as in a virtual terminal ;
   i don't know how it affects real terminals.

 According to 'man man', it is also possible to use man as
    man --html=lynx
   but this does not work for me :
     $ man --html=lynx man
     Reformatting man(1), please wait...
     /usr/bin/groff: can't find `DESC' file
     /usr/bin/groff:fatal error: invalid device `html'
     man: command exited with status 768: /usr/bin/zsoelim /usr/share/man/man1/man.1
     | /usr/bin/tbl | /usr/bin/groff -mandoc -Thtml > /tmp/hmanCivrXP/man.html
 Tough luck.

 I think it would have been nice to be able to read manpages with lynx,
   because then it would be easy to set colors, in a way that doesn't depend on terminal,
   and see-also could be full of links to other manpages, just a click away.

 It wouldn't look like a real web page of course,
   as current manpage format has no way to specify colors, fontproperties, horizontal rules,
   or anything else not mentioned in list of formatspecifiers.

 Some people write documentation in html format.
 There also seems to be a desire to be able to browse manpages with a web-browser
   (that's what --html was meant for, i think).
 And in my opinion manpages should be editable ;
   a sysadmin could use this to let users know about
     features of a command that are not supported, or better done in another way on his system,
   a user on a single-user system could use it to
     add a note about how he can use it for what he uses it for.
 It would also occasionally be good if they supported tables and simple linedrawings.
 Furthermore, i would like indent of wrapped lines to be specifiable.
 And, as noted above, it would be nice if it were faster than groff.

 So there are desires for more modern features.
 Which of these are realizable, is limited by
   (apart from that someone would need to actually do the work)
   support for non-modern displays ;
 Real consoles still exist, so output must remain readable in black and white.
 Their video chips probably work in text mode,
   i don't know whether they would typically be able to switch to a (VGA) graphics mode,
   but anything that installer uses should be available,
   so DOS-style linedrawing characters, and color that renders ok on b/w monitors, are possible.
 Text-mode displays even support simple pictures if videocard can do VGA
   (by redefining pixels of a set of characters during flyback time),
   which wouldn't show up if card couldn't do VGA, but they could be used ornamentally.
 Also, they may not have a mouse, but that doesn't seem like a drawback in this case.

 So it would be possible to upgrade manpages to possibilities of DOS on consoles,
   and to something more visually interesting on displays that support VGA.

 To implement this,
   * a new outputformat of 'man' needs to be implemented,
       it could simply use DOS's character&attribute encoding,
       though this is a bit wasteful, and for graphics an additional something would be needed.
   * a new inputformat of 'man' would be desirable, to be faster,
       it would probably be possible,
       because this still doesn't support but a fraction of what groff is capable of.
   * it would need to be usable in parallel with current system,
       maybe a new man command that falls back to old one if no new-format manpage exists.
   * this inputformat would need to be editable with a wysiwyg editor
       (maybe nano or ae could do it).
   * xterm would need to be able to display new format, and same goes for virtual terminals,
       and less would need to be able to handle it too.
     Less is not a problem, because it has -r (raw) option,
       in which it does not modify its input ;
       this can be used to send ansi color-escape sequences to terminals,
       if man detected terminal-type
         it could emit color and linedrawing commands that are suitable for that terminal-type.
     VT-s only support a small number (15) of colours,
       and so does xterm, because it emulates a VT,
       but best match could be selected, and it would be an improvement.
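
 Selecting best match from a small palette could work as in this Python sketch.
 RGB values below are commonly used ones for 8 basic ANSI foreground colors,
   but actual colors depend on terminal, so treat table as an assumption ;
   function names are made up.

```python
# ANSI foreground code -> approximate RGB value (assumed, terminal dependent).
ANSI_COLORS = {
    30: (0, 0, 0),       31: (170, 0, 0),
    32: (0, 170, 0),     33: (170, 85, 0),
    34: (0, 0, 170),     35: (170, 0, 170),
    36: (0, 170, 170),   37: (170, 170, 170),
}


def nearest_ansi(rgb):
    """Return ANSI code whose color has smallest squared distance to `rgb`."""
    return min(ANSI_COLORS,
               key=lambda code: sum((a - b) ** 2
                                    for a, b in zip(ANSI_COLORS[code], rgb)))


def colorize(text, rgb):
    """Wrap `text` in escape sequences for nearest available color."""
    return f"\033[{nearest_ansi(rgb)}m{text}\033[0m"
```

 Output of colorize can be passed through less with its raw option mentioned above.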

 After this would have been implemented,
   existing manpages could be converted to new format,
   which would preferably be done by a script,
   so when building a package, that script could transform .orig.tar manpages.

 There would also be a desire for
   a manpage-display-program
     that could follow links to other manpages (maybe not allowing weblinks, for security),
   a wysiwyg editor for this format
     (i have no idea whether nano/ae can display linedrawing characters in text they edit,
      they certainly can't do graphics).

 When manpages were in new format,
   possibilities of old format would be desired for new format too,
   so either new man format would need to implement all of them
     (as far as they are actually used)
   or it would need to be possible to convert a new-format manpage to old format.
 Last of these is not hard to do, if new format is reasonably simple.

 Sequence of development would be
   * have terminal use a characterset that includes linedrawing chars.
   * put a wrapper around 'man' command (shell alias would do),
       and scan /usr/share/newman/ for new-format manpages,
       then convert their colorspecs to ansi-escapes.
   * develop conversion program to support all format-specifiers.
   * write a script to convert existing manpages.
   * create an editor, or modify or use an existing one,
       so converted manpages can be edited to use new possibilities.
       This could also be used as displaying facility, with search capability.
   * write a script to do reverse conversion.
   * make policy that all manpages should be in new format.
   * remove old 'man' program ; also removes need for groff for most users.
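
 Lookup part of wrapper from second step could be as simple as following Python sketch.
 Directory layout under /usr/share/newman/ (same manN subdirectories as old tree, uncompressed files)
   is an assumption made for this example, as is function name.

```python
from pathlib import Path


def find_page(name, section, new_root="/usr/share/newman",
              old_root="/usr/share/man"):
    """Prefer a new-format manpage; fall back to old roff one; else None."""
    subdir = f"man{section}"
    new = Path(new_root) / subdir / f"{name}.{section}"
    if new.is_file():
        return ("new", new)
    for suffix in (".gz", ""):  # old pages are usually gzipped
        old = Path(old_root) / subdir / f"{name}.{section}{suffix}"
        if old.is_file():
            return ("old", old)
    return None
```

 Wrapper would then hand an "old" result to existing man program unchanged,
   so both systems can be used in parallel during transition.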

 If a new editor has to be written, it could cost 1000 hours of work.
 If not, then whole project could probably be done in less than 1000 hours.

 A topic that was not discussed yet is that when a user edits a manpage,
   and a new version of manpage becomes available with 'apt-get upgrade',
   then how can new manpage-text be used while keeping modifications ?
 If user has only done an addition
   (this is preferred, as it can not possibly remove useful information),
   then a simple patch might be applicable automatically.
 If not, then it may need a debconf question ;
   for this some infrastructure would also need to be created.
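
 Deciding whether user has only done an addition can be automated ; a sketch in Python :
 It compares manpage as shipped with manpage as edited, line by line,
   and returns added lines only if every shipped line is still present in original order.
 Function name is invented, and re-applying additions to a new version is not attempted here.

```python
def added_lines(original, edited):
    """Return lines user added, or None if edit also removed or changed lines."""
    added = []
    remaining = iter(original)
    expected = next(remaining, None)  # next shipped line we still need to see
    for line in edited:
        if line == expected:
            expected = next(remaining, None)
        else:
            added.append(line)
    # If a shipped line was never found, edit was not a pure addition.
    return added if expected is None else None
```

 A result of None would be the case that needs a debconf question.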

Manpage Text Format

 Manpages have a fairly standardized format.
 Besides technical necessity (page header and footer, format of whatis line),
   this standard consists of what section-names are used and what they contain.
 I think that this standard is somewhat uniformly used by all unixen.
 If i recall correctly, Debian policy also specifies some of it formally.
 Standard sections (in order they usually appear in) are :

   NAME
    This contains program name(s), a dash, and whatis.

   SYNOPSIS
    Name 'synopsis' seems to indicate that this was originally used to contain a synopsis
     (a synopsis is a short multiline description, highlighting most important things),
     but most manpages use it for a description of commandline syntax.
    Maybe developers had so little time and were so familiar with programs they wrote,
     that for them commandline syntax said it all.
    I would like to see this be used for a real synopsis again,
     because i think most manpages would benefit from this,
     especially for long manpages, reading description takes a lot of time,
     and goes into more detail than is desired by a user who
     is looking for a package that does what s'he wants,
     or is trying to select one of a number of programs to use (eg a mailer).

   DESCRIPTION
    Tells what program does.
    It is usually more verbose than a synopsis, describing all parts of program,
     and often describing each part in detail.
    Can come before or after section 'options'.

   OPTIONS
    Ideally contains an easily browsable list of commandline options,
     with descriptions saying what each option does.
    Sometimes comes before section 'description',
     and sometimes forms a whole with it.

   ENVIRONMENT
    Lists environment-variables that influence program's behaviour.

   FILES
    Lists files that program could use.

   BUGS
    Lists known bugs.

   AUTHOR
    Lists author(s), usually with their email.

   COPYRIGHT
    Mentions what copyright this program is under.

   SEE ALSO
    Lists names of other manpages,
      usually because they describe programs with related functionality,
      sometimes because they have more info on this program
      (when there is a -doc package with a manpage, for example).
    Also lists other sources of information : RFC-s, webpages, etc.

 A not so standard, but desirable, section is :

   EXAMPLES
    Shows exactly what to do to get it to do something.
    Is desirable because
      commandline description is somewhat free-format,
      and in some cases not very suitable,
      and description is usually written by an expert,
      who sometimes fails to consider how much users don't know.

What i would like to see changed

 I would like manpages to become more targeted at
   supplying user with info s'he needs in an efficient way.
 If a user has to read through a lot of information to find a little piece of information,
   then this can not be helped in unusual cases,
   but most frequently needed information should be accessible fast.
 If a user has to scan through a bunch of manpages, looking for something,
   s'he has to understand what s'he reads, even when s'he misses context to relate it to,
   and after finding desired info, 90% of what was read must be forgotten again.
   This is clutter, and it is often avoidable by giving an overview.
 There are many manpages that contain things like
   'if this program is configured with option X, then you can use feature Y',
   or 'on some systems they are installed in X, but they can also be in Y',
   or even 'they are installed in X', when they are really installed in Y.
   Such manpages are fine for original program,
     because users who grab .orig.tgz get to configure program themselves,
   But it is not good for Debian,
     as it requires non-expert readers to take trouble and time to
     find out something that maintainer already knows
     (or else content of manpage remains vague).
   I think a Debian manpage should pertain to a program as installed,
     not as supplied by upstream.

 So my desires for manpages are :
   Accessibility (speedy, easy, no avoidable clutter), and
   Debianization (correct for Debian, and missing info filled in (or added)).

 Contents of most manpages is already quite good,
   they are usually informative and fairly complete ;
   they are written by experts, and it shows.

 Accessibility and Debianization are not limited to individual manpages,
   I would like the whole man system to make its information more accessible
   and to be more Debian-specific.
 A manpage that in its see-also mentions a manpage of another package
   should say which package it is in.
   It documents a program, which is in a package.
 A manpage should not assume that its reader knows what the program does,
   or what it could be used for.
   Imagine one of our translators reading that manpage.
   Purpose of accessible information is to make Debian usable by all its users.
 Manpages should not waste user's time.
   Imagine someone who does not study information technology,
     and remember how much time you spent simply getting to know the system.
   Microsoft claims that Linux is expensive, and they have a point
     (they also claim that they are more cost-effective, which is false),
   Time spent reading manpages could be spent making money,
     and non-professionals have no way of making that investment pay off ;
     don't make Debian more expensive than necessary.
 I would like user to type 'man mail' and get a list of mailer programs that Debian ships,
   each with a short description suitable for comparing it to other candidates.
   I don't expect this will ever happen.
 I would like user to be able to read manpages of programs that are not installed on system ;
   i need but mention apache,
   whose manpage can only be read if apache is installed,
   and if it is installed, it starts running a webserver by default,
   which is not ideal for users who haven't even read its manpage
   (at least, it used to be like that, maybe debconf has changed that).

 After these general remarks,
   there are some things about manpages that i would like to highlight.
 I'm not saying that you should do like i say ;
   i hope you find them useful.

 * Manpages are more readable if every new sentence starts on a new line.
   Look at man man in an xterm, and make xterm as wide as your display.
   (hm, resize it a couple of times, and it no longer wraps to correct position,
     that's a bug.)
   There is not much need to compress as much text as possible in the viewable window ;
     users can scroll it.
     (and would manpages have that wide left margin otherwise ?).
   It is more important to present it in an easily digestible format ;
     bulleted lists are easy, to read and look back,
     and a new line for a new sentence is similar.
   It helps readers arrange their thoughts, because a sentence conveys a thought.

 * The same applies to syntax descriptions :
   Compare :

    gzip [ -acdfhlLnNrtvV19 ] [-S suffix] [ name ... ]
    gunzip [ -acfhlLnNrtvV ] [-S suffix] [ name ... ]
    zcat [ -fhLV ] [ name ... ]

   To :
    gzip    [options] [ filename]...    # compress
    gunzip  [options] [ filename]...    # uncompress
    zcat    [options] [ filename]...    # uncompress to stdout

    Options can be
     single-letter options prepended with a single dash,
     or long options prepended with two dashes (for long options see further on) ,
     or a combination of single-letter options prepended with a single dash.

    Available options for gzip are :
      -c        # gzipped output to stdout ; original not changed
      -d        # uncompress
      -f        # force. overwrite, and never warn of anything.
      -h        # help
      -l        # print list of files processed
      -L        # print copyright
      -n        # do not store original name and timestamp
      -N        # store original name and timestamp (default)
      -q        # quiet. suppresses all warnings
      -r        # recursive (if file is a directory)
      -t        # test integrity of produced outputfile
      -v        # verbose. prints filename and compression ratio.
      -V        # version. prints version and compilation options.
      -[0-9]    # speed/compression tradeoff. 1:fast 9:best 6:default
      -S suffix # use <suffix> instead of ".gz" ;
                 # only allowed as last option.

     Available options for gunzip are same as for gzip, except :
     -d         # is implied, gunzip is same as gzip -d
     -n         # do not restore stored name and timestamp (default)
     -N         # restore stored name and timestamp
     -[0-9]     # not applicable
     -S suffix  # accept suffix <suffix> . useful as '-S ""'.

     Available options for zcat are: -f (force), -h (help), -L (licence), -V (version)
     and maybe -q (quiet, manpage doesn't say whether it is allowed or not).

   The point is not that the second form is so good,
     (in fact, man gzip 's list of options is quite near to this,
       because long option names have been chosen to be very descriptive),
   Point is that first format is unreadable,
     and only information it conveys is that -S must come last, if it is used.

   In this particular example, multiple programs are described, that share options,
     so options were shown separate from commandsyntax,
     but it is generally useful to describe syntax like

    qxxl [-a | --anything  ]   # no matter what, but only one of it
         [-e | --everything]   # all
         [-s | --something ]   # anything, as long as it's valid
         filename ...

   If there is space for a short comment, it makes a great quick-reminder.
   Aligned text is easier to read.
   As long as whole section fits on first screen,
     there is no advantage in compressing the text.

 * A synopsis is a useful thing, and i would like to see them used more often.
   I think most manpages would benefit from this,
     especially long ones,
     as reading description takes a lot of time,
       and goes into more detail than is desired by a user who
       is looking for a package that does what s'he wants,
       or is trying to select one of a number of programs to use (eg a mailer).

   I would prefer a section that contains a synopsis to be called 'synopsis' ;
     syntax description could be moved to a section called 'syntax'.
   Some manpages use sectionname 'overview' for a synopsis.
   Synopsis should come before syntaxdescription,
     because user first decides what to do, and then how to do it.

   When you want to write a synopsis, ask yourself these questions :
     - what things does this program do ?
       List these.
     - what do 90 % of users use it for 90 % of time ?
       Put description of this first, if reasonably possible,
         and it would be nice to give an example commandline for this.

   Synopsis should ofcourse be shorter than complete description.
   There are cases (i think of man bash) where more than one level of synopsis would be needed,
     but this is because man bash is a complete reference manual,
     which unfortunately has been given format of a manpage.
   Manpage format is not very suitable for this ;
     a document with a table of contents, an overview, division in chapters, etc,
     would be better,
     and then this document could have a usable manpage that describes
     what bash can do for you, how it is usually used, where to find reference manual, etc.
   Manpages are usually between
      2 kB,   60 lines, and
     70 kB, 1500 lines (man bash is 200 kB).
   Smallest ones don't really need a synopsis, because 'description' is short and says it all.
   Generally, number of lines of synopsises i write
     is in the order of magnitude of square root of number of lines of manpage,
     but it depends a lot on what needs to be said.

 * Bugs section describes any bugs that upstream knows about.
   Often it is clear from context that these will never be fixed (by them),
     but not always.

 * It would be nice to tell which package this program is in,
     if it is not identical to programname.
   It is usually not necessary for reporting bugs, because reportbug is smart,
     but maybe user wants to get the source. Without packagename s'he can't.

 * See also section should be last.
   This makes it possible for user to go to this section fast.
   (in most Debian manpages it already is last).

 * See also often contains links to other manpages.
   As mentioned above, these should be
     Debian-specific :
       If Debian doesn't ship them, don't refer to them as manpages.
       If they are in another package, mention that package,
         because it is not necessarily installed,
           (if it is not a required package),
         and in that case there is no way to find in what package it is.
     Accessible :
       Describe what each reference contains,
         for manpages this can be done by providing its whatis, like this :

    cdda2wav (1) - a sampling utility that dumps CD audio data into wav sound files
    readcd   (1) - read or write data Compact Discs
    scg      (7) - not found
    fbk      (7) - not found
    mkisofs  (8) - create an hybrid ISO9660/JOLIET/HFS filesystem with optional Rock Ridge attributes.
    rcmd     (3) - routines for returning a stream to a remote command
    ssh      (1) - secure shell client (remote login program) .

       (except that it should not contain 'not found' of course ;
        this is output of a script i use that adds whatis-s).
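
   Such a script is not shown here ; following Python sketch only illustrates the idea.
   It assumes see-also references are already available as (name, section) pairs
     and whatis-s as a dictionary keyed on same pairs ; function name is made up.

```python
def annotate_see_also(refs, index):
    """Format each (name, section) reference with its whatis, names aligned."""
    width = max((len(name) for name, _ in refs), default=0)
    lines = []
    for name, section in refs:
        description = index.get((name, section), "not found")
        lines.append(f"{name:<{width}} ({section}) - {description}")
    return lines
```

   References that come out as 'not found' are the ones a maintainer would want to check.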

 * There is often more information about this program in
     - /usr/share/doc/
         especially README-s can be useful,
         if they are, provide a reference to it,
         don't require users to look there themselves,
           many packages have nothing there that a user wants to read.
     - package description
         often contains Debian-specific information that would be good to have in manpage.