mcl(1)                           USER COMMANDS                          mcl(1)



  NAME
      mcl - The Markov Cluster Algorithm, aka the MCL algorithm.

  SYNOPSIS
      mcl  <-|fname>  [-I f (inflation)] [-o str (fname)] [-scheme k (resource
      scheme)]

      These options are sufficient in 95 percent of the  cases  or  more.  The
      first  argument  must be the name of a file containing a graph/matrix in
      the mcl input format, or a hyphen to read from STDIN.  With  respect  to
      clustering,  only  the -I option and -scheme option are relevant and the
      rest is immaterial.

      As of the first 1.002 release, mcl will accept a very general input for-
      mat.  Graph indices no longer need be sequential; you can use any set of
      indices you like, as long as they are in a suitable range. Refer to  the
      mcxio(5)  section,  and  use mcl -z to find the range from which indices
      can be picked.

      As of the first 1.003 release, clmformat enables you to map a clustering
      onto a format better suited for inspection, using an index file (the
      so-called tab file) mapping mcl indices onto descriptive labels.  Read
      the clmformat manual for more information - it is simple to use and the
      manual is small.

      A mechanism for pipelines is supported (as of the first 1.003  release).
      Refer  to  the  PIPELINES  section  for more information.  A prepackaged
      pipeline for BLAST data is present in the form of mclblastline.

      The full listing of mcl options is shown  below,  separated  into  parts
      corresponding  with  functional  aspects  such as clustering, threading,
      verbosity, pruning and resource management, automatic output naming, and
      dumping.   The  -scheme  parameter provides a single access point to the
      pruning options, and should be sufficient in  most  cases.   mcl  allows
      comprehensive  tuning  and  access  to  its  internals for those who are
      interested, so it has many options.

      Baseline clustering options
      [-I f (inflation)] [-o str (fname)] [-scheme k (resource scheme)]

      Additional clustering options
      [-l n (initial iteration number)] [-L n (main iteration number)]  [-i  f
      (initial inflation)] [-do str (carry out action)]

      Input manipulation options
      [-c  f  (centering)]  [-pi  f  (pre-inflation)] [-pp n (preprune count)]
      [-in-gq f (filter threshold)]

      Alternative modes
      [--expand-only (factor out computation)] [--inflate-first  (rather  than
      expand)]

      Clustering result options
      [-sort  str (sort mode)] [--keep-overlap=y/n (retain overlap)] [--force-
      connected=y/n (analyze components)] [--check-connected=y/n (analyze com-
      ponents)]  [--analyze=y/n  (performance criteria)] [--show-log=y/n (show
      log)] [--append-log=y/n (append log)]

      Verbosity options
      [-v str (verbosity type on)] [-V str  (verbosity  type  off)]  [--silent
      (very)]  [--verbose (very)] [-progress k (gauge)] [--show (print (small)
      matrices to screen)]

      Thread options
      [-te  k  (#expansion  threads)]  [-ti  k  (#inflation  threads)]  [-t  k
      (#threads)] [--clone (when threading (experimental))] [-cloneat n (trig-
      ger)]

      Output file name options
      [-o str (fname)] [-ap str (use  str  as  file  name  prefix)]  [-aa  str
      (append str to suffix)] [-az (show output file name and exit)]

      Dump options
      [-di  i:j  (dump  interval)]  [-dm k (dump modulo)] [-ds stem (dump file
      stem)] [-dump str (type)] [-digits n (printing precision)]

      Info options
      [--jury-charter (explains jury)] [--version (show version)]  [-how-much-
      ram  k (RAM upper bound)] [-h (most important options)] [--apropos (one-
      line description for all options)] [-z  (show  current  settings)]  [-az
      (show  output  file  name  and  exit)]  [--show-schemes  (show  resource
      schemes)]

      Pruning options
      The following options all pertain to the various pruning strategies that
      can  be  employed by mcl. They are described in the PRUNING OPTIONS sec-
      tion, accompanied by a description of the mcl pruning strategy.  If your
      graphs  are huge and you have an appetite for tuning, have a look at the
      following:

      [-p f (cutoff)] [-P n  (1/cutoff)]  [-S  n  (selection  number)]  [-R  n
      (recovery  number)]  [-pct  f  (recover percentage)] [-warn-pct n (prune
      warn percentage)] [-warn-factor n (prune warn factor)]  [--dense  (allow
      matrices to fill)] [--adapt (pruning)] [--rigid (pruning)] [-ae f (adap-
      tive pruning exponent)] [-af f (adaptive pruning factor)] [-nx x (x win-
      dow  index)] [-ny y (y window index)] [-nj j (jury window index)] [-nw w
      (nr of windows)] [-nl w (nr of iterations)] [--thick (expect dense input
      graph)]

      The  first  argument  of  mcl  must be a file name, but some options are
      allowed to appear as the first argument instead. These are  the  options
      that  cause  mcl  to  print out information of some kind, after which it
      will gracefully exit. The full list of these options is

      -z, -h, --apropos, --version, --show-settings,  --show-schemes,  --jury-
      charter, -how-much-ram k.

  DESCRIPTION
      mcl  implements  the  MCL  algorithm, short for the Markov cluster algo-
      rithm, a cluster algorithm for graphs developed by Stijn van  Dongen  at
      the  Centre  for  Mathematics  and  Computer  Science  in Amsterdam, the
      Netherlands. The algorithm simulates flow  using  two  simple  algebraic
      operations on matrices.  The inception of this flow process and the the-
      ory behind it are described elsewhere (see REFERENCES). Frequently asked
      questions  are answered in the mclfaq(7) section.  The program described
      here is a fast threaded implementation written by the  algorithm's  cre-
      ator  with contributions by several others. Anton Enright co-implemented
      threading; see the HISTORY/CREDITS section for a complete account.   See
      the  APPLICABILITY  section  for  a description of the type of graph mcl
      likes best, and for a qualitative  assessment  of  its  speed.   mcl  is
      accompanied  by  several  other  utilities for analyzing clusterings and
      performing matrix and graph operations; see the SEE ALSO section.

      The first argument is the input file name (see the mcxio(5) section  for
      its  expected format), or a single hyphen to read from stdin. The ratio-
      nale for making the name of the input file a fixed parameter is that you
      typically  do  several  runs  with different parameters. In command line
      mode it is pleasant if you  do  not  have  to  skip  over  an  immutable
      parameter all the time.

      The  -I f  option  is  the  main control, affecting cluster granularity.
      Using mcl is as simple as typing (assuming a file  proteins  contains  a
      matrix/graph in mcl input format)

      mcl proteins -I 2.0

      The above will result in a clustering written to the file named
      out.proteins.I20s2. It is - of course - possible to explicitly specify
      the name of the output file using the -o option. Refer to the -ap option
      for a description of mcl's procedure for automatically constructing file
      names from its parameters.

      The  mcl  input format is described in the mcxio(5) section. Clusterings
      are also stored as matrices - this is again discussed  in  the  mcxio(5)
      section.  You presumably want to convert the output to something that is
      easier to interpret. The mcl matrix  format  is  perhaps  unpleasant  to
      parse in the quick and dirty way. You can use

      clmformat -icl <mcl-out-file> -dump -

      to convert mcl output to a line/tab based format, where each line  con-
      tains a cluster in the form of tab-separated indices. If  you  throw  in
      the  -tab <tab-file>  option, you can get tab-separated labels. Refer to
      the clmformat manual page for more information.

      In finding good mcl parameter settings for a particular  domain,  or  in
      finding  cluster structure at different levels of granularity, one typi-
      cally runs mcl multiple times for varying values  of  f  (refer  to  the
      -I option for further information).

      mcl  expects  a nonnegative matrix in the input file, or equivalently, a
      weighted (possibly directed) graph. NOTE -  mcl  interprets  the  matrix
      entries  or  graph edge weights as similarities, and it likes undirected
      input graphs best. It can handle directed  graphs,  but  any  node  pair
      (i,j)  for  which  w(i,j) is much smaller than w(j,i) or vice versa will
      presumably have a slightly negative effect on the clusterings output  by
      mcl. Many such node pairs will have a distinctly negative effect, so try
      to make your input graphs undirected. How your edge weights are computed
      may affect mcl's performance. In protein clustering, one way to go is to
      choose the negated logarithm of  the  BLAST  probabilities  (see  REFER-
      ENCES).
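      The symmetrization advice above can be sketched in code. The following
      illustrative Python snippet is not part of mcl: it makes a directed
      edge-weight mapping undirected by keeping the larger of w(i,j) and
      w(j,i). Taking the maximum is an assumption here, not something mcl
      mandates; averaging the two directions is another reasonable choice.

```python
# Illustrative sketch (not mcl code): symmetrize a directed edge-weight
# mapping so that the resulting graph is undirected. Keeping the maximum
# of the two directions is one choice; averaging is another.

def symmetrize(weights):
    """weights: dict mapping (i, j) node pairs to similarity values."""
    sym = {}
    for (i, j), w in weights.items():
        key = (min(i, j), max(i, j))          # canonical undirected pair
        sym[key] = max(sym.get(key, 0.0), w)  # keep the larger direction
    return sym

directed = {(0, 1): 0.9, (1, 0): 0.2, (1, 2): 0.5}
print(symmetrize(directed))   # {(0, 1): 0.9, (1, 2): 0.5}
```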

      mcl's default parameters should make it quite fast under almost all cir-
      cumstances. Taking default parameters, mcl has  been  used  to  generate
      good  protein  clusters on 133k proteins, taking 10 minutes running time
      on a Compaq ES40 system with four alpha EV6.7 processors.  It  has  been
      applied  (with  good results) to graphs with 800k nodes, and if you have
      the memory (and preferably CPUs as well) nothing should  stop  you  from
      going further.

      For  large  graphs, there are several groups of parameters available for
      tuning the mcl computing process, should it be  necessary.  The  easiest
      thing  to  do  is  just vary the -scheme option. This triggers different
      settings for the group of pruning parameters {-p/-P, -R, -S, and  -pct}.
      The  default setting corresponds with -scheme 4.  There is an additional
      group of control parameters {--adapt, --rigid, -ae, -af}, which  may  be
      helpful  in  speeding up mcl.  When doing multiple mcl runs for the same
      graphs with different -I settings (for obtaining clusterings at  differ-
      ent levels of granularity), it can be useful to factor out the first bit
      of computation that is common to all runs, by  using  the  --expand-only
      option  one time and then using --inflate-first for each run in the set.
      Whether mcl considers a graph large depends mainly on the graph  connec-
      tivity;  a  highly  connected  graph on 50,000 nodes is large to mcl (so
      that you might want to tune  resources)  whereas  a  sparsely  connected
      graph on 500,000 nodes may be business as usual.

      mcl  is  a  memory  munger. Its precise appetite depends on the resource
      settings. You can get a rough (and usually much too  pessimistic)  upper
      bound  for  the  amount of RAM that is needed by using the -how-much-ram
      option. The corresponding entry in this manual page contains the  simple
      formula via which the upper bound is computed.

      Two  other  groups  of  interest are the thread-related options (you can
      specify the number of threads to use) {-t, -te, -ti, --clone,  -cloneat}
      and  the  verbosity-related  options {--verbose, --silent, -v, -V}.  The
      actual settings are shown with -z, and for graphs with at most 12  nodes
      or so you can view the MCL matrix iterands on screen by supplying --show
      (this may give some more feeling).

      MCL iterands allow a generic interpretation as clusterings as well.  The
      clusterings  associated with early iterands may contain a fair amount of
      overlap. Refer to the -dump option, the mclfaq(7) manual, and the  clmi-
      mac  utility  (Interpret  Matrices As Clusterings).  Use clmimac only if
      you have a special reason; the normal usage of mcl  is  to  do  multiple
      runs  for  varying  -I  parameters and use the clusterings output by mcl
      itself.

      Under very rare circumstances, mcl might get stuck in a seemingly  infi-
      nite  loop.  If the number of iterations exceeds a hundred and the chaos
      indicator remains nearly constant (presumably around  value  0.37),  you
      can  force  mcl  to  stop by sending it the ALRM signal (usually done by
      kill -s ALRM pid). It will finish the current iteration,  and  interpret
      the last iterand as a clustering. Alternatively, you can wait and mcl may
      converge by itself or it will certainly  stop  after  10,000  iterations
      (the default value for the -L option). The most probable explanation for
      such an infinite loop is that the input  graph  contains  the  flip-flop
      graph of node size three as a subgraph.

      The  creator  of  this  page  feels  that  manual  pages  are a valuable
      resource, that online html documentation is also a good thing  to  have,
      and  that  info pages are way way ahead of their time. The NOTES section
      explains how this page was created.

      In the OPTIONS section options are listed in order of  importance,  with
      related options grouped together.

  OPTIONS
      -I f (inflation)
         Sets the main inflation value to f. This value is the main handle for
         affecting cluster granularity. It is usually chosen somewhere in  the
         range  [1.2-5.0]. -I 5.0 will tend to result in fine-grained cluster-
         ings, and -I 1.2 will tend to result in very coarse-grained  cluster-
         ings. Your mileage will vary depending on the characteristics of your
         data. That is why it is a good idea to test the quality and coherency
         of  your clusterings using clmdist and clminfo. This will most likely
         reveal that certain values of -I are simply not right for your  data.
         The  clmdist  section contains a discussion of how to use the cluster
         validation tools shipped with mcl (see the SEE ALSO section).

         A second option for affecting cluster granularity is the  -c  option.
         It may possibly increase granularity.

         With  low  values  for -I, like -I 1.2, you should be prepared to use
         more resources in order to  maintain  quality  of  clusterings,  i.e.
         increase the argument to the -scheme option.
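         The inflation step that -I controls can be sketched as follows. As
         commonly described for the MCL algorithm, each column entry is
         raised to the power f and the column is then rescaled to sum to 1
         again; higher f sharpens the column (finer clusters), f near 1
         flattens it. This is an illustrative sketch, not mcl's internal
         code.

```python
# Sketch of the inflation operator controlled by -I: raise each entry of a
# stochastic column to the power f, then rescale the column to sum to 1.

def inflate_column(col, f):
    powered = [x ** f for x in col]
    total = sum(powered)
    return [x / total for x in powered]

col = [0.5, 0.3, 0.2]
print(inflate_column(col, 2.0))  # roughly [0.658, 0.237, 0.105]
```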

      -o str (fname)
         Output  the  clustering  to file named fname.  It is possible to send
         the clustering to stdout by supplying -o -. The clustering is  output
         in  the  mcl  matrix  format;  see  the  mcxio(5)  section  for  more
         information on this.

         Look at the -ap option and its siblings for the automatic naming con-
         structions employed by mcl if the -o option is not used.

      -scheme k (use a preset resource scheme)
         There  are  currently seven different resource schemes, indexed 1..7.
         High schemes result in more expensive computations that may  possibly
         be  more accurate. The default scheme is 4. When mcl is done, it will
         give a grade (the so-called jury synopsis) to the appropriateness  of
         the scheme used. A low grade does not necessarily imply that the
         resulting clustering is bad, but it is nevertheless reason to try a
         higher scheme. The grades are listed in the PRUNING OPTIONS section
         under the -nj option.

         The PRUNING OPTIONS section contains an elaborate description of  the
         way mcl manages resources, should you be interested.  In case you are
         worried about  the  validation  of  the  resulting  clusterings,  the
         mclfaq(7) section has several entries discussing this issue. The bot-
         tom line is that you have to compare the clusterings  resulting  from
         different  schemes  (and otherwise identical parameters) using utili-
         ties such as clmdist, clminfo on the one hand,  and  your  own  sound
         judgment on the other hand.

         If  your  input graph is extremely dense, with an average node degree
         (i.e. the number of neighbours per node) that is somewhere above 500,
         you may need to filter the input graph by removing the nodes of high-
         est degree (and projecting them back onto  the  resulting  clustering
         afterwards) or by using the -pp option.

      --show-schemes (show preset resource schemes)
         Shows  the  explicit  settings  to which the different preset schemes
         correspond.

         The characteristics are written in the same format (more or less)  as
         the output triggered by -v pruning.

      -c f (centering)
         The  larger  the value of f the more nodes are attached to themselves
         rather than their neighbours, the more expansion  (the  spreading  of
         flow  through  the graph) is opposed, and the more fine-grained clus-
         terings tend to be. f should be chosen greater than or equal to  1.0.
         The  default  is f=1.0. This option has a much weaker effect than the
         -I option, but it can be useful depending on your data.
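         As an illustration only: the entry above describes centering as
         attaching nodes more strongly to themselves. The sketch below
         ASSUMES a simple model in which each node's self-loop weight is
         multiplied by f before the column is normalized; mcl's internal
         centering rule may differ in detail.

```python
# Assumed model of centering (may differ from mcl's internal rule):
# scale the self-loop entry by f, then normalize the column.

def center_column(col, index, f):
    """col: raw similarity column; index: position of the self-loop entry."""
    col = list(col)
    col[index] *= f          # strengthen the node's attachment to itself
    total = sum(col)
    return [x / total for x in col]

raw = [1.0, 0.5, 0.5]                # self-loop first, two neighbours
print(center_column(raw, 0, 1.0))    # [0.5, 0.25, 0.25]
print(center_column(raw, 0, 3.0))    # [0.75, 0.125, 0.125] - flow spread opposed
```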

      -do str (carry out action)
      -dont str (omit action)
         These are the actions that can be controlled:

         clm
         log
         show-log
         keep-overlap

         The clm action appends a short synopsis of  performance  criteria  to
         the  output  cluster  file,  as well as a synopsis of the granularity
         characteristics.

         The log action appends several characteristics of the mcl run to  the
         output  cluster  file, such as the resource management log, parameter
         settings, timings, and the command line.  Useful bookkeeping, and  it
         is advisable to always use it.

         The show-log action sends the resource management log to STDOUT.

         The  keep-overlap  action  causes  mcl  to retain overlap should this
         improbable event occur. In theory, mcl may generate a clustering that
         contains  overlap, although this almost never happens in practice, as
         it requires some particular type of symmetry to  be  present  in  the
         input  graph  (not just any symmetry will do).  Mathematically speak-
         ing, this is a conjecture and not a theorem, but I am willing to  eat
         my shoe if it does not hold (for marzipan values of shoe). It is easy
         though to construct an input graph for  which  certain  mcl  settings
         result  in  overlap  -  for  example a line graph on an odd number of
         nodes. The default is to remove overlap should it occur.

         This option has more than theoretical use because mcl is able to gen-
         erate  clusterings  associated with intermediate iterands.  For these
         clusterings, overlap is more than a theoretical possibility, and will
         often  occur.  If  you  specify  the -L n option, mcl will output the
         clustering associated with the last iterand computed, and it may well
         contain overlap.

         This  option  has  no  effect on the clusterings that are output when
         using -dump cls - the default  for  those  is  that  overlap  is  not
         touched, and this default cannot yet be overridden.

      -v str (verbosity type on)
         See the --verbose option below.

      -V str (verbosity type off)
         See the --verbose option below.

      --silent (very)
         See the --verbose option below.

      --verbose (very)
         These are the different verbosity modes:

         progress
         pruning
         explain
         all

         where  all means all three previous modes.  --verbose and -v all turn
         them all on, --silent and -V all turn them all off. -v str and -V str
         turn  on/off  the  single mode str (for str equal to one of progress,
         pruning, or explain).  Each verbosity mode is  given  its  own  entry
         below.

      -v progress
         This  mode  causes  mcl to emit an ascii gauge for each single matrix
         multiplication. It uses some default length for the gauge, which  can
         be  altered  by  the -progress k option. Simply using the latter will
         also turn on this verbosity mode.  This mode can quickly give you  an
         idea of how long an mcl run might take. If you use threading (see the -t
         option and its friends), this option may slow down the program a lit-
         tle  (relative to -V progress, not relative to a single-CPU mcl run).

      -v explain
         This mode causes the output of explanatory headers  illuminating  the
         output generated with the pruning verbosity mode.

      -v pruning
         This mode causes output of resource-related quantities. It has a sep-
         arate entry in the PRUNING OPTIONS section.

      -progress k (gauge)
         If k>0 then for each matrix multiplication mcl will  print  an  ascii
         gauge telling how far it is. The gauge will be (in some cases approx-
         imately) k characters long. If k<0 then mcl will emit a gauge that is
         extended by one character after every |k| vectors computed. For large
         graphs, this option has been known to ease the pain of impatience. If
         k=0 then mcl will print a message only after every matrix multiplica-
         tion, and not during matrix multiplication. This can be  useful  when
         you want mcl to be as speedy as possible, for example when using par-
         allelized mode (as monitoring progress requires  thread  communica-
         tion).  For parallelization (by threading) see the -t option.

      -aa str (append str to suffix)
         See the -ap option below.

      -ap str (use str as file name prefix)
         If  the -o option is not used, mcl will create a file name (for writ-
         ing output to) that should uniquely characterize the important param-
         eters  used  in  the current invocation of mcl. The default format is
         out.fname.suf, where out is simply the literal string out,  fname  is
         the  first  argument containing the name of the file (with the graph)
         to be clustered, and where suf is the suffix encoding a set of param-
         eters (described further below).

         The -ap str option specifies a prefix to use rather than out.fname as
         sketched above.  However, mcl will interpret the  character  '=',  if
         present in str, as a placeholder for the input file name.

         If  the -aa str option is used, mcl will append str to the suffix suf
         created by itself.  You can use this if you need to encode some extra
         information in the file name suffix.

         The suffix is constructed as follows. The -I f and  -scheme  parame-
         ters are always encoded.  The -pi f, -l k, -i f, and -c f options are
         only encoded if they are used. Any real argument  f  is  encoded using
         exactly one trailing digit behind the decimal separator (which itself
         is  not  written).  The  setting  -I 3.14 is thus encoded as I31. The
         -scheme option is encoded using the letter  's',  all  other  options
         mentioned  here  are  encoded as themselves (stripped of the hyphen).
         For example

         mcl small.mci -I 3 -c 2.5 -pi 0.8 -scheme 5

         results in the file name out.small.mci.I30s5c25pi08.  If you want  to
         know  beforehand what file name will be produced, use the -az option.

      -az (show output file name and exit)
         If mcl automatically constructs a file name, it  can  be  helpful  to
         known  beforehand  what  that file name will be. Use -az and mcl will
         write the file name to STDOUT and exit. This can be used  if  mcl  is
         integrated  into  other  software for which the automatic creation of
         unique file names is convenient.

      -te k (#expansion threads)
         See the -t k option below.

      -ti k (#inflation threads)
         See the -t k option below.

      --clone (when threading)
         See the -t k option below.

      -cloneat n (trigger)
         See the -t k option below.

      -t k (#threads)
         The -t options are self-explanatory. Note that threading inflation is
         hardly useful, as inflation is orders of magnitude faster than expan-
         sion. Also note that threading is only useful if you  have  a  multi-
         processor system.

         The --clone option says to give each thread its  own  copy  of  the
         matrix  being  expanded/squared.  It  can be further controlled using
         the -cloneat k option: copies are only made if  the  source  matrix
         (the one to be squared) has on average at least k  positive  entries
         per vector. This option is probably not very  useful,  because  without
         it mcl is a memory munger already.

         When threading, it is best not to turn on pruning verbosity  mode  if
         you are letting mcl run unattended, unless you want to scrutinize its
         output later. This is because  it  makes  mcl  run  somewhat  slower,
         although the difference is not dramatic.

      -l n (initial iteration number) (small letter ell)
         The  number  of times mcl will use a different inflation value before
         it switches to the (main) inflation given by  the  -I  (capital  eye)
         option.  The  different value is called initial inflation and is tun-
         able using the -i f option (default value f=2.0). The  default  value
         (to  -l)  is zero. This option supplies new ways of affecting cluster
         granularity, e.g. by supplying

         mcl proteins -i 1.4 -l 2 -I 4.0

         one lets expansion prevail during the first two iterations,  followed
         by  inflation catching up (in a figurative way of writing).  This may
         be useful in certain cases, but this type of experiment is  certainly
         secondary to simply varying -I (capital eye).

      -L n (main iteration number)
         Normally,  mcl  computes the MCL process until it has converged fully
         to a doubly idempotent matrix. The number of iterations  required  is
         typically  somewhere  in  the range 10-100.  The first few iterations
         generally take the longest time.  The -L option can be used to  spec-
         ify  the number of iterations mcl may do at most. When this number is
         reached, mcl will output the clustering associated with  the  iterand
         last computed.

      -i f (initial inflation)
         The  inflation  value  used during the first n iterations, where n is
         specified by the -l (ell) option.  By default, n=0 and f=2.0.

      -pi f (pre-inflation)
         If used, mcl will apply inflation one time to the input graph  before
         entering  the  main  process.  This can be useful for making the edge
         weights in a graph either more homogeneous (which may result in  less
         granular clusterings) or more heterogeneous (which may result in more
         granular clusterings).  Homogeneity is achieved  for  values  f  less
         than  one,  heterogeneity  for values larger than one.  Values to try
         are normally in the range [2.0,10.0].
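         The homogeneity claim above can be illustrated numerically (this is
         not mcl's exact internal computation): raising edge weights to a
         power f < 1 pulls them together, while f > 1 drives them apart, as
         seen in the ratio of the largest to the smallest weight.

```python
# Illustration of pre-inflation's effect on edge-weight heterogeneity:
# the max/min weight ratio shrinks for f < 1 and grows for f > 1.

weights = [0.1, 0.4, 0.9]

for f in (0.5, 1.0, 3.0):
    powered = [w ** f for w in weights]
    ratio = max(powered) / min(powered)
    print("f=%.1f  max/min ratio=%.1f" % (f, ratio))
# prints ratios 3.0, 9.0 and 729.0 for f = 0.5, 1.0 and 3.0 respectively
```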

      -di i:j (dump interval)
      -dump-interval i:j
         Dump during iterations i..j-1. See the -dump str option below.

      -dm k (dump i+0..i+k..)
      -dump-modulo k
         Sampling rate: select only these iterations  in  the  dump  interval.
         See the -dump str option below.

      -ds stem (file stem)
      -dump-stem stem
         Set the stem for file names of dumped objects (default  mcl).  See
         the -dump str option below.

      -dump str (type)
         str can be of the following types.

         ite
         dag
         cls
         chr

         Repeated use is allowed.  The ite option writes mcl iterands to file.
         The  cls  option  writes  clusterings associated with mcl iterands to
         file.  These clusters are obtained from a particular directed acyclic
         graph  (abbreviated  as  DAG)  associated  with each iterand. The dag
         option writes that DAG to file. The DAG  can  optionally  be  further
         pruned  and  then again be interpreted as a clustering using clmimac,
         and clmimac can also work with the matrices  written  using  the  ite
         option.  It should be noted that clusterings associated with interme-
         diate iterands may contain overlap,  which  is  interesting  in  many
         applications.  For more information refer to mclfaq(7) and the REFER-
         ENCES section below.

         The chr option says, for each iterand I, to output a  matrix  C  with
         characteristics of I. C has the same number of columns as I. For each
         column k in C, row entry 0 is the diagonal or 'loop' value of  column
         k  in I after expansion and pruning, and before inflation and rescal-
         ing. Entry 1 is the loop value after inflation and rescaling.   Entry
         2 is the center of column k (the sum of its entries squared) computed
         after expansion and before pruning, entry  3  is  the  maximum  value
         found  in that column at the same time. Entry 4 is the amount of mass
         kept for that column after pruning.
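         Two of the per-column characteristics listed above are simple
         functions of a column and can be sketched directly: entry 2, the
         center of a column (the sum of its entries squared), and entry 3,
         its maximum value. The loop values and kept-mass entries depend on
         mcl internals not reproduced here.

```python
# Sketch of two characteristics from the chr dump: the column "center"
# (sum of squared entries) and the maximum column value.

def column_characteristics(col):
    center = sum(x * x for x in col)   # entry 2: sum of squared entries
    largest = max(col)                 # entry 3: maximum column value
    return center, largest

col = [0.5, 0.3, 0.2]
center, largest = column_characteristics(col)
print(round(center, 2), largest)  # 0.38 0.5
```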

         The -ds option sets  the  stem  for  file  names  of  dumped  objects
         (default  mcl). The -di and -dm options allow a selection of iterands
         to be made.

      -digits n (printing precision)
         This has two completely different uses. It sets the number  of  deci-
         mals  used  for  pretty-printing  mcl  iterands when using the --show
         option (see below), and it sets the number of decimals used for writ-
         ing the expanded matrix when using the --expand-only option.

      --show (print matrices to screen)
         Print  matrices  to  screen.  The  number of significant digits to be
         printed can be tuned  with  -digits n.  An  80-column  screen  allows
         graphs (matrices) of size up to 12(x12) to be printed with three dig-
         its precision (behind the comma), and of size up to 14(x14) with  two
         digits.  This  can give you an idea of how mcl operates, and what the
         effect of pruning is.  Use e.g. -S 6 for such a small graph and  view
         the MCL matrix iterands with --show.

      -sort str (sort mode)
         str can be one of lex, size, revsize, or  none.  The  default  is
         'revsize', in which the largest clusters come first. If the mode  is
         'size',  smallest clusters come first, if the mode is 'lex', clusters
         are ordered lexicographically, and if the mode is 'none',  the  order
         is  the same as produced by the procedure used by mcl to map matrices
         onto clusterings.

      --keep-overlap y/n (retain overlap)
         In theory, mcl may generate clusterings that contain  overlap.  In
         practice, this very seldom happens. By default mcl will remove over-
         lap.

      --force-connected=y/n (analyze components)
      --check-connected=y/n (analyze components)
         If  the  input  graph  has  strong bipartite characteristics, mcl may
         yield clusters that do not correspond to connected components in  the
         input  graph.  Turn  one  of  these modes on to analyze the resultant
         clustering.

         If loose clusters are found  they  will  be  split  into  subclusters
         corresponding  to connected components.  With --force-connected=y mcl
         will write the corrected clustering to the normal  output  file,  and
         the  old clustering to the same file with suffix orig.  With --check-
         connected=y mcl will write the loose clustering to the normal  output
         file, and the corrected clustering to the same file with suffix coco.

         These options are not on by default, as  the  analysis  is  currently
         (overly) time-consuming and mcl's behaviour actually makes some sense
         (when taking bipartite characteristics into account).
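
         The splitting step can be sketched as follows; a minimal Python
         illustration (the function name and the adjacency-dict graph
         representation are mine, not part of mcl):

```python
from collections import deque

def split_cluster(graph, cluster):
    """Split one cluster into the connected components it induces in
    the input graph (only edges inside the cluster are considered)."""
    cluster = set(cluster)
    seen, components = set(), []
    for start in cluster:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in graph.get(node, ()):
                if nb in cluster and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components

# A loose cluster spanning two disconnected parts of the graph is split:
g = {1: [2], 2: [1], 3: [4], 4: [3]}
components = split_cluster(g, [1, 2, 3, 4])
```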

      --analyze=y/n (performance criteria)
         With this mode turned on, mcl will reread the input matrix  and  com-
         pute  a  few performance criteria and attach them to the output file.
         Off by default.

      --append-log=y/n (append log)
         Appends a log with the process characteristics to  the  output  file.
         By default, this mode is on.

      --show-log=y/n (show log)
         Shows  the  log  with process characteristics on STDOUT.  By default,
         this mode is off.

      --inflate-first (rather than expand)
         Normally, mcl will take the input graph/matrix, make  it  stochastic,
         and start computing an mcl process, where expansion and inflation are
         alternated. This option changes that to alternation of inflation  and
         expansion,  i.e.  inflation is the first operator to be applied. This
         is intended for use with an input matrix that was generated with  the
         --expand-only  option  (see  below).  If you do multiple mcl runs for
         the same graph, then the first step will be the same  for  all  runs,
         namely computing the square of the input matrix. With the pair
         --expand-only and --inflate-first this bit of computation can be
         factored out.  NOTE this option assumes that the input matrix is
         stochastic (as it will be when generated with --expand-only).  The
         --inflate-first option renders useless all options that would
         otherwise affect the input matrix; precisely those options do affect
         the matrix resulting from --expand-only. See the entry below for
         more information.

      --expand-only (factor out computation)
         This  option  makes  mcl  compute  just  the  square  of  the   input
         graph/matrix,  and  write  it  to  the file name supplied with the -o
         flag, or to the default file named out.mce. NOTE  in  this  case  the
         output  matrix is not a clustering. The intended use is that the out-
         put matrix is used as input for mcl with the  --inflate-first  switch
         turned  on, so that multiple mcl runs need not redo the same computa-
         tion (the first expansion step).

         Note that the -scheme parameters  affect  the  matrix  computed  with
         --expand-only.  Other  options  that affect the matrix resulting from
         this option: -pp, -c, and -digits. The latter option sets the  preci-
         sion for output in native ascii format.
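
         The computation these two options factor out and resume can be
         sketched as follows: expansion is matrix squaring, inflation is an
         entrywise power followed by column rescaling. A minimal Python
         sketch on dense lists of rows (mcl itself uses sparse matrices and
         pruning):

```python
def expand(M):
    """Expansion: squaring of a column-stochastic matrix (list of rows)."""
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(M, r):
    """Inflation: raise entries to the power r, then rescale every
    column so that it sums to one (column-stochastic again)."""
    n = len(M)
    P = [[M[i][j] ** r for j in range(n)] for i in range(n)]
    colsum = [sum(P[i][j] for i in range(n)) for j in range(n)]
    return [[P[i][j] / colsum[j] for j in range(n)] for i in range(n)]

# One alternation on a tiny column-stochastic matrix:
M = [[1.0, 0.25],
     [0.0, 0.75]]
M2 = inflate(expand(M), 2.0)
```

         With --expand-only the result of the first expand(M) is written to
         file; --inflate-first then resumes with the inflation step.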

      -in-gq f (filter threshold)
         mcl  will  remove any edges in the input graph (equivalently, entries
         in the input matrix) for which the weight is below f.

      -pp n (preprune count)
         For each column vector (node) in the input matrix  (graph)  mcl  will
         keep  the  n entries (outgoing edges) of that vector (node) that have
         largest weight and remove the rest.

      --jury-charter (explains jury)
         Explains how the jury synopsis is computed from the jury marks.

      --version (show version)
         Show version.

      -how-much-ram n (RAM upper bound)
         n is interpreted as the number of nodes of an input graph.  mcl  will
         print  the  maximum amount of RAM it needs for its computations.  The
         formula for this number in bytes is:

            2 * c * k * n

            2  :  two matrices are concurrently held in memory.
            c  :  mcl cell size (as shown by -z).
            n  :  graph cardinality (number of nodes).
            k  :  MAX(s, r).
            s  :  select number (-S, -scheme options).
            r  :  recover number (-R, -scheme options).

         This estimate will usually be too pessimistic. It does assume  though
         that  the  average  node degree of the input graph does not exceed k.
         The -how-much-ram option  takes  other  command-line  arguments  into
         account  (such  as  -S and -R), and it expresses the amount of RAM in
         megabyte units.
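
         The bound can be reproduced directly; a small Python sketch, where
         the cell size of 8 bytes is an assumption (use mcl -z to find the
         real value on your platform):

```python
def mcl_ram_bytes(n, k, cell_size=8):
    """RAM upper bound in bytes: 2 * c * k * n.  Two matrices are held
    concurrently, each with at most k entries of c bytes per column."""
    return 2 * cell_size * k * n

# A 100000-node graph with -S 500 -R 600, so k = max(500, 600):
n, k = 100000, max(500, 600)
megabytes = mcl_ram_bytes(n, k) / 2 ** 20
```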

      -h (show help)
         Shows a selection of the most important mcl options.

      --apropos (show help)
         Gives a one-line description for all options.

      --show-settings (show settings)
         A synonym for the -z option.

      -z (show settings)
         Show current settings for tunable parameters.  --show-settings  is  a
         synonym.

  PRUNING OPTIONS
      -p f (cutoff)
      -P n (1/cutoff)
      -S s (selection number)
      -R r (recover number)
      -pct pct (recover percentage)
         After computing a new (column stochastic) matrix vector during
         expansion (which is matrix multiplication, i.e. squaring), the
         vector is successively exposed to different pruning strategies. The
         intent of pruning is that many small entries are removed while
         retaining much of the stochastic mass of the original vector. After
         pruning, vectors are rescaled to be stochastic again. MCL iterands
         are theoretically known to be sparse in a weighted sense, and this
         manoeuvre effectively perturbs the MCL process a little in order to
         obtain matrices that are genuinely sparse, thus keeping the
         computation tractable. An example of monitoring pruning can be
         found in the discussion of -v pruning at the end of this section.

         mcl  proceeds as follows. First, entries that are smaller than cutoff
         are removed, resulting in a vector with at most 1/cutoff entries. The
         cutoff  can  be supplied either by -p, or as the inverse value by -P.
         The latter is more intuitive, if your intuition is like mine (and the
         P  stands  for  precision  or  pruning  by the way).  The cutoff just
         described is rigid; it is the  same  for  all  vectors.  The  --adapt
         option  causes the computation of a cutoff that depends on a vector's
         homogeneity properties, and this option may or may not speed up  mcl.

         Second, if the remaining stochastic mass (i.e. the sum of all remain-
         ing entries) is less than pct/100 and the number of remaining entries
         is  less than r (as specified by the -R flag), mcl will try to regain
         ground by recovering the largest discarded entries. The total  number
         of entries is not allowed to grow larger than r.  If recovery was not
         necessary, mcl tries to prune the vector further down to  at  most  s
         entries (if applicable), as specified by the -S flag. If this results
         in a vector that satisfies the recovery condition  then  recovery  is
         attempted,  exactly  as described above. The latter will not occur of
         course if r <= s.
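
         The stages just described can be summarized in a short Python
         sketch operating on one column, given as a dict from index to
         weight. This is a simplification (mcl uses heaps and works in
         place); the parameter names mirror the flags above:

```python
def prune_column(v, cutoff, s, r, pct):
    """Sketch of mcl's pruning stages for one stochastic column."""
    total = sum(v.values())

    def needs_recovery(kept):
        # Recovery condition: too much mass lost AND fewer than r entries.
        return sum(kept.values()) < pct / 100.0 * total and len(kept) < r

    def recover(kept):
        # Re-admit the largest discarded entries, up to r entries in all.
        dropped = sorted(((w, i) for i, w in v.items() if i not in kept),
                         reverse=True)
        for w, i in dropped[: r - len(kept)]:
            kept[i] = w
        return kept

    # First stage: remove entries below the (rigid) cutoff.
    kept = {i: w for i, w in v.items() if w >= cutoff}
    if needs_recovery(kept):
        return recover(kept)
    # Selection: prune further, down to the s largest entries,
    # then re-check the recovery condition.
    if len(kept) > s:
        kept = dict(sorted(kept.items(), key=lambda kv: kv[1])[-s:])
        if needs_recovery(kept):
            kept = recover(kept)
    return kept

# Cutoff 0.1 keeps mass 0.90 with 3 entries; with -pct 95 and r = 4
# recovery re-admits the largest discarded entry.
v = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.05, 4: 0.05}
kept = prune_column(v, cutoff=0.1, s=500, r=4, pct=95)
```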

         The default setting is something like -P 4000  -S 500  -R 600.  Check
         the -z flag to be sure. There is a set of precomposed settings, which
         can be triggered with  the  -scheme k  option.  k=2  is  the  default
         scheme; higher values for k result in costlier and more accurate com-
         putations (vice versa for lower, cheaper, and  less  accurate).   The
         schemes  are  listed using the --show-schemes option. It is advisable
         to use the -scheme option only in interactive mode, and  to  use  the
         explicit  expressions when doing batch processing. The reason is that
         there is no guarantee whatsoever that the  schemes  will  not  change
         between different releases. This is because the scheme options
         should reflect good general purpose settings, and it may become
         apparent that other schemes are better.

         Note  that  'less accurate' or 'more accurate' computations may still
         generate the same output clusterings. Use clmdist to  compare  output
         clusterings for different resource parameters. Refer to clmdist for a
         discussion of this issue.

      -warn-pct k (prune warn percentage)
      -warn-factor k (prune warn factor)
         The two options -warn-pct and -warn-factor relate  to  warnings  that
         may  be  triggered once the initial pruning of a vector is completed.
         The idea is to issue warnings if initial  pruning  almost  completely
         destroys  a  computed  vector, as this may be a sign that the pruning
         parameters should be changed. Whether a warning is issued depends
         on the mass remaining after initial pruning. mcl will issue a
         warning if that mass is less than warn-pct, or if the number of
         remaining entries is smaller by a factor warn-factor than both the
         number of entries originally computed and the recovery number.

         -warn-pct takes an integer between 0 and 100 as parameter, -warn-fac-
         tor takes a real positive number. They default to something  like  30
         and 50.0. If you want to see fewer warnings, decrease warn-pct and
         increase warn-factor. Set warn-factor to zero if you  want  no  warn-
         ings.

      --dense (allow matrices to fill)
         This  renders  all pruning options useless except for one. After each
         expansion step, mcl will remove all entries that are smaller than the
         threshold  specified by -p or -P, which acts like a precision in this
         case. After removal, the matrix columns are rescaled to be stochastic
         again.

         If  the  -p  threshold (precision) is zero or very small, the --dense
         option results in a rather accurate and very  costly  computation  of
         the MCL process. Do not use this option for graphs with more than
         several thousand entries, or you will have trouble digging your
         processor out of swap.

      --rigid (pruning)
         See the --adapt option below.

      -ae f (adaptive pruning exponent)
         See the --adapt option below.

      -af f (adaptive pruning factor)
         See the --adapt option below.

      --adapt (pruning)
         The default mcl pruning behaviour as described under the -P option is
         called rigid pruning (it being the default renders the switch
         --rigid currently useless), referring to the fact that the first
         stage of
         pruning removes entries smaller than a fixed threshold.  The  options
         discussed  here enable the computation of a threshold that depends on
         the homogeneity characteristics of a vector. This behaviour is  trig-
         gered by supplying --adapt.

         The --adapt behaviour only affects the first pruning stage, i.e.
         the computation of the first threshold (see the discussion under
         the -P option). It does not interfere with either selection or
         recovery. It is affected however by the threshold as specified by
         the -P option. When using --adapt, you typically use the -P option
         as well, and you can and should use a higher value than you would
         without using --adapt.

         All  that  said,  --adapt triggers this behaviour: Given a stochastic
         vector v, its mass center of order two is computed, which is the  sum
         of  each  entry squared. The mass center of v, call it c, is strongly
         related to its homogeneity properties (see REFERENCES). The threshold
         T is computed as 1/f * pow(c, e), where f and e are the arguments
         to the -af f and -ae e options respectively (check -z for the
         respective defaults). Decreasing either e or f makes T larger.
         Finally, T is maxed with the rigid threshold value, which
         can  be  altered  using  either  -p f or -P n.  The latter is why you
         should increase the -P parameter n (so that the  rigid  threshold  is
         decreased)  once you switch to adaptive pruning. The adaptive thresh-
         old should be the main factor controlling  pruning,  with  the  rigid
         threshold acting as a safeguard that does not take over too often.
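
         In code, the adaptive threshold amounts to the following; the
         values chosen for e and f below are illustrative only (check -z for
         the actual defaults):

```python
def adaptive_threshold(v, e, f, rigid):
    """Adaptive cutoff for a stochastic vector v: the mass center of
    order two, c = sum of squared entries, yields T = (1/f) * c**e,
    maxed with the rigid threshold."""
    c = sum(w * w for w in v)          # mass center of order two
    return max(c ** e / f, rigid)

# A homogeneous vector has a small mass center, hence a low threshold,
# so all of its entries survive; e = 2.0 and f = 2.0 are made up here.
flat = [0.25, 0.25, 0.25, 0.25]        # c = 0.25
T = adaptive_threshold(flat, e=2.0, f=2.0, rigid=1 / 4000)
```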

         This  may  seem complicated, but the rules are actually quite simple,
         and you may just disregard the definition of  T.  The  usefulness  of
         these  options will vary. If you want to speed up mcl, try it out and
         add --adapt to your settings.

      --thick (expect dense input graph)
         This option is somewhat esoteric. It does not affect the matrices  as
         computed  by  mcl, but it affects the way in which they are computed.
         If the input graph is very dense, this may speed up mcl a little.

      -v pruning
         Pruning verbosity mode causes mcl to emit several statistics  related
         to  the  pruning  process,  each  of  which  is  described below. Use
         -v explain to get explanatory headers in the output as well (or  sim-
         ply use -v all).

         Selection and recovery
         The  number  of  selections  and recoveries mcl had to perform during
         each iteration is shown. It also shows  the  number  of  vectors  for
         which  the mass after final pruning was below the fraction defined by
         the -pct option as a percentage (default probably 90 or 95).

         Initial and pruned vector footprint distributions
         The distribution of the vector footprints (i.e. the number of nonzero
         entries)  before  and  after pruning is shown. This is assembled in a
         terse (horrid if you will) format, looking as follows (with some con-
         text  stripped,  noting  that  the  data for three expansion steps is
         shown):

         ----------------------------------------------------
          mass percentages  | distr of vec footprints       |
                  |         |____ expand ___.____ prune ____|
           prune  | final   |e4   e3   e2   |e4  e3   e2    |
         all ny nx|all ny nx|8532c8532c8532c|8532c8532c8532c|
         ---------.---------.---------------.---------.-----.
          98 88 86  98 91 86 _________022456 ___________0234
          98 89 86  98 94 91 _______00245678 ___________0234
          98 90 89  99 95 94 _______00235568 ___________0234
          ...

         This particular output  was  generated  (and  truncated  after  three
         rounds of expansion and inflation) from clustering a protein graph on
         9058  nodes  with  settings  -I 1.4,  -P 2000,  -S 500,  -R 600,  and
         -pct 95.

         The  header  entries  8532c85.. indicate thresholds going from 80000,
         50000, 20000, 12500, 8000, all the way down to 300, 200, and 125. The
         character  'c'  signifies the base 1.25 (for no apparent reason). The
         second entry '2' (after '0') on the first line signifies that roughly
         20  percent  of  all  the  vectors  had  footprint (#nonzero entries)
         between 800 and 1250.  Likewise, 40 percent had footprint between 300
         and  500.  The  '0' entries signify a fraction somewhere below 5 per-
         cent, and the '@' entries signify a fraction somewhere above 95  per-
         cent.

         Two columns are listed, one for the expansion vector footprints (i.e.
         after squaring), and the other for the vector footprints right  after
         initial pruning took place (i.e. before selection and recovery, after
         either adaptive or rigid pruning).  This may  give  an  idea  of  the
         soundness  of  the  initial pruning process (overly severe, or overly
         mild), and the extent to which you want  to  apply  selection  and/or
         recovery.

         Initial and final mass windows
         The  mass  averages  of  the pruned vectors after the first selection
         stage are shown, and the mass averages of the vectors as finally com-
         puted, i.e. after selection and recovery. Note that the latter corre-
         sponds to a different stage than what is shown for the  vector  foot-
         prints,  if  either  selection  or  recovery  is turned on.  For both
         cases, three averages are shown: the average over  all  vectors,  the
         average  over  the  worst  x  cases, and the average over the worst y
         cases. The mass averages are shown as percentages: '98' on the  first
         line  under  the  'prune/all' column means that overall 98 percent of
         the stochastic mass of the matrix was kept after pruning.

         This example demonstrates that many entries could  be  removed  while
         retaining  much  of  the  stochastic mass. The effect of the recovery
         (-R) parameter is also clear: the final averages are higher than  the
         initial averages, as a result of mcl undoing some overenthusiastic
         pruning.

         An average over the worst k cases is called  a  window  of  width  k;
         internally, mcl tracks many more such windows. The result of this can
         be seen when using the -do log option (which appends  a  log  to  the
         cluster  output)  or  the -do show-log option (which sends the log to
         STDOUT).  From a fixed set of windows those that are  applicable  are
         tracked,  that  is,  all  those  windows for which the width does not
         exceed the graph cardinality. The  windows  in  the  fixed  set  have
         respective  sizes  1,  2,  5,  10, 20, 50, and so on up until 5000000
         (which makes 15 windows in all).

      -nx i (x window index)
      -ny j (y window index)
         The options -nx and -ny both take an index in the  range  1..15.  The
         default values for -nx and -ny are respectively 4 and 7, denoting the
         fourth and seventh window of respective widths 10 and 100.  They  are
         used in the verbosity output as described above.

      -nj i (jury window index)
         The  -nj  denotes  a  window index in the same way as -nx and -ny do.
         This particular window is used for computing the  jury  marks,  which
         are the three numbers reported by mcl when it is done. They are a
         reminder of the existence of pruning and its importance for both
         speed and accuracy, and they are indicative rather than
         authoritative.

         These jury marks are simply the respective mass averages in the  jury
         window  for  the  first  three iterations. The marks are even further
         simplified and mapped to the jury synopsis, which is a  single  grade
         expressed  as  an  adjective.  The grades are, in decreasing order of
         achievement, perfect exceptional superior excellent  good  acceptable
         mediocre  poor  bad  lousy  miserable awful wretched atrocious. Doing
         'mcl --jury-charter' will tell you how the jury marks  map  onto  the
         jury synopsis.

         The  jury  marks  should preferably be higher than 70. If they are in
         the vicinity of 80 or 90, mcl is doing fine as far as pruning is con-
         cerned.   Choose  a higher scheme if you think them too low. For very
         dense graphs that do have strong cluster structure, the jury marks
         can sink as low as the 30s and 40s, but the clusterings generated
         by mcl may still be good. The marks and the synopsis grade the
         severity of pruning, not cluster quality. Note that the jury
         becomes friendlier when the -nj option is increased and harsher
         when it is decreased.

      -nw w (nr of windows)
         Normally,  mcl  will use all windows that have width smaller than the
         cardinality of the input graph. This option limits the set of windows
         to  those w windows of smallest width.  This affects the -do log out-
         put.

      -nl l (number of iterations)
         By default, mcl will log the window mass averages for the  first  ten
         iterations. This option sets that number to l.  It affects the
         -do log output.

  PIPELINES
      In general, clustering requires several stages: creating the matrix,
      running mcl, and displaying the result. The display stage is supported
      by clmformat. The matrix creation stage often needs to be done only
      once for a given data collection, followed by repeated runs of the
      other two stages for varying inflation values and scheme settings.

      The matrix creation stage can often be split into two further stages,
      namely parsing a data file in some given format, and assembling a matrix
      from the data bits and pieces, such as node indices and edge weights  or
      even edge weight contributions.  The assembly step can be done by mcxas-
      semble, which allows  a  very  general  input  format  and  customizable
      behaviour  in how the bits and pieces should be transformed to the input
      graph.  This leaves the parse stage to be filled in.

      The mclpipeline script implements a generic  and  customizable  pipeline
      encapsulating  the  four  stages  distinguished here (parsing, assembly,
      clustering, display). It is possible to let only part of the pipeline be
      active,  and  many  other  features  are  supported. The IO mechanism is
      entirely file based, and files are associated with parametrizations  via
      file name extensions (by all means a simple mechanism).

      mclpipeline  requires a single parse script to be specified.  It will be
      plugged into the pipeline and you should  be  set  to  run.   The  parse
      script   must  satisfy  the  interface  requirements  described  in  the
      mclpipeline manual page.

      For BLAST input, the mclblastline script provides a dedicated interface.
      It uses the mcxdeblast script that comes prepackaged with mcl.

  APPLICABILITY
      mcl  will work very well for graphs in which the diameter of the natural
      clusters is not too large. The presence of many edges between  different
      clusters  is not problematic; as long as there is cluster structure, mcl
      will find it. It is less likely to work well for  graphs  with  clusters
      (inducing  subgraphs)  of  large diameter, e.g. grid-like graphs derived
      from Euclidean data. So mcl in its canonical form is certainly  not  fit
      for boundary detection or image segmentation. I experimented with a mod-
      ified mcl and boundary detection in the thesis  pointed  to  below  (see
      REFERENCES). This was fun and not entirely unsuccessful, but not
      something to be pursued further.

      mcl likes undirected input graphs best, and it  really  dislikes  graphs
      with  node pairs (i,j) for which an arc going from i to j is present and
      the counter-arc from j to i is absent. Try  to  make  your  input  graph
      undirected.  Furthermore, mcl interprets edge weights in graphs as simi-
      larities. If you are used to working with dissimilarities, you will have
      to convert those to similarities using some conversion formula. The most
      important thing is that you feel confident  that  the  similarities  are
      reasonable, i.e. if X is similar to Y with weight 2, and X is similar to
      Z with weight 200, then this should mean that the similarity of Y (to X)
      is negligible compared with the similarity of Z (to X).
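
      A dissimilarity can be converted in many ways; one simple possibility
      is sketched below. The formula is only an illustration (choose
      whatever conversion you trust for your data), and the labels and
      numbers are made up:

```python
def to_similarity(d, d_max):
    """One of many possible conversions from a dissimilarity d to a
    similarity: subtract from the largest observed dissimilarity."""
    return d_max - d

# The small distance (X, Z) ends up carrying the large similarity:
distances = {('X', 'Y'): 9.0, ('X', 'Z'): 1.0}
d_max = max(distances.values())
sims = {pair: to_similarity(d, d_max) for pair, d in distances.items()}
```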

      mcl  is  probably not suited for clustering tree graphs. This is because
      mcl works best if there are multiple paths between  different  nodes  in
      the  natural clusters, but in tree graphs there is only one path between
      any pair of nodes. Trees are too sparse a structure for mcl to work  on.

      mcl  may  well  be suited for clustering lattices. It will depend on the
      density characteristics of the lattice, and the conditions  for  success
      are  the same as those for clustering graphs in general: The diameter of
      the natural clusters should not be too large.  NOTE  when  clustering  a
      lattice,  you  have  to cluster the underlying undirected graph, and not
      the directed graph that represents the lattice  itself.  The  reason  is
      that one has to allow mcl (or any other cluster algorithm) to 'look back
      in time', so to speak. Clustering and  directionality  bite  each  other
      (long discussion omitted).

      mcl  has a worst-case time complexity O(N*k^2), where N is the number of
      nodes in the graph, and k is the maximum number  of  neighbours  tracked
      during  computations.  k  depends  on  the  -P and -S options. If the -S
      option is used (which is the default setting) then k  equals  the  value
      corresponding  with  this  option. Typical values for k are in the range
      500..1000. The average case is much better than the worst  case  though,
      as  cluster  structure  itself  has  the effect of helping mcl's pruning
      schemes, certainly if the diameter of natural clusters is not large.

  FILES
      There are currently no resource nor configuration files.  The mcl matrix
      format is described in the mcxio(5) section.

  ENVIRONMENT
      MCLXASCIIDIGITS
         When  writing  matrices in ascii format, mcl will use the environment
         variable MCLXASCIIDIGITS (if present) as  the  precision  (number  of
         digits) for printing the fractional part of values.

      MCLXIOVERBOSITY
         MCL  and  its  sibling  applications will usually report about matrix
         input/output from/to disk. The verbosity level can be  regulated  via
         MCLXIOVERBOSITY. These are the levels it can currently be set to.

          1  Silent but applications may alter this.
          2  Silent and applications can not alter this.
          4  Verbose but applications may alter this.
          8  Verbose and applications can not alter this (default).

      MCLXIOFORMAT
         MCL  and  its sibling applications will by default output matrices in
         ASCII format rather than binary format (cf. mcxio(5)).   The  desired
         format can be controlled via the variable MCLXIOFORMAT. These are the
         levels it can currently be set to.

          1  Ascii format but applications may alter this.
          2  Ascii format and applications can not alter this (default).
          4  Binary format but applications may alter this.
          8  Binary format and applications can not alter this.

      MCLXASCIIFLAGS
         If matrices are output in ascii format, by default empty vectors will
         not be listed. Equivalently (during input time), vectors for which no
         listing is present are understood to be empty - note that  the  pres-
         ence of a vector is established using the domain information found in
         the header part.  It is possible to enforce listing of empty  vectors
         by setting bit '1' in the variable MCLXASCIIFLAGS.

  DIAGNOSTICS
      If  mcl  issues  a  diagnostic error, it will most likely be because the
      input matrix could not be parsed successfully.  mcl tries to be helpful
      in  describing  the  kind  of  parse  error.   The  mcl matrix format is
      described in the mcxio(5) section.

  BUGS
      No  known  bugs  at  this  time.  Please  send  bug  reports   to   mcl-
      devel@micans.org.

  AUTHOR
      Stijn van Dongen.

  HISTORY/CREDITS
      The  MCL  algorithm  was conceived in spring 1996 by the present author.
      The first implementation of the MCL algorithm followed that  spring  and
      summer.  It  was  written  in Perl and proved the viability of the algo-
      rithm. The implementation described here began its life in autumn  1997.
      The  first versions of the vital matrix library were designed jointly by
      Stijn van Dongen and Annius Groenink in the period October 1997 - May
      1999.  The efficient matrix-vector multiplication routine was written by
      Annius. This routine is without significant changes  still  one  of  the
      cornerstones of this MCL implementation.

      Since May 1999 all MCL libraries have seen much development and redesign
      by the present author. Matrix-matrix multiplication has  been  rewritten
      several times to take full advantage of the sparseness properties of the
      stochastic matrices brought forth by the MCL algorithm. This mostly con-
      cerns  the  issue of pruning - removal of small elements in a stochastic
      column in order to keep matrices sparse.

      Very instructive was that around April 2001 Rob Koopman pointed out that
      selecting  the  k largest elements out of a collection of n is best done
      using a min-heap. This was the key to  the  second  major  rewrite  (now
      counting  three)  of  the  MCL pruning schemes, resulting in much faster
      code, generally producing a more accurate computation of  the  MCL  pro-
      cess.

      In May 2001 Anton Enright initiated the parallelization of the mcl
      code and threaded inflation. From this example, Stijn threaded
      expansion. This was great, as the MCL data structures and operands
      (normal matrix multiplication and Hadamard multiplication) just beg
      for parallelization.

      In  Jan  2003 the 03-010 release introduced support for sparsely enumer-
      ated (i.e. indices need not be  sequential)  graphs  and  matrices,  the
      result of a major overhaul of the matrix library and most higher layers.
      Conceptually, the library now sees matrices  as  infinite  quadrants  of
      which only finite subsections happen to have nonzero entries.

      Joost  van  Baal  set  up  the  mcl CVS tree and packaged mcl for Debian
      GNU/Linux. He completely autotooled the sources, so much so that at
      first I found it hard to locate them amidst bootstrap, aclocal.m4,
      depcomp, and other beauties.

      Jan van der Steen shared his elegant mempool code. Philip Lijnzaad  gave
      useful  comments.  Philip,  Shawn  Hoon,  Abel  Ureta-Vidal,  and Martin
      Mokrejs sent helpful bug reports.

      Abel Ureta-Vidal and Dinakarpandian  Deendayal  commented  on  and  con-
      tributed to mcxdeblast and mcxassemble.

      Tim  Hughes contributed several good bug reports for mcxassemble, mcxde-
      blast and zoem (a workhorse for clmformat).

  SEE ALSO
      mclfaq(7) - Frequently Asked Questions.

      mcxio(5) - a description of the mcl matrix format.

      There are many more utilities. Consult mclfamily(7) for an  overview  of
      and  links to all the documentation and the utilities in the mcl family.

      mcl development is discussed on mcl-devel@lists.micans.org.
      Subscription information is at
      https://lists.micans.org:446/listinfo/mcl-devel and the list is
      archived at https://lists.micans.org:446/pipermail/mcl-devel/.

      mcl's home at http://micans.org/mcl/.

  REFERENCES
      Stijn van Dongen, Graph Clustering by Flow Simulation.  PhD thesis, Uni-
      versity of Utrecht, May 2000.
      http://www.library.uu.nl/digiarchief/dip/diss/1895620/inhoud.htm

      Stijn van Dongen. A cluster algorithm for graphs.  Technical Report INS-
      R0010,  National Research Institute for Mathematics and Computer Science
      in the Netherlands, Amsterdam, May 2000.
      http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z

      Stijn van Dongen. A stochastic uncoupling process for graphs.  Technical
      Report  INS-R0011,  National Research Institute for Mathematics and Com-
      puter Science in the Netherlands, Amsterdam, May 2000.
      http://www.cwi.nl/ftp/CWIreports/INS/INS-R0011.ps.Z

      Stijn van Dongen. Performance criteria for graph clustering  and  Markov
      cluster  experiments.  Technical  Report  INS-R0012,  National  Research
      Institute for Mathematics and Computer Science in the Netherlands,  Ams-
      terdam, May 2000.
      http://www.cwi.nl/ftp/CWIreports/INS/INS-R0012.ps.Z

      Enright  A.J.,  Van Dongen S., Ouzounis C.A.  An efficient algorithm for
      large-scale  detection  of  protein  families,  Nucleic  Acids  Research
      30(7):1575-1584 (2002).

  NOTES
      This page was generated from ZOEM manual macros, http://micans.org/zoem.
      Both html and roff pages can be created from  the  same  source  without
      having  to  bother with all the usual conversion problems, while keeping
      some level of sophistication in the typesetting.



  mcl 1.004, 04-250                 6 Sep 2004                            mcl(1)
