Task: Biology
Description: Debian Med micro-biology packages
 This meta package will install Debian packages related to molecular biology,
 structural biology and bioinformatics for use in life sciences.

Depends:     altree, fastdnaml, njplot, tree-puzzle | tree-ppuzzle, treeviewx
Why:         Phylogenetic analysis.

Depends:     molphy, phylip, treetool
Why:         Phylogenetic analysis (Non-free, thus only suggested).

Depends:     fastlink, loki, r-cran-qtl
Why:         Genetics

Depends:     amap-align, blast2, boxshade, dialign, gff2aplot, emboss, hmmer, kalign, mummer, muscle, poa, probcons, proda, seaview, sim4, sibsim4, sigma-align, t-coffee, wise, exonerate
Why:         Sequence alignments and related programs.

Depends:     arb, clustalw | clustalw-mpi, clustalx
Why:         Sequence alignments and related programs (Non-free, thus only suggested).

Depends:     adun.app, garlic, gdpc, ghemical, gromacs, pymol, rasmol, autodock, autogrid
Why:         Molecular modelling and molecular dynamics.

Depends:     plasmidomics
Why:         Presentation

Depends:     biosquid, gff2ps, mipe, melting, ncbi-epcr, ncbi-tools-bin, ncbi-tools-x11, perlprimer, primer3, readseq, tigr-glimmer
Why:         Tools for the molecular biologist.

Suggests:    mozilla-biofox
Why:         Tools for the molecular biologist. Because it depends on Firefox, we only suggest this package to avoid bloating the user's system.

Depends: dialign-tx

Depends: glam2

Suggests: pdb2pqr

Suggests: biococoa.app
Why: Only suggested, because the current version in Debian is broken and a
     new upstream version exists. At least version 1.7 might run under
     Linux; the redesigned 2.0 seems to work under Mac OS X only. The
     package is not really maintained (Debian QA group), but we want to
     keep track of it anyway.

Depends: meme
Homepage: http://meme.nbcr.net/meme/
Responsible: Steffen Moeller <moeller@debian.org>
License: non-free for commercial purpose (http://meme.nbcr.net/meme/COPYRIGHT.html)
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/meme/trunk/?rev=0&sc=0
Pkg-Description: motif discovery and search
 MEME is a tool for discovering motifs in a group of related DNA or protein
 sequences.  A motif is a sequence pattern that occurs repeatedly in a group
 of related protein or DNA sequences. MEME represents motifs as position-dependent
 letter-probability matrices which describe the probability of each possible
 letter at each position in the pattern. Individual MEME motifs do not contain
 gaps. Patterns with variable-length gaps are split by MEME into two or more
 separate motifs.
 .
 MEME takes as input a group of DNA or protein sequences (the training set)
 and outputs as many motifs as requested. MEME uses statistical modeling
 techniques to automatically choose the best width, number of occurrences,
 and description for each motif.

Depends: vienna-rna
Homepage: http://www.tbi.univie.ac.at/~ivo/RNA/
Responsible: Steffen Moeller <moeller@debian.org>
License: non-free but redistributable
WNPP: 451193
Pkg-Description: RNA sequence analysis
 The Vienna RNA Package consists of a C code library and several
 stand-alone programs for the prediction and comparison of RNA secondary
 structures.

Depends: cytoscape
Homepage: http://cytoscape.org/
Responsible: Mike Smoot <mes@aescon.com>
License: LGPL
WNPP: 465331
Pkg-Description: visualizing molecular interaction networks
 Cytoscape is a bioinformatics software platform for visualizing molecular
 interaction networks and integrating these interactions with gene expression
 profiles and other state data.  Additional features are available as plugins.

Depends: ballview
Homepage: http://www.ballview.org
Responsible: Andreas Moll <kerosin@gmx.de>
Pkg-URL: http://mentors.debian.net/debian/pool/main/b/ballview/
License: LGPL
Pkg-Description: free molecular modeling and molecular graphics tool
 BALLView provides fast OpenGL-based visualization of molecular structures,
 molecular mechanics methods (minimization, MD simulation using the
 AMBER, CHARMM, and MMFF94 force fields), calculation and visualization
 of electrostatic properties (FDPB) and molecular editing features.
 .
 BALLView is based on BALL (Biochemical Algorithms Library),
 which is currently being developed in the groups of Hans-Peter Lenhof
 (Saarland University, Saarbruecken, Germany) and Oliver Kohlbacher
 (University of Tuebingen, Germany). BALL is an application framework
 in C++ that has been specifically designed for rapid software
 development in Molecular Modeling and Computational Molecular Biology.
 It provides an extensive set of data structures as well as classes
 for Molecular Mechanics, advanced solvation methods, comparison and
 analysis of protein structures, file import/export, and visualization.

Depends: raxml
Homepage: http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm
License: GPL
Pkg-Description: Randomized Axelerated Maximum Likelihood
 RAxML is a program for sequential and parallel Maximum Likelihood-based
 inference of large phylogenetic trees. It was originally derived
 from fastDNAml.
 .
 There are freely accessible web-servers available for RAxML at
 http://phylobench.vital-it.ch/raxml-bb/ and
 http://8ball.sdsc.edu:8889/cipres-web/Bootstrap.do .

Depends: axparafit
Homepage: http://icwww.epfl.ch/~stamatak/AxParafit.html
Responsible: David Paleino <d.paleino@gmail.com>
License: GPL
WNPP: 464323
Pkg-Description: optimized statistical analysis of host-parasite coevolution
 AxParafit is a highly optimized version of Pierre Legendre's Parafit
 program for statistical analysis of host-parasite coevolution.
 AxParafit has been parallelized with MPI (Message Passing Interface)
 for compute clusters and was used to carry out the largest
 co-evolutionary analysis to date for the paper describing the software.

Depends: axpcoords
Homepage: http://icwww.epfl.ch/~stamatak/AxParafit.html
Responsible: David Paleino <d.paleino@gmail.com>
License: GPL
WNPP: 464323
Pkg-Description: LAPACK-based implementation of DistPCoA
 AxPcoords is a fast, LAPACK-based implementation of DistPCoA (see
 http://www.bio.umontreal.ca/Casgrain/en/labo/distpcoa.html),
 another program by Pierre Legendre that conducts a principal
 coordinates analysis.
 This program is required for the pipeline that conducts a full host-parasite
 co-phylogenetic analysis in combination with AxParafit.

Depends: copycat
Homepage: http://www-ab.informatik.uni-tuebingen.de/software/copycat/welcome.html
License: Use of the program is free for academic purposes at an academic institute. For all other uses, please contact the authors.
Pkg-Description: fast access to cophylogenetic analyses
 CopyCat provides easy and fast access to cophylogenetic analyses.
 It incorporates a wrapper for the program ParaFit, which conducts a
 statistical test for the presence of congruence between host and
 parasite phylogenies. CopyCat offers various features, such as the
 creation of customized host-parasite association data and the
 computation of phylogenetic host/parasite trees based on the NCBI taxonomy.

Depends: mustang
Homepage: http://www.cs.mu.oz.au/~arun/mustang/
Responsible: Morten Kjeldgaard <mok@bioxray.au.dk>
License: 3 clause BSD
WNPP: 459637
Pkg-Description: multiple structural alignment of proteins
 Mustang is an algorithm for structural alignment of multiple
 protein structures. Given a set of PDB files, the program uses the
 spatial information in the Calpha atoms of the set to produce a sequence
 alignment. Based on a progressive pairwise heuristic the algorithm
 then proceeds through a number of refinement passes. Mustang
 reports the multiple sequence alignment and the corresponding
 superposition of structures.

Depends: btk-core
Homepage: http://sourceforge.net/projects/btk/
Responsible: Morten Kjeldgaard <mok@bioxray.au.dk>
License: GPL
WNPP: 459753
Pkg-Description: biomolecule Toolkit C++ library
 The Biomolecule Toolkit is a library for modeling biological
 macromolecules such as proteins, DNA and RNA. It provides a C++ interface
 for common tasks in structural biology to facilitate the development of
 molecular modeling, design and analysis tools.

Depends: tacg
Homepage: http://sourceforge.net/projects/tacg
Responsible: Charles Plessy <plessy@debian.org>
License: GPL and others
WNPP: 461504
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/tacg/trunk/?rev=0&sc=0
Pkg-Description: command line program for finding patterns in nucleic acids
 tacg is a character-based, command line tool for unix-like operating systems
 for pattern-matching in nucleic acids and performing some of the basic protein
 manipulations. It was originally designed for restriction enzyme analysis of
 DNA, but has been extended to other types of matching. It now handles
 degenerate sequence input in a variety of matching approaches, as well as
 patterns with errors, regular expressions and TRANSFAC-formatted matrices.
 .
 It was designed to be a grep for DNA and like the original grep, its
 capabilities have grown so that now the author has to keep calling up the help
 page to figure out which flags (now ~50) mean what. tacg is NOT a GUI
 application in any sense. However, its existence as a strictly command-line
 tool lends itself well to Webification and wrapping by various GUI tools and
 it is now distributed with a web interface form and a Perl CGI handler.
 Additionally, it can easily be integrated into editors that support shell
 commands such as nedit.
 .
 The use of tacg may be cited as: Mangalam, HJ. (2002) tacg, a grep for DNA.
 BMC Bioinformatics. 3:8  http://www.biomedcentral.com/1471-2105/3/8

Depends: treeplot
Responsible: Charles Plessy <plessy@debian.org>
License: GPL
WNPP: 461508
Pkg-URL: http://www-id.imag.fr/Laboratoire/Membres/Danjean_Vincent/deb.html#treeplot
Pkg-Description: Phylogenetic tree file converter
 Treeplot is a conversion tool, from "Phylip" phylogenetic tree file to
 PostScript (.ps), Adobe Illustrator (.ai), Scalable Vector Graphics
 (.svg), Computer Graphics Metafile (.cgm), Hewlett-Packard Graphics
 Language (.hpgl), xfig file (.fig), GIF image file (.gif) and PBM
 Portable aNy Map file (.pnm).
 .
 The upstream author Olivier Langella says: 'I think that "treeplot"
 is outdated. "Treeviewx" is an equivalent that works great and it is
 already packaged. ... you can replace "treeplot" with
 "populations". I would be pleased if "populations" became a Debian
 package.'  So this package should probably be delisted in favour of
 populations (see http://lists.debian.org/debian-med/2008/03/msg00124.html).

Depends: treevolve
Homepage: http://evolve.zoo.ox.ac.uk/software.html?id=Treevolve
Responsible: Charles Plessy <plessy@debian.org>
License: has to be verified
WNPP: 461510
Pkg-URL: http://www-id.imag.fr/Laboratoire/Membres/Danjean_Vincent/deb.html#treevolve
Pkg-Description: simulation of evolution of DNA sequences
 treevolve will simulate the evolution of DNA sequences under a
 coalescent model, which allows exponential population growth,
 population subdivision according to an island model, migration and
 recombination. In addition different periods of population dynamics
 can be enforced at different times. For example, a period of
 exponential growth can be followed by a period of stasis where the
 population is subdivided into demes. Multiple sets of such simulated
 sequence data can then be compared to sequence data sampled from a
 population of interest using suitable statistics, and various
 evolutionary hypotheses concerning the evolution of this population
 tested.
 .
 Citation: Population dynamics of HIV-1 inferred from gene sequences
 Grassly NC, Harvey PH & Holmes EC (1999) Genetics 151, 427-438.

Depends: infernal
Homepage: http://infernal.janelia.org/
Responsible: Steffen Moeller <moeller@debian.org>
License: GPL
WNPP: 441840
Pkg-URL: http://packages.debian.org/source/experimental/infernal
Pkg-Description: RNA sequence comparison
 Infernal ("INFERence of RNA ALignment") is for searching DNA sequence
 databases for RNA structure and sequence similarities. It is an
 implementation of a special case of profile stochastic context-free
 grammars called covariance models (CMs). A CM is like a sequence
 profile, but it scores a combination of sequence consensus and RNA
 secondary structure consensus, so in many cases, it is more capable of
 identifying RNA homologs that conserve their secondary structure more
 than their primary sequence.
 .
 The tool is an integral component of the Rfam database.
 .
 Users of this package should cite:
 "Query-Dependent Banding (QDB) for Faster RNA Similarity Searches."
  E. P. Nawrocki, S. R. Eddy. PLoS Comput. Biol., 3:e56, 2007.

Depends: mauvealigner
Homepage: http://asap.ahabs.wisc.edu/mauve/
Responsible: Andreas Tille <tille@debian.org>
License: GPL
WNPP:
Pkg-Description: multiple genome alignment
 Mauve is a system for efficiently constructing multiple genome alignments
 in the presence of large-scale evolutionary events such as rearrangement
 and inversion. Multiple genome alignment provides a basis for research
 into comparative genomics and the study of evolutionary dynamics.  Aligning
 whole genomes is a fundamentally different problem than aligning short
 sequences.
 .
 Mauve has been developed with the idea that a multiple genome aligner
 should require only modest computational resources. It employs algorithmic
 techniques that scale well in the amount of sequence being aligned. For
 example, a pair of Y. pestis genomes can be aligned in under a minute,
 while a group of 9 divergent Enterobacterial genomes can be aligned in
 a few hours.
 .
 Mauve computes and interactively visualizes genome sequence comparisons.
 Using FastA or GenBank sequence data, Mauve constructs multiple genome
 alignments that identify large-scale rearrangement, gene gain, gene loss,
 indels, and nucleotide substitution.
 .
 Mauve is developed at the University of Wisconsin.
 .
 Note: There are instructions for compiling Mauve from source available at
 http://asap.ahabs.wisc.edu/mauve/mauve-developer-guide/compiling-mauvealigner-from-source.html

Depends: asap
Homepage: http://asap.ahabs.wisc.edu/software/asap/
Responsible: Andreas Tille <tille@debian.org>
License: GPL
Pkg-Description: organize the data associated with a genome
 Developments in genome-wide approaches to biological research have
 yielded greatly increased quantities of data, necessitating the cooperation
 of communities of scientists focusing on shared sets of data. ASAP
 leverages the internet and database technologies to meet these needs.
 ASAP is designed to organize the data associated with a genome from the
 early stages of sequence annotation through genetic and biochemical
 characterization, providing a vehicle for ongoing updates of the annotation
 and a repository for genome-scale experimental data. Development was
 motivated by the need to more directly involve a greater community of
 researchers, with their collective expertise, in keeping the genome
 annotation current and to provide a synergistic link between up-to-date
 annotation and functional genomic data. The system is continually under
 development at the Genome Evolution Lab with the stable, in-use, publicly
 available University of Wisconsin installation updated regularly.
 .
 Software development on ASAP began in early 2002, and ASAP has been
 continually improved up until the present day. A longstanding goal of
 the ASAP project was to make the source code of ASAP available so that
 other installations of ASAP could be implemented. As future ASAP
 installations come to pass, ASAP will be further extended to be
 inter-operable between sites.

Depends: emboss-kaptain
Homepage: http://userpage.fu-berlin.de/~sgmd/download.html
Responsible: Charles Plessy <plessy@debian.org>
License: GPL-2+
WNPP: 466682
Pkg-Description: graphical interface to EMBOSS using Kaptain
 EMBOSS.kaptn is a graphical user interface (GUI) for more than 200
 programs of the EMBOSS sequence analysis package. It uses Kaptain, a
 universal front-end for command line applications. EMBOSS is a
 collection of high-quality free Open Source software for sequence
 analysis.  With EMBOSS.kaptn it integrates nicely into X window based
 desktops like KDE.

Depends: agdbnet
Homepage: http://pubmlst.org/software/database/agdbnet/
Responsible: Andreas Tille <tille@debian.org>
License: GPL
WNPP:
Pkg-Description: antigen sequence database software for web-based bacterial typing
 AgdbNet is antigen sequence database software for web-based bacterial
 typing. The software facilitates simultaneous BLAST querying of multiple
 loci using either nucleotide or peptide sequences. It's written in Perl
 and runs on Linux/UNIX systems.
 .
 Databases are described by XML files and can have any number of loci, which
 may be defined by nucleotide and/or peptide sequences. The databases can
 optionally have integral isolate tables so that information about representative
 isolates can be retrieved or they may be configured to query external isolate
 databases, such as those hosted on PubMLST.org.
 .
 The software is used on a number of public bacterial typing databases:
  * Neisseria PorA variable regions | PorB | FetA
  * Campylobacter flaA
  * Streptococcus equi seM

Depends: gamgi
Homepage: http://www.gamgi.org/
Responsible: Steffen Moeller <moeller@debian.org>
License: Free
WNPP: 465994
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/gamgi/trunk/?rev=0&sc=0
Pkg-Description: general atomistic modelling graphic interface
 GAMGI provides a graphical user interface for the handling
 of molecular structures.

Depends: martj
Homepage: http://www.ebi.ac.uk/biomart/
Responsible: Steffen Moeller <moeller@debian.org>
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/martj/trunk/?rev=0&sc=0
Pkg-Description: distributed data integration system for biological data
 BioMart is a simple, distributed data integration system with
 powerful query capabilities. The BioMart data model has been applied
 to the following data sources: UniProt Proteomes, Macromolecular
 Structure Database (MSD), Ensembl, Vega, and dbSNP.

Depends: cluster3
Homepage: http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#ctv
License: non-free
WNPP: 286167
Responsible: Steffen Moeller <moeller@debian.org>
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/cluster3/trunk/?rev=0&sc=0
Pkg-Description: find clustering solutions for genome data
 Cluster 3.0 is an enhanced version of Cluster, which was originally
 developed by Michael Eisen while at Stanford University. The main
 improvement consists of the k-means algorithm, which now includes
 multiple trials to find the best clustering solution. This is crucial
 for the k-means algorithm to be reliable. The routine for self-organizing
 maps was extended to include 2D rectangular geometries. The Euclidean
 distance and the city-block distance were added to the available
 measures of similarity.

Depends: jmol
Homepage: http://jmol.sourceforge.net/
Responsible: Daniel Leidert <daniel.leidert.spam@gmx.net>
License: LGPL
Pkg-URL: http://debian.wgdd.de/temp/jmol/
Pkg-Description: molecule viewer
 Jmol is a free, open source molecule viewer for students, educators,
 and researchers in chemistry and biochemistry.
  * The JmolApplet is a web browser applet that can be integrated into web pages.
  * The Jmol application is a standalone Java application that runs on the
    desktop.
  * The JmolViewer is a development tool kit that can be integrated into other
    Java applications.
 .
 For more detailed information about packaging status please see
 http://lists.debian.org/debian-med/2008/03/msg00097.html

Depends: jtreeview
Homepage: http://jtreeview.sourceforge.net/
Responsible: Steffen Moeller <moeller@debian.org>
License: GPL
WNPP: 243771
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/treeview/trunk/?rev=0&sc=0
Pkg-Description: Java re-implementation of Michael Eisen's TreeView
 TreeView creates a matrix-like display of expression data, known as
 Eisen clustering. The original implementation was a Windows program
 named TreeView by Michael Eisen. This TreeView package, sometimes also
 referred to as jTreeView, was rewritten in Java under a free license.
 The original implementation also comes with source code but restricts
 commercial distribution, and it did not run on Unix.
 .
 Java TreeView is an extensible viewer for microarray data in
 PCL or CDT format.

Depends: smile
Homepage: http://www-igm.univ-mlv.fr/~marsan/smile_english.html
Responsible: Steffen Moeller <moeller@debian.org>
WNPP: 221492
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/smile/trunk/?rev=0&sc=0
Pkg-Description: infer motifs in a set of sequences
 SMILE is a tool that infers motifs in a set of sequences, according to some
 criteria. It was first made to infer exceptional sites as binding sites in
 DNA sequences. Since version 1.4, it can infer motifs written in
 any alphabet (even degenerate) in any kind of sequence.
 .
 A distinguishing feature of SMILE is its handling of what we call
 structured motifs, which are motifs associated by some distance constraints.

Depends: cactus
Homepage: http://www.cactuscode.org/Community/Biology.html
License: GPL
Pkg-Description: open source problem solving environment
 Cactus is an open source problem solving environment designed for scientists
 and engineers. Its modular structure easily enables parallel computation
 across different architectures and collaborative code development between
 different groups.
 .
 Cactus provides easy access to many cutting edge software technologies being
 developed in the academic research community, including the Globus
 Metacomputing Toolkit, HDF5 parallel file I/O, the PETSc scientific library,
 adaptive mesh refinement, web interfaces, and advanced visualization tools.

Depends: contralign
Homepage: http://contra.stanford.edu/contralign/
License: Public Domain
Pkg-Description: parameter learning framework for protein pairwise sequence alignment
 CONTRAlign is an extensible and fully automatic parameter learning
 framework for protein pairwise sequence alignment based on pair
 conditional random fields. The CONTRAlign framework enables the
 development of feature-rich alignment models which generalize well to
 previously unseen sequences and avoid overfitting by controlling model
 complexity through regularization.

Depends: galaxy
Homepage: http://g2.trac.bx.psu.edu/
License: MIT
WNPP: 432472
Pkg-Description: manipulate sequences and annotation files
 Galaxy is a web-based tool allowing users to perform operations which
 are usually done with a command-line interface. Using Galaxy, one can
 manipulate sequences and annotation files in many formats. Galaxy has
 strong ties with the UCSC genome browser, and makes it easy to
 visualise modified annotation files as a custom track.

Depends: genographer
Homepage: http://hordeum.oscs.montana.edu/genographer/
License: GPL
Pkg-Description: read data and reconstruct them into a gel image
 This program will read in data from an ABI 3700, 3100, 377 or 373,
 CEQ 2000 or SCF and reconstruct them into a gel image which is
 straightened and sized. Bins can be defined easily and viewed as
 thumbnails, which allows for a fairly quick and easy way of scoring a gel.
 .
 The program is written in Java and uses the Java 1.3 API. Therefore,
 it should run on any machine that can run Java.

Depends: molekel
Homepage: http://bioinformatics.org/molekel/wiki/Main/HomePage
License: GPL
Pkg-Description: multiplatform molecular visualization
 Molekel is an open source (GPL) multiplatform molecular visualization
 program being developed at the Swiss National Supercomputing Centre
 (CSCS).

Depends: pftools
Homepage: ftp://us.expasy.org/databases/prosite/tools/ps_scan/sources
License: GPL
Pkg-Description: tools to handle patterns from PROSITE
 ps_scan is a Perl program used to scan one or several patterns, rules
 and/or profiles from PROSITE against one or several protein sequences
 in Swiss-Prot or FASTA format. It requires two compiled external
 programs from the PFTOOLS, which are also distributed with the sources.

Depends: proalign
Homepage: http://evol-linux1.ulb.ac.be/ueg/ProAlign/
License: GPL
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 378290
Pkg-Description: Probabilistic multiple alignment program
 ProAlign performs probabilistic sequence alignments using hidden Markov
 models (HMM). It includes a graphical interface (GUI) that allows users
 to (i) perform alignments of nucleotide or amino-acid sequences, (ii) view
 the quality of solutions, (iii) filter the unreliable alignment regions
 and (iv) export alignments to other software.
 .
 ProAlign uses a progressive method, such that multiple alignment is
 created stepwise by performing pairwise alignments in the nodes of a
 guide tree. Sequences are described with vectors of character
 probabilities, and each pairwise alignment reconstructs the ancestral
 (parent) sequence by computing the probabilities of different
 characters according to an evolutionary model. It has been published in
 Bioinformatics. 2003 Aug 12;19(12):1505-13.

Depends: ssaha
Homepage: http://www.sanger.ac.uk/Software/analysis/SSAHA/
License: GPL
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 425111
Pkg-Description: Sequence Search and Alignment by Hashing Algorithm
 SSAHA is a software tool for very fast matching and alignment of DNA
 sequences. It achieves its fast search speed by converting sequence
 information into a `hash table' data structure, which can then be
 searched very rapidly for matches. It was published by Ning Z,
 Cox AJ, Mullikin JC in Genome Res. 2001;11;1725-9.
 .
 SSAHA is the only free software of its category (fast search of nearly
 identical sequences). The popular alternative, BLAT, is restricted to
 non-commercial use.
 .
 Unfortunately the source of its successor ssaha2
 http://www.sanger.ac.uk/Software/analysis/SSAHA2/
 does not seem to be available.

Depends: ngila
Homepage: http://scit.us/projects/ngila/
License: GPLv3
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 439996
Pkg-Description: global pairwise alignments with logarithmic and affine gap costs
 Ngila is an application that will find the best alignment of a pair
 of sequences using log-affine gap costs, which are the most
 biologically realistic gap costs.
 .
 Ngila implements the Miller and Myers (1988) algorithm in order to
 find a least costly global alignment of two sequences given homology
 costs and a gap cost. Two versions of the algorithm are
 included: holistic and divide-and-conquer. The former is faster but
 the latter utilizes less memory. Ngila starts with the
 divide-and-conquer method but switches to the holistic method for
 subsequences smaller than a user-established threshold. This improves
 its speed without substantially increasing memory requirements. Ngila
 also allows users to assign costs to end gaps that are smaller than
 costs for internal gaps. This is important for aligning using the
 free-end-gap method.
 .
 Ngila is published in Cartwright RA Bioinformatics 2007
 23(11):1427-1428; doi:10.1093/bioinformatics/btm095

Depends: tm-align
Homepage: http://zhang.bioinformatics.ku.edu/TM-align/
License: free to change and redistribute
Responsible: Steffen Moeller <moeller@debian.org>
WNPP: 447505
Pkg-Description: structural protein alignment
 TM-align is a structural alignment program for comparing two proteins
 whose sequences can be different. TM-align will first find the best
 equivalent residues of two proteins based on the structure similarity
 and then output a TM-score.
 .
 TM-align performs a structural alignment of protein sequences. It is
 said to be 10 times faster than DALI and no worse in accuracy.

Depends: staden
Homepage: http://staden.sourceforge.net/
License: BSD
Pkg-Description: DNA sequence assembly (Gap4), editing and analysis tools
 A fully developed set of DNA sequence assembly (Gap4), editing and
 analysis tools (Spin).

Depends: dazzle
Homepage: http://www.biojava.org/dazzle
Responsible: Steffen Moeller <moeller@debian.org>
License: LGPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/dazzle/trunk/?rev=0&sc=0
Pkg-Description: Java-based DAS server
 Dazzle is a general purpose server for the Distributed Annotation System
 (DAS) protocol. It is implemented as a Java servlet, using the BioJava
 APIs. Dazzle is a modular system which uses small "datasource" plugins to
 provide access to a range of databases. Several general-purpose plugins
 are included in the package, and it is straightforward to develop new
 plugins to connect to your own databases.
 .
 Information on DAS is available from http://www.biodas.org/

Depends: mgltools
Homepage: http://mgltools.scripps.edu
Responsible: Steffen Moeller <moeller@debian.org>
WNPP: 458811
License: various custom non-free, mostly academia-only
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/mgltools/trunk/?rev=0&sc=0
Pkg-Description: preparation of proteins and ligands to investigate their binding
 The package comprises AutoDockTools, Python Molecular Viewer, Vision and
 many smaller helper Python libraries, which shall all become available
 as Debian packages ... if their license permits. The essential mslib
 library for instance comes with binaries which should not go into
 Debian.  The analysis itself is performed with AutoDock, which was
 recently made available under the GPL.
 .
 Upstream is very supportive, but their binary-only libraries come from a
 third party. This set of tools is well known across biochemistry and
 computational/structural biology.

Depends: ecell
Homepage: http://www.e-cell.org/
Responsible: Steffen Moeller <moeller@debian.org>
WNPP: 241195
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/ecell/trunk/?rev=0&sc=0
Pkg-Description: Concept and environment for constructing virtual cells on computers
 The E-Cell Project is an international research project aiming at
 developing necessary theoretical supports, technologies and software
 platforms to allow precise whole cell simulation.
 .
 The E-Cell System is an object-oriented software suite for modeling,
 simulation, and analysis of large scale complex systems such as
 biological cells, architected by Kouichi Takahashi and written by
 a team of developers.
 .
 The core part of the system, E-Cell Simulation Environment version 3,
 allows many components driven by multiple algorithms with different
 timescales to coexist.
 .
 E-Cell System consists of the following three major parts:
  * E-Cell Simulation Environment (or E-Cell SE)
  * E-Cell Modeling Environment (or E-Cell ME)
  * E-Cell Analysis Toolkit.

Depends: ncoils
Homepage: http://www.russell.embl.de/cgi-bin/coils-svr.pl
Responsible: Steffen Moeller <moeller@debian.org>
WNPP: 299856
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/ciols/trunk/?rev=0&sc=0
Pkg-Description: coiled coil secondary structure prediction
 The program predicts coiled coil secondary structure from protein
 sequences. The algorithm was published in
 Lupas, van Dyke & Stock, Predicting coiled coils from
 protein sequences, Science, 252, 1162-1164, 1991.

Depends: haploview
Homepage: http://www.broad.mit.edu/mpg/haploview/
Responsible: Steffen Moeller <moeller@debian.org>
WNPP: 311421
License: DFSG free
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/haploview/trunk/?rev=0&sc=0
Pkg-Description: Analysis and visualization of LD and haplotype maps
 This tool assists in the analysis of the nucleotide
 variation in a population. Such investigations are performed
 to determine genes and genetic pathways that are associated
 with diseases. This is an early stage in the quest for new drugs.

Depends: mauvealigner
Homepage: http://asap.ahabs.wisc.edu/mauve/
Responsible: Andreas Tille <tille@debian.org>
License: GPL
Pkg-URL: http://people.debian.org/~tille/packages/mauvealigner/
Pkg-Description: multiple genome alignment algorithms
 The mauveAligner and progressiveMauve alignment algorithms have been
 implemented as command-line programs included with the downloadable Mauve
 software.  When run from the command-line, these programs provide options
 not yet available in the graphical interface.
 .
 Mauve is a system for efficiently constructing multiple genome alignments
 in the presence of large-scale evolutionary events such as rearrangement
 and inversion. Multiple genome alignment provides a basis for research
 into comparative genomics and the study of evolutionary dynamics.  Aligning
 whole genomes is a fundamentally different problem than aligning short
 sequences.
 .
 Mauve has been developed with the idea that a multiple genome aligner
 should require only modest computational resources. It employs algorithmic
 techniques that scale well in the amount of sequence being aligned. For
 example, a pair of Y. pestis genomes can be aligned in under a minute,
 while a group of 9 divergent Enterobacterial genomes can be aligned in
 a few hours.
 .
 Mauve computes and interactively visualizes genome sequence comparisons.
 Using FastA or GenBank sequence data, Mauve constructs multiple genome
 alignments that identify large-scale rearrangement, gene gain, gene loss,
 indels, and nucleotide substitution.
 .
 Mauve is developed at the University of Wisconsin.

Depends: gbrowse
Homepage: http://www.gmod.org/wiki/index.php/GBrowse
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 429610
License: Perl Artistic License, plus additional clauses
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/gbrowse
Pkg-Description: The Generic Genome Browser from GMOD
 The Generic Genome Browser is a combination of database and interactive Web
 page for manipulating and displaying annotations on genomes. Some of its
 features:
  * Simultaneous bird's eye and detailed views of the genome.
  * Scroll, zoom, center.
  * Attach arbitrary URLs to any annotation.
  * Order and appearance of tracks are customizable by administrator and end-user.
  * Search by annotation ID, name, or comment.
  * Supports third party annotation using GFF formats.
  * Settings persist across sessions.
  * DNA and GFF dumps.
  * Connectivity to different databases, including BioSQL and Chado.
  * Multi-language support.
  * Third-party feature loading.
  * Customizable plug-in architecture (e.g. run BLAST, dump & import many formats,
    find oligonucleotides, design primers, create restriction maps, edit features)

Depends: mira
Homepage: http://chevreux.org/projects_mira.html
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 435915
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/mira/trunk/?rev=0&sc=0
Pkg-Description: Whole Genome Shotgun and EST Sequence Assembler
 The mira genome fragment assembler is a specialised assembler for
 sequencing projects classified as 'hard' due to a high number of similar
 repeats. For expressed sequence tag (EST) transcripts, miraEST is
 specialised in reconstructing pristine mRNA transcripts while
 detecting and classifying single nucleotide polymorphisms (SNPs)
 occurring in different variations thereof.
 .
 The assembler is routinely used for such various tasks as mutation
 detection in different cell types, similarity analysis of transcripts
 between organisms, and pristine assembly of sequences from various
 sources for oligo design in clinical microarray experiments.

Depends: phylographer
Homepage: http://www.atgc.org/PhyloGrapher/PhyloGrapher_Welcome.html
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 426489
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/phylographer/trunk/?rev=0&sc=0
Pkg-Description: Graph Visualization Tool
 PhyloGrapher is a program designed to visualize and study evolutionary
 relationships within families of homologous genes or proteins
 (elements).  PhyloGrapher is a drawing tool that generates custom graphs
 for a given set of elements. In general, it is possible to use
 PhyloGrapher to visualize any type of relations between elements.
 Used in conjunction with tcl_blast_parser, PhyloGrapher can represent
 the results of a BLAST search as a graph.
 .
 PhyloGrapher and tcl_blast_parser are useful tools to analyse BLAST
 biological sequence alignment reports (BLAST is provided by Debian's
 blast2 package).

Depends: phylowin
Homepage: http://pbil.univ-lyon1.fr/software/phylowin.html
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 395840
License: unknown
Pkg-Description: Graphical interface for molecular phylogenetic inference
 Phylo_win is a graphical colour interface for molecular phylogenetic
 inference. It performs neighbor-joining, parsimony and maximum
 likelihood methods and bootstrap with any of them. Many distances can be
 used including Jukes & Cantor, Kimura, Tajima & Nei, HKY, Galtier & Gouy
 (1995), LogDet for nucleotidic sequences, Poisson correction for protein
 sequences, Ka and Ks for codon sequences. Species and sites to include
 in the analysis are selected by mouse. Reconstructed trees can be drawn,
 edited, printed, stored and evaluated according to numerous criteria.
 .
 This program uses source files from the Phylip program, which forbids
 its use for profit.  Therefore, Phylo_win will unfortunately have to be
 distributed in contrib or non-free.

Depends: seq-gen
Homepage: http://tree.bio.ed.ac.uk/software/seqgen/
License: Free
Pkg-Description: simulate the evolution of nucleotide or amino acid sequences
 Seq-Gen is a program that will simulate the evolution of nucleotide or
 amino acid sequences along a phylogeny, using common models of the
 substitution process. A range of models of molecular evolution are
 implemented including the general reversible model. State frequencies
 and other parameters of the model may be given and site-specific rate
 heterogeneity may also be incorporated in a number of ways. Any number
 of trees may be read in and the program will produce any number of data
 sets for each tree. Thus large sets of replicate simulations can be easily
 created. It has been designed to be a general purpose simulator that
 incorporates most of the commonly used (and computationally tractable)
 models of molecular sequence evolution.

Depends: wgs-assembler
Homepage: http://wgs-assembler.sourceforge.net/
Responsible: Charles Plessy <plessy@debian.org>
WNPP: 395843
License: GPL
Pkg-Description: Whole-Genome Shotgun Assembler
 Celera Assembler is scientific software for DNA research. It can
 reconstruct long sequences of genomic DNA given the fragmentary data
 produced by whole-genome shotgun sequencing. The Celera Assembler
 enabled many advances in genomics, including the first genome
 sequence of a multi-cellular organism and the first diploid sequence
 of an individual human.
 .
 The Celera Assembler is a member of a class of software called
 whole-genome shotgun assemblers. The Celera Assembler is mature,
 efficient, open-source software with a long record of contributions
 to science. Celera Assembler is written mostly in C for unix
 operating systems. Although it requires large compute resources to
 resolve complex genomes, it can assemble bacterial genomes on a
 laptop.
 .
 This important software is an "open source" project. Originally
 developed at Celera Genomics, it was released under the GNU General
 Public License and deposited in a public repository (SourceForge) in
 2004. Scientists around the world can download, build, and run the
 software without restriction. In addition, they can inspect the
 source code and alter it at their own sites. Workers at JCVI and a
 few other institutes regularly submit their code alterations to the
 public repository.
 .
 JCVI has made many important contributions to Celera
 Assembler. Scientists and engineers at JCVI are extending the code to
 handle more and more polymorphic data sets, including environmental
 samples. In collaboration with scientists at the University of
 Maryland, they are adding the capability to assemble pyrosequencing
 data (as from a 454 FLX machine) in addition to the traditional
 Sanger sequencing data (as from an ABI 3730 machine). JCVI's efforts
 provide the cutting edge software that genome scientists around the
 world will need as they apply DNA sequencing technology to more and
 more difficult problems of biology.
 .
 See also: http://www.jcvi.org/cms/research/software/celera-assembler/overview/
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: gbioseq
Homepage: http://www.bioinformatics.org/project/?group_id=94
License: GPL
Pkg-Description: DNA sequence editor for Linux
 gBioSeq is in an early stage of development, but it is already running.
 The goal is to provide easy-to-use software for editing DNA sequences
 under Linux, Windows and Mac OS X, using GTK C# (Mono).

Depends: populations
Homepage: http://bioinformatics.org/~tryphon/populations/
License: GPL
Pkg-Description: individuals or populations distances based on allelic frequencies
 Population genetics software (distances between individuals or
 populations, phylogenetic trees)
  * haploid, diploid or polyploid genotypes (see input formats)
  * structured populations (see input files structured populations)
  * No limit on the number of populations, loci, or alleles per locus
    (see input formats)
  * Distances between individuals (15 different methods)
  * Distances between populations (15 methods)
  * Bootstraps on loci OR individuals
  * Phylogenetic trees (individuals or populations), using Neighbor Joining or UPGMA
    (PHYLIP tree format)
  * Allelic diversity
  * Converts data files from Genepop to different formats (Genepop, Genetix, Msat,
    Populations...)

Depends: phpphylotree
Homepage: http://www.bioinformatics.org/project/?group_id=372
License: GPL
Pkg-Description: draw phylogenetic trees
 PhpPhylotree is a web application that is able to draw phylogenetic trees.
 It produces an SVG (Scalable Vector Graphic) file from phylip/newick tree files.

Depends: tracetuner
Homepage: http://www.jcvi.org/cms/research/software/tracetuner/overview
License: GPL; but US Patent #6,681,186
Pkg-Description: DNA sequencing and trace processing
 TraceTuner is a DNA sequencing quality value, base calling and trace
 processing software application originally developed by Paracel,
 Inc. While providing a flexible interface and capability to adopt the
 "pure" base calls produced by Phred, KB or any other "original"
 caller, it offers competitive features not currently available in
 other tools, such as customized calibration of quality values,
 advanced heterozygote and mixed base calling and deconvolving the
 "mixed" electropherograms resulting from the presence of indels into
 a couple of "pure" electropherograms. Previous versions of TraceTuner
 were used by Celera Genomics to process over 27 million reads from
 both Drosophila and human genome projects and by Applied Biosystems,
 as a component of its SNP detection and genotyping software product
 SeqScape. TraceTuner implements an advanced peak processing
 technology for resolving overlapping peaks of the same dye color into
 individual, or "intrinsic" peaks. This technology was protected by US
 Patent #6,681,186. Currently, TraceTuner is open source software,
 which has been used by J. Craig Venter Institute's DNA Sequencing and
 Resequencing pipelines.
 .
 The TraceTuner Software (Copyright 1999-2003, Paracel, Inc. All
 rights reserved.) (the "Software") is covered by US Patent #6,681,186 and is
 being made available free of charge by Applera Corporation subject to the terms
 and conditions of the GNU General Public License, version 2, as published by the
 Free Software Foundation (the "GNU General Public License").

Depends: plink
Homepage: http://pngu.mgh.harvard.edu/~purcell/plink/
License: GPL
Responsible: Steffen Moeller <moeller@debian.org>
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/plink/trunk/?rev=0&sc=0
Pkg-Description: whole-genome association analysis toolset
 plink expects as input the data from SNP (single
 nucleotide polymorphism) chips of many individuals
 and their phenotypical description of a disease.
 It finds associations of single or pairs of DNA
 variations with a phenotype and can retrieve
 SNP annotation from an online source.

Depends: twain
Homepage: http://cbcb.umd.edu/software/pirate/twain/twain.shtml
License: Open Source
Pkg-Description: syntenic genefinder employing a Generalized Pair Hidden Markov Model
 TWAIN is a new syntenic genefinder which employs a Generalized Pair
 Hidden Markov Model (GPHMM) to predict genes in two closely related
 eukaryotic genomes simultaneously.  It utilizes the MUMmer package to
 perform approximate alignment before applying a GPHMM based on an
 enhanced version of the TigrScan gene finder.  TWAIN was written by
 Bill Majoros and Mihaela Pertea while at The Institute for Genomic
 Research (TIGR).
 .
 TWAIN consists of two components: (1) ROSE, the Region Of Synteny
 Extractor, which identifies contiguous regions likely to contain one
 or more syntenic genes, and (2) OASIS, a generalized pair hidden
 Markov model (GPHMM) for predicting genes in the regions identified
 by ROSE.  The system utilizes approximate alignments constructed by
 the PROmer and NUCmer programs in the MUMmer package to assess
 approximate alignment scores efficiently.  More detailed information
 on the architecture of this system will be made available soon.
 Slides from a talk at Computational Genomics 2004 are now available.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Depends: rose
Homepage: http://www.cbcb.umd.edu/software/rose/Rose.html
License: Open Source
Pkg-Description: Region-Of-Synteny Extractor
 ROSE is a program which identifies regions between two genomes which
 are likely to contain orthologous genes. The two genomes are given as
 two multi fasta files of DNA sequences. The PROmer program from the
 MUMmer package needs to be run first between the two genomes, and the
 resulting delta file is then input to ROSE. If a previous annotation
 is available for one or both genomes, then the coordinates of the
 annotated genes from a genome can be optionally given as input in a
 gff file. The gene coordinates will be used to guide the length of
 the regions produced by ROSE. By default, when finding a region of
 consistent alignments, ROSE will add a user-defined margin (1000 bp
 by default) on either side of that region. When a predicted gene
 overlaps an alignment we use the gene prediction to extend the
 boundaries of the output region.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Depends: glimmerhmm
Homepage: http://www.cbcb.umd.edu/software/glimmerhmm/
License: Artistic
Pkg-Description: Eukaryotic Gene-Finding System
 GlimmerHMM is a new gene finder based on a Generalized Hidden Markov
 Model (GHMM). Although the gene finder conforms to the overall
 mathematical framework of a GHMM, additionally it incorporates splice
 site models adapted from the GeneSplicer program and a decision tree
 adapted from GlimmerM. It also utilizes Interpolated Markov Models
 for the coding and noncoding models. Currently, GlimmerHMM's GHMM
 structure includes introns of each phase, intergenic regions, and
 four types of exons (initial, internal, final, and single). A basic
 user manual is available from the homepage.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Depends: genezilla
Homepage: http://www.genezilla.org/
License: Artistic
Pkg-Description: eukaryotic gene finder
 GeneZilla is a state-of-the-art program for computational prediction
 of protein-coding genes in eukaryotic DNA, and is based on the
 Generalized Hidden Markov Model (GHMM) framework, similar to GENSCAN
 and GENIE. It is highly reconfigurable and includes software for
 retraining by the end-user. It is written in highly optimized C++ and
 runs under most UNIX/Linux platforms. The run time and memory
 requirements are linear in the sequence length, and are in general
 much better than those of competing systems, due to GeneZilla's novel
 decoding algorithm. Graph-theoretic representations of the high
 scoring open reading frames are provided, allowing for exploration of
 sub-optimal gene models. It utilizes Interpolated Markov Models
 (IMMs), Maximal Dependence Decomposition (MDD), and includes states
 for signal peptides, branch points, TATA boxes, CAP sites, and will
 soon model CpG islands as well.
 .
 GeneZilla is an open-source project hosted at bioinformatics.org and
 currently consists of ~20,000 lines of code.  GeneZilla evolved out
 of the ab initio eukaryotic gene finder TIGRscan, which was developed
 at The Institute for Genomic Research over a 3-year period under NIH
 grants R01-LM06845 and R01-LM007938, and which served as the basis
 for the comparative gene finder TWAIN.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Depends: exalt
Homepage: http://www.cbcb.umd.edu/software/exalt/
License: Artistic
Pkg-Description: phylogenetic generalized hidden Markov model for predicting alternatively spliced exons
 ExAlt is a software program designed to predict alternatively spliced
 overlapping exons in genomic sequence. The program works in several
 ways depending on the available input. ExAlt can use information of
 existing gene structure as well as sequence conservation to improve
 the precision of its predictions. ExAlt can also make predictions
 when only a single genomic sequence is available. ExAlt has been
 extensively tested on Drosophila melanogaster, but can be adapted to
 run on other species.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Depends: jigsaw
Homepage: http://www.cbcb.umd.edu/software/jigsaw/
License: Artistic
Pkg-Description: gene prediction using multiple sources of evidence
 JIGSAW is a program designed to use the output from gene finders,
 splice site prediction programs and sequence alignments to predict
 gene models. The program provides an automated way to take advantage
 of the many successful methods for computational gene prediction and
 can provide substantial improvements in accuracy over an individual
 gene prediction program.
 .
 JIGSAW is available for all species. It is tested on Human, Rice
 (Oryza sativa), Arabidopsis thaliana, Brugia malayi, Cryptococcus
 neoformans, Entamoeba histolytica, Theileria parva, Aspergillus
 fumigatus, Plasmodium falciparum and Plasmodium yoelii.
 .
 The linear combiner option is now available in the current JIGSAW
 software distribution. This allows JIGSAW to be run without the use
 of training data. A weight is assigned to each evidence source, and
 gene predictions are based on a weighted voting scheme, yielding the
 best 'consensus' predictions.
 .
 Predictions are now available for the ENCODE regions in Human and
 viewable as custom tracks in the UCSC Human Genome
 Browser. Predictions are also available for the Human genome and
 viewable as custom tracks in the UCSC Human Genome Browser.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Depends: genesplicer
Homepage: http://www.cbcb.umd.edu/software/GeneSplicer/
License: Artistic
Pkg-Description: computational method for splice site prediction
 A fast, flexible system for detecting splice sites in the genomic DNA
 of various eukaryotes. The system has been trained and tested
 successfully on Plasmodium falciparum (malaria), Arabidopsis
 thaliana, human, Drosophila, and rice. Training data sets for human
 and Arabidopsis thaliana are included. Use the GeneSplicer Web
 Interface to run GeneSplicer directly, or see below for instructions
 on downloading the complete system including source code.
 .
 There is no independent program to train GeneSplicer, but there is a
 way to obtain the necessary files by using the training procedure of
 GlimmerHMM.
Note: Computational Gene Finding (http://www.cbcb.umd.edu/software/)

Ignore: riso
Homepage: http://kdbio.inesc-id.pt/~asmc/software/riso.html
License: not specified
Pkg-Description: motif discovery tool
 RISO discovers motifs composed of many binding sites separated by
 spacers. Each binding site is called a box.
 .
 The author of SMILE claims on his homepage
 http://www-igm.univ-mlv.fr/~marsan/smile_english.html that RISO is
 faster and more powerful than SMILE, which describes itself as
 "SMILE is a tool that infers motifs in a set of sequences, according
 to some criterias. It was first made to infer exceptionnal sites as
 binding sites in DNA sequences. It allows to infer motifs written on
 any alphabet (even degenerate) in any kind of sequences.  The
 specificity of SMILE is to allow  to deal with what we call
 "structured motifs",  which are motifs associated by some distance
 constraints. In particular, SMILE is able to group under a unique
 model different occurrences composed of several boxes separated by
 spacers of different lengths."
 .
 The reference to SMILE is made here especially because there is some
 work done in the Debian Med SVN at
 http://svn.debian.org/wsvn/debian-med/trunk/packages/smile/trunk/?rev=0&sc=0
 .
 On the other hand the SMILE author told us in private mail that he
 thinks that RISO is dead and SMILE continues to have some importance.

Ignore: smile
Homepage: http://www-igm.univ-mlv.fr/~marsan/smile_english.html
License: GPL
WNPP: 221492
Responsible: Steffen Moeller <moeller@debian.org>
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/smile/trunk/?rev=0&sc=0
Pkg-Description: Find statistically significant patterns in sequences
 Smile determines sequence motifs on the basis of a set of DNA, RNA or
 protein sequences. The work was originally described in the Journal of
 Computational Biology (2000) 7:345-362 and has since been developed
 further.
  * No hard limit on the number of combinations of motifs to describe
    subsets of sequences.
  * The sequence alphabet may be specified.
  * The use of wildcards is supported.
  * Better determination of significance of motifs by simulation.
  * Introduction of a set of sequences with negative controls
    that should not match automatically determined motifs.
 .
 Note: We do not know anybody who actually uses SMILE and thus the
 packaging effort is stalled.  Feel free to tell us, if you are
 interested in turning this into an official package.

Depends: mummergpu
Homepage: http://mummergpu.sourceforge.net/
License: Artistic
Pkg-Description: High-throughput sequence alignment using Graphics Processing Units
 The recent availability of new, less expensive high-throughput DNA
 sequencing technologies has yielded a dramatic increase in the volume
 of sequence data that must be analyzed. These data are being
 generated for several purposes, including genotyping, genome
 resequencing, metagenomics, and de novo genome assembly
 projects. Sequence alignment programs such as MUMmer have proven
 essential for analysis of these data, but researchers will need ever
 faster, high-throughput alignment tools running on inexpensive
 hardware to keep up with new sequence technologies.
 .
 MUMmerGPU is a low cost, ultra-fast sequence alignment program
 designed to handle the increasing volume of data produced by new,
 high-throughput sequencing technologies. MUMmerGPU is a GPGPU drop-in
 replacement for MUMmer, using the GPUs in common workstations to
 simultaneously align multiple query sequences against a single
 reference sequence stored as a suffix tree. By processing the queries
 in parallel on the highly parallel graphics card, MUMmerGPU achieves
 more than a 10-fold speedup over a serial CPU version of the sequence
 alignment kernel, and outperforms MUMmer on a high end CPU by
 3.5-fold in total application time when aligning reads from recent
 sequencing projects using Solexa/Illumina, 454, and Sanger sequencing
 technologies.
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: amos-assembler
Homepage: http://amos.sourceforge.net/
License: Artistic
Pkg-Description: modular whole genome assembler
 The AMOS consortium is committed to the development of open-source
 whole genome assembly software. The project acronym (AMOS) represents
 our primary goal -- to produce A Modular, Open-Source whole genome
 assembler. Open-source so that everyone is welcome to contribute and
 help build outstanding assembly tools, and modular in nature so that
 new contributions can be easily inserted into an existing assembly
 pipeline. This modular design will foster the development of new
 assembly algorithms and allow the AMOS project to continually grow
 and improve in hopes of eventually becoming a widely accepted and
 deployed assembly infrastructure. In this sense, AMOS is both a
 design philosophy and a software system.
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: amoscmp
Homepage: http://amos.sourceforge.net/docs/pipeline/AMOScmp.html
License: Artistic
Pkg-Description: comparative genome assembly package
 A comparative assembler is a program that can assemble a set of
 shotgun reads from an organism by mapping them to the finished
 sequence of a related organism. Thus, a comparative assembler
 transforms the traditional overlap-layout-consensus approach to
 alignment-layout-consensus. The AMOScmp package uses the MUMmer
 program to perform a mapping of the reads to the reference genome,
 then processes the alignment results with a sophisticated layout
 program designed to take into account polymorphisms between the two
 genomes. For a detailed description of the algorithms involved please
 refer to the paper listed in the References section.
 .
 AMOScmp uses AMOS messages as both the inputs and the outputs (see
 documentation). Two utilities are provided to process these files:
 tarchive2amos - a versatile converter from trace archive .seq, .qual,
 and .xml information into AMOS formatted data; amos2ace - a converter
 from AMOS formatted data to the .ACE assembly format. In addition,
 the AMOS::AmosLib Perl module is provided as a tool for users who
 prefer to write their own conversion utilities. Please see the
 documentation included with the distribution for more information.
 .
 AMOScmp is part of the AMOS package (see
 http://amos.sourceforge.net/) - a collaborative effort to develop a
 modular open-source framework for assembly development.
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: minimus
Homepage: http://amos.sourceforge.net/docs/pipeline/minimus.html
License: Artistic
Pkg-Description: AMOS lightweight assembler
 minimus is an assembly pipeline designed specifically for small
 data-sets, such as the set of reads covering a specific gene. Note
 that the code will work for larger assemblies (we have used it to
 assemble bacterial genomes), however, due to its stringency, the
 resulting assembly will be highly fragmented. For large and/or
 complex assemblies the execution of Minimus should be followed by
 additional processing steps, such as scaffolding.
 .
 minimus follows the Overlap-Layout-Consensus paradigm and consists of
 three main modules:
  * overlapper - computes the overlaps between the reads using a
    modified version of the Smith-Waterman local alignment algorithm
  * tigger - uses the read overlaps to generate the layouts of reads
    representing individual contigs
  * make-consensus - refines the layouts produced by the tigger to
    generate accurate multiple alignments within the reads
 .
 minimus uses AMOS messages as both the inputs and the outputs (see
 documentation). Two utilities are provided to process these files:
 tarchive2amos - a versatile converter from trace archive .seq, .qual,
 and .xml information into AMOS formatted data; amos2ace - a converter
 from AMOS formatted data to the .ACE assembly format. In addition,
 the AMOS::AmosLib Perl module is provided as a tool for users who
 prefer to write their own conversion utilities. Please see the
 documentation included with the distribution for more information.
 .
 minimus is part of the AMOS package - a collaborative effort to
 develop a modular open-source framework for assembly development.
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Ignore: catissuecore
Homepage: https://cabig.nci.nih.gov/tools/catissuecore
License: to be clarified, NCICB Open Source Project Site
Pkg-Description: biospecimen inventory, tracking, and basic annotation
 caTissue Core is caBIG's tissue bank repository tool for biospecimen
 inventory, tracking, and basic annotation. Version 1.2.1 of caTissue
 permits users to track the collection, storage, quality assurance,
 and distribution of specimens as well as the derivation and
 aliquotting of new specimens from existing ones (e.g. for DNA
 analysis). It also allows users to find and request specimens that
 may then be used in molecular, correlative studies.
 .
 Intended Audiences: Translational Researchers, Pathologists, Biobank
 Managers
Note: A lot of stuff can be found at National Cancer Institute's
 Center for Bioinformatics (NCICB) Open Source Project Site
 http://gforge.nci.nih.gov/ which has to be evaluated and put into the
 right category of our tasks files

Ignore: trapss
Homepage: https://putt.eng.uiowa.edu/
License: Creative Commons for Science license
Pkg-Description: Transcript Annotation Prioritization and Screening System
 TrAPSS stands for Transcript Annotation Prioritization and Screening
 System. It is a system comprised of several tools written by
 researchers at the Coordinated Lab for Computational Genomics in the
 University of Iowa. The system aims to aid scientists who are
 searching for the genetic mutation or mutations that are linked to
 expression of a disease phenotype. The system offers support for
 almost all areas of a mutation discovery project from the creation
 and prioritization of a large candidate gene list, to the selection,
 ordering, and managing of primer pairs, and even support for SSCP
 assay results. TrAPSS is a currently deployed and often used tool for
 several laboratories here at the University of Iowa in the College of
 Medicine. The system is composed of several Java applications, many
 web-based PHP tools, and a local MySQL database. Even the Java
 applications are available through a web browser due to Sun's Java
 Web Start. Director of the CLCG, Professor Terry A. Braun, heads the
 project along with Dr. Todd Scheetz and Prof. Thomas
 L. Casavant. Eight developers create and maintain the software:
 Bartley Brown, Hakeem Almabrazi, Steven Davis and Jason Grundstad;
 along with three graduate students, Brian O'Leary, John Ritchison and
 Michael Smith; and one undergraduate student, Matthew Kemp.
 .
 The true importance of TrAPSS is that it is based upon a novel way to
 examine a large candidate list of genes. Rather than sequentially
 examining full genes, the scheme often followed in current target
 identification projects, TrAPSS provides tools that offer the user
 the opportunity to screen certain small parts of several genes from
 the candidate list at once. This "parallel" screening idea was
 envisioned by researchers here at the University of Iowa including
 Dr. Edwin Stone and Prof. Thomas L. Casavant. Research by graduate
 students Steven Davis and Brian O'Leary has demonstrated the
 advantage of the parallel screening method over the sequential
 sequencing of large candidate lists.
Note: Found at
 http://gforge.nci.nih.gov/softwaremap/trove_list.php?form_cat=337

Depends: mage2tab
Homepage: https://www.cbil.upenn.edu/magewiki/index.php/mage2tab
License: CBIL Software and Data License (Apache-like)
WNPP: 476209
Responsible: Charles Plessy <plessy@debian.org>
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/mage2tab/trunk/?rev=0&sc=0
Pkg-Description: MAGE-MLv1 converter and visualiser
 This tool-kit is part of MR_T, a framework for importing or exporting various
 MAGE (MicroArray Gene Expression) documents (MAGE-MLv1, MAGE-TAB, SOFT,
 MINiML) from or into databases like GUS (the Genomics Unified Schema,
 www.gusdb.org).

Depends: bambus
Homepage: http://amos.sourceforge.net/docs/bambus/
License: Artistic
Pkg-Description: hierarchical approach to building contig scaffolds
 BAMBUS is the first publicly available scaffolding program. It orders
 and orients contigs into scaffolds based on various types of linking
 information. Additionally, BAMBUS allows the users to build scaffolds
 in a hierarchical fashion by prioritizing the order in which links
 are used. For more information please check out the online
 documentation.
 .
 Note that currently Bambus is undergoing a transition in order to be
 integrated with the AMOS package (see http://amos.sourceforge.net/).
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: hawkeye
Homepage: http://amos.sourceforge.net/hawkeye/
License: Artistic
Pkg-Description: Interactive Visual Analytics Tool for Genome Assemblies
 Genome assembly remains an inexact science. Even when accomplished
 with the best software available, the assembly of a genome often
 contains numerous errors, both small and large. Hawkeye is a visual
 analytics tool for genome assembly analysis and validation, designed
 to aid in identifying and correcting assembly errors. Hawkeye blends
 the best practices from information and scientific visualization to
 facilitate inspection of large-scale assembly data while minimizing
 the time needed to detect mis-assemblies and make accurate judgments
 of assembly quality.
 .
 All levels of the assembly data hierarchy are made accessible to
 users, along with summary statistics and common assembly metrics. A
 ranking component guides investigation towards likely mis-assemblies
 or interesting features to support the task at hand. Wherever
 possible, high-level overviews, dynamic filtering, and automated
 clustering are leveraged to focus attention and highlight anomalies
 in the data. Hawkeye's effectiveness has been proven on several genome
 projects, where it has been used both to improve quality and to
 validate the correctness of complex genomes.
 .
 Hawkeye is compatible with most widely used assemblers, including
 Phrap, ARACHNE, Celera Assembler, Newbler, AMOS, and assemblies
 deposited in the NCBI Assembly Archive.
 .
 Publication: Schatz, M.C., Phillippy, A.M., Shneiderman, B.,
 Salzberg, S.L. (2007) Hawkeye: a visual analytics tool for genome
 assemblies. Genome Biology 8:R34.
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: murasaki
Homepage: http://murasaki.dna.bio.keio.ac.jp/
License: GPL
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/murasaki/trunk/?rev=0&sc=0
Pkg-Description: homology detection tool across multiple large genomes
 Murasaki is a scalable and fast, language theory-based homology
 detection tool across multiple large genomes. It enables whole-genome
 scale global alignments of multiple genomes and supports unlimited-length
 gapped-seed patterns and unique TF-IDF based filtering.
 .
 Murasaki is anchor alignment software with the following properties:
  * extremely fast (17 CPU hours for whole Human x Mouse genome (with
    40 nodes: 52 wall minutes))
  * scalable (Arbitrarily parallelizable across multiple nodes using MPI.
    Even a single node with 16 GB of RAM can handle over 1 Gbp of sequence.)
  * unlimited pattern length
  * repeat tolerant
  * intelligent noise reduction

Depends: gmv
Homepage: http://murasaki.dna.bio.keio.ac.jp/wiki/index.php?GMV
License: GPL
Pkg-Description: comparative genome browser for Murasaki
 GMV is a comparative genome browser for Murasaki. GMV visualizes
 anchors from Murasaki, annotation data from GenBank files, and
 expression / prediction score from GFF files.

Depends: pyrophosphate-tools
Homepage: http://www-naweb.iaea.org/nafa/ipc/public/d4_pbl_6a.html
License: not specified
Pkg-Description: tools for assembling and searching pyrophosphate sequence data
 Simple tools for assembling and searching high-density picolitre
 pyrophosphate sequence data.

Depends: figaro
Homepage: http://amos.sourceforge.net/Figaro/Figaro.html
License: Artistic
Pkg-Description: novel vector trimming software
 Figaro is a software tool for identifying and removing the vector
 from raw DNA sequence data without prior knowledge of the vector
 sequence.  By statistically modeling short oligonucleotide
 frequencies within a set of reads, Figaro is able to determine which
 DNA words are most likely associated with vector sequence.  For a
 description of Figaro's algorithms please see our paper.  Figaro is
 part of the AMOS suite.
Note: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)

Depends: velvet
Homepage: http://www.ebi.ac.uk/~zerbino/velvet/
License: GPL
Pkg-Description: genomic assembler specially designed for short read sequencing
 Velvet is a de novo genomic assembler specially designed for short
 read sequencing technologies, such as Solexa or 454, developed by
 Daniel Zerbino and Ewan Birney at the European Bioinformatics
 Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
 .
 Velvet currently takes in short read sequences, removes errors, then
 produces high-quality unique contigs. It then uses paired read
 information, if available, to retrieve the repeated areas between
 contigs.
Note: The web site says GPL-3, but the tar archive contains a copy of
 the GPL-2. It also contains a PDF manual whose sources are missing.
 The final package would depend on zlib.

Depends: mirbase
Homepage: http://microrna.sanger.ac.uk/
License: Public Domain
WNPP: 420938
Responsible: Charles Plessy <plessy@debian.org>
Pkg-Description: The microRNA sequence database
 The miRBase Sequence Database provides a searchable repository
 for published microRNA sequences and associated annotation,
 functionality previously provided by the microRNA Registry.  miRBase
 also contains predicted miRNA target genes in miRBase Targets, and
 provides a gene naming and nomenclature function in the miRBase
 Registry.
 .
 Release 9.1 of the database contains 4449 entries representing hairpin
 precursor miRNAs, expressing 4274 mature miRNA products, in primates,
 rodents, birds, fish, worms, flies, plants and viruses.
 .
 This package will install the miRBase database for MySQL, EMBOSS, and/or
 ncbi-blast if you have the corresponding packages installed.
 .
 It is possible that mirbase will not be a package from the main archive, but
 will be autogenerated as part of a larger data packaging effort.

Depends: elph
Homepage: http://www.cbcb.umd.edu/software/ELPH/
License: Artistic
Pkg-Description: motif finder that can find ribosome binding sites, exon splicing enhancers, or regulatory sites
 ELPH (Estimated Locations of Pattern Hits) is a general-purpose Gibbs
 sampler for finding motifs in a set of DNA or protein sequences. The
 program takes as input a set containing anywhere from a few dozen to
 thousands of sequences, and searches through them for the most common
 motif, assuming that each sequence contains one copy of the motif. We
 have used ELPH to find patterns such as ribosome binding sites (RBSs)
 and exon splicing enhancers (ESEs). See below for instructions on
 downloading the complete system, including source code.
 .
 An online tool that uses ELPH output for identifying exon splicing
 enhancers can be found at
 http://www.cbcb.umd.edu/software/SeeEse/index.html .
Note: Other sequence analysis tools (http://www.cbcb.umd.edu/software/)

Depends: repeatfinder
Homepage: http://www.cbcb.umd.edu/software/RepeatFinder/
License: Artistic
Pkg-Description: finding repetitive sequences in complete and draft genomes
 Two programs for finding repeats in genomic DNA sequences.  The first
 program, described in the paper by Volfovsky et al. (2001) in Genome
 Biology, is RepeatFinder.  A second program, designed specifically to
 find repeats likely to confuse a genome assembly, is called
 ClosureRepeatFinder.  The two programs are quite different and have
 different purposes; RepeatFinder is intended to be the more
 comprehensive approach.  Note that RepeatFinder depends on Stefan
 Kurtz's REPuter.
Note: Other sequence analysis tools (http://www.cbcb.umd.edu/software/)

Depends: reputer
Homepage: http://citeseer.ist.psu.edu/kurtz95reputer.html
License: to be clarified
Pkg-Description: fast computation of maximal repeats in complete genomes
 REPuter computes exact repeats and palindromes in entire genomes
 very efficiently.
Note: Download site (temporarily) not available - try to contact author

Depends: transtermhp
Homepage: http://transterm.cbcb.umd.edu/index.php
License: Free
Pkg-Description: finds rho-independent transcription terminators in bacterial genomes
 TransTermHP finds rho-independent transcription terminators in bacterial
 genomes. Each terminator found by the program is assigned a
 confidence value that estimates its probability of being a true
 terminator. TransTermHP is described in: C. Kingsford, K. Ayanbule
 and S.L. Salzberg. Rapid, accurate, computational discovery of
 Rho-independent transcription terminators illuminates their
 relationship to DNA uptake. Genome Biology 8:R22 (2007).
Note: Other sequence analysis tools (http://www.cbcb.umd.edu/software/)

Depends: patman
Homepage: http://bioinf.eva.mpg.de/patman/
License: GPL-2+
WNPP: 482555
Responsible: Charles Plessy <plessy@debian.org>
Pkg-Description: rapid alignment of short sequences to large databases
 Patman searches for short patterns in large DNA databases, allowing
 for approximate matches. It is optimized for searching for many small
 patterns at the same time, for example microarray probes.

Depends: uniprime
Homepage: http://code.google.com/p/uniprime/
License: GPL-3+
Responsible: Charles Plessy <plessy@debian.org>
Pkg-Description: workflow-based platform for universal primer design
 UniPrime automatically designs large sets of universal primers by simply
 inputting a GeneID reference. It automatically retrieves and aligns
 orthologous sequences from GenBank, identifies regions of conservation within
 the alignment and generates suitable primers that can amplify variable genomic
 regions. UniPrime differs from previous automatic primer design programs in
 that all steps of primer design are automated, saved and phylogenetically
 limited. We have experimentally verified the efficiency and success of this
 program. UniPrime is an experimentally validated, fully automated program that
 generates successful cross-species primers that take into account the
 biological aspects of the PCR.

Depends: genetrack
Homepage: http://sysbio.bx.psu.edu/genetrack.html
License: MIT
Responsible: Charles Plessy <plessy@debian.org>
Pkg-Description: genomic data storage and visualization framework
 GeneTrack is a high performance bioinformatics data storage and analysis
 system designed to store genome-wide information. It is currently used to
 analyze data obtained via high-throughput rapid sequencing platforms such as
 the 454 and Solexa as well as tiling array data based on various platforms.

Depends: operondb
Homepage: http://www.cbcb.umd.edu/cgi-bin/operons/operons.cgi
License: to be clarified
Pkg-Description: detect and analyze conserved gene pairs
 Comparison of complete microbial genomes reveals a large number of
 conserved gene clusters - sets of genes that have the same order in
 two or more different genomes. Such gene clusters often, but not
 always, represent a co-transcribed unit, or operon. A method was
 developed to detect and analyze conserved gene pairs - pairs of genes
 that are located close together on the same DNA strand in two or more
 bacterial genomes. For each conserved gene pair, an estimate of
 probability is calculated that the genes belong to the same
 operon. The algorithm takes into account several alternative
 possibilities. One is that functionally unrelated genes may have the
 same order simply because they were adjacent in a common
 ancestor. Other possibilities are that genes may be adjacent in two
 genomes by chance alone, or due to horizontal transfer of the gene
 pair.
 .
 The method is modified from the one described in: Maria D. Ermolaeva,
 Owen White and Steven L. Salzberg. Prediction of Operons in Microbial
 Genomes. Nucleic Acids Research, 29, 1216-1221 (2001).
 .
 OperonDB was supported by the NIH under grant R01-LM007938 and by the
 NSF under grant DBI-0234704.
Note: Other sequence analysis tools (http://www.cbcb.umd.edu/software/);
 no info about license or downloadable code found, but tried to
 contact authors.

Depends: velvet
Homepage: http://www.ebi.ac.uk/~zerbino/velvet/
License: GPL-2+
WNPP: 487026
Responsible: Charles Plessy <plessy@debian.org>
Pkg-URL: http://svn.debian.org/wsvn/debian-med/trunk/packages/velvet/trunk/?rev=0&sc=0
Pkg-Description: Nucleic acid sequence assembler for very short reads
 Velvet is a de novo genomic assembler specially designed for short read
 sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and
 Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near
 Cambridge, in the United Kingdom.
 .
 Velvet currently takes in short read sequences, removes errors, then produces
 high-quality unique contigs. It then uses paired read information, if
 available, to retrieve the repeated areas between contigs.
 .
 Velvet was published in: `Velvet: algorithms for de novo short read assembly
 using de Bruijn graphs.  D.R. Zerbino and E. Birney. Genome Research
 18:821-829.'

Depends: trnascan-se
Homepage: http://lowelab.ucsc.edu/tRNAscan-SE/
License: GPL
Pkg-Description: program for improved detection of transfer RNA genes in genomic sequence
 tRNAscan-SE identifies 99-100% of transfer RNA genes in DNA sequence while
 giving fewer than one false positive per 15 gigabases. Two previously described
 tRNA detection programs are used as fast, first-pass prefilters to identify
 candidate tRNAs, which are then analyzed by a highly selective tRNA covariance
 model. This work represents a practical application of RNA covariance models,
 which are general, probabilistic secondary structure profiles based on stochastic
 context-free grammars. tRNAscan-SE searches at about 30,000 bp/s. Additional
 extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine
 tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
