The Clustering reference article from the English Wikipedia on 24-Apr-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Clustering

Clustering has two different meanings in computer science:

  1. In computer hardware, clustering is the connection of many low-cost computers using special software so that they can be used as one larger computer. Clustering can be used either to provide reliability (when one machine fails, the others take over its workload) or as an inexpensive means of providing large amounts of computing power.
  2. Data clustering is a common technique for data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering consists of partitioning a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often similarity or proximity for some defined distance measure.
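
As an illustration of partitioning a data set by proximity, the following is a minimal sketch of one well-known partitional method, k-means, using Euclidean distance as the distance measure. The function name `kmeans`, the sample points and the choice of starting centers are assumptions made for this example, not part of the article.

```python
import math

def kmeans(points, centers, iterations=20):
    """Partition `points` into len(centers) clusters (illustrative helper).

    Assumption: the caller supplies the starting centers; real uses
    usually pick them randomly or with a seeding heuristic.
    """
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dims)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:  # no center moved: the partition is stable
            break
        centers = new_centers
    return clusters

# Two visibly separate groups of 2-D points (made-up sample data).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
clusters = kmeans(points, centers=[(1.0, 1.0), (8.0, 8.0)])
for number, members in enumerate(clusters, start=1):
    print(number, members)
```

Each point ends up in exactly one subset, and the points within a subset are closer to their own center than to the other one. The result depends on the starting centers, so a poor choice can produce a poor partition.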

Data clustering algorithms can be hierarchical or partitional, and hierarchical algorithms can be agglomerative (bottom-up) or divisive (top-down).
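
The agglomerative (bottom-up) approach can be sketched as follows: every point starts as its own cluster, and the two closest clusters are merged repeatedly. This is a minimal sketch under stated assumptions: cluster distance is taken to be single linkage (the closest pair of members), merging stops at a requested number of clusters, and the names `single_linkage` and `agglomerate` are chosen for this example.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Distance between two clusters: the closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, target):
    """Merge the two closest clusters until only `target` clusters remain."""
    clusters = [[p] for p in points]  # bottom-up: every point starts alone
    while len(clusters) > max(target, 1):
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]],
                                                 clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected by the deletion
    return clusters

# One-dimensional sample data (made up for the example).
points = [(0.0,), (0.5,), (5.0,), (5.4,), (20.0,)]
result = agglomerate(points, 3)
print(result)
```

Recording the sequence of merges instead of stopping at a fixed count would give the full hierarchy. A divisive algorithm works in the opposite direction, starting from one cluster that contains everything and splitting it.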

Clustering in biology has two main applications in the fields of computational biology and bioinformatics:

  1. In proteomics, clustering is used to build groups of proteins with related expression patterns. Such groups often contain functionally related proteins, and thus high-throughput experiments using expressed sequence tags (ESTs) can be a powerful tool for genome annotation, a general aspect of genomics.
  2. In sequence analysis, clustering is used to group homologous sequences into gene families. This is a very important concept in bioinformatics and in evolutionary biology in general. See evolution by gene duplication.
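
Grouping sequences into families can be illustrated with a toy greedy scheme: each sequence joins the first family whose founding member it resembles closely enough, and otherwise founds a new family. This is only a sketch under strong assumptions. The similarity score here is the generic string ratio from Python's `difflib`, standing in for a proper alignment score, and the threshold of 0.8, the function names and the sequences are invented for the example. Textual similarity alone does not establish homology.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()

def group_into_families(sequences, threshold=0.8):
    """Greedy grouping: compare each sequence with each family's first member."""
    families = []
    for seq in sequences:
        for family in families:
            if similarity(seq, family[0]) >= threshold:
                family.append(seq)
                break
        else:
            families.append([seq])  # no family was close enough: start a new one
    return families

# Made-up nucleotide strings, not real gene sequences.
sequences = [
    "ATGGCGTACGTT",
    "ATGGCGTACGTA",
    "TTTTCCCCAAAA",
    "ATGGCCTACGTT",
    "TTTTCCCCAAAT",
]
families = group_into_families(sequences)
for family in families:
    print(family)
```

The outcome of a greedy scheme like this depends on the order of the input and on the threshold, which is one reason practical tools use more careful similarity measures and clustering strategies.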