'''Statistical classification''' is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and based on a [[training set]] of previously labeled items.

Formally, the problem can be stated as follows: given training data <math>\{(\mathbf{x_1},y_1),\dots,(\mathbf{x_n}, y_n)\}</math>, produce a classifier <math>h:\mathcal{X}\rightarrow\mathcal{Y}</math> which maps an object <math>\mathbf{x} \in \mathcal{X}</math> to its classification label <math>y \in \mathcal{Y}</math>. For example, if the problem is filtering spam, then <math>\mathbf{x_i}</math> is some representation of an email and <math>y</math> is either "Spam" or "Non-Spam".

Statistical classification algorithms are typically used in [[pattern recognition]] systems.

'''Note:''' in [[community ecology]], the term "classification" is synonymous with what is commonly known (in [[machine learning]]) as [[data clustering|clustering]]. See that article for more information about purely [[unsupervised learning|unsupervised]] techniques.

Classification methods can be viewed as solving one of three related mathematical problems.

* The first problem is to find a map from the feature space directly to a set of labels, that is, to construct the classifier <math>h</math> described above.
* The second problem is to consider classification as an [[estimation]] problem, where the goal is to estimate a function of the form
:<math>P({\rm class}|{\vec x}) = f\left(\vec x;\vec \theta\right)</math>
where the feature vector input is <math>\vec x</math>, and the function <math>f</math> is typically parameterized by some parameters <math>\vec \theta</math>.
In the [[Bayesian statistics|Bayesian]] approach to this problem, instead of choosing a single parameter vector <math>\vec \theta</math>, the result is integrated over all possible values of <math>\vec \theta</math>, with each <math>\vec \theta</math> weighted by how likely it is given the training data <math>D</math>:
:<math>P({\rm class}|{\vec x}) = \int f\left(\vec x;\vec \theta\right)P(\vec \theta|D)\, d\vec \theta</math>
* The third problem is related to the second, but here the goal is to estimate the [[conditional probability|class-conditional probabilities]] <math>P(\vec x|{\rm class})</math> and then use [[Bayes' rule]] to produce the class probability as in the second problem.

Examples of classification algorithms include:
* [[Linear classifier]]s
** [[Fisher's linear discriminant]]
** [[Logistic regression]]
** [[Naive Bayes classifier]]
** [[Perceptron]]
** [[Support vector machine]]s
* [[Quadratic classifier]]s
* [[Nearest_neighbor_(pattern_recognition)|k-nearest neighbor]]
* [[Boosting]]
* [[Decision tree]]s
** [[Random forest]]s
* [[Artificial neural networks|Neural network]]s
* [[Bayesian network]]s
* [[Hidden Markov model]]s

An intriguing problem in pattern recognition yet to be solved is the relationship between the problem to be solved (the data to be classified) and the performance of the various pattern recognition algorithms (classifiers). Van der Walt and Barnard (see reference section) investigated very specific artificial data sets to determine the conditions under which certain classifiers perform better or worse than others. Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems (a phenomenon that may be explained by the [[No free lunch in search and optimization|no-free-lunch theorem]]). Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance. Determining a suitable classifier for a given problem is, however, still more an art than a science.
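The direct-mapping formulation of the first problem can be sketched with a toy [[Nearest_neighbor_(pattern_recognition)|nearest-neighbor]] classifier for the spam example above. The two-dimensional word-count features and the tiny training set here are illustrative assumptions, not part of any standard representation.

```python
import math

# Toy training set of (feature vector, label) pairs.
# Features (illustrative assumption): [count of "free", count of "meeting"]
training_data = [
    ([3, 0], "Spam"),
    ([2, 0], "Spam"),
    ([0, 2], "Non-Spam"),
    ([0, 3], "Non-Spam"),
]

def h(x):
    """A 1-nearest-neighbor classifier: map a feature vector x to the
    label of the closest training example (Euclidean distance)."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], x))
    return label

print(h([4, 1]))  # a "free"-heavy email -> Spam
print(h([0, 5]))  # a "meeting"-heavy email -> Non-Spam
```

The classifier h here is learned purely by memorizing the training set; more sophisticated algorithms from the list above instead fit an explicit decision boundary or probability model.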
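The third problem, estimating class-conditional probabilities and inverting them with Bayes' rule, can be sketched as follows. The one-dimensional Gaussian model for <math>P(x|{\rm class})</math> and the class parameters are illustrative assumptions standing in for quantities that would normally be estimated from training data.

```python
import math

def gaussian_pdf(x, mean, var):
    """Class-conditional density P(x | class) under a Gaussian model."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative per-class parameters (as if fitted to training data),
# plus prior class probabilities P(class).
classes = {
    "Spam":     {"mean": 5.0, "var": 1.0, "prior": 0.4},
    "Non-Spam": {"mean": 1.0, "var": 1.0, "prior": 0.6},
}

def posterior(x):
    """Bayes' rule: P(class | x) = P(x | class) P(class) / P(x)."""
    joint = {c: gaussian_pdf(x, p["mean"], p["var"]) * p["prior"]
             for c, p in classes.items()}
    evidence = sum(joint.values())  # P(x), normalizing constant
    return {c: j / evidence for c, j in joint.items()}

def classify(x):
    """Pick the class with the highest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)

print(classify(4.5))  # near the "Spam" mean -> Spam
print(classify(0.5))  # near the "Non-Spam" mean -> Non-Spam
```

Once the posteriors are available, classification reduces to the second problem: pick the class maximizing <math>P({\rm class}|{\vec x})</math>.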
The most widely used classifiers are the [[Neural Network]] (multi-layer perceptron), [[Support Vector Machines]], [[KNN|k-Nearest Neighbours]], Gaussian Mixture Model, Gaussian, [[Naive Bayes]], [[Decision Tree]] and [[Radial Basis Function|RBF]] classifiers.

== Evaluation ==
The measures [[Precision and Recall]] are popular metrics used to evaluate the quality of a classification system. More recently, [[Receiver Operating Characteristic]] (ROC) curves have been used to evaluate the trade-off between the true- and false-positive rates of classification algorithms.

== Application domains ==
* [[Computer vision]]
** [[Medical Imaging]] and medical image analysis
** [[Optical character recognition]]
* [[Geostatistics]]
* [[Speech recognition]]
* [[Handwriting recognition]]
* [[Biometric]] identification
* [[Natural language processing]]
* [[Document classification]]
* Internet [[search engines]]
* [[Credit scoring]]

== References ==
* C. M. van der Walt and E. Barnard, "Data characteristics that determine classifier performance", in Proceedings of the Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, pp. 160–165, 2006.

== External links ==
* [http://blog.peltarion.com/2006/07/10/classifier-showdown/ Classifier showdown] A practical comparison of classification algorithms.
* [http://cmp.felk.cvut.cz/cmp/software/stprtool/ Statistical Pattern Recognition Toolbox for Matlab].

== See also ==
* [[Data mining]]
* [[Fuzzy logic]]
* [[Information retrieval]]

[[Category:Machine learning]]
[[Category:Classification algorithms|*]]
[[Category:Statistical classification]]

[[ar:تصنيف إحصائي]]
[[de:Klassifikationsverfahren]]
[[lt:Klasifikavimo algoritmai]]
[[ja:統計分類]]
[[ru:Классификация (машинное обучение)]]
[[simple:Optimal classification]]
[[th:การแบ่งประเภทข้อมูล]]
[[vi:Phân loại bằng thống kê]]