Research

Semi-supervised classification

Typical statistical approaches to classification learn from labeled data; that is, the class label of every training example is provided.

While labeled training data is expensive to create, it is usually easy to collect a lot of unlabeled training material. The question, therefore, is whether we can also make use of the unlabeled data.

On the face of it, unlabeled training material seems useless. Remarkably, in some applications, including text classification, the structure of the classification problem allows unlabeled data to improve classification. In the naive Bayes setting, we use unlabeled data by first predicting their labels with a classifier trained on the labeled data, and then using these predicted labels to train the classifier.
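The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in this research: documents are assumed to be pre-tokenized lists of words, the classifier is a multinomial naive Bayes with add-one (Laplace) smoothing, words outside the training vocabulary are ignored at prediction time, and the helper names `train_nb`, `predict_nb` and `self_train` are hypothetical. It performs a single round of hard-label self-training.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing.

    docs: list of token lists; labels: list of class labels (same length).
    """
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    n, v = len(docs), len(vocab)
    log_prior, log_lik = {}, {}
    for y, cy in class_counts.items():
        denom = sum(word_counts[y].values()) + alpha * v
        log_prior[y] = math.log(cy / n)
        log_lik[y] = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
    return {"vocab": vocab, "log_prior": log_prior, "log_lik": log_lik}


def predict_nb(model, doc):
    """Return the class with the highest log posterior (unknown words are skipped)."""
    best, best_score = None, -math.inf
    for y, lp in model["log_prior"].items():
        score = lp + sum(model["log_lik"][y][w] for w in doc if w in model["vocab"])
        if score > best_score:  # ties keep the first class seen in training
            best, best_score = y, score
    return best


def self_train(labeled_docs, labels, unlabeled_docs):
    """Hypothetical helper: label the unlabeled docs with a model trained on the
    labeled docs, then retrain on labeled + pseudo-labeled data."""
    base = train_nb(labeled_docs, labels)
    pseudo = [predict_nb(base, d) for d in unlabeled_docs]
    model = train_nb(labeled_docs + unlabeled_docs, labels + pseudo)
    return model, pseudo


if __name__ == "__main__":
    labeled = [["goal", "match"], ["vote", "election"]]
    labels = ["sports", "politics"]
    unlabeled = [["goal", "striker"], ["vote", "senate"]]
    model, pseudo = self_train(labeled, labels, unlabeled)
    print(pseudo)                            # ['sports', 'politics']
    print(predict_nb(model, ["senate"]))     # politics
```

In this toy run, "senate" never occurs in the labeled data, so a classifier trained on the labeled documents alone cannot use it. After self-training, "senate" has co-occurred with "vote" in a document that was pseudo-labeled "politics", which is exactly how unlabeled text can add information. A common refinement replaces the hard pseudo-labels with class probabilities and iterates (an EM-style procedure); whether any variant helps depends on how far the predicted labels can be trusted.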

Of course, such approaches cannot always succeed. We need to know when we can actually trust the predicted labels to improve classification. To answer this question, we are currently developing a framework to analyze how large-alphabet prediction schemes perform.



Prasad Santhanam 2007-12-28