The Newsgroups collection is a list of 1000 articles collected
from 20 newsgroups. Some of the newsgroups are closely related,
such as comp.os.ms-windows.misc and comp.windows.x, or
rec.autos and rec.motorcycles. The task is to identify the
newsgroup to which each article belongs.
A randomly chosen subset of the documents was used for training,
while the rest were used for testing, and the results were confirmed by
repeated trials over random splits. CADE is a Portuguese dataset with 12
classes. The improvements in classification accuracy are statistically
significant at better than the 0.05 level.
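
The evaluation protocol described above (random train/test splits, repeated
over several trials) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code used for the reported
results: the function names, the `evaluate` callback, and the training
fraction are hypothetical, and the training fraction in particular is a
placeholder because the text does not state the value used.

```python
import random
from statistics import mean


def random_split(n_docs, train_fraction, rng):
    """Shuffle document indices and cut them into train and test index lists."""
    indices = list(range(n_docs))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_docs))
    return indices[:cut], indices[cut:]


def repeated_trials(n_docs, train_fraction, n_trials, evaluate, seed=0):
    """Average the score returned by `evaluate` over independent random splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    classifier on the training indices and returns its test accuracy.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        train_idx, test_idx = random_split(n_docs, train_fraction, rng)
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores), scores
```

Here `train_fraction` would be set to whatever fraction the experiments
actually used, and `evaluate` would wrap the classifier being compared.
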
We are currently combining the approach above with graphical model
inference techniques to yield a richer class of classifiers. For
details, please contact me at
prasadsn at eecs dot berkeley dot edu.