Collection of Articles On Text Categorization

The following articles helped me a lot in my work on text classification. Only the titles are listed here, as I didn't want to break any copyright laws, but you can find most of these papers by using the titles as keywords on Google.

Section 0
  • Rennie, Rifkin: Improving Multiclass Text Classification with the Support Vector Machine (Oct. 2001) (using 20 Newsgroups Data Set)
  • Georges Siolas, Florence d'Alche-Buc: Support Vector Machines based on a Semantic Kernel for Text Categorization (using 20 Newsgroups Data Set)
  • Burges: A Tutorial on Support Vector Machines
  • Osuna et al.: Support Vector Machines, Training and Applications
  • Ngai Tang: Text Categorisation using Support Vector Machines (interesting dissertation, 30 August 2001)
Section 1
  • Domingos, Pazzani: On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
  • Fabrizio Sebastiani: A Tutorial on Automated Text Categorisation
  • Fabrizio Sebastiani: Machine Learning in Automated Text Categorization
  • Fabrizio Sebastiani: Machine Learning in Automated Text Categorization (differently formatted, i.e. 55 pages instead of 63)
  • Galavotti, Sebastiani, Simi: Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization
  • Automatic Web Page Categorization by Link and Context Analysis
  • Categorisation by Context
  • Guest Editors' Introduction to the Special Issue on Automated Text Categorization
  • Caropreso, Matwin, Sebastiani: A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization
  • Lewis et al.: Naive (Bayes) at Forty
Section 2
  • Jason D.M. Rennie: Improving Multi-class Text Classification with Naive Bayes (Master's Thesis)
  • McCallum, Nigam, Rennie, Seymore: A Machine Learning Approach to Building Domain-Specific Search Engines
  • Nigam, McCallum, Thrun, Mitchell: Learning to Classify Text from Labeled and Unlabeled Documents
  • Nigam, McCallum, Thrun, Mitchell: Learning to Classify Text from Labeled and Unlabeled Documents (condensed version)
  • McCallum, Nigam: A Comparison of Event Models for Naive Bayes Text Classification
  • McCallum: Multi-Label Text Classification with a Mixture Model Trained by EM
  • McCallum, Nigam: Employing EM and Pool-Based Active Learning for Text Classification
  • Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, Slattery: Learning to Extract Symbolic Knowledge from the WWW
  • Baker, McCallum: Distributional Clustering of Words for Text Classification (newer?)
  • Baker, McCallum: Distributional Clustering of Words for Text Classification
  • Using Maximum Entropy for Text Classification
  • Andrew McCallum, Dayne Freitag, Fernando Pereira: Maximum Entropy Markov Models for Information Extraction and Segmentation
  • D'Alessio, Murray, Schiaffino: The Effect of Using Hierarchical Classifiers in Text Categorization
Section 3
  • David Yarowsky: Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora
  • Ide, Veronis: Word Sense Disambiguation: The State of the Art
  • Schütze: Automatic Word Sense Discrimination
  • Mladenic, Grobelnik: Word Sequences as Features in Text-Learning
  • Yang et al.: Learning Approaches for Detecting and Tracking News Events
Section 4
  • Apte, Damerau, Weiss: Automated Learning of Decision Rules for Text Categorization
  • Susan Dumais, Hao Chen: Hierarchical Classification of Web Content
Section 5
  • Lewis, Jones: Natural Language Processing for Information Retrieval
  • Wiener, Pedersen, Weigend: A Neural Network Approach to Topic Spotting
  • Gorniak, Peter: Sorting Email Messages by Topic
  • Gorniak, Peter: MailMind, A Connectionist E-Mail Sorting Client
Section 6
  • Vijay Boyapati: Towards a Comprehensive Topic Hierarchy for News
  • Moulinier, Raskinis, Ganascia: Text Categorization: a Symbolic Approach
  • Quasthoff, Wolff: Effizientes Dokumentclustering durch niederfrequente Terme
Section 7
  • Yang, Pedersen: A Comparative Study on Feature Selection in Text Categorization
  • Yang, Liu: A Re-examination of Text Categorization Methods
  • Improving Text Classification by Shrinkage in a Hierarchy of Classes
  • John, Kohavi, Pfleger: Irrelevant Features and the Subset Selection Problem
  • Martijn Spitters: Comparing feature sets for learning text categorization
  • Ellen Riloff: Little Words Can Make a Big Difference
  • Fuka, Hanka: Feature Set Reduction for Document Classification Problems
  • Feature subset selection in text-learning
  • Ruiz, Srinivasan: Hierarchical Neural Networks for Text Categorization
Section 8 (some only in print)
  • An Algorithm for Suffix Stripping
  • Hsu, Lang: Feature Reduction and Database Maintenance in NETNEWS Classification
  • Thomas Hofmann: Learning and Representing Topic
  • Seminar paper: Advanced Information Retrieval Methods
Section 9
  • Sam Scott: Feature Engineering for a Symbolic Approach to Text Classification
  • Kermit et al.: Automatic Complexity Management: Personalised Document Retrieval from the World Wide Web
  • Michie et al.: Machine Learning, Neural and Statistical Classification (298 pages! A review of different approaches to text classification)
  • Joachims: A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
  • Meghini et al.: A Model of Multimedia Information Retrieval
  • Slonim, Tishby: Document Clustering using Word Clusters via the Information Bottleneck Method
  • Mladenic: Turning Yahoo into an Automatic Web-Page Classifier
  • Mladenic, Grobelnik: Assigning Keywords to Documents Using Machine Learning
  • Fuhr et al.: AIR/X - a Rule-Based Multistage Indexing System for Large Subject Fields
  • Mitchell: Machine Learning, Slides for Instructors
  • Articles by Junker
  • Mladenic: Text-Learning and Related Intelligent Agents: A Survey
  • Compression: A Key for Next-Generation Text Retrieval Systems
  • Chang: Enabling Concept-Based Relevance Feedback for Information Retrieval on the WWW

