Clustering with Feature Selection Using Expectation Maximization
Seminar hall 51, 4th floor main building IISER Pune
Abstract
Clustering is an unsupervised learning task that aims to group similar points together: points within a cluster should be similar, while points in different clusters should be dissimilar. In this talk, I will first give a brief overview of hierarchical and model-based clustering algorithms. I will then introduce probabilistic mixture modelling, which posits latent variables for cluster membership, and show how the Expectation Maximization (EM) algorithm can be used to estimate the model parameters and the latent structure. Next, I will discuss the problems that arise when this method is applied to data with a very large number of features, many of which may be noisy or irrelevant to the clustering. Addressing this calls for feature selection to identify the important features, which is done by introducing feature saliencies within the EM algorithm. Finally, I will present some results obtained with and without feature selection.
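To make the E-step/M-step alternation mentioned above concrete, the following is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture. This is an illustration only, not the speaker's implementation: the function name `em_gmm_1d`, the two-component 1-D setting, and the min/max initialisation are all assumptions made for the sake of a small runnable example; the feature-saliency extension discussed in the talk is not shown here.

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch).

    Alternates an E-step (posterior responsibilities of the latent
    component assignments) with an M-step (closed-form weighted
    maximum-likelihood updates of the parameters).
    """
    # Initialise: equal mixing weight, means at the data extremes,
    # shared variance equal to the overall sample variance.
    pi = 0.5
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each data point.
        p0 = (1 - pi) * np.exp(-(x - mu[0]) ** 2 / (2 * var[0])) / np.sqrt(2 * np.pi * var[0])
        p1 = pi * np.exp(-(x - mu[1]) ** 2 / (2 * var[1])) / np.sqrt(2 * np.pi * var[1])
        r = p1 / (p0 + p1)
        # M-step: update mixing weight, means, and variances.
        pi = r.mean()
        mu = np.array([np.average(x, weights=1 - r),
                       np.average(x, weights=r)])
        var = np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                        np.average((x - mu[1]) ** 2, weights=r)])
    return pi, mu, var

# Demo: data drawn from two well-separated Gaussians at -4 and +4.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)])
pi, mu, var = em_gmm_1d(x)
```

On high-dimensional data with many noisy features, this basic scheme degrades, which is exactly the motivation for augmenting EM with per-feature saliencies as the talk describes.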