Abstract
A sequence model-based phenotype structure discovery algorithm
Yu-Hai Zhao, Ying Yin
Phenotype structure discovery is one of the most important problems in microarray data analysis. The goal is to (1) find groups of samples corresponding to different phenotypes (such as diseased or normal), and (2) for each group of samples, find the representative expression pattern that distinguishes this group from the others. In contrast to the existing singleton discriminability-based and combination discriminability-based approaches, we present a novel method in this paper. Based on the proposed g*-sequence model, an efficient algorithm, FINDER, is developed to mine the optimal phenotype structure from a given dataset. Several effective pruning strategies are also designed to improve its efficiency. Experiments conducted on both synthetic and real microarray datasets show that the phenotype structures discovered by FINDER are both statistically and biologically significant. Moreover, FINDER is two to three orders of magnitude faster than the alternatives.