S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman, Basic local alignment search tool, Journal of Molecular Biology, vol.215, issue.3, pp.403-410, 1990.
DOI : 10.1016/S0022-2836(05)80360-2

S. Brown, J. Gerlt, J. Seffernick, and P. Babbitt, A gold standard set of mechanistically diverse enzyme superfamilies, Genome Biology, vol.7, issue.1, p.R8, 2006.
DOI : 10.1186/gb-2006-7-1-r8

L. Lo Conte, B. Ailey, T. Hubbard, S. Brenner, A. Murzin, and C. Chothia, SCOP: a Structural Classification of Proteins database, Nucleic Acids Research, vol.28, issue.1, pp.257-259, 2000.
DOI : 10.1093/nar/28.1.257

A. Enright, S. Van Dongen, and C. Ouzounis, An efficient algorithm for large-scale detection of protein families, Nucleic Acids Research, vol.30, issue.7, pp.1575-84, 2002.
DOI : 10.1093/nar/30.7.1575

T. Wittkop, D. Emig, S. Lange, S. Rahmann, M. Albrecht et al., Partitioning biological data with transitivity clustering, Nature Methods, vol.7, issue.6, pp.419-420, 2010.
DOI : 10.1038/nmeth0610-419

T. Nepusz, R. Sasidharan, and A. Paccanaro, SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale, BMC Bioinformatics, vol.11, issue.1, p.120, 2010.
DOI : 10.1186/1471-2105-11-120

V. Miele, S. Penel, V. Daubin, F. Picard, D. Kahn et al., High-quality sequence clustering guided by network topology and multiple alignment likelihood, Bioinformatics, vol.28, issue.8, pp.1078-85, 2012.
DOI : 10.1093/bioinformatics/bts098
URL : https://hal.archives-ouvertes.fr/hal-00965711

M. Remmert, A. Biegert, A. Hauser, and J. Söding, HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment, Nature Methods, vol.9, issue.2, pp.173-175, 2012.
DOI : 10.1038/nmeth.1818
URL : http://hdl.handle.net/11858/00-001M-0000-0015-8D56-A

A. Vashist, C. Kulikowski, and I. Muchnik, Protein Function Annotation Based on Ortholog Clusters Extracted from Incomplete Genomes Using Combinatorial Optimization, Research in computational molecular biology, pp.99-113, 2006.
DOI : 10.1007/11732990_10

F. Abascal and A. Valencia, Automatic annotation of protein function based on family identification, Proteins: Structure, Function, and Genetics, vol.53, issue.3, pp.683-92, 2003.
DOI : 10.1002/prot.10449

D. Tautz and T. Domazet-Lošo, The evolutionary origin of orphan genes, Nature Reviews Genetics, vol.12, issue.10, pp.692-702, 2011.
DOI : 10.1038/nrg3053

R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp.1137-1143, 1995.

N. Zaki, S. Lazarova-Molnar, W. El-Hajj, and P. Campbell, Protein-protein interaction based on pairwise similarity, BMC Bioinformatics, vol.10, issue.1, p.150, 2009.
DOI : 10.1186/1471-2105-10-150

S. Brenner, P. Koehl, and M. Levitt, The ASTRAL compendium for protein structure and sequence analysis, Nucleic Acids Research, vol.28, issue.1, pp.254-256, 2000.
DOI : 10.1093/nar/28.1.254

V. Miele, S. Penel, and L. Duret, Ultra-fast sequence clustering from similarity networks with SiLiX, BMC Bioinformatics, vol.12, issue.1, p.116, 2011.
DOI : 10.1186/1471-2105-12-116
URL : https://hal.archives-ouvertes.fr/hal-00698365

J. Söding, Protein homology detection by HMM-HMM comparison, Bioinformatics, vol.21, issue.7, pp.951-60, 2005.
DOI : 10.1093/bioinformatics/bti125

A. Paccanaro, J. Casbon, and M. Saqi, Spectral clustering of protein sequences, Nucleic Acids Research, vol.34, issue.5, pp.1571-80, 2006.
DOI : 10.1093/nar/gkj515

R. Hughey and A. Krogh, Hidden Markov models for sequence analysis: extension and analysis of the basic method, Bioinformatics, vol.12, issue.2, pp.95-107, 1996.
DOI : 10.1093/bioinformatics/12.2.95

M. Margelevičius and Č. Venclovas, Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison, BMC Bioinformatics, vol.11, issue.1, p.89, 2010.
DOI : 10.1186/1471-2105-11-89

J. Söding, M. Remmert, and A. Hauser, HH-suite for sensitive protein sequence searching based on HMM-HMM alignment.

S. Böcker, S. Briesemeister, and G. Klau, Exact Algorithms for Cluster Editing: Evaluation and Experiments, Algorithmica, vol.60, issue.2, pp.316-334, 2011.
DOI : 10.1007/s00453-009-9339-7

S. Böcker and J. Baumbach, Cluster Editing, pp.33-44, 2013.
DOI : 10.1007/978-3-642-39053-1_5

A. Ben-Dor, R. Shamir, and Z. Yakhini, Clustering Gene Expression Patterns, Journal of Computational Biology, vol.6, issue.3-4, pp.281-97, 1999.
DOI : 10.1089/106652799318274

T. Wittkop, J. Baumbach, F. Lobo, and S. Rahmann, Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing, BMC Bioinformatics, vol.8, issue.1, p.396, 2007.
DOI : 10.1186/1471-2105-8-396

J. Wu, Cluster analysis and K-means clustering: an introduction, Advances in K-means Clustering, pp.1-16, 2012.

V. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment, vol.2008, issue.10, p.P10008, 2008.
DOI : 10.1088/1742-5468/2008/10/P10008
URL : https://hal.archives-ouvertes.fr/hal-01146070

C. Biernacki, G. Celeux, and G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.7, pp.719-725, 2000.
DOI : 10.1109/34.865189

P. Tan, M. Steinbach, and V. Kumar, Introduction to data mining, 2005.