Contents:
Abstract
Chapter 1: Introduction
1-1- Introduction
1-2- Data mining
1-3- Data mining methods
1-4- Clustering
1-5- Consensus clustering
1-6- Research carried out in this thesis
1-7- Results obtained
1-8- Structure of the thesis
Chapter 2: Review of related work
2-1- Introduction
2-2- Clustering methods
2-2-1- Partitioning methods
2-2-2- Hierarchical methods
2-2-3- The K-Means clustering algorithm
2-3- Consensus clustering
2-3-1- Motivations for using consensus clustering
2-3-2- The consensus clustering problem: an example
2-3-3- Overview of consensus clustering methods
2-3-4- Categorization of consensus clustering methods
2-3-5- Similarity-based methods
Pairwise similarity (co-association matrix)
Graph-based
2-3-6- Consensus methods using mutual information
2-3-7- Consensus methods using a mixture model
2-3-8- Voting-based consensus methods
2-4- Methods for generating clustering ensembles
2-5- Chapter summary
Chapter 3: The proposed approach: consensus clustering on heterogeneous distributed data
3-1- Introduction
3-2- The proposed approach
3-2-1- Detecting one-to-one correspondence between clusters
3-2-2- Weighted clusterings
3-2-3- Consensus clustering on heterogeneous distributed data
3-3- Generating the clustering ensemble
3-4- Chapter summary
Chapter 4: Implementation of the proposed approach and its evaluation results
4-1- Introduction
4-2- Evaluation measures
4-2-1- Accuracy measure
4-2-2- Davies-Bouldin index
4-2-3- Rand index
4-2-4- Average normalized mutual information (ANMI)
4-3- Implementation
4-4- Datasets
4-5- Evaluation results
4-5-1- Accuracy measure
4-5-2- Davies-Bouldin index
4-5-3- Rand index
4-5-4- Average normalized mutual information (ANMI)
4-6- Chapter summary
Chapter 5: Conclusion and future work
5-1- Introduction
5-2- Conclusion
5-3- Future work
References
Appendix A: List of abbreviations
Appendix B: English-to-Persian glossary
Appendix C: Persian-to-English glossary