Table of Contents:
Abstract
Chapter 1: Introduction
1-1 Preface
1-2 Problem statement
1-3 Importance and necessity of the research
Thesis structure
Chapter 2: The Web and Web Spam
2-1 The World Wide Web
2-1-1 The web as a graph
2-1-2 The web graph at the page and host level
2-1-3 Connectivity
2-2 Search engines
2-2-1 Architecture of web search engines
2-2-2 Search engine query server
2-3 Ranking
2-3-1 Content-based ranking
2-3-2 Link-based algorithms
2-4 Web spam
2-4-1 Content spam
2-4-2 Link spam
2-4-3 Hiding techniques
2-5 Machine learning
2-5-1 Naïve Bayes
2-5-2 Decision tree
2-5-3 Support vector machine
2-6 Combining classifiers
2-6-1 Bagging
2-6-2 Boosting
2-7 Evaluation methods
2-7-1 Cross-validation
2-7-2 Precision and recall
2-7-3 ROC curve
2-8 Summary
Chapter 3: Research Background
3-1 Datasets used by researchers
3-1-1 UK2006
3-1-2 UK2007
3-1-3 Dataset collected using MSN search
3-1-4 DC2010
3-2 Content-based studies
3-3 Link-based methods
3-3-1 Label propagation algorithms
3-3-2 Functional ranking
3-3-3 Link pruning and reweighting algorithms
3-3-4 Label refinement algorithms
3-4 Link- and content-based methods
3-4-1 Studies based on feature reduction
3-4-2 Studies based on combining classifiers
3-4-3 Studies testing the importance of different features in spam detection
3-4-4 Studies based on web configuration
3-4-5 Spam detection through language model analysis
3-4-6 Effect of page language on web spam detection features
3-4-7 An approach combining content-based and link-based features for Arabic pages
3-5 Summary
Chapter 4: Implementation of the Proposed Idea
4-1 Introduction
4-2 Characteristics of the selected dataset
4-3 Preprocessing
4-3-1 Preprocessing the UK2007 dataset
4-3-2 Feature reduction using data mining algorithms
4-4 Data mining and model evaluation
4-4-1 Results of the algorithms with feature reduction methods applied
4-4-2 Comparison of the F-measure values obtained by the algorithms when applied to the features produced by the feature reduction algorithms
4-5 Interpretation of results
4-6 Summary
Chapter 5: Conclusion and Future Work
5-1 Conclusion
5-2 Future work
References
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Appendix 5
Appendix 6
Appendix 7
Appendix 8
Appendix 9
Appendix 10
Appendix 11
Appendix 12
Appendix 13
Appendix 14
English abstract