Departmental Colloquium: Dr. Yan Zhu
September 7, 2016 @ 1:00 pm - 2:00 pm
Department Conference Room (25 Park Place, Room 755)
Blocking Web Spam by Entropy-based Cascade Outlier Detection
Dr. Yan Zhu <http://userweb.swjtu.edu.cn/Userweb/flora02/index.htm>
School of Information Science and Technology
Southwest Jiaotong University
Web spam refers to Web pages that try to trick search engines into raising their rankings. It does serious damage to e-commerce and Web users and threatens Web security, so combating it is an urgent task. In this talk, I will introduce a cascade detection mechanism based on the entropy-based outlier mining (EOM) algorithm, in which Web quality and semantic features are integrated with content and link characteristics to cover multiple dimensions of Web pages. The detection mechanism consists of three stages that use different feature groups. Experiments on WEBSPAM-UK2007 showed that quality and semantic features could effectively improve detection, and that the EOM algorithm outperformed many classic classification algorithms on unbalanced data.
About the Speaker: Dr. Yan Zhu is a professor at Southwest Jiaotong University (SWJTU) in China. She received her Ph.D. degree in computer science from Darmstadt University of Technology (TU Darmstadt), Germany, in 2004 and was a research staff member in the Department of Computer Science at TU Darmstadt from 1998 to 2004. She joined the faculty of SWJTU in 2005, where she leads the Data Mining and Web Engineering Group.
Her research interests include data mining, big data analysis and management, and Web data security. She has written two books (published by Shaker Verlag, Germany, in 2004 and by SWJTU Press, China, in 2011) and many SCI/EI-indexed papers.