Web data mining reference material: Web Data Mining by Bing Liu


The rapid growth of the Web in the past two decades has made it the largest publicly accessible data source in the world. Web mining aims to discover useful information or knowledge from Web hyperlinks, page contents, and usage logs. Based on the primary kinds of data used in the mining process, Web mining tasks can be categorized into three main types: Web structure mining, Web content mining and Web usage mining. Web structure mining discovers knowledge from hyperlinks, which represent the structure of the Web. Web content mining extracts useful information/knowledge from Web page contents. Web usage mining mines user activity patterns from usage logs and other forms of logs of user interactions with Web systems.

Since the publication of the first edition at the end of 2006, there have been some important advances in several areas. To reflect these advances, new materials have been added to most chapters. The major changes are in Chapter 11 and Chapter 12, which have been rewritten and significantly expanded. When the first edition was written, opinion mining (Chapter 11) was still in its infancy. Since then, the research community has gained a much better understanding of the problem and has proposed many novel techniques to solve various aspects of the problem. To include the latest developments for the Web usage mining chapter (Chapter 12), the topics of recommender systems and collaborative filtering, query log mining, and computational advertising have been added. This new edition is thus considerably longer, from a total of 532 pages in the first edition to a total of 622 pages in this second edition.


The goal of the book is to present the above Web data mining tasks and their core mining algorithms. The book is intended to be a text with comprehensive coverage, and therefore, for each topic, sufficient details are given so that readers can gain a reasonably complete knowledge of its algorithms or techniques without referring to any external materials. Five of the chapters (partially supervised learning, structured data extraction, information integration, opinion mining and sentiment analysis, and Web usage mining) make this book unique. These topics are not covered by existing books, yet they are essential to Web data mining. Traditional Web mining topics such as search, crawling and resource discovery, and social network analysis are also covered in detail in this book.

Although the book is entitled Web Data Mining, it also includes the main topics of data mining and information retrieval since Web mining uses their algorithms and techniques extensively. The data mining part mainly consists of chapters on association rules and sequential patterns, supervised learning (or classification), and unsupervised learning (or clustering), which are the three fundamental data mining tasks. The advanced topic of partially (semi-) supervised learning is included as well. For information retrieval, the core topics that are crucial to Web mining are described. The book is thus naturally divided into two parts. The first part, which consists of Chapters 2–5, covers data mining foundations. The second part, which consists of Chapters 6–12, covers Web-specific mining.

Two main principles have guided the writing of this book. First, the basic content of the book should be accessible to undergraduate students, and yet there should be sufficient in-depth material for graduate students who plan to pursue Ph.D. degrees in Web data mining or related areas. Few assumptions are made in the book regarding the prerequisite knowledge of readers. Anyone with a basic understanding of algorithms and probability concepts should have no problem with this book. Second, the book should examine Web mining technology from a practical point of view. This is important because most Web mining tasks have immediate real-world applications. In the past few years, I was fortunate to have worked directly or indirectly with many researchers and engineers in several search engine companies, e-commerce companies, opinion mining and sentiment analysis companies, and also traditional companies that are interested in exploiting the information on the Web in their businesses. During the process, I gained practical experience and first-hand knowledge of real-world problems. I try to pass those non-confidential pieces of information and knowledge along in the book. The book thus has a good balance of theory and practice. I hope that it will not only be a learning text for students, but also a valuable source of information/knowledge and ideas for Web mining researchers and practitioners.
