Nguyen Xuan Toi, Nguyen Viet Hung, Pham Bao Son

Abstract

With the rapid growth of information technology, the Internet and digital libraries have developed so quickly that illegal copying of documents is becoming easier and more common. A challenging question is how to identify documents with similar content, which are candidates for plagiarism. There are several approaches to estimating the similarity between two documents, and each has its own advantages and disadvantages. An approach may be effective in one domain but may not work in others. In this paper, we propose a unified plagiarism detection framework that can identify which approach works most effectively in a new domain. Experimental results on three corpora in different languages demonstrate the effectiveness of our approach.
Keywords: plagiarism detection, copied documents, similarity.
