CFP PAN'09: 3rd Int. PAN Workshop - 1st Competition on Plagiarism Detection
Satellite workshop of 25th SEPLN Conference on
Uncovering Plagiarism, Authorship and Social Software Misuse
Donostia-San Sebastián, September 10
About the PAN Workshop:
The workshop shall bring together experts and researchers around the exciting and future-oriented topics of plagiarism detection, authorship identification, and the detection of social software misuse. The development of new solutions for these problems can benefit from the combination of existing technologies, and in this sense the workshop provides a platform that spans different views and approaches. The following list gives examples from the outlined fields for which contributions are welcome, but not restricted to:
* plagiarism detection in general, in Web communities and social networks, and cross-language plagiarism
* identifying near-duplicate and versioned documents of all kinds: text, software, image, music, video
* technology for high-similarity retrieval such as fingerprinting and similarity hashing
* models for authorship identification, authorship attribution, and writing style
* NLP- and knowledge-based retrieval models to capture personal traits and sentiment
* Web forensics, community fraud, and new Web infringements
Social Software Misuse Detection:
* uncovering serial sharing and lobbying
* monitoring vandalism, trolling, or stalking
* trust, psychological and personality-based user studies, social aspects of Web misuse
lagiarism analysis is a collective term for computer-based methods to identify plagiarism offense. In connection with text documents we distinguish between orpus-based and intrinsic analysis: the former compares suspicious documents against a set of potential original documents, the latter identifies potentially plagiarized passages by analyzing the suspicious document with respect to changes in writing style.
Authorship identification divides into so-called attribution and verification problems. In the authorship attribution problem, one is given examples of the writing of a number of authors and is asked to determine which of them authored given anonymous texts. In the authorship verification problem, one is given examples of the writing of a single author and is asked to determine if given texts were or were not written by this author. As a categorization problem,
verification is significantly more difficult than attribution. Authorship verification and intrinsic plagiarism analysis represent two sides of the same coin.
"Social Software Misuse" can nowadays be noticed on many social software based platforms. These platforms like blogs, sharing sites for photos and videos, wikis and online forums are contributing up to one third of new Web content. "Social Software Misuse" is a collective term for anti-social behavior in online communities; an example is the distribution of spam via the e-mail infrastructure. Interestingly, spam is one of the few misuses for which detection technology is developed at all, though various forms of misuse exist that threaten the different online ommunities. Our workshop shall close this gap and invites contributions concerned with all kinds of social software misuse.
About the Competition on Plagiarism Detection:
The detection of plagiarism by hand is a laborious retrieval task, a task which can be aided or automatized. The PAN competition on plagiarism detection shall foster the development of new solutions in this respect.
The competition divides into two tracks:
* External Plagiarism Analysis. Given a set of suspicious documents and a set of potential source documents the task is to find all passages within the suspiscious documents which have been plagairized from one or more of the source documents.
* Intrinsic Plagiarism Analysis. Given a set of suspicious documents the task is to detect paragraphs in the documents which have not been written by its main author. source documents are given in this task.
A large corpus of artificial plagiarism containing cases which have been obfuscated and/or translated will be released for the competition. A development corpus, to be used in developing a detection software, will be released two months before the competition starts, a competition corpus will be used to evaluate and compare detection softwares. The former will contain fully annotated plagiarism cases, the latter will not.
The succes of a plagiarism detection software will be measured in terms of its precision, recall, and granularity.
open Notification of interest for participation
28.03.2009 Release of the development corpus
21.05.2009 Release of the competition corpus
07.06.2009 Submission deadline for the competition
15.06.2009 Notification of competition results
01.07.2009 Submission deadline for the papers
15.07.2009 Notification of reviews
01.08.2009 Submission deadline for final version of the papers
10.09.2009 (afternoon) PAN Workshop
Benno Stein Bauhaus University Weimar
Paolo Rosso Universidad Politécnica de Valencia
Efstathios Stamatatos University of the Aegean
Moshe Koppel Bar-Ilan University
Eneko Agirre University Basque Country
Bauhaus University Weimar:
Benno Stein, Martin Potthast, and Andreas Eiselt
Universidad Politécnica de Valencia:
Paolo Rosso and Alberto Barrón Cedeño
Information about workshop and competition can be found at