Cheaters beware! Tool sniffs out plagiarism
Sleuthing has led to several investigations by medical journal editors
But the resulting brouhaha over whether Obama really committed plagiarism when he borrowed a passage from Massachusetts Gov. Deval Patrick for a recent speech is overshadowing a larger question that has plagued scientists for years. Without YouTube or blogs or talking heads to guide them, how can researchers uncover the flagrant copycat studies that have infiltrated the scientific world?
A Dallas research group may be providing some new answers by setting its computer-assisted sights on questionable cut-and-paste documents published by fellow academics, revealing startling examples of unethical behavior in the process.
The group’s freely available online search engine has identified thousands of potential instances of plagiarism by highlighting significant similarities between blocks of text within a vast database of medical research. Subsequent sleuthing by the team of curators at the University of Texas Southwestern Medical Center has led to the retractions of three published studies and an additional seven investigations out of 20 cases pursued so far.
Beyond the evidence of blatant pirating, the results have hinted at broader reasons for worry. By plagiarizing a report on a clinical procedure, for instance, an unethical researcher can artificially bolster the initial report’s conclusions. “In medicine, researchers and clinicians rely very heavily on the research, and so this has high potential for doing harm,” said Harold “Skip” Garner, a professor of biochemistry and internal medicine at the medical center and one of the project’s co-leaders.
Or, to borrow the phrasing of at least one presidential contender: Words matter.
Uncovering plagiarism
Garner said he and his colleagues originally devised their program, named eTBLAST, as a service for other biomedical researchers. By depositing a research summary or entire document in the program’s search window, scientists can retrieve similar documents among the millions stored in MEDLINE, an online storehouse. “You can check the novelty of your idea. You can identify competitors or collaborators, or other experts in the field,” Garner said.
You can also, as it turns out, identify instances of less-than-honorable behavior. Spurred on by ethical discussions in several high-profile publications, Garner and his collaborators began asking whether their program could find studies that were a bit too similar. Last year, the team randomly selected summaries, or abstracts, from the MEDLINE database and put its tool to the test. "We quickly discovered that our code works very well," he said.
With funding from the federal Office of Research Integrity, Garner's group conducted a more systematic review. The results, published recently in the journal Bioinformatics and in a follow-up commentary in the journal Nature, showed that essentially identical research published by different sets of authors, a sign of potential plagiarism, represented about 0.04 percent of MEDLINE’s database (roughly 6,700 cases in all).
Highly similar studies re-published by the same authors represented another 1.3 percent of the database’s documents. Garner estimated that about half of those seeming duplicates may represent clinical updates, reports of annual meetings or other legitimate publications. But re-positioning the same data in different journals can also pad an author’s resume, a far murkier ethical issue that he said is likely to be resolved not with algorithms or text mining but with clearer community standards.
In all, the group’s aptly named Déjà vu database has flagged more than 71,000 suspicious pairs.
“It’s a sensitive topic, so we’ve tried to be very precise and analytical about this,” Garner said.
How the database works
After first throwing out common words such as “and,” “or,” and “the,” eTBLAST compares the abstract’s remainder to the wording of other summaries in the database and retrieves the top 400 to 1,000 matches.
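As a rough illustration of that first pass, the sketch below drops common words and ranks stored abstracts by how closely their remaining keywords match a query. It is not eTBLAST's actual code: the stop-word list, the cosine similarity measure and the names `keywords`, `cosine` and `top_matches` are all assumptions made for the example.

```python
import re
from collections import Counter
from math import sqrt

# Assumed stop-word list; the real list used by eTBLAST is not given in the article.
STOP_WORDS = {"and", "or", "the", "a", "an", "of", "in", "to", "is", "for", "with"}


def keywords(text):
    """Count the words in a text after throwing out common words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two keyword counts (an assumed scoring choice)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_matches(query, abstracts, limit=400):
    """Return up to `limit` (doc_id, score) pairs, best match first."""
    q = keywords(query)
    scored = [(cosine(q, keywords(text)), doc_id) for doc_id, text in abstracts.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:limit]]
```

Here `abstracts` is a hypothetical dictionary mapping a document ID to its abstract text. A real system searching millions of records would rely on an index rather than scoring every abstract in turn.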
For the second step, the algorithm goes sentence by sentence, scanning for matching keywords and word order. The program doesn’t always match alternate spellings of the same word but can pick out synonyms and different forms of the same word — matching “cancerous” and “tumor” to “cancer,” for example.
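The second pass can be pictured along these lines. This is a loose sketch, not the group's algorithm: the tiny `CANONICAL` table standing in for synonym and word-form matching is hypothetical, and Python's standard `difflib.SequenceMatcher` is swapped in as one simple way to reward keywords that appear in the same order.

```python
import re
from difflib import SequenceMatcher

STOP_WORDS = {"and", "or", "the"}

# Hypothetical lookup table: maps synonyms and word forms to one shared term.
CANONICAL = {"cancerous": "cancer", "tumor": "cancer", "tumors": "cancer"}


def sentence_keywords(sentence):
    """Ordered list of normalized keywords in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [CANONICAL.get(w, w) for w in words if w not in STOP_WORDS]


def split_sentences(text):
    """Naive split on sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_similarity(a, b):
    """Score from 0 to 1 that rises with shared keywords kept in the same order."""
    return SequenceMatcher(None, sentence_keywords(a), sentence_keywords(b)).ratio()


def best_sentence_matches(abstract_a, abstract_b):
    """For each sentence in abstract_a, find its closest sentence in abstract_b."""
    others = split_sentences(abstract_b)
    results = []
    for sentence in split_sentences(abstract_a):
        best = max(others, key=lambda o: sentence_similarity(sentence, o), default=None)
        score = sentence_similarity(sentence, best) if best is not None else 0.0
        results.append((sentence, best, score))
    return results
```

Under these assumptions, "Cancerous cells grew quickly." and "Tumor cells grew quickly." score as identical, while the same keywords in a different order score lower.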
For highly similar abstracts written by different groups, at least two curators in Garner’s group read the corresponding full-text articles side by side, noting their similarities and differences and posting their comments in Déjà vu.
Within that public database, suspicious pairings have pointed toward several serial plagiarists who have copied others’ work a half-dozen times or more.


