
Technology Overview:

 
 

1. Introduction.

Plagiarism Detector is a specially designed standalone Windows application that detects and reports cases of textual plagiarism in multiple documents. This section covers the details of the technology, that is, the way it works.

! It is strongly recommended that you read this page before using the application !

2. Technology basics.

The core of the application consists of several working modules:

  1. Graphical User Interface.
  2. Document Manager.
  3. Project Manager.
  4. File parser.
  5. Lexical text analyzer.
  6. CAPTCHA automation core.
  7. Profile Selector.
  8. Request forming and processing core.
  9. Reporting core.

The main operational sequence (what you do) is:

  1. You create a project.
  2. You add a number of documents.
  3. You set the newly created project properties.
  4. You start the analysis.
  5. When the analysis finishes successfully, you are prompted to open the reports folder or to load the last analyzed document into the browser.

The detailed sequence:

1. The application works with the notion of a 'Project'.

A Project is a unit consisting of a number of documents (possibly with different locations, types, sizes, authors etc.) and a number of project properties. A Project is represented by a file with the ".pd" extension, e.g. "MyProject.pd" or "Subgroup45a.pd", located in the Project Folder.

Project Properties are the following values:

  1. Project File Location - the complete project file name, including the path.
  2. Project File Name - the project file name alone (the title only).
  3. Project Profile - a predefined set of the following two values:
  4. Chain Length - the Check Chain Length: the number of words joined together and checked against Google in one request.
  5. Chain Step - the Check Chain Step: the number of words skipped before a new Check Chain is formed.
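The relationship between these properties can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical, the example path is invented, and nothing here reflects the actual layout of a ".pd" file. It shows that the Project Profile groups Chain Length and Chain Step, and that the Project File Name can be derived from the Project File Location.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class ProjectProfile:
    """Hypothetical grouping of the two values a profile predefines."""
    chain_length: int  # words joined into one search request
    chain_step: int    # words skipped before the next check chain


@dataclass(frozen=True)
class ProjectProperties:
    """Hypothetical in-memory view of the project properties."""
    file_location: PureWindowsPath  # complete file name including path
    profile: ProjectProfile

    @property
    def file_name(self) -> str:
        # The title only: no folder and no ".pd" extension.
        return self.file_location.stem


# Invented example values, for illustration only.
example = ProjectProperties(
    file_location=PureWindowsPath(r"C:\Projects\MyProject.pd"),
    profile=ProjectProfile(chain_length=4, chain_step=3),
)
print(example.file_name)
```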

To modify the current project settings, click the "Current Project Properties" button on the main screen. The following screenshot presents the Current Project Properties screen:

The following diagram illustrates how the last two parameters influence the behavior of the application:

The two search requests that are going to be sent to Google according to this diagram are:

  1. "Mike likes to eat".
  2. "Sunday..."
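The way a text is cut into check chains can be sketched in a few lines of Python. This is a minimal sketch, not the application's actual code. It assumes that after each chain of Chain Length words the next Chain Step words are skipped before the following chain starts. The function names and the sample sentence are hypothetical; the sentence was chosen only so that its first chain matches the first request above.

```python
def build_check_chains(text, chain_length, chain_step):
    """Split text into check chains.

    Assumption: a chain takes `chain_length` consecutive words, then
    `chain_step` words are skipped before the next chain begins.
    The last chain may be shorter if the text runs out of words.
    """
    if chain_length < 1 or chain_step < 0:
        raise ValueError("chain_length must be >= 1 and chain_step >= 0")
    words = text.split()
    stride = chain_length + chain_step
    return [
        " ".join(words[start:start + chain_length])
        for start in range(0, len(words), stride)
    ]


def to_search_query(chain):
    # Double quotes ask the search engine for the exact phrase.
    return f'"{chain}"'


# Hypothetical sample text, not taken from the original diagram.
sample = "Mike likes to eat ice cream every Sunday afternoon with friends"
for chain in build_check_chains(sample, chain_length=4, chain_step=3):
    print(to_search_query(chain))
```

With a Chain Length of 4 and a Chain Step of 3, the three words between "eat" and "Sunday" are never sent to the search engine, which is the trade-off the two parameters control.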

The idea is that each of these two values, Chain Length and Chain Step, can pull the analysis in one of two directions:

Large Chain Length - the Plagiarism Suspicion Degree is higher. Accidental hits are practically excluded.

Small Chain Length - the Plagiarism Suspicion Degree is lower. Accidental hits are to be expected.

Large Chain Step - less time is required for the document analysis. The analysis is less detailed.

Small Chain Step - more time is required for the document analysis. The analysis is more thorough.
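The effect of Chain Step on analysis time can be estimated by counting search requests. The sketch below assumes one request per check chain and assumes that each chain of Chain Length words is followed by Chain Step skipped words; the function name and the 1000-word document size are hypothetical.

```python
import math


def count_requests(word_count, chain_length, chain_step):
    """Estimate the number of search requests for one document.

    Assumption: one request per check chain, and each chain advances
    the position by chain_length + chain_step words.
    """
    if word_count < 0 or chain_length < 1 or chain_step < 0:
        raise ValueError("invalid arguments")
    stride = chain_length + chain_step
    return math.ceil(word_count / stride)


# Hypothetical 1000-word document, Chain Length fixed at 4.
for step in (0, 4, 16):
    print(step, count_requests(1000, chain_length=4, chain_step=step))
```

Under these assumptions a larger Chain Step sends fewer requests, so the analysis is faster, but a larger share of the text is never checked.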

What is Plagiarism Suspicion Degree?

Every time Plagiarism Detector runs into a Plagi-Hit (for more details, see Alive Reports), it marks a place suspected of plagiarism. To state definitively that it is True Plagiarism, you must check the occurrence manually. However, the longer the word chain, the higher the degree of suspicion. To illustrate this, let's take two examples:

"I am free" - this check chain consists of 3 words, so the Chain Length is small: its value is 3. This word sequence occurs on the Internet extremely often, so a hit here does not indicate True Plagiarism.

"I am free after the exact midnight today!" - this check chain consists of 8 words, so the Chain Length is large: its value is 8. The number of accidental occurrences of this word sequence on the Internet will be... zero. If this sequence is marked as a Plagi-Hit, you can be 98% sure that the passage was taken from some source on the web.

3. Extremely Important Assessment Criteria

You may ask a logical question: how do I know that the text is plagiarized?

The answer is to check the Top 5 section of the originality report.

Two Originality Report examples:

True Plagiarism:

Truly Original:

The explanation is fairly simple. If a sample of 10 sentences is taken from the web and included in the analyzed document, the application reacts immediately by increasing the Frequency Link counter.

After the Google-based analysis finishes, all the harvested URLs are accumulated in the so-called Url Stack and their Occurrence Frequency is counted. The Top 5 is the top of the Url Stack, that is, the URLs most frequently linked to places suspected of plagiarism.
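Counting the Occurrence Frequency and taking the Top 5 can be sketched with Python's standard `collections.Counter`. This is an illustration under assumptions, not the application's code: the function name is hypothetical, the URLs are placeholders, and counting a URL at most once per check chain is an assumed rule.

```python
from collections import Counter


def top_urls(hits_per_chain, n=5):
    """Return the n most frequent URLs across all check chains.

    `hits_per_chain` holds one list of result URLs per check chain.
    Assumption: a URL is counted at most once per chain.
    """
    url_stack = Counter()
    for urls in hits_per_chain:
        url_stack.update(set(urls))
    return url_stack.most_common(n)


# Placeholder URLs, for illustration only.
hits = [
    ["http://example.com/a", "http://example.org/b"],
    ["http://example.com/a"],
    ["http://example.com/a", "http://example.net/c"],
]
for url, frequency in top_urls(hits):
    print(frequency, url)
```

In this made-up input one URL is returned by three different chains and therefore rises to the top of the list, while the other two appear only once.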

The core idea is that even 2 (!!!) accidental links to 1 document (out of billions!) on the web are extremely improbable. A small word sequence (2-3 words) may be found in thousands of documents on the web, but if another word sequence in the checked document links to the same source... it's plagiarized! Google is very unlikely to return the same document for two unrelated word sequences by chance. Still, make 3 your Plagiarism Barrier, just to be on the safe side.
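Applying the Plagiarism Barrier amounts to a simple threshold on the link frequencies. The sketch below is a hypothetical helper, not part of the application; the function name, the input dictionary, and the placeholder URLs are assumptions made for illustration.

```python
def suspected_sources(url_frequency, barrier=3):
    """Return URLs whose link frequency reaches the Plagiarism Barrier.

    `url_frequency` maps each URL to the number of check chains that
    linked to it. The default barrier of 3 follows the advice above.
    """
    return sorted(
        url for url, count in url_frequency.items() if count >= barrier
    )


# Placeholder frequencies, for illustration only.
frequencies = {
    "http://example.com/a": 7,
    "http://example.org/b": 2,
    "http://example.net/c": 1,
}
print(suspected_sources(frequencies))
```

Any source returned here still needs a manual check before it is treated as True Plagiarism.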

Copyright 2008 SkyLine Software ©. All rights reserved