A Comparative Study of Algorithms for Fingerprint Generation Using Word-based and Character-based Groupings

Please use this identifier to cite or link to this item: http://kmutnb-ir.kmutnb.ac.th/jspui/handle/123456789/309

Title:	A Comparative Study of Algorithms for Fingerprint Generation Using Word-based and Character-based Groupings
Authors:	HEMAWATPAKON PLABIN (เหมวัสล์ปกร พลับอินทร์); NAWAPORN WISITPONGPHAN (นวพร วิสิฐพงศ์พันธ์); King Mongkut's University of Technology North Bangkok; nawaporn.w@itd.kmutnb.ac.th, nawapornn@kmutnb.ac.th
Keywords:	Fingerprinting Algorithms; Jaccard Similarity Index; Data Leakage Prevention
Issue Date:	8
Publisher:	King Mongkut's University of Technology North Bangkok
Abstract:	This article presents a character-based fingerprinting algorithm for examining and identifying similarities between documents. The goal is to detect and prevent plagiarism and the unauthorized extraction and use of confidential information from original documents. The study adapts techniques from word-based fingerprinting: it extracts text features, generates a unique fingerprint for each document, and evaluates performance using Jaccard similarity. The results show that character-based fingerprinting with 4-grams and a window size of 4 yields similarity scores closest to the target values, deviating from the target by no more than 2-3%. This holds both when the confidential information is hidden at a single point in the file (Undivided) and when it is split across multiple locations (Divided). The benchmark word-based fingerprinting technique runs about twice as fast, but it tends to underestimate similarity when the target data is embedded in other files, deviating from the target by 5-12%.
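The pipeline summarized in the abstract (n-gram extraction, fingerprint selection over a window, Jaccard comparison) can be illustrated with a minimal sketch. The abstract does not name the selection rule or the hash function, so this sketch assumes a winnowing-style rule (keep the minimum hash in each window) and a truncated MD5 hash. All function names are hypothetical and are not taken from the thesis.

```python
import hashlib


def char_tokens(text):
    """Character-based tokens: lowercase, whitespace removed (an assumed normalization)."""
    return [c for c in text.lower() if not c.isspace()]


def word_tokens(text):
    """Word-based tokens: lowercase, split on whitespace (an assumed normalization)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every contiguous n-gram of the token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def stable_hash(gram):
    """Map an n-gram to a 32-bit integer. MD5 is an arbitrary, assumed choice."""
    data = "\x1f".join(gram).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")


def fingerprint(tokens, n=4, window=4):
    """Assumed winnowing-style selection: keep the minimum hash of each window."""
    hashes = [stable_hash(g) for g in ngrams(tokens, n)]
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two fingerprint sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    secret = "confidential quarterly revenue forecast"
    host = "Meeting notes. " + secret + " Please do not share."
    fp_secret = fingerprint(char_tokens(secret))
    fp_host = fingerprint(char_tokens(host))
    print("character-based Jaccard:", jaccard(fp_secret, fp_host))
    print("word-based Jaccard:", jaccard(fingerprint(word_tokens(secret)),
                                         fingerprint(word_tokens(host))))
```

Under these assumptions, every window that lies entirely inside the embedded text also occurs in the host document, so the fingerprint of the embedded text is a subset of the host's fingerprint whenever the text yields at least `window` n-grams. The scores printed here are illustrative only and are not comparable to the percentages reported in the abstract.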
URI:	http://kmutnb-ir.kmutnb.ac.th/jspui/handle/123456789/309
Appears in Collections:	FACULTY OF INFORMATION TECHNOLOGY AND DIGITAL INNOVATION

Files in This Item:

File	Description	Size	Format
s6607031857264.pdf		944.22 kB	Adobe PDF
