Goal

Terminology


$$ sim(C1,C2)\;\;=\;\;|C1\;\cap\;C2|\;/\;|C1\;\cup\;C2| $$

Essential Steps for Similar Docs

  1. Shingling
  2. Min-Hashing
  3. Locality-Sensitive Hashing

1_27nQOTC79yfh5lzmL06Ieg.png

🔗 모든 과정 영어 설명 링크

1. Shingling (k-shingle or k-gram)