Similarity between two documents.
Updated Aug 6, 2022 - Python
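As a minimal sketch of the topic's core idea, assuming a plain bag-of-words representation, two documents can be compared by the cosine of the angle between their term-frequency vectors. The function names below are illustrative and are not taken from any listed repository.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    # Bag-of-words term-frequency vectors for each document.
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    # Dot product over the shared vocabulary only (other terms contribute 0).
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Identical documents score 1.0 (up to floating-point rounding) and documents with no shared terms score 0.0. Real systems usually add TF-IDF weighting or embeddings on top of this.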
Topic Modeling in Cython
Was curious about how plagiarism checkers work and ended up learning about something completely different 😂
Information Retrieval Lab
A framework that finds job matches for you based on data scraped from indeed.co.uk.
Individual group project in Python
Aims to provide a job-search strategy for new graduates interested in data-related positions.
Compares each sentence of an input document with every sentence of a set of reference documents to find highly similar ones.
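One way to sketch this sentence-level matching, assuming a naive regex sentence splitter and Python's standard-library `difflib.SequenceMatcher` as the similarity measure (a swapped-in choice; the repository may use a different one). `find_similar_sentences` and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_similar_sentences(input_doc: str, reference_docs: list[str], threshold: float = 0.8):
    """Return (input_sentence, reference_sentence, ratio) for every pair >= threshold."""
    ref_sentences = [s for doc in reference_docs for s in split_sentences(doc)]
    matches = []
    for sent in split_sentences(input_doc):
        for ref in ref_sentences:
            # ratio() = 2*M / T, where M is the number of matched characters
            # and T is the total length of both strings.
            ratio = SequenceMatcher(None, sent.lower(), ref.lower()).ratio()
            if ratio >= threshold:
                matches.append((sent, ref, ratio))
    return matches
```

This all-pairs comparison grows with (input sentences) x (reference sentences), which is fine for small corpora; for large ones, approximate methods such as MinHash LSH are the usual way to cut down the candidate pairs.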
Assesses MinHash LSH for text similarity, comparing it against kNN search over BART embeddings used as ground truth. Covers data preprocessing, shingle creation, and LSH experiments; the findings shed light on LSH's efficiency in document-similarity tasks.
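The shingling and LSH banding steps can be sketched as follows, assuming character 5-shingles, keyed BLAKE2b hashes standing in for MinHash permutations, and 20 bands of 5 rows. All names and parameters here are illustrative assumptions, not the project's code.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    # Character k-shingles over lowercased, whitespace-normalized text.
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def minhash_signature(shingle_set: set[str], num_perm: int = 100) -> list[int]:
    # One hash function per seed (BLAKE2b keyed with the seed); keep the minimum per function.
    sig = []
    for seed in range(num_perm):
        key = seed.to_bytes(8, "big")
        sig.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), key=key, digest_size=8).digest(), "big")
            for s in shingle_set
        ))
    return sig


def lsh_candidate_pairs(signatures: dict, bands: int = 20, rows: int = 5) -> set:
    # Split every signature into `bands` slices of `rows` values; documents whose
    # slice matches exactly share a bucket in that band and become a candidate pair.
    assert all(len(sig) == bands * rows for sig in signatures.values())
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so b and r together set the effective similarity threshold.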
Q3 of Final Project Assignment of the course 'Foundations of Data Science' @ CBS
Query-based document search using an inverted index
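A minimal inverted-index sketch, assuming simple lowercase tokenization and Boolean AND semantics for queries (ranking is omitted). `build_inverted_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document ids containing it (its postings).
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND query: intersect the postings of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Only the postings of the query terms are touched, so a query never has to scan documents that share no terms with it.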
A simple MinHash implementation based on the explanation in the Mining of Massive Datasets course by Stanford
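In the spirit of the Mining of Massive Datasets treatment, where random row permutations are simulated with hash functions, a MinHash signature and Jaccard estimate might look like this. The hash family h(x) = (a*x + b) mod p and all names are assumptions of this sketch; set elements are assumed to be non-negative integers below p (for example, shingle ids).

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; element ids are assumed to be below it


def make_hash_funcs(num_hashes: int, seed: int = 42) -> list[tuple[int, int]]:
    # Random (a, b) pairs stand in for row permutations: h(x) = (a*x + b) % PRIME.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def signature(elements: set[int], hash_funcs: list[tuple[int, int]]) -> list[int]:
    # For each hash function, keep the minimum hashed value over the set.
    return [min((a * x + b) % PRIME for x in elements) for a, b in hash_funcs]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # The probability that two min-hashes agree equals the Jaccard similarity,
    # so the fraction of agreeing positions estimates it.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a: set, b: set) -> float:
    # Exact Jaccard similarity, for comparison with the estimate.
    return len(a & b) / len(a | b)
```

More hash functions tighten the estimate at the cost of longer signatures.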
Classifying news articles with deep learning to build an automatic newsletter
Big data homework solutions
Use of word embeddings and document similarity to solve word analogy problems
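The usual vector-offset approach to analogies ("man is to king as woman is to ?") can be sketched with hypothetical 3-dimensional toy vectors; a real project would load pretrained embeddings such as word2vec or GloVe instead. The vectors and function names below are made up for illustration.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "king": [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man": [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.5],
}


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a: str, b: str, c: str, embeddings: dict[str, list[float]]) -> str:
    # Solve "a is to b as c is to ?" with the offset vector b - a + c, then return
    # the nearest remaining word by cosine similarity (query words are excluded).
    target = [vb - va + vc for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))
```

With these toy vectors, `analogy("man", "king", "woman", EMBEDDINGS)` returns `"queen"`.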
Natural language processing scripts
A system for automatic tagging of metadata of theses and dissertations from Bicol University
Text search engine built from scratch in Rust, supporting multiple document-similarity metrics (TF-IDF, BM25, BM25VA)
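For reference, Okapi BM25 scoring can be sketched in a few lines (in Python rather than Rust, to match the other examples here). The sketch assumes pre-tokenized, non-empty documents, typical parameter values k1 = 1.5 and b = 0.75, and one non-negative IDF variant among several in use; BM25VA is not covered, and `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against query_terms with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # assumes at least one non-empty doc
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Unlike raw TF-IDF, repeated occurrences of a term saturate through k1, and longer-than-average documents are penalized through b.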