GitHub - JuiP/LSH: Locality-Sensitive Hashing

LSH: Locality-Sensitive Hashing

CS F469 IR Assignment - 2

Problem Statement:

The task is to implement Locality-Sensitive Hashing (LSH) to find duplicate or similar DNA sequences within the corpus. The steps involved are shingling, minhashing, and locality-sensitive hashing. The main idea is to hash similar documents into the same buckets, so that documents sharing a bucket have a high probability of being similar or duplicates.
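The three steps named above can be sketched as follows. This is a minimal illustration, not the code in LSH_program.py: the function names, the shingle length `k=5`, the 20 hash functions, the 5 bands of 4 rows, and the toy sequences are all assumptions chosen for the example.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the illustrative hash family


def shingles(seq, k=5):
    """Return the set of all length-k substrings (k-shingles) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """For each hash function, keep the minimum hash value over all shingles."""
    ids = [zlib.crc32(s.encode()) for s in shingle_set]  # stable integer ids
    return [min(((a * x + b) % PRIME for x in ids), default=PRIME)
            for a, b in params]


def lsh_buckets(signatures, bands, rows):
    """Split each signature into bands; equal band slices share a bucket."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    return buckets


def candidate_pairs(buckets):
    """Every pair of documents that shares at least one bucket."""
    pairs = set()
    for docs in buckets.values():
        pairs.update(combinations(sorted(docs), 2))
    return pairs


# Toy corpus (hypothetical sequences; seq_c is an exact duplicate of seq_a).
corpus = {
    "seq_a": "ATGCCCCAACTAAATACTACCGTATGGCCC",
    "seq_b": "GGGTTTAAACCCGGGTTTAAACCCGGGTTT",
    "seq_c": "ATGCCCCAACTAAATACTACCGTATGGCCC",
}

NUM_HASHES, BANDS, ROWS = 20, 5, 4  # BANDS * ROWS must equal NUM_HASHES
params = make_hash_params(NUM_HASHES)
shingle_sets = {name: shingles(seq) for name, seq in corpus.items()}
signatures = {name: minhash_signature(s, params)
              for name, s in shingle_sets.items()}
pairs = candidate_pairs(lsh_buckets(signatures, BANDS, ROWS))

for left, right in sorted(pairs):
    score = jaccard(shingle_sets[left], shingle_sets[right])
    print(f"{left} ~ {right}: Jaccard similarity {score:.2f}")
```

Exact duplicates always produce identical signatures and therefore land in the same bucket in every band. For near-duplicates, using more bands with fewer rows each makes a collision more likely, at the cost of more false candidates, which is why candidate pairs are usually re-checked with the true Jaccard similarity as done in the final loop.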

About the project

Dataset used - Kaggle-human-data

See the Design Architecture file for the concepts used and the time taken by each implementation step.

Project By:

Kriti Jethlia: Email- [email protected]
Jui Pradhan: Email- [email protected]
Anusha Agarwal: Email- [email protected]

How to run the code

Clone the repository: https://github.com/KritiJethlia/LSH.git
cd LSH
Run file:
```
python3 LSH_program.py
```
Type your query in the terminal and wait for the program to return the similar DNA sequences :)

Dependencies/modules used

time
collections
pandas
pickle
numpy
random
operator
sys
copy
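
Of the modules listed, only pandas and numpy are third-party packages; the rest ship with the Python standard library. A quick way to confirm that everything is importable before running the program might look like the sketch below. The `missing_modules` helper and the `REQUIRED` list are hypothetical names introduced here; they are not part of the repository.

```python
import importlib.util

# Module names taken from the dependency list above.
REQUIRED = ["time", "collections", "pandas", "pickle", "numpy",
            "random", "operator", "sys", "copy"]


def missing_modules(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Any name reported as missing would need to be installed with your usual package manager before running LSH_program.py.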
