From 24cd893099c570696a3796642a2a05d17872c39e Mon Sep 17 00:00:00 2001
From: jwansek
Date: Tue, 14 Dec 2021 12:07:25 +0000
Subject: added to the README

---
 README.md | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e5cb074..98c90d3 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,31 @@
-# searchEngine
\ No newline at end of file
+# searchEngine
+
+## Setup
+
+- `sudo pip3 install -r requirements.txt`
+
+- Install nltk and spacy `en_core_web_sm`
+
+## Index files
+
+- Unzip Wikibooks.zip to a given directory
+
+- Run `documents.py` with the first argument as the path to the HTML files:
+
+- e.g. `python3 documents.py ../../Wikibooks`
+
+- Run `terms.py`. This took me a few days. If it stops you can just run it again and it'll automatically restart at the correct place
+
+## Setting up TF-IDF weighting
+
+- `python3 tf_idf.py`
+
+## Searching!
+
+- You can use `search.py` to conduct searches. Make search terms command line arguments.
+
+- e.g. `python3 search.py AQA GCSE Computer Science`
+
+- Results are printed to stdout, to `searches/` as a markdown file
+
+- It will be rendered as HTML and shown in a web browser automatically
\ No newline at end of file
--
cgit v1.2.3