-rw-r--r-- | README.md | 32 |
1 files changed, 31 insertions, 1 deletions
@@ -1 +1,31 @@
-# searchEngine
\ No newline at end of file
+# searchEngine
+
+## Setup
+
+- `sudo pip3 install -r requirements.txt`
+
+- Install nltk and spacy `en_core_web_sm`
+
+## Index files
+
+- Unzip Wikibooks.zip to a given directory
+
+- Run `documents.py` with the first argument as the path to the HTML files:
+
+- e.g. `python3 documents.py ../../Wikibooks`
+
+- Run `terms.py`. This took me a few days. If it stops you can just run it again and it'll automatically restart at the correct place
+
+## Setting up TF-IDF weighting
+
+- `python3 tf_idf.py`
+
+## Searching!
+
+- You can use `search.py` to conduct searches. Make search terms command line arguments.
+
+- e.g. `python3 search.py AQA GCSE Computer Science`
+
+- Results are printed to stdout, to `searches/` as a markdown file
+
+- It will be rendered as HTML and shown in a web browser automatically
\ No newline at end of file
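
Note on the "Setting up TF-IDF weighting" step: the diff does not show what `tf_idf.py` computes, so the following is only a minimal sketch of one common textbook TF-IDF variant, not the repository's implementation. The function name `tf_idf` and the list-of-token-lists input are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Hypothetical helper, not taken from the repository. It uses
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    sample = [["gcse", "computer", "science"], ["computer", "graphics"]]
    for doc_weights in tf_idf(sample):
        print(doc_weights)
```

With this variant, a term that appears in every document gets weight zero, so only terms that distinguish one document from another contribute to a ranking.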