# searchEngine
## Setup
- `sudo pip3 install -r requirements.txt`
- Install NLTK and the spaCy `en_core_web_sm` model (e.g. `python3 -m spacy download en_core_web_sm`)
## Index files
- Unzip `Wikibooks.zip` into a directory of your choice
- Run `documents.py`, passing the path to the extracted HTML files as the first argument:
  - e.g. `python3 documents.py ../../Wikibooks`
- Run `terms.py`. This took me a few days. If it stops, just run it again and it will automatically resume from where it left off
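
For anyone curious how a long-running job like this can resume after being interrupted, here is a minimal sketch of the checkpointing idea. It is not the code in `terms.py`; the function name `process_resumably`, the `handle` callback and the JSON checkpoint file are all assumptions made for illustration.

```python
import json
from pathlib import Path


def process_resumably(items, handle, checkpoint_path):
    """Call handle(item) for each item, skipping items finished by earlier runs.

    Hypothetical helper: returns how many items were skipped on this run.
    """
    checkpoint = Path(checkpoint_path)
    # Number of items completed by previous runs (0 when starting fresh)
    done = json.loads(checkpoint.read_text())["done"] if checkpoint.exists() else 0
    for index, item in enumerate(items):
        if index < done:
            continue  # already handled in an earlier run
        handle(item)
        # Record progress after every item, so an interruption only
        # means redoing the item that was in flight
        checkpoint.write_text(json.dumps({"done": index + 1}))
    return done
```

This only works if `items` comes back in the same order on every run, for example a sorted list of file paths.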
## Setting up TF-IDF weighting
- `python3 tf_idf.py`
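
As a rough illustration of what TF-IDF weighting computes, here is a minimal sketch of one common textbook variant (relative term frequency multiplied by the log of inverse document frequency). It is an assumption that `tf_idf.py` uses this exact formula; the function name `tf_idf` and the list-of-token-lists input are made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    `documents` is a list of token lists, e.g. [["gcse", "maths"], ...].
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # tf = share of the document made up of this term,
            # idf = log(N / df), which is 0 for terms found in every document
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["gcse", "computer", "science"],
    ["computer", "hardware"],
    ["gcse", "maths"],
]
weights = tf_idf(docs)
# "science" appears in one document and "computer" in two,
# so "science" gets the higher weight in the first document
print(weights[0])
```

Rarer terms get larger weights, which is why they tend to dominate the ranking of search results.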
## Searching!
- Use `search.py` to run searches, passing the search terms as command-line arguments:
  - e.g. `python3 search.py AQA GCSE Computer Science`
- Results are printed to stdout and saved to `searches/` as a Markdown file
- The results are also rendered as HTML and opened in a web browser automatically
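
To make the output step concrete, here is a small sketch of printing results and saving them as Markdown. It is not taken from `search.py`: the function name `save_results`, the `(title, score)` result pairs and the file naming scheme are all assumptions, and the HTML rendering and browser step are left out.

```python
from pathlib import Path


def save_results(terms, results, out_dir="searches"):
    """Print ranked results and write them to a Markdown file; return its path.

    Hypothetical helper: `results` is a list of (title, score) pairs, best first.
    """
    query = " ".join(terms)
    lines = [f"# Results for: {query}", ""]
    # One numbered entry per result
    for rank, (title, score) in enumerate(results, start=1):
        lines.append(f"{rank}. {title} (score {score:.3f})")
    text = "\n".join(lines) + "\n"
    print(text, end="")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: search terms joined by underscores
    # (terms containing path separators are not handled in this sketch)
    path = out / ("_".join(terms) + ".md")
    path.write_text(text, encoding="utf-8")
    return path
```

A script could call this as `save_results(sys.argv[1:], results)` after ranking, so that the terms given on the command line become the query.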