A minimal HTTP search engine server written in C++11. It indexes web pages using an in-memory inverted index, exposes a plain HTTP interface on port 8080, and returns matching page URLs for any single-term query submitted via a browser or HTTP client.
- Inverted index. Tokenises page content by whitespace and maps each term to the set of pages that contain it, so a single-term lookup is one hash-table access (constant time on average).
- HTTP server. Listens on a raw TCP socket and handles GET requests without any external web framework.
- Search interface. Serves a minimal HTML form at / and returns query results at /search?query=<term>.
- Lightweight by design. No runtime dependencies beyond the C++ standard library and POSIX sockets.
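The inverted index described above can be illustrated with a minimal sketch. It assumes pages are identified by their insertion order and that tokens are matched exactly (case-sensitive); the class name TinyIndex and its members are hypothetical and are not taken from main.cpp.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal index; names are illustrative, not the project's own.
class TinyIndex {
public:
    // Store the URL, then record its page id under every
    // whitespace-separated token of the content.
    void addPage(const std::string& url, const std::string& content) {
        const std::size_t pageId = urls_.size();
        urls_.push_back(url);
        std::istringstream stream(content);
        std::string token;
        while (stream >> token) {
            postings_[token].insert(pageId);
        }
    }

    // Return the URLs of all pages containing the exact token,
    // ordered by page id (that is, by insertion order).
    std::vector<std::string> search(const std::string& term) const {
        std::vector<std::string> results;
        const auto it = postings_.find(term);  // average O(1) hash lookup
        if (it == postings_.end()) {
            return results;
        }
        for (const std::size_t pageId : it->second) {
            results.push_back(urls_[pageId]);
        }
        return results;
    }

private:
    std::vector<std::string> urls_;
    std::unordered_map<std::string, std::set<std::size_t>> postings_;
};
```

With two pages that both contain the token algorithms, search("algorithms") returns both URLs in the order they were added, while a term that was never indexed returns an empty vector.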
+------------------+
|    Browser /     |
|   HTTP Client    |
+--------+---------+
         | TCP (port 8080)
         v
+------------------+
|  startServer()   |  accept() loop
+--------+---------+
         |
         v
+------------------+
| handleRequest()  |  parses raw GET request
+--------+---------+
         |
         v
+------------------+
|   SearchEngine   |  tokenise + query
+--------+---------+
         |
         v
+------------------+
|  InvertedIndex   |  unordered_map<token, set<pageId>>
+------------------+
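As an illustration of the handleRequest() stage in the diagram, the sketch below pulls the search term out of a raw GET request. The helper name extractQuery is hypothetical, and the sketch assumes query is the first parameter of the /search path; the real parser in main.cpp may differ.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the value of the "query" parameter from a raw
// HTTP request, or an empty string if the request is not
// "GET /search?query=<term> ...".
std::string extractQuery(const std::string& rawRequest) {
    std::istringstream stream(rawRequest);
    std::string method;
    std::string target;
    // The request line has the form: METHOD SP TARGET SP VERSION
    if (!(stream >> method >> target) || method != "GET") {
        return "";
    }
    const std::string prefix = "/search?query=";
    if (target.compare(0, prefix.size(), prefix) != 0) {
        return "";
    }
    std::string value = target.substr(prefix.size());
    // Ignore any further parameters after the first '&'.
    const std::string::size_type amp = value.find('&');
    if (amp != std::string::npos) {
        value.erase(amp);
    }
    return value;  // still percent-encoded; no URL decoding is applied
}
```

Requests for other paths or methods yield an empty string, which a caller could treat as "serve the form" or "no results".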
WebSearchEngine/
├── CMakeLists.txt
├── README.md
└── main.cpp
Clone the repository and enter the project directory:
git clone https://github.com/kanyutu707/WebSearchEngine.git
cd WebSearchEngine
The project relies only on the C++ standard library and POSIX networking headers (<netinet/in.h>, <unistd.h>). No third-party libraries are required.
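You will need a C++11-compatible compiler and CMake 3.10 or later. Note that the cmake -B form used in the build command was introduced in CMake 3.13; with an older CMake, create the build directory manually and run cmake from inside it.
On Debian or Ubuntu:
sudo apt-get update
sudo apt-get install build-essential cmake
On macOS, the required toolchain ships with Xcode Command Line Tools, and CMake can be installed with Homebrew:
xcode-select --install
brew install cmake
Configure and build:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
This produces a binary named WebSearchEngine in the build/ directory.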
Start the server:
./build/WebSearchEngine
The server binds to all interfaces on port 8080 and prints a confirmation:
Server is running on port 8080...
Open a browser and navigate to:
http://localhost:8080
Use the form to enter a search term and press Search.
Alternatively, query the server from the command line:
curl "http://localhost:8080/search?query=algorithms"
Example response:
Found: https://example.com<br>Found: https://example2.com<br>
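The response shown above could be assembled along the following lines. This is a sketch only: the function names buildBody and buildResponse are hypothetical, and the exact headers sent by the real server are an assumption, since only the body format appears in the example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: one "Found: <url><br>" entry per matching page.
std::string buildBody(const std::vector<std::string>& urls) {
    std::string body;
    for (const std::string& url : urls) {
        body += "Found: " + url + "<br>";
    }
    return body;
}

// Hypothetical helper: wrap a body in a minimal HTTP/1.1 response.
// The header set is an assumption; Content-Length lets the client know
// where the body ends, and "Connection: close" matches a server that
// handles one request per connection.
std::string buildResponse(const std::string& body) {
    return "HTTP/1.1 200 OK\r\n"
           "Content-Type: text/html\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "Connection: close\r\n"
           "\r\n" + body;
}
```

The resulting string can be written to the client socket in a single send() or write() call before the connection is closed.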
To stop the server, press Ctrl+C in your terminal.
Pages are indexed at startup inside main(). To add your own pages, call engine.addPage() before startServer():
engine.addPage(WebPage("https://yoursite.com", "your page content here"));
Each page is tokenised by whitespace. Search queries are matched against individual tokens, so multi-word queries are not supported in the current version.
- Single-term queries only. Multi-word queries are not intersected across tokens.
- No URL decoding. Query strings containing %20 or + for spaces will not match correctly.
- The HTTP parser is minimal and does not handle headers, keep-alive, or malformed requests robustly.
- Pages are held entirely in memory; the index does not persist between restarts.
- The server is single-threaded: each request blocks the accept loop until it completes.
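For the URL-decoding limitation listed above, one possible approach is sketched below. It is not part of the current code: the function name urlDecode is hypothetical, and the sketch assumes form-style encoding in which + stands for a space and %XX is a hex-encoded byte.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: decode '+' and %XX sequences in a query value.
// Malformed escapes (for example "%zz" or a trailing '%') are copied through
// unchanged rather than rejected.
std::string urlDecode(const std::string& input) {
    std::string output;
    output.reserve(input.size());
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        const char c = input[i];
        if (c == '+') {
            output += ' ';
        } else if (c == '%' && i + 2 < input.size() &&
                   std::isxdigit(static_cast<unsigned char>(input[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(input[i + 2]))) {
            // Convert the two hex digits after '%' into a single byte.
            const std::string hex = input.substr(i + 1, 2);
            output += static_cast<char>(std::strtol(hex.c_str(), nullptr, 16));
            i += 2;  // skip the two digits just consumed
        } else {
            output += c;
        }
    }
    return output;
}
```

Decoding alone would not make multi-word queries work, because a decoded value such as "data structures" is still a single lookup key that matches no individual token; the query would also need to be split into terms and the per-term results intersected.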