# README: Bed Stories Dataset

## Description

The Bed Stories dataset is part of a research project on improving children's information retrieval through semantic query expansion with WordNet. It contains a curated collection of 200 English children's stories and a set of sample child-formulated queries. The dataset supports the implementation and evaluation of a Lucene-based search engine tailored to children's information-seeking behavior.

## Size

Total dataset size: 0.28 MB

## Platform

The dataset is intended for use with:

- Lucene-based search engine platforms
- Python-based tools (optional, for preprocessing and evaluation)

## Environment

To use this dataset effectively, the following environment is recommended:

- Operating system: Windows 10+, macOS, or Linux
- Java JDK 8 or later (for Lucene)
- Python 3.7+ (optional, for data preprocessing)
- Libraries: NLTK (with the WordNet corpus), Pandas (for evaluation), Apache Lucene

## Major Component Description

- `Data/Queries.txt`: a text file of child-generated queries, one query per line.
- `Data/Data Collection/`: a folder of 200 plain-text story files (`Text 1.txt`, `Text 2.txt`, ...). Each file contains one English children's story on a common theme such as honesty, family, or friendship.

## Detailed Setup Instructions

1. **Unzip** the dataset archive (`Dataset.zip`) into your working directory.
2. **Directory structure:**

   ```
   Dataset/
   └── Data/
       ├── Queries.txt
       └── Data Collection/
           ├── Text 1.txt
           ├── Text 2.txt
           └── ... (200 story files in total)
   ```

3. **Prepare the Lucene index:**
   - Set up an indexing script using Apache Lucene.
   - Point it at the `Data Collection/` directory so that all `.txt` files are indexed.
4. **Query execution:**
   - Read the queries from `Queries.txt`.
   - Expand each query with WordNet (via NLTK in Python, or integrated in Java).
   - Run the expanded queries through Lucene's search API.
5.
   (Optional) **Evaluation**:
   - Compare the retrieved results against predefined relevance judgments using MAP (mean average precision) or P@3 (precision at rank 3).

## Contact Information

For questions or collaboration, please contact:

- Wasmiah Alyami: wwhhww22@outlook.sa
- Hend Alzahrani: hend1411@gmail.com
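## Example: Reading Queries and Stories (Python Sketch)

The setup steps above read the queries and the story collection from disk before indexing and searching. The sketch below does this with the Python standard library only. It assumes the archive has been unzipped so that `Queries.txt` and `Data Collection/` sit under `Dataset/Data/`, and that the files are UTF-8 encoded (the README does not state an encoding). The function names `load_queries` and `load_stories` are illustrative, not part of the dataset.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumption: files are UTF-8; undecodable bytes are replaced instead of raising.
ENCODING = "utf-8"


def load_queries(data_dir: Path) -> List[str]:
    """Return the non-empty lines of Queries.txt, one query per line."""
    text = (data_dir / "Queries.txt").read_text(encoding=ENCODING, errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def _story_number(path: Path) -> int:
    """Extract N from a file name like 'Text 12.txt' (-1 if there is no number)."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else -1


def load_stories(data_dir: Path) -> Dict[str, str]:
    """Map each story file name (e.g. 'Text 1.txt') to its full text,
    ordered numerically by the number in the file name."""
    collection = data_dir / "Data Collection"
    files = sorted(collection.glob("*.txt"), key=lambda p: (_story_number(p), p.name))
    return {f.name: f.read_text(encoding=ENCODING, errors="replace") for f in files}


# Example usage, assuming the archive was unzipped into the current directory:
#   data_dir = Path("Dataset") / "Data"
#   queries = load_queries(data_dir)
#   stories = load_stories(data_dir)
```

Sorting on the number in each file name keeps `Text 10.txt` after `Text 9.txt`, which a plain lexicographic sort would not.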
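## Example: WordNet Query Expansion with NLTK (Python Sketch)

The query-execution step expands each query with WordNet. One common way to do this with NLTK is to collect the lemma names of every synset of each query term, as sketched below. This is an illustrative approach, not necessarily the expansion strategy used in the original project: the per-term cap (`max_per_term`), the optional stop-word set, and the names `wordnet_synonyms` and `expand_query` are assumptions. It needs NLTK and the WordNet corpus (`nltk.download("wordnet")`); the synonym source is a parameter so it can be swapped out or stubbed for testing.

```python
import re
from typing import Callable, Dict, Iterable, List


def wordnet_synonyms(word: str) -> List[str]:
    """Return the distinct lemma names of every WordNet synset of `word`.

    Needs NLTK and the WordNet corpus (nltk.download("wordnet")). The import
    is deferred so the rest of this sketch runs without NLTK installed.
    """
    from nltk.corpus import wordnet as wn

    names: List[str] = []
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            # WordNet joins multi-word lemmas with underscores, e.g. "domestic_dog".
            name = lemma.name().replace("_", " ").lower()
            if name != word and name not in names:
                names.append(name)
    return names


def expand_query(
    query: str,
    synonyms: Callable[[str], List[str]] = wordnet_synonyms,
    max_per_term: int = 3,  # hypothetical cap, chosen arbitrarily to limit query drift
    stopwords: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Map each distinct query term to at most `max_per_term` synonyms.

    Stop words are kept as terms but are not expanded.
    """
    stop = {w.lower() for w in stopwords}
    expanded: Dict[str, List[str]] = {}
    for term in re.findall(r"[a-z']+", query.lower()):
        if term in expanded:
            continue
        expanded[term] = [] if term in stop else synonyms(term)[:max_per_term]
    return expanded
```

With the default WordNet source, `expand_query("happy dog")` maps `happy` and `dog` to up to three synonyms each; the exact synonyms depend on the installed WordNet version. Children's queries often contain function words, so passing a stop-word list avoids expanding terms such as "a" into unrelated senses.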
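## Example: Building a Lucene Query String (Python Sketch)

After expansion, the queries still have to be handed to Lucene. One option is to produce a string in the syntax of Lucene's classic `QueryParser`, keeping the original terms at full weight and OR-ing synonyms in with a lower boost (`term^0.5`), with multi-word synonyms written as quoted phrases. The boost of 0.5 and the helper names are assumptions for illustration; the input is assumed to be a mapping from each query term to its list of synonyms.

```python
from typing import Dict, List

# Characters that Lucene's classic QueryParser treats as query syntax.
_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term: str) -> str:
    """Backslash-escape QueryParser special characters in a single term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def to_lucene_query(expanded: Dict[str, List[str]], synonym_boost: float = 0.5) -> str:
    """Build a classic QueryParser string: original terms at full weight,
    synonyms OR-ed in with a lower boost; multi-word synonyms become phrases.

    The default boost of 0.5 is a hypothetical choice, not a tuned value.
    """
    clauses: List[str] = []
    for term, synonyms in expanded.items():
        clauses.append(escape_term(term))
        for syn in synonyms:
            if " " in syn:
                # Inside a quoted phrase only backslashes and quotes need escaping.
                phrase = syn.replace("\\", "\\\\").replace('"', '\\"')
                clauses.append(f'"{phrase}"^{synonym_boost}')
            else:
                clauses.append(f"{escape_term(syn)}^{synonym_boost}")
    return " OR ".join(clauses)
```

On the Java side, such a string could be parsed with `org.apache.lucene.queryparser.classic.QueryParser`, using the same analyzer as the index, and the resulting query passed to `IndexSearcher.search`. Down-weighting synonyms is a design choice that keeps documents matching the child's own words ranked above those matching only an expansion term.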
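## Example: Computing P@3 and MAP (Python Sketch)

The optional evaluation step compares retrieved results against relevance judgments using MAP or P@3. The README does not specify the judgment format, so the sketch below assumes a hypothetical `qrels` dictionary mapping each query ID to the set of relevant story file names, and a `runs` dictionary mapping each query ID to the ranked list of retrieved file names. It uses plain Python rather than Pandas.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of P@i over each rank i holding a relevant document,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over every query that has relevance judgments (missing runs score 0)."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

P@k divides by k even when fewer than k documents are retrieved, and average precision divides by the total number of relevant documents, so relevant stories that are never retrieved lower the score.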