# BaseX webapp of the electronic HEDO
<a href="https://www.gnu.org/licenses/agpl-3.0.html"><img src="https://www.gnu.org/graphics/agplv3-88x31.png" alt="AGPL v3"/></a>
This repository contains the webapp files to be used with [BaseX](https://basex.org/) running as a webapp, plus some scripts to convert between different XML representations.
Original code written by Oleg Belyaev in 2024–2025.
A public version of this database is currently available [here](https://abaev.belyaev.io/dict).
## Installation
To recreate the server setup, follow these steps:
1. Install BaseX. Make sure that [Saxon](https://www.saxonica.com/) is on your classpath (the open-source Home Edition, Saxon-HE, will do). This can be done either directly through the `CLASSPATH` environment variable, or by putting Saxon's JAR files (all of them, including those in `lib/`) into `basex/lib/custom`.
2. Put the content of this repo into the `basex/webapp` directory.
3. Create databases called `abaevdict`, `abaevdict_en`, `abaevdict_ru`, `abaevdict_index`. Their roles are as follows:
- `abaevdict` contains the bilingual XML files of the dictionary encoded in the [Abaev TEI](https://github.com/abaevdict/tei-abaev) format.
These generally come from the [abaev-xml](https://code.cucurri.ru/abaevdict/abaev-xml) repo.
- `abaevdict_en` and `abaevdict_ru` contain the preprocessed, standardized English and Russian XML editions of HEDO. They are kept separate because maintaining a single bilingual database is impractical: users will want to interact with either the English or the Russian version only. At some point, the two may even diverge. For basic linking, entry IDs should suffice.
- `abaevdict_index` includes various helper documents that are not directly displayed, but used for faster lookup, and for REST API queries.
4. Import the contents of the `entries` folder of [abaev-xml](https://code.cucurri.ru/abaevdict/abaev-xml) into the db `abaevdict`, and the file `biblio/abaev_biblio.xml` into the db `abaevdict_index`. Do not import them as subdirectories/collections; just put them under the database root.
5. Execute the script `scripts/update-indices.bxs` from the GUI or CLI. This does the following things:
- optimizes `abaevdict`;
- imports language data from the CSV file;
- creates the lookup table for entries;
- generates the English and Russian XML in `abaevdict_en` and `abaevdict_ru` (most resource-heavy and time-consuming);
- generates the HTML, so that it doesn't have to be done on-the-fly in most cases;
- optimizes `abaevdict_en`, `abaevdict_ru`;
- creates the indices of mentioned foreign-language words in `abaevdict_index`;
- optimizes `abaevdict_index`.
Full execution of these actions takes a while, but it doesn't need to be redone unless the database changes; in that case, you can get away with regenerating only individual entries, although this has to be done manually for now.
6. Run the BaseX HTTP server (`basexhttp`). It will provide all services at `http://localhost:8080` by default.
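
Steps 3 and 5 can also be run from the command line. The following is a minimal sketch, not part of this repository. It assumes that the `basex` launcher is on your `PATH` and that the current directory is the root of this repository, so that `scripts/update-indices.bxs` resolves. The `BASEX_BIN` variable and the two function names are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# BASEX_BIN is a hypothetical override for the BaseX standalone launcher;
# it defaults to `basex` found on PATH.
BASEX_BIN="${BASEX_BIN:-basex}"

# Step 3: create the four databases (initially empty).
create_databases() {
    for db in abaevdict abaevdict_en abaevdict_ru abaevdict_index; do
        # -c executes the given BaseX command string
        "$BASEX_BIN" -c"CREATE DB $db" || return 1
    done
}

# Step 5: run the command script shipped with this repository.
# The default path assumes the current directory is the repository root.
update_indices() {
    "$BASEX_BIN" "${1:-scripts/update-indices.bxs}"
}
```

Call `create_databases` first, import the data as described in step 4, then call `update_indices` and finally start `basexhttp`. The import itself is left out of the sketch on purpose, since the files must end up directly under the database root and it is worth checking the resulting paths by hand.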