Tiny Predictive Text uses the **ArT DeCo** (Anchor Tree with Decaying Context) technique to generate sentence completions.
For optimal efficiency and file size, Tiny is written in Rust and compiled to Wasm for plug-and-play use in frontend JavaScript, with tokenized dictionaries serialized as [MessagePack](https://msgpack.org).
![[Apr-11-2024 04-37-31.gif]]
[Try it here](https://adamjgrant.github.io/Tiny-Predictive-Text/) | [Use in your project](https://github.com/adamjgrant/Tiny-Predictive-Text)
## ArT DeCo
ArT DeCo's data model is a novel technique for storing context windows for predictive text: context is stored in increasingly generalized form the further it sits from the insertion point.
Each tree has four levels: the first three hold context strings, and the fourth holds a discrete number of completions.
```mermaid
flowchart LR
C[2nd Level Context] --> B[1st Level Context] --> A[Anchor] --> D[Predictions]
```
1. **Anchor**: The last word typed.
2. **First-Level Context**: The three words immediately before the anchor, as a lowercase acronym.
3. **Second-Level Context**: The three words before the first-level context that are not on an exclusion list. The exclusion list includes English prepositions, articles, and conjugations of "to be" and "to have."
4. **Completions**: Sentence completions.
For example, in the sentence:
>"The quick brown fox is jumping over the lazy dog"
![[Apr-11-2024 04-41-24.gif]]
the tree would be as follows:
```mermaid
flowchart LR
A[bfj] --> B[otl] --> C[dog] --> D['completion 1']
C --> E['completion 2']
C --> F['completion 3']
C --> G['...']
```
1. **dog**: The last word, which becomes the anchor.
2. **otl**: Lowercase acronym for "over the lazy."
3. **bfj**: Lowercase acronym for "brown fox jumping"; note that "is" is skipped because it is on the exclusion list.
4. (Completions found in training)
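The key derivation above can be sketched in Rust. This is a minimal illustration, not the shipped implementation: the exclusion list here is a small subset, and the sketch assumes a sentence of at least four words.

```rust
// Illustrative subset of the exclusion list; the real list is larger.
const EXCLUDED: &[&str] = &["is", "are", "was", "the", "a", "an", "of", "to", "has", "have"];

/// Lowercase acronym built from the first letter of each word.
fn acronym(words: &[&str]) -> String {
    words
        .iter()
        .filter_map(|w| w.chars().next())
        .collect::<String>()
        .to_lowercase()
}

/// Derive (second-level context, first-level context, anchor) from a sentence.
/// Assumes at least four words before the insertion point.
fn context_keys(sentence: &str) -> (String, String, String) {
    let words: Vec<&str> = sentence.split_whitespace().collect();
    let n = words.len();
    let anchor = words[n - 1].to_lowercase();
    // First-level context: the three words immediately before the anchor.
    let first = acronym(&words[n - 4..n - 1]);
    // Second-level context: the three nearest earlier words not on the exclusion list.
    let second_words: Vec<&str> = words[..n - 4]
        .iter()
        .rev()
        .filter(|w| !EXCLUDED.contains(&w.to_lowercase().as_str()))
        .take(3)
        .copied()
        .collect();
    let second = acronym(&second_words.into_iter().rev().collect::<Vec<_>>());
    (second, first, anchor)
}

fn main() {
    let keys = context_keys("The quick brown fox is jumping over the lazy dog");
    println!("{:?}", keys); // ("bfj", "otl", "dog")
}
```

Run against the example sentence, this yields the `bfj` → `otl` → `dog` path shown in the diagram.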
The tree is unusual in that it explicitly contains no stochastic data yet still works like a [Markov chain](https://en.wikipedia.org/wiki/Markov_chain). Instead, scoring happens during training, with periodic "branch pruning" to remove the lowest-scoring items. The remaining items are sorted by score before the scores are removed to mint the final dictionary. Wasm handles the lookups, using a [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) calculation to find the trained context string closest to the one found in the text.
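The fuzzy lookup can be sketched as a standard two-row Levenshtein computation followed by a nearest-key scan. The key strings below are illustrative, and ties are broken by whichever stored key comes first:

```rust
/// Levenshtein edit distance between two strings (two-row dynamic programming).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // Minimum of substitution, deletion, and insertion.
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Pick the stored context key closest to the typed one.
fn closest_key<'a>(typed: &str, stored: &[&'a str]) -> Option<&'a str> {
    stored.iter().copied().min_by_key(|k| levenshtein(typed, k))
}

fn main() {
    // A mistyped first-level context still resolves to the nearest trained key.
    let stored = ["otl", "otb", "itw"];
    println!("{:?}", closest_key("ofl", &stored)); // Some("otl")
}
```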
# Features
## Size
Because the tree has a fixed breadth and depth, once it is saturated, *model file size does not increase with additional training data*. Instead, training continues to score, promote, and demote strings found in the training text; post-saturation, the model improves by refining what it contains and in what order. This allowed training on the [3.2 TB OSCAR-2201 English corpus](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201) while keeping the resulting model small enough to fit on a 3.5" floppy disk.
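The saturation behavior can be sketched as a capped completion slot: training accumulates scores, entries are sorted, everything past the cap is pruned, and only the order survives into the final dictionary. The cap of 3 and the count-based scoring rule here are illustrative assumptions:

```rust
use std::collections::HashMap;

// Illustrative cap on completions per anchor; the real capacity is a tuning choice.
const CAP: usize = 3;

/// Score completions seen in training, keep the top CAP, and discard the scores.
fn train_and_prune(observed: &[&str]) -> Vec<String> {
    let mut scores: HashMap<&str, usize> = HashMap::new();
    for &c in observed {
        *scores.entry(c).or_insert(0) += 1;
    }
    let mut ranked: Vec<(&str, usize)> = scores.into_iter().collect();
    // Sort by descending score (alphabetical tie-break), then prune past the cap.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.truncate(CAP);
    // Scores are discarded; only the sorted order is minted into the dictionary.
    ranked.into_iter().map(|(c, _)| c.to_string()).collect()
}

fn main() {
    let seen = ["jumps", "runs", "jumps", "barks", "sleeps", "jumps", "runs"];
    // More training data reshuffles and replaces entries but never grows the slot.
    println!("{:?}", train_and_prune(&seen)); // ["jumps", "runs", "barks"]
}
```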
## Speed
Tiny calculates predictions recursively in Wasm, allowing native-speed calculations and tree traversals. This makes prediction surfacing instantaneous even at the fastest typing speeds.
## Quality
End users can choose to surface only predictions that pass a certain quality threshold. Match quality is calculated in real time, removing the need to store any stochastic data in the dictionary. Quality is scored both on how well the first- and second-level contexts match and on the predictions themselves.
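A quality gate along these lines might combine per-level context similarities into a single score and compare it against the user's threshold. The position-match metric and the 0.7/0.3 weighting below are assumptions for illustration, not the shipped formula:

```rust
/// Fraction of aligned positions where two context acronyms agree.
/// (Illustrative metric; a distance-based similarity would also work.)
fn similarity(typed: &str, stored: &str) -> f64 {
    let matches = typed.chars().zip(stored.chars()).filter(|(a, b)| a == b).count();
    let len = typed.chars().count().max(stored.chars().count()).max(1);
    matches as f64 / len as f64
}

/// Combined match quality; the nearer first-level context is weighted
/// more heavily (weights are an assumption).
fn quality(first_typed: &str, first_stored: &str,
           second_typed: &str, second_stored: &str) -> f64 {
    0.7 * similarity(first_typed, first_stored)
        + 0.3 * similarity(second_typed, second_stored)
}

fn main() {
    // Exact first-level match, near-miss second-level match.
    let q = quality("otl", "otl", "bfj", "bfx");
    let threshold = 0.8; // user-chosen quality threshold
    println!("quality {:.2}, surfaced: {}", q, q >= threshold);
}
```

Because the score is computed at lookup time from the match itself, nothing probabilistic needs to live in the dictionary.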
# Some History
This project was a bit of an obsessive follow-on to [[Tiny Predictive Text]], which garnered some attention on Hacker News. I kept thinking about ways I could improve on this while still working with very little compute and storage to generate actually useful predictive text. This was lucky attempt number 13 and I think I'm ready to go obsess about something else now.