---
license: other
license_name: raml-v1.0
datasets:
- ReactiveAI/Beta-Pre-Train-Corpus
language:
- en
- pl
pipeline_tag: fill-mask
tags:
- agent
gated: true
extra_gated_prompt: >-
  Accept [Reactive AI Model & Architecture License (RAML)
  v1.0](https://github.com/RxAI-dev/rxlm/blob/main/MODELS_LICENSE.md) terms to
  access the repository and use model. Reactive Transformer (pending patent
  #P.453260) is available for free for non-commercial usage. For commercial
  usage please contact Reactive AI at licensing@rxai.dev
extra_gated_fields:
  Company: text
  Country: country
  I want to use this model for:
    type: select
    options:
    - Research
    - Education
    - label: Other
      value: other
  I agree to use this model for non-commercial use ONLY: checkbox
extra_gated_heading: >-
  You need to agree to use this model only for research or education purposes
  under Reactive AI Model & Architecture License (RAML) v1.0
extra_gated_description: The repository will be available instantly after accepting license terms
extra_gated_button_content: Accept license terms
---

# RxT-Beta MLM Head Base (33.8M)

> Training & docs in progress
>> Progress ~90B/250B tokens

**RxT-Beta** is the world's first real-scale stateful **Reactive Language Model (RxLM)** with infinite memory & context, built to confirm the new **Reactive Transformer (RxT)** scaling laws and solve **all** the biggest problems of stateless LLMs. **RxT** models are natively conversational (and agentic): instead of reprocessing the whole conversation history (chat template) as stateless LLMs do, they process only a single interaction in real time and move the context to a dedicated embedding-based memory that is updated asynchronously between interactions.
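
The difference between reprocessing the full history and processing single interactions can be illustrated with a simple token count. This is a minimal sketch, not part of RxLM: the function names and the fixed `tokens_per_turn` are assumptions made for illustration, and the count ignores the memory update, which is described here as having a fixed cost per interaction.

```python
def stateless_tokens_processed(num_turns: int, tokens_per_turn: int) -> int:
    """Total tokens read when every turn re-reads the whole history so far."""
    # Turn t has to process t * tokens_per_turn tokens (history + new interaction).
    return sum(turn * tokens_per_turn for turn in range(1, num_turns + 1))


def reactive_tokens_processed(num_turns: int, tokens_per_turn: int) -> int:
    """Total tokens read when every turn processes only the current interaction."""
    return num_turns * tokens_per_turn


# Hypothetical conversation: 10 turns of 100 tokens each.
stateless_total = stateless_tokens_processed(10, 100)  # 5500, grows quadratically
reactive_total = reactive_tokens_processed(10, 100)    # 1000, grows linearly
```

Under these assumptions the stateless total grows with the square of the number of turns, while the per-interaction total grows linearly.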
**RxT** introduces unique features such as:
- infinite conversation & global context through Mixture-of-Memory (MoM)
- live continual learning from interactions in real time
- true real-time processing with near-zero latency
- linear conversation cost scaling
- fixed computational cost and memory usage for each interaction
- increasing quality of responses with subsequent steps of dialogue, without "long-term hallucinations"
- natively encoded memory, impossible to read without the model
- extreme pre-training efficiency
- hybrid stateful reasoning

In the first small-scale experiments, **RxT-Alpha** models achieved about **50% higher accuracy** and almost **2x lower perplexity** than a same-size stateless decoder-only baseline trained on the same simple synthetic dataset (additionally, the decoder-only model was pre-trained on 5x more tokens). These results were then confirmed on a small 10B-token subset of real-world data with ~0.3B models (**RxT-Beta Micro**), where the **RxT** advantage was even bigger.

These promising results, along with all the unique features, demonstrate that **Reactive Transformer** is a revolutionary generational leap and a crucial milestone on the path to **Artificial General Intelligence (AGI)**. Of course, this still has to be confirmed at scale, which is what we plan to do with **RxT-Beta**.

The goal is to compete with ~1-3B params dense stateless LLMs pre-trained on trillions of tokens, using a model with only 190M active parameters and about 250B pre-training tokens, and to significantly outperform them on long multi-turn conversations.

## Base models
**Reactive Transformer** models require a new dedicated training pipeline to handle their asynchronous memory and reversed decoder-encoder order. Base models are the result of the first supervised stage - _**Joint LM Pre-Training with "cheated context" teacher forcing**_ (more info in [Decoder Card](https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base)).

## MLM Head
The Masked Language Modeling (MLM) head (this repository) is kept separate from the [RxT-Beta Encoder](https://huggingface.co/ReactiveAI/RxT-Beta-Encoder-Base) because it is used only in base pre-training and interaction SFT, where it computes a separate encoder MLM loss. In the final, memory-aware stages, encoder outputs are used for memory updates and the MLM head is no longer needed.
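
To make the role of a separate MLM head concrete, the sketch below shows in plain Python how a head can turn encoder hidden states into vocabulary logits and how an MLM loss is averaged over masked positions only. It is an illustration under stated assumptions, not the RxLM implementation: the single linear projection, the names `mlm_head` and `mlm_loss`, and the toy weights are all hypothetical, and the actual head in this repository may be structured differently. The `IGNORE_INDEX` value of -100 mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import math

IGNORE_INDEX = -100  # label for positions that were not masked (assumed convention)


def mlm_head(hidden, weight, bias):
    """Project one hidden vector to vocabulary logits (assumed: one linear layer)."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weight, bias)]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def mlm_loss(logits_per_position, labels):
    """Mean cross-entropy over masked positions; unmasked positions are skipped."""
    losses = [-log_softmax(logits)[label]
              for logits, label in zip(logits_per_position, labels)
              if label != IGNORE_INDEX]
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: 3 positions, hidden size 2, vocabulary size 3.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in for encoder output
weight = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]          # one row per vocabulary entry
bias = [0.0, 0.0, 0.0]
labels = [IGNORE_INDEX, 1, IGNORE_INDEX]               # only position 1 was masked

logits = [mlm_head(h, weight, bias) for h in hidden_states]
loss = mlm_loss(logits, labels)
```

Because the head is only needed to produce this loss, dropping it after the supervised stages leaves the encoder's hidden states available for other uses, such as the memory updates described above.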