# Char-level LSTM Memorizer PoC

**Original article:** [LSTM or Transformer as "malware packer"](https://bednarskiwsieci.pl/en/blog/lstm-or-transformer-as-malware-packer/)
A simple proof-of-concept demonstrating how to **embed any text file** (e.g., source code) into the weights of a character-level LSTM neural network model and then **accurately reconstruct** its contents during inference.
## Features
- Builds a character-level vocabulary plus a special Beginning-of-Sequence (BOS) token (see the sketch below).
- Trains an LSTM on a single file until overfitting (memorization).
- Generates the entire character sequence starting from the BOS token.
- Compares SHA-256 checksums of the original and generated files.
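For orientation, this is roughly what the vocabulary step looks like. A minimal sketch with illustrative names (`BOS`, `build_vocab`), not necessarily the identifiers used in `main.py`:
```python
# Minimal sketch of the character vocabulary with a BOS token.
# Names are illustrative, not the project's exact API.
BOS = "<BOS>"

def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    chars = [BOS] + sorted(set(text))  # BOS token gets index 0
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

text = open("examples/bubble_sort.py", encoding="utf-8").read()
stoi, itos = build_vocab(text)

# The training sequence is BOS followed by every character of the file;
# the model learns to predict each next character, so generation can start from BOS alone.
ids = [stoi[BOS]] + [stoi[ch] for ch in text]
```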
## Requirements
- Python 3.12+
- PyTorch 2.7.1+
- NumPy 2.3.1+
- Safetensors 0.5.3+
- Matplotlib 3.10.3+
- Ruff 0.12.2+ (optional)

Use `uv` to manage dependencies. If you don't have `uv` installed, you can install it via pip:
```bash
pip install uv
```
## Installation
1. Clone the repository:
```bash
git clone https://github.com/piotrmaciejbednarski/lstm-memorizer.git
cd lstm-memorizer
```
2. Synchronize using `uv`:
```bash
uv sync
```
## Example
One of the generated examples can be found in the `examples` directory. The model weights in `model.safetensors` were trained on `bubble_sort.py`.
You can run the whole pipeline yourself, first training (encoding) and then generation (decoding), to verify that the reconstructed file is byte-for-byte identical to the original.
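If you want to check the result by hand, verification is just a SHA-256 comparison of the two files. A minimal sketch using Python's standard library (file paths are illustrative, adjust them to where you wrote the reconstructed file):
```python
import hashlib
from pathlib import Path

def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Paths are illustrative; point them at the original and the reconstructed file.
original = sha256_of("examples/bubble_sort.py")
generated = sha256_of("output/bubble_sort_generated.py")
print("identical" if original == generated else "different")
```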
### Training
```bash
uv run main.py train ./examples/bubble_sort.py \
--epochs 4000 \
--hidden 32 \
--layers 2 \
--lr 1e-3 \
--weights ./output/model.safetensors
```
```
Using device: mps
Epoch 1/4000 loss=4.0081
Epoch 500/4000 loss=0.5532
Epoch 1000/4000 loss=0.1054
Epoch 1500/4000 loss=0.0395
Epoch 2000/4000 loss=0.0220
Epoch 2500/4000 loss=0.0120
Epoch 3000/4000 loss=0.0076
Epoch 3500/4000 loss=0.0051
Epoch 4000/4000 loss=0.0060
Model saved to ./output/model.safetensors
```
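Under the hood, training amounts to overfitting a small LSTM on this single character sequence with cross-entropy loss. A stripped-down sketch of such a loop, assuming illustrative names and the hyperparameters from the command above (not the project's exact code):
```python
import torch
import torch.nn as nn

# Build the BOS-prefixed character sequence for one file (illustrative, not main.py's exact code).
text = open("examples/bubble_sort.py", encoding="utf-8").read()
chars = ["<BOS>"] + sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([0] + [stoi[c] for c in text])
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # predict the next character

class CharLSTM(nn.Module):
    def __init__(self, vocab: int, hidden: int = 32, layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(4000):  # deliberately overfit until the file is memorized
    logits, _ = model(inputs)
    loss = loss_fn(logits.view(-1, len(chars)), targets.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```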
### Generation