# Feedback, Model Retraining, and Benchmarking

### Feedback and Counterfactual Example Generation

When users submit feedback through the **"Teach Your AI"** feature, that feedback is processed using a specialized reasoning large language model. This model creates hypothetical examples based on your feedback, a method known as **counterfactual generation**. Counterfactual generation involves imagining "what-if" scenarios by altering small parts of a given example to test and improve the model's understanding.

Importantly, **your exact feedback is not directly used** to train your model. Instead, the AI generates new hypothetical examples inspired by your input. Both your original feedback and the hypothetical examples **are kept private** and **are not shared** with Protege's general **ShieldLlama** model.
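The flow above can be sketched as follows. This is an illustrative outline only; `call_reasoning_model` is a placeholder (not a real Protege API), and a real implementation would call the reasoning model service:

```python
def call_reasoning_model(prompt: str) -> list[str]:
    # Placeholder for the reasoning LLM call; returns canned
    # variations here so the sketch is runnable.
    return [
        "Variant with the disputed phrase softened",
        "Variant with the disputed phrase removed",
    ]

def generate_counterfactuals(feedback: str, original_example: str) -> list[str]:
    """Ask the model for 'what-if' variants of an example, guided by user feedback.

    The user's exact feedback never enters training data directly; only the
    generated hypothetical examples do.
    """
    prompt = (
        "A user flagged this model output as incorrect.\n"
        f"Example: {original_example}\n"
        f"Feedback: {feedback}\n"
        "Generate small 'what-if' variations of the example that test this feedback."
    )
    return call_reasoning_model(prompt)

variants = generate_counterfactuals(
    feedback="This claim needs a disclaimer",
    original_example="Our product cures all ailments.",
)
print(len(variants))
```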

### Model Retraining Process

* Models are **retrained nightly at midnight Pacific Time** on business days.
* If a newly trained model **exceeds the benchmark** set by the prior model, it will **be automatically promoted** to active use.
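The promotion gate can be sketched as below. This is an assumed aggregation rule (requiring the new model to beat the active model on every benchmark), not Protege's actual promotion code:

```python
def should_promote(new_scores: dict[str, float], current_scores: dict[str, float]) -> bool:
    """Promote the nightly retrained model only if it exceeds the active
    model's score on every benchmark (assumed criterion)."""
    return all(new_scores[name] > current_scores[name] for name in current_scores)

current = {"policy_annotations": 0.91, "test_set": 0.88}
nightly = {"policy_annotations": 0.93, "test_set": 0.90}
print(should_promote(nightly, current))  # True: new model is promoted
```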

### Benchmarking and Performance Measurement

To ensure quality improvements, we use **multiple types of benchmarks**:

#### 1. Policy Annotation Set

These are real-world examples where users have labeled AI feedback as incorrect. The model is tested against these annotations to ensure it better captures such cases over time.

#### 2. Test Set

A portion (10%) of the synthetically generated dataset, based on user examples, is held out of the training process. This "test set" is used to objectively measure model performance on unseen data.
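A 90/10 holdout split like the one described can be sketched as follows (a generic illustration of the technique, not Protege's pipeline):

```python
import random

def train_test_split(examples: list, test_fraction: float = 0.10, seed: int = 0):
    """Hold out a fraction of the synthetic dataset for evaluation.

    Shuffling before the split keeps the held-out set representative;
    a fixed seed makes the split reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

examples = [f"synthetic-example-{i}" for i in range(100)]
train, test = train_test_split(examples)
print(len(train), len(test))  # 90 10
```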

#### 3. Confusion Matrix

We also analyze model results using a **confusion matrix**, which breaks down predictions into four categories:

* **True Positive (Green Box):** Correctly identified errors.
* **True Negative (Green Box):** Correctly ignored non-errors.
* **False Positive (Red Box):** Incorrectly flagged something as an error.
* **False Negative (Red Box):** Missed identifying a true error.

The goal is to maximize values in the **green boxes** (true positives and true negatives) while minimizing values in the **red boxes** (false positives and false negatives).
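The four cells can be computed directly from predictions and ground-truth labels, treating the task as a binary "is this an error?" classification:

```python
def confusion_matrix(predictions: list[bool], labels: list[bool]) -> dict[str, int]:
    """Count TP/TN/FP/FN for a binary error-detection classifier.

    `predictions[i]` is True if the model flagged item i as an error;
    `labels[i]` is True if item i really is an error.
    """
    tp = sum(p and y for p, y in zip(predictions, labels))          # correctly flagged
    tn = sum(not p and not y for p, y in zip(predictions, labels))  # correctly ignored
    fp = sum(p and not y for p, y in zip(predictions, labels))      # wrongly flagged
    fn = sum(not p and y for p, y in zip(predictions, labels))      # missed error
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

preds  = [True, True, False, False, True]
labels = [True, False, False, True, True]
print(confusion_matrix(preds, labels))  # {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 1}
```

A better model grows `TP` and `TN` (the green boxes) while shrinking `FP` and `FN` (the red boxes).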

<figure><img src="https://2804394160-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FD9Htw9DUg294GuYmAyyj%2Fuploads%2Fgit-blob-0178b019823488014a70d291c84468e1c7f11355%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

### Model Versions and Update Numbering

#### Minor Version Updates

Updates specific to your environment are reflected as minor version increments. For example, moving from version **10.0.0 to 10.0.1**, the last digit increases with each minor update produced by Teach Your AI fixes.

#### Major Version Upgrades (Model Rebasing)

On a regular cadence, Protege will perform **model rebasing**, updating your model with the latest ShieldLlama model, which incorporates policies, precedents, and compliance standards observed across industries and regulators (e.g., FTC, NAD, FDIC, and more). These substantial updates are reflected in **major version upgrades**: the first digit increments, e.g., moving from version **10.0** to **11.0**.
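The numbering scheme described above can be sketched as a simple version-bump helper. This is an illustrative reading of the scheme (Teach Your AI fixes bump the last digit, a rebase bumps the first and resets the rest), not Protege's actual release tooling:

```python
def bump_version(version: str, kind: str) -> str:
    """Bump a 'major.mid.patch' version string.

    kind='minor' models a Teach Your AI fix (last digit increments);
    kind='major' models a ShieldLlama rebase (first digit increments,
    remaining digits reset) -- assumed behavior for illustration.
    """
    major, mid, patch = (int(x) for x in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    return f"{major}.{mid}.{patch + 1}"

print(bump_version("10.0.0", "minor"))  # 10.0.1
print(bump_version("10.0.1", "major"))  # 11.0.0
```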

To set up a fine-tuned model for your organization, reach out to <founders@tryprotege.com>.
