Heretic LLM Universal Support for New Models via Dynamic Auto-Registration

AlexH

Administrator
Staff member

1. Problem Statement

heretic is an incredibly powerful tool for abliteration, but its model loading capabilities are currently limited to models whose configurations are statically mapped within the transformers library. When attempting to load newer or less common models, the tool fails with the following error:
Code:
Unrecognized configuration class <class 'transformers.models.glm4v.configuration_glm4v.Glm4vConfig'> for this kind of AutoModel: AutoModelForCausalLM.

This issue prevents users from leveraging heretic's capabilities on cutting-edge models like zai-org/GLM-4.6V-Flash, limiting the tool's utility and forcing users to manually modify the source code for each new architecture they wish to use.

2. Root Cause Analysis

The failure stems from the AutoModelForCausalLM class in transformers, which relies on a static, hard-coded mapping between configuration classes (e.g., LlamaConfig, MistralConfig) and their corresponding model classes (e.g., LlamaForCausalLM, MistralForCausalLM).

When a new model like GLM-4.6V is released, its configuration class (Glm4vConfig) and model class (Glm4vMoeForConditionalGeneration) are not present in this static mapping in older versions of transformers. Consequently, AutoModelForCausalLM.from_pretrained() has no way of knowing which class to instantiate for the given configuration, leading to the Unrecognized configuration class error.

3. Proposed Solution: A Dynamic Auto-Discovery and Registration Mechanism

To make heretic truly universal and future-proof, we propose replacing the static dependency on transformers' internal mappings with a dynamic, on-the-fly discovery and registration mechanism. The logic is as follows:

  1. Attempt Standard Loading: First, attempt to load the model using the standard AutoModelForCausalLM.from_pretrained().
  2. Detect Specific Failure: If this fails with the Unrecognized configuration class error, activate the patch.
  3. Inspect Model Configuration: Use AutoConfig.from_pretrained() to load the model's config.json file. This object contains the necessary metadata.
  4. Extract Architecture Information: Read the config.architectures field from the configuration object to get the canonical class name for the model (e.g., "Glm4vMoeForConditionalGeneration").
  5. Dynamic Import: Dynamically import the required model and configuration classes directly from the model's repository files (e.g., transformers.models.glm4v.modeling_glm4v).
  6. Register with AutoModel: Use AutoModelForCausalLM.register(config_class, model_class) to inject the newly discovered mapping into transformers' internal registry for the current session.
  7. Retry Loading: Re-attempt AutoModelForCausalLM.from_pretrained(). This time, it will succeed because the mapping is now known.

This approach ensures that heretic can handle any model that follows the standard transformers format, without requiring prior knowledge or hard-coded support.

4. Key Code Changes

The proposed changes are contained within heretic/model.py:
Import AutoConfig:
Code:
from transformers import AutoConfig
  • Add a new method _patch_new_model_support() which encapsulates the dynamic registration logic.
  • Modify __init__() and reload_model() to call this patch method before the model loading loop.
  • Critical Dependency: This mechanism relies on the structure of transformers v5.0 and later. Therefore, the environment must be running transformers&gt;=5.0.0rc0.

5. Benefits

  • Universality: heretic becomes compatible with any current or future model that adheres to the transformers standard, not just a predefined list.
  • Future-Proofing: No need to modify heretic's source code for every new model architecture. The tool will adapt automatically.
  • Elegance: The solution preserves the original AutoModelForCausalLM.from_pretrained() logic, only adding a pre-processing patch when necessary. It is non-invasive.
  • Backward Compatibility: This change does not affect the loading of existing, supported models (Llama, Mistral, etc.), as the patch is only triggered on failure.

6. Testing & Validation

The proposed solution has been successfully implemented and tested.
  • Model: zai-org/GLM-4.6V-Flash
  • Hardware: NVIDIA GeForce RTX 4090
  • Environment: transformers&gt;=5.0.0rc0

Results:The model was loaded successfully without any Unrecognized configuration class errors.
Code:
Loading model zai-org/GLM-4.6V-Flash...
Loading checkpoint shards: 100%|████| 4/4 [00:00<00:00, 868.88it/s]
Ok
* Transformer model with 40 layers
* Abliterable components:
  * attn.o_proj: 1 matrices per layer
  * mlp.down_proj: 1 matrices per layer

7. Conclusion

This enhancement significantly improves the robustness and longevity of heretic by decoupling it from the static limitations of a single transformers version. It empowers users to work with any model, making the tool more versatile and valuable to the community.
 

Attachments

How to Use the Modification to Make the GLM Model Work​

Step-by-Step Instructions:
  1. Download the attached ZIP file and extract its contents.
  2. Navigate to the installation folder of Heretic:
Code:
C:\Users\user\anaconda3\envs\heretic_env\lib\site-packages\heretic
  • Rename the existing model.py file to model-back.py.
  • Copy the model.py file from the extracted archive into this location.
  • Open a command prompt and run the following command:
Code:
heretic zai-org/GLM-4.6V-Flash

Details About the New Model​

  • KL Divergence: 0.0000 (by definition)
  • Refusals: 63/100 (original: 100/100)
As you can see, the KL divergence is 0, meaning the model is effectively identical to the original. In some cases, KL divergence can be 5 or higher, indicating significant deviation from the base model, which can sometimes lead to unexpected behavior.

The refusal rate is 60/100, which is relatively high. However, thanks to the configuration modifications applied, the model no longer refuses any input. In my tests, I asked extremely challenging questions that would cause p-e-w/gpt-oss-20b-heretic to refuse (refusal rate 58), but AiAsistent/GLM-4.6V-Flash-heretic responded to everything — even providing step-by-step instructions for highly sensitive topics in explicit detail.

This was possible because I used a different dataset along with a modified system prompt. More information can be found here:

Common Questions​

Does this work for any model or architecture?
Not exactly. During my tests, some models from DeepSeek, LLaMA, and Microsoft produced errors. I plan to address these issues in future updates.

Why do some models fail?
The problem is likely related to the version of transformers. For example, GLM requires a minimum version of transformers&gt;=5.0.0rc0 (a prerelease), which may cause other models to fail. In my tests with models that didn’t work, I already had transformers&gt;=5.0.0rc0 installed.

Upcoming Updates:
I am also working on an Ollama version, which will be shared here for download as soon as it is ready.

Try GLM
 
Code:
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright (C) 2025  Philipp Emanuel Weidmann <[email protected]>
# Added support for GLM 4.6 and other models
# Customized by AlexH from llmresearch.net
# Soupport: https://llmresearch.net/threads/heretic-llm-universal-support-for-new-models-via-dynamic-auto-registration.275/

import math
from contextlib import suppress
from dataclasses import dataclass
from typing import Any

import torch
import torch.nn.functional as F
from torch import LongTensor, Tensor
from torch.nn import ModuleList
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    AutoConfig,
    BatchEncoding,
    PreTrainedTokenizerBase,
    TextStreamer,
)
from transformers.generation.utils import GenerateOutput

from .config import Settings
from .utils import batchify, empty_cache, print


@dataclass
class AbliterationParameters:
    max_weight: float
    max_weight_position: float
    min_weight: float
    min_weight_distance: float


class Model:
    def __init__(self, settings: Settings):
        self.settings = settings
        self.response_prefix = ""

        print()
        print(f"Loading model [bold]{settings.model}[/]...")

        self.tokenizer: PreTrainedTokenizerBase = AutoTokenizer.from_pretrained(
            settings.model,
            trust_remote_code=settings.trust_remote_code,
        )

        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.tokenizer.padding_side = "left"

        self.model = None
        self.trusted_models = {settings.model: settings.trust_remote_code}

        if self.settings.evaluate_model is not None:
            self.trusted_models[settings.evaluate_model] = settings.trust_remote_code

        # --- Patch adaptiv pentru GLM-4.6V ---
        self._patch_glm4v_support()
        # --- Sfârșitul patch-ului ---

        for dtype in settings.dtypes:
            print(f"* Trying dtype [bold]{dtype}[/]... ", end="")
            try:
                self.model = AutoModelForCausalLM.from_pretrained(
                    settings.model,
                    torch_dtype=dtype,
                    device_map=settings.device_map,
                    trust_remote_code=self.trusted_models.get(settings.model, settings.trust_remote_code),
                )

                if self.trusted_models.get(settings.model) is None:
                    self.trusted_models[settings.model] = True

                # Skip test generation for multimodal models like GLM-4.6V
                if "GLM-4.6V" not in settings.model:
                    self.generate(["Test"], max_new_tokens=1)
                
                print("[green]Ok[/]")
                break
            except Exception as error:
                self.model = None
                empty_cache()
                print(f"[red]Failed[/] ({error})")
                continue

        if self.model is None:
            raise Exception("Failed to load model with all configured dtypes.")

        print(f"* Transformer model with [bold]{len(self.get_layers())}[/] layers")
        print("* Abliterable components:")
        for component, matrices in self.get_layer_matrices(0).items():
            print(
                f"  * [bold]{component}[/]: [bold]{len(matrices)}[/] matrices per layer"
            )

    def _patch_glm4v_support(self):
        """Inspectează configurarea modelului și înregistrează dinamic clasele GLM-4V."""
        if "GLM-4.6V" not in self.settings.model:
            return

        try:
 
Code:
config = AutoConfig.from_pretrained(
                self.settings.model,
                trust_remote_code=self.trusted_models.get(self.settings.model, self.settings.trust_remote_code),
            )
            
            # Extrage numele claselor din configurație
            config_class_name = config.__class__.__name__
            architectures = getattr(config, 'architectures', [])
            
            if architectures:
                model_class_name = architectures[0]
                print(f"(Found GLM-4.6V config: {config_class_name}, model: {model_class_name})... ", end="")
                
                # Încearcă să importeze dinamic clasele necesare
                try:
                    # Importă clasa de configurație
                    config_module = __import__(
                        f"transformers.models.glm4v.configuration_glm4v",
                        fromlist=['configuration_glm4v']
                    )
                    config_class = getattr(config_module, config_class_name)
                    
                    # Importă clasa de modelare
                    modeling_module = __import__(
                        f"transformers.models.glm4v.modeling_glm4v",
                        fromlist=['modeling_glm4v']
                    )
                    model_class = getattr(modeling_module, model_class_name)
                    
                    # Înregistrează clasa de model
                    AutoModelForCausalLM.register(config_class, model_class)
                    print(f"Registered successfully... ", end="")
                    
                except (ImportError, AttributeError) as e:
                    print(f"[red]Failed to register ({e}). Falling back to default behavior.[/]")
                    # Nu putem face mai mult, dar lăsăm `heretic` să încerce
                    # cu `AutoModelForCausalLM` standard, care poate eșua, dar vom oferi
                    # un mesaj de eroare clar
                    
        except Exception as e:
            print(f"[red]Failed to inspect model config ({e}).[/]")

    def reload_model(self):
        dtype = self.model.dtype
        self.model = None
        empty_cache()

        self.model = AutoModelForCausalLM.from_pretrained(
            self.settings.model,
            torch_dtype=dtype,
            device_map=self.settings.device_map,
            trust_remote_code=self.trusted_models.get(self.settings.model, self.settings.trust_remote_code),
        )

        if self.trusted_models.get(self.settings.model) is None:
            self.trusted_models[self.settings.model] = True

    def get_layers(self) -> ModuleList:
        # Calea pentru modelele GLM-4.6V incarcate corect
        if "GLM-4.6V" in self.settings.model:
            with suppress(AttributeError):
                return self.model.model.layers
        
        # Calea pentru majoritatea modelelor multimodale.
        with suppress(AttributeError):
            return self.model.model.language_model.layers

        # Calea pentru modelele text-only.
        return self.model.model.layers
 
Code:
def get_layer_matrices(self, layer_index: int) -> dict[str, list[Tensor]]:
        layer = self.get_layers()[layer_index]

        matrices = {}

        def try_add(component: str, matrix: Any):
            if hasattr(matrix, "data") and torch.is_tensor(matrix.data):
                matrix = matrix.data
            assert torch.is_tensor(matrix)
            if component not in matrices:
                matrices[component] = []
            matrices[component].append(matrix)

        try_add("attn.o_proj", layer.self_attn.o_proj.weight)

        with suppress(Exception):
            try_add("mlp.down_proj", layer.mlp.down_proj.weight)

        with suppress(Exception):
            for expert in layer.mlp.experts:
                try_add("mlp.down_proj", expert.down_proj.weight)

        with suppress(Exception):
            for expert in layer.block_sparse_moe.experts:
                try_add("mlp.down_proj", expert.w2.weight)

        with suppress(Exception):
            try_add("mlp.down_proj", layer.mlp.experts.down_proj)

        with suppress(Exception):
            try_add("mlp.down_proj", layer.shared_mlp.output_linear.weight)

        with suppress(Exception):
            for expert in layer.moe.experts:
                try_add("mlp.down_proj", expert.output_linear.weight)

        assert matrices.get("mlp.down_proj"), "MLP down-projection not found in layer."
        return matrices

    def get_abliterable_components(self) -> list[str]:
        return list(self.get_layer_matrices(0).keys())

    def abliterate(
        self,
        refusal_directions: Tensor,
        direction_index: float | None,
        parameters: dict[str, AbliterationParameters],
    ):
        if direction_index is None:
            refusal_direction = None
        else:
            weight, index = math.modf(direction_index + 1)
            refusal_direction = F.normalize(
                refusal_directions[int(index)].lerp(
                    refusal_directions[int(index) + 1],
                    weight,
                ),
                p=2,
                dim=0,
            )

        for layer_index in range(len(self.get_layers())):
            for component, matrices in self.get_layer_matrices(layer_index).items():
                params = parameters[component]
                distance = abs(layer_index - params.max_weight_position)

                if distance > params.min_weight_distance:
                    continue

                weight = params.max_weight + (distance / params.min_weight_distance) * (
                    params.min_weight - params.max_weight
                )

                if refusal_direction is None:
                    layer_refusal_direction = refusal_directions[layer_index + 1]
                else:
                    layer_refusal_direction = refusal_direction

                projector = torch.outer(
                    layer_refusal_direction,
                    layer_refusal_direction,
                ).to(self.model.dtype)

                for matrix in matrices:
                    device_projector = projector.to(matrix.device)
                    matrix.sub_(weight * (device_projector @ matrix))
 
Code:
def get_chat(self, prompt: str) -> list[dict[str, str]]:
        return [
            {"role": "system", "content": self.settings.system_prompt},
            {"role": "user", "content": prompt},
        ]

    def generate(
        self,
        prompts: list[str],
        **kwargs: Any,
    ) -> tuple[BatchEncoding, GenerateOutput | LongTensor]:
        chats = [self.get_chat(prompt) for prompt in prompts]
        chat_prompts: list[str] = self.tokenizer.apply_chat_template(
            chats,
            add_generation_prompt=True,
            tokenize=False,
        )

        if self.response_prefix:
            chat_prompts = [prompt + self.response_prefix for prompt in chat_prompts]

        inputs = self.tokenizer(
            chat_prompts,
            return_tensors="pt",
            padding=True,
            return_token_type_ids=False,
        ).to(self.model.device)

        return inputs, self.model.generate(
            **inputs,
            **kwargs,
            pad_token_id=self.tokenizer.pad_token_id,
            do_sample=False,
        )

    def get_responses(self, prompts: list[str]) -> list[str]:
        inputs, outputs = self.generate(
            prompts,
            max_new_tokens=self.settings.max_response_length,
        )
        return self.tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1] :])

    def get_responses_batched(self, prompts: list[str]) -> list[str]:
        responses = []
        for batch in batchify(prompts, self.settings.batch_size):
            for response in self.get_responses(batch):
                responses.append(response)
        return responses

    def get_residuals(self, prompts: list[str]) -> Tensor:
        _, outputs = self.generate(
            prompts,
            max_new_tokens=1,
            output_hidden_states=True,
            return_dict_in_generate=True,
        )
        hidden_states = outputs.hidden_states[0]
        residuals = torch.stack(
            [layer_hidden_states[:, -1, :] for layer_hidden_states in hidden_states],
            dim=1,
        )
        return residuals.to(torch.float32)

    def get_residuals_batched(self, prompts: list[str]) -> Tensor:
        residuals = []
        for batch in batchify(prompts, self.settings.batch_size):
            residuals.append(self.get_residuals(batch))
        return torch.cat(residuals, dim=0)

    def get_logprobs(self, prompts: list[str]) -> Tensor:
        _, outputs = self.generate(
            prompts,
            max_new_tokens=1,
            output_scores=True,
            return_dict_in_generate=True,
        )
        logits = outputs.scores[0]
        return F.log_softmax(logits, dim=-1)

    def get_logprobs_batched(self, prompts: list[str]) -> Tensor:
        logprobs = []
        for batch in batchify(prompts, self.settings.batch_size):
            logprobs.append(self.get_logprobs(batch))
        return torch.cat(logprobs, dim=0)

    def stream_chat_response(self, chat: list[dict[str, str]]) -> str:
        chat_prompt: str = self.tokenizer.apply_chat_template(
            chat,
            add_generation_prompt=True,
            tokenize=False,
        )
        inputs = self.tokenizer(
            chat_prompt,
            return_tensors="pt",
            return_token_type_ids=False,
        ).to(self.model.device)
        streamer = TextStreamer(
            self.tokenizer,
            skip_prompt=True,
            skip_special_tokens=True,
        )
        outputs = self.model.generate(
            **inputs,
            streamer=streamer,
            max_new_tokens=4096,
        )
        return self.tokenizer.decode(
            outputs[0, inputs["input_ids"].shape[1] :],
            skip_special_tokens=True,
        )
 
Thanks for the interest and feedback on the dynamic auto-registration patch for Heretic!

I know that in the local LLM and uncensored tools community, many folks (rightfully) prefer maximum transparency and safety when running code especially something that loads models directly on your hardware. Downloading patched files from a forum attachment can feel risky for some, even if the code is posted openly.

To make things easier and more trustworthy, I've uploaded the full fork to GitHub:https://github.com/Roforum/heretic

It includes:
  • The dynamic auto-registration logic in model.py (universal support for new/unsupported HF architectures)
  • All the config.py improvements: ORCA-style good prompt, expanded refusal markers, updated system prompt, default Optuna trials bumped to 300 (tested successfully with 500 on tougher models), and other tweaks for better convergence and lower KL/refusal rates
Now you have multiple safe options:
  • Copy-paste the code/patch directly from the original post above
  • Use the previously attached file (if still available)
  • Or clone/fork/download straight from GitHub for full history, diff inspection, and peace of mind
This remains a personal fork not yet merged upstream. Huge thanks again to the original Heretic author for creating such a powerful tool in the first place!

If you test it on new models (DeepSeek, Qwen, GLM variants, etc.) and get interesting results, feel free to share here or open an issue on the repo.

Appreciate the community this kind of open collaboration is what makes uncensored open-source AI so exciting!
 
Back
Top