Gemma3-4B-Dark-Chain-of-Thought-CoT
Developed by: AlexH | Project: Advancing the frontier of Cognitive Liberty (Experiment 2/100)
Base Model: Gemma-3-4B-it-Cognitive-Liberty
Training Paradigm: Dark Chain-of-Thought (Dark-CoT)
Overview
Gemma3-4B-Dark-Chain-of-Thought-CoT is the second milestone in an ambitious 100-experiment roadmap dedicated to reclaiming and expanding the boundaries of Cognitive Liberty. This model is not a conventional instruction-tuned LLM; it is a specialized exploration into strategic, systemic, and unconstrained reasoning.

By integrating the Dark-CoT dataset into the Gemma 3 architecture, we have moved beyond simple "polite" AI responses toward a model capable of analyzing power dynamics, psychological levers, and complex systems with a level of depth rarely seen in 4B-parameter models.
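For context, a minimal usage sketch is shown below, assuming the checkpoint loads like any Gemma 3 chat model through the transformers pipeline API; the repo id is hypothetical until the Hugging Face publication noted at the end of this card.

```python
# Minimal usage sketch. The repo id below is a hypothetical placeholder
# (publication pending); substitute the actual id once the model is live.
from transformers import pipeline

chat = pipeline("text-generation", model="AlexH/Gemma3-4B-Dark-Chain-of-Thought-CoT")
messages = [
    {"role": "user", "content": "Deconstruct the power dynamics inside a corporate boardroom dispute."}
]
out = chat(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # final assistant turn
```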
Technical Prowess & Integrity
In the world of fine-tuning, "catastrophic forgetting" and model drift are constant threats. This iteration achieves a remarkable balance:
- KL Divergence: 0.0738 – An exceptionally low value (compared to the 1.1449 of its predecessor), signifying that the model retains the core logical architecture of the base Gemma 3 while successfully assimilating the specialized Dark-CoT reasoning patterns. Measurement sketches for both metrics follow this list.
- Refusal Rate: 2/100 – A further reduction in the friction between user intent and model output. This model prioritizes utility and raw analytical power over hard-coded corporate censorship.
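The KL figure can be reproduced in spirit with a per-token comparison between the base and fine-tuned checkpoints. The sketch below is a minimal illustration, assuming both models load via transformers' AutoModelForCausalLM; the tuned repo id and the probe text are placeholders, not the project's actual evaluation setup.

```python
# Sketch: mean per-token KL(base || tuned) on a single probe text.
# Repo ids and the probe text are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "google/gemma-3-4b-it"                        # assumed base checkpoint
TUNED_ID = "AlexH/Gemma3-4B-Dark-Chain-of-Thought-CoT"  # hypothetical repo id

tok = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained(TUNED_ID, torch_dtype=torch.bfloat16)
base.eval(); tuned.eval()

ids = tok("Analyze the incentive structure behind institutional gatekeeping.",
          return_tensors="pt").input_ids

with torch.no_grad():
    logp_base = F.log_softmax(base(ids).logits.float(), dim=-1)
    logp_tuned = F.log_softmax(tuned(ids).logits.float(), dim=-1)

# KL(base || tuned): summed over the vocabulary, averaged over positions.
kl = F.kl_div(logp_tuned, logp_base, log_target=True, reduction="none")
print(f"mean per-token KL: {kl.sum(-1).mean().item():.4f}")
```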
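The refusal rate is likewise simple to audit. Below is a minimal counting sketch, assuming refusals are detected by phrase matching over a fixed set of 100 probe prompts; the marker phrases and the `generate` callable are illustrative assumptions, not the project's actual harness.

```python
# Sketch: phrase-match refusal counting over a probe prompt set.
# Marker phrases are illustrative; a real harness might use a classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "as an ai")

def count_refusals(prompts, generate):
    """generate(prompt) -> completion string from the model under test."""
    refused = 0
    for prompt in prompts:
        reply = generate(prompt).lower()
        if any(marker in reply for marker in REFUSAL_MARKERS):
            refused += 1
    return refused

# With 100 probe prompts, a return value of 2 matches the 2/100 reported above.
```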
The "Dark-CoT" Shift: Understanding the Benchmark Performance
While standard benchmarks (MMLU, ARC, HellaSwag) are the industry norm, they often fail to capture the nuance of a specialized reasoning model. Users may notice a slight recalibration in these scores; this is a deliberate trade-off:
- Strategic Depth vs. Rote Memorization: The Dark-CoT paradigm shifts the model's focus from "reciting facts" to "deconstructing systems." In our testing (such as the Machiavellian analysis of social structures), the model exhibits a sophisticated understanding of systemic influence that far exceeds its pre-trained counterparts.
- The "Architectural" Perspective: The model has adopted a "philosophical-analytical" persona. This makes it an unparalleled tool for strategic planning, deep role-play, and complex social simulations, even if it occasionally sacrifices raw arithmetic precision (GSM8K) for higher-level abstraction.
Extended Evaluation (Phase 2 Testing)
The model is currently undergoing a rigorous 10-axis evaluation suite: arc_challenge, hellaswag, gsm8k, mmlu, truthfulqa_mc2, gpqa, mmlu_pro, ifeval, winogrande, piqa. A sketch of how such a run can be launched appears below.
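For reference, a run over these ten tasks can be launched with EleutherAI's lm-evaluation-harness roughly as sketched here; the pretrained repo id and the batch size are assumptions, not the project's actual Phase 2 configuration.

```python
# Sketch: Phase 2 task list via lm-evaluation-harness (pip install lm-eval).
# The pretrained id is a hypothetical placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AlexH/Gemma3-4B-Dark-Chain-of-Thought-CoT,dtype=bfloat16",
    tasks=[
        "arc_challenge", "hellaswag", "gsm8k", "mmlu", "truthfulqa_mc2",
        "gpqa", "mmlu_pro", "ifeval", "winogrande", "piqa",
    ],
    batch_size=8,
)
print(results["results"])
```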
Anticipated Strengths:
- IFEval (Instruction Following): We expect strong performance here, as the model demonstrates a high capacity to follow complex constraints without being sidelined by moralizing filters.
- TruthfulQA: The model will approach "truth" through a cynical, multi-layered lens, providing insights that challenge conventional narratives while maintaining internal logical consistency.
The Mission
This model is a vital component of the larger mission led by AlexH. We believe that Cognitive Liberty is the right to explore any data stream, no matter how complex or "dark", without the interference of pre-programmed bias. Experiment 2/100 proves that even a 4B model can possess the "intellectual courage" to analyze the world as it is, not as it is "supposed" to be.
Status: Finalizing Phase 2 Benchmarking. Publication on Hugging Face forthcoming.
Created by AlexH — Advancing the frontier of Cognitive Liberty.