clawpathy-4b-v2-coding
Trained with Clawpathy using the Tinker platform.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | Supervised Fine-Tuning |
| Dataset | Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b |
| LoRA rank | 32 |
| Learning rate | 0.00018 |
| Steps | 50 |
| Final loss | 0.356419 |
| Tinker sampler path | tinker://d2b68c3d-145e-552e-9804-57f662d4c543:train:0/sampler_weights/clawpathy-a2902b1a-841-final |
| Tinker state path | tinker://d2b68c3d-145e-552e-9804-57f662d4c543:train:0/weights/clawpathy-a2902b1a-841-state |
Evaluation
{ "job_id": "a2902b1a-841", "model": "Trained: Qwen/Qwen3-4B-Instruct-2507 (job a2902b1a-841)", "eval_type": "benchmark", "benchmarks": [ "Dataset SFT-match (50)", "MMLU (0-shot, 200)" ], "benchmark_names": [ "Dataset SFT-match (50)", "MMLU (0-shot, 200)" ], "domain": "general", "accuracy": 0.455, "metrics": { "dataset_sft_match/token_f1": 0.0756, "dataset_sft_match/exact_match": 0.0, "dataset_sft_match/evaluated": 50, "dataset_sft_match/split": "train", "dataset_sft_match/format": "instruction", "dataset_sft_match/examples": [ { "expected": "\nWe need to parse the problem: "Find the sum of the first six terms of an arithmetic sequence in which the sum of any number of terms is equal to four times the square of th", "output": "\n\nWe need to parse the problem: "Find the sum of the first six terms of an arithmetic sequence in which the sum of any number of terms is equal to four times the square of ", "token_f1": 0.184, "exact": false }, { "expected": "\nWe need to compute S = sum_{n=1}^{\u221e} 1/(n * sinh(n\u03c0))^2, i.e., sum_{n=1}\u221e 1/(n^2 sinh^2(n\u03c0)). Then express it as S = (2/3) A - (11\u03c0^2)/180, where A = sum_{n=1}^{\u221e} (-1)^{n-", "output": "\n\nWe need to compute the infinite series:\n\n\[\nS = \sum_{n=1}^{\infty} \frac{1}{(n \sinh(n\pi))^2}\n\]\n\nand express it in the form:\n\n\[\nS = \frac{2}{3}A - \frac{11\pi^2}{180}", "token_f1": 0.014, "exact": false }, { "expected": "\nWe need to find the maximum value of:\n\nS = a/(3a^2 + b^2 + 2ac) + b/(3b^2 + c^2 + 2ab) + c/(3c^2 + a^2 + 2bc),\n\nfor positive real numbers a,b,c.\n\nWe need to find sup of S.\n", "output": "\n\nWe need to find the maximum of the expression:\n\n\[\nS = \frac{a}{3a^2 + b^2 + 2ac} + \frac{b}{3b^2 + c^2 + 2ab} + \frac{c}{3c^2 + a^2 + 2bc}\n\]\n\nfor positive real numbers ", "token_f1": 0.0238, "exact": false }, { "expected": "\nWe need to parse the problem: We have a function f defined as f(x) = (x^2 + a) / (x^2 + x - 1) for all x \u2260 1. 
And f is continuous everywhere (including at x=1). The denomin", "output": "\n\nWe need to find a value of a such that f is continuous at x=1. The function is defined as f(x) = (x^2 + a)/(x^2 + x - 1) for x \u2260 1. At x=1, the denominator is 1^2 + 1 - 1", "token_f1": 0.0912, "exact": false }, { "expected": "\nWe need to "Sketch the curve y = x / (1 + x^6 (sin x)^2)". This is a request for a description of the graph, not a precise plot. The answer should be a description: key fea", "output": "\nWe need to sketch the curve y = x / (1 + x^6 sin^2 x). The user says "Sketch the curve" and "Let's think step by step and output the final answer within \boxed{}." So they", "token_f1": 0.0556, "exact": false }, { "expected": "\nWe need to solve the equation . Here [a] denotes floor function (greatest integer \u2264 a). So we need to find real numbers x such that floor of", "output": "\n\nWe need to solve the equation:\n\n\[\n\left[\frac{2x^2}{x^2 + 1}\right] = x,\n\]\n\nwhere denotes the floor function (largest integer \u2264 a). So we need integer x such th", "token_f1": 0.1606, "exact": false }, { "expected": "\nWe need to prove that if a polynomial f(x) = x^4 + a1 x^3 + a2 x^2 + a3 x + a4 has a root x0, and also its derivative f'(x) = 4x^3 + 3 a1 x^2 + 2 a2 x + a3 vanishes at x0, ", "output": "\n\nWe need to prove that if a quartic polynomial satisfies both:\n\n\[\nP(x_0) = 0 \quad\text{and}\quad P'(x_0) = 0,\n\]\n\nthen \", "token_f1": 0.0801, "exact": false }, { "expected": "\nWe need to find all n such that for any integer matrices X, Y of size n\u00d7n, if X^{2021}=I_n and Y^{2021}=-I_n, then X and Y must commute: XY = YX. 
Equivalent: The condition ", "output": "\n\nWe need to find all n such that for every X, Y in M_n(Z) (n x n integer matrices) satisfying X^2021 = I_n and Y^2021 = -I_n, we have XY = YX.\n\nInterpretation: For each n,", "token_f1": 0.0124, "exact": false }, { "expected": "\nWe need to determine the maximum total number of real roots among three quadratics: \n\nP1(x) = a x^2 + b x + c\nP2(x) = b x^2 + c x + a\nP3(x) = c x^2 + a x + b\n\nwhere a,b,c >", "output": "\n\nWe need to find the maximum total number of real roots among the three quadratic polynomials:\n\nP1(x) = a x^2 + b x + c\nP2(x) = b x^2 + c x + a\nP3(x) = c x^2 + a x + b\n\nwh", "token_f1": 0.0619, "exact": false }, { "expected": "\nWe need to find the area of triangle ABC given points in 3D space: A = (4,0,0), B = (0,3,0), C = (0,0,5). The area of triangle with vertices at points is half the magnitude", "output": "\n\nWe need to compute the area of triangle ABC with vertices A=(4,0,0), B=(0,3,0), C=(0,0,5). This is a triangle in 3D space. The area of a triangle given three points can b", "token_f1": 0.1399, "exact": false } ], "mmlu/accuracy": 0.455, "mmlu/evaluated": 200, "mmlu/correct": 91, "mmlu/split": "validation", "mmlu/limit": 200, "mmlu/examples": [ { "question": "The cyclic subgroup of Z_24 generated by 18 has order", "gold": "A", "pred": "", "raw": "The cyclic subgroup of Z_24 generated by 18 is the set of all integer multiples ", "correct": false }, { "question": "Find the order of the factor group Z_6/<3>.", "gold": "B", "pred": "C", "raw": "The factor group Z_6/<3> is the quotient group of Z_6 by the subgroup generated ", "correct": false }, { "question": "Statement 1 | A permutation that is a product of m even permutations and n odd permutations is an even permutation if and only if n is even. 
Statement 2 | Every", "gold": "A", "pred": "", "raw": "", "correct": false }, { "question": "Find the order of the factor group (Z_4 x Z_12)/(<2> x <2>)", "gold": "C", "pred": "B", "raw": "We need to find the order of the factor group (Z_4 x Z_12)/(<2> x <2>).\n\nFirst, ", "correct": false }, { "question": "Find the maximum possible order for some element of Z_4 x Z_6.", "gold": "C", "pred": "C", "raw": "The maximum possible order of an element in Z_4 x Z_6 is lcm(4,6) = 12. So answe", "correct": true }, { "question": "Statement 1 | The symmetric group S_3 is cyclic. Statement 2 | Every group is isomorphic to some group of permutations.", "gold": "D", "pred": "A", "raw": "The symmetric group S_3 is the group of all permutations of 3 elements. It has o", "correct": false }, { "question": "Statement 1 | If a and b are elements of finite order in an Abelian group, then |ab| is the lcm (|a|,|b|). Statement 2 | If g is a group element and g^n = e, th", "gold": "B", "pred": "A", "raw": "We need to evaluate the two statements.\n\nStatement 1: If a and b are elements of", "correct": false }, { "question": "Statement 1 | If f is a homomorphism from G to K and H is normal in G then f(H) is normal in K. Statement 2 | If f is a homomorphism from G to a group and H is ", "gold": "D", "pred": "", "raw": "", "correct": false }, { "question": "Find the maximum possible order for an element of S_n for n = 7.", "gold": "B", "pred": "A", "raw": "The maximum possible order of an element in the symmetric group S_n is the least", "correct": false }, { "question": "Statement 1 | Every integral domain has a field of quotients. 
Statement 2 | A polynomial of degree n over a ring can have at most n zeros counting multiplicity.", "gold": "C", "pred": "", "raw": "", "correct": false } ] }, "results": [ { "benchmark": "dataset_sft_match/token_f1", "score": 0.0756 }, { "benchmark": "dataset_sft_match/exact_match", "score": 0.0 }, { "benchmark": "dataset_sft_match/evaluated", "score": 50.0 }, { "benchmark": "mmlu/accuracy", "score": 0.455 }, { "benchmark": "mmlu/evaluated", "score": 200.0 }, { "benchmark": "mmlu/correct", "score": 91.0 }, { "benchmark": "mmlu/limit", "score": 200.0 } ], "code_eval": null, "stored_eval": null, "dataset_id": "Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b", "status": "completed", "requested_eval_tasks": [], "executed_eval_tasks": [ "dataset_sft_match", "internal/mmlu_0_shot" ], "task_errors": {}, "is_proxy": false, "is_complete": true }
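The card does not say how `dataset_sft_match/token_f1` is computed. The sketch below assumes the common whitespace-token F1 (overlap counted with multiplicity, as in SQuAD-style scoring), so it may not reproduce the reported per-example values exactly. The function names `token_f1` and `accuracy` are illustrative and are not taken from Clawpathy or Tinker. The last line checks that the reported `mmlu/accuracy` equals `mmlu/correct` divided by `mmlu/evaluated`.

```python
from collections import Counter


def token_f1(expected: str, output: str) -> float:
    """Whitespace-token F1 between a reference string and a model output.

    Assumption: tokens are split on whitespace with no normalization;
    the evaluator behind this card may tokenize differently.
    """
    ref_tokens = expected.split()
    out_tokens = output.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not ref_tokens or not out_tokens:
        return float(ref_tokens == out_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(ref_tokens) & Counter(out_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def accuracy(correct: int, evaluated: int) -> float:
    """Fraction of evaluated questions answered correctly."""
    return correct / evaluated


# Values taken from the evaluation record above: 91 correct out of 200.
mmlu_accuracy = accuracy(91, 200)
```

Under this definition a low token F1 alongside outputs that start like the reference, as in the examples above, is expected when the two texts diverge after the opening sentence: the score depends on overlap across the whole string, not on the shared prefix.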