0) Clean assumptions (so the math is valid)
- Space: unit sphere $S^{d-1}$ with angular distance $D(x,y)=\arccos(x\cdot y)$.
- Sets: $\Omega$ (feasible) and $G$ (goal) are nonempty, closed, spherically convex, and both lie inside a common open hemisphere.
- Feasible-goal set: $C:=\Omega\cap G\neq\varnothing$.
- MERGE: geodesic interpolation $\mathrm{slerp}(x,y;t)$, $t\in(0,1]$.
- PRUNE: metric projection $P_\Omega(x)=\arg\min_{z\in\Omega}D(x,z)$.
- Monitor: $F(x):=\mathrm{dist}(x,C)$.
(Hemisphere → unique shortest geodesics and single-valued, 1-Lipschitz projections. We compare to points in $C\subseteq\Omega$, never outside $\Omega$.)
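A minimal sketch of these primitives in NumPy. The names (`dist`, `slerp`, `monitor_F`) and the representation of $C$ by a finite exemplar set `C_points` are illustrative assumptions, not part of the spec:

```python
import numpy as np

def dist(x, y):
    """Angular distance D(x, y) = arccos(x . y) between unit vectors."""
    return float(np.arccos(np.clip(x @ y, -1.0, 1.0)))

def slerp(x, y, t):
    """Geodesic interpolation on the sphere (the MERGE primitive)."""
    theta = dist(x, y)
    if theta < 1e-12:          # x ~ y: nothing to interpolate
        return x.copy()
    return (np.sin((1 - t) * theta) * x + np.sin(t * theta) * y) / np.sin(theta)

def monitor_F(x, C_points):
    """F(x) = dist(x, C), with C approximated by a finite set of exemplars."""
    return min(dist(x, y) for y in C_points)
```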
1) ICPU — minimal spec
1.1 Registers & memory
- Regs: `A, B, C, GOAL` (unit vectors).
- Status flags: `SIM/DIFF/ZERO/VIOLATION`.
- Concept memory: addressable unit vectors.
- Program/IP: instruction list + pointer.
1.2 ISA (12 ops)
Data: `LOAD A,[addr]` • `STORE A,[addr]` • `MOVE A,B`
Compare/Control: `COMPARE A,GOAL` → sets `SIM`/`DIFF` vs threshold $\theta$ • `BRANCH_IF SIM, addr` • `JUMP addr` • `HALT`
Cognitive:
- `MERGE A,B→A` // $A\leftarrow\mathrm{slerp}(A,B;t)$
- `PRUNE A→A` // $A\leftarrow P_\Omega(A)$
- `REPLACE A,B→A` // targeted overwrite (domain-specific)
- `MAP A,B→A` // context/domain adaptation (domain-specific)
- `INVERT A→A` // reverse key relations (optional)
Invariant: normalize after every cognitive op.
1.3 Execution loop (core solver)
Repeat:
- Pick $y^\*\in\arg\min_{y\in C} D(A,y)$ (nearest element/exemplar of $C$).
- `MERGE A,y*→A` with step $t$.
- `PRUNE A→A`.
- `COMPARE A,GOAL` (or compute $F(A)$).
Stop when `SIM` or the step budget is exhausted.
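Putting 1.3 into code, a sketch assuming the primitives from the section-0 sketch plus a user-supplied `project_omega` implementing PRUNE (e.g., the POCS sketch in section 3):

```python
def icpu_solve(x, C_points, project_omega, t=0.1, eps=1e-3, budget=1000):
    """Core solver loop: MERGE toward the nearest exemplar of C, then PRUNE."""
    for _ in range(budget):
        y_star = min(C_points, key=lambda y: dist(x, y))  # nearest exemplar of C
        x = slerp(x, y_star, t)                           # MERGE A,y*->A
        x = project_omega(x)                              # PRUNE A->A
        x = x / np.linalg.norm(x)                         # invariant: renormalize
        if monitor_F(x, C_points) <= eps:                 # COMPARE -> SIM
            break
    return x
```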
2) Corrected, complete proof
Lemma 1 (geodesic shrink). For $y^\*=P_C(x)$:
$D(\mathrm{slerp}(x,y^\*;t),y^\*)=(1-t)\,D(x,y^\*)$.
So $F(\mathrm{slerp}(x,y^\*;t))\le(1-t)F(x)$.
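For completeness, the computation behind Lemma 1 (standard, included since the theorem leans on it). With $\theta:=D(x,y^\*)$ and the closed form
$$\mathrm{slerp}(x,y^\*;t)=\frac{\sin((1-t)\theta)\,x+\sin(t\theta)\,y^\*}{\sin\theta},$$
a direct computation gives
$$\mathrm{slerp}(x,y^\*;t)\cdot y^\*=\frac{\sin((1-t)\theta)\cos\theta+\sin(t\theta)}{\sin\theta}=\cos\big((1-t)\theta\big),$$
i.e. $D(\mathrm{slerp}(x,y^\*;t),y^\*)=(1-t)\theta$: slerp traverses the unique geodesic at constant speed.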
Lemma 2 (projection used correctly). For any $y\in\Omega$:
$D(P_\Omega(z),y)\le D(z,y)$.
(We compare to $y^\*\in C\subseteq\Omega$, so this applies.)
Theorem (contraction to feasible goals). Define
$T(x):=P_\Omega(\mathrm{slerp}(x,P_C(x);t))$. Then $F(T(x))\le(1-t)\,F(x)$.
Proof. Lemma 1 gives the $(1-t)$ shrink toward $y^\*\in C$; Lemma 2 says projecting onto $\Omega$ cannot increase the distance to $y^\*$. Hence $F(T(x))\le D(T(x),y^\*)\le D(\mathrm{slerp}(x,y^\*;t),y^\*)=(1-t)D(x,y^\*)=(1-t)F(x)$. □
Corollary (linear rate). $F(x_k)\le(1-t)^k F(x_0)$. Steps to reach $F\le\varepsilon$:
$k\ge\log(\varepsilon/F(x_0))/\log(1-t)$.
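As a quick sanity check on the rate (the numbers are illustrative):

```python
import math

def steps_to_eps(F0, eps, t):
    """Smallest k with (1 - t)**k * F0 <= eps, per the corollary."""
    return math.ceil(math.log(eps / F0) / math.log(1.0 - t))

# e.g. F0 = pi/2 (worst case inside a hemisphere), eps = 1e-3, t = 0.1:
print(steps_to_eps(math.pi / 2, 1e-3, 0.1))  # -> 70
```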
Biased fusion (score improvement). If $f$ is geodesically convex on $\Omega$, then
$f(\mathrm{slerp}(x,y;t))\le(1-t)f(x)+tf(y)$.
If $f(y)<f(x)$, the right-hand side is strictly below $f(x)$, so any $t\in(0,1]$ gives strict decrease.
Approximate target bound. If you slerp toward $y\in C$ with $D(P_C(x),y)\le\delta$:
$F(P_\Omega(\mathrm{slerp}(x,y;t)))\le(1-t)F(x)+t\delta$
(Lemma 2 plus geodesic convexity of the distance to $y^\*=P_C(x)$ along the geodesic from $x$ to $y$).
Near-nearest in $C$ is fine; $\delta\to0$ recovers the exact contraction.
Projected descent version. If you minimize a geodesically convex $f$ over $\Omega$ via
$x_{k+1}=P_\Omega(\mathrm{slerp}(x_k,y_k;t_k))$ with $f(y_k)\le f(x_k)$ and standard step rules (backtracking or diminishing $t_k$; see the sketch below), then $f(x_k)$ decreases and converges to the minimum of $f$ over $\Omega$.
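One way to realize the "standard step rules" is backtracking on $t_k$; a sketch with hypothetical helpers `slerp` and `proj_omega` as above:

```python
def projected_descent_step(x, y, f, slerp, proj_omega, t0=0.5, beta=0.5, tries=20):
    """x_{k+1} = P_Omega(slerp(x_k, y_k; t_k)), halving t_k until f decreases."""
    t, fx = t0, f(x)
    for _ in range(tries):
        cand = proj_omega(slerp(x, y, t))
        if f(cand) < fx:        # sufficient decrease: accept the step
            return cand
        t *= beta               # otherwise backtrack
    return x                    # no improving step found; keep the old iterate
```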
Where it breaks (explicit).
(i) $\Omega$ or $G$ not hemispherical/geodesically convex → the projection can be multivalued or non-Lipschitz; no guarantee.
(ii) MERGE not geodesic (e.g., a renormalized linear mix) → the $(1-t)$ factor can fail.
Safe fallback: wrap each outer step in accept-if-better: keep the old $x$ unless $F$ (or $f$) decreases.
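The fallback is a one-liner to wrap around any candidate update (names illustrative):

```python
def accept_if_better(x, candidate, monitor):
    """Keep the old iterate unless the monitor (F or f) strictly decreases."""
    return candidate if monitor(candidate) < monitor(x) else x

# usage: x = accept_if_better(x, project_omega(slerp(x, y_star, t)), F)
```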
3) Practical projection & pruning
- Model $\Omega$ as an intersection of spherical halfspaces (caps): each linear constraint $a^\top x\ge b$ induces a cap on $S^{d-1}$.
- Compute $P_\Omega$ by alternating projections (POCS/Dykstra) on the sphere, using log/exp maps or the closed form for cap projection; see the sketch after this list.
- Landscape editing: after improving the best value $\tau$, add an elevation cap that rejects points with score $\ge\tau$; this tightens $\Omega$ and speeds convergence (still convex as long as the caps remain hemispherical).
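A sketch of the closed-form cap projection and a cyclic-projection $P_\Omega$, assuming unit normals $a$ and $|b|<1$. Note that plain POCS lands in $\Omega$ but is not the exact metric projection; Dykstra's variant recovers that:

```python
import numpy as np

def project_cap(x, a, b):
    """Metric projection of unit x onto the cap {z : a.z >= b, |z| = 1}.

    If x is outside, the nearest point lies on the boundary circle a.z = b,
    in the plane spanned by a and x.
    """
    if a @ x >= b:
        return x
    u = x - (a @ x) * a                      # component of x orthogonal to a
    n = np.linalg.norm(u)
    if n < 1e-12:                            # x ~ +-a: any boundary point works
        u = np.zeros_like(a)
        u[int(np.argmin(np.abs(a)))] = 1.0
        u -= (a @ u) * a
        n = np.linalg.norm(u)
    return b * a + np.sqrt(1.0 - b * b) * (u / n)

def pocs(x, caps, sweeps=100, tol=1e-10):
    """Approximate P_Omega for Omega = intersection of caps (cyclic POCS)."""
    for _ in range(sweeps):
        x_prev = x
        for a, b in caps:
            x = project_cap(x, a, b)
        x = x / np.linalg.norm(x)            # invariant: renormalize
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x
```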
4) Warped geometry (optional, proof-preserving)
Use an SPD metric $M\succ0$ learned from successful deltas (whitening). Work on the $M$-sphere with $M$-slerp and $M$-projection. Via an isometry to the standard sphere, all of the results above carry over. Guard: keep $M$ well-conditioned, and roll back any warp step that does not reduce $F$. A sketch follows.
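A sketch of $M$-slerp via the isometry $x\mapsto M^{1/2}x$ onto the standard sphere, assuming $x,y$ are $M$-unit ($x^\top Mx=1$):

```python
import numpy as np

def _sqrtm_spd(M):
    """M^(1/2) for symmetric positive-definite M, via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def _slerp(x, y, t):
    th = np.arccos(np.clip(x @ y, -1.0, 1.0))
    if th < 1e-12:
        return x
    return (np.sin((1 - t) * th) * x + np.sin(t * th) * y) / np.sin(th)

def m_slerp(x, y, t, M):
    """M-slerp: map to the unit sphere, slerp there, pull back."""
    R = _sqrtm_spd(M)
    z = _slerp(R @ x, R @ y, t)          # R@x, R@y are unit since x, y are M-unit
    out = np.linalg.solve(R, z)          # R^{-1} z
    return out / np.sqrt(out @ M @ out)  # guard against floating-point drift
```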
5) What to measure (to close the loop)
- Rate check: on synthetic caps, plot $\log F(x_k)$; the slope should be $\approx\log(1-t)$ (see the sketch after this list).
- Slerp vs renorm-avg: show that monotonicity breaks with the renormalized average but not with slerp.
- Approx-target: vary $\delta$; verify $F_{k+1}\le(1-t)F_k+t\delta$.
- Projection implementation: outer steps never increase $F$ under accept-if-better; inner POCS iterations reduce the distance to $\Omega$.
- Warped metric: same monotone behavior; fewer steps to reach $\varepsilon$.
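A minimal synthetic rate check, with $C=\{c\}$ a single exemplar and $\Omega$ taken as the whole hemisphere around $c$ (so PRUNE is a no-op); the setup is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, steps = 16, 0.1, 50

c = rng.normal(size=d); c /= np.linalg.norm(c)   # the single exemplar of C
x = rng.normal(size=d); x /= np.linalg.norm(x)
if x @ c < 0:                                    # keep x in c's open hemisphere
    x = -x

def dist(u, v):
    return float(np.arccos(np.clip(u @ v, -1.0, 1.0)))

def slerp(u, v, t):
    th = dist(u, v)
    if th < 1e-12:
        return u
    return (np.sin((1 - t) * th) * u + np.sin(t * th) * v) / np.sin(th)

log_F = []
for k in range(steps):
    log_F.append(np.log(dist(x, c)))             # F(x_k) = D(x_k, c)
    x = slerp(x, c, t)                           # MERGE; PRUNE is a no-op here

slope = np.polyfit(np.arange(steps), log_F, 1)[0]
print(slope, np.log(1 - t))                      # the two should agree closely
```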
6) Open questions (the to-do list)
- Best geometry: sphere vs learned $M$ vs product manifolds for real “idea” vectors.
- $\Omega$ construction: cap recipes that capture real constraints but stay hemispherical.
- Target selection: nearest in $C$ vs exemplar routing vs ANN over a goal hull; sensitivity to $\delta$.
- Decoder coupling: map converged vectors to text (retrieval vs constrained generation) without breaking the guarantees.
- ISA minimality: are `REPLACE`/`MAP`/`INVERT` actually needed, or do `MERGE`/`PRUNE` + selection suffice?
- Noise robustness: encoder/decoder drift; how small must $t$ be to keep empirical monotonicity?
- Global search: multi-start + occasional `MERGE` between basins to escape shallow components while keeping $\Omega$ convex.
7) TL;DR
Under the hemisphere + convexity assumptions, with MERGE = slerp and PRUNE = projection, the ICPU loop is Fejér-monotone to $C=\Omega\cap G$ with linear rate $(1-t)$. Biased fusion improves convex scores. Approximations are safe with accept-if-better. Everything else is engineering.