0) Clean assumptions (so the math is valid)
- Space: unit sphere $S^{d-1}$ with angular distance $D(x,y)=\arccos(x\cdot y)$.
- Sets: $\Omega$ (feasible) and $G$ (goal) are nonempty, closed, spherically convex, and both lie inside a common open hemisphere.
- Feasible-goal set: $C:=\Omega\cap G\neq\varnothing$.
- MERGE: geodesic interpolation $\mathrm{slerp}(x,y;t)$, $t\in(0,1]$.
- PRUNE: metric projection $P_\Omega(x)=\arg\min_{z\in\Omega}D(x,z)$.
- Monitor: $F(x):=\mathrm{dist}(x,C)$.
(Hemisphere → unique shortest geodesics and single-valued, 1-Lipschitz projections. We compare to points in $C\subseteq\Omega$, never outside $\Omega$.)
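A minimal sketch of these primitives in NumPy. The names (`dist`, `slerp`, `monitor_F`) and the representation of $C$ by a finite exemplar set `C_points` are illustrative assumptions, not part of the spec:

```python
import numpy as np

def dist(x, y):
    """Angular distance D(x, y) = arccos(x . y) between unit vectors."""
    return float(np.arccos(np.clip(x @ y, -1.0, 1.0)))

def slerp(x, y, t):
    """Geodesic interpolation on the sphere (the MERGE primitive)."""
    theta = dist(x, y)
    if theta < 1e-12:          # x ~ y: nothing to interpolate
        return x.copy()
    return (np.sin((1 - t) * theta) * x + np.sin(t * theta) * y) / np.sin(theta)

def monitor_F(x, C_points):
    """F(x) = dist(x, C), with C approximated by a finite set of exemplars."""
    return min(dist(x, y) for y in C_points)
```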
1) ICPU — minimal spec
1.1 Registers & memory
- Regs: `A, B, C, GOAL` (unit vectors).
- Status flags: `SIM/DIFF/ZERO/VIOLATION`.
- Concept memory: addressable unit vectors.
- Program/IP: instruction list + pointer.
1.2 ISA (12 ops)
Data: `LOAD A,[addr]` • `STORE A,[addr]` • `MOVE A,B`
Compare/Control: `COMPARE A,GOAL` → sets `SIM`/`DIFF` vs threshold $\theta$ • `BRANCH_IF SIM, addr` • `JUMP addr` • `HALT`
Cognitive:
- `MERGE A,B→A` // $A\leftarrow\mathrm{slerp}(A,B;t)$
- `PRUNE A→A` // $A\leftarrow P_\Omega(A)$
- `REPLACE A,B→A` // targeted overwrite (domain-specific)
- `MAP A,B→A` // context/domain adaptation (domain-specific)
- `INVERT A→A` // reverse key relations (optional)
Invariant: normalize after every cognitive op.
1.3 Execution loop (core solver)
Repeat:
- Pick $y^\*\in\arg\min_{y\in C} D(A,y)$ (nearest element/exemplar of $C$).
- `MERGE A,y*→A` with step $t$.
- `PRUNE A→A`.
- `COMPARE A,GOAL` (or compute $F(A)$).
Stop when `SIM` or the step budget is exhausted.
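Putting 1.3 into code, a sketch assuming the primitives from the section-0 sketch plus a user-supplied `project_omega` implementing PRUNE (e.g., the POCS sketch in section 3):

```python
def icpu_solve(x, C_points, project_omega, t=0.1, eps=1e-3, budget=1000):
    """Core solver loop: MERGE toward the nearest exemplar of C, then PRUNE."""
    for _ in range(budget):
        y_star = min(C_points, key=lambda y: dist(x, y))  # nearest exemplar of C
        x = slerp(x, y_star, t)                           # MERGE A,y*->A
        x = project_omega(x)                              # PRUNE A->A
        x = x / np.linalg.norm(x)                         # invariant: renormalize
        if monitor_F(x, C_points) <= eps:                 # COMPARE -> SIM
            break
    return x
```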
2) Corrected, complete proof
Lemma 1 (geodesic shrink). For $y^\*=P_C(x)$:
$D(\mathrm{slerp}(x,y^\*;t),y^\*)=(1-t)\,D(x,y^\*)$.
So $F(\mathrm{slerp}(x,y^\*;t))\le(1-t)F(x)$.
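For completeness, the computation behind Lemma 1 (standard, included since the theorem leans on it). With $\theta:=D(x,y^\*)$ and the closed form
$$\mathrm{slerp}(x,y^\*;t)=\frac{\sin((1-t)\theta)\,x+\sin(t\theta)\,y^\*}{\sin\theta},$$
a direct computation gives
$$\mathrm{slerp}(x,y^\*;t)\cdot y^\*=\frac{\sin((1-t)\theta)\cos\theta+\sin(t\theta)}{\sin\theta}=\cos\big((1-t)\theta\big),$$
i.e. $D(\mathrm{slerp}(x,y^\*;t),y^\*)=(1-t)\theta$: slerp traverses the unique geodesic at constant speed.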
Lemma 2 (projection used correctly). For any $y\in\Omega$:
$D(P_\Omega(z),y)\le D(z,y)$.
(We compare to $y^\*\in C\subseteq\Omega$, so this applies.)
Theorem (contraction to feasible goals). Define
$T(x):=P_\Omega(\mathrm{slerp}(x,P_C(x);t))$. Then $F(T(x))\le(1-t)\,F(x)$.
Proof. Lemma 1 gives the $(1-t)$ shrink toward $y^\*\in C$; Lemma 2 says projecting onto $\Omega$ cannot increase the distance to $y^\*$. Hence $F(T(x))\le D(T(x),y^\*)\le D(\mathrm{slerp}(x,y^\*;t),y^\*)=(1-t)D(x,y^\*)=(1-t)F(x)$. □
Corollary (linear rate). $F(x_k)\le(1-t)^k F(x_0)$. Steps to reach $F\le\varepsilon$:
$k\ge\log(\varepsilon/F(x_0))/\log(1-t)$.
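As a quick sanity check on the rate (the numbers are illustrative):

```python
import math

def steps_to_eps(F0, eps, t):
    """Smallest k with (1 - t)**k * F0 <= eps, per the corollary."""
    return math.ceil(math.log(eps / F0) / math.log(1.0 - t))

# e.g. F0 = pi/2 (worst case inside a hemisphere), eps = 1e-3, t = 0.1:
print(steps_to_eps(math.pi / 2, 1e-3, 0.1))  # -> 70
```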
Biased fusion (score improvement). If $f$ is geodesically convex on $\Omega$, then
$f(\mathrm{slerp}(x,y;t))\le(1-t)f(x)+tf(y)$.
If $f(y)<f(x)$, the right-hand side is strictly below $f(x)$, so any $t\in(0,1]$ gives strict decrease.
Approximate target bound. If you slerp toward $y\in C$ with $D(P_C(x),y)\le\delta$:
$F(P_\Omega(\mathrm{slerp}(x,y;t)))\le(1-t)F(x)+t\delta$
(Lemma 2 plus geodesic convexity of the distance to $y^\*=P_C(x)$ along the geodesic from $x$ to $y$).
Near-nearest in $C$ is fine; $\delta\to0$ recovers the exact contraction.
Projected descent version. If you minimize a geodesically convex $f$ over $\Omega$ via
$x_{k+1}=P_\Omega(\mathrm{slerp}(x_k,y_k;t_k))$ with $f(y_k)\le f(x_k)$ and standard step rules (backtracking or diminishing $t_k$; see the sketch below), then $f(x_k)$ decreases and converges to the minimum of $f$ over $\Omega$.
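One way to realize the "standard step rules" is backtracking on $t_k$; a sketch with hypothetical helpers `slerp` and `proj_omega` as above:

```python
def projected_descent_step(x, y, f, slerp, proj_omega, t0=0.5, beta=0.5, tries=20):
    """x_{k+1} = P_Omega(slerp(x_k, y_k; t_k)), halving t_k until f decreases."""
    t, fx = t0, f(x)
    for _ in range(tries):
        cand = proj_omega(slerp(x, y, t))
        if f(cand) < fx:        # sufficient decrease: accept the step
            return cand
        t *= beta               # otherwise backtrack
    return x                    # no improving step found; keep the old iterate
```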
Where it breaks (explicit).
(i) $\Omega$ or $G$ not hemispherical/geodesically convex → the projection can be multivalued or non-Lipschitz; no guarantee.
(ii) MERGE not geodesic (e.g., a renormalized linear mix) → the $(1-t)$ factor can fail.
Safe fallback: wrap each outer step in accept-if-better: keep the old $x$ unless $F$ (or $f$) decreases.
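The fallback is a one-liner to wrap around any candidate update (names illustrative):

```python
def accept_if_better(x, candidate, monitor):
    """Keep the old iterate unless the monitor (F or f) strictly decreases."""
    return candidate if monitor(candidate) < monitor(x) else x

# usage: x = accept_if_better(x, project_omega(slerp(x, y_star, t)), F)
```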
3) Practical projection & pruning
- Model $\Omega$ as an intersection of spherical halfspaces (caps): each linear constraint $a^\top x\ge b$ induces a cap on $S^{d-1}$.
- Compute $P_\Omega$ by alternating projections (POCS/Dykstra) on the sphere, using log/exp maps or the closed form for cap projection; see the sketch after this list.
- Landscape editing: after improving the best value $\tau$, add an elevation cap that rejects points with score $\ge\tau$; this tightens $\Omega$ and speeds convergence (still convex as long as the caps remain hemispherical).
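A sketch of the closed-form cap projection and a cyclic-projection $P_\Omega$, assuming unit normals $a$ and $|b|<1$. Note that plain POCS lands in $\Omega$ but is not the exact metric projection; Dykstra's variant recovers that:

```python
import numpy as np

def project_cap(x, a, b):
    """Metric projection of unit x onto the cap {z : a.z >= b, |z| = 1}.

    If x is outside, the nearest point lies on the boundary circle a.z = b,
    in the plane spanned by a and x.
    """
    if a @ x >= b:
        return x
    u = x - (a @ x) * a                      # component of x orthogonal to a
    n = np.linalg.norm(u)
    if n < 1e-12:                            # x ~ +-a: any boundary point works
        u = np.zeros_like(a)
        u[int(np.argmin(np.abs(a)))] = 1.0
        u -= (a @ u) * a
        n = np.linalg.norm(u)
    return b * a + np.sqrt(1.0 - b * b) * (u / n)

def pocs(x, caps, sweeps=100, tol=1e-10):
    """Approximate P_Omega for Omega = intersection of caps (cyclic POCS)."""
    for _ in range(sweeps):
        x_prev = x
        for a, b in caps:
            x = project_cap(x, a, b)
        x = x / np.linalg.norm(x)            # invariant: renormalize
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x
```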
4) Warped geometry (optional, proof-preserving)
Use an SPD metric $M\succ0$ learned from successful deltas (whitening). Work on the $M$-sphere with $M$-slerp and $M$-projection. Via an isometry to the standard sphere, all of the results above carry over. Guard: keep $M$ well-conditioned, and roll back any warp step that does not reduce $F$. A sketch follows.
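A sketch of $M$-slerp via the isometry $x\mapsto M^{1/2}x$ onto the standard sphere, assuming $x,y$ are $M$-unit ($x^\top Mx=1$):

```python
import numpy as np

def _sqrtm_spd(M):
    """M^(1/2) for symmetric positive-definite M, via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def _slerp(x, y, t):
    th = np.arccos(np.clip(x @ y, -1.0, 1.0))
    if th < 1e-12:
        return x
    return (np.sin((1 - t) * th) * x + np.sin(t * th) * y) / np.sin(th)

def m_slerp(x, y, t, M):
    """M-slerp: map to the unit sphere, slerp there, pull back."""
    R = _sqrtm_spd(M)
    z = _slerp(R @ x, R @ y, t)          # R@x, R@y are unit since x, y are M-unit
    out = np.linalg.solve(R, z)          # R^{-1} z
    return out / np.sqrt(out @ M @ out)  # guard against floating-point drift
```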
5) What to measure (to close the loop)
- Rate check: on synthetic caps, plot $\log F(x_k)$; the slope should be $\approx\log(1-t)$ (see the sketch after this list).
- Slerp vs renorm-avg: show that monotonicity breaks with the renormalized average but not with slerp.
- Approx-target: vary $\delta$; verify $F_{k+1}\le(1-t)F_k+t\delta$.
- Projection implementation: outer steps never increase $F$ under accept-if-better; inner POCS iterations reduce the distance to $\Omega$.
- Warped metric: same monotone behavior; fewer steps to reach $\varepsilon$.
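A minimal synthetic rate check, with $C=\{c\}$ a single exemplar and $\Omega$ taken as the whole hemisphere around $c$ (so PRUNE is a no-op); the setup is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, steps = 16, 0.1, 50

c = rng.normal(size=d); c /= np.linalg.norm(c)   # the single exemplar of C
x = rng.normal(size=d); x /= np.linalg.norm(x)
if x @ c < 0:                                    # keep x in c's open hemisphere
    x = -x

def dist(u, v):
    return float(np.arccos(np.clip(u @ v, -1.0, 1.0)))

def slerp(u, v, t):
    th = dist(u, v)
    if th < 1e-12:
        return u
    return (np.sin((1 - t) * th) * u + np.sin(t * th) * v) / np.sin(th)

log_F = []
for k in range(steps):
    log_F.append(np.log(dist(x, c)))             # F(x_k) = D(x_k, c)
    x = slerp(x, c, t)                           # MERGE; PRUNE is a no-op here

slope = np.polyfit(np.arange(steps), log_F, 1)[0]
print(slope, np.log(1 - t))                      # the two should agree closely
```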
6) Open questions (the to-do list)
- Best geometry: sphere vs learned $M$ vs product manifolds for real “idea” vectors.
- $\Omega$ construction: cap recipes that capture real constraints but stay hemispherical.
- Target selection: nearest in $C$ vs exemplar routing vs ANN over a goal hull; sensitivity to $\delta$.
- Decoder coupling: map converged vectors to text (retrieval vs constrained generation) without breaking the guarantees.
- ISA minimality: are `REPLACE`/`MAP`/`INVERT` actually needed, or do `MERGE`/`PRUNE` + selection suffice?
- Noise robustness: encoder/decoder drift; how small must $t$ be to keep empirical monotonicity?
- Global search: multi-start + occasional `MERGE` between basins to escape shallow components while keeping $\Omega$ convex.
7) TL;DR
Under the hemisphere + convexity assumptions, with MERGE = slerp and PRUNE = projection, the ICPU loop is Fejér-monotone to $C=\Omega\cap G$ with linear rate $(1-t)$. Biased fusion improves convex scores. Approximations are safe with accept-if-better. Everything else is engineering.