VeriNum · ak-2485 · May 30, 2026 · May 30, 2026 · May 31, 2026 · May 31, 2026
diff --git a/.gitignore b/.gitignore
@@ -4,6 +4,9 @@ C/bandmat.v
 C/build_csr.v
 C/cholesky.v
 C/densemat.v
+C/cblas/ddot.v
+C/cblas/dasum.v
+C/cblas/dscal.v
 Makefile.coq
 Makefile.coq.conf
 CoqMakefile

diff --git a/C/cblas/README.md b/C/cblas/README.md
@@ -0,0 +1,105 @@
+# LAProof × GSL CBLAS
+
+This directory connects **real-world BLAS code** to LAProof's verified floating-point
+accuracy results.
+
+## Scope
+
+LAProof proves floating-point accuracy theorems (forward error bounds, etc.) about
+*functional models* of basic linear-algebra programs (idealized list/matrix functions
+that compute a result with a precisely specified sequence of IEEE-754 operations).
+Those theorems live in [`accuracy_proofs/`](../../accuracy_proofs/), independent of any
+particular C implementation.
+
+The goal of this directory is the other half of the story: take a **real-world BLAS
+implementation** and *prove* that the compiled C code implements one of those
+functional models. We use the CBLAS layer of the **GNU Scientific Library (GSL)** as the
+implementation under verification.
+
+The pipeline for each routine is:
+
+1. Take the GSL CBLAS source (under [`src/`](src/)) essentially verbatim, with only the
+   minimal edits needed to run CompCert's `clightgen` on it.
+2. Generate the Clight AST (a `*.v` file) with `clightgen`.
+3. Define the **functional model** the C loop literally computes (in a `*_model.v` file),
+   and prove it equals (up to `feq`) the LAProof accuracy model over which the error
+   theorems are stated.
+4. Write a VST **funspec** (in a `spec_*.v` file) whose postcondition relates the returned
+   value to the LAProof model.
+5. **Verify** the C function against that funspec with VST (in a `verif_*.v` file).
+
+Once a routine is verified against a funspec phrased in terms of a LAProof model, the
+corresponding accuracy theorem can be transferred to the value computed by the compiled C
+code. The verification does *not* prove the accuracy theorem's preconditions on its own,
+however. To apply the bound to the C result, one must separately establish that those
+preconditions hold, and they fall into two kinds.
+
+- **Already guaranteed by the funspec precondition.** Some of the theorem's preconditions
+  are exactly things the funspec `PRE` already requires, so they hold for free (e.g. a
+  length precondition that the funspec already imposes on the input arrays).
+- **Additional conditions one must discharge.** Other preconditions are not established by
+  the funspec and must be proved on the side. A typical example is a no-overflow assumption
+  that the accumulated result is finite: this does *not* follow from the funspec merely
+  requiring the inputs to be finite, since a combination of finite values can still
+  overflow to an infinity. Whoever applies the bound must supply this separately.
+
+So the honest statement is: for inputs satisfying both the funspec `PRE` and the accuracy
+theorem's own hypotheses, the bound holds of the C return value. In many cases (such as the
+dot products and other reduction loops verified here) the one hypothesis the funspec does
+not already supply is a *no-overflow* condition: that the accumulated result is finite.
+
+For `cblas_ddot` that leftover precondition is exactly `Hfin`: that the accumulated result
+`dotprodF X Y` is finite. It does not follow merely from the inputs being finite, since a
+sum of finite products can still round to an infinity.
+
+The `feq` postcondition cooperates here: the funspec only guarantees that the C result is
+`feq`-equal to the model value (equality up to `±0` and exceptional values), but once the result is finite,
+`feq` upgrades to real-value equality, so the bound on the model value is literally a bound
+on the compiled C output.
+
+When stating the accuracy corollary for the C function, you choose how to carry `Hfin`:
+
+- **Conditional corollary.** Keep `Hfin` as an explicit hypothesis. The bound then holds
+  whenever the result is in fact finite, and whoever applies the corollary discharges
+  `Hfin` for their particular inputs. (`Hfin` is a proof obligation about the result,
+  not a runtime check in the C code.)
+- **Unconditional corollary.** Strengthen the funspec `PRE` with magnitude/length bounds
+  that imply finiteness. LAProof's [`finite_sum_from_bounded`](../../accuracy_proofs/sum_is_finite.v)
+  and the `fun_bnd_le` lemmas turn such bounds into a proof of `Hfin`, so the corollary
+  carries no extra hypothesis.
+
+### GSL CBLAS upstream
+
+- GSL project home: <https://www.gnu.org/software/gsl/>
+- GSL source repository (Savannah git): <https://git.savannah.gnu.org/cgit/gsl.git>
+- CBLAS directory in the repo: <https://git.savannah.gnu.org/cgit/gsl.git/tree/cblas>
+- Shared BLAS kernels (`source_*.h`) used by the CBLAS routines: <https://git.savannah.gnu.org/cgit/gsl.git/tree/blas>
+
+GSL is licensed under the GNU GPL; the files under [`src/`](src/) retain their upstream
+copyright and license headers.
+
+## What we've verified so far
+
+| Operation | Precision | Function | Files |
+|-----------|-----------|----------|-------|
+| Dot product | double | `cblas_ddot` | [`src/ddot.c`](src/ddot.c), [`src/source_dot_r.h`](src/source_dot_r.h), [`include/cblas.h`](include/cblas.h), [`ddot.v`](ddot.v), [`ddot_model.v`](ddot_model.v), [`spec_ddot.v`](spec_ddot.v), [`verif_ddot.v`](verif_ddot.v) |
+| Sum of absolute values | double | `cblas_dasum` | [`src/dasum.c`](src/dasum.c), [`src/source_asum_r.h`](src/source_asum_r.h), [`include/cblas.h`](include/cblas.h), [`dasum.v`](dasum.v), [`asum_model.v`](asum_model.v), [`spec_dasum.v`](spec_dasum.v), [`verif_dasum.v`](verif_dasum.v) |
+| Scalar multiply (in place) | double | `cblas_dscal` | [`src/dscal.c`](src/dscal.c), [`src/source_scal_r.h`](src/source_scal_r.h), [`include/cblas.h`](include/cblas.h), [`dscal.v`](dscal.v), [`scal_model.v`](scal_model.v), [`spec_dscal.v`](spec_dscal.v), [`verif_dscal.v`](verif_dscal.v) |
+
+**Scope limits (`cblas_ddot`, `cblas_dasum`, `cblas_dscal`):**
+- **Unit stride only** (`incX = incY = 1`), as a deliberate first milestone.
+- For the **reductions** (`ddot`, `dasum`) the postcondition is stated up to `feq`.
+- `cblas_dasum` is verified against `sumF (map BABS X)`; its loop body calls `fabs`,
+  discharged with VSTlib's `fabs_spec` (whose result is `BABS`). The matching accuracy
+  theorem is `sum_acc.fSUM`.
+- `cblas_dscal` is the first **in-place** routine: it overwrites the array (writable share,
+  `void` return) and is verified against an *exact* model `scal_model alpha X =
+  map (fun x => BMULT x alpha) X` (no `feq` required here). The
+  per-element accuracy is a direct unit-roundoff bound on `BMULT`.
+
+## Conventions
+
+This directory mirrors the model/spec/verif split used elsewhere in
+[`C/`](../) (e.g. the sparse routines' `sparse_model.v` / `spec_sparse.v` /
+`verif_sparse.v`): pure model functions and their lemmas in `*_model.v`, VST funspecs in
+`spec_*.v`, and `semax_body` proofs in `verif_*.v`.
diff --git a/C/cblas/asum_model.v b/C/cblas/asum_model.v
@@ -0,0 +1,83 @@
+(**  * LAProof.C.cblas.asum_model: functional model of GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *)
+
+(** This file provides [asum_model], the partial accumulation that the
+    [cblas_dasum] loop invariant tracks, with the *step* lemma (how one loop
+    iteration extends the accumulation) and the *end* lemma (the result of the
+    accumulation once the loop finishes).  GSL's kernel ([source_asum_r.h]) is
+<<
+      double r = 0.0;
+      for (i = 0; i < N; i++) { r += fabs(X[ix]); ix += incX; }
+      return r;
+>>
+    i.e., it adds [|X[i]|] left to right into the accumulator ([BPLUS acc |x|]),
+    starting from [+0.0].  The C [fabs] computes [BABS] (the result of VSTlib's
+    [fabs_spec], whose [ff_func] is [BABS]), so the model uses [BABS].
+
+    The matching LAProof accuracy model is [sum_model.sumF] applied to
+    [map BABS X] (the absolute values); over it [sum_acc.fSUM] proves the
+    forward-error bound.  The C's operand order ([BPLUS acc x]
+    vs [sumF]'s [BPLUS x acc]) and initial accumulator ([+0.0] = [Zconst Tdouble 0]
+    vs [sumF]'s [neg_zero]) differ, so the end/bridge lemma relates the two by
+    [feq] (IEEE-equality identifying [±0]/NaN). *)
+
+Require Import VST.floyd.proofauto.
+Require Import vcfloat.VCFloat.
+Require Import vcfloat.FPStdCompCert.
+From vcfloat Require Import FPStdLib.
+From LAProof.C Require Import floatlib.
+(* [sum_model] only [Require Import]s (not [Export]s) the mathcomp preamble, so
+   this does not pollute the VST namespace with ssreflect notations.  We do NOT
+   [Import common] (it would re-scope [<=]/[<] over [R]); [common.BPLUS_comm] /
+   [common.neg_zero] are used qualified.  [BPLUS_mor], [BABS], the [feq]
+   [Equivalence] come from vcfloat. *)
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+(** [asum_model X] is the value the C loop *literally* computes: a left fold with
+    the accumulator as the first [BPLUS] operand, separate abs-then-add, starting
+    from [+0.0].  This mirrors the Clight AST in [dasum.v] exactly. *)
+Definition asum_loop (l: list (ftype Tdouble)) : ftype Tdouble :=
+  fold_left (fun acc x => BPLUS acc (BABS x)) l (Zconst Tdouble 0).
+
+Definition asum_model (X: list (ftype Tdouble)) : ftype Tdouble := asum_loop X.
+
+(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense asum). *)
+
+Lemma asum_loop_snoc: forall l x,
+  asum_loop (l ++ [x]) = BPLUS (asum_loop l) (BABS x).
+Proof. intros. unfold asum_loop. rewrite fold_left_app. reflexivity. Qed.
+
+(** *step*: extending the length-[k] prefix [X[0..k-1]] with the element [X[k]]
+    adds one [BABS] term, with the accumulator as the first [BPLUS] operand --
+    exactly the Clight statement [r = r + fabs(X[k])]. *)
+Lemma asum_model_step: forall (X: list (ftype Tdouble)) k,
+  0 <= k < Zlength X ->
+  asum_model (sublist 0 (k+1) X)
+  = BPLUS (asum_model (sublist 0 k X)) (BABS (Znth k X)).
+Proof.
+  intros X k Hk. unfold asum_model.
+  rewrite (sublist_last_1 0 k X) by lia.
+  rewrite asum_loop_snoc. reflexivity.
+Qed.
+
+(** *end*/bridge: the finished accumulation equals LAProof's summation model
+    [sumF (map BABS X)] up to [feq].  Generalized-accumulator induction using
+    [common.BPLUS_comm] and [BPLUS_mor], with the [feq (+0.0) (-0.0)] base case
+    ([sumF] starts at [neg_zero]). *)
+Lemma asum_model_feq_sumF: forall (X: list (ftype Tdouble)),
+  feq (asum_model X) (sumF (map BABS X)).
+Proof.
+  intros X. unfold asum_model, asum_loop, sumF.
+  assert (Hs: feq (Zconst Tdouble 0) (@common.neg_zero Tdouble)) by reflexivity.
+  revert Hs.
+  generalize (Zconst Tdouble 0).
+  generalize (@common.neg_zero Tdouble).
+  induction X as [|x X IH]; intros g f Hs.
+  - (* X = nil: both folds are the (feq-linked) initial accumulators. *)
+    simpl. exact Hs.
+  - simpl. apply IH.
+    unfold Basics.flip.
+    etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
+Qed.
diff --git a/C/cblas/ddot_model.v b/C/cblas/ddot_model.v
@@ -0,0 +1,122 @@
+(**  * LAProof.C.cblas.ddot_model: functional model of GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *)
+
+(** This file provides [ddot_model], the partial accumulation that the
+    [cblas_ddot] loop invariant tracks, with the *step* lemma (how one loop
+    iteration extends the accumulation) and the *end* lemma (the result of the
+    accumulation once the loop finishes).  No VST [funspec]s or Clight ASTs
+    appear here -- those live in [spec_ddot] and [verif_ddot].  GSL's loop
+    (after macro expansion of [source_dot_r.h]) is
+<<
+      double r = 0.0;
+      for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; }
+      return r;
+>>
+    i.e., it adds [X[i]*Y[i]] left to right into the accumulator
+    ([BPLUS acc prod]), using separate multiply-then-add (not fused
+    multiply-add), starting from [+0.0].
+
+    The matching LAProof accuracy model is [dotprodF]
+    ([dotprodF = dotprod BMULT BPLUS pos_zero]) of
+    [LAProof.accuracy_proofs.dotprod_model]; over it
+    [dot_acc.dotprod_forward_error] proves the forward-error bound -- *not* the
+    FMA-based [floatlib.dotprod].  The C's operand order ([BPLUS acc prod] vs
+    [dotprodF]'s [BPLUS prod acc]) and initial accumulator ([+0.0] =
+    [Zconst Tdouble 0] vs [dotprodF]'s [pos_zero]) differ, so the end/bridge
+    lemma relates the two by [feq] (IEEE addition is commutative up to [feq]). *)
+
+Require Import VST.floyd.proofauto.
+Require Import vcfloat.VCFloat.
+Require Import vcfloat.FPStdCompCert.
+From vcfloat Require Import FPStdLib.
+From LAProof.C Require Import floatlib.
+(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble,
+   so this does not pollute the VST namespace with ssreflect notations.  We do
+   NOT [Import common]: that would bring mathcomp's order/ring notations into
+   scope (making [<=]/[<] parse over [R] and breaking [ddot_model_step]).
+   Instead [common.BPLUS_comm] / [common.pos_zero] are used qualified below;
+   [BPLUS_mor] and the [feq] [Equivalence] come from vcfloat. *)
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+(** [ddot_model X Y] is the value the C loop *literally* computes: a left fold
+    with the accumulator as the first [BPLUS] operand, separate multiply then
+    add, starting from [+0.0].  This mirrors the Clight AST in [ddot.v] exactly. *)
+Definition ddot_loop (xy: list (ftype Tdouble * ftype Tdouble)) : ftype Tdouble :=
+  fold_left (fun acc p => BPLUS acc (BMULT (fst p) (snd p))) xy (Zconst Tdouble 0).
+
+Definition ddot_model (X Y: list (ftype Tdouble)) : ftype Tdouble :=
+  ddot_loop (combine X Y).
+
+(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense ddot). *)
+
+Lemma combine_app_eqlen {A B} (l1 l1': list A) (l2 l2': list B):
+  length l1 = length l2 ->
+  combine (l1 ++ l1') (l2 ++ l2') = combine l1 l2 ++ combine l1' l2'.
+Proof.
+  revert l2; induction l1; destruct l2; simpl; intros; try discriminate; auto.
+  f_equal; auto.
+Qed.
+
+Lemma ddot_loop_snoc: forall xy p,
+  ddot_loop (xy ++ [p]) = BPLUS (ddot_loop xy) (BMULT (fst p) (snd p)).
+Proof. intros. unfold ddot_loop. rewrite fold_left_app. reflexivity. Qed.
+
+(** *step*: extending both length-[k] prefixes [X[0..k-1]]/[Y[0..k-1]] with the
+    elements [X[k]]/[Y[k]] adds one [BMULT] term, with the accumulator as the
+    first [BPLUS] operand -- exactly the Clight statement [r = r + X[k]*Y[k]]. *)
+Lemma ddot_model_step: forall (X Y: list (ftype Tdouble)) k,
+  Zlength X = Zlength Y -> 0 <= k < Zlength X ->
+  ddot_model (sublist 0 (k+1) X) (sublist 0 (k+1) Y)
+  = BPLUS (ddot_model (sublist 0 k X) (sublist 0 k Y))
+          (BMULT (Znth k X) (Znth k Y)).
+Proof.
+  intros X Y k Hlen Hk. unfold ddot_model.
+  rewrite (sublist_last_1 0 k X) by lia.
+  rewrite (sublist_last_1 0 k Y) by lia.
+  rewrite combine_app_eqlen
+    by (apply Nat2Z.inj; rewrite <- !Zlength_correct, !Zlength_sublist by lia; lia).
+  cbn [combine].   (* combine [Znth k X] [Znth k Y] = [(Znth k X, Znth k Y)] *)
+  rewrite ddot_loop_snoc. reflexivity.
+Qed.
+
+(** *end*/bridge: the finished accumulation equals LAProof's accuracy model
+    [dotprodF] up to [feq].  They differ only by the [BPLUS] operand order
+    ([BPLUS acc prod] here vs [BPLUS prod acc] in [dotprodF]'s fold step), so the
+    proof is a generalized-accumulator induction using [common.BPLUS_comm]
+    (feq (BPLUS x y) (BPLUS y x)) and the [BPLUS_mor] feq-congruence, plus the
+    fact that [List.combine = seq.zip] on equal-length lists and that the two
+    initial zeros ([Zconst Tdouble 0] vs [pos_zero]) are [feq].  This is the one
+    remaining model obligation; once discharged, [dot_acc.dotprod_forward_error]
+    transfers to the C result. *)
+Lemma ddot_model_feq_dotprodF:
+  forall (X Y: list (ftype Tdouble)),
+    Zlength X = Zlength Y ->
+    Forall finite X -> Forall finite Y ->
+    feq (ddot_model X Y) (dotprodF X Y).
+Proof.
+  intros X Y Hlen _ _.
+  assert (Hl: length X = length Y)
+    by (apply Nat2Z.inj; rewrite <- !Zlength_correct; exact Hlen).
+  clear Hlen.
+  unfold ddot_model, ddot_loop, dotprodF, dotprod.
+  (* The two folds differ only by the initial accumulator ([Zconst Tdouble 0] vs
+     [pos_zero], which are [feq]) and the [BPLUS] operand order at each step.
+     Generalize both accumulators, keeping them [feq]-linked, and induct. *)
+  assert (Hs: feq (Zconst Tdouble 0) (@common.pos_zero Tdouble)) by reflexivity.
+  revert Y Hl Hs.
+  generalize (Zconst Tdouble 0).
+  generalize (@common.pos_zero Tdouble).
+  induction X as [|x X IH]; intros g f Y Hl Hs.
+  - (* X = nil (so Y = nil): both folds are the (feq-linked) initial accs. *)
+    destruct Y as [|y Y']; [ | simpl in Hl; discriminate ].
+    simpl. exact Hs.
+  - destruct Y as [|y Y']; simpl in Hl; [ discriminate | ].
+    simpl.
+    apply IH.
+    + congruence.
+    + (* one accumulation step: [BPLUS f prod] vs [flip BPLUS g prod = BPLUS prod g] *)
+      simpl. unfold Basics.flip.
+      etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
+Qed.
diff --git a/C/cblas/include/cblas.h b/C/cblas/include/cblas.h
@@ -0,0 +1,34 @@
+#define INDEX int
+#define OFFSET(N, incX) ((incX) > 0 ?  0 : ((N) - 1) * (-(incX)))
+#define BLAS_ERROR(x)  cblas_xerbla(0, __FILE__, x); 
+
+#define CONJUGATE(x) ((x) == CblasConjTrans)
+#define TRANSPOSE(x) ((x) == CblasTrans || (x) == CblasConjTrans)
+#define UPPER(x) ((x) == CblasUpper)
+#define LOWER(x) ((x) == CblasLower)
+
+/* Handling of packed complex types... */
+
+#define REAL(a,i) (((BASE *) a)[2*(i)])
+#define IMAG(a,i) (((BASE *) a)[2*(i)+1])
+
+#define REAL0(a) (((BASE *)a)[0])
+#define IMAG0(a) (((BASE *)a)[1])
+
+#define CONST_REAL(a,i) (((const BASE *) a)[2*(i)])
+#define CONST_IMAG(a,i) (((const BASE *) a)[2*(i)+1])
+
+#define CONST_REAL0(a) (((const BASE *)a)[0])
+#define CONST_IMAG0(a) (((const BASE *)a)[1])
+
+
+#define GB(KU,KL,lda,i,j) ((KU+1+(i-j))*lda + j)
+
+#define TRCOUNT(N,i) ((((i)+1)*(2*(N)-(i)))/2)
+
+/* #define TBUP(N,i,j) */
+/* #define TBLO(N,i,j) */
+
+#define TPUP(N,i,j) (TRCOUNT(N,(i)-1)+(j)-(i))
+#define TPLO(N,i,j) (((i)*((i)+1))/2 + (j))
+
diff --git a/C/cblas/scal_model.v b/C/cblas/scal_model.v
@@ -0,0 +1,37 @@
+(**  * LAProof.C.cblas.scal_model: functional model of GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *)
+
+(** This file provides [scal_model], the elementwise scaling that [cblas_dscal]
+    computes.  Scaling is a deterministic *elementwise* update with no
+    accumulation, so there is no loop invariant tracking a running accumulator
+    and no [feq] bridge -- the C program computes the model exactly.  GSL's
+    kernel ([source_scal_r.h]) is
+<<
+      for (i = 0; i < N; i++) { X[ix] *= alpha; ix += incX; }
+>>
+    i.e., [X[ix] *= alpha] is [BMULT (X[ix]) alpha] (element-first), so the model
+    is the list map below, matching what the C program computes.
+
+    LAProof's existing scale model [mv_mathcomp.scalemx] is alpha-first
+    ([map_mx (BMULT a)]) and matrix-based; relating to it would need a [BMULT]
+    commutativity lemma (not in LAProof) plus a vector/matrix reshape.  That is
+    unnecessary here: the per-element accuracy (relative error <= unit roundoff)
+    is exactly vcfloat's correctly-rounded-[BMULT] bound. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Definition scal_model (alpha: ftype Tdouble) (X: list (ftype Tdouble)) : list (ftype Tdouble) :=
+  map (fun x => BMULT x alpha) X.
+
+Lemma Zlength_scal_model: forall alpha X,
+  Zlength (scal_model alpha X) = Zlength X.
+Proof. intros. unfold scal_model. rewrite Zlength_map. reflexivity. Qed.
+
+Lemma Znth_scal_model: forall alpha X k,
+  0 <= k < Zlength X ->
+  Znth k (scal_model alpha X) = BMULT (Znth k X) alpha.
+Proof. intros. unfold scal_model. rewrite Znth_map by lia. reflexivity. Qed.