diff --git a/.gitignore b/.gitignore
index 7014311..9ad04a8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,9 @@ C/bandmat.v
C/build_csr.v
C/cholesky.v
C/densemat.v
+C/cblas/ddot.v
+C/cblas/dasum.v
+C/cblas/dscal.v
Makefile.coq
Makefile.coq.conf
CoqMakefile
diff --git a/C/cblas/README.md b/C/cblas/README.md
new file mode 100644
index 0000000..89d689a
--- /dev/null
+++ b/C/cblas/README.md
@@ -0,0 +1,105 @@
+# LAProof × GSL CBLAS
+
+This directory connects **real-world BLAS code** to LAProof's verified floating-point
+accuracy results.
+
+## Scope
+
+LAProof proves floating-point accuracy theorems (forward error bounds, etc.) about
+*functional models* of basic linear-algebra programs (idealized list/matrix functions
+that compute a result with a precisely specified sequence of IEEE-754 operations).
+Those theorems live in [`accuracy_proofs/`](../../accuracy_proofs/), independent of any
+particular C implementation.
+
+The goal of this directory is the other half of the story: take a **real-world BLAS
+implementation** and *prove* that the compiled C code implements one of those
+functional models. We use the CBLAS layer of the **GNU Scientific Library (GSL)** as the
+implementation under verification.
+
+The pipeline for each routine is:
+
+1. Take the GSL CBLAS source (under [`src/`](src/)) essentially verbatim, with only the
+ minimal edits needed to run CompCert's `clightgen` on it.
+2. Generate the Clight AST (a `*.v` file) with `clightgen`.
+3. Define the **functional model** the C loop literally computes (in a `*_model.v` file),
+ and prove it equals (up to `feq`) the LAProof accuracy model over which the error
+ theorems are stated.
+4. Write a VST **funspec** (in a `spec_*.v` file) whose postcondition relates the returned
+ value to the LAProof model.
+5. **Verify** the C function against that funspec with VST (in a `verif_*.v` file).
+
+Once a routine is verified against a funspec phrased in terms of a LAProof model, the
+corresponding accuracy theorem can be transferred to the value computed by the compiled C
+code. The verification does *not* prove the accuracy theorem's preconditions on its own,
+however. To apply the bound to the C result, one must separately establish that those
+preconditions hold, and they fall into two kinds.
+
+- **Already guaranteed by the funspec precondition.** Some of the theorem's preconditions
+ are exactly things the funspec `PRE` already requires, so they hold for free (e.g. a
+ length precondition that the funspec already imposes on the input arrays).
+- **Additional conditions one must discharge.** Other preconditions are not established by
+ the funspec and must be proved on the side. A typical example is a no-overflow assumption
+ that the accumulated result is finite: this does *not* follow from the funspec merely
+ requiring the inputs to be finite, since a combination of finite values can still
+ overflow to an infinity. Whoever applies the bound must supply this separately.
+
+So the honest statement is: for inputs satisfying both the funspec `PRE` and the accuracy
+theorem's own hypotheses, the bound holds of the C return value. In many cases (such as the
+dot products and other reduction loops verified here) the one hypothesis the funspec does
+not already supply is a *no-overflow* condition: that the accumulated result is finite.
+
+For `cblas_ddot` that leftover precondition is exactly `Hfin`: that the accumulated result
+`dotprodF X Y` is finite. It does not follow merely from the inputs being finite, since a
+sum of finite products can still round to an infinity.
+
+The `feq` postcondition cooperates here: the funspec only guarantees that the C result is
+`feq`-equal to the model value (equality up to `±0` and exceptional values), but once the result is finite,
+`feq` upgrades to real-value equality, so the bound on the model value is literally a bound
+on the compiled C output.
+
+When stating the accuracy corollary for the C function, you choose how to carry `Hfin`:
+
+- **Conditional corollary.** Keep `Hfin` as an explicit hypothesis. The bound then holds
+ whenever the result is in fact finite, and whoever applies the corollary discharges
+ `Hfin` for their particular inputs. (`Hfin` is a proof obligation about the result,
+ not a runtime check in the C code.)
+- **Unconditional corollary.** Strengthen the funspec `PRE` with magnitude/length bounds
+ that imply finiteness. LAProof's [`finite_sum_from_bounded`](../../accuracy_proofs/sum_is_finite.v)
+ and the `fun_bnd_le` lemmas turn such bounds into a proof of `Hfin`, so the corollary
+ carries no extra hypothesis.
+
+### GSL CBLAS upstream
+
+- GSL project home:
+- GSL source repository (Savannah git):
+- CBLAS directory in the repo:
+- Shared BLAS kernels (`source_*.h`) used by the CBLAS routines:
+
+GSL is licensed under the GNU GPL; the files under [`src/`](src/) retain their upstream
+copyright and license headers.
+
+## What we've verified so far
+
+| Operation | Precision | Function | Files |
+|-----------|-----------|----------|-------|
+| Dot product | double | `cblas_ddot` | [`src/ddot.c`](src/ddot.c), [`src/source_dot_r.h`](src/source_dot_r.h), [`include/cblas.h`](include/cblas.h), [`ddot.v`](ddot.v), [`ddot_model.v`](ddot_model.v), [`spec_ddot.v`](spec_ddot.v), [`verif_ddot.v`](verif_ddot.v) |
+| Sum of absolute values | double | `cblas_dasum` | [`src/dasum.c`](src/dasum.c), [`src/source_asum_r.h`](src/source_asum_r.h), [`include/cblas.h`](include/cblas.h), [`dasum.v`](dasum.v), [`asum_model.v`](asum_model.v), [`spec_dasum.v`](spec_dasum.v), [`verif_dasum.v`](verif_dasum.v) |
+| Scalar multiply (in place) | double | `cblas_dscal` | [`src/dscal.c`](src/dscal.c), [`src/source_scal_r.h`](src/source_scal_r.h), [`include/cblas.h`](include/cblas.h), [`dscal.v`](dscal.v), [`scal_model.v`](scal_model.v), [`spec_dscal.v`](spec_dscal.v), [`verif_dscal.v`](verif_dscal.v) |
+
+**Scope limits (`cblas_ddot`, `cblas_dasum`, `cblas_dscal`):**
+- **Unit stride only** (`incX = incY = 1`), as a deliberate first milestone.
+- For the **reductions** (`ddot`, `dasum`) the postcondition is stated up to `feq`.
+- `cblas_dasum` is verified against `sumF (map BABS X)`; its loop body calls `fabs`,
+ discharged with VSTlib's `fabs_spec` (whose result is `BABS`). The matching accuracy
+ theorem is `sum_acc.fSUM`.
+- `cblas_dscal` is the first **in-place** routine: it overwrites the array (writable share,
+ `void` return) and is verified against an *exact* model `scal_model alpha X =
+ map (fun x => BMULT x alpha) X` (no `feq` required here). The
+ per-element accuracy is a direct unit-roundoff bound on `BMULT`.
+
+## Conventions
+
+This directory mirrors the model/spec/verif split used elsewhere in
+[`C/`](../) (e.g. the sparse routines' `sparse_model.v` / `spec_sparse.v` /
+`verif_sparse.v`): pure model functions and their lemmas in `*_model.v`, VST funspecs in
+`spec_*.v`, and `semax_body` proofs in `verif_*.v`.
diff --git a/C/cblas/asum_model.v b/C/cblas/asum_model.v
new file mode 100644
index 0000000..163f505
--- /dev/null
+++ b/C/cblas/asum_model.v
@@ -0,0 +1,83 @@
+(** * LAProof.C.cblas.asum_model: functional model of GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *)
+
+(** This file provides [asum_model], the partial accumulation that the
+ [cblas_dasum] loop invariant tracks, with the *step* lemma (how one loop
+ iteration extends the accumulation) and the *end* lemma (the result of the
+ accumulation once the loop finishes). GSL's kernel ([source_asum_r.h]) is
+<<
+ double r = 0.0;
+ for (i = 0; i < N; i++) { r += fabs(X[ix]); ix += incX; }
+ return r;
+>>
+ i.e., it adds [|X[i]|] left to right into the accumulator ([BPLUS acc |x|]),
+ starting from [+0.0]. The C [fabs] computes [BABS] (the result of VSTlib's
+ [fabs_spec], whose [ff_func] is [BABS]), so the model uses [BABS].
+
+ The matching LAProof accuracy model is [sum_model.sumF] applied to
+ [map BABS X] (the absolute values); over it [sum_acc.fSUM] proves the
+ forward-error bound. The C's operand order ([BPLUS acc x]
+ vs [sumF]'s [BPLUS x acc]) and initial accumulator ([+0.0] = [Zconst Tdouble 0]
+ vs [sumF]'s [neg_zero]) differ, so the end/bridge lemma relates the two by
+ [feq] (IEEE-equality identifying [±0]/NaN). *)
+
+Require Import VST.floyd.proofauto.
+Require Import vcfloat.VCFloat.
+Require Import vcfloat.FPStdCompCert.
+From vcfloat Require Import FPStdLib.
+From LAProof.C Require Import floatlib.
+(* [sum_model] only [Require Import]s (not [Export]s) the mathcomp preamble, so
+ this does not pollute the VST namespace with ssreflect notations. We do NOT
+ [Import common] (it would re-scope [<=]/[<] over [R]); [common.BPLUS_comm] /
+ [common.neg_zero] are used qualified. [BPLUS_mor], [BABS], the [feq]
+ [Equivalence] come from vcfloat. *)
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+(** [asum_model X] is the value the C loop *literally* computes: a left fold with
+ the accumulator as the first [BPLUS] operand, separate abs-then-add, starting
+ from [+0.0]. This mirrors the Clight AST in [dasum.v] exactly. *)
+Definition asum_loop (l: list (ftype Tdouble)) : ftype Tdouble :=
+ fold_left (fun acc x => BPLUS acc (BABS x)) l (Zconst Tdouble 0).
+
+Definition asum_model (X: list (ftype Tdouble)) : ftype Tdouble := asum_loop X.
+
+(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense asum). *)
+
+Lemma asum_loop_snoc: forall l x,
+ asum_loop (l ++ [x]) = BPLUS (asum_loop l) (BABS x).
+Proof. intros. unfold asum_loop. rewrite fold_left_app. reflexivity. Qed.
+
+(** *step*: extending the length-[k] prefix [X[0..k-1]] with the element [X[k]]
+ adds one [BABS] term, with the accumulator as the first [BPLUS] operand --
+ exactly the Clight statement [r = r + fabs(X[k])]. *)
+Lemma asum_model_step: forall (X: list (ftype Tdouble)) k,
+ 0 <= k < Zlength X ->
+ asum_model (sublist 0 (k+1) X)
+ = BPLUS (asum_model (sublist 0 k X)) (BABS (Znth k X)).
+Proof.
+ intros X k Hk. unfold asum_model.
+ rewrite (sublist_last_1 0 k X) by lia.
+ rewrite asum_loop_snoc. reflexivity.
+Qed.
+
+(** *end*/bridge: the finished accumulation equals LAProof's summation model
+ [sumF (map BABS X)] up to [feq]. Generalized-accumulator induction using
+ [common.BPLUS_comm] and [BPLUS_mor], with the [feq (+0.0) (-0.0)] base case
+ ([sumF] starts at [neg_zero]). *)
+Lemma asum_model_feq_sumF: forall (X: list (ftype Tdouble)),
+ feq (asum_model X) (sumF (map BABS X)).
+Proof.
+ intros X. unfold asum_model, asum_loop, sumF.
+ assert (Hs: feq (Zconst Tdouble 0) (@common.neg_zero Tdouble)) by reflexivity.
+ revert Hs.
+ generalize (Zconst Tdouble 0).
+ generalize (@common.neg_zero Tdouble).
+ induction X as [|x X IH]; intros g f Hs.
+ - (* X = nil: both folds are the (feq-linked) initial accumulators. *)
+ simpl. exact Hs.
+ - simpl. apply IH.
+ unfold Basics.flip.
+ etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
+Qed.
diff --git a/C/cblas/ddot_model.v b/C/cblas/ddot_model.v
new file mode 100644
index 0000000..ad41b49
--- /dev/null
+++ b/C/cblas/ddot_model.v
@@ -0,0 +1,122 @@
+(** * LAProof.C.cblas.ddot_model: functional model of GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *)
+
+(** This file provides [ddot_model], the partial accumulation that the
+ [cblas_ddot] loop invariant tracks, with the *step* lemma (how one loop
+ iteration extends the accumulation) and the *end* lemma (the result of the
+ accumulation once the loop finishes). No VST [funspec]s or Clight ASTs
+ appear here -- those live in [spec_ddot] and [verif_ddot]. GSL's loop
+ (after macro expansion of [source_dot_r.h]) is
+<<
+ double r = 0.0;
+ for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; }
+ return r;
+>>
+ i.e., it adds [X[i]*Y[i]] left to right into the accumulator
+ ([BPLUS acc prod]), using separate multiply-then-add (not fused
+ multiply-add), starting from [+0.0].
+
+ The matching LAProof accuracy model is [dotprodF]
+ ([dotprodF = dotprod BMULT BPLUS pos_zero]) of
+ [LAProof.accuracy_proofs.dotprod_model]; over it
+ [dot_acc.dotprod_forward_error] proves the forward-error bound -- *not* the
+ FMA-based [floatlib.dotprod]. The C's operand order ([BPLUS acc prod] vs
+ [dotprodF]'s [BPLUS prod acc]) and initial accumulator ([+0.0] =
+ [Zconst Tdouble 0] vs [dotprodF]'s [pos_zero]) differ, so the end/bridge
+ lemma relates the two by [feq] (IEEE addition is commutative up to [feq]). *)
+
+Require Import VST.floyd.proofauto.
+Require Import vcfloat.VCFloat.
+Require Import vcfloat.FPStdCompCert.
+From vcfloat Require Import FPStdLib.
+From LAProof.C Require Import floatlib.
+(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble,
+ so this does not pollute the VST namespace with ssreflect notations. We do
+ NOT [Import common]: that would bring mathcomp's order/ring notations into
+ scope (making [<=]/[<] parse over [R] and breaking [ddot_model_step]).
+ Instead [common.BPLUS_comm] / [common.pos_zero] are used qualified below;
+ [BPLUS_mor] and the [feq] [Equivalence] come from vcfloat. *)
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+(** [ddot_model X Y] is the value the C loop *literally* computes: a left fold
+ with the accumulator as the first [BPLUS] operand, separate multiply then
+ add, starting from [+0.0]. This mirrors the Clight AST in [ddot.v] exactly. *)
+Definition ddot_loop (xy: list (ftype Tdouble * ftype Tdouble)) : ftype Tdouble :=
+ fold_left (fun acc p => BPLUS acc (BMULT (fst p) (snd p))) xy (Zconst Tdouble 0).
+
+Definition ddot_model (X Y: list (ftype Tdouble)) : ftype Tdouble :=
+ ddot_loop (combine X Y).
+
+(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense ddot). *)
+
+Lemma combine_app_eqlen {A B} (l1 l1': list A) (l2 l2': list B):
+ length l1 = length l2 ->
+ combine (l1 ++ l1') (l2 ++ l2') = combine l1 l2 ++ combine l1' l2'.
+Proof.
+ revert l2; induction l1; destruct l2; simpl; intros; try discriminate; auto.
+ f_equal; auto.
+Qed.
+
+Lemma ddot_loop_snoc: forall xy p,
+ ddot_loop (xy ++ [p]) = BPLUS (ddot_loop xy) (BMULT (fst p) (snd p)).
+Proof. intros. unfold ddot_loop. rewrite fold_left_app. reflexivity. Qed.
+
+(** *step*: extending both length-[k] prefixes [X[0..k-1]]/[Y[0..k-1]] with the
+ elements [X[k]]/[Y[k]] adds one [BMULT] term, with the accumulator as the
+ first [BPLUS] operand -- exactly the Clight statement [r = r + X[k]*Y[k]]. *)
+Lemma ddot_model_step: forall (X Y: list (ftype Tdouble)) k,
+ Zlength X = Zlength Y -> 0 <= k < Zlength X ->
+ ddot_model (sublist 0 (k+1) X) (sublist 0 (k+1) Y)
+ = BPLUS (ddot_model (sublist 0 k X) (sublist 0 k Y))
+ (BMULT (Znth k X) (Znth k Y)).
+Proof.
+ intros X Y k Hlen Hk. unfold ddot_model.
+ rewrite (sublist_last_1 0 k X) by lia.
+ rewrite (sublist_last_1 0 k Y) by lia.
+ rewrite combine_app_eqlen
+ by (apply Nat2Z.inj; rewrite <- !Zlength_correct, !Zlength_sublist by lia; lia).
+ cbn [combine]. (* combine [Znth k X] [Znth k Y] = [(Znth k X, Znth k Y)] *)
+ rewrite ddot_loop_snoc. reflexivity.
+Qed.
+
+(** *end*/bridge: the finished accumulation equals LAProof's accuracy model
+ [dotprodF] up to [feq]. They differ only by the [BPLUS] operand order
+ ([BPLUS acc prod] here vs [BPLUS prod acc] in [dotprodF]'s fold step), so the
+ proof is a generalized-accumulator induction using [common.BPLUS_comm]
+ (feq (BPLUS x y) (BPLUS y x)) and the [BPLUS_mor] feq-congruence, plus the
+ fact that [List.combine = seq.zip] on equal-length lists and that the two
+ initial zeros ([Zconst Tdouble 0] vs [pos_zero]) are [feq]. This is the one
+ remaining model obligation; once discharged, [dot_acc.dotprod_forward_error]
+ transfers to the C result. *)
+Lemma ddot_model_feq_dotprodF:
+ forall (X Y: list (ftype Tdouble)),
+ Zlength X = Zlength Y ->
+ Forall finite X -> Forall finite Y ->
+ feq (ddot_model X Y) (dotprodF X Y).
+Proof.
+ intros X Y Hlen _ _.
+ assert (Hl: length X = length Y)
+ by (apply Nat2Z.inj; rewrite <- !Zlength_correct; exact Hlen).
+ clear Hlen.
+ unfold ddot_model, ddot_loop, dotprodF, dotprod.
+ (* The two folds differ only by the initial accumulator ([Zconst Tdouble 0] vs
+ [pos_zero], which are [feq]) and the [BPLUS] operand order at each step.
+ Generalize both accumulators, keeping them [feq]-linked, and induct. *)
+ assert (Hs: feq (Zconst Tdouble 0) (@common.pos_zero Tdouble)) by reflexivity.
+ revert Y Hl Hs.
+ generalize (Zconst Tdouble 0).
+ generalize (@common.pos_zero Tdouble).
+ induction X as [|x X IH]; intros g f Y Hl Hs.
+ - (* X = nil (so Y = nil): both folds are the (feq-linked) initial accs. *)
+ destruct Y as [|y Y']; [ | simpl in Hl; discriminate ].
+ simpl. exact Hs.
+ - destruct Y as [|y Y']; simpl in Hl; [ discriminate | ].
+ simpl.
+ apply IH.
+ + congruence.
+ + (* one accumulation step: [BPLUS f prod] vs [flip BPLUS g prod = BPLUS prod g] *)
+ simpl. unfold Basics.flip.
+ etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
+Qed.
diff --git a/C/cblas/include/cblas.h b/C/cblas/include/cblas.h
new file mode 100644
index 0000000..b478671
--- /dev/null
+++ b/C/cblas/include/cblas.h
@@ -0,0 +1,34 @@
+#define INDEX int
+#define OFFSET(N, incX) ((incX) > 0 ? 0 : ((N) - 1) * (-(incX)))
+#define BLAS_ERROR(x) cblas_xerbla(0, __FILE__, x);
+
+#define CONJUGATE(x) ((x) == CblasConjTrans)
+#define TRANSPOSE(x) ((x) == CblasTrans || (x) == CblasConjTrans)
+#define UPPER(x) ((x) == CblasUpper)
+#define LOWER(x) ((x) == CblasLower)
+
+/* Handling of packed complex types... */
+
+#define REAL(a,i) (((BASE *) a)[2*(i)])
+#define IMAG(a,i) (((BASE *) a)[2*(i)+1])
+
+#define REAL0(a) (((BASE *)a)[0])
+#define IMAG0(a) (((BASE *)a)[1])
+
+#define CONST_REAL(a,i) (((const BASE *) a)[2*(i)])
+#define CONST_IMAG(a,i) (((const BASE *) a)[2*(i)+1])
+
+#define CONST_REAL0(a) (((const BASE *)a)[0])
+#define CONST_IMAG0(a) (((const BASE *)a)[1])
+
+
+#define GB(KU,KL,lda,i,j) ((KU+1+(i-j))*lda + j)
+
+#define TRCOUNT(N,i) ((((i)+1)*(2*(N)-(i)))/2)
+
+/* #define TBUP(N,i,j) */
+/* #define TBLO(N,i,j) */
+
+#define TPUP(N,i,j) (TRCOUNT(N,(i)-1)+(j)-(i))
+#define TPLO(N,i,j) (((i)*((i)+1))/2 + (j))
+
diff --git a/C/cblas/scal_model.v b/C/cblas/scal_model.v
new file mode 100644
index 0000000..c8ea503
--- /dev/null
+++ b/C/cblas/scal_model.v
@@ -0,0 +1,37 @@
+(** * LAProof.C.cblas.scal_model: functional model of GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *)
+
+(** This file provides [scal_model], the elementwise scaling that [cblas_dscal]
+ computes. Scaling is a deterministic *elementwise* update with no
+ accumulation, so there is no loop invariant tracking a running accumulator
+ and no [feq] bridge -- the C program computes the model exactly. GSL's
+ kernel ([source_scal_r.h]) is
+<<
+ for (i = 0; i < N; i++) { X[ix] *= alpha; ix += incX; }
+>>
+ i.e., [X[ix] *= alpha] is [BMULT (X[ix]) alpha] (element-first), so the model
+ is the list map below, matching what the C program computes.
+
+ LAProof's existing scale model [mv_mathcomp.scalemx] is alpha-first
+ ([map_mx (BMULT a)]) and matrix-based; relating to it would need a [BMULT]
+ commutativity lemma (not in LAProof) plus a vector/matrix reshape. That is
+ unnecessary here: the per-element accuracy (relative error <= unit roundoff)
+ is exactly vcfloat's correctly-rounded-[BMULT] bound. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Definition scal_model (alpha: ftype Tdouble) (X: list (ftype Tdouble)) : list (ftype Tdouble) :=
+ map (fun x => BMULT x alpha) X.
+
+Lemma Zlength_scal_model: forall alpha X,
+ Zlength (scal_model alpha X) = Zlength X.
+Proof. intros. unfold scal_model. rewrite Zlength_map. reflexivity. Qed.
+
+Lemma Znth_scal_model: forall alpha X k,
+ 0 <= k < Zlength X ->
+ Znth k (scal_model alpha X) = BMULT (Znth k X) alpha.
+Proof. intros. unfold scal_model. rewrite Znth_map by lia. reflexivity. Qed.
diff --git a/C/cblas/spec_dasum.v b/C/cblas/spec_dasum.v
new file mode 100644
index 0000000..8c2d6f7
--- /dev/null
+++ b/C/cblas/spec_dasum.v
@@ -0,0 +1,39 @@
+(** * LAProof.C.cblas.spec_dasum: VST specification of GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *)
+
+(** Sibling of [LAProof.C.cblas.spec_ddot]. [cblas_dasum] computes the sum of
+ absolute values of a vector; the postcondition relates its result (up to
+ [feq]) to LAProof's summation model [sumF] applied to [map BABS X], over
+ which [sum_acc.fSUM] proves the forward-error bound. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From VSTlib Require Import spec_math.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dasum asum_model.
+(* for [sumF]; [Require Import] (not [Export]) keeps ssreflect out of scope. *)
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined.
+Definition Vprog : varspecs. mk_varspecs prog. Defined.
+
+(** ** Funspec for [cblas_dasum] (unit-stride case [incX = 1]). *)
+Definition cblas_dasum_spec :=
+ DECLARE _cblas_dasum
+ WITH shX: share, n: Z, X: list (ftype Tdouble), px: val
+ PRE [ tint, tptr tdouble, tint ]
+ PROP (readable_share shX; Zlength X = n; 0 <= n <= Int.max_signed;
+ Forall finite X)
+ PARAMS (Vint (Int.repr n); px; Vint (Int.repr 1))
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px)
+ POST [ tdouble ]
+ EX s: ftype Tdouble,
+ PROP (feq s (sumF (map BABS X)))
+ RETURN (Vfloat s)
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px).
+
+Definition DasumASI : funspecs := [ cblas_dasum_spec ].
+Definition dasum_imports : funspecs := [ fabs_spec ]. (* fabs is the one external call *)
+Definition Gprog : funspecs := dasum_imports ++ DasumASI.
diff --git a/C/cblas/spec_ddot.v b/C/cblas/spec_ddot.v
new file mode 100644
index 0000000..c3ace94
--- /dev/null
+++ b/C/cblas/spec_ddot.v
@@ -0,0 +1,67 @@
+(** * LAProof.C.cblas.spec_ddot: VST specification of GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *)
+
+(** This file connects a real-world BLAS implementation -- GSL's double-precision
+ dot product [cblas_ddot] -- to LAProof's functional dot-product model.
+
+ GSL's loop (after macro expansion of [source_dot_r.h]) is
+<<
+ double r = 0.0;
+ for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; }
+ return r;
+>>
+ i.e. a forward, left-to-right accumulation using *separate* multiply-then-add
+ (not fused multiply-add), starting from +0.0. That is exactly the
+ [dotprodF] model of [LAProof.accuracy_proofs.dotprod_model]
+ ([dotprodF = dotprod BMULT BPLUS pos_zero]), over which the forward-error
+ theorem [dot_acc.dotprod_forward_error] is proved -- *not* the FMA-based
+ [floatlib.dotprod].
+
+ Note on operand order: the C statement [r += X*Y] computes [BPLUS acc prod]
+ (accumulator first), whereas [dotprodF]'s fold step is [BPLUS prod acc]
+ (product first). IEEE addition is commutative *up to [feq]*, so we state the
+ postcondition with [feq] -- the same convention used by
+ [LAProof.C.spec_sparse] for [csr_row_vector_multiply_spec]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import ddot ddot_model.
+(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble,
+ so this does not pollute the VST namespace with ssreflect notations. We need
+ it here for [dotprodF], which the postcondition is stated against. *)
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined.
+Definition Vprog : varspecs. mk_varspecs prog. Defined.
+
+(** The partial-accumulation model [ddot_model] and its start/step/end lemmas
+ live in [LAProof.C.cblas.ddot_model] (mirroring the [sparse_model] /
+ [spec_sparse] split). This file states only the VST [funspec]. *)
+
+(** ** Funspec for [cblas_ddot] (unit-stride case [incX = incY = 1]). *)
+(** Unit stride is a deliberate first milestone, not a fundamental limit. *)
+Definition cblas_ddot_spec :=
+ DECLARE _cblas_ddot
+ WITH shX: share, shY: share, n: Z,
+ X: list (ftype Tdouble), Y: list (ftype Tdouble),
+ px: val, py: val
+ PRE [ tint, tptr tdouble, tint, tptr tdouble, tint ]
+ PROP (readable_share shX; readable_share shY;
+ Zlength X = n; Zlength Y = n; 0 <= n <= Int.max_signed;
+ Forall finite X; Forall finite Y)
+ PARAMS (Vint (Int.repr n); px; Vint (Int.repr 1); py; Vint (Int.repr 1))
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+ data_at shY (tarray tdouble n) (map Vfloat Y) py)
+ POST [ tdouble ]
+ EX s: ftype Tdouble,
+ PROP (feq s (dotprodF X Y))
+ RETURN (Vfloat s)
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+ data_at shY (tarray tdouble n) (map Vfloat Y) py).
+
+Definition DdotASI : funspecs := [ cblas_ddot_spec ].
+Definition ddot_imports : funspecs := []. (* no external calls: the loop is self-contained *)
+Definition Gprog : funspecs := ddot_imports ++ DdotASI.
diff --git a/C/cblas/spec_dscal.v b/C/cblas/spec_dscal.v
new file mode 100644
index 0000000..c2db868
--- /dev/null
+++ b/C/cblas/spec_dscal.v
@@ -0,0 +1,35 @@
+(** * LAProof.C.cblas.spec_dscal: VST specification of GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *)
+
+(** [cblas_dscal] scales a vector in place by [alpha]. Because scaling is
+ deterministic elementwise, the postcondition is stated as an *exact* equality
+ (the array is left holding [scal_model alpha X]); no [feq]/[EX] is needed,
+ unlike the reduction routines [ddot]/[dasum]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dscal scal_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined.
+Definition Vprog : varspecs. mk_varspecs prog. Defined.
+
+(** ** Funspec for [cblas_dscal] (unit-stride case [incX = 1]). *)
+(** Unit stride is a first milestone, not a fundamental limit.*)
+Definition cblas_dscal_spec :=
+ DECLARE _cblas_dscal
+ WITH sh: share, n: Z, alpha: ftype Tdouble, X: list (ftype Tdouble), px: val
+ PRE [ tint, tdouble, tptr tdouble, tint ]
+ PROP (writable_share sh; Zlength X = n; 0 <= n <= Int.max_signed)
+ PARAMS (Vint (Int.repr n); Vfloat alpha; px; Vint (Int.repr 1))
+ SEP (data_at sh (tarray tdouble n) (map Vfloat X) px)
+ POST [ tvoid ]
+ PROP ()
+ RETURN ()
+ SEP (data_at sh (tarray tdouble n) (map Vfloat (scal_model alpha X)) px).
+
+Definition DscalASI : funspecs := [ cblas_dscal_spec ].
+Definition dscal_imports : funspecs := []. (* no external calls *)
+Definition Gprog : funspecs := dscal_imports ++ DscalASI.
diff --git a/C/cblas/src/dasum.c b/C/cblas/src/dasum.c
new file mode 100644
index 0000000..ec02c99
--- /dev/null
+++ b/C/cblas/src/dasum.c
@@ -0,0 +1,24 @@
+//ldoc on
+/**
+ * # `dasum.c`: GSL CBLAS real double-precision sum of absolute values
+ *
+ * Thin wrapper that instantiates the generic kernel `source_asum_r.h` at type
+ * `double`. Verified in `LAProof.C.cblas.verif_dasum` against the functional
+ * model `LAProof.C.cblas.asum_model`.
+ */
+/* The following GSL headers are commented out for clightgen: they are
+ unresolvable here (GSL's gsl/ symlink dir is not generated) and contribute
+ nothing to cblas_dasum's body beyond the libc fabs, which we obtain directly
+ from (as the other LAProof C sources do, e.g. densemat.c). */
+/* #include */
+/* #include */
+#include
+#include "cblas.h"
+
+double
+cblas_dasum (const int N, const double *X, const int incX)
+{
+#define BASE double
+#include "source_asum_r.h"
+#undef BASE
+}
diff --git a/C/cblas/src/ddot.c b/C/cblas/src/ddot.c
new file mode 100644
index 0000000..45e8c19
--- /dev/null
+++ b/C/cblas/src/ddot.c
@@ -0,0 +1,27 @@
+//ldoc on
+/**
+ * # `ddot.c`: GSL CBLAS real double-precision dot product
+ *
+ * Thin wrapper that instantiates the generic kernel `source_dot_r.h` at type
+ * `double`. Verified in `LAProof.C.cblas.verif_ddot` against the functional
+ * model `LAProof.C.cblas.ddot_model`.
+ */
+/* The following GSL headers are commented out for clightgen: they are
+ unused by cblas_ddot's body (only OFFSET/INDEX from "cblas.h" are needed)
+ and are unresolvable here because GSL's gsl/ symlink dir is not generated. */
+/* #include */
+/* #include */
+#include "cblas.h"
+
+double
+cblas_ddot (const int N, const double *X, const int incX, const double *Y,
+ const int incY)
+{
+#define INIT_VAL 0.0
+#define ACC_TYPE double
+#define BASE double
+#include "source_dot_r.h"
+#undef ACC_TYPE
+#undef BASE
+#undef INIT_VAL
+}
diff --git a/C/cblas/src/dscal.c b/C/cblas/src/dscal.c
new file mode 100644
index 0000000..4b2e0e6
--- /dev/null
+++ b/C/cblas/src/dscal.c
@@ -0,0 +1,23 @@
+//ldoc on
+/**
+ * # `dscal.c`: GSL CBLAS real double-precision scaling (in place)
+ *
+ * Thin wrapper that instantiates the generic kernel `source_scal_r.h` at type
+ * `double`. Verified in `LAProof.C.cblas.verif_dscal` against the functional
+ * model `LAProof.C.cblas.scal_model`.
+ */
+/* The following GSL headers are commented out for clightgen: they are
+ unresolvable here (GSL's gsl/ symlink dir is not generated) and contribute
+ nothing to cblas_dscal's body (only INDEX from "cblas.h" is needed; the loop
+ uses no libc math). */
+/* #include */
+/* #include */
+#include "cblas.h"
+
+void
+cblas_dscal (const int N, const double alpha, double *X, const int incX)
+{
+#define BASE double
+#include "source_scal_r.h"
+#undef BASE
+}
diff --git a/C/cblas/src/source_asum_r.h b/C/cblas/src/source_asum_r.h
new file mode 100644
index 0000000..65a3515
--- /dev/null
+++ b/C/cblas/src/source_asum_r.h
@@ -0,0 +1,43 @@
+/* blas/source_asum_r.h
+ *
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+//ldoc on
+/**
+ * ## Accumulation kernel (`source_asum_r.h`)
+ *
+ * A forward, left-to-right accumulation of `fabs(X[ix])` into `r`, starting
+ * from `+0.0`. The `incX <= 0` guard returns `0` for nonpositive stride; with
+ * unit stride (`incX = 1`) it is dead and `ix` advances one element per
+ * iteration.
+ */
+{
+ BASE r = 0.0;
+ INDEX i;
+ INDEX ix = 0;
+
+ if (incX <= 0) {
+ return 0;
+ }
+
+ for (i = 0; i < N; i++) {
+ r += fabs(X[ix]);
+ ix += incX;
+ }
+ return r;
+}
diff --git a/C/cblas/src/source_dot_r.h b/C/cblas/src/source_dot_r.h
new file mode 100644
index 0000000..0775c77
--- /dev/null
+++ b/C/cblas/src/source_dot_r.h
@@ -0,0 +1,41 @@
+/* blas/source_dot_r.h
+ *
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+//ldoc on
+/**
+ * ## Accumulation kernel (`source_dot_r.h`)
+ *
+ * A forward, left-to-right accumulation with separate multiply-then-add (not
+ * fused). With unit stride (`incX = incY = 1`) the `OFFSET` conditionals are
+ * `0` and the indices `ix`, `iy` advance one element per iteration.
+ */
+{
+ ACC_TYPE r = INIT_VAL;
+ INDEX i;
+ INDEX ix = OFFSET(N, incX);
+ INDEX iy = OFFSET(N, incY);
+
+ for (i = 0; i < N; i++) {
+ r += X[ix] * Y[iy];
+ ix += incX;
+ iy += incY;
+ }
+
+ return r;
+}
diff --git a/C/cblas/src/source_scal_r.h b/C/cblas/src/source_scal_r.h
new file mode 100644
index 0000000..4bc0fe9
--- /dev/null
+++ b/C/cblas/src/source_scal_r.h
@@ -0,0 +1,40 @@
+/* blas/source_scal_r.h
+ *
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+//ldoc on
+/**
+ * ## Scaling kernel (`source_scal_r.h`)
+ *
+ * An in-place elementwise update `X[ix] *= alpha` with no accumulation. The
+ * `incX <= 0` guard returns early for nonpositive stride; with unit stride
+ * (`incX = 1`) it is dead and `ix` advances one element per iteration.
+ */
+{
+ INDEX i;
+ INDEX ix = 0;
+
+ if (incX <= 0) {
+ return;
+ }
+
+ for (i = 0; i < N; i++) {
+ X[ix] *= alpha;
+ ix += incX;
+ }
+}
diff --git a/C/cblas/verif_dasum.v b/C/cblas/verif_dasum.v
new file mode 100644
index 0000000..0938c66
--- /dev/null
+++ b/C/cblas/verif_dasum.v
@@ -0,0 +1,60 @@
+(** * LAProof.C.cblas.verif_dasum: VST proof for GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c]. *)
+
+(** [body_cblas_dasum] proves that the compiled C [cblas_dasum] (unit stride)
+ refines LAProof's summation model: its result is [feq]-equal to
+ [sumF (map BABS X)]. The loop invariant carries the partial accumulation
+ [asum_model (sublist 0 k X)]; the step and bridge reasoning that discharges
+ it lives in [LAProof.C.cblas.asum_model]. The loop body makes an [fabs]
+ [forward_call], and a dead [if (incX<=0) return 0;] guard precedes the loop
+ (unreachable since [incX = 1]). *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From VSTlib Require Import spec_math.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dasum asum_model spec_dasum.
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Lemma body_cblas_dasum: semax_body Vprog Gprog f_cblas_dasum cblas_dasum_spec.
+Proof.
+start_function.
+forward. (* r = 0.0 *)
+forward. (* ix = 0 *)
+(* if (incX <= 0) return 0; -- dead, incX = 1 *)
+forward_if.
+{ rep_lia. } (* then-branch: incX <= 0 contradicts incX = 1 *)
+(* accumulation loop *)
+forward_for_simple_bound (Zlength X)
+ (EX k:Z,
+ PROP ()
+ LOCAL (temp _r (Vfloat (asum_model (sublist 0 k X)));
+ temp _ix (Vint (Int.repr k));
+ temp _N (Vint (Int.repr (Zlength X)));
+ temp _incX (Vint (Int.repr 1));
+ temp _X px)
+ SEP (data_at shX (tarray tdouble (Zlength X)) (map Vfloat X) px))%assert.
+- (* entry (k = 0): asum_model nil = +0.0 *)
+ entailer!!.
+- (* body: t = X[ix]; r += fabs(t); ix += incX *)
+ forward. (* t'2 = X[ix] *)
+ forward_call (Znth i X). (* t'1 = fabs(X[i]) = BABS (Znth i X) *)
+ forward. (* r = r + t'1 *)
+ forward. (* ix = ix + incX *)
+ entailer!!.
+ rewrite (asum_model_step X i) by list_solve.
+ reflexivity.
+- (* exit (k = Zlength X): return r = asum_model X *)
+ forward. (* return r *)
+ Exists (asum_model (sublist 0 (Zlength X) X)).
+ entailer!!.
+ rewrite sublist_same by lia.
+ apply asum_model_feq_sumF.
+Qed.
+
+(** Payoff (stub): because the result is [feq]-equal to [sumF (map BABS X)], the
+ forward-error bound [LAProof.accuracy_proofs.sum_acc.fSUM] (applied to the
+ list [map BABS X]) bounds the C result's error whenever that result is finite
+ (the no-overflow hypothesis [Hfin]; see [C/cblas/README.md]). *)
diff --git a/C/cblas/verif_ddot.v b/C/cblas/verif_ddot.v
new file mode 100644
index 0000000..1ac765a
--- /dev/null
+++ b/C/cblas/verif_ddot.v
@@ -0,0 +1,94 @@
+(** * LAProof.C.cblas.verif_ddot: VST proof for GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c]. *)
+
+(** [body_cblas_ddot] proves that the compiled C [cblas_ddot] (unit stride)
+ refines the [dotprodF] accuracy model: its result is [feq]-equal to
+ [dotprodF X Y]. The loop invariant carries the partial accumulation
+ [ddot_model (sublist 0 k X) (sublist 0 k Y)]; the step and bridge reasoning
+ that discharges it lives in [LAProof.C.cblas.ddot_model]. Two dead [OFFSET]
+ guards (the [incX>0]/[incY>0] conditionals) precede the loop, unreachable
+ since [incX = incY = 1]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import ddot ddot_model spec_ddot.
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Lemma body_cblas_ddot: semax_body Vprog Gprog f_cblas_ddot cblas_ddot_spec.
+Proof.
+start_function.
+(* C: r = 0.0; ix = OFFSET(N,incX); iy = OFFSET(N,incY); with incX=incY=1 ⇒ ix=iy=0.
+ Each OFFSET is a (incX>0 ? 0 : ...) conditional; the else-branch is
+ unreachable here because incX=incY=1, so [rep_lia] closes it. *)
+forward. (* _r = 0.0 *)
+forward_if (PROP ()
+ LOCAL (temp _t'1 (Vint (Int.repr 0));
+ temp _r (Vfloat (Float.of_bits (Int64.repr 0)));
+ temp _N (Vint (Int.repr n)); temp _X px;
+ temp _incX (Vint (Int.repr 1)); temp _Y py;
+ temp _incY (Vint (Int.repr 1)))
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+ data_at shY (tarray tdouble n) (map Vfloat Y) py)).
+{ forward. entailer!!. } (* incX = 1 > 0 : t'1 := 0 *)
+{ rep_lia. } (* dead else-branch (incX = 1 is > 0) *)
+forward. (* _ix = _t'1 (= 0) *)
+forward_if (PROP ()
+ LOCAL (temp _t'2 (Vint (Int.repr 0));
+ temp _ix (Vint (Int.repr 0));
+ temp _r (Vfloat (Float.of_bits (Int64.repr 0)));
+ temp _N (Vint (Int.repr n)); temp _X px;
+ temp _incX (Vint (Int.repr 1)); temp _Y py;
+ temp _incY (Vint (Int.repr 1)))
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+ data_at shY (tarray tdouble n) (map Vfloat Y) py)).
+{ forward. entailer!!. } (* incY = 1 > 0 : t'2 := 0 *)
+{ rep_lia. } (* dead else-branch (incY = 1 is > 0) *)
+forward. (* _iy = _t'2 (= 0) *)
+(* The accumulation loop. Invariant: _r holds the value the C loop has
+ accumulated over the first k element-pairs, namely [ddot_model] of the
+ length-k prefixes; the strided indices ix,iy advance by 1 each step (= k). *)
+forward_for_simple_bound n
+ (EX k:Z,
+ PROP ()
+ LOCAL (temp _r (Vfloat (ddot_model (sublist 0 k X) (sublist 0 k Y)));
+ temp _ix (Vint (Int.repr k));
+ temp _iy (Vint (Int.repr k));
+ temp _N (Vint (Int.repr n));
+ temp _incX (Vint (Int.repr 1));
+ temp _incY (Vint (Int.repr 1));
+ temp _X px; temp _Y py)
+ SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+ data_at shY (tarray tdouble n) (map Vfloat Y) py))%assert.
+- (* Loop entry (k = 0). [sublist 0 0 = nil] and [ddot_model nil nil =
+ Zconst Tdouble 0] (= the C [+0.0]); [entailer!!] discharges it. *)
+ entailer!!.
+- (* Loop body: read X[i], Y[i]; [r = r + X[i]*Y[i]]; ix++; iy++. Re-establish
+ the invariant for [i+1] via the step lemma [ddot_model_step]. *)
+ forward. (* t'3 = X[ix] *)
+ forward. (* t'4 = Y[iy] *)
+ forward. (* r = r + t'3 * t'4 *)
+ forward. (* ix = ix + incX *)
+ forward. (* iy = iy + incY *)
+ assert (Hi: 0 <= i < Zlength X) by list_solve.
+ assert (Hxy: Zlength X = Zlength Y) by list_solve.
+ entailer!.
+ rewrite (ddot_model_step X Y i Hxy Hi).
+ reflexivity.
+- (* Loop exit (k = n). Return [r = ddot_model (sublist 0 n X) (sublist 0 n Y)];
+ [sublist_same] gives [= ddot_model X Y], and the bridge lemma
+ [ddot_model_feq_dotprodF] discharges [feq _ (dotprodF X Y)]. *)
+ forward. (* return r *)
+ Exists (ddot_model (sublist 0 (Zlength X) X) (sublist 0 (Zlength X) Y)).
+ entailer!.
+ rewrite (sublist_same 0 (Zlength X) X) by lia.
+ rewrite (sublist_same 0 (Zlength X) Y) by lia.
+ apply ddot_model_feq_dotprodF; [ lia | auto | auto ].
+Qed.
+
+(** Payoff (stub): because [cblas_ddot_spec]'s result is [feq]-equal to
+ [dotprodF X Y], the forward-error bound
+ [LAProof.accuracy_proofs.dot_acc.dotprod_forward_error] applies directly to
+ the value computed by the compiled C [cblas_ddot]. *)
diff --git a/C/cblas/verif_dscal.v b/C/cblas/verif_dscal.v
new file mode 100644
index 0000000..2651116
--- /dev/null
+++ b/C/cblas/verif_dscal.v
@@ -0,0 +1,85 @@
+(** * LAProof.C.cblas.verif_dscal: VST proof for GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c]. *)
+
+(** [body_cblas_dscal] proves that after the compiled C [cblas_dscal] (unit
+ stride) the array holds exactly [scal_model alpha X] (= [map (fun x => BMULT
+ x alpha) X]). This is an in-place routine: a writable array with a store in
+ the loop, whose invariant is "scaled prefix ++ original suffix"; the per-step
+ [upd_Znth]/[sublist] identity is discharged by [list_solve]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dscal scal_model spec_dscal.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Lemma body_cblas_dscal: semax_body Vprog Gprog f_cblas_dscal cblas_dscal_spec.
+Proof.
+start_function.
+forward. (* ix = 0 *)
+forward_if.
+{ rep_lia. } (* dead: incX <= 0 contradicts incX = 1 *)
+(* Invariant: the array holds a list [L] of length [Zlength X] whose first [k]
+ entries are already scaled ([Znth j X * alpha]) and whose remaining entries
+ are still the original [X]. Keeping the contents as [map Vfloat L] for a
+ *plain* list [L] lets the load and store simplify; the pointwise [Znth]
+ constraints make each step a simple [upd_Znth_same]/[upd_Znth_diff] case. *)
+forward_for_simple_bound (Zlength X)
+ (EX k:Z, EX L:list (ftype Tdouble),
+ PROP (Zlength L = Zlength X;
+ forall j, 0 <= j < k -> Znth j L = (Znth j X * alpha)%F64;
+ forall j, k <= j < Zlength X -> Znth j L = Znth j X)
+ LOCAL (temp _alpha (Vfloat alpha);
+ temp _ix (Vint (Int.repr k));
+ temp _N (Vint (Int.repr (Zlength X)));
+ temp _incX (Vint (Int.repr 1));
+ temp _X px)
+ SEP (data_at sh (tarray tdouble (Zlength X)) (map Vfloat L) px))%assert.
+- (* entry (k = 0): L := X *)
+ Exists X. entailer!!.
+- (* body: t = X[ix]; X[ix] = t * alpha; ix += incX *)
+ Intros.
+ (* Re-assert the invariant facts under explicit names so they survive the
+ [forward]s (which clear the auto-named PROP hypotheses). *)
+ assert (Hlen: Zlength L = Zlength X) by assumption.
+ assert (Hpre: forall j, 0 <= j < i -> Znth j L = (Znth j X * alpha)%F64) by assumption.
+ assert (Hsuf: forall j, i <= j < Zlength X -> Znth j L = Znth j X) by assumption.
+ assert (Hi: 0 <= i < Zlength L) by lia.
+ assert (HLX: Znth i L = Znth i X) by (apply Hsuf; lia).
+ forward. (* t'1 = X[ix] *)
+ forward. (* X[ix] = t'1 * alpha *)
+ forward. (* ix += incX *)
+ Exists (upd_Znth i L (BMULT (Znth i X) alpha)).
+ entailer!!.
+ 2: { (* SEP: the stored value [X[i]*alpha] is the model element [BMULT .. alpha] *)
+ apply derives_refl'. f_equal. rewrite <- upd_Znth_map. f_equal.
+ rewrite ?Znth_map by lia.
+ replace (@Znth float Inhabitant_float i L)
+ with (@Znth float Inhabitant_float i X) by (symmetry; apply HLX).
+ reflexivity. }
+ (* the three pointwise list invariants for the updated list *)
+ split3.
+ { list_solve. }
+ { intros j Hj. destruct (Z.eq_dec j i) as [->|Hne].
+ { rewrite upd_Znth_same by lia. reflexivity. }
+ { rewrite upd_Znth_diff by lia. apply Hpre; lia. } }
+ { intros j Hj. rewrite upd_Znth_diff by lia. apply Hsuf; lia. }
+- (* exit (k = Zlength X): the whole array now holds [scal_model alpha X] *)
+ Intros L.
+ assert (Hpre: forall j, 0 <= j < Zlength X -> Znth j L = (Znth j X * alpha)%F64)
+ by assumption.
+ assert (Hlen: Zlength L = Zlength X) by assumption.
+ assert (L = scal_model alpha X). {
+ apply Znth_eq_ext; [ rewrite Zlength_scal_model; lia | ].
+ intros j Hj. rewrite Hlen in Hj.
+ rewrite Znth_scal_model by lia. rewrite Hpre by lia. reflexivity. }
+ subst L. entailer!!.
+Qed.
+
+(** Payoff: each element [BMULT (Znth k X) alpha] is the correctly-rounded
+ product, so when that product is finite its relative error vs the exact
+ product is bounded by the unit roundoff (vcfloat's [BMULT] accuracy lemma).
+ A product can still overflow, but the finiteness (no-overflow) requirement is
+ per element -- each [X[k]*alpha] independently -- rather than a single global
+ hypothesis over an accumulation. *)
diff --git a/Makefile b/Makefile
index 574bac7..99355c4 100644
--- a/Makefile
+++ b/Makefile
@@ -20,6 +20,7 @@ clean:
if [ -e Makefile.coq ]; then $(MAKE) -f Makefile.coq cleanall; fi
$(RM) Makefile.coq Makefile.coq.conf
$(RM) C/sparse.v C/cholesky.v C/alloc.v C/densemat.v C/bandmat.v
+ $(RM) C/cblas/ddot.v C/cblas/dasum.v C/cblas/dscal.v
INSTALLFILES1 ?= $(shell awk '/accuracy_proofs/{print $$NF"o"}' _CoqProject)
diff --git a/Makefile.coq.local b/Makefile.coq.local
index 56581ac..d395271 100644
--- a/Makefile.coq.local
+++ b/Makefile.coq.local
@@ -2,7 +2,7 @@
COQDOCEXTRAFLAGS= -g --no-lib-name --with-header header.html --index genindex --lib-subtitles --interpolate -utf8
accuracy: accuracy_proofs/export.vo
-C: C/verif_alloc.vo C/verif_sparse.vo C/verif_sparse_byrows.vo C/VSU_densemat.vo C/verif_build_csr.vo
+C: C/verif_alloc.vo C/verif_sparse.vo C/verif_sparse_byrows.vo C/VSU_densemat.vo C/verif_build_csr.vo C/cblas/verif_ddot.vo C/cblas/verif_dasum.vo C/cblas/verif_dscal.vo
# Ugh. We can't just store the reference copy of index.html in html/index.html, because Makefile.coq's
# standard "make clean" removes the html directory entirely. So we gotta do this hack:
@@ -31,7 +31,8 @@ install-extra:: $(CFILES) $(HFILES)
# Quarto documentation of C programs
LDOC= lua cdoc/ldoc.lua
-CDOCS= cdoc/alloc.qmd cdoc/densemat.qmd cdoc/bandmat.qmd cdoc/sparse.qmd cdoc/build_csr.qmd
+CDOCS= cdoc/alloc.qmd cdoc/densemat.qmd cdoc/bandmat.qmd cdoc/sparse.qmd cdoc/build_csr.qmd \
+ cdoc/cblas_ddot.qmd cdoc/cblas_dasum.qmd cdoc/cblas_dscal.qmd
cdoc/%.html: cdoc/%.qmd
cd cdoc; quarto render $(notdir $^) --to html --quiet
@@ -42,6 +43,14 @@ cdoc/%.qmd: C/include/%.h C/src/%.c
cdoc/build_csr.qmd: C/include/sparse.h C/src/build_csr.c
$(LDOC) -highlight c -p quarto -o $@ C/src/build_csr.c
+# GSL CBLAS routines: document the wrapper .c together with its source_*_r.h kernel.
+cdoc/cblas_ddot.qmd: C/cblas/src/ddot.c C/cblas/src/source_dot_r.h
+ $(LDOC) -highlight c -p quarto -o $@ $^
+cdoc/cblas_dasum.qmd: C/cblas/src/dasum.c C/cblas/src/source_asum_r.h
+ $(LDOC) -highlight c -p quarto -o $@ $^
+cdoc/cblas_dscal.qmd: C/cblas/src/dscal.c C/cblas/src/source_scal_r.h
+ $(LDOC) -highlight c -p quarto -o $@ $^
+
cdocs: $(patsubst %.qmd,%.html,$(CDOCS))
@@ -49,6 +58,10 @@ cdocs: $(patsubst %.qmd,%.html,$(CDOCS))
C/%.v: C/src/%.c
clightgen -normalize -IC/include $< -o $@
+# cblas subdirectory: GSL BLAS routines ported into LAProof
+C/cblas/%.v: C/cblas/src/%.c
+ clightgen -normalize -IC/cblas/include $< -o $@
+
# the rest of this is adapted from the makedepend part of ../Makefile
C/sparse.v: C/include/sparse.h
@@ -57,3 +70,6 @@ C/alloc.v: C/include/alloc.h
C/bandmat.v: C/include/densemat.h C/include/bandmat.h
C/densemat.v: C/include/densemat.h
C/build_csr.v: C/include/sparse.h
+C/cblas/ddot.v: C/cblas/include/cblas.h C/cblas/src/source_dot_r.h
+C/cblas/dasum.v: C/cblas/include/cblas.h C/cblas/src/source_asum_r.h
+C/cblas/dscal.v: C/cblas/include/cblas.h C/cblas/src/source_scal_r.h
diff --git a/_CoqProject b/_CoqProject
index a404827..f5e4fb2 100644
--- a/_CoqProject
+++ b/_CoqProject
@@ -47,3 +47,18 @@ C/build_csr.v
C/verif_build_csr.v
C/distinct.v
C/partial_csr.v
+
+C/cblas/ddot.v
+C/cblas/ddot_model.v
+C/cblas/spec_ddot.v
+C/cblas/verif_ddot.v
+
+C/cblas/dasum.v
+C/cblas/asum_model.v
+C/cblas/spec_dasum.v
+C/cblas/verif_dasum.v
+
+C/cblas/dscal.v
+C/cblas/scal_model.v
+C/cblas/spec_dscal.v
+C/cblas/verif_dscal.v
diff --git a/index.html b/index.html
index 6cfba60..db2e33a 100644
--- a/index.html
+++ b/index.html
@@ -62,6 +62,7 @@ C programs
bandmat.[c,h]: Symmetric band matrix operations
sparse.[c,h]: CSR (Compressed Sparse Row) matrix operations
build_csr.c: CSR (Compressed Sparse Row) matrix construction
+ddot.c, dasum.c, dscal.c: GSL CBLAS Level-1 vector routines
Algorithm Accuracy Proofs
@@ -163,6 +164,21 @@ C Program Correctness Proofs
Verification of COO->CSR conversion
Verified Software Unit
+ BLAS Level-1 (GSL CBLAS): ddot.c, dasum.c, dscal.c
+