diff --git a/.gitignore b/.gitignore
index 7014311..9ad04a8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,9 @@ C/bandmat.v
 C/build_csr.v
 C/cholesky.v
 C/densemat.v
+C/cblas/ddot.v
+C/cblas/dasum.v
+C/cblas/dscal.v
 Makefile.coq
 Makefile.coq.conf
 CoqMakefile
diff --git a/C/cblas/README.md b/C/cblas/README.md
new file mode 100644
index 0000000..89d689a
--- /dev/null
+++ b/C/cblas/README.md
@@ -0,0 +1,105 @@
+# LAProof × GSL CBLAS
+
+This directory connects **real-world BLAS code** to LAProof's verified floating-point
+accuracy results.
+
+## Scope
+
+LAProof proves floating-point accuracy theorems (forward error bounds, etc.) about
+*functional models* of basic linear-algebra programs (idealized list/matrix functions
+that compute a result with a precisely specified sequence of IEEE-754 operations).
+Those theorems live in [`accuracy_proofs/`](../../accuracy_proofs/), independent of any
+particular C implementation.
+
+The goal of this directory is the other half of the story: take a **real-world BLAS
+implementation** and *prove* that the compiled C code implements one of those
+functional models. We use the CBLAS layer of the **GNU Scientific Library (GSL)** as the
+implementation under verification.
+
+The pipeline for each routine is:
+
+1. Take the GSL CBLAS source (under [`src/`](src/)) essentially verbatim, with only the
+   minimal edits needed to run CompCert's `clightgen` on it.
+2. Generate the Clight AST (a `*.v` file) with `clightgen`.
+3. Define the **functional model** the C loop literally computes (in a `*_model.v` file),
+   and prove it equals (up to `feq`) the LAProof accuracy model over which the error
+   theorems are stated.
+4. Write a VST **funspec** (in a `spec_*.v` file) whose postcondition relates the returned
+   value to the LAProof model.
+5. **Verify** the C function against that funspec with VST (in a `verif_*.v` file).
+
+Once a routine is verified against a funspec phrased in terms of a LAProof model, the
+corresponding accuracy theorem can be transferred to the value computed by the compiled C
+code. The verification does *not* prove the accuracy theorem's preconditions on its own,
+however. To apply the bound to the C result, one must separately establish that those
+preconditions hold, and they fall into two kinds.
+
+- **Already guaranteed by the funspec precondition.** Some of the theorem's preconditions
+  are exactly things the funspec `PRE` already requires, so they hold for free (e.g. a
+  length precondition that the funspec already imposes on the input arrays).
+- **Additional conditions one must discharge.** Other preconditions are not established by
+  the funspec and must be proved on the side. A typical example is a no-overflow assumption
+  that the accumulated result is finite: this does *not* follow from the funspec merely
+  requiring the inputs to be finite, since a combination of finite values can still
+  overflow to an infinity. Whoever applies the bound must supply this separately.
+
+So the honest statement is: for inputs satisfying both the funspec `PRE` and the accuracy
+theorem's own hypotheses, the bound holds of the C return value. In many cases (such as the
+dot products and other reduction loops verified here) the one hypothesis the funspec does
+not already supply is a *no-overflow* condition: that the accumulated result is finite.
+
+For `cblas_ddot` that leftover precondition is exactly `Hfin`: that the accumulated result
+`dotprodF X Y` is finite. It does not follow merely from the inputs being finite, since a
+sum of finite products can still round to an infinity.
+
+The `feq` postcondition cooperates here: the funspec only guarantees that the C result is
+`feq`-equal to the model value (equality up to `±0` and exceptional values), but once the result is finite,
+`feq` upgrades to real-value equality, so the bound on the model value is literally a bound
+on the compiled C output.
+
+When stating the accuracy corollary for the C function, you choose how to carry `Hfin`:
+
+- **Conditional corollary.** Keep `Hfin` as an explicit hypothesis. The bound then holds
+  whenever the result is in fact finite, and whoever applies the corollary discharges
+  `Hfin` for their particular inputs. (`Hfin` is a proof obligation about the result,
+  not a runtime check in the C code.)
+- **Unconditional corollary.** Strengthen the funspec `PRE` with magnitude/length bounds
+  that imply finiteness. LAProof's [`finite_sum_from_bounded`](../../accuracy_proofs/sum_is_finite.v)
+  and the `fun_bnd_le` lemmas turn such bounds into a proof of `Hfin`, so the corollary
+  carries no extra hypothesis.
+
+### GSL CBLAS upstream
+
+- GSL project home: <https://www.gnu.org/software/gsl/>
+- GSL source repository (Savannah git): <https://git.savannah.gnu.org/cgit/gsl.git>
+- CBLAS directory in the repo: <https://git.savannah.gnu.org/cgit/gsl.git/tree/cblas>
+- Shared BLAS kernels (`source_*.h`) used by the CBLAS routines: <https://git.savannah.gnu.org/cgit/gsl.git/tree/blas>
+
+GSL is licensed under the GNU GPL; the files under [`src/`](src/) retain their upstream
+copyright and license headers.
+
+## What we've verified so far
+
+| Operation | Precision | Function | Files |
+|-----------|-----------|----------|-------|
+| Dot product | double | `cblas_ddot` | [`src/ddot.c`](src/ddot.c), [`src/source_dot_r.h`](src/source_dot_r.h), [`include/cblas.h`](include/cblas.h), [`ddot.v`](ddot.v), [`ddot_model.v`](ddot_model.v), [`spec_ddot.v`](spec_ddot.v), [`verif_ddot.v`](verif_ddot.v) |
+| Sum of absolute values | double | `cblas_dasum` | [`src/dasum.c`](src/dasum.c), [`src/source_asum_r.h`](src/source_asum_r.h), [`include/cblas.h`](include/cblas.h), [`dasum.v`](dasum.v), [`asum_model.v`](asum_model.v), [`spec_dasum.v`](spec_dasum.v), [`verif_dasum.v`](verif_dasum.v) |
+| Scalar multiply (in place) | double | `cblas_dscal` | [`src/dscal.c`](src/dscal.c), [`src/source_scal_r.h`](src/source_scal_r.h), [`include/cblas.h`](include/cblas.h), [`dscal.v`](dscal.v), [`scal_model.v`](scal_model.v), [`spec_dscal.v`](spec_dscal.v), [`verif_dscal.v`](verif_dscal.v) |
+
+**Scope limits (`cblas_ddot`, `cblas_dasum`, `cblas_dscal`):**
+- **Unit stride only** (`incX = incY = 1`), as a deliberate first milestone.
+- For the **reductions** (`ddot`, `dasum`) the postcondition is stated up to `feq`.
+- `cblas_dasum` is verified against `sumF (map BABS X)`; its loop body calls `fabs`,
+  discharged with VSTlib's `fabs_spec` (whose result is `BABS`). The matching accuracy
+  theorem is `sum_acc.fSUM`.
+- `cblas_dscal` is the first **in-place** routine: it overwrites the array (writable share,
+  `void` return) and is verified against an *exact* model `scal_model alpha X =
+  map (fun x => BMULT x alpha) X` (no `feq` required here). The
+  per-element accuracy is a direct unit-roundoff bound on `BMULT`.
+
+## Conventions
+
+This directory mirrors the model/spec/verif split used elsewhere in
+[`C/`](../) (e.g. the sparse routines' `sparse_model.v` / `spec_sparse.v` /
+`verif_sparse.v`): pure model functions and their lemmas in `*_model.v`, VST funspecs in
+`spec_*.v`, and `semax_body` proofs in `verif_*.v`.
diff --git a/C/cblas/asum_model.v b/C/cblas/asum_model.v
new file mode 100644
index 0000000..163f505
--- /dev/null
+++ b/C/cblas/asum_model.v
@@ -0,0 +1,83 @@
+(**  * LAProof.C.cblas.asum_model: functional model of GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *)
+
+(** This file provides [asum_model], the partial accumulation that the
+    [cblas_dasum] loop invariant tracks, with the *step* lemma (how one loop
+    iteration extends the accumulation) and the *end* lemma (the result of the
+    accumulation once the loop finishes).  GSL's kernel ([source_asum_r.h]) is
+<<
+      double r = 0.0;
+      for (i = 0; i < N; i++) { r += fabs(X[ix]); ix += incX; }
+      return r;
+>>
+    i.e., it adds [|X[i]|] left to right into the accumulator ([BPLUS acc |x|]),
+    starting from [+0.0].  The C [fabs] computes [BABS] (the result of VSTlib's
+    [fabs_spec], whose [ff_func] is [BABS]), so the model uses [BABS].
+
+    The matching LAProof accuracy model is [sum_model.sumF] applied to
+    [map BABS X] (the absolute values); over it [sum_acc.fSUM] proves the
+    forward-error bound.  The C's operand order ([BPLUS acc x]
+    vs [sumF]'s [BPLUS x acc]) and initial accumulator ([+0.0] = [Zconst Tdouble 0]
+    vs [sumF]'s [neg_zero]) differ, so the end/bridge lemma relates the two by
+    [feq] (IEEE-equality identifying [±0]/NaN). *)
+
+Require Import VST.floyd.proofauto.
+Require Import vcfloat.VCFloat.
+Require Import vcfloat.FPStdCompCert.
+From vcfloat Require Import FPStdLib.
+From LAProof.C Require Import floatlib.
+(* [sum_model] only [Require Import]s (not [Export]s) the mathcomp preamble, so
+   this does not pollute the VST namespace with ssreflect notations.  We do NOT
+   [Import common] (it would re-scope [<=]/[<] over [R]); [common.BPLUS_comm] /
+   [common.neg_zero] are used qualified.  [BPLUS_mor], [BABS], the [feq]
+   [Equivalence] come from vcfloat. *)
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+(** [asum_model X] is the value the C loop *literally* computes: a left fold with
+    the accumulator as the first [BPLUS] operand, separate abs-then-add, starting
+    from [+0.0].  This mirrors the Clight AST in [dasum.v] exactly. *)
+Definition asum_loop (l: list (ftype Tdouble)) : ftype Tdouble :=
+  fold_left (fun acc x => BPLUS acc (BABS x)) l (Zconst Tdouble 0).
+
+Definition asum_model (X: list (ftype Tdouble)) : ftype Tdouble := asum_loop X.
+
+(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense asum). *)
+
+Lemma asum_loop_snoc: forall l x,
+  asum_loop (l ++ [x]) = BPLUS (asum_loop l) (BABS x).
+Proof. intros. unfold asum_loop. rewrite fold_left_app. reflexivity. Qed.
+
+(** *step*: extending the length-[k] prefix [X[0..k-1]] with the element [X[k]]
+    adds one [BABS] term, with the accumulator as the first [BPLUS] operand --
+    exactly the Clight statement [r = r + fabs(X[k])]. *)
+Lemma asum_model_step: forall (X: list (ftype Tdouble)) k,
+  0 <= k < Zlength X ->
+  asum_model (sublist 0 (k+1) X)
+  = BPLUS (asum_model (sublist 0 k X)) (BABS (Znth k X)).
+Proof.
+  intros X k Hk. unfold asum_model.
+  rewrite (sublist_last_1 0 k X) by lia.
+  rewrite asum_loop_snoc. reflexivity.
+Qed.
+
+(** *end*/bridge: the finished accumulation equals LAProof's summation model
+    [sumF (map BABS X)] up to [feq].  Generalized-accumulator induction using
+    [common.BPLUS_comm] and [BPLUS_mor], with the [feq (+0.0) (-0.0)] base case
+    ([sumF] starts at [neg_zero]). *)
+Lemma asum_model_feq_sumF: forall (X: list (ftype Tdouble)),
+  feq (asum_model X) (sumF (map BABS X)).
+Proof.
+  intros X. unfold asum_model, asum_loop, sumF.
+  assert (Hs: feq (Zconst Tdouble 0) (@common.neg_zero Tdouble)) by reflexivity.
+  revert Hs.
+  generalize (Zconst Tdouble 0).
+  generalize (@common.neg_zero Tdouble).
+  induction X as [|x X IH]; intros g f Hs.
+  - (* X = nil: both folds are the (feq-linked) initial accumulators. *)
+    simpl. exact Hs.
+  - simpl. apply IH.
+    unfold Basics.flip.
+    etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
+Qed.
diff --git a/C/cblas/ddot_model.v b/C/cblas/ddot_model.v
new file mode 100644
index 0000000..ad41b49
--- /dev/null
+++ b/C/cblas/ddot_model.v
@@ -0,0 +1,122 @@
+(**  * LAProof.C.cblas.ddot_model: functional model of GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *)
+
+(** This file provides [ddot_model], the partial accumulation that the
+    [cblas_ddot] loop invariant tracks, with the *step* lemma (how one loop
+    iteration extends the accumulation) and the *end* lemma (the result of the
+    accumulation once the loop finishes).  No VST [funspec]s or Clight ASTs
+    appear here -- those live in [spec_ddot] and [verif_ddot].  GSL's loop
+    (after macro expansion of [source_dot_r.h]) is
+<<
+      double r = 0.0;
+      for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; }
+      return r;
+>>
+    i.e., it adds [X[i]*Y[i]] left to right into the accumulator
+    ([BPLUS acc prod]), using separate multiply-then-add (not fused
+    multiply-add), starting from [+0.0].
+
+    The matching LAProof accuracy model is [dotprodF]
+    ([dotprodF = dotprod BMULT BPLUS pos_zero]) of
+    [LAProof.accuracy_proofs.dotprod_model]; over it
+    [dot_acc.dotprod_forward_error] proves the forward-error bound -- *not* the
+    FMA-based [floatlib.dotprod].  The C's operand order ([BPLUS acc prod] vs
+    [dotprodF]'s [BPLUS prod acc]) and initial accumulator ([+0.0] =
+    [Zconst Tdouble 0] vs [dotprodF]'s [pos_zero]) differ, so the end/bridge
+    lemma relates the two by [feq] (IEEE addition is commutative up to [feq]). *)
+
+Require Import VST.floyd.proofauto.
+Require Import vcfloat.VCFloat.
+Require Import vcfloat.FPStdCompCert.
+From vcfloat Require Import FPStdLib.
+From LAProof.C Require Import floatlib.
+(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble,
+   so this does not pollute the VST namespace with ssreflect notations.  We do
+   NOT [Import common]: that would bring mathcomp's order/ring notations into
+   scope (making [<=]/[<] parse over [R] and breaking [ddot_model_step]).
+   Instead [common.BPLUS_comm] / [common.pos_zero] are used qualified below;
+   [BPLUS_mor] and the [feq] [Equivalence] come from vcfloat. *)
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+(** [ddot_model X Y] is the value the C loop *literally* computes: a left fold
+    with the accumulator as the first [BPLUS] operand, separate multiply then
+    add, starting from [+0.0].  This mirrors the Clight AST in [ddot.v] exactly. *)
+Definition ddot_loop (xy: list (ftype Tdouble * ftype Tdouble)) : ftype Tdouble :=
+  fold_left (fun acc p => BPLUS acc (BMULT (fst p) (snd p))) xy (Zconst Tdouble 0).
+
+Definition ddot_model (X Y: list (ftype Tdouble)) : ftype Tdouble :=
+  ddot_loop (combine X Y).
+
+(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense ddot). *)
+
+Lemma combine_app_eqlen {A B} (l1 l1': list A) (l2 l2': list B):
+  length l1 = length l2 ->
+  combine (l1 ++ l1') (l2 ++ l2') = combine l1 l2 ++ combine l1' l2'.
+Proof.
+  revert l2; induction l1; destruct l2; simpl; intros; try discriminate; auto.
+  f_equal; auto.
+Qed.
+
+Lemma ddot_loop_snoc: forall xy p,
+  ddot_loop (xy ++ [p]) = BPLUS (ddot_loop xy) (BMULT (fst p) (snd p)).
+Proof. intros. unfold ddot_loop. rewrite fold_left_app. reflexivity. Qed.
+
+(** *step*: extending both length-[k] prefixes [X[0..k-1]]/[Y[0..k-1]] with the
+    elements [X[k]]/[Y[k]] adds one [BMULT] term, with the accumulator as the
+    first [BPLUS] operand -- exactly the Clight statement [r = r + X[k]*Y[k]]. *)
+Lemma ddot_model_step: forall (X Y: list (ftype Tdouble)) k,
+  Zlength X = Zlength Y -> 0 <= k < Zlength X ->
+  ddot_model (sublist 0 (k+1) X) (sublist 0 (k+1) Y)
+  = BPLUS (ddot_model (sublist 0 k X) (sublist 0 k Y))
+          (BMULT (Znth k X) (Znth k Y)).
+Proof.
+  intros X Y k Hlen Hk. unfold ddot_model.
+  rewrite (sublist_last_1 0 k X) by lia.
+  rewrite (sublist_last_1 0 k Y) by lia.
+  rewrite combine_app_eqlen
+    by (apply Nat2Z.inj; rewrite <- !Zlength_correct, !Zlength_sublist by lia; lia).
+  cbn [combine].   (* combine [Znth k X] [Znth k Y] = [(Znth k X, Znth k Y)] *)
+  rewrite ddot_loop_snoc. reflexivity.
+Qed.
+
+(** *end*/bridge: the finished accumulation equals LAProof's accuracy model
+    [dotprodF] up to [feq].  They differ only by the [BPLUS] operand order
+    ([BPLUS acc prod] here vs [BPLUS prod acc] in [dotprodF]'s fold step), so the
+    proof is a generalized-accumulator induction using [common.BPLUS_comm]
+    (feq (BPLUS x y) (BPLUS y x)) and the [BPLUS_mor] feq-congruence, plus the
+    fact that [List.combine = seq.zip] on equal-length lists and that the two
+    initial zeros ([Zconst Tdouble 0] vs [pos_zero]) are [feq].  This is the one
+    remaining model obligation; once discharged, [dot_acc.dotprod_forward_error]
+    transfers to the C result. *)
+Lemma ddot_model_feq_dotprodF:
+  forall (X Y: list (ftype Tdouble)),
+    Zlength X = Zlength Y ->
+    Forall finite X -> Forall finite Y ->
+    feq (ddot_model X Y) (dotprodF X Y).
+Proof.
+  intros X Y Hlen _ _.
+  assert (Hl: length X = length Y)
+    by (apply Nat2Z.inj; rewrite <- !Zlength_correct; exact Hlen).
+  clear Hlen.
+  unfold ddot_model, ddot_loop, dotprodF, dotprod.
+  (* The two folds differ only by the initial accumulator ([Zconst Tdouble 0] vs
+     [pos_zero], which are [feq]) and the [BPLUS] operand order at each step.
+     Generalize both accumulators, keeping them [feq]-linked, and induct. *)
+  assert (Hs: feq (Zconst Tdouble 0) (@common.pos_zero Tdouble)) by reflexivity.
+  revert Y Hl Hs.
+  generalize (Zconst Tdouble 0).
+  generalize (@common.pos_zero Tdouble).
+  induction X as [|x X IH]; intros g f Y Hl Hs.
+  - (* X = nil (so Y = nil): both folds are the (feq-linked) initial accs. *)
+    destruct Y as [|y Y']; [ | simpl in Hl; discriminate ].
+    simpl. exact Hs.
+  - destruct Y as [|y Y']; simpl in Hl; [ discriminate | ].
+    simpl.
+    apply IH.
+    + congruence.
+    + (* one accumulation step: [BPLUS f prod] vs [flip BPLUS g prod = BPLUS prod g] *)
+      simpl. unfold Basics.flip.
+      etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
+Qed.
diff --git a/C/cblas/include/cblas.h b/C/cblas/include/cblas.h
new file mode 100644
index 0000000..b478671
--- /dev/null
+++ b/C/cblas/include/cblas.h
@@ -0,0 +1,34 @@
+#define INDEX int
+#define OFFSET(N, incX) ((incX) > 0 ?  0 : ((N) - 1) * (-(incX)))
+#define BLAS_ERROR(x)  cblas_xerbla(0, __FILE__, x); 
+
+#define CONJUGATE(x) ((x) == CblasConjTrans)
+#define TRANSPOSE(x) ((x) == CblasTrans || (x) == CblasConjTrans)
+#define UPPER(x) ((x) == CblasUpper)
+#define LOWER(x) ((x) == CblasLower)
+
+/* Handling of packed complex types... */
+
+#define REAL(a,i) (((BASE *) a)[2*(i)])
+#define IMAG(a,i) (((BASE *) a)[2*(i)+1])
+
+#define REAL0(a) (((BASE *)a)[0])
+#define IMAG0(a) (((BASE *)a)[1])
+
+#define CONST_REAL(a,i) (((const BASE *) a)[2*(i)])
+#define CONST_IMAG(a,i) (((const BASE *) a)[2*(i)+1])
+
+#define CONST_REAL0(a) (((const BASE *)a)[0])
+#define CONST_IMAG0(a) (((const BASE *)a)[1])
+
+
+#define GB(KU,KL,lda,i,j) ((KU+1+(i-j))*lda + j)
+
+#define TRCOUNT(N,i) ((((i)+1)*(2*(N)-(i)))/2)
+
+/* #define TBUP(N,i,j) */
+/* #define TBLO(N,i,j) */
+
+#define TPUP(N,i,j) (TRCOUNT(N,(i)-1)+(j)-(i))
+#define TPLO(N,i,j) (((i)*((i)+1))/2 + (j))
+
diff --git a/C/cblas/scal_model.v b/C/cblas/scal_model.v
new file mode 100644
index 0000000..c8ea503
--- /dev/null
+++ b/C/cblas/scal_model.v
@@ -0,0 +1,37 @@
+(**  * LAProof.C.cblas.scal_model: functional model of GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *)
+
+(** This file provides [scal_model], the elementwise scaling that [cblas_dscal]
+    computes.  Scaling is a deterministic *elementwise* update with no
+    accumulation, so there is no loop invariant tracking a running accumulator
+    and no [feq] bridge -- the C program computes the model exactly.  GSL's
+    kernel ([source_scal_r.h]) is
+<<
+      for (i = 0; i < N; i++) { X[ix] *= alpha; ix += incX; }
+>>
+    i.e., [X[ix] *= alpha] is [BMULT (X[ix]) alpha] (element-first), so the model
+    is the list map below, matching what the C program computes.
+
+    LAProof's existing scale model [mv_mathcomp.scalemx] is alpha-first
+    ([map_mx (BMULT a)]) and matrix-based; relating to it would need a [BMULT]
+    commutativity lemma (not in LAProof) plus a vector/matrix reshape.  That is
+    unnecessary here: the per-element accuracy (relative error <= unit roundoff)
+    is exactly vcfloat's correctly-rounded-[BMULT] bound. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Definition scal_model (alpha: ftype Tdouble) (X: list (ftype Tdouble)) : list (ftype Tdouble) :=
+  map (fun x => BMULT x alpha) X.
+
+Lemma Zlength_scal_model: forall alpha X,
+  Zlength (scal_model alpha X) = Zlength X.
+Proof. intros. unfold scal_model. rewrite Zlength_map. reflexivity. Qed.
+
+Lemma Znth_scal_model: forall alpha X k,
+  0 <= k < Zlength X ->
+  Znth k (scal_model alpha X) = BMULT (Znth k X) alpha.
+Proof. intros. unfold scal_model. rewrite Znth_map by lia. reflexivity. Qed.
diff --git a/C/cblas/spec_dasum.v b/C/cblas/spec_dasum.v
new file mode 100644
index 0000000..8c2d6f7
--- /dev/null
+++ b/C/cblas/spec_dasum.v
@@ -0,0 +1,39 @@
+(**  * LAProof.C.cblas.spec_dasum: VST specification of GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *)
+
+(** Sibling of [LAProof.C.cblas.spec_ddot].  [cblas_dasum] computes the sum of
+    absolute values of a vector; the postcondition relates its result (up to
+    [feq]) to LAProof's summation model [sumF] applied to [map BABS X], over
+    which [sum_acc.fSUM] proves the forward-error bound. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From VSTlib Require Import spec_math.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dasum asum_model.
+(* for [sumF]; [Require Import] (not [Export]) keeps ssreflect out of scope. *)
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined.
+Definition Vprog : varspecs. mk_varspecs prog. Defined.
+
+(** ** Funspec for [cblas_dasum] (unit-stride case [incX = 1]). *)
+Definition cblas_dasum_spec :=
+ DECLARE _cblas_dasum
+ WITH shX: share, n: Z, X: list (ftype Tdouble), px: val
+ PRE [ tint, tptr tdouble, tint ]
+   PROP (readable_share shX; Zlength X = n; 0 <= n <= Int.max_signed;
+         Forall finite X)
+   PARAMS (Vint (Int.repr n); px; Vint (Int.repr 1))
+   SEP (data_at shX (tarray tdouble n) (map Vfloat X) px)
+ POST [ tdouble ]
+   EX s: ftype Tdouble,
+   PROP (feq s (sumF (map BABS X)))
+   RETURN (Vfloat s)
+   SEP (data_at shX (tarray tdouble n) (map Vfloat X) px).
+
+Definition DasumASI : funspecs := [ cblas_dasum_spec ].
+Definition dasum_imports : funspecs := [ fabs_spec ].   (* fabs is the one external call *)
+Definition Gprog : funspecs := dasum_imports ++ DasumASI.
diff --git a/C/cblas/spec_ddot.v b/C/cblas/spec_ddot.v
new file mode 100644
index 0000000..c3ace94
--- /dev/null
+++ b/C/cblas/spec_ddot.v
@@ -0,0 +1,67 @@
+(**  * LAProof.C.cblas.spec_ddot: VST specification of GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *)
+
+(** This file connects a real-world BLAS implementation -- GSL's double-precision
+    dot product [cblas_ddot] -- to LAProof's functional dot-product model.
+
+    GSL's loop (after macro expansion of [source_dot_r.h]) is
+<<
+      double r = 0.0;
+      for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; }
+      return r;
+>>
+    i.e. a forward, left-to-right accumulation using *separate* multiply-then-add
+    (not fused multiply-add), starting from +0.0.  That is exactly the
+    [dotprodF] model of [LAProof.accuracy_proofs.dotprod_model]
+    ([dotprodF = dotprod BMULT BPLUS pos_zero]), over which the forward-error
+    theorem [dot_acc.dotprod_forward_error] is proved -- *not* the FMA-based
+    [floatlib.dotprod].
+
+    Note on operand order: the C statement [r += X*Y] computes [BPLUS acc prod]
+    (accumulator first), whereas [dotprodF]'s fold step is [BPLUS prod acc]
+    (product first).  IEEE addition is commutative *up to [feq]*, so we state the
+    postcondition with [feq] -- the same convention used by
+    [LAProof.C.spec_sparse] for [csr_row_vector_multiply_spec]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import ddot ddot_model.
+(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble,
+   so this does not pollute the VST namespace with ssreflect notations.  We need
+   it here for [dotprodF], which the postcondition is stated against. *)
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined.
+Definition Vprog : varspecs. mk_varspecs prog. Defined.
+
+(** The partial-accumulation model [ddot_model] and its start/step/end lemmas
+    live in [LAProof.C.cblas.ddot_model] (mirroring the [sparse_model] /
+    [spec_sparse] split).  This file states only the VST [funspec]. *)
+
+(** ** Funspec for [cblas_ddot] (unit-stride case [incX = incY = 1]). *)
+(** Unit stride is a deliberate first milestone, not a fundamental limit. *)
+Definition cblas_ddot_spec :=
+ DECLARE _cblas_ddot
+ WITH shX: share, shY: share, n: Z,
+      X: list (ftype Tdouble), Y: list (ftype Tdouble),
+      px: val, py: val
+ PRE [ tint, tptr tdouble, tint, tptr tdouble, tint ]
+   PROP (readable_share shX; readable_share shY;
+         Zlength X = n; Zlength Y = n; 0 <= n <= Int.max_signed;
+         Forall finite X; Forall finite Y)
+   PARAMS (Vint (Int.repr n); px; Vint (Int.repr 1); py; Vint (Int.repr 1))
+   SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+        data_at shY (tarray tdouble n) (map Vfloat Y) py)
+ POST [ tdouble ]
+   EX s: ftype Tdouble,
+   PROP (feq s (dotprodF X Y))
+   RETURN (Vfloat s)
+   SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+        data_at shY (tarray tdouble n) (map Vfloat Y) py).
+
+Definition DdotASI : funspecs := [ cblas_ddot_spec ].
+Definition ddot_imports : funspecs := [].   (* no external calls: the loop is self-contained *)
+Definition Gprog : funspecs := ddot_imports ++ DdotASI.
diff --git a/C/cblas/spec_dscal.v b/C/cblas/spec_dscal.v
new file mode 100644
index 0000000..c2db868
--- /dev/null
+++ b/C/cblas/spec_dscal.v
@@ -0,0 +1,35 @@
+(**  * LAProof.C.cblas.spec_dscal: VST specification of GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *)
+
+(** [cblas_dscal] scales a vector in place by [alpha].  Because scaling is
+    deterministic elementwise, the postcondition is stated as an *exact* equality
+    (the array is left holding [scal_model alpha X]); no [feq]/[EX] is needed,
+    unlike the reduction routines [ddot]/[dasum]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dscal scal_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined.
+Definition Vprog : varspecs. mk_varspecs prog. Defined.
+
+(** ** Funspec for [cblas_dscal] (unit-stride case [incX = 1]). *)
+(** Unit stride is a first milestone, not a fundamental limit.*)
+Definition cblas_dscal_spec :=
+ DECLARE _cblas_dscal
+ WITH sh: share, n: Z, alpha: ftype Tdouble, X: list (ftype Tdouble), px: val
+ PRE [ tint, tdouble, tptr tdouble, tint ]
+   PROP (writable_share sh; Zlength X = n; 0 <= n <= Int.max_signed)
+   PARAMS (Vint (Int.repr n); Vfloat alpha; px; Vint (Int.repr 1))
+   SEP (data_at sh (tarray tdouble n) (map Vfloat X) px)
+ POST [ tvoid ]
+   PROP ()
+   RETURN ()
+   SEP (data_at sh (tarray tdouble n) (map Vfloat (scal_model alpha X)) px).
+
+Definition DscalASI : funspecs := [ cblas_dscal_spec ].
+Definition dscal_imports : funspecs := [].   (* no external calls *)
+Definition Gprog : funspecs := dscal_imports ++ DscalASI.
diff --git a/C/cblas/src/dasum.c b/C/cblas/src/dasum.c
new file mode 100644
index 0000000..ec02c99
--- /dev/null
+++ b/C/cblas/src/dasum.c
@@ -0,0 +1,24 @@
+//ldoc on
+/**
+ * # `dasum.c`: GSL CBLAS real double-precision sum of absolute values
+ *
+ * Thin wrapper that instantiates the generic kernel `source_asum_r.h` at type
+ * `double`.  Verified in `LAProof.C.cblas.verif_dasum` against the functional
+ * model `LAProof.C.cblas.asum_model`.
+ */
+/* The following GSL headers are commented out for clightgen: they are
+   unresolvable here (GSL's gsl/ symlink dir is not generated) and contribute
+   nothing to cblas_dasum's body beyond the libc fabs, which we obtain directly
+   from <math.h> (as the other LAProof C sources do, e.g. densemat.c). */
+/* #include <gsl/gsl_math.h>  */
+/* #include <gsl/gsl_cblas.h> */
+#include <math.h>
+#include "cblas.h"
+
+double
+cblas_dasum (const int N, const double *X, const int incX)
+{
+#define BASE double
+#include "source_asum_r.h"
+#undef BASE
+}
diff --git a/C/cblas/src/ddot.c b/C/cblas/src/ddot.c
new file mode 100644
index 0000000..45e8c19
--- /dev/null
+++ b/C/cblas/src/ddot.c
@@ -0,0 +1,27 @@
+//ldoc on
+/**
+ * # `ddot.c`: GSL CBLAS real double-precision dot product
+ *
+ * Thin wrapper that instantiates the generic kernel `source_dot_r.h` at type
+ * `double`.  Verified in `LAProof.C.cblas.verif_ddot` against the functional
+ * model `LAProof.C.cblas.ddot_model`.
+ */
+/* The following GSL headers are commented out for clightgen: they are
+   unused by cblas_ddot's body (only OFFSET/INDEX from "cblas.h" are needed)
+   and are unresolvable here because GSL's gsl/ symlink dir is not generated. */
+/* #include <gsl/gsl_math.h>  */
+/* #include <gsl/gsl_cblas.h> */
+#include "cblas.h"
+
+double
+cblas_ddot (const int N, const double *X, const int incX, const double *Y,
+            const int incY)
+{
+#define INIT_VAL  0.0
+#define ACC_TYPE  double
+#define BASE double
+#include "source_dot_r.h"
+#undef ACC_TYPE
+#undef BASE
+#undef INIT_VAL
+}
diff --git a/C/cblas/src/dscal.c b/C/cblas/src/dscal.c
new file mode 100644
index 0000000..4b2e0e6
--- /dev/null
+++ b/C/cblas/src/dscal.c
@@ -0,0 +1,23 @@
+//ldoc on
+/**
+ * # `dscal.c`: GSL CBLAS real double-precision scaling (in place)
+ *
+ * Thin wrapper that instantiates the generic kernel `source_scal_r.h` at type
+ * `double`.  Verified in `LAProof.C.cblas.verif_dscal` against the functional
+ * model `LAProof.C.cblas.scal_model`.
+ */
+/* The following GSL headers are commented out for clightgen: they are
+   unresolvable here (GSL's gsl/ symlink dir is not generated) and contribute
+   nothing to cblas_dscal's body (only INDEX from "cblas.h" is needed; the loop
+   uses no libc math). */
+/* #include <gsl/gsl_math.h>  */
+/* #include <gsl/gsl_cblas.h> */
+#include "cblas.h"
+
+void
+cblas_dscal (const int N, const double alpha, double *X, const int incX)
+{
+#define BASE double
+#include "source_scal_r.h"
+#undef BASE
+}
diff --git a/C/cblas/src/source_asum_r.h b/C/cblas/src/source_asum_r.h
new file mode 100644
index 0000000..65a3515
--- /dev/null
+++ b/C/cblas/src/source_asum_r.h
@@ -0,0 +1,43 @@
+/* blas/source_asum_r.h
+ * 
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+//ldoc on
+/**
+ * ## Accumulation kernel (`source_asum_r.h`)
+ *
+ * A forward, left-to-right accumulation of `fabs(X[ix])` into `r`, starting
+ * from `+0.0`.  The `incX <= 0` guard returns `0` for nonpositive stride; with
+ * unit stride (`incX = 1`) it is dead and `ix` advances one element per
+ * iteration.
+ */
+{
+  BASE r = 0.0;
+  INDEX i;
+  INDEX ix = 0;
+
+  if (incX <= 0) {
+    return 0;
+  }
+
+  for (i = 0; i < N; i++) {
+    r += fabs(X[ix]);
+    ix += incX;
+  }
+  return r;
+}
diff --git a/C/cblas/src/source_dot_r.h b/C/cblas/src/source_dot_r.h
new file mode 100644
index 0000000..0775c77
--- /dev/null
+++ b/C/cblas/src/source_dot_r.h
@@ -0,0 +1,41 @@
+/* blas/source_dot_r.h
+ * 
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+//ldoc on
+/**
+ * ## Accumulation kernel (`source_dot_r.h`)
+ *
+ * A forward, left-to-right accumulation with separate multiply-then-add (not
+ * fused).  With unit stride (`incX = incY = 1`) the `OFFSET` conditionals are
+ * `0` and the indices `ix`, `iy` advance one element per iteration.
+ */
+{
+  ACC_TYPE r = INIT_VAL;
+  INDEX i;
+  INDEX ix = OFFSET(N, incX);
+  INDEX iy = OFFSET(N, incY);
+
+  for (i = 0; i < N; i++) {
+    r += X[ix] * Y[iy];
+    ix += incX;
+    iy += incY;
+  }
+
+  return r;
+}
diff --git a/C/cblas/src/source_scal_r.h b/C/cblas/src/source_scal_r.h
new file mode 100644
index 0000000..4bc0fe9
--- /dev/null
+++ b/C/cblas/src/source_scal_r.h
@@ -0,0 +1,40 @@
+/* blas/source_scal_r.h
+ * 
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+//ldoc on
+/**
+ * ## Scaling kernel (`source_scal_r.h`)
+ *
+ * An in-place elementwise update `X[ix] *= alpha` with no accumulation.  The
+ * `incX <= 0` guard returns early for nonpositive stride; with unit stride
+ * (`incX = 1`) it is dead and `ix` advances one element per iteration.
+ */
+{
+  INDEX i;
+  INDEX ix = 0;
+
+  if (incX <= 0) {
+    return;
+  }
+
+  for (i = 0; i < N; i++) {
+    X[ix] *= alpha;
+    ix += incX;
+  }
+}
diff --git a/C/cblas/verif_dasum.v b/C/cblas/verif_dasum.v
new file mode 100644
index 0000000..0938c66
--- /dev/null
+++ b/C/cblas/verif_dasum.v
@@ -0,0 +1,60 @@
+(**  * LAProof.C.cblas.verif_dasum: VST proof for GSL's [cblas_dasum]. *)
+(** ** Corresponds to C program [C/cblas/src/dasum.c]. *)
+
+(** [body_cblas_dasum] proves that the compiled C [cblas_dasum] (unit stride)
+    refines LAProof's summation model: its result is [feq]-equal to
+    [sumF (map BABS X)].  The loop invariant carries the partial accumulation
+    [asum_model (sublist 0 k X)]; the step and bridge reasoning that discharges
+    it lives in [LAProof.C.cblas.asum_model].  The loop body makes an [fabs]
+    [forward_call], and a dead [if (incX<=0) return 0;] guard precedes the loop
+    (unreachable since [incX = 1]). *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From VSTlib Require Import spec_math.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dasum asum_model spec_dasum.
+Require Import LAProof.accuracy_proofs.sum_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Lemma body_cblas_dasum: semax_body Vprog Gprog f_cblas_dasum cblas_dasum_spec.
+Proof.
+start_function.
+forward.                       (* r = 0.0 *)
+forward.                       (* ix = 0 *)
+(* if (incX <= 0) return 0;  -- dead, incX = 1 *)
+forward_if.
+{ rep_lia. }                   (* then-branch: incX <= 0 contradicts incX = 1 *)
+(* accumulation loop *)
+forward_for_simple_bound (Zlength X)
+  (EX k:Z,
+    PROP ()
+    LOCAL (temp _r (Vfloat (asum_model (sublist 0 k X)));
+           temp _ix (Vint (Int.repr k));
+           temp _N (Vint (Int.repr (Zlength X)));
+           temp _incX (Vint (Int.repr 1));
+           temp _X px)
+    SEP (data_at shX (tarray tdouble (Zlength X)) (map Vfloat X) px))%assert.
+- (* entry (k = 0): asum_model nil = +0.0 *)
+  entailer!!.
+- (* body: t = X[ix]; r += fabs(t); ix += incX *)
+  forward.                     (* t'2 = X[ix] *)
+  forward_call (Znth i X).     (* t'1 = fabs(X[i]) = BABS (Znth i X) *)
+  forward.                     (* r = r + t'1 *)
+  forward.                     (* ix = ix + incX *)
+  entailer!!.
+  rewrite (asum_model_step X i) by list_solve.
+  reflexivity.
+- (* exit (k = Zlength X): return r = asum_model X *)
+  forward.                     (* return r *)
+  Exists (asum_model (sublist 0 (Zlength X) X)).
+  entailer!!.
+  rewrite sublist_same by lia.
+  apply asum_model_feq_sumF.
+Qed.
+
+(** Payoff (stub): because the result is [feq]-equal to [sumF (map BABS X)], the
+    forward-error bound [LAProof.accuracy_proofs.sum_acc.fSUM] (applied to the
+    list [map BABS X]) bounds the C result's error whenever that result is finite
+    (the no-overflow hypothesis [Hfin]; see [C/cblas/README.md]). *)
diff --git a/C/cblas/verif_ddot.v b/C/cblas/verif_ddot.v
new file mode 100644
index 0000000..1ac765a
--- /dev/null
+++ b/C/cblas/verif_ddot.v
@@ -0,0 +1,94 @@
+(**  * LAProof.C.cblas.verif_ddot: VST proof for GSL's [cblas_ddot]. *)
+(** ** Corresponds to C program [C/cblas/src/ddot.c]. *)
+
+(** [body_cblas_ddot] proves that the compiled C [cblas_ddot] (unit stride)
+    refines the [dotprodF] accuracy model: its result is [feq]-equal to
+    [dotprodF X Y].  The loop invariant carries the partial accumulation
+    [ddot_model (sublist 0 k X) (sublist 0 k Y)]; the step and bridge reasoning
+    that discharges it lives in [LAProof.C.cblas.ddot_model].  Two dead [OFFSET]
+    guards (the [incX>0]/[incY>0] conditionals) precede the loop, unreachable
+    since [incX = incY = 1]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import ddot ddot_model spec_ddot.
+Require Import LAProof.accuracy_proofs.dotprod_model.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Lemma body_cblas_ddot: semax_body Vprog Gprog f_cblas_ddot cblas_ddot_spec.
+Proof.
+start_function.
+(* C: r = 0.0; ix = OFFSET(N,incX); iy = OFFSET(N,incY); with incX=incY=1 ⇒ ix=iy=0.
+   Each OFFSET is a (incX>0 ? 0 : ...) conditional; the else-branch is
+   unreachable here because incX=incY=1, so [rep_lia] closes it. *)
+forward.                       (* _r = 0.0 *)
+forward_if (PROP ()
+  LOCAL (temp _t'1 (Vint (Int.repr 0));
+         temp _r (Vfloat (Float.of_bits (Int64.repr 0)));
+         temp _N (Vint (Int.repr n)); temp _X px;
+         temp _incX (Vint (Int.repr 1)); temp _Y py;
+         temp _incY (Vint (Int.repr 1)))
+  SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+       data_at shY (tarray tdouble n) (map Vfloat Y) py)).
+{ forward. entailer!!. }       (* incX = 1 > 0 : t'1 := 0 *)
+{ rep_lia. }                   (* dead else-branch (incX = 1 is > 0) *)
+forward.                       (* _ix = _t'1 (= 0) *)
+forward_if (PROP ()
+  LOCAL (temp _t'2 (Vint (Int.repr 0));
+         temp _ix (Vint (Int.repr 0));
+         temp _r (Vfloat (Float.of_bits (Int64.repr 0)));
+         temp _N (Vint (Int.repr n)); temp _X px;
+         temp _incX (Vint (Int.repr 1)); temp _Y py;
+         temp _incY (Vint (Int.repr 1)))
+  SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+       data_at shY (tarray tdouble n) (map Vfloat Y) py)).
+{ forward. entailer!!. }       (* incY = 1 > 0 : t'2 := 0 *)
+{ rep_lia. }                   (* dead else-branch (incY = 1 is > 0) *)
+forward.                       (* _iy = _t'2 (= 0) *)
+(* The accumulation loop.  Invariant: _r holds the value the C loop has
+   accumulated over the first k element-pairs, namely [ddot_model] of the
+   length-k prefixes; the strided indices ix,iy advance by 1 each step (= k). *)
+forward_for_simple_bound n
+  (EX k:Z,
+    PROP ()
+    LOCAL (temp _r (Vfloat (ddot_model (sublist 0 k X) (sublist 0 k Y)));
+           temp _ix (Vint (Int.repr k));
+           temp _iy (Vint (Int.repr k));
+           temp _N (Vint (Int.repr n));
+           temp _incX (Vint (Int.repr 1));
+           temp _incY (Vint (Int.repr 1));
+           temp _X px; temp _Y py)
+    SEP (data_at shX (tarray tdouble n) (map Vfloat X) px;
+         data_at shY (tarray tdouble n) (map Vfloat Y) py))%assert.
+- (* Loop entry (k = 0).  [sublist 0 0 = nil] and [ddot_model nil nil =
+     Zconst Tdouble 0] (= the C [+0.0]); [entailer!!] discharges it. *)
+  entailer!!.
+- (* Loop body: read X[i], Y[i]; [r = r + X[i]*Y[i]]; ix++; iy++.  Re-establish
+     the invariant for [i+1] via the step lemma [ddot_model_step]. *)
+  forward.                       (* t'3 = X[ix] *)
+  forward.                       (* t'4 = Y[iy] *)
+  forward.                       (* r = r + t'3 * t'4 *)
+  forward.                       (* ix = ix + incX *)
+  forward.                       (* iy = iy + incY *)
+  assert (Hi: 0 <= i < Zlength X) by list_solve.
+  assert (Hxy: Zlength X = Zlength Y) by list_solve.
+  entailer!.
+  rewrite (ddot_model_step X Y i Hxy Hi).
+  reflexivity.
+- (* Loop exit (k = n).  Return [r = ddot_model (sublist 0 n X) (sublist 0 n Y)];
+     [sublist_same] gives [= ddot_model X Y], and the bridge lemma
+     [ddot_model_feq_dotprodF] discharges [feq _ (dotprodF X Y)]. *)
+  forward.                       (* return r *)
+  Exists (ddot_model (sublist 0 (Zlength X) X) (sublist 0 (Zlength X) Y)).
+  entailer!.
+  rewrite (sublist_same 0 (Zlength X) X) by lia.
+  rewrite (sublist_same 0 (Zlength X) Y) by lia.
+  apply ddot_model_feq_dotprodF; [ lia | auto | auto ].
+Qed.
+
+(** Payoff (stub): because [cblas_ddot_spec]'s result is [feq]-equal to
+    [dotprodF X Y], the forward-error bound
+    [LAProof.accuracy_proofs.dot_acc.dotprod_forward_error] applies directly to
+    the value computed by the compiled C [cblas_ddot]. *)
diff --git a/C/cblas/verif_dscal.v b/C/cblas/verif_dscal.v
new file mode 100644
index 0000000..2651116
--- /dev/null
+++ b/C/cblas/verif_dscal.v
@@ -0,0 +1,85 @@
+(**  * LAProof.C.cblas.verif_dscal: VST proof for GSL's [cblas_dscal]. *)
+(** ** Corresponds to C program [C/cblas/src/dscal.c]. *)
+
+(** [body_cblas_dscal] proves that after the compiled C [cblas_dscal] (unit
+    stride) the array holds exactly [scal_model alpha X] (= [map (fun x => BMULT
+    x alpha) X]).  This is an in-place routine: a writable array with a store in
+    the loop, whose invariant is "scaled prefix ++ original suffix"; the per-step
+    [upd_Znth]/[sublist] identity is discharged by [list_solve]. *)
+
+Require Import VST.floyd.proofauto.
+From vcfloat Require Import FPStdCompCert FPStdLib.
+From LAProof.C Require Import floatlib.
+From LAProof.C.cblas Require Import dscal scal_model spec_dscal.
+
+Set Bullet Behavior "Strict Subproofs".
+
+Lemma body_cblas_dscal: semax_body Vprog Gprog f_cblas_dscal cblas_dscal_spec.
+Proof.
+start_function.
+forward.                       (* ix = 0 *)
+forward_if.
+{ rep_lia. }                   (* dead: incX <= 0 contradicts incX = 1 *)
+(* Invariant: the array holds a list [L] of length [Zlength X] whose first [k]
+   entries are already scaled ([Znth j X * alpha]) and whose remaining entries
+   are still the original [X].  Keeping the contents as [map Vfloat L] for a
+   *plain* list [L] lets the load and store simplify; the pointwise [Znth]
+   constraints make each step a simple [upd_Znth_same]/[upd_Znth_diff] case. *)
+forward_for_simple_bound (Zlength X)
+  (EX k:Z, EX L:list (ftype Tdouble),
+    PROP (Zlength L = Zlength X;
+          forall j, 0 <= j < k -> Znth j L = (Znth j X * alpha)%F64;
+          forall j, k <= j < Zlength X -> Znth j L = Znth j X)
+    LOCAL (temp _alpha (Vfloat alpha);
+           temp _ix (Vint (Int.repr k));
+           temp _N (Vint (Int.repr (Zlength X)));
+           temp _incX (Vint (Int.repr 1));
+           temp _X px)
+    SEP (data_at sh (tarray tdouble (Zlength X)) (map Vfloat L) px))%assert.
+- (* entry (k = 0): L := X *)
+  Exists X. entailer!!.
+- (* body: t = X[ix]; X[ix] = t * alpha; ix += incX *)
+  Intros.
+  (* Re-assert the invariant facts under explicit names so they survive the
+     [forward]s (which clear the auto-named PROP hypotheses). *)
+  assert (Hlen: Zlength L = Zlength X) by assumption.
+  assert (Hpre: forall j, 0 <= j < i -> Znth j L = (Znth j X * alpha)%F64) by assumption.
+  assert (Hsuf: forall j, i <= j < Zlength X -> Znth j L = Znth j X) by assumption.
+  assert (Hi: 0 <= i < Zlength L) by lia.
+  assert (HLX: Znth i L = Znth i X) by (apply Hsuf; lia).
+  forward.                     (* t'1 = X[ix] *)
+  forward.                     (* X[ix] = t'1 * alpha *)
+  forward.                     (* ix += incX *)
+  Exists (upd_Znth i L (BMULT (Znth i X) alpha)).
+  entailer!!.
+  2: { (* SEP: the stored value [X[i]*alpha] is the model element [BMULT .. alpha] *)
+       apply derives_refl'. f_equal. rewrite <- upd_Znth_map. f_equal.
+       rewrite ?Znth_map by lia.
+       replace (@Znth float Inhabitant_float i L)
+          with (@Znth float Inhabitant_float i X) by (symmetry; apply HLX).
+       reflexivity. }
+  (* the three pointwise list invariants for the updated list *)
+  split3.
+  { list_solve. }
+  { intros j Hj. destruct (Z.eq_dec j i) as [->|Hne].
+    { rewrite upd_Znth_same by lia. reflexivity. }
+    { rewrite upd_Znth_diff by lia. apply Hpre; lia. } }
+  { intros j Hj. rewrite upd_Znth_diff by lia. apply Hsuf; lia. }
+- (* exit (k = Zlength X): the whole array now holds [scal_model alpha X] *)
+  Intros L.
+  assert (Hpre: forall j, 0 <= j < Zlength X -> Znth j L = (Znth j X * alpha)%F64)
+    by assumption.
+  assert (Hlen: Zlength L = Zlength X) by assumption.
+  assert (L = scal_model alpha X). {
+    apply Znth_eq_ext; [ rewrite Zlength_scal_model; lia | ].
+    intros j Hj. rewrite Hlen in Hj.
+    rewrite Znth_scal_model by lia. rewrite Hpre by lia. reflexivity. }
+  subst L. entailer!!.
+Qed.
+
+(** Payoff: each element [BMULT (Znth k X) alpha] is the correctly-rounded
+    product, so when that product is finite its relative error vs the exact
+    product is bounded by the unit roundoff (vcfloat's [BMULT] accuracy lemma).
+    A product can still overflow, but the finiteness (no-overflow) requirement is
+    per element -- each [X[k]*alpha] independently -- rather than a single global
+    hypothesis over an accumulation. *)
diff --git a/Makefile b/Makefile
index 574bac7..99355c4 100644
--- a/Makefile
+++ b/Makefile
@@ -20,6 +20,7 @@ clean:
 	if [ -e Makefile.coq ]; then $(MAKE) -f Makefile.coq cleanall; fi 
 	$(RM) Makefile.coq Makefile.coq.conf
 	$(RM) C/sparse.v C/cholesky.v C/alloc.v C/densemat.v C/bandmat.v
+	$(RM) C/cblas/ddot.v C/cblas/dasum.v C/cblas/dscal.v
 
 INSTALLFILES1 ?= $(shell awk '/accuracy_proofs/{print $$NF"o"}' _CoqProject)
 
diff --git a/Makefile.coq.local b/Makefile.coq.local
index 56581ac..d395271 100644
--- a/Makefile.coq.local
+++ b/Makefile.coq.local
@@ -2,7 +2,7 @@
 COQDOCEXTRAFLAGS= -g --no-lib-name --with-header header.html --index genindex --lib-subtitles --interpolate -utf8
 
 accuracy: accuracy_proofs/export.vo
-C: C/verif_alloc.vo C/verif_sparse.vo C/verif_sparse_byrows.vo C/VSU_densemat.vo C/verif_build_csr.vo
+C: C/verif_alloc.vo C/verif_sparse.vo C/verif_sparse_byrows.vo C/VSU_densemat.vo C/verif_build_csr.vo C/cblas/verif_ddot.vo C/cblas/verif_dasum.vo C/cblas/verif_dscal.vo
 
 #  Ugh.  We can't just store the reference copy of index.html in html/index.html, because Makefile.coq's
 #   standard "make clean" removes the html directory entirely.  So we gotta do this hack:
@@ -31,7 +31,8 @@ install-extra:: $(CFILES) $(HFILES)
 
 # Quarto documentation of C programs
 LDOC= 	lua cdoc/ldoc.lua
-CDOCS= cdoc/alloc.qmd cdoc/densemat.qmd cdoc/bandmat.qmd cdoc/sparse.qmd cdoc/build_csr.qmd
+CDOCS= cdoc/alloc.qmd cdoc/densemat.qmd cdoc/bandmat.qmd cdoc/sparse.qmd cdoc/build_csr.qmd \
+       cdoc/cblas_ddot.qmd cdoc/cblas_dasum.qmd cdoc/cblas_dscal.qmd
 
 cdoc/%.html: cdoc/%.qmd
 	cd cdoc; quarto render $(notdir $^) --to html --quiet
@@ -42,6 +43,14 @@ cdoc/%.qmd: C/include/%.h C/src/%.c
 cdoc/build_csr.qmd: C/include/sparse.h C/src/build_csr.c
 	$(LDOC) -highlight c -p quarto -o $@ C/src/build_csr.c
 
+# GSL CBLAS routines: document the wrapper .c together with its source_*_r.h kernel.
+cdoc/cblas_ddot.qmd: C/cblas/src/ddot.c C/cblas/src/source_dot_r.h
+	$(LDOC) -highlight c -p quarto -o $@ $^
+cdoc/cblas_dasum.qmd: C/cblas/src/dasum.c C/cblas/src/source_asum_r.h
+	$(LDOC) -highlight c -p quarto -o $@ $^
+cdoc/cblas_dscal.qmd: C/cblas/src/dscal.c C/cblas/src/source_scal_r.h
+	$(LDOC) -highlight c -p quarto -o $@ $^
+
 cdocs: $(patsubst %.qmd,%.html,$(CDOCS))
 
 
@@ -49,6 +58,10 @@ cdocs: $(patsubst %.qmd,%.html,$(CDOCS))
 C/%.v: C/src/%.c
 	clightgen -normalize -IC/include $< -o $@
 
+# cblas subdirectory: GSL BLAS routines ported into LAProof
+C/cblas/%.v: C/cblas/src/%.c
+	clightgen -normalize -IC/cblas/include $< -o $@
+
 
 # the rest of this is adapted from the makedepend part of ../Makefile
 C/sparse.v: C/include/sparse.h
@@ -57,3 +70,6 @@ C/alloc.v: C/include/alloc.h
 C/bandmat.v: C/include/densemat.h C/include/bandmat.h
 C/densemat.v: C/include/densemat.h
 C/build_csr.v: C/include/sparse.h
+C/cblas/ddot.v: C/cblas/include/cblas.h C/cblas/src/source_dot_r.h
+C/cblas/dasum.v: C/cblas/include/cblas.h C/cblas/src/source_asum_r.h
+C/cblas/dscal.v: C/cblas/include/cblas.h C/cblas/src/source_scal_r.h
diff --git a/_CoqProject b/_CoqProject
index a404827..f5e4fb2 100644
--- a/_CoqProject
+++ b/_CoqProject
@@ -47,3 +47,18 @@ C/build_csr.v
 C/verif_build_csr.v
 C/distinct.v
 C/partial_csr.v
+
+C/cblas/ddot.v
+C/cblas/ddot_model.v
+C/cblas/spec_ddot.v
+C/cblas/verif_ddot.v
+
+C/cblas/dasum.v
+C/cblas/asum_model.v
+C/cblas/spec_dasum.v
+C/cblas/verif_dasum.v
+
+C/cblas/dscal.v
+C/cblas/scal_model.v
+C/cblas/spec_dscal.v
+C/cblas/verif_dscal.v
diff --git a/index.html b/index.html
index 6cfba60..db2e33a 100644
--- a/index.html
+++ b/index.html
@@ -62,6 +62,7 @@ <h3 class="section">C programs</h3>
 <li><a href="bandmat.html">bandmat.[c,h]</a>:&nbsp;&nbsp;&nbsp;Symmetric band matrix operations
 <li><a href="sparse.html">sparse.[c,h]</a>:&nbsp;&nbsp;&nbsp;CSR (Compressed Sparse Row) matrix operations
 <li><a href="build_csr.html">build_csr.c</a>:&nbsp;&nbsp;&nbsp;CSR (Compressed Sparse Row) matrix construction
+<li><a href="cblas_ddot.html">ddot.c</a>, <a href="cblas_dasum.html">dasum.c</a>, <a href="cblas_dscal.html">dscal.c</a>:&nbsp;&nbsp;&nbsp;GSL CBLAS Level-1 vector routines
 </ul>
 
 <h3 class="section">Algorithm Accuracy Proofs</h3>
@@ -163,6 +164,21 @@ <h3 class="section">C Program Correctness Proofs</h3>
       <li><a href="LAProof.C.verif_build_csr.html">Verification of COO->CSR conversion</a>
       <li><a href="LAProof.C.VSU_sparse.html">Verified Software Unit</a>
     </ul>
+  <dt>BLAS Level-1 (GSL CBLAS): <a href="cblas_ddot.html">ddot.c</a>, <a href="cblas_dasum.html">dasum.c</a>, <a href="cblas_dscal.html">dscal.c</a>
+    <dd><ul>
+      <li>Functional models and model/accuracy-model bridge lemmas:
+        <a href="LAProof.C.cblas.ddot_model.html">ddot</a>,
+        <a href="LAProof.C.cblas.asum_model.html">dasum</a>,
+        <a href="LAProof.C.cblas.scal_model.html">dscal</a>
+      <li>Specifications:
+        <a href="LAProof.C.cblas.spec_ddot.html">ddot</a>,
+        <a href="LAProof.C.cblas.spec_dasum.html">dasum</a>,
+        <a href="LAProof.C.cblas.spec_dscal.html">dscal</a>
+      <li>Correctness proofs:
+        <a href="LAProof.C.cblas.verif_ddot.html">ddot</a>,
+        <a href="LAProof.C.cblas.verif_dasum.html">dasum</a>,
+        <a href="LAProof.C.cblas.verif_dscal.html">dscal</a>
+    </ul>
 </dl>
 
 <div id="footer">