diff --git a/.gitignore b/.gitignore index 7014311..9ad04a8 100644 --- a/.gitignore +++ b/.gitignore @@ -4,6 +4,9 @@ C/bandmat.v C/build_csr.v C/cholesky.v C/densemat.v +C/cblas/ddot.v +C/cblas/dasum.v +C/cblas/dscal.v Makefile.coq Makefile.coq.conf CoqMakefile diff --git a/C/cblas/README.md b/C/cblas/README.md new file mode 100644 index 0000000..89d689a --- /dev/null +++ b/C/cblas/README.md @@ -0,0 +1,105 @@ +# LAProof × GSL CBLAS + +This directory connects **real-world BLAS code** to LAProof's verified floating-point +accuracy results. + +## Scope + +LAProof proves floating-point accuracy theorems (forward error bounds, etc.) about +*functional models* of basic linear-algebra programs (idealized list/matrix functions +that compute a result with a precisely specified sequence of IEEE-754 operations). +Those theorems live in [`accuracy_proofs/`](../../accuracy_proofs/), independent of any +particular C implementation. + +The goal of this directory is the other half of the story: take a **real-world BLAS +implementation** and *prove* that the compiled C code implements one of those +functional models. We use the CBLAS layer of the **GNU Scientific Library (GSL)** as the +implementation under verification. + +The pipeline for each routine is: + +1. Take the GSL CBLAS source (under [`src/`](src/)) essentially verbatim, with only the + minimal edits needed to run CompCert's `clightgen` on it. +2. Generate the Clight AST (a `*.v` file) with `clightgen`. +3. Define the **functional model** the C loop literally computes (in a `*_model.v` file), + and prove it equals (up to `feq`) the LAProof accuracy model over which the error + theorems are stated. +4. Write a VST **funspec** (in a `spec_*.v` file) whose postcondition relates the returned + value to the LAProof model. +5. **Verify** the C function against that funspec with VST (in a `verif_*.v` file). + +Once a routine is verified against a funspec phrased in terms of a LAProof model, the +corresponding accuracy theorem can be transferred to the value computed by the compiled C +code. The verification does *not* prove the accuracy theorem's preconditions on its own, +however. To apply the bound to the C result, one must separately establish that those +preconditions hold, and they fall into two kinds. + +- **Already guaranteed by the funspec precondition.** Some of the theorem's preconditions + are exactly things the funspec `PRE` already requires, so they hold for free (e.g. a + length precondition that the funspec already imposes on the input arrays). +- **Additional conditions one must discharge.** Other preconditions are not established by + the funspec and must be proved on the side. A typical example is a no-overflow assumption + that the accumulated result is finite: this does *not* follow from the funspec merely + requiring the inputs to be finite, since a combination of finite values can still + overflow to an infinity. Whoever applies the bound must supply this separately. + +So the honest statement is: for inputs satisfying both the funspec `PRE` and the accuracy +theorem's own hypotheses, the bound holds of the C return value. In many cases (such as the +dot products and other reduction loops verified here) the one hypothesis the funspec does +not already supply is a *no-overflow* condition: that the accumulated result is finite. + +For `cblas_ddot` that leftover precondition is exactly `Hfin`: that the accumulated result +`dotprodF X Y` is finite. It does not follow merely from the inputs being finite, since a +sum of finite products can still round to an infinity. + +The `feq` postcondition cooperates here: the funspec only guarantees that the C result is +`feq`-equal to the model value (equality up to `±0` and exceptional values), but once the result is finite, +`feq` upgrades to real-value equality, so the bound on the model value is literally a bound +on the compiled C output. + +When stating the accuracy corollary for the C function, you choose how to carry `Hfin`: + +- **Conditional corollary.** Keep `Hfin` as an explicit hypothesis. The bound then holds + whenever the result is in fact finite, and whoever applies the corollary discharges + `Hfin` for their particular inputs. (`Hfin` is a proof obligation about the result, + not a runtime check in the C code.) +- **Unconditional corollary.** Strengthen the funspec `PRE` with magnitude/length bounds + that imply finiteness. LAProof's [`finite_sum_from_bounded`](../../accuracy_proofs/sum_is_finite.v) + and the `fun_bnd_le` lemmas turn such bounds into a proof of `Hfin`, so the corollary + carries no extra hypothesis. + +### GSL CBLAS upstream + +- GSL project home: +- GSL source repository (Savannah git): +- CBLAS directory in the repo: +- Shared BLAS kernels (`source_*.h`) used by the CBLAS routines: + +GSL is licensed under the GNU GPL; the files under [`src/`](src/) retain their upstream +copyright and license headers. + +## What we've verified so far + +| Operation | Precision | Function | Files | +|-----------|-----------|----------|-------| +| Dot product | double | `cblas_ddot` | [`src/ddot.c`](src/ddot.c), [`src/source_dot_r.h`](src/source_dot_r.h), [`include/cblas.h`](include/cblas.h), [`ddot.v`](ddot.v), [`ddot_model.v`](ddot_model.v), [`spec_ddot.v`](spec_ddot.v), [`verif_ddot.v`](verif_ddot.v) | +| Sum of absolute values | double | `cblas_dasum` | [`src/dasum.c`](src/dasum.c), [`src/source_asum_r.h`](src/source_asum_r.h), [`include/cblas.h`](include/cblas.h), [`dasum.v`](dasum.v), [`asum_model.v`](asum_model.v), [`spec_dasum.v`](spec_dasum.v), [`verif_dasum.v`](verif_dasum.v) | +| Scalar multiply (in place) | double | `cblas_dscal` | [`src/dscal.c`](src/dscal.c), [`src/source_scal_r.h`](src/source_scal_r.h), [`include/cblas.h`](include/cblas.h), [`dscal.v`](dscal.v), [`scal_model.v`](scal_model.v), [`spec_dscal.v`](spec_dscal.v), [`verif_dscal.v`](verif_dscal.v) | + +**Scope limits (`cblas_ddot`, `cblas_dasum`, `cblas_dscal`):** +- **Unit stride only** (`incX = incY = 1`), as a deliberate first milestone. +- For the **reductions** (`ddot`, `dasum`) the postcondition is stated up to `feq`. +- `cblas_dasum` is verified against `sumF (map BABS X)`; its loop body calls `fabs`, + discharged with VSTlib's `fabs_spec` (whose result is `BABS`). The matching accuracy + theorem is `sum_acc.fSUM`. +- `cblas_dscal` is the first **in-place** routine: it overwrites the array (writable share, + `void` return) and is verified against an *exact* model `scal_model alpha X = + map (fun x => BMULT x alpha) X` (no `feq` required here). The + per-element accuracy is a direct unit-roundoff bound on `BMULT`. + +## Conventions + +This directory mirrors the model/spec/verif split used elsewhere in +[`C/`](../) (e.g. the sparse routines' `sparse_model.v` / `spec_sparse.v` / +`verif_sparse.v`): pure model functions and their lemmas in `*_model.v`, VST funspecs in +`spec_*.v`, and `semax_body` proofs in `verif_*.v`. diff --git a/C/cblas/asum_model.v b/C/cblas/asum_model.v new file mode 100644 index 0000000..163f505 --- /dev/null +++ b/C/cblas/asum_model.v @@ -0,0 +1,83 @@ +(** * LAProof.C.cblas.asum_model: functional model of GSL's [cblas_dasum]. *) +(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *) + +(** This file provides [asum_model], the partial accumulation that the + [cblas_dasum] loop invariant tracks, with the *step* lemma (how one loop + iteration extends the accumulation) and the *end* lemma (the result of the + accumulation once the loop finishes). GSL's kernel ([source_asum_r.h]) is +<< + double r = 0.0; + for (i = 0; i < N; i++) { r += fabs(X[ix]); ix += incX; } + return r; +>> + i.e., it adds [|X[i]|] left to right into the accumulator ([BPLUS acc |x|]), + starting from [+0.0]. The C [fabs] computes [BABS] (the result of VSTlib's + [fabs_spec], whose [ff_func] is [BABS]), so the model uses [BABS]. + + The matching LAProof accuracy model is [sum_model.sumF] applied to + [map BABS X] (the absolute values); over it [sum_acc.fSUM] proves the + forward-error bound. The C's operand order ([BPLUS acc x] + vs [sumF]'s [BPLUS x acc]) and initial accumulator ([+0.0] = [Zconst Tdouble 0] + vs [sumF]'s [neg_zero]) differ, so the end/bridge lemma relates the two by + [feq] (IEEE-equality identifying [±0]/NaN). *) + +Require Import VST.floyd.proofauto. +Require Import vcfloat.VCFloat. +Require Import vcfloat.FPStdCompCert. +From vcfloat Require Import FPStdLib. +From LAProof.C Require Import floatlib. +(* [sum_model] only [Require Import]s (not [Export]s) the mathcomp preamble, so + this does not pollute the VST namespace with ssreflect notations. We do NOT + [Import common] (it would re-scope [<=]/[<] over [R]); [common.BPLUS_comm] / + [common.neg_zero] are used qualified. [BPLUS_mor], [BABS], the [feq] + [Equivalence] come from vcfloat. *) +Require Import LAProof.accuracy_proofs.sum_model. + +Set Bullet Behavior "Strict Subproofs". + +(** [asum_model X] is the value the C loop *literally* computes: a left fold with + the accumulator as the first [BPLUS] operand, separate abs-then-add, starting + from [+0.0]. This mirrors the Clight AST in [dasum.v] exactly. *) +Definition asum_loop (l: list (ftype Tdouble)) : ftype Tdouble := + fold_left (fun acc x => BPLUS acc (BABS x)) l (Zconst Tdouble 0). + +Definition asum_model (X: list (ftype Tdouble)) : ftype Tdouble := asum_loop X. + +(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense asum). *) + +Lemma asum_loop_snoc: forall l x, + asum_loop (l ++ [x]) = BPLUS (asum_loop l) (BABS x). +Proof. intros. unfold asum_loop. rewrite fold_left_app. reflexivity. Qed. + +(** *step*: extending the length-[k] prefix [X[0..k-1]] with the element [X[k]] + adds one [BABS] term, with the accumulator as the first [BPLUS] operand -- + exactly the Clight statement [r = r + fabs(X[k])]. *) +Lemma asum_model_step: forall (X: list (ftype Tdouble)) k, + 0 <= k < Zlength X -> + asum_model (sublist 0 (k+1) X) + = BPLUS (asum_model (sublist 0 k X)) (BABS (Znth k X)). +Proof. + intros X k Hk. unfold asum_model. + rewrite (sublist_last_1 0 k X) by lia. + rewrite asum_loop_snoc. reflexivity. +Qed. + +(** *end*/bridge: the finished accumulation equals LAProof's summation model + [sumF (map BABS X)] up to [feq]. Generalized-accumulator induction using + [common.BPLUS_comm] and [BPLUS_mor], with the [feq (+0.0) (-0.0)] base case + ([sumF] starts at [neg_zero]). *) +Lemma asum_model_feq_sumF: forall (X: list (ftype Tdouble)), + feq (asum_model X) (sumF (map BABS X)). +Proof. + intros X. unfold asum_model, asum_loop, sumF. + assert (Hs: feq (Zconst Tdouble 0) (@common.neg_zero Tdouble)) by reflexivity. + revert Hs. + generalize (Zconst Tdouble 0). + generalize (@common.neg_zero Tdouble). + induction X as [|x X IH]; intros g f Hs. + - (* X = nil: both folds are the (feq-linked) initial accumulators. *) + simpl. exact Hs. + - simpl. apply IH. + unfold Basics.flip. + etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ]. +Qed. diff --git a/C/cblas/ddot_model.v b/C/cblas/ddot_model.v new file mode 100644 index 0000000..ad41b49 --- /dev/null +++ b/C/cblas/ddot_model.v @@ -0,0 +1,122 @@ +(** * LAProof.C.cblas.ddot_model: functional model of GSL's [cblas_ddot]. *) +(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *) + +(** This file provides [ddot_model], the partial accumulation that the + [cblas_ddot] loop invariant tracks, with the *step* lemma (how one loop + iteration extends the accumulation) and the *end* lemma (the result of the + accumulation once the loop finishes). No VST [funspec]s or Clight ASTs + appear here -- those live in [spec_ddot] and [verif_ddot]. GSL's loop + (after macro expansion of [source_dot_r.h]) is +<< + double r = 0.0; + for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; } + return r; +>> + i.e., it adds [X[i]*Y[i]] left to right into the accumulator + ([BPLUS acc prod]), using separate multiply-then-add (not fused + multiply-add), starting from [+0.0]. + + The matching LAProof accuracy model is [dotprodF] + ([dotprodF = dotprod BMULT BPLUS pos_zero]) of + [LAProof.accuracy_proofs.dotprod_model]; over it + [dot_acc.dotprod_forward_error] proves the forward-error bound -- *not* the + FMA-based [floatlib.dotprod]. The C's operand order ([BPLUS acc prod] vs + [dotprodF]'s [BPLUS prod acc]) and initial accumulator ([+0.0] = + [Zconst Tdouble 0] vs [dotprodF]'s [pos_zero]) differ, so the end/bridge + lemma relates the two by [feq] (IEEE addition is commutative up to [feq]). *) + +Require Import VST.floyd.proofauto. +Require Import vcfloat.VCFloat. +Require Import vcfloat.FPStdCompCert. +From vcfloat Require Import FPStdLib. +From LAProof.C Require Import floatlib. +(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble, + so this does not pollute the VST namespace with ssreflect notations. We do + NOT [Import common]: that would bring mathcomp's order/ring notations into + scope (making [<=]/[<] parse over [R] and breaking [ddot_model_step]). + Instead [common.BPLUS_comm] / [common.pos_zero] are used qualified below; + [BPLUS_mor] and the [feq] [Equivalence] come from vcfloat. *) +Require Import LAProof.accuracy_proofs.dotprod_model. + +Set Bullet Behavior "Strict Subproofs". + +(** [ddot_model X Y] is the value the C loop *literally* computes: a left fold + with the accumulator as the first [BPLUS] operand, separate multiply then + add, starting from [+0.0]. This mirrors the Clight AST in [ddot.v] exactly. *) +Definition ddot_loop (xy: list (ftype Tdouble * ftype Tdouble)) : ftype Tdouble := + fold_left (fun acc p => BPLUS acc (BMULT (fst p) (snd p))) xy (Zconst Tdouble 0). + +Definition ddot_model (X Y: list (ftype Tdouble)) : ftype Tdouble := + ddot_loop (combine X Y). + +(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense ddot). *) + +Lemma combine_app_eqlen {A B} (l1 l1': list A) (l2 l2': list B): + length l1 = length l2 -> + combine (l1 ++ l1') (l2 ++ l2') = combine l1 l2 ++ combine l1' l2'. +Proof. + revert l2; induction l1; destruct l2; simpl; intros; try discriminate; auto. + f_equal; auto. +Qed. + +Lemma ddot_loop_snoc: forall xy p, + ddot_loop (xy ++ [p]) = BPLUS (ddot_loop xy) (BMULT (fst p) (snd p)). +Proof. intros. unfold ddot_loop. rewrite fold_left_app. reflexivity. Qed. + +(** *step*: extending both length-[k] prefixes [X[0..k-1]]/[Y[0..k-1]] with the + elements [X[k]]/[Y[k]] adds one [BMULT] term, with the accumulator as the + first [BPLUS] operand -- exactly the Clight statement [r = r + X[k]*Y[k]]. *) +Lemma ddot_model_step: forall (X Y: list (ftype Tdouble)) k, + Zlength X = Zlength Y -> 0 <= k < Zlength X -> + ddot_model (sublist 0 (k+1) X) (sublist 0 (k+1) Y) + = BPLUS (ddot_model (sublist 0 k X) (sublist 0 k Y)) + (BMULT (Znth k X) (Znth k Y)). +Proof. + intros X Y k Hlen Hk. unfold ddot_model. + rewrite (sublist_last_1 0 k X) by lia. + rewrite (sublist_last_1 0 k Y) by lia. + rewrite combine_app_eqlen + by (apply Nat2Z.inj; rewrite <- !Zlength_correct, !Zlength_sublist by lia; lia). + cbn [combine]. (* combine [Znth k X] [Znth k Y] = [(Znth k X, Znth k Y)] *) + rewrite ddot_loop_snoc. reflexivity. +Qed. + +(** *end*/bridge: the finished accumulation equals LAProof's accuracy model + [dotprodF] up to [feq]. They differ only by the [BPLUS] operand order + ([BPLUS acc prod] here vs [BPLUS prod acc] in [dotprodF]'s fold step), so the + proof is a generalized-accumulator induction using [common.BPLUS_comm] + (feq (BPLUS x y) (BPLUS y x)) and the [BPLUS_mor] feq-congruence, plus the + fact that [List.combine = seq.zip] on equal-length lists and that the two + initial zeros ([Zconst Tdouble 0] vs [pos_zero]) are [feq]. This is the one + remaining model obligation; once discharged, [dot_acc.dotprod_forward_error] + transfers to the C result. *) +Lemma ddot_model_feq_dotprodF: + forall (X Y: list (ftype Tdouble)), + Zlength X = Zlength Y -> + Forall finite X -> Forall finite Y -> + feq (ddot_model X Y) (dotprodF X Y). +Proof. + intros X Y Hlen _ _. + assert (Hl: length X = length Y) + by (apply Nat2Z.inj; rewrite <- !Zlength_correct; exact Hlen). + clear Hlen. + unfold ddot_model, ddot_loop, dotprodF, dotprod. + (* The two folds differ only by the initial accumulator ([Zconst Tdouble 0] vs + [pos_zero], which are [feq]) and the [BPLUS] operand order at each step. + Generalize both accumulators, keeping them [feq]-linked, and induct. *) + assert (Hs: feq (Zconst Tdouble 0) (@common.pos_zero Tdouble)) by reflexivity. + revert Y Hl Hs. + generalize (Zconst Tdouble 0). + generalize (@common.pos_zero Tdouble). + induction X as [|x X IH]; intros g f Y Hl Hs. + - (* X = nil (so Y = nil): both folds are the (feq-linked) initial accs. *) + destruct Y as [|y Y']; [ | simpl in Hl; discriminate ]. + simpl. exact Hs. + - destruct Y as [|y Y']; simpl in Hl; [ discriminate | ]. + simpl. + apply IH. + + congruence. + + (* one accumulation step: [BPLUS f prod] vs [flip BPLUS g prod = BPLUS prod g] *) + simpl. unfold Basics.flip. + etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ]. +Qed. diff --git a/C/cblas/include/cblas.h b/C/cblas/include/cblas.h new file mode 100644 index 0000000..b478671 --- /dev/null +++ b/C/cblas/include/cblas.h @@ -0,0 +1,34 @@ +#define INDEX int +#define OFFSET(N, incX) ((incX) > 0 ? 0 : ((N) - 1) * (-(incX))) +#define BLAS_ERROR(x) cblas_xerbla(0, __FILE__, x); + +#define CONJUGATE(x) ((x) == CblasConjTrans) +#define TRANSPOSE(x) ((x) == CblasTrans || (x) == CblasConjTrans) +#define UPPER(x) ((x) == CblasUpper) +#define LOWER(x) ((x) == CblasLower) + +/* Handling of packed complex types... */ + +#define REAL(a,i) (((BASE *) a)[2*(i)]) +#define IMAG(a,i) (((BASE *) a)[2*(i)+1]) + +#define REAL0(a) (((BASE *)a)[0]) +#define IMAG0(a) (((BASE *)a)[1]) + +#define CONST_REAL(a,i) (((const BASE *) a)[2*(i)]) +#define CONST_IMAG(a,i) (((const BASE *) a)[2*(i)+1]) + +#define CONST_REAL0(a) (((const BASE *)a)[0]) +#define CONST_IMAG0(a) (((const BASE *)a)[1]) + + +#define GB(KU,KL,lda,i,j) ((KU+1+(i-j))*lda + j) + +#define TRCOUNT(N,i) ((((i)+1)*(2*(N)-(i)))/2) + +/* #define TBUP(N,i,j) */ +/* #define TBLO(N,i,j) */ + +#define TPUP(N,i,j) (TRCOUNT(N,(i)-1)+(j)-(i)) +#define TPLO(N,i,j) (((i)*((i)+1))/2 + (j)) + diff --git a/C/cblas/scal_model.v b/C/cblas/scal_model.v new file mode 100644 index 0000000..c8ea503 --- /dev/null +++ b/C/cblas/scal_model.v @@ -0,0 +1,37 @@ +(** * LAProof.C.cblas.scal_model: functional model of GSL's [cblas_dscal]. *) +(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *) + +(** This file provides [scal_model], the elementwise scaling that [cblas_dscal] + computes. Scaling is a deterministic *elementwise* update with no + accumulation, so there is no loop invariant tracking a running accumulator + and no [feq] bridge -- the C program computes the model exactly. GSL's + kernel ([source_scal_r.h]) is +<< + for (i = 0; i < N; i++) { X[ix] *= alpha; ix += incX; } +>> + i.e., [X[ix] *= alpha] is [BMULT (X[ix]) alpha] (element-first), so the model + is the list map below, matching what the C program computes. + + LAProof's existing scale model [mv_mathcomp.scalemx] is alpha-first + ([map_mx (BMULT a)]) and matrix-based; relating to it would need a [BMULT] + commutativity lemma (not in LAProof) plus a vector/matrix reshape. That is + unnecessary here: the per-element accuracy (relative error <= unit roundoff) + is exactly vcfloat's correctly-rounded-[BMULT] bound. *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From LAProof.C Require Import floatlib. + +Set Bullet Behavior "Strict Subproofs". + +Definition scal_model (alpha: ftype Tdouble) (X: list (ftype Tdouble)) : list (ftype Tdouble) := + map (fun x => BMULT x alpha) X. + +Lemma Zlength_scal_model: forall alpha X, + Zlength (scal_model alpha X) = Zlength X. +Proof. intros. unfold scal_model. rewrite Zlength_map. reflexivity. Qed. + +Lemma Znth_scal_model: forall alpha X k, + 0 <= k < Zlength X -> + Znth k (scal_model alpha X) = BMULT (Znth k X) alpha. +Proof. intros. unfold scal_model. rewrite Znth_map by lia. reflexivity. Qed. diff --git a/C/cblas/spec_dasum.v b/C/cblas/spec_dasum.v new file mode 100644 index 0000000..8c2d6f7 --- /dev/null +++ b/C/cblas/spec_dasum.v @@ -0,0 +1,39 @@ +(** * LAProof.C.cblas.spec_dasum: VST specification of GSL's [cblas_dasum]. *) +(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *) + +(** Sibling of [LAProof.C.cblas.spec_ddot]. [cblas_dasum] computes the sum of + absolute values of a vector; the postcondition relates its result (up to + [feq]) to LAProof's summation model [sumF] applied to [map BABS X], over + which [sum_acc.fSUM] proves the forward-error bound. *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From VSTlib Require Import spec_math. +From LAProof.C Require Import floatlib. +From LAProof.C.cblas Require Import dasum asum_model. +(* for [sumF]; [Require Import] (not [Export]) keeps ssreflect out of scope. *) +Require Import LAProof.accuracy_proofs.sum_model. + +Set Bullet Behavior "Strict Subproofs". + +#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined. +Definition Vprog : varspecs. mk_varspecs prog. Defined. + +(** ** Funspec for [cblas_dasum] (unit-stride case [incX = 1]). *) +Definition cblas_dasum_spec := + DECLARE _cblas_dasum + WITH shX: share, n: Z, X: list (ftype Tdouble), px: val + PRE [ tint, tptr tdouble, tint ] + PROP (readable_share shX; Zlength X = n; 0 <= n <= Int.max_signed; + Forall finite X) + PARAMS (Vint (Int.repr n); px; Vint (Int.repr 1)) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px) + POST [ tdouble ] + EX s: ftype Tdouble, + PROP (feq s (sumF (map BABS X))) + RETURN (Vfloat s) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px). + +Definition DasumASI : funspecs := [ cblas_dasum_spec ]. +Definition dasum_imports : funspecs := [ fabs_spec ]. (* fabs is the one external call *) +Definition Gprog : funspecs := dasum_imports ++ DasumASI. diff --git a/C/cblas/spec_ddot.v b/C/cblas/spec_ddot.v new file mode 100644 index 0000000..c3ace94 --- /dev/null +++ b/C/cblas/spec_ddot.v @@ -0,0 +1,67 @@ +(** * LAProof.C.cblas.spec_ddot: VST specification of GSL's [cblas_ddot]. *) +(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *) + +(** This file connects a real-world BLAS implementation -- GSL's double-precision + dot product [cblas_ddot] -- to LAProof's functional dot-product model. + + GSL's loop (after macro expansion of [source_dot_r.h]) is +<< + double r = 0.0; + for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; } + return r; +>> + i.e. a forward, left-to-right accumulation using *separate* multiply-then-add + (not fused multiply-add), starting from +0.0. That is exactly the + [dotprodF] model of [LAProof.accuracy_proofs.dotprod_model] + ([dotprodF = dotprod BMULT BPLUS pos_zero]), over which the forward-error + theorem [dot_acc.dotprod_forward_error] is proved -- *not* the FMA-based + [floatlib.dotprod]. + + Note on operand order: the C statement [r += X*Y] computes [BPLUS acc prod] + (accumulator first), whereas [dotprodF]'s fold step is [BPLUS prod acc] + (product first). IEEE addition is commutative *up to [feq]*, so we state the + postcondition with [feq] -- the same convention used by + [LAProof.C.spec_sparse] for [csr_row_vector_multiply_spec]. *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From LAProof.C Require Import floatlib. +From LAProof.C.cblas Require Import ddot ddot_model. +(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble, + so this does not pollute the VST namespace with ssreflect notations. We need + it here for [dotprodF], which the postcondition is stated against. *) +Require Import LAProof.accuracy_proofs.dotprod_model. + +Set Bullet Behavior "Strict Subproofs". + +#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined. +Definition Vprog : varspecs. mk_varspecs prog. Defined. + +(** The partial-accumulation model [ddot_model] and its start/step/end lemmas + live in [LAProof.C.cblas.ddot_model] (mirroring the [sparse_model] / + [spec_sparse] split). This file states only the VST [funspec]. *) + +(** ** Funspec for [cblas_ddot] (unit-stride case [incX = incY = 1]). *) +(** Unit stride is a deliberate first milestone, not a fundamental limit. *) +Definition cblas_ddot_spec := + DECLARE _cblas_ddot + WITH shX: share, shY: share, n: Z, + X: list (ftype Tdouble), Y: list (ftype Tdouble), + px: val, py: val + PRE [ tint, tptr tdouble, tint, tptr tdouble, tint ] + PROP (readable_share shX; readable_share shY; + Zlength X = n; Zlength Y = n; 0 <= n <= Int.max_signed; + Forall finite X; Forall finite Y) + PARAMS (Vint (Int.repr n); px; Vint (Int.repr 1); py; Vint (Int.repr 1)) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px; + data_at shY (tarray tdouble n) (map Vfloat Y) py) + POST [ tdouble ] + EX s: ftype Tdouble, + PROP (feq s (dotprodF X Y)) + RETURN (Vfloat s) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px; + data_at shY (tarray tdouble n) (map Vfloat Y) py). + +Definition DdotASI : funspecs := [ cblas_ddot_spec ]. +Definition ddot_imports : funspecs := []. (* no external calls: the loop is self-contained *) +Definition Gprog : funspecs := ddot_imports ++ DdotASI. diff --git a/C/cblas/spec_dscal.v b/C/cblas/spec_dscal.v new file mode 100644 index 0000000..c2db868 --- /dev/null +++ b/C/cblas/spec_dscal.v @@ -0,0 +1,35 @@ +(** * LAProof.C.cblas.spec_dscal: VST specification of GSL's [cblas_dscal]. *) +(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *) + +(** [cblas_dscal] scales a vector in place by [alpha]. Because scaling is + deterministic elementwise, the postcondition is stated as an *exact* equality + (the array is left holding [scal_model alpha X]); no [feq]/[EX] is needed, + unlike the reduction routines [ddot]/[dasum]. *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From LAProof.C Require Import floatlib. +From LAProof.C.cblas Require Import dscal scal_model. + +Set Bullet Behavior "Strict Subproofs". + +#[export] Instance CompSpecs : compspecs. make_compspecs prog. Defined. +Definition Vprog : varspecs. mk_varspecs prog. Defined. + +(** ** Funspec for [cblas_dscal] (unit-stride case [incX = 1]). *) +(** Unit stride is a first milestone, not a fundamental limit.*) +Definition cblas_dscal_spec := + DECLARE _cblas_dscal + WITH sh: share, n: Z, alpha: ftype Tdouble, X: list (ftype Tdouble), px: val + PRE [ tint, tdouble, tptr tdouble, tint ] + PROP (writable_share sh; Zlength X = n; 0 <= n <= Int.max_signed) + PARAMS (Vint (Int.repr n); Vfloat alpha; px; Vint (Int.repr 1)) + SEP (data_at sh (tarray tdouble n) (map Vfloat X) px) + POST [ tvoid ] + PROP () + RETURN () + SEP (data_at sh (tarray tdouble n) (map Vfloat (scal_model alpha X)) px). + +Definition DscalASI : funspecs := [ cblas_dscal_spec ]. +Definition dscal_imports : funspecs := []. (* no external calls *) +Definition Gprog : funspecs := dscal_imports ++ DscalASI. diff --git a/C/cblas/src/dasum.c b/C/cblas/src/dasum.c new file mode 100644 index 0000000..ec02c99 --- /dev/null +++ b/C/cblas/src/dasum.c @@ -0,0 +1,24 @@ +//ldoc on +/** + * # `dasum.c`: GSL CBLAS real double-precision sum of absolute values + * + * Thin wrapper that instantiates the generic kernel `source_asum_r.h` at type + * `double`. Verified in `LAProof.C.cblas.verif_dasum` against the functional + * model `LAProof.C.cblas.asum_model`. + */ +/* The following GSL headers are commented out for clightgen: they are + unresolvable here (GSL's gsl/ symlink dir is not generated) and contribute + nothing to cblas_dasum's body beyond the libc fabs, which we obtain directly + from (as the other LAProof C sources do, e.g. densemat.c). */ +/* #include */ +/* #include */ +#include +#include "cblas.h" + +double +cblas_dasum (const int N, const double *X, const int incX) +{ +#define BASE double +#include "source_asum_r.h" +#undef BASE +} diff --git a/C/cblas/src/ddot.c b/C/cblas/src/ddot.c new file mode 100644 index 0000000..45e8c19 --- /dev/null +++ b/C/cblas/src/ddot.c @@ -0,0 +1,27 @@ +//ldoc on +/** + * # `ddot.c`: GSL CBLAS real double-precision dot product + * + * Thin wrapper that instantiates the generic kernel `source_dot_r.h` at type + * `double`. Verified in `LAProof.C.cblas.verif_ddot` against the functional + * model `LAProof.C.cblas.ddot_model`. + */ +/* The following GSL headers are commented out for clightgen: they are + unused by cblas_ddot's body (only OFFSET/INDEX from "cblas.h" are needed) + and are unresolvable here because GSL's gsl/ symlink dir is not generated. */ +/* #include */ +/* #include */ +#include "cblas.h" + +double +cblas_ddot (const int N, const double *X, const int incX, const double *Y, + const int incY) +{ +#define INIT_VAL 0.0 +#define ACC_TYPE double +#define BASE double +#include "source_dot_r.h" +#undef ACC_TYPE +#undef BASE +#undef INIT_VAL +} diff --git a/C/cblas/src/dscal.c b/C/cblas/src/dscal.c new file mode 100644 index 0000000..4b2e0e6 --- /dev/null +++ b/C/cblas/src/dscal.c @@ -0,0 +1,23 @@ +//ldoc on +/** + * # `dscal.c`: GSL CBLAS real double-precision scaling (in place) + * + * Thin wrapper that instantiates the generic kernel `source_scal_r.h` at type + * `double`. Verified in `LAProof.C.cblas.verif_dscal` against the functional + * model `LAProof.C.cblas.scal_model`. + */ +/* The following GSL headers are commented out for clightgen: they are + unresolvable here (GSL's gsl/ symlink dir is not generated) and contribute + nothing to cblas_dscal's body (only INDEX from "cblas.h" is needed; the loop + uses no libc math). */ +/* #include */ +/* #include */ +#include "cblas.h" + +void +cblas_dscal (const int N, const double alpha, double *X, const int incX) +{ +#define BASE double +#include "source_scal_r.h" +#undef BASE +} diff --git a/C/cblas/src/source_asum_r.h b/C/cblas/src/source_asum_r.h new file mode 100644 index 0000000..65a3515 --- /dev/null +++ b/C/cblas/src/source_asum_r.h @@ -0,0 +1,43 @@ +/* blas/source_asum_r.h + * + * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +//ldoc on +/** + * ## Accumulation kernel (`source_asum_r.h`) + * + * A forward, left-to-right accumulation of `fabs(X[ix])` into `r`, starting + * from `+0.0`. The `incX <= 0` guard returns `0` for nonpositive stride; with + * unit stride (`incX = 1`) it is dead and `ix` advances one element per + * iteration. + */ +{ + BASE r = 0.0; + INDEX i; + INDEX ix = 0; + + if (incX <= 0) { + return 0; + } + + for (i = 0; i < N; i++) { + r += fabs(X[ix]); + ix += incX; + } + return r; +} diff --git a/C/cblas/src/source_dot_r.h b/C/cblas/src/source_dot_r.h new file mode 100644 index 0000000..0775c77 --- /dev/null +++ b/C/cblas/src/source_dot_r.h @@ -0,0 +1,41 @@ +/* blas/source_dot_r.h + * + * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +//ldoc on +/** + * ## Accumulation kernel (`source_dot_r.h`) + * + * A forward, left-to-right accumulation with separate multiply-then-add (not + * fused). With unit stride (`incX = incY = 1`) the `OFFSET` conditionals are + * `0` and the indices `ix`, `iy` advance one element per iteration. + */ +{ + ACC_TYPE r = INIT_VAL; + INDEX i; + INDEX ix = OFFSET(N, incX); + INDEX iy = OFFSET(N, incY); + + for (i = 0; i < N; i++) { + r += X[ix] * Y[iy]; + ix += incX; + iy += incY; + } + + return r; +} diff --git a/C/cblas/src/source_scal_r.h b/C/cblas/src/source_scal_r.h new file mode 100644 index 0000000..4bc0fe9 --- /dev/null +++ b/C/cblas/src/source_scal_r.h @@ -0,0 +1,40 @@ +/* blas/source_scal_r.h + * + * Copyright (C) 1996, 1997, 1998, 1999, 2000 Gerard Jungman + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 3 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +//ldoc on +/** + * ## Scaling kernel (`source_scal_r.h`) + * + * An in-place elementwise update `X[ix] *= alpha` with no accumulation. The + * `incX <= 0` guard returns early for nonpositive stride; with unit stride + * (`incX = 1`) it is dead and `ix` advances one element per iteration. + */ +{ + INDEX i; + INDEX ix = 0; + + if (incX <= 0) { + return; + } + + for (i = 0; i < N; i++) { + X[ix] *= alpha; + ix += incX; + } +} diff --git a/C/cblas/verif_dasum.v b/C/cblas/verif_dasum.v new file mode 100644 index 0000000..0938c66 --- /dev/null +++ b/C/cblas/verif_dasum.v @@ -0,0 +1,60 @@ +(** * LAProof.C.cblas.verif_dasum: VST proof for GSL's [cblas_dasum]. *) +(** ** Corresponds to C program [C/cblas/src/dasum.c]. *) + +(** [body_cblas_dasum] proves that the compiled C [cblas_dasum] (unit stride) + refines LAProof's summation model: its result is [feq]-equal to + [sumF (map BABS X)]. The loop invariant carries the partial accumulation + [asum_model (sublist 0 k X)]; the step and bridge reasoning that discharges + it lives in [LAProof.C.cblas.asum_model]. The loop body makes an [fabs] + [forward_call], and a dead [if (incX<=0) return 0;] guard precedes the loop + (unreachable since [incX = 1]). *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From VSTlib Require Import spec_math. +From LAProof.C Require Import floatlib. +From LAProof.C.cblas Require Import dasum asum_model spec_dasum. +Require Import LAProof.accuracy_proofs.sum_model. + +Set Bullet Behavior "Strict Subproofs". + +Lemma body_cblas_dasum: semax_body Vprog Gprog f_cblas_dasum cblas_dasum_spec. +Proof. +start_function. +forward. (* r = 0.0 *) +forward. (* ix = 0 *) +(* if (incX <= 0) return 0; -- dead, incX = 1 *) +forward_if. +{ rep_lia. } (* then-branch: incX <= 0 contradicts incX = 1 *) +(* accumulation loop *) +forward_for_simple_bound (Zlength X) + (EX k:Z, + PROP () + LOCAL (temp _r (Vfloat (asum_model (sublist 0 k X))); + temp _ix (Vint (Int.repr k)); + temp _N (Vint (Int.repr (Zlength X))); + temp _incX (Vint (Int.repr 1)); + temp _X px) + SEP (data_at shX (tarray tdouble (Zlength X)) (map Vfloat X) px))%assert. +- (* entry (k = 0): asum_model nil = +0.0 *) + entailer!!. +- (* body: t = X[ix]; r += fabs(t); ix += incX *) + forward. (* t'2 = X[ix] *) + forward_call (Znth i X). (* t'1 = fabs(X[i]) = BABS (Znth i X) *) + forward. (* r = r + t'1 *) + forward. (* ix = ix + incX *) + entailer!!. + rewrite (asum_model_step X i) by list_solve. + reflexivity. +- (* exit (k = Zlength X): return r = asum_model X *) + forward. (* return r *) + Exists (asum_model (sublist 0 (Zlength X) X)). + entailer!!. + rewrite sublist_same by lia. + apply asum_model_feq_sumF. +Qed. + +(** Payoff (stub): because the result is [feq]-equal to [sumF (map BABS X)], the + forward-error bound [LAProof.accuracy_proofs.sum_acc.fSUM] (applied to the + list [map BABS X]) bounds the C result's error whenever that result is finite + (the no-overflow hypothesis [Hfin]; see [C/cblas/README.md]). *) diff --git a/C/cblas/verif_ddot.v b/C/cblas/verif_ddot.v new file mode 100644 index 0000000..1ac765a --- /dev/null +++ b/C/cblas/verif_ddot.v @@ -0,0 +1,94 @@ +(** * LAProof.C.cblas.verif_ddot: VST proof for GSL's [cblas_ddot]. *) +(** ** Corresponds to C program [C/cblas/src/ddot.c]. *) + +(** [body_cblas_ddot] proves that the compiled C [cblas_ddot] (unit stride) + refines the [dotprodF] accuracy model: its result is [feq]-equal to + [dotprodF X Y]. The loop invariant carries the partial accumulation + [ddot_model (sublist 0 k X) (sublist 0 k Y)]; the step and bridge reasoning + that discharges it lives in [LAProof.C.cblas.ddot_model]. Two dead [OFFSET] + guards (the [incX>0]/[incY>0] conditionals) precede the loop, unreachable + since [incX = incY = 1]. *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From LAProof.C Require Import floatlib. +From LAProof.C.cblas Require Import ddot ddot_model spec_ddot. +Require Import LAProof.accuracy_proofs.dotprod_model. + +Set Bullet Behavior "Strict Subproofs". + +Lemma body_cblas_ddot: semax_body Vprog Gprog f_cblas_ddot cblas_ddot_spec. +Proof. +start_function. +(* C: r = 0.0; ix = OFFSET(N,incX); iy = OFFSET(N,incY); with incX=incY=1 ⇒ ix=iy=0. + Each OFFSET is a (incX>0 ? 0 : ...) conditional; the else-branch is + unreachable here because incX=incY=1, so [rep_lia] closes it. *) +forward. (* _r = 0.0 *) +forward_if (PROP () + LOCAL (temp _t'1 (Vint (Int.repr 0)); + temp _r (Vfloat (Float.of_bits (Int64.repr 0))); + temp _N (Vint (Int.repr n)); temp _X px; + temp _incX (Vint (Int.repr 1)); temp _Y py; + temp _incY (Vint (Int.repr 1))) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px; + data_at shY (tarray tdouble n) (map Vfloat Y) py)). +{ forward. entailer!!. } (* incX = 1 > 0 : t'1 := 0 *) +{ rep_lia. } (* dead else-branch (incX = 1 is > 0) *) +forward. (* _ix = _t'1 (= 0) *) +forward_if (PROP () + LOCAL (temp _t'2 (Vint (Int.repr 0)); + temp _ix (Vint (Int.repr 0)); + temp _r (Vfloat (Float.of_bits (Int64.repr 0))); + temp _N (Vint (Int.repr n)); temp _X px; + temp _incX (Vint (Int.repr 1)); temp _Y py; + temp _incY (Vint (Int.repr 1))) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px; + data_at shY (tarray tdouble n) (map Vfloat Y) py)). +{ forward. entailer!!. } (* incY = 1 > 0 : t'2 := 0 *) +{ rep_lia. } (* dead else-branch (incY = 1 is > 0) *) +forward. (* _iy = _t'2 (= 0) *) +(* The accumulation loop. Invariant: _r holds the value the C loop has + accumulated over the first k element-pairs, namely [ddot_model] of the + length-k prefixes; the strided indices ix,iy advance by 1 each step (= k). *) +forward_for_simple_bound n + (EX k:Z, + PROP () + LOCAL (temp _r (Vfloat (ddot_model (sublist 0 k X) (sublist 0 k Y))); + temp _ix (Vint (Int.repr k)); + temp _iy (Vint (Int.repr k)); + temp _N (Vint (Int.repr n)); + temp _incX (Vint (Int.repr 1)); + temp _incY (Vint (Int.repr 1)); + temp _X px; temp _Y py) + SEP (data_at shX (tarray tdouble n) (map Vfloat X) px; + data_at shY (tarray tdouble n) (map Vfloat Y) py))%assert. +- (* Loop entry (k = 0). [sublist 0 0 = nil] and [ddot_model nil nil = + Zconst Tdouble 0] (= the C [+0.0]); [entailer!!] discharges it. *) + entailer!!. +- (* Loop body: read X[i], Y[i]; [r = r + X[i]*Y[i]]; ix++; iy++. Re-establish + the invariant for [i+1] via the step lemma [ddot_model_step]. *) + forward. (* t'3 = X[ix] *) + forward. (* t'4 = Y[iy] *) + forward. (* r = r + t'3 * t'4 *) + forward. (* ix = ix + incX *) + forward. (* iy = iy + incY *) + assert (Hi: 0 <= i < Zlength X) by list_solve. + assert (Hxy: Zlength X = Zlength Y) by list_solve. + entailer!. + rewrite (ddot_model_step X Y i Hxy Hi). + reflexivity. +- (* Loop exit (k = n). Return [r = ddot_model (sublist 0 n X) (sublist 0 n Y)]; + [sublist_same] gives [= ddot_model X Y], and the bridge lemma + [ddot_model_feq_dotprodF] discharges [feq _ (dotprodF X Y)]. *) + forward. (* return r *) + Exists (ddot_model (sublist 0 (Zlength X) X) (sublist 0 (Zlength X) Y)). + entailer!. + rewrite (sublist_same 0 (Zlength X) X) by lia. + rewrite (sublist_same 0 (Zlength X) Y) by lia. + apply ddot_model_feq_dotprodF; [ lia | auto | auto ]. +Qed. + +(** Payoff (stub): because [cblas_ddot_spec]'s result is [feq]-equal to + [dotprodF X Y], the forward-error bound + [LAProof.accuracy_proofs.dot_acc.dotprod_forward_error] applies directly to + the value computed by the compiled C [cblas_ddot]. *) diff --git a/C/cblas/verif_dscal.v b/C/cblas/verif_dscal.v new file mode 100644 index 0000000..2651116 --- /dev/null +++ b/C/cblas/verif_dscal.v @@ -0,0 +1,85 @@ +(** * LAProof.C.cblas.verif_dscal: VST proof for GSL's [cblas_dscal]. *) +(** ** Corresponds to C program [C/cblas/src/dscal.c]. *) + +(** [body_cblas_dscal] proves that after the compiled C [cblas_dscal] (unit + stride) the array holds exactly [scal_model alpha X] (= [map (fun x => BMULT + x alpha) X]). This is an in-place routine: a writable array with a store in + the loop, whose invariant is "scaled prefix ++ original suffix"; the per-step + [upd_Znth]/[sublist] identity is discharged by [list_solve]. *) + +Require Import VST.floyd.proofauto. +From vcfloat Require Import FPStdCompCert FPStdLib. +From LAProof.C Require Import floatlib. +From LAProof.C.cblas Require Import dscal scal_model spec_dscal. + +Set Bullet Behavior "Strict Subproofs". + +Lemma body_cblas_dscal: semax_body Vprog Gprog f_cblas_dscal cblas_dscal_spec. +Proof. +start_function. +forward. (* ix = 0 *) +forward_if. +{ rep_lia. } (* dead: incX <= 0 contradicts incX = 1 *) +(* Invariant: the array holds a list [L] of length [Zlength X] whose first [k] + entries are already scaled ([Znth j X * alpha]) and whose remaining entries + are still the original [X]. Keeping the contents as [map Vfloat L] for a + *plain* list [L] lets the load and store simplify; the pointwise [Znth] + constraints make each step a simple [upd_Znth_same]/[upd_Znth_diff] case. *) +forward_for_simple_bound (Zlength X) + (EX k:Z, EX L:list (ftype Tdouble), + PROP (Zlength L = Zlength X; + forall j, 0 <= j < k -> Znth j L = (Znth j X * alpha)%F64; + forall j, k <= j < Zlength X -> Znth j L = Znth j X) + LOCAL (temp _alpha (Vfloat alpha); + temp _ix (Vint (Int.repr k)); + temp _N (Vint (Int.repr (Zlength X))); + temp _incX (Vint (Int.repr 1)); + temp _X px) + SEP (data_at sh (tarray tdouble (Zlength X)) (map Vfloat L) px))%assert. +- (* entry (k = 0): L := X *) + Exists X. entailer!!. +- (* body: t = X[ix]; X[ix] = t * alpha; ix += incX *) + Intros. + (* Re-assert the invariant facts under explicit names so they survive the + [forward]s (which clear the auto-named PROP hypotheses). *) + assert (Hlen: Zlength L = Zlength X) by assumption. + assert (Hpre: forall j, 0 <= j < i -> Znth j L = (Znth j X * alpha)%F64) by assumption. + assert (Hsuf: forall j, i <= j < Zlength X -> Znth j L = Znth j X) by assumption. + assert (Hi: 0 <= i < Zlength L) by lia. + assert (HLX: Znth i L = Znth i X) by (apply Hsuf; lia). + forward. (* t'1 = X[ix] *) + forward. (* X[ix] = t'1 * alpha *) + forward. (* ix += incX *) + Exists (upd_Znth i L (BMULT (Znth i X) alpha)). + entailer!!. + 2: { (* SEP: the stored value [X[i]*alpha] is the model element [BMULT .. alpha] *) + apply derives_refl'. f_equal. rewrite <- upd_Znth_map. f_equal. + rewrite ?Znth_map by lia. + replace (@Znth float Inhabitant_float i L) + with (@Znth float Inhabitant_float i X) by (symmetry; apply HLX). + reflexivity. } + (* the three pointwise list invariants for the updated list *) + split3. + { list_solve. } + { intros j Hj. destruct (Z.eq_dec j i) as [->|Hne]. + { rewrite upd_Znth_same by lia. reflexivity. } + { rewrite upd_Znth_diff by lia. apply Hpre; lia. } } + { intros j Hj. rewrite upd_Znth_diff by lia. apply Hsuf; lia. } +- (* exit (k = Zlength X): the whole array now holds [scal_model alpha X] *) + Intros L. + assert (Hpre: forall j, 0 <= j < Zlength X -> Znth j L = (Znth j X * alpha)%F64) + by assumption. + assert (Hlen: Zlength L = Zlength X) by assumption. + assert (L = scal_model alpha X). { + apply Znth_eq_ext; [ rewrite Zlength_scal_model; lia | ]. + intros j Hj. rewrite Hlen in Hj. + rewrite Znth_scal_model by lia. rewrite Hpre by lia. reflexivity. } + subst L. entailer!!. +Qed. + +(** Payoff: each element [BMULT (Znth k X) alpha] is the correctly-rounded + product, so when that product is finite its relative error vs the exact + product is bounded by the unit roundoff (vcfloat's [BMULT] accuracy lemma). + A product can still overflow, but the finiteness (no-overflow) requirement is + per element -- each [X[k]*alpha] independently -- rather than a single global + hypothesis over an accumulation. *) diff --git a/Makefile b/Makefile index 574bac7..99355c4 100644 --- a/Makefile +++ b/Makefile @@ -20,6 +20,7 @@ clean: if [ -e Makefile.coq ]; then $(MAKE) -f Makefile.coq cleanall; fi $(RM) Makefile.coq Makefile.coq.conf $(RM) C/sparse.v C/cholesky.v C/alloc.v C/densemat.v C/bandmat.v + $(RM) C/cblas/ddot.v C/cblas/dasum.v C/cblas/dscal.v INSTALLFILES1 ?= $(shell awk '/accuracy_proofs/{print $$NF"o"}' _CoqProject) diff --git a/Makefile.coq.local b/Makefile.coq.local index 56581ac..d395271 100644 --- a/Makefile.coq.local +++ b/Makefile.coq.local @@ -2,7 +2,7 @@ COQDOCEXTRAFLAGS= -g --no-lib-name --with-header header.html --index genindex --lib-subtitles --interpolate -utf8 accuracy: accuracy_proofs/export.vo -C: C/verif_alloc.vo C/verif_sparse.vo C/verif_sparse_byrows.vo C/VSU_densemat.vo C/verif_build_csr.vo +C: C/verif_alloc.vo C/verif_sparse.vo C/verif_sparse_byrows.vo C/VSU_densemat.vo C/verif_build_csr.vo C/cblas/verif_ddot.vo C/cblas/verif_dasum.vo C/cblas/verif_dscal.vo # Ugh. We can't just store the reference copy of index.html in html/index.html, because Makefile.coq's # standard "make clean" removes the html directory entirely. So we gotta do this hack: @@ -31,7 +31,8 @@ install-extra:: $(CFILES) $(HFILES) # Quarto documentation of C programs LDOC= lua cdoc/ldoc.lua -CDOCS= cdoc/alloc.qmd cdoc/densemat.qmd cdoc/bandmat.qmd cdoc/sparse.qmd cdoc/build_csr.qmd +CDOCS= cdoc/alloc.qmd cdoc/densemat.qmd cdoc/bandmat.qmd cdoc/sparse.qmd cdoc/build_csr.qmd \ + cdoc/cblas_ddot.qmd cdoc/cblas_dasum.qmd cdoc/cblas_dscal.qmd cdoc/%.html: cdoc/%.qmd cd cdoc; quarto render $(notdir $^) --to html --quiet @@ -42,6 +43,14 @@ cdoc/%.qmd: C/include/%.h C/src/%.c cdoc/build_csr.qmd: C/include/sparse.h C/src/build_csr.c $(LDOC) -highlight c -p quarto -o $@ C/src/build_csr.c +# GSL CBLAS routines: document the wrapper .c together with its source_*_r.h kernel. +cdoc/cblas_ddot.qmd: C/cblas/src/ddot.c C/cblas/src/source_dot_r.h + $(LDOC) -highlight c -p quarto -o $@ $^ +cdoc/cblas_dasum.qmd: C/cblas/src/dasum.c C/cblas/src/source_asum_r.h + $(LDOC) -highlight c -p quarto -o $@ $^ +cdoc/cblas_dscal.qmd: C/cblas/src/dscal.c C/cblas/src/source_scal_r.h + $(LDOC) -highlight c -p quarto -o $@ $^ + cdocs: $(patsubst %.qmd,%.html,$(CDOCS)) @@ -49,6 +58,10 @@ cdocs: $(patsubst %.qmd,%.html,$(CDOCS)) C/%.v: C/src/%.c clightgen -normalize -IC/include $< -o $@ +# cblas subdirectory: GSL BLAS routines ported into LAProof +C/cblas/%.v: C/cblas/src/%.c + clightgen -normalize -IC/cblas/include $< -o $@ + # the rest of this is adapted from the makedepend part of ../Makefile C/sparse.v: C/include/sparse.h @@ -57,3 +70,6 @@ C/alloc.v: C/include/alloc.h C/bandmat.v: C/include/densemat.h C/include/bandmat.h C/densemat.v: C/include/densemat.h C/build_csr.v: C/include/sparse.h +C/cblas/ddot.v: C/cblas/include/cblas.h C/cblas/src/source_dot_r.h +C/cblas/dasum.v: C/cblas/include/cblas.h C/cblas/src/source_asum_r.h +C/cblas/dscal.v: C/cblas/include/cblas.h C/cblas/src/source_scal_r.h diff --git a/_CoqProject b/_CoqProject index a404827..f5e4fb2 100644 --- a/_CoqProject +++ b/_CoqProject @@ -47,3 +47,18 @@ C/build_csr.v C/verif_build_csr.v C/distinct.v C/partial_csr.v + +C/cblas/ddot.v +C/cblas/ddot_model.v +C/cblas/spec_ddot.v +C/cblas/verif_ddot.v + +C/cblas/dasum.v +C/cblas/asum_model.v +C/cblas/spec_dasum.v +C/cblas/verif_dasum.v + +C/cblas/dscal.v +C/cblas/scal_model.v +C/cblas/spec_dscal.v +C/cblas/verif_dscal.v diff --git a/index.html b/index.html index 6cfba60..db2e33a 100644 --- a/index.html +++ b/index.html @@ -62,6 +62,7 @@

C programs

  • bandmat.[c,h]:   Symmetric band matrix operations
  • sparse.[c,h]:   CSR (Compressed Sparse Row) matrix operations
  • build_csr.c:   CSR (Compressed Sparse Row) matrix construction +
  • ddot.c, dasum.c, dscal.c:   GSL CBLAS Level-1 vector routines

    Algorithm Accuracy Proofs

    @@ -163,6 +164,21 @@

    C Program Correctness Proofs

  • Verification of COO->CSR conversion
  • Verified Software Unit +
    BLAS Level-1 (GSL CBLAS): ddot.c, dasum.c, dscal.c +