Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@ C/bandmat.v
C/build_csr.v
C/cholesky.v
C/densemat.v
C/cblas/ddot.v
C/cblas/dasum.v
C/cblas/dscal.v
Makefile.coq
Makefile.coq.conf
CoqMakefile
Expand Down
105 changes: 105 additions & 0 deletions C/cblas/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# LAProof × GSL CBLAS

This directory connects **real-world BLAS code** to LAProof's verified floating-point
accuracy results.

## Scope

LAProof proves floating-point accuracy theorems (forward error bounds, etc.) about
*functional models* of basic linear-algebra programs (idealized list/matrix functions
that compute a result with a precisely specified sequence of IEEE-754 operations).
Those theorems live in [`accuracy_proofs/`](../../accuracy_proofs/), independent of any
particular C implementation.

The goal of this directory is the other half of the story: take a **real-world BLAS
implementation** and *prove* that the compiled C code implements one of those
functional models. We use the CBLAS layer of the **GNU Scientific Library (GSL)** as the
implementation under verification.

The pipeline for each routine is:

1. Take the GSL CBLAS source (under [`src/`](src/)) essentially verbatim, with only the
minimal edits needed to run CompCert's `clightgen` on it.
2. Generate the Clight AST (a `*.v` file) with `clightgen`.
3. Define the **functional model** the C loop literally computes (in a `*_model.v` file),
and prove it equals (up to `feq`) the LAProof accuracy model over which the error
theorems are stated.
4. Write a VST **funspec** (in a `spec_*.v` file) whose postcondition relates the returned
value to the LAProof model.
5. **Verify** the C function against that funspec with VST (in a `verif_*.v` file).

Once a routine is verified against a funspec phrased in terms of a LAProof model, the
corresponding accuracy theorem can be transferred to the value computed by the compiled C
code. The verification does *not* prove the accuracy theorem's preconditions on its own,
however. To apply the bound to the C result, one must separately establish that those
preconditions hold, and they fall into two kinds.

- **Already guaranteed by the funspec precondition.** Some of the theorem's preconditions
are exactly things the funspec `PRE` already requires, so they hold for free (e.g. a
length precondition that the funspec already imposes on the input arrays).
- **Additional conditions one must discharge.** Other preconditions are not established by
the funspec and must be proved on the side. A typical example is a no-overflow assumption
that the accumulated result is finite: this does *not* follow from the funspec merely
requiring the inputs to be finite, since a combination of finite values can still
overflow to an infinity. Whoever applies the bound must supply this separately.

So the honest statement is: for inputs satisfying both the funspec `PRE` and the accuracy
theorem's own hypotheses, the bound holds of the C return value. In many cases (such as the
dot products and other reduction loops verified here) the one hypothesis the funspec does
not already supply is a *no-overflow* condition: that the accumulated result is finite.

For `cblas_ddot` that leftover precondition is exactly `Hfin`: that the accumulated result
`dotprodF X Y` is finite. It does not follow merely from the inputs being finite, since a
sum of finite products can still round to an infinity.

The `feq` postcondition cooperates here: the funspec only guarantees that the C result is
`feq`-equal to the model value (equality up to `±0` and exceptional values), but once the result is finite,
`feq` upgrades to real-value equality, so the bound on the model value is literally a bound
on the compiled C output.

When stating the accuracy corollary for the C function, you choose how to carry `Hfin`:

- **Conditional corollary.** Keep `Hfin` as an explicit hypothesis. The bound then holds
whenever the result is in fact finite, and whoever applies the corollary discharges
`Hfin` for their particular inputs. (`Hfin` is a proof obligation about the result,
not a runtime check in the C code.)
- **Unconditional corollary.** Strengthen the funspec `PRE` with magnitude/length bounds
that imply finiteness. LAProof's [`finite_sum_from_bounded`](../../accuracy_proofs/sum_is_finite.v)
and the `fun_bnd_le` lemmas turn such bounds into a proof of `Hfin`, so the corollary
carries no extra hypothesis.

### GSL CBLAS upstream

- GSL project home: <https://www.gnu.org/software/gsl/>
- GSL source repository (Savannah git): <https://git.savannah.gnu.org/cgit/gsl.git>
- CBLAS directory in the repo: <https://git.savannah.gnu.org/cgit/gsl.git/tree/cblas>
- Shared BLAS kernels (`source_*.h`) used by the CBLAS routines: <https://git.savannah.gnu.org/cgit/gsl.git/tree/blas>

GSL is licensed under the GNU GPL; the files under [`src/`](src/) retain their upstream
copyright and license headers.

## What we've verified so far

| Operation | Precision | Function | Files |
|-----------|-----------|----------|-------|
| Dot product | double | `cblas_ddot` | [`src/ddot.c`](src/ddot.c), [`src/source_dot_r.h`](src/source_dot_r.h), [`include/cblas.h`](include/cblas.h), [`ddot.v`](ddot.v), [`ddot_model.v`](ddot_model.v), [`spec_ddot.v`](spec_ddot.v), [`verif_ddot.v`](verif_ddot.v) |
| Sum of absolute values | double | `cblas_dasum` | [`src/dasum.c`](src/dasum.c), [`src/source_asum_r.h`](src/source_asum_r.h), [`include/cblas.h`](include/cblas.h), [`dasum.v`](dasum.v), [`asum_model.v`](asum_model.v), [`spec_dasum.v`](spec_dasum.v), [`verif_dasum.v`](verif_dasum.v) |
| Scalar multiply (in place) | double | `cblas_dscal` | [`src/dscal.c`](src/dscal.c), [`src/source_scal_r.h`](src/source_scal_r.h), [`include/cblas.h`](include/cblas.h), [`dscal.v`](dscal.v), [`scal_model.v`](scal_model.v), [`spec_dscal.v`](spec_dscal.v), [`verif_dscal.v`](verif_dscal.v) |

**Scope limits (`cblas_ddot`, `cblas_dasum`, `cblas_dscal`):**
- **Unit stride only** (`incX = incY = 1`), as a deliberate first milestone.
- For the **reductions** (`ddot`, `dasum`) the postcondition is stated up to `feq`.
- `cblas_dasum` is verified against `sumF (map BABS X)`; its loop body calls `fabs`,
discharged with VSTlib's `fabs_spec` (whose result is `BABS`). The matching accuracy
theorem is `sum_acc.fSUM`.
- `cblas_dscal` is the first **in-place** routine: it overwrites the array (writable share,
`void` return) and is verified against an *exact* model `scal_model alpha X =
map (fun x => BMULT x alpha) X` (no `feq` required here). The
per-element accuracy is a direct unit-roundoff bound on `BMULT`.

## Conventions

This directory mirrors the model/spec/verif split used elsewhere in
[`C/`](../) (e.g. the sparse routines' `sparse_model.v` / `spec_sparse.v` /
`verif_sparse.v`): pure model functions and their lemmas in `*_model.v`, VST funspecs in
`spec_*.v`, and `semax_body` proofs in `verif_*.v`.
83 changes: 83 additions & 0 deletions C/cblas/asum_model.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
(** * LAProof.C.cblas.asum_model: functional model of GSL's [cblas_dasum]. *)
(** ** Corresponds to C program [C/cblas/src/dasum.c] (ported from GSL cblas). *)

(** This file provides [asum_model], the partial accumulation that the
[cblas_dasum] loop invariant tracks, with the *step* lemma (how one loop
iteration extends the accumulation) and the *end* lemma (the result of the
accumulation once the loop finishes). GSL's kernel ([source_asum_r.h]) is
<<
double r = 0.0;
for (i = 0; i < N; i++) { r += fabs(X[ix]); ix += incX; }
return r;
>>
i.e., it adds [|X[i]|] left to right into the accumulator ([BPLUS acc |x|]),
starting from [+0.0]. The C [fabs] computes [BABS] (the result of VSTlib's
[fabs_spec], whose [ff_func] is [BABS]), so the model uses [BABS].

The matching LAProof accuracy model is [sum_model.sumF] applied to
[map BABS X] (the absolute values); over it [sum_acc.fSUM] proves the
forward-error bound. The C's operand order ([BPLUS acc x]
vs [sumF]'s [BPLUS x acc]) and initial accumulator ([+0.0] = [Zconst Tdouble 0]
vs [sumF]'s [neg_zero]) differ, so the end/bridge lemma relates the two by
[feq] (IEEE-equality identifying [±0]/NaN). *)

Require Import VST.floyd.proofauto.
Require Import vcfloat.VCFloat.
Require Import vcfloat.FPStdCompCert.
From vcfloat Require Import FPStdLib.
From LAProof.C Require Import floatlib.
(* [sum_model] only [Require Import]s (not [Export]s) the mathcomp preamble, so
this does not pollute the VST namespace with ssreflect notations. We do NOT
[Import common] (it would re-scope [<=]/[<] over [R]); [common.BPLUS_comm] /
[common.neg_zero] are used qualified. [BPLUS_mor], [BABS], the [feq]
[Equivalence] come from vcfloat. *)
Require Import LAProof.accuracy_proofs.sum_model.

Set Bullet Behavior "Strict Subproofs".

(** [asum_model X] is the value the C loop *literally* computes: a left fold with
the accumulator as the first [BPLUS] operand, separate abs-then-add, starting
from [+0.0]. This mirrors the Clight AST in [dasum.v] exactly. *)
Definition asum_loop (l: list (ftype Tdouble)) : ftype Tdouble :=
fold_left (fun acc x => BPLUS acc (BABS x)) l (Zconst Tdouble 0).

Definition asum_model (X: list (ftype Tdouble)) : ftype Tdouble := asum_loop X.

(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense asum). *)

Lemma asum_loop_snoc: forall l x,
asum_loop (l ++ [x]) = BPLUS (asum_loop l) (BABS x).
Proof. intros. unfold asum_loop. rewrite fold_left_app. reflexivity. Qed.

(** *step*: extending the length-[k] prefix [X[0..k-1]] with the element [X[k]]
adds one [BABS] term, with the accumulator as the first [BPLUS] operand --
exactly the Clight statement [r = r + fabs(X[k])]. *)
Lemma asum_model_step: forall (X: list (ftype Tdouble)) k,
0 <= k < Zlength X ->
asum_model (sublist 0 (k+1) X)
= BPLUS (asum_model (sublist 0 k X)) (BABS (Znth k X)).
Proof.
intros X k Hk. unfold asum_model.
rewrite (sublist_last_1 0 k X) by lia.
rewrite asum_loop_snoc. reflexivity.
Qed.

(** *end*/bridge: the finished accumulation equals LAProof's summation model
[sumF (map BABS X)] up to [feq]. Generalized-accumulator induction using
[common.BPLUS_comm] and [BPLUS_mor], with the [feq (+0.0) (-0.0)] base case
([sumF] starts at [neg_zero]). *)
Lemma asum_model_feq_sumF: forall (X: list (ftype Tdouble)),
feq (asum_model X) (sumF (map BABS X)).
Proof.
intros X. unfold asum_model, asum_loop, sumF.
assert (Hs: feq (Zconst Tdouble 0) (@common.neg_zero Tdouble)) by reflexivity.
revert Hs.
generalize (Zconst Tdouble 0).
generalize (@common.neg_zero Tdouble).
induction X as [|x X IH]; intros g f Hs.
- (* X = nil: both folds are the (feq-linked) initial accumulators. *)
simpl. exact Hs.
- simpl. apply IH.
unfold Basics.flip.
etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
Qed.
122 changes: 122 additions & 0 deletions C/cblas/ddot_model.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
(** * LAProof.C.cblas.ddot_model: functional model of GSL's [cblas_ddot]. *)
(** ** Corresponds to C program [C/cblas/src/ddot.c] (ported from GSL cblas). *)

(** This file provides [ddot_model], the partial accumulation that the
[cblas_ddot] loop invariant tracks, with the *step* lemma (how one loop
iteration extends the accumulation) and the *end* lemma (the result of the
accumulation once the loop finishes). No VST [funspec]s or Clight ASTs
appear here -- those live in [spec_ddot] and [verif_ddot]. GSL's loop
(after macro expansion of [source_dot_r.h]) is
<<
double r = 0.0;
for (i = 0; i < N; i++) { r += X[ix] * Y[iy]; ix += incX; iy += incY; }
return r;
>>
i.e., it adds [X[i]*Y[i]] left to right into the accumulator
([BPLUS acc prod]), using separate multiply-then-add (not fused
multiply-add), starting from [+0.0].

The matching LAProof accuracy model is [dotprodF]
([dotprodF = dotprod BMULT BPLUS pos_zero]) of
[LAProof.accuracy_proofs.dotprod_model]; over it
[dot_acc.dotprod_forward_error] proves the forward-error bound -- *not* the
FMA-based [floatlib.dotprod]. The C's operand order ([BPLUS acc prod] vs
[dotprodF]'s [BPLUS prod acc]) and initial accumulator ([+0.0] =
[Zconst Tdouble 0] vs [dotprodF]'s [pos_zero]) differ, so the end/bridge
lemma relates the two by [feq] (IEEE addition is commutative up to [feq]). *)

Require Import VST.floyd.proofauto.
Require Import vcfloat.VCFloat.
Require Import vcfloat.FPStdCompCert.
From vcfloat Require Import FPStdLib.
From LAProof.C Require Import floatlib.
(* [dotprod_model] only [Require Import]s (not [Export]s) the mathcomp preamble,
so this does not pollute the VST namespace with ssreflect notations. We do
NOT [Import common]: that would bring mathcomp's order/ring notations into
scope (making [<=]/[<] parse over [R] and breaking [ddot_model_step]).
Instead [common.BPLUS_comm] / [common.pos_zero] are used qualified below;
[BPLUS_mor] and the [feq] [Equivalence] come from vcfloat. *)
Require Import LAProof.accuracy_proofs.dotprod_model.

Set Bullet Behavior "Strict Subproofs".

(** [ddot_model X Y] is the value the C loop *literally* computes: a left fold
with the accumulator as the first [BPLUS] operand, separate multiply then
add, starting from [+0.0]. This mirrors the Clight AST in [ddot.v] exactly. *)
Definition ddot_loop (xy: list (ftype Tdouble * ftype Tdouble)) : ftype Tdouble :=
fold_left (fun acc p => BPLUS acc (BMULT (fst p) (snd p))) xy (Zconst Tdouble 0).

Definition ddot_model (X Y: list (ftype Tdouble)) : ftype Tdouble :=
ddot_loop (combine X Y).

(** ** Model lemmas (the [partial_row_*] triple, specialised to a dense ddot). *)

Lemma combine_app_eqlen {A B} (l1 l1': list A) (l2 l2': list B):
length l1 = length l2 ->
combine (l1 ++ l1') (l2 ++ l2') = combine l1 l2 ++ combine l1' l2'.
Proof.
revert l2; induction l1; destruct l2; simpl; intros; try discriminate; auto.
f_equal; auto.
Qed.

Lemma ddot_loop_snoc: forall xy p,
ddot_loop (xy ++ [p]) = BPLUS (ddot_loop xy) (BMULT (fst p) (snd p)).
Proof. intros. unfold ddot_loop. rewrite fold_left_app. reflexivity. Qed.

(** *step*: extending both length-[k] prefixes [X[0..k-1]]/[Y[0..k-1]] with the
elements [X[k]]/[Y[k]] adds one [BMULT] term, with the accumulator as the
first [BPLUS] operand -- exactly the Clight statement [r = r + X[k]*Y[k]]. *)
Lemma ddot_model_step: forall (X Y: list (ftype Tdouble)) k,
Zlength X = Zlength Y -> 0 <= k < Zlength X ->
ddot_model (sublist 0 (k+1) X) (sublist 0 (k+1) Y)
= BPLUS (ddot_model (sublist 0 k X) (sublist 0 k Y))
(BMULT (Znth k X) (Znth k Y)).
Proof.
intros X Y k Hlen Hk. unfold ddot_model.
rewrite (sublist_last_1 0 k X) by lia.
rewrite (sublist_last_1 0 k Y) by lia.
rewrite combine_app_eqlen
by (apply Nat2Z.inj; rewrite <- !Zlength_correct, !Zlength_sublist by lia; lia).
cbn [combine]. (* combine [Znth k X] [Znth k Y] = [(Znth k X, Znth k Y)] *)
rewrite ddot_loop_snoc. reflexivity.
Qed.

(** *end*/bridge: the finished accumulation equals LAProof's accuracy model
[dotprodF] up to [feq]. They differ only by the [BPLUS] operand order
([BPLUS acc prod] here vs [BPLUS prod acc] in [dotprodF]'s fold step), so the
proof is a generalized-accumulator induction using [common.BPLUS_comm]
(feq (BPLUS x y) (BPLUS y x)) and the [BPLUS_mor] feq-congruence, plus the
fact that [List.combine = seq.zip] on equal-length lists and that the two
initial zeros ([Zconst Tdouble 0] vs [pos_zero]) are [feq]. This is the one
remaining model obligation; once discharged, [dot_acc.dotprod_forward_error]
transfers to the C result. *)
Lemma ddot_model_feq_dotprodF:
forall (X Y: list (ftype Tdouble)),
Zlength X = Zlength Y ->
Forall finite X -> Forall finite Y ->
feq (ddot_model X Y) (dotprodF X Y).
Proof.
intros X Y Hlen _ _.
assert (Hl: length X = length Y)
by (apply Nat2Z.inj; rewrite <- !Zlength_correct; exact Hlen).
clear Hlen.
unfold ddot_model, ddot_loop, dotprodF, dotprod.
(* The two folds differ only by the initial accumulator ([Zconst Tdouble 0] vs
[pos_zero], which are [feq]) and the [BPLUS] operand order at each step.
Generalize both accumulators, keeping them [feq]-linked, and induct. *)
assert (Hs: feq (Zconst Tdouble 0) (@common.pos_zero Tdouble)) by reflexivity.
revert Y Hl Hs.
generalize (Zconst Tdouble 0).
generalize (@common.pos_zero Tdouble).
induction X as [|x X IH]; intros g f Y Hl Hs.
- (* X = nil (so Y = nil): both folds are the (feq-linked) initial accs. *)
destruct Y as [|y Y']; [ | simpl in Hl; discriminate ].
simpl. exact Hs.
- destruct Y as [|y Y']; simpl in Hl; [ discriminate | ].
simpl.
apply IH.
+ congruence.
+ (* one accumulation step: [BPLUS f prod] vs [flip BPLUS g prod = BPLUS prod g] *)
simpl. unfold Basics.flip.
etransitivity; [ apply common.BPLUS_comm | apply BPLUS_mor; [ reflexivity | exact Hs ] ].
Qed.
34 changes: 34 additions & 0 deletions C/cblas/include/cblas.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
#define INDEX int
#define OFFSET(N, incX) ((incX) > 0 ? 0 : ((N) - 1) * (-(incX)))
#define BLAS_ERROR(x) cblas_xerbla(0, __FILE__, x);

#define CONJUGATE(x) ((x) == CblasConjTrans)
#define TRANSPOSE(x) ((x) == CblasTrans || (x) == CblasConjTrans)
#define UPPER(x) ((x) == CblasUpper)
#define LOWER(x) ((x) == CblasLower)

/* Handling of packed complex types... */

#define REAL(a,i) (((BASE *) a)[2*(i)])
#define IMAG(a,i) (((BASE *) a)[2*(i)+1])

#define REAL0(a) (((BASE *)a)[0])
#define IMAG0(a) (((BASE *)a)[1])

#define CONST_REAL(a,i) (((const BASE *) a)[2*(i)])
#define CONST_IMAG(a,i) (((const BASE *) a)[2*(i)+1])

#define CONST_REAL0(a) (((const BASE *)a)[0])
#define CONST_IMAG0(a) (((const BASE *)a)[1])


#define GB(KU,KL,lda,i,j) ((KU+1+(i-j))*lda + j)

#define TRCOUNT(N,i) ((((i)+1)*(2*(N)-(i)))/2)

/* #define TBUP(N,i,j) */
/* #define TBLO(N,i,j) */

#define TPUP(N,i,j) (TRCOUNT(N,(i)-1)+(j)-(i))
#define TPLO(N,i,j) (((i)*((i)+1))/2 + (j))

37 changes: 37 additions & 0 deletions C/cblas/scal_model.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
(** * LAProof.C.cblas.scal_model: functional model of GSL's [cblas_dscal]. *)
(** ** Corresponds to C program [C/cblas/src/dscal.c] (ported from GSL cblas). *)

(** This file provides [scal_model], the elementwise scaling that [cblas_dscal]
computes. Scaling is a deterministic *elementwise* update with no
accumulation, so there is no loop invariant tracking a running accumulator
and no [feq] bridge -- the C program computes the model exactly. GSL's
kernel ([source_scal_r.h]) is
<<
for (i = 0; i < N; i++) { X[ix] *= alpha; ix += incX; }
>>
i.e., [X[ix] *= alpha] is [BMULT (X[ix]) alpha] (element-first), so the model
is the list map below, matching what the C program computes.

LAProof's existing scale model [mv_mathcomp.scalemx] is alpha-first
([map_mx (BMULT a)]) and matrix-based; relating to it would need a [BMULT]
commutativity lemma (not in LAProof) plus a vector/matrix reshape. That is
unnecessary here: the per-element accuracy (relative error <= unit roundoff)
is exactly vcfloat's correctly-rounded-[BMULT] bound. *)

Require Import VST.floyd.proofauto.
From vcfloat Require Import FPStdCompCert FPStdLib.
From LAProof.C Require Import floatlib.

Set Bullet Behavior "Strict Subproofs".

Definition scal_model (alpha: ftype Tdouble) (X: list (ftype Tdouble)) : list (ftype Tdouble) :=
map (fun x => BMULT x alpha) X.

Lemma Zlength_scal_model: forall alpha X,
Zlength (scal_model alpha X) = Zlength X.
Proof. intros. unfold scal_model. rewrite Zlength_map. reflexivity. Qed.

Lemma Znth_scal_model: forall alpha X k,
0 <= k < Zlength X ->
Znth k (scal_model alpha X) = BMULT (Znth k X) alpha.
Proof. intros. unfold scal_model. rewrite Znth_map by lia. reflexivity. Qed.
Loading