Computation of cross-covariance of state and action #35

@dvtailor

From looking only at the docstrings of the relevant functions, I think I have noticed a discrepancy with the paper. I am writing this without having checked the math in the code itself, so I may be wrong.

The V returned by RbfController.compute_action() in controllers.py corresponds to cov[x,u].

Tracing back to MGPR.predict_given_factorizations() in models/mgpr.py, I think the docstrings indicate that:

V = cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]

where pi denotes the action before squashing.

Section 5.5 of the 2015 paper instead gives:

V = cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]

Are these expressions equivalent, or have I misread something? Thanks!
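
For anyone who wants to compare the two expressions numerically, here is a minimal sketch. It does not call the repository code. It assumes a toy scalar linear-Gaussian model, x ~ N(0, var_x), pi = a*x + noise, u = b*pi, where the linear map u = b*pi is only a stand-in for the real squashing function. All names and values are hypothetical. The model is chosen because the true cov[x,u] is known in closed form, so both expressions can be checked against it.

```python
import math


def linear_gaussian_covariances(var_x, a, q, b):
    """Closed-form covariances for a toy scalar model (an assumption,
    not the repository's model):
        x  ~ N(0, var_x)
        pi = a * x + eps,  eps ~ N(0, q) independent of x
        u  = b * pi        (linear stand-in for the squashing step)
    """
    cov_x_pi = var_x * a              # cov[x, pi]
    cov_pi_pi = a * var_x * a + q     # cov[pi, pi]
    cov_pi_u = cov_pi_pi * b          # cov[pi, u]
    cov_x_u = var_x * a * b           # ground-truth cov[x, u]
    return cov_x_pi, cov_pi_pi, cov_pi_u, cov_x_u


# Hypothetical values, chosen only for illustration.
var_x, a, q, b = 2.0, 3.0, 1.0, 0.5
cov_x_pi, cov_pi_pi, cov_pi_u, cov_x_u = linear_gaussian_covariances(var_x, a, q, b)

# Expression as read from the docstrings: cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]
v_docstring = (1.0 / var_x) * cov_x_pi * cov_pi_u

# Expression from the paper: cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]
v_paper = cov_x_pi * (1.0 / cov_pi_pi) * cov_pi_u

print("true cov[x,u]       :", cov_x_u)
print("paper expression    :", v_paper)
print("docstring expression:", v_docstring)
print("paper matches truth :", math.isclose(v_paper, cov_x_u))
print("docstring matches   :", math.isclose(v_docstring, cov_x_u))
```

In this toy model the two expressions coincide only when var_x equals cov[pi,pi], so they are not equivalent in general. This says nothing about what the code actually computes, only about the two formulas as written above.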
