Add optional weight to pagerank#2555
Conversation
81aeca1 to
de7612f
Compare
fabianmurariu
left a comment
There was a problem hiding this comment.
looking up the property every time by string is costly, we would look it up once via the graph node metadata, then calling get_by_id
de7612f to
bbbd032
Compare
phdavis1027
left a comment
There was a problem hiding this comment.
- applied feedback by looking up weight property by index instead of string
- removed some random formatting changes that muddied the diff
- noticed floating point ==, which seems smelly
- rust tests pass, i forget how i made the python side compile without clam though
- some conflicts with current
db_v4that i don't have time to iron out right now but i can get on it as soon as this looks otherwise ready to ship
bbbd032 to
d76bd61
Compare
d76bd61 to
22cb38d
Compare
|
conflicts resolved to the best of my ability, plus switched to delta check rather than raw f64 equality |
|
This looks good to me! Iterating over every edge between every pair of vertices every iteration might suffer on graphs with lots of transactions, though. An optimization to address this would be to cache the cumulative pairwise weights for vertices with a sufficiently large number of edges between them. |
By 'transactions' do you mean number of rows after e.g., exploding the DataFrame? When I first wrote this PR, I wasn't quite aware of the exact shape of this API, but have encountered it while testing for internal application. In any case, it seems to me that pagerank just sees the state of each edge after the latest update? |
Oh yep you're right! I was thinking of iterating over an exploded edge, which is what you want to do if you're trying to get the weighted pagerank in a transaction network (rather than just fetching latest transaction). |
Well, in fact, now that I revisit this PR in the light of some more hands-on experience, that is the use-case I had in mind. Am I reading you correctly that there is an API for pagerank over the graph induced by the exploded edge set? If not, potentially a follow-up PR? |
What changes were proposed in this pull request?
Add an optional weight parameter to pagerank
Why are the changes needed?
To capture the semantics of weighted pagerank. For example, in fraud detection
Does this PR introduce any user-facing change? If yes is this documented?
Yes, it adds an extra param which is captured in the docstrings
How was this patch tested?
Test scenarios were checked for exact match with networkx. Additionally, basic semantic sanity checking like 'weight =None' should be treated as equivalent to all weights = 1.0
Are there any further changes required?
No