Avoid slowdown in get_flags_str#258
Conversation
since the current `rubin-env` uses Python 3.13.
and not an array.
|
Are there any additional tests we should add to make sure we are comfortable? |
|
I had added a test against various integer types and lengths. As long as any of the implementations don't break that test, I think we can be comfortable. Please expand that test if you'd like and happy to go with any implementation of casting as long as the per-call time is under 10 microseconds. If you are willing to drop the support for |
|
I'm actually confused as to how this PR passes. I am going to try some stuff locally. |
|
FWIW, the CI uses numpy>=2, so the breakage you saw locally with numpy v1 wouldn't occur. Since you want to keep the support for numpy 1, I believe we should drop the last commit and go with the array based thing that this PR originally had. But please try this out locally and make any changes you'd like; you should have edit access to directly add commits here. |
|
OK. Our previous test missed an important case of python ints. I have now rearranged things to explicitly truncate the extra bits. |
|
We may want to emit a warning in this case or simply have the code error instead of truncating. What do you think @esheldon? |
esheldon
left a comment
There was a problem hiding this comment.
I'd also be fine with just crashing if it is out of range.
|
@arunkannawadi can you time this to see if it is still faster? |
|
My machine is being odd on timing and I see no difference amongst the various options. |
|
This is slow (50 microseconds). A good rule-of-thumb is after casting |
|
Thanks. Try the latest commit. |
|
This is good. 6 microseconds or so. |
|
I believe there's going to be a new |
In #248 , we introduced safe casting to
uint32, but it seemed to have significantly slowdown the bitwise AND operation that follows. In DP2 production, the calls toget_flags_strare taking a significant amount of time, and this is a low-hanging fruit to fix. Along with it is another minor optimization calculating powers of 2 cumulatively. Together, this brings down the per-call time from ~50 μs to ~8 μs, which is a signficant gain given how many times this functions gets called.