Skip to content

Avoid slowdown in get_flags_str#258

Merged
beckermr merged 15 commits into
esheldon:masterfrom
arunkannawadi:flagstr
Jun 5, 2026
Merged

Avoid slowdown in get_flags_str#258
beckermr merged 15 commits into
esheldon:masterfrom
arunkannawadi:flagstr

Conversation

@arunkannawadi

Copy link
Copy Markdown
Contributor

In #248 , we introduced safe casting to uint32, but it seemed to have significantly slowdown the bitwise AND operation that follows. In DP2 production, the calls to get_flags_str are taking a significant amount of time, and this is a low-hanging fruit to fix. Along with it is another minor optimization calculating powers of 2 cumulatively. Together, this brings down the per-call time from ~50 μs to ~8 μs, which is a signficant gain given how many times this functions gets called.

Comment thread ngmix/flags.py
@esheldon

esheldon commented Jun 1, 2026

Copy link
Copy Markdown
Owner

Are there any additional tests we should add to make sure we are comfortable?

@arunkannawadi

Copy link
Copy Markdown
Contributor Author

I had added a test against various integer types and lengths. As long as any of the implementations don't break that test, I think we can be comfortable. Please expand that test if you'd like and happy to go with any implementation of casting as long as the per-call time is under 10 microseconds. If you are willing to drop the support for val to be a 8-bit or 16-bit (unsigned) ints, a casting is not even necessary.

@arunkannawadi

Copy link
Copy Markdown
Contributor Author

@beckermr

beckermr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

I'm actually confused as to how this PR passes. I am going to try some stuff locally.

@arunkannawadi

Copy link
Copy Markdown
Contributor Author

FWIW, the CI uses numpy>=2, so the breakage you saw locally with numpy v1 wouldn't occur. Since you want to keep the support for numpy 1, I believe we should drop the last commit and go with the array based thing that this PR originally had. But please try this out locally and make any changes you'd like; you should have edit access to directly add commits here.

@beckermr

beckermr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

OK. Our previous test missed an important case of python ints. I have now rearranged things to explicitly truncate the extra bits.

@beckermr

beckermr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

We may want to emit a warning in this case or simply have the code error instead of truncating.

What do you think @esheldon?

@esheldon esheldon left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also be fine with just crashing if it is out of range.

Comment thread ngmix/flags.py
Comment thread ngmix/flags.py Outdated
Comment thread ngmix/flags.py Outdated
@beckermr

beckermr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

@arunkannawadi can you time this to see if it is still faster?

@beckermr

beckermr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

My machine is being odd on timing and I see no difference amongst the various options.

@arunkannawadi

Copy link
Copy Markdown
Contributor Author

This is slow (50 microseconds). A good rule-of-thumb is after casting val must not be of type array.

Comment thread ngmix/flags.py Outdated
@beckermr

beckermr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Thanks. Try the latest commit.

@arunkannawadi

Copy link
Copy Markdown
Contributor Author

This is good. 6 microseconds or so.

Comment thread ngmix/flags.py
@beckermr beckermr merged commit 9a4d396 into esheldon:master Jun 5, 2026
6 checks passed
@arunkannawadi

Copy link
Copy Markdown
Contributor Author

I believe there's going to be a new rubinenv coming soon. if you can cut a release with this and other changes, we can get them in there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants