Skip to content

Implement sexed-dtype to avoid byte-swapping#647

Merged
kif merged 22 commits into
silx-kit:mainfrom
kif:310_sexed_dtype
Jun 19, 2026
Merged

Implement sexed-dtype to avoid byte-swapping#647
kif merged 22 commits into
silx-kit:mainfrom
kif:310_sexed_dtype

Conversation

@kif

@kif kif commented Jun 16, 2026

Copy link
Copy Markdown
Member

Lots of modification all over the code ... and in addition, one has access only to little-endian computers.

Close #310

@kif kif requested review from jonwright, payno, t20100 and woutdenolf June 17, 2026 16:05
@kif

kif commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Hi reviewers,
For this silx dev week I decided to set Claude on an 7-year-old issue, reported at that time by Peter Boesecke.
Claude managed to migrate all project from swapping bytes to directly reading/writing data with an "endianness aware data-type", well it was nevertheless a day of work. I took me as long to clean up manually the code, there could be some remaining inconsistencies, so than for the review.

@kif kif added this to the v2026.6 milestone Jun 17, 2026
@woutdenolf

woutdenolf commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

I see one byteswap call left. Less crucial but good to stay consistent.

MYIMAGE = numpy.ones((256, 256), numpy.uint16) * 16
MYIMAGE[0, 0] = 0
MYIMAGE[1, 1] = 32
MYIMAGE[127:129, 127:129] = 65535
if not numpy.little_endian:
MYIMAGE.byteswap(True)

LE_uint16 = np.dtype("uint16").newbyteorder("<")

MYIMAGE = np.full((256, 256), 16, dtype=LE_uint16)

Comment thread src/fabio/compression/compression.py Outdated
:param version: PCK version 1 or 2
:param normal_start: position of the normal value section (can be auto-guessed)
:param swap_needed: set to True when reading data from a foreign endianness (little on big or big on little)
:param byteorder: set to ">" for big=endian or "<" for little-endian data decompression

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://numpy.org/doc/2.1/reference/generated/numpy.dtype.newbyteorder.html

There are more. Perhaps reference numpy's newbyteorder instead.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my knowledge, there are only types of byte-order, all the other fall back on any of the two supported.

This is a python adapter, we could add support for the other, but:

  • "|" makes no sense, data-types are all longer than 1 byte
  • "=" is what we want to prevent because we would not have to manage it if we could
  • "S" has neither sense here since we have no starting point.
    Actually none of the alternative solution make sense in this function, so I would prefer keeping it simple: <>

OK for correcting the typo

Comment thread src/fabio/ext/mar345_IO.pyx Outdated
:param version: PCK version 1 or 2
:param normal_start: position of the normal value section (can be auto-guessed)
:param swap_needed: set to True when reading data from a foreign endianness (little on big or big on little)
:param byteorder: set to ">" to decompress big-endian data else "<" for little-endian

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Comment thread src/fabio/mar345image.py Outdated
self.numhigh = None
self.numpixels = None
self.swap_needed = None
self.byteorder = "="

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://numpy.org/doc/2.1/reference/generated/numpy.dtype.newbyteorder.html

  • ‘S’ - swap dtype from current to opposite endian
  • {‘<’, ‘little’} - little endian
  • {‘>’, ‘big’} - big endian
  • {‘=’, ‘native’} - native order
  • {‘|’, ‘I’} - ignore (no change to byte order)

To keep the same behavior this would be "|" instead of "=" afaiu.

Comment thread src/fabio/edfimage.py Outdated
if self.swap_needed():
data.byteswap(True)
stype = self.get_stype(self._dtype, self._data_byteorder)
print(stype)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(stype)

Comment thread src/fabio/edfimage.py Outdated
raw = f.read(frame.blobsize)
try:
data = numpy.frombuffer(raw, dtype=self.bytecode).copy()
file_endianness = "big" if (numpy.little_endian == bool(frame.swap_needed())) else "little"

@woutdenolf woutdenolf Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm getting too old for this

numpy.little_endian = machine_is_little
swap_needed = machine_endianness != file_endianness

file_is_big = machine_is_little == swap_needed
file_is_big = machine_is_little == (machine_endianness != file_endianness)
file_is_big = file_endianness == BIG

file_endianness = "big" if file_is_big else "little"

This would be much simpler:

file_endianness = (
    "little"
    if frame._data_byteorder is ENDIANNESS.LITTLE
    else "big"
)

datatype="int",
signed="n",
swap="n",
byteorder="<",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be "|" if we want to keep the same behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without "reference point" one cannot keep anything !

@woutdenolf woutdenolf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why it took you a day. Jeeeezus. I did a pass but I suggest a third person goes through the PR as well.

@kif

kif commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Thanks @woutdenolf you spotted some which went through ...

Comment thread src/fabio/edfimage.py
Comment on lines +823 to +837
@deprecation.deprecated
def swap_needed(self):
"""
Decide if we need to byteswap
"""
if self._data_byteorder is ENDIANNESS.LITTLE and numpy.little_endian:
return False
elif self._data_byteorder is ENDIANNESS.BIG and numpy.little_endian:
return True
elif self._data_byteorder is ENDIANNESS.LITTLE and not numpy.little_endian:
return True
elif self._data_byteorder is ENDIANNESS.BIG and not numpy.little_endian:
return False
else:
logger.warning("Unconsistent endianness !!!")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As byteorder can only be "little" or "big" I think the following would simplify the code:
I think _data_byteorder can be | (does not matter) or None. not sure the "does not matter" is handled today.

    def swap_needed(self):
        """
        Decide if we need to byteswap
        """
        if self._data_byteorder is ENDIANNESS.LITTLE:
            return not numpy.little_endian:
        elif self._data_byteorder is ENDIANNESS.BIG:
            return numpy.little_endian:
        else:
            logger.warning("Unconsistent endianness !!!")

Comment thread src/fabio/dm3image.py Outdated
@kif kif merged commit 7c29891 into silx-kit:main Jun 19, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Byteswap in all cases where bpp>1 ...

4 participants