Implement sexed-dtype to avoid byte-swapping#647
Conversation
Claude-code used
|
Hi reviewers, |
|
I see one fabio/src/fabio/test/codecs/test_brukerimage.py Lines 60 to 65 in d474564 LE_uint16 = np.dtype("uint16").newbyteorder("<")
MYIMAGE = np.full((256, 256), 16, dtype=LE_uint16) |
| :param version: PCK version 1 or 2 | ||
| :param normal_start: position of the normal value section (can be auto-guessed) | ||
| :param swap_needed: set to True when reading data from a foreign endianness (little on big or big on little) | ||
| :param byteorder: set to ">" for big=endian or "<" for little-endian data decompression |
There was a problem hiding this comment.
https://numpy.org/doc/2.1/reference/generated/numpy.dtype.newbyteorder.html
There are more. Perhaps reference numpy's newbyteorder instead.
There was a problem hiding this comment.
To my knowledge, there are only types of byte-order, all the other fall back on any of the two supported.
This is a python adapter, we could add support for the other, but:
- "|" makes no sense, data-types are all longer than 1 byte
- "=" is what we want to prevent because we would not have to manage it if we could
- "S" has neither sense here since we have no starting point.
Actually none of the alternative solution make sense in this function, so I would prefer keeping it simple: <>
OK for correcting the typo
| :param version: PCK version 1 or 2 | ||
| :param normal_start: position of the normal value section (can be auto-guessed) | ||
| :param swap_needed: set to True when reading data from a foreign endianness (little on big or big on little) | ||
| :param byteorder: set to ">" to decompress big-endian data else "<" for little-endian |
| self.numhigh = None | ||
| self.numpixels = None | ||
| self.swap_needed = None | ||
| self.byteorder = "=" |
There was a problem hiding this comment.
https://numpy.org/doc/2.1/reference/generated/numpy.dtype.newbyteorder.html
- ‘S’ - swap dtype from current to opposite endian
- {‘<’, ‘little’} - little endian
- {‘>’, ‘big’} - big endian
- {‘=’, ‘native’} - native order
- {‘|’, ‘I’} - ignore (no change to byte order)
To keep the same behavior this would be "|" instead of "=" afaiu.
| if self.swap_needed(): | ||
| data.byteswap(True) | ||
| stype = self.get_stype(self._dtype, self._data_byteorder) | ||
| print(stype) |
There was a problem hiding this comment.
| print(stype) |
| raw = f.read(frame.blobsize) | ||
| try: | ||
| data = numpy.frombuffer(raw, dtype=self.bytecode).copy() | ||
| file_endianness = "big" if (numpy.little_endian == bool(frame.swap_needed())) else "little" |
There was a problem hiding this comment.
Ok I'm getting too old for this
numpy.little_endian = machine_is_little
swap_needed = machine_endianness != file_endianness
file_is_big = machine_is_little == swap_needed
file_is_big = machine_is_little == (machine_endianness != file_endianness)
file_is_big = file_endianness == BIG
file_endianness = "big" if file_is_big else "little"This would be much simpler:
file_endianness = (
"little"
if frame._data_byteorder is ENDIANNESS.LITTLE
else "big"
)| datatype="int", | ||
| signed="n", | ||
| swap="n", | ||
| byteorder="<", |
There was a problem hiding this comment.
I think this should be "|" if we want to keep the same behavior.
There was a problem hiding this comment.
Without "reference point" one cannot keep anything !
|
Thanks @woutdenolf you spotted some which went through ... |
| @deprecation.deprecated | ||
| def swap_needed(self): | ||
| """ | ||
| Decide if we need to byteswap | ||
| """ | ||
| if self._data_byteorder is ENDIANNESS.LITTLE and numpy.little_endian: | ||
| return False | ||
| elif self._data_byteorder is ENDIANNESS.BIG and numpy.little_endian: | ||
| return True | ||
| elif self._data_byteorder is ENDIANNESS.LITTLE and not numpy.little_endian: | ||
| return True | ||
| elif self._data_byteorder is ENDIANNESS.BIG and not numpy.little_endian: | ||
| return False | ||
| else: | ||
| logger.warning("Unconsistent endianness !!!") |
There was a problem hiding this comment.
As byteorder can only be "little" or "big" I think the following would simplify the code:
I think _data_byteorder can be | (does not matter) or None. not sure the "does not matter" is handled today.
def swap_needed(self):
"""
Decide if we need to byteswap
"""
if self._data_byteorder is ENDIANNESS.LITTLE:
return not numpy.little_endian:
elif self._data_byteorder is ENDIANNESS.BIG:
return numpy.little_endian:
else:
logger.warning("Unconsistent endianness !!!")
Lots of modification all over the code ... and in addition, one has access only to little-endian computers.
Close #310