High Memory Footprint Due to (mostly?) Unused Attribute

## Description

I've noticed that the `_per_letter_annotations` attribute carried by `SeqLike` objects does not appear to be actively used by most users of the package, yet it contributes significantly to the memory footprint of every object of this class. This can hurt performance, especially in environments where memory efficiency is critical.

For example, take a 201-character nucleotide string:

```python
import random

seed = 33  # Set a seed for reproducibility
nt_length = 201  # Sequence length
random.seed(seed)
letters = ["A", "T", "G", "C"]  # DNA nucleotide characters
nt_seq = "".join(random.choice(letters) for _ in range(nt_length))  # Build a random string
```
Then create a **nucleotide** `SeqLike` object from the string `nt_seq`:

```python
from seqlike import SeqLike
seq_obj = SeqLike(sequence=nt_seq,seq_type="NT") #Creating a seqlike object
```
Looking at the memory footprint of `seq_obj` using [pympler](https://github.com/pympler/pympler):

```python
from pympler import asizeof
print("Size of SeqLike Object:", asizeof.asizeof(seq_obj), "bytes") #Looking at size of the object
```

```Output
Size of SeqLike Object: 19328 bytes
```

Digging further into the memory footprint by recursively unpacking the object's attributes:

```python
from pympler import asizeof

def get_attribute_sizes(obj, path='', visited=None, sizes=None):
    if visited is None:
        visited = set()
    if sizes is None:
        sizes = {}

    obj_id = id(obj)
    if obj_id in visited:
        return sizes
    visited.add(obj_id)

    # Calculate the size and store it if not zero
    obj_size = asizeof.asizeof(obj)
    if obj_size > 0:
        sizes[path if path else 'self'] = obj_size

    # Handle different types of collections and objects
    if hasattr(obj, '__dict__'):
        for attr, value in obj.__dict__.items():
            full_path = f"{path}.{attr}" if path else attr
            get_attribute_sizes(value, full_path, visited, sizes)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            full_path = f"{path}.{key}" if path else str(key)
            get_attribute_sizes(value, full_path, visited, sizes)
    elif isinstance(obj, (list, set, tuple)):
        for index, item in enumerate(obj):
            full_path = f"{path}[{index}]" if path else f"[{index}]"
            get_attribute_sizes(item, full_path, visited, sizes)
    return sizes
```

```attribute_sizes = get_attribute_sizes(seq_obj)```

Plotting the top 20 attributes by size:

![image](https://github.com/modernatx/seqlike/assets/52894785/96ae5ffc-bcce-4c35-8d5f-67ce49395b06)
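
The ranking behind a plot like this can be reproduced in plain text. This is a minimal sketch, assuming the input is the path-to-bytes dictionary returned by `get_attribute_sizes`. The helper name `top_attributes` is hypothetical, and the example dictionary holds only the two sizes quoted in this issue rather than the full output.

```python
def top_attributes(sizes, n=20):
    """Return the n largest (path, size_in_bytes) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


# Only the two values reported in this issue; a real run has many more entries.
example_sizes = {
    "_nt_record._per_letter_annotations": 13472,
    "self": 19328,
}

for path, size in top_attributes(example_sizes):
    print(f"{path:<40} {size:>8} bytes")
```

Passing the real `attribute_sizes` dictionary in place of `example_sizes` would list the same entries that the plot shows.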

We see that `_per_letter_annotations` makes up a sizeable chunk of the `_nt_record` attribute, **13472** bytes to be precise.

Dissecting the `_per_letter_annotations` attribute further, we can see that it is a dictionary with a single key, `seqnums`, whose value is a list of strings. These strings appear to be 1-based position indices running up to the length of the sequence.

```python
print(seq_obj._nt_record._per_letter_annotations.keys()) #See keys of the dictionary
print(seq_obj._nt_record._per_letter_annotations["seqnums"]) #Focus on values of the dictionary
print(type(seq_obj._nt_record._per_letter_annotations["seqnums"][0])) #See data type of the element in the list
```
```Output
dict_keys(['seqnums'])
['1', '2', '3', '4', '5', '6', '7', '8', '9' .... '201']
<class 'str'>
```
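
To get a feel for why such a list is expensive, the sketch below estimates the size of a list of string indices with `sys.getsizeof`. It assumes the list is built as `'1'` through `'201'`, as the output above suggests. The function name `estimate_seqnums_size` is hypothetical, and the exact numbers depend on the Python version and platform, so they will not match pympler's `asizeof` exactly.

```python
import sys


def estimate_seqnums_size(length):
    """Rough estimate of the bytes held by a list of string indices '1'..'length'."""
    seqnums = [str(i) for i in range(1, length + 1)]
    list_size = sys.getsizeof(seqnums)  # list header plus one pointer per element
    item_size = sum(sys.getsizeof(s) for s in seqnums)  # every str is a separate object
    return list_size + item_size


print(estimate_seqnums_size(201), "bytes for 201 string indices (estimate)")
```

Each index is stored as a full Python `str` object, so the per-element overhead is much larger than the one or two characters it holds.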

Setting `seq_obj._nt_record._per_letter_annotations` to `None` considerably reduces the memory occupied by the object:

```python
from pympler import asizeof
seq_obj._nt_record._per_letter_annotations = None
print("Size of SeqLike Object:", asizeof.asizeof(seq_obj), "bytes")
```
```Output
Size of SeqLike Object: 5856 bytes
```
![image](https://github.com/modernatx/seqlike/assets/52894785/0264bb04-80f7-4086-a2d3-07c394d28223)

## Comments

1. This is a **reduction** in **memory footprint** of this one object by ~**70%**.

2. I have observed **similar behavior** for the `_aa_record._per_letter_annotations` as well. So the same still applies for objects created as an **AA record** instead of an NT record.

3. The **memory bloat can add up significantly over time** and can be a **critical limiting factor** (memory-wise) especially for **large** machine-learning/computational biology data processing and analysis applications.
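
The ~70% figure in comment 1 follows directly from the two `asizeof` outputs reported above:

```python
size_before = 19328  # bytes, with _per_letter_annotations populated
size_after = 5856    # bytes, after setting _per_letter_annotations to None

saved = size_before - size_after
reduction = saved / size_before

print(f"Saved {saved} bytes ({reduction:.1%} reduction)")
# prints: Saved 13472 bytes (69.7% reduction)
```

The 13472 bytes saved match the size reported for `_per_letter_annotations` in the attribute breakdown.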

## Expected Behavior

Objects of the `SeqLike` class should not allocate memory for attributes that are not used, thereby reducing the overall memory footprint of the application.

## Current Behavior

Currently, every instance of the `SeqLike` class includes the `_per_letter_annotations` attribute, which increases the memory usage unnecessarily.

## Possible Solution

One **temporary** potential solution to address this issue is to set the `_per_letter_annotations` attribute to `None` after its last necessary use, or entirely remove this attribute if it is confirmed to be redundant.

Alternative solutions may include looking at (line 1082 in particular)

https://github.com/modernatx/seqlike/blob/dde761ced5e3dcf86010d1e50abc3b268f794d8f/seqlike/SeqLike.py#L1070-L1083

and modifying the function so that per-letter annotations are added only when a condition is met, rather than by default.


	@dispatch(SeqRecord)
	def record_from(sequence, **kwargs) -> SeqRecord:
	    """Construct SeqRecord from SeqRecord sequences.

	    :param sequence: A SeqRecord object.
	    :param **kwargs: Passed through to SeqRecord constructor.
	    :returns: A SeqRecord object.

	    """
	    s: SeqRecord = deepcopy(sequence)
	    for k, v in kwargs.items():
	        setattr(s, k, v)
	    s = add_seqnums_to_letter_annotations(s)
	    return s
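
One way the conditional variant could look is sketched below. This is not the package's implementation. `FakeRecord` and `add_seqnums` are hypothetical stand-ins for `SeqRecord` and `add_seqnums_to_letter_annotations` so that the example runs without Biopython, and the `with_seqnums` flag is an assumed name for an opt-in switch.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class FakeRecord:
    """Hypothetical stand-in for SeqRecord, only to keep the sketch runnable."""
    seq: str
    letter_annotations: dict = field(default_factory=dict)


def add_seqnums(record):
    """Stand-in for add_seqnums_to_letter_annotations: one string index per letter."""
    record.letter_annotations["seqnums"] = [str(i + 1) for i in range(len(record.seq))]
    return record


def record_from(sequence, with_seqnums=False, **kwargs):
    """Copy the record and annotate it only when the caller opts in (assumed flag)."""
    s = deepcopy(sequence)
    for k, v in kwargs.items():
        setattr(s, k, v)
    if with_seqnums:  # skipped by default, so no per-letter list is allocated
        s = add_seqnums(s)
    return s


plain = record_from(FakeRecord("ATGC"))
annotated = record_from(FakeRecord("ATGC"), with_seqnums=True)
print(plain.letter_annotations)      # {}
print(annotated.letter_annotations)  # {'seqnums': ['1', '2', '3', '4']}
```

Whether the default should be opt-in or opt-out depends on how much existing code relies on `seqnums` being present, which would need to be confirmed by the maintainers.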
