New features

🔴 High Priority
🟡 Medium Priority
🟢 Low Priority


## General
- [x] Tutorial on loading and using the Suicide Risk Lexicon: `l.load_lexicon(name)`
- [ ] load/save all vs. most prototypical
- [ ] Tutorial: add an example of loading stored embeddings: `cts.measure(documents, stored_embeddings_path=PATH)`
- [ ] Add docstrings for everything.
- [ ] Add an example of how much a result costs for a definition (include date and model).
- [ ] Use a generative AI model from the local Hugging Face cache instead of the Hugging Face API.

## Tutorial
- [ ] Save other outputs of lexicon count and CTS.
- [ ] Put CTS first.
- [ ] Sort lexicon by similarity for validation (see code in lexicon repo).
- [ ] Show how to load the embeddings pickle to save time.
- [ ] Add `ipywidgets`, `tqdm`, and `jupyter` to the toml so the progress bar is visible.
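
For the embeddings-pickle item above, a minimal sketch of the caching idea using only the standard library. The function name, the `{token: vector}` layout, and the `embed_fn` callback are assumptions made for illustration; they are not the package's actual storage format or API.

```python
import pickle
from pathlib import Path


def load_or_compute_embeddings(tokens, embed_fn, path):
    """Return {token: vector}, reusing vectors already stored in a pickle at `path`."""
    path = Path(path)
    cache = {}
    if path.exists():
        with path.open("rb") as f:
            cache = pickle.load(f)  # only unpickle files you created yourself

    # Deduplicate while keeping order, then embed only what is not cached yet.
    missing = [t for t in dict.fromkeys(tokens) if t not in cache]
    for token in missing:
        cache[token] = embed_fn(token)  # hypothetical: any callable returning a vector

    # Rewrite the pickle only when something new was added.
    if missing:
        with path.open("wb") as f:
            pickle.dump(cache, f)

    return {t: cache[t] for t in tokens}
```

On a second run with the same path, only tokens that are not yet in the pickle are passed to `embed_fn`, which is where the time saving would come from.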

## Lexicon
- [x] `lexicon.extract` should output columns called `document_n` and `document_str`
- [x] `l.extract()` as a method instead of `lexicon.extract(l.constructs)`
- [x] Obtain `lexicon_dict = l.to_dict()` --> `{construct: [tokens]}` from the lexicon object.
- [ ] `lexicon.add`: clean up how I store metadata automatically and manually. Maybe create a brief `input()` dialogue so it saves user, timestamp, source, etc.
- [ ] Create a lexicon from seance. Modify the seance tutorial accordingly.
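
For the `lexicon.add` metadata item, one possible shape for the record, sketched with the standard library only. `build_add_metadata` and the `note` field are hypothetical names, not part of the current package; the `user`, `timestamp`, and `source` fields come from the item above.

```python
import getpass
from datetime import datetime, timezone


def build_add_metadata(source, user=None, note=""):
    """Build one metadata record for a batch of tokens added to a construct."""
    return {
        # Fall back to the OS login name when no user is given explicitly.
        "user": user or getpass.getuser(),
        # UTC timestamp in ISO 8601 format, to the second.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source": source,
        "note": note,
    }
```

An `input()` dialogue could collect `source` and `user` interactively and pass them to the same function, so automatic and manual entries would end up with identical fields.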

#### Outputs/visualization
- [x] Clean up output for `matches_per_construct`, `matches_counter_d`, and `matches_per_doc`?
- [ ] I created a highlight function, but I also had other code for looking at matches in context by printing (in lexicon repo). Add it to the tutorial and scripts.
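
As a placeholder for the context-printing item, a small keyword-in-context sketch. This is not the code from the lexicon repo; `show_context` is a hypothetical helper, and it assumes case-insensitive substring matching without word boundaries.

```python
import re


def show_context(document, tokens, window=30):
    """Return one snippet per match, with `window` characters of context on each side."""
    if not tokens:
        return []
    # Longest tokens first so multi-word phrases win over their substrings.
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), flags=re.IGNORECASE)

    snippets = []
    for m in pattern.finditer(document):
        start = max(m.start() - window, 0)
        end = min(m.end() + window, len(document))
        # Mark the matched token with square brackets.
        snippets.append(
            f"{document[start:m.start()]}[{m.group(0)}]{document[m.end():end]}"
        )
    return snippets


for line in show_context("I feel hopeless and alone tonight.", ["hopeless", "alone"], window=7):
    print(line)
```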


## CTS
- [ ] 🔴 Instead of saving `cosine_similarities` (2GB for 5000 CTL chats, compressed), provide the tuple `(lexicon token, document token, similarity)` in the DF, and output a visualization of those as HTML.
- [ ] Exact match within a string == 1
- [ ] threshold = 0.4 (depends on embedding) for final score. Remember CTS for the bursting study, where values (without model) were too high for some features.
- [ ] Add additional arguments for CTS
- [ ] Double-check the temperature values from the paper
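
The first three items above can be sketched together in plain Python. This is an illustration under stated assumptions, not the current CTS implementation: `top_match` is a hypothetical name, embeddings are assumed to be available as `{token: vector}` dicts, and dropping pairs below the threshold is only one reading of "threshold for final score".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_match(lexicon_vecs, document_vecs, document_text, threshold=0.4):
    """Return (lexicon_token, document_token, similarity) for the best pair, or None."""
    best = None
    text = document_text.lower()
    for lex_tok, lex_vec in lexicon_vecs.items():
        # Exact match within the string counts as similarity 1.
        if lex_tok.lower() in text:
            return (lex_tok, lex_tok, 1.0)
        for doc_tok, doc_vec in document_vecs.items():
            sim = cosine(lex_vec, doc_vec)
            if best is None or sim > best[2]:
                best = (lex_tok, doc_tok, sim)
    # Assumption: scores below the threshold are treated as no match.
    if best is None or best[2] < threshold:
        return None
    return best
```

Storing one such tuple per document and construct in the DF keeps only three values per cell, rather than the full lexicon-by-document similarity matrix.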

#### Outputs/visualization
- [x] CTS: Plot matches in text
- [ ] CTS: show the top token and its cosine similarity in a column of the features DF

## Extensions
- [ ] Implement in R or provide an R wrapper
- [ ] Create a website where an uploaded CSV file returns features.
