Data and code for multimodal sentence acceptability judgment.
Hyewon Jang, Nikolai Ilinykh, Sharid Loáiciga, Jey Han Lau, Shalom Lappin. Predicting Sentence Acceptability Judgments in Multimodal Contexts. To appear at CMCL 2026 (arXiv preprint).
Human participants and vision-language models (VLMs) rated the acceptability of English sentences on a scale from 1 (very unnatural) to 4 (very natural). Each sentence was preceded by a relevant visual context (R), an irrelevant visual context (I), or no visual context (N).
Human acceptability judgments on 75 original English sentences taken from News, Books, and Wikipedia, plus 225 back-translated versions of those sentences.
GPT-5-generated images depicting the 75 English sentences.
Sentence acceptability ratings from 7 VLMs (InternVL3-1B, InternVL3-8B, Qwen2.5-3B, Qwen2.5-7B, llava-1.5-7b, gpt-4o & gpt-4o-mini), averaged across multiple attempts (seeds) for each sentence.
Logits extracted for each sentence preceded by a relevant, irrelevant, or null visual context, for the 5 open-source VLMs, with multiple attempts (seeds) for each sentence.
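For reference, one common way to turn such extracted logits into a sentence-level score that is comparable across sentence lengths is the mean per-token log probability. This is only a sketch of that normalization; the normalization actually used in the repo may differ (e.g. SLOR or other length-penalized variants):

```python
import numpy as np

def length_normalized_logprob(token_logprobs):
    """Mean per-token log probability of one sentence.

    token_logprobs: per-token log probabilities extracted from a VLM's
    output logits for one sentence (input format is an assumption).
    Dividing by length makes scores comparable across sentence lengths.
    """
    lp = np.asarray(token_logprobs, dtype=float)
    return float(lp.sum() / lp.size)
```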
Code for sentence acceptability ratings by gpt-4o & gpt-4o-mini.
Code for sentence acceptability ratings by InternVL3-1B, InternVL3-8B, Qwen2.5-3B, Qwen2.5-7B & llava-1.5-7b.
Code for logit extractions from open-source models for each sentence following relevant, irrelevant, and null visual contexts.
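As an illustration of the core step such extraction code performs, the sketch below converts a causal LM's output logits into per-token log probabilities with NumPy. The repo's scripts presumably work with each model's own framework and tensors; the array shapes and function name here are assumptions:

```python
import numpy as np

def token_logprobs(logits, input_ids):
    """Log probability of each observed token under a causal LM.

    logits: (seq_len, vocab_size) array from one forward pass.
    input_ids: (seq_len,) integer token ids of the same sequence.
    Position t's logits predict token t+1, hence the one-step shift.
    """
    logits = np.asarray(logits, dtype=float)[:-1]   # drop the last position
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    logsumexp = np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    logprobs = shifted - logsumexp                  # (seq_len-1, vocab_size)
    targets = np.asarray(input_ids)[1:]             # the next tokens
    return logprobs[np.arange(targets.size), targets]
```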
Pearson and Spearman correlations between [human ratings ~ model ratings], [human ratings ~ normalized model logprobs], [model ratings ~ normalized model logprobs].
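These correlation analyses can be reproduced with `scipy.stats`; a minimal sketch (the function name and return format are illustrative, not the repo's API):

```python
from scipy.stats import pearsonr, spearmanr

def correlate(x, y):
    """Pearson (linear) and Spearman (rank) correlations between two
    series, e.g. human ratings vs. model ratings for the same sentences."""
    r, r_p = pearsonr(x, y)
    rho, rho_p = spearmanr(x, y)
    return {"pearson_r": r, "pearson_p": r_p,
            "spearman_rho": rho, "spearman_p": rho_p}
```

Spearman is computed on ranks, so it captures monotone but nonlinear agreement that Pearson misses.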
Total least squares regressions between ratings in each condition pair ([N-R], [N-I], [R-I]).
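Total least squares (orthogonal) regression minimizes perpendicular distances, treating the ratings in both conditions as noisy, unlike ordinary least squares, which assumes the x-variable is exact. A minimal SVD-based sketch of the technique (not necessarily the repo's exact implementation):

```python
import numpy as np

def tls_line(x, y):
    """Total least squares fit of y = a*x + b.

    Minimizes perpendicular distances to the line, so both x and y
    (e.g. ratings in two context conditions) are treated as noisy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # first right-singular vector = direction of maximal variance
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    dx, dy = vt[0]
    a = dy / dx
    b = y.mean() - a * x.mean()
    return a, b
```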