`pyiso4`

An attempt to implement the ISO 4 standard for journal title abbreviations, as described in Section 7.1 of the ISSN Manual, in a simple application written in Python. Inspired by the NPM package abbrevIso.

Install and use

From PyPI:

pip install pyiso4

Usage:

$ iso4abbreviate "Journal of the American Chemical Society"
J. Am. Chem. Soc.

You can abbreviate multiple titles at the same time:

$ iso4abbreviate "Journal of Chemical Physics" "Journal of Physical Chemistry A"
J. Chem. Phys.
J. Phys. Chem. A

By default, the program abbreviates titles using this list of abbreviations (a slightly modified version of LTWA 2021) and this list of stopwords. To use your own files instead, pass them with --ltwa and --stopwords; they must follow the same syntax as the default files.

As for rule 7.1.11, namely that abbreviations of generic words such as part, etc. are omitted unless they are required, the program removes them by default. To change this behavior, use --keep-part.

Python API

from pyiso4.ltwa import Abbreviate

# create an abbreviator (using the default LTWA)
abbreviator = Abbreviate.create()

# abbreviate something
abbreviation = abbreviator('Journal of the American Chemical Society', remove_part=True)
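
To abbreviate several titles from Python, the call above can be wrapped in a small helper. The following is a minimal sketch and not part of pyiso4: `abbreviate_all` is a hypothetical name, and the helper only assumes that the abbreviator is a callable accepting a title and a `remove_part` keyword, as in the example above.

```python
from typing import Callable, Dict, Iterable


def abbreviate_all(
    titles: Iterable[str],
    abbreviator: Callable[..., str],
    remove_part: bool = True,
) -> Dict[str, str]:
    """Map each distinct title to its abbreviation (hypothetical helper)."""
    result: Dict[str, str] = {}
    for title in titles:
        # Abbreviate each distinct title only once.
        if title not in result:
            result[title] = abbreviator(title, remove_part=remove_part)
    return result
```

With pyiso4 installed, the object returned by `Abbreviate.create()` can be passed as the `abbreviator` argument, so the default LTWA is loaded only once for the whole batch.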

Known issues

A list of failed tests is found here. It currently fails:

to fulfil rule 7.1.7 of the manual: keep prepositions in expressions (like in vivo) and keep place and personal names intact (these are difficult to recognize in advance),
to fulfil rules 7.1.2, 7.1.3 and 7.1.8 of the same manual (but this is of lesser importance),
on compound words (such as microengineering), except if explicitly found in the LTWA,
on some ligatures (but it handles the most common ones, such as œ and æ)

Furthermore, this package does not handle LaTeX-specific accents and special characters (such as \'e or \&). When dealing with BibTeX files (e.g., to replace journal names with their abbreviations), it is recommended to decode these first with pybtex or bibtexparser. Converting LaTeX strings directly to Unicode and back is also possible with pylatexenc.
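
As an illustration of that workflow, here is a minimal sketch that abbreviates the journal field of already-parsed entries and leaves alone any title that still appears to contain LaTeX markup. It rests on several assumptions: entries are plain dictionaries with a `journal` key (as a BibTeX parser might produce), a backslash or brace is taken as a sign of undecoded LaTeX, and `has_latex_markup` and `abbreviate_journals` are hypothetical names that are not part of pyiso4.

```python
import re
from typing import Callable, Dict, List

# Assumption: a backslash or a brace signals LaTeX markup that was not decoded.
_LATEX_MARKUP = re.compile(r"[\\{}]")


def has_latex_markup(text: str) -> bool:
    """Return True if the text seems to contain undecoded LaTeX markup."""
    return _LATEX_MARKUP.search(text) is not None


def abbreviate_journals(
    entries: List[Dict[str, str]],
    abbreviator: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return copies of the entries with the 'journal' field abbreviated.

    Entries without a journal, or whose journal still contains LaTeX
    markup, are copied unchanged so they can be decoded and retried.
    """
    updated = []
    for entry in entries:
        new_entry = dict(entry)  # do not modify the caller's data
        journal = entry.get("journal")
        if journal and not has_latex_markup(journal):
            new_entry["journal"] = abbreviator(journal)
        updated.append(new_entry)
    return updated
```

Skipping rather than guessing keeps undecoded titles visible, so they can be passed through one of the libraries mentioned above before being abbreviated.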

Difference with the `iso4` package

An earlier Python implementation of the ISO 4 rules is available in adlpr/iso4. The differences are the following:

The database of this package is more up to date (2021 vs 2017), although this shouldn't make much of a difference in practice.
The iso4 package relies on NLTK, which is a heavy dependency, while this package relies on a custom, faster lexer. NLTK might, however, be more accurate.
An interesting feature of iso4 that is not (yet?) implemented here (see #6) is that it is capable of correctly handling the language information.

Contributions

Contributions, either by filing issues or via pull requests, are welcome. More information here.
