An attempt to implement the ISO 4 standard for journal title abbreviations, as described in Section 7.1 of the ISSN Manual, in a simple Python application. Inspired by the NPM package abbrevIso.
From PyPI:
pip install pyiso4
Usage:
$ iso4abbreviate "Journal of the American Chemical Society"
J. Am. Chem. Soc.
You can abbreviate multiple titles at the same time:
$ iso4abbreviate "Journal of Chemical Physics" "Journal of Physical Chemistry A"
J. Chem. Phys.
J. Phys. Chem. A
By default, the program abbreviates using this list of abbreviations (a slightly modified version of the LTWA 2021)
and this list of stopwords.
You can change that using --ltwa and --stopwords to provide your own files instead (using the same syntax as the default files).
As for rule 7.1.11, namely that abbreviations of generic words such as part, etc. are omitted unless they are required,
the program removes them by default.
To change this behavior, use --keep-part.
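To make the roles of the abbreviation list, the stopwords, and the part rule more concrete, here is a toy sketch. It is not pyiso4's implementation: the three tables and the `toy_abbreviate` function are hypothetical, hold only a handful of entries, and ignore the prefix and suffix patterns that the real LTWA relies on.

```python
# Toy illustration only: hypothetical tables, not the LTWA or pyiso4 internals.
TOY_ABBREVIATIONS = {
    "journal": "J.",
    "american": "Am.",
    "chemical": "Chem.",
    "society": "Soc.",
}
TOY_STOPWORDS = {"of", "the"}
TOY_GENERIC_WORDS = {"part"}  # the kind of word targeted by rule 7.1.11


def toy_abbreviate(title, keep_part=False):
    """Abbreviate a title word by word using the toy tables above."""
    result = []
    for word in title.split():
        key = word.lower()
        if key in TOY_STOPWORDS:
            continue  # stopwords are dropped
        if key in TOY_GENERIC_WORDS and not keep_part:
            continue  # generic words are dropped unless asked otherwise
        # Known words are replaced; unknown words are kept unchanged.
        result.append(TOY_ABBREVIATIONS.get(key, word))
    return " ".join(result)


print(toy_abbreviate("Journal of the American Chemical Society"))
# J. Am. Chem. Soc.
print(toy_abbreviate("Journal Part A"))                  # J. A
print(toy_abbreviate("Journal Part A", keep_part=True))  # J. Part A
```

The `keep_part` argument plays the same role as --keep-part in this sketch: when it is left at its default, generic words are removed.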
from pyiso4.ltwa import Abbreviate
# create an abbreviator (using the default LTWA)
abbreviator = Abbreviate.create()
# abbreviate something
abbreviation = abbreviator('Journal of the American Chemical Society', remove_part=True)
A list of failed tests is found here. It currently fails:
- to fulfil rule 7.1.7 of the manual: keep prepositions in expressions (such as in vivo) and place or personal names intact (it is difficult to know these in advance),
- to fulfil rules 7.1.2, 7.1.3 and 7.1.8 of the same manual (but this is of lesser importance),
- on compound words (such as microengineering), except if explicitly found in the LTWA,
- on some ligatures (but it handles the most common ones, such as œ and æ).
Furthermore, this package does not handle LaTeX-specific accentuation and special characters (such as \'e or \&).
When dealing with BibTeX files (e.g., to replace journal names with their abbreviations), it is recommended to decode these characters with pybtex or bibtexparser.
Directly converting LaTeX strings to Unicode and back is also possible with pylatexenc.
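For a rough idea of what such a decoding step involves, here is a minimal standard-library sketch. The `latex_to_unicode` helper and its tables are hypothetical and cover only five accent commands and a few escaped characters; the libraries mentioned above handle far more cases and remain the better choice for real BibTeX files.

```python
import re
import unicodedata

# Hypothetical, deliberately small table: LaTeX accent command -> combining mark.
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches the two forms \'e and \'{e} (any accent from the table above).
ACCENT = re.compile(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")
# Matches escaped special characters such as \& or \%.
ESCAPED = re.compile(r"\\([&%$#_])")


def latex_to_unicode(text):
    """Replace a few LaTeX accent commands and escapes with Unicode characters."""

    def accent(match):
        letter = match.group(2) or match.group(3)
        # Compose letter + combining mark into a single code point where possible.
        return unicodedata.normalize("NFC", letter + COMBINING[match.group(1)])

    text = ACCENT.sub(accent, text)
    return ESCAPED.sub(r"\1", text)


print(latex_to_unicode(r"Soci\'et\'e"))            # Société
print(latex_to_unicode(r"Science \& Technology"))  # Science & Technology
```

The decoded string can then be passed to the abbreviator, and the result re-encoded to LaTeX if the BibTeX file requires it.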
An earlier Python implementation of the ISO 4 rules is available in adlpr/iso4.
The differences are the following:
- The database of this package is more up to date (2021 vs 2017), although this shouldn't make much of a difference in practice.
- The iso4 package relies on NLTK, which is a heavy dependency, while this package relies on a custom, faster lexer. It might, however, be that NLTK is more accurate.
- An interesting feature of iso4 that is not (yet?) implemented here (see #6) is that it is capable of correctly handling the language information.
Contributions, either by filing issues or via pull requests, are welcome. More information here.