IllegalStateException when parsing specific URL with Readability4J (topCandidate.parent() must not be null) #25

Description

@samtheeagle578

Hello,

First, I would like to express my appreciation to @dankito and everyone else involved for developing such a useful library as Readability4J.

I encountered an issue while parsing content from the following URL: https://www.whitecoatinvestor.com/high-yield-savings-accounts-364/. When attempting to parse the page, Readability4J throws an IllegalStateException.

Here is the stack trace of the exception:

java.lang.IllegalStateException: topCandidate.parent() must not be null
	at net.dankito.readability4j.processor.ArticleGrabber.getTextDirection(ArticleGrabber.kt:1118)
	at net.dankito.readability4j.processor.ArticleGrabber.grabArticle(ArticleGrabber.kt:167)
	at net.dankito.readability4j.processor.ArticleGrabber.grabArticle$default(ArticleGrabber.kt:57)
	at net.dankito.readability4j.Readability4J.parse(Readability4J.kt:101)

I am not well-versed in Kotlin. I debugged the issue but, unfortunately, have no meaningful insight that could help resolve it. The full HTML content that triggers the exception is available for review here:

https://1drv.ms/u/s!AnpDf81AVQi-ht46xIci3w9gWPfCNA?e=mOy7e7

Additionally, here is how I am using Readability4J in my code:

Readability4J readability4J = new Readability4J(link, content);
Article article = readability4J.parse();
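
Until the root cause is found, one possible stopgap is to catch the exception around the parse() call so that a single failing page does not abort the caller. The following is a minimal sketch, assuming a hypothetical helper named SafeParse (it is not part of Readability4J). It only hides the failure and does not address the bug itself:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Readability4J: runs a parse call and
// converts the IllegalStateException seen in the stack trace into an empty result.
final class SafeParse {
    private SafeParse() {}

    static <T> Optional<T> tryParse(Supplier<T> parseCall) {
        try {
            // A null result is also mapped to an empty Optional.
            return Optional.ofNullable(parseCall.get());
        } catch (IllegalStateException e) {
            // Covers "topCandidate.parent() must not be null";
            // any other exception type still propagates to the caller.
            return Optional.empty();
        }
    }
}
```

With the snippet above, the call would become something like SafeParse.tryParse(() -> new Readability4J(link, content).parse()), provided link and content are effectively final. An empty result then means the page could not be parsed and can be skipped or logged.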

Any guidance on how to resolve or work around this problem would be greatly appreciated.

Thank you for your support and efforts!

Kind regards,
Sam
