~sircmpwn/core.sr.ht

4972e0163e06c55fb54736e6e59c924377b9463f — y0ast 28 days ago 46b5311 0.66.8
Fix parsing plain email and markdown links

Currently `[email@sr.ht](https://sr.ht)` is parsed as both an email
address and a URL, leading to nested links and incorrect behavior after
sanitization.

This was previously fixed in the context of double urls:
https://git.sr.ht/~sircmpwn/core.sr.ht/commit/a214061c48fc813023826f6dd5d63640e0b8e475

That fix works because the regex of PlainLink matches part of the
markdown url (it matches through the `](<url>)` parts) and therefore the
precedence of mistletoe kicks in.

However, that fix doesn't work for email addresses. Inside the `Link`,
the inner node is a `PlainLink` in the case of an email address, but a
`RawText` in the case of a URL.
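
The difference between the two cases can be illustrated with plain `re`.
This is only a sketch: `URL_RE` and `EMAIL_RE` are simplified stand-ins
chosen for illustration, not the patterns defined in srht/markdown.py,
and mistletoe itself is not involved.

```python
import re

# Simplified stand-in patterns for illustration only; these are NOT the
# regexes defined in srht/markdown.py.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def first_match(pattern, text):
    """Return (matched text, (start, end)) for the first match, or None."""
    m = pattern.search(text)
    return (m.group(0), m.span()) if m else None


# URL as link text: the match runs past "](" into the link destination,
# so it overlaps the markdown link syntax itself.
url_case = first_match(URL_RE, "[https://sr.ht](https://sr.ht)")

# Email as link text: the match stops before "]" and lies entirely
# inside the link text.
email_case = first_match(EMAIL_RE, "[email@sr.ht](https://sr.ht)")
```

With these stand-ins, the URL match overlaps the link syntax while the
email match sits wholly inside the brackets, which is why the email case
needs separate handling.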

The only solution I see is to turn `PlainLink` children of a `Link` into
`RawText` nodes. Alternatively, the regex of `PlainLink` could be changed
to also match part of the markdown in the case of email (and let the
precedence kick in), but this feels like a big hack to me.
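
The chosen approach can be sketched without mistletoe. The classes below
are minimal hypothetical stand-ins that only mirror the attributes used
in this patch (`target`, `children`, `content`), and `demote_plain_links`
is a name invented for this example.

```python
class RawText:
    """Stand-in for a plain text node."""
    def __init__(self, content):
        self.content = content


class PlainLink:
    """Stand-in for an auto-detected bare URL or email address."""
    def __init__(self, target):
        self.target = target


class Link:
    """Stand-in for an explicit markdown link: [children](target)."""
    def __init__(self, target, children):
        self.target = target
        self.children = children


def demote_plain_links(link):
    """Replace PlainLink children with RawText so no nested link is rendered."""
    link.children = [
        RawText(child.target) if isinstance(child, PlainLink) else child
        for child in link.children
    ]
    return link


# [email@sr.ht](https://sr.ht): the email child becomes plain text.
link = demote_plain_links(Link("https://sr.ht", [PlainLink("email@sr.ht")]))
```

Only direct children are rewritten in this sketch, matching the loop over
`token.children` in the patch.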

Fixes: https://todo.sr.ht/~sircmpwn/sr.ht/271
1 files changed, 5 insertions(+), 1 deletions(-)

M srht/markdown.py
M srht/markdown.py => srht/markdown.py +5 -1
@@ 12,7 12,7 @@ import mistletoe as m
from mistletoe.span_token import SpanToken, RawText
import re

-SRHT_MARKDOWN_VERSION = 11
+SRHT_MARKDOWN_VERSION = 12

class PlainLink(SpanToken):
    """


@@ 74,6 74,10 @@ class SrhtRenderer(m.HTMLRenderer):
        if not url.startswith("#"):
            url = self._relative_url(url)
        target = self.escape_url(url)

+        for i in range(len(token.children)):
+            if isinstance(token.children[i], PlainLink):
+                token.children[i] = RawText(token.children[i].target)
        inner = self.render_inner(token)
        return template.format(target=target, title=title, inner=inner)