Fix parsing plain email and markdown links

Currently [email@sr.ht](https://sr.ht) is parsed as both an email and a
url, leading to nested urls and incorrect behavior after sanitization.

This was previously fixed in the context of double urls:
https://git.sr.ht/~sircmpwn/core.sr.ht/commit/a214061c48fc813023826f6dd5d63640e0b8e475

That fix works because the regex of PlainLink matches part of the
markdown url (it matches through the `](<url>)` parts) and therefore the
precedence of mistletoe kicks in.

However, that fix doesn't work for email addresses: the inner node of the
`Link` is a `PlainLink` in the case of an email, and `RawText` in the
case of a url.

The only solution I see is to turn `PlainLink` children of a `Link` into
`RawText` nodes. Alternatively, the regex of `PlainLink` could be changed
to also match part of the markdown in the case of an email (and let the
precedence kick in), but this feels like a big hack to me.

Fixes: https://todo.sr.ht/~sircmpwn/sr.ht/271
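To illustrate the nesting problem described above, here is a minimal standalone sketch. It does not use mistletoe. `EMAIL` and `MD_LINK` are simplified, hypothetical patterns (not the actual `PlainLink` regex), and `render_naive` is an invented helper that autolinks emails inside link text the way the unfixed renderer effectively does.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# These are NOT the regexes used by PlainLink or mistletoe.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def autolink_emails(text):
    """Wrap every bare email address in a mailto: anchor."""
    return EMAIL.sub(
        lambda m: '<a href="mailto:{0}">{0}</a>'.format(m.group(0)), text
    )


def render_naive(source):
    """Render [text](url) links, autolinking emails inside the link text too."""
    return MD_LINK.sub(
        lambda m: '<a href="{}">{}</a>'.format(
            m.group(2), autolink_emails(m.group(1))
        ),
        source,
    )


nested_html = render_naive("[email@sr.ht](https://sr.ht)")
print(nested_html)
# <a href="https://sr.ht"><a href="mailto:email@sr.ht">email@sr.ht</a></a>
```

The output contains an anchor inside an anchor, which is the nested url that the sanitizer later handles incorrectly.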
1 files changed, 5 insertions(+), 1 deletions(-)

M srht/markdown.py
M srht/markdown.py => srht/markdown.py +5 -1
@@ 12,7 12,7 @@ import mistletoe as m
 from mistletoe.span_token import SpanToken, RawText
 import re

-SRHT_MARKDOWN_VERSION = 11
+SRHT_MARKDOWN_VERSION = 12

 class PlainLink(SpanToken):
     """
@@ 74,6 74,10 @@ class SrhtRenderer(m.HTMLRenderer):
         if not url.startswith("#"):
             url = self._relative_url(url)
         target = self.escape_url(url)
+        for i in range(len(token.children)):
+            if isinstance(token.children[i], PlainLink):
+                token.children[i] = RawText(token.children[i].target)
+
         inner = self.render_inner(token)
         return template.format(target=target, title=title, inner=inner)
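The effect of the added loop can be sketched without mistletoe. The `RawText`, `PlainLink`, and `Link` classes below are small hypothetical stand-ins that only mimic the attributes the patch touches (`children`, `target`); they are not the mistletoe classes, and `render` is an invented helper that skips HTML escaping for brevity.

```python
class RawText:
    """Stand-in for a plain text node."""
    def __init__(self, content):
        self.content = content


class PlainLink:
    """Stand-in for an autodetected link; here it is always an email."""
    def __init__(self, target):
        self.target = target


class Link:
    """Stand-in for a markdown [text](url) link with child nodes."""
    def __init__(self, target, children):
        self.target = target
        self.children = list(children)


def render(token):
    if isinstance(token, RawText):
        return token.content
    if isinstance(token, PlainLink):
        return '<a href="mailto:{0}">{0}</a>'.format(token.target)
    if isinstance(token, Link):
        # Same idea as the patch: demote PlainLink children to RawText
        # so no anchor is rendered inside this anchor.
        for i in range(len(token.children)):
            if isinstance(token.children[i], PlainLink):
                token.children[i] = RawText(token.children[i].target)
        inner = "".join(render(child) for child in token.children)
        return '<a href="{}">{}</a>'.format(token.target, inner)
    raise TypeError("unknown token: {!r}".format(token))


flat_html = render(Link("https://sr.ht", [PlainLink("email@sr.ht")]))
print(flat_html)
# <a href="https://sr.ht">email@sr.ht</a>

standalone_html = render(PlainLink("email@sr.ht"))
print(standalone_html)
# <a href="mailto:email@sr.ht">email@sr.ht</a>
```

A `PlainLink` outside a `Link` is still rendered as an anchor. Only children of an explicit markdown link are flattened, which keeps the rewrite local to the link renderer.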