README
h2g is an HTML to gemtext converter. It reads HTML from stdin and writes
gemtext to stdout, handling a subset of HTML elements and entities.
The following HTML elements are recognized; all others are ignored:
* <a href=>
A reference number is inserted in place of the link, and the link is
added to a list at the bottom of the document. Links to element
identifiers are ignored. In relative local links (starting with '.'), a
'.html' suffix is replaced with '.gmi'.
* <b>
Element is surrounded with '*'.
* <br>
A line break is enforced.
* <em>, <i>, <u>
Element is surrounded with '_'.
* <h1> to <h6>
Content is put on a single line and prefixed with the corresponding
number of '#'. Block is enclosed with empty lines.
* <img>
Alt text is printed in place of the image and the source is added to
the footnote link list.
* <p>
Block is enclosed with empty lines.
* <pre>, <blockquote>
Content is written as is, dropping leading and trailing empty lines.
Block is enclosed with empty lines.
* <table>, <tr>, <th>, <td>
Tables are surrounded with empty lines. Each row is printed on a
single line. A literal tab character is inserted between two adjacent
<td> elements. <th> is treated the same as <td>.
* <li> inside <ol>
Each <li> element is printed to a single line prefixed with a
consecutively increasing number. Block is enclosed with empty lines.
* <li> inside <ul>
Each <li> element is printed to a single line prefixed with '*'. Block
is enclosed with empty lines.
* <s>
For every word in the element, a ^W is printed after the element.
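
The link and <s> rules above can be sketched in Python as follows. This
is only an illustration of the documented behaviour, not the code h2g
itself uses. The names rewrite_href, strike_marker and LinkList are
hypothetical, and "links to element identifiers" is assumed to mean
hrefs starting with '#'. Reference numbers start at 0, as in the
EXAMPLE section.

```python
def rewrite_href(href):
    """Return the link target for the footnote list, or None if ignored."""
    # Assumption: a link to an element identifier is an href starting with '#'.
    if href.startswith("#"):
        return None
    # Relative local links start with '.'; a '.html' suffix becomes '.gmi'.
    if href.startswith(".") and href.endswith(".html"):
        return href[: -len(".html")] + ".gmi"
    return href


def strike_marker(text):
    """Return the text followed by one ^W per whitespace-separated word."""
    return text + "^W" * len(text.split())


class LinkList:
    """Collects link targets and hands out reference numbers from 0."""

    def __init__(self):
        self.entries = []

    def add(self, href, label):
        """Register a link; return its '[n]' reference, or '' if ignored."""
        target = rewrite_href(href)
        if target is None:
            return ""
        self.entries.append((target, label))
        return f"[{len(self.entries) - 1}]"

    def lines(self):
        """Return the footnote lines printed at the bottom of the document."""
        return [
            f"=> {target} [{n}] {label}"
            for n, (target, label) in enumerate(self.entries)
        ]


if __name__ == "__main__":
    links = LinkList()
    print("*local* link" + links.add("./local.html", "local link"))
    print("alt text" + links.add("./img.png", "alt text"))
    print(strike_marker("A sentence"))
    print("\n".join(links.lines()))
```

Run on the link and image from the EXAMPLE section, the sketch yields
the same two footnote lines shown at the end of the example output.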
CAVEATS
* All input is ignored until a <body> element is found!
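
Because of this caveat, an HTML fragment without a <body> element would
presumably produce no output. A minimal Python sketch of a workaround
is to wrap such fragments before passing them to h2g. The helper name
ensure_body is hypothetical and not part of h2g. Matching the tag
case-insensitively and allowing attributes on it are assumptions of the
sketch, since how h2g itself detects <body> is not stated here.

```python
import re


def ensure_body(html):
    """Wrap an HTML fragment in <body> if it has no <body> tag of its own."""
    # Assumption: '<body' followed by whitespace or '>' counts as the tag,
    # regardless of letter case.
    if re.search(r"<body[\s>]", html, re.IGNORECASE):
        return html
    return "<body>\n" + html + "\n</body>\n"


if __name__ == "__main__":
    print(ensure_body("<p>Just a fragment</p>"))
```

The returned string can then be written to h2g on stdin.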
BUGS PATCHES FEATURE REQUESTS QUESTIONS INSULTS
mail@rkta.de
EXAMPLE
Input:
------
<!DOCTYPE html>
<html lang="en">
<head>
<title>TITLE</title>
</head>
<body>
<header>
<H1>H1</H1>
</header>
<h2>H2</h2>
<p><s>A sentence</s>Paragraph <em>with</em> an <u>important</u>
<a href="./local.html"><b>local</b> link</a>.</p>
<img alt='alt text' src='./img.png'>
<pre>
Pre-formatted
text
</pre>
break<br>row
<ul> <li>List entry</li> </ul>
<ol> <li>Ordered list entry</li> </ol>
<table>
<tr><th>Entity</th><th>Symbol</th></tr>
<tr><td>&amp;</td><td>&</td></tr>
<tr><td>&apos;</td><td>'</td></tr>
<tr><td>&gt;</td><td>></td></tr>
<tr><td>&lt;</td><td><</td></tr>
</table> </body> </html>
Output:
-------
# H1
## H2
A sentence^W^WParagraph _with_ an _important_ *local* link[0].
alt text[1]
```
Pre-formatted
text
```
break
row
* List entry
* 1) Ordered list entry
Entity Symbol
& &
' '
> >
< <
=> ./local.gmi [0] local link
=> ./img.png [1] alt text