~artemis/gogls

go lib to parse the gls format

#GoGLS

Project licensed under the CNPLv7+ terms.

A parser for the GLS format, a format for writing Gay Little Stories.

#On the GLS format

I'm making this document format to have a saner base on which I can easily write stories. Its aim is not to fully cover what a "serious" / "professional" / whatever format would cover, but to provide me with the base tools I need for myself. The document file extension is expected to be .gls by default (gls standing for gay little story uwu).

#General rules

  • The document is expected to use LF (\n) as its line return character; no other sequence is treated as a line return. In a CR LF sequence, the LF ends the line and the CR counts as part of the line.
  • Any freeform text element is to be trimmed (trimset: space, tab, CR).
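
The two rules above can be sketched in Go as follows. This is only an illustration, not the library's actual code; the names splitLines, trimText and trimSet are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// trimSet is the set of characters stripped from freeform text:
// space, tab, and carriage return.
const trimSet = " \t\r"

// splitLines splits a document on LF only. A CR placed before the LF
// stays part of the line, as the general rules describe.
func splitLines(doc string) []string {
	return strings.Split(doc, "\n")
}

// trimText trims a freeform text element using the trimset.
func trimText(s string) string {
	return strings.Trim(s, trimSet)
}

func main() {
	for _, l := range splitLines("first line\r\n\tsecond line \n") {
		fmt.Printf("%q -> %q\n", l, trimText(l))
	}
}
```

Since CR is part of the trimset, a document saved with CR LF line returns still yields clean text elements once they are trimmed.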

#Document format

A story document consists of a metadata block and a content block. All elements listed here are required for the document to be valid. The compiler doesn't need to (and probably shouldn't) handle invalid documents.

METADATA

CONTENT

The document always ends with a line return as last character.

The base format is therefore <METADATA>\n\n<CONTENT>\n.
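
A minimal sketch of that split, assuming the first empty line is what separates the two blocks. The function name splitDocument is hypothetical and the error messages are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitDocument separates a document into its metadata and content
// blocks, following the <METADATA>\n\n<CONTENT>\n layout.
func splitDocument(doc string) (string, string, error) {
	if !strings.HasSuffix(doc, "\n") {
		return "", "", errors.New("document must end with a line return")
	}
	body := strings.TrimSuffix(doc, "\n")
	// Assumption: the first empty line ends the metadata block.
	metadata, content, found := strings.Cut(body, "\n\n")
	if !found {
		return "", "", errors.New("missing empty line between metadata and content")
	}
	return metadata, content, nil
}

func main() {
	metadata, content, err := splitDocument("oneshot draft\nA title\n\nSome text\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("metadata: %q\ncontent: %q\n", metadata, content)
}
```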

#Metadata format

The metadata block must hold all important data related to the document. The following data is made available.

  • document type (required; possible values are series and oneshot)
  • publication date, or draft if not published yet (required). The date uses the format YYYY-MM-DD, with an optional suffix .X where X is a number of one or more digits. This suffix may be used to publish multiple documents on the same day (e.g. 2023-03-02.1, 2023-03-02.2, etc.). For ordering, the suffix is to be interpreted as a numerical value.
  • content warnings (optional, value is a list of keywords)
  • prompt (optional, value is a text line; a prompt is usually a topic given to the writer as source of inspiration/focus for the story)
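
The publication date rule, including the numerical ordering of the suffix, could be handled along these lines. The PubDate type and its field names are assumptions made for this sketch, and how drafts sort relative to dated documents is left open since the rule above does not define it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// PubDate is a hypothetical representation of the publication field.
type PubDate struct {
	Draft bool
	Date  time.Time
	Seq   int // numeric value of the optional .X suffix, 0 when absent
}

// parsePubDate accepts "draft" or YYYY-MM-DD with an optional .X suffix.
func parsePubDate(s string) (PubDate, error) {
	if s == "draft" {
		return PubDate{Draft: true}, nil
	}
	day, suffix, hasSuffix := strings.Cut(s, ".")
	date, err := time.Parse("2006-01-02", day)
	if err != nil {
		return PubDate{}, fmt.Errorf("invalid publication date %q: %w", s, err)
	}
	pd := PubDate{Date: date}
	if hasSuffix {
		// ParseUint rejects an empty suffix, signs, and non-digits.
		n, err := strconv.ParseUint(suffix, 10, 31)
		if err != nil {
			return PubDate{}, fmt.Errorf("invalid date suffix %q: %w", suffix, err)
		}
		pd.Seq = int(n)
	}
	return pd, nil
}

// Before reports whether a sorts before b: by date first, then by the
// suffix compared as a number (so .9 comes before .10).
func (a PubDate) Before(b PubDate) bool {
	if !a.Date.Equal(b.Date) {
		return a.Date.Before(b.Date)
	}
	return a.Seq < b.Seq
}

func main() {
	first, _ := parsePubDate("2023-03-02.9")
	second, _ := parsePubDate("2023-03-02.10")
	fmt.Println(first.Before(second))
}
```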

For oneshots, the following metadata is made available.

  • story title (required)

For series, the following metadata is made available.

  • story title (required)
  • chapter number (required, strictly positive integer)
  • chapter title (required)

The formatting is defined as follows.

<DOCUMENT TYPE> <PUBLICATION DATE>
<STORY METADATA>
<EXTRA METADATA>
  • Extra metadata is, without order requirements, content warnings and prompt
    • content warnings are defined in the format cw: <list of cws>, where the cw list is keywords separated by ,
    • prompt is defined in the format prompt: prompt text
  • Story metadata is for oneshots the story title, and for series the story title, chapter number, and chapter title

The formatting for oneshots is defined as follows.

<STORY TITLE>

The formatting for series is defined as follows.

<STORY TITLE>
<CHAPTER NUMBER>: <CHAPTER TITLE>

An example for a story document using all described parts here is shown below.

series 2022-11-09
A pretty lengthy story title
2: A new chapter
prompt: You're trying to get something on the page, anything honestly
cw: test story, lots of gayness
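
As a rough sketch, a metadata block like the one above could be read line by line as follows. The Metadata struct, its field names and parseMetadata are hypothetical; the publication date is kept as a raw string here, and the prefixes are assumed to include the space after the colon.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Metadata is a hypothetical container for the parsed metadata block.
type Metadata struct {
	Type          string // "series" or "oneshot"
	PubDate       string // raw value: "draft" or YYYY-MM-DD with optional .X
	StoryTitle    string
	ChapterNumber int    // series only
	ChapterTitle  string // series only
	Warnings      []string
	Prompt        string
}

const trimSet = " \t\r"

func parseMetadata(block string) (Metadata, error) {
	var md Metadata
	lines := strings.Split(block, "\n")
	if len(lines) < 2 {
		return md, fmt.Errorf("metadata needs a header line and a story title")
	}
	docType, pubDate, ok := strings.Cut(lines[0], " ")
	if !ok || (docType != "series" && docType != "oneshot") {
		return md, fmt.Errorf("invalid header line %q", lines[0])
	}
	md.Type, md.PubDate = docType, strings.Trim(pubDate, trimSet)
	md.StoryTitle = strings.Trim(lines[1], trimSet)
	rest := lines[2:]
	if md.Type == "series" {
		if len(rest) == 0 {
			return md, fmt.Errorf("series needs a chapter line")
		}
		num, title, ok := strings.Cut(rest[0], ": ")
		n, err := strconv.Atoi(num)
		if !ok || err != nil || n < 1 {
			return md, fmt.Errorf("invalid chapter line %q", rest[0])
		}
		md.ChapterNumber, md.ChapterTitle = n, strings.Trim(title, trimSet)
		rest = rest[1:]
	}
	// Extra metadata may appear in any order.
	for _, line := range rest {
		switch {
		case strings.HasPrefix(line, "cw: "):
			for _, cw := range strings.Split(strings.TrimPrefix(line, "cw: "), ",") {
				md.Warnings = append(md.Warnings, strings.Trim(cw, trimSet))
			}
		case strings.HasPrefix(line, "prompt: "):
			md.Prompt = strings.Trim(strings.TrimPrefix(line, "prompt: "), trimSet)
		default:
			return md, fmt.Errorf("unknown metadata line %q", line)
		}
	}
	return md, nil
}

func main() {
	md, err := parseMetadata("oneshot draft\nA short story\ncw: test story")
	fmt.Printf("%+v %v\n", md, err)
}
```

Only the cw list is split on commas; a prompt is a single text line and keeps any comma it contains.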

#Content format

There are five distinct groups of "rules" (ie lines).

  • A text rule (some basic in-document text)
  • A dialogue line rule (a text chunk, but the entire line is told by someone)
  • A comment rule (some author's notes, feedback, etc; any kind of annotation)
  • A time skip / section change rule (used to put a clear break between two parts, like for skipping time, location, and such)
  • A meta rule (some "commands" changing the parsing state/behaviour)

Since I want the document to be pretty easy to parse on a per-line basis (ie to be able to identify what a line is right from its first characters), every rule other than the text rule has its own prefix.

  • A line starting with // is a comment
  • A line starting with # is a meta rule
  • A line starting with --- is a time skip rule
  • An empty line is considered as a paragraph separator
  • A line starting with [, followed by a person identifier and a closing ], is a dialogue line rule
  • Anything else is a text rule

A person identifier is a single character used to identify a person (a speaker).
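
A per-line classifier following the prefix list above might look like the sketch below. RuleKind, classify and isDialogue are hypothetical names. The sketch matches comments on // alone, as in the list; the Comments section defines the prefix with a trailing space, which a stricter parser could require instead.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// RuleKind identifies what a content line is.
type RuleKind int

const (
	Text RuleKind = iota
	Dialogue
	Comment
	TimeSkip
	Meta
	Separator
)

// classify identifies a content line from its prefix.
func classify(line string) RuleKind {
	switch {
	case line == "":
		return Separator
	case strings.HasPrefix(line, "//"):
		return Comment
	case strings.HasPrefix(line, "#"):
		return Meta
	case strings.HasPrefix(line, "---"):
		return TimeSkip
	case isDialogue(line):
		return Dialogue
	default:
		return Text
	}
}

// isDialogue reports whether line opens with "[", exactly one character
// (the person identifier), then "]".
func isDialogue(line string) bool {
	if !strings.HasPrefix(line, "[") {
		return false
	}
	_, size := utf8.DecodeRuneInString(line[1:])
	return size > 0 && strings.HasPrefix(line[1+size:], "]")
}

func main() {
	for _, l := range []string{"", "// note", "#&", "--- Later on", "[E] Hi", "Plain text"} {
		fmt.Printf("%q kind=%d\n", l, classify(l))
	}
}
```

The identifier is decoded as one UTF-8 character rather than one byte, so a non-ASCII speaker identifier is still recognised.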

A special case is the empty line / paragraph separator logic.

If an empty line appears, it is counted as a paragraph separator. In other words, the previous paragraph (text rule set) is closed. A new paragraph (text rule set) is then only opened when a text line is found; any subsequent empty line or non-text rule line does not open a new paragraph.

A paragraph is automatically closed at end of document.

This is a test --beginning of paragraph
Another word
--empty line, end of paragraph

// This is a comment -- not a text rule, not opening a paragraph


This is another test --first text rule since end of paragraph, a new one is opened
[E] I am a dialogue line --dialogue line rule, closes the previous paragraph and opens+closes a paragraph with "I am a dialogue line" and attached metadata "spoken by E"
--- Later on --this is a time skip rule, with a label

[P] I am another line --dialogue line rule, doesn't close anything since the previous rule closed the paragraph it made; opens+closes a paragraph with "I am another line" and attached metadata "spoken by P"
And I am a final paragraph --text rule, opening a paragraph

This example may generate the following HTML.

<p>This is a test<br/>
Another word</p>
<!-- This is a comment -->
<p>This is another test</p>
<p class="speaker speaker-E" data-speaker="E">I am a dialogue line</p>
<hr data-label="Later on"/>
<p class="speaker speaker-P" data-speaker="P">I am another line</p>
<p>And I am a final paragraph</p>
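
The paragraph logic can be sketched as a small state machine that buffers text lines and flushes them when the paragraph closes. This is an illustration rather than the library's renderer: render and speakerOf are hypothetical names, meta rules are not interpreted, and the sketch assumes every non-text rule closes the open paragraph.

```go
package main

import (
	"fmt"
	"html"
	"strings"
	"unicode/utf8"
)

const trimSet = " \t\r"

// speakerOf returns the person identifier and text of a dialogue line,
// or ok == false when the line is not of the form "[X] text".
func speakerOf(line string) (speaker, text string, ok bool) {
	if !strings.HasPrefix(line, "[") {
		return "", "", false
	}
	_, size := utf8.DecodeRuneInString(line[1:])
	if size == 0 || !strings.HasPrefix(line[1+size:], "]") {
		return "", "", false
	}
	return line[1 : 1+size], strings.Trim(line[2+size:], trimSet), true
}

// render converts a content block to HTML, one rule per line.
// Comment text containing "--" is not handled specially in this sketch.
func render(content string) string {
	var out strings.Builder
	var para []string // text lines of the currently open paragraph
	closePara := func() {
		if len(para) > 0 {
			fmt.Fprintf(&out, "<p>%s</p>\n", strings.Join(para, "<br/>\n"))
			para = nil
		}
	}
	for _, line := range strings.Split(content, "\n") {
		speaker, text, isDialogue := speakerOf(line)
		switch {
		case line == "":
			closePara()
		case strings.HasPrefix(line, "//"):
			closePara() // assumption: non-text rules close the open paragraph
			fmt.Fprintf(&out, "<!-- %s -->\n", html.EscapeString(strings.Trim(line[2:], trimSet)))
		case strings.HasPrefix(line, "#"):
			closePara() // meta rules are not interpreted here
		case strings.HasPrefix(line, "---"):
			closePara()
			if label := strings.Trim(line[3:], trimSet); label != "" {
				fmt.Fprintf(&out, "<hr data-label=\"%s\"/>\n", html.EscapeString(label))
			} else {
				out.WriteString("<hr/>\n")
			}
		case isDialogue:
			closePara()
			s := html.EscapeString(speaker)
			fmt.Fprintf(&out, "<p class=\"speaker speaker-%s\" data-speaker=\"%s\">%s</p>\n", s, s, html.EscapeString(text))
		default:
			para = append(para, html.EscapeString(strings.Trim(line, trimSet)))
		}
	}
	closePara() // a paragraph is automatically closed at end of document
	return out.String()
}

func main() {
	fmt.Print(render("This is a test\nAnother word\n\n[E] I am a dialogue line\n--- Later on"))
}
```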

#Comments

Comments are "just that": a compiler may safely ignore them, as they only carry metadata for the writer and not for the reader.

Their prefix is three characters: two forward slashes (/) and a standard ASCII space (code 0x20).

#Meta rules

Meta rules are rules modifying the compiler's behaviour on the fly. The first use case I have in mind is to add or change metadata on speakers/actors.

A meta rule always starts with a # and no space afterwards; its rule identifier is put right after. That means the meta rule format is #<RULE ID>, for example #&.

#Text and dialogue lines rules

Text rules are, at their core, text. They're your story.

Dialogue lines are made to stand out / be focused, as they're a single unit. This implies that each dialogue line is its own paragraph (ie parsing a dialogue line will close any open paragraph, and open/close one with its content).

#Inline text rules

Text lines can have inline components. For example, they are needed for inline dialogue lines, to provide the metadata of who's speaking.

An inline rule cannot be opened when another rule is already opened (no nested rules, and no overlapping rules).

  • Inline dialogue rules are written as <PERSON: some text> (eg <E: This is my line>).
  • Inline text fragments referring to a person (like a word/set of words referring to a person) are written as {PERSON: the word(s)} (eg {E: they}).
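
One way to split a text line into plain text and the two inline rules is sketched below. Fragment, FragmentKind and parseInline are hypothetical names. The sketch assumes an opener with no matching closer, or with a person identifier longer than one character, is kept as plain text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// FragmentKind identifies the type of an inline fragment.
type FragmentKind int

const (
	Plain FragmentKind = iota
	InlineDialogue
	Reference
)

// Fragment is one piece of a text line.
type Fragment struct {
	Kind   FragmentKind
	Person string // empty for Plain fragments
	Text   string
}

// parseInline splits a text line into plain text, <X: ...> inline
// dialogue and {X: ...} references. Rules never nest: whatever sits
// between an opener and its first closer is taken as text.
func parseInline(line string) []Fragment {
	var frags []Fragment
	plainStart := 0
	for i := 0; i < len(line); {
		kind, closer := Plain, byte(0)
		switch line[i] {
		case '<':
			kind, closer = InlineDialogue, '>'
		case '{':
			kind, closer = Reference, '}'
		}
		if kind != Plain {
			if end := strings.IndexByte(line[i:], closer); end > 0 {
				person, text, ok := strings.Cut(line[i+1:i+end], ": ")
				if ok && utf8.RuneCountInString(person) == 1 {
					if i > plainStart {
						frags = append(frags, Fragment{Kind: Plain, Text: line[plainStart:i]})
					}
					text = strings.Trim(text, " \t\r")
					frags = append(frags, Fragment{Kind: kind, Person: person, Text: text})
					i += end + 1
					plainStart = i
					continue
				}
			}
		}
		i++
	}
	if plainStart < len(line) {
		frags = append(frags, Fragment{Kind: Plain, Text: line[plainStart:]})
	}
	return frags
}

func main() {
	for _, f := range parseInline("She said <E: hello there> to {P: them} quietly") {
		fmt.Printf("%+v\n", f)
	}
}
```

Scanning byte by byte is safe here because the delimiters are ASCII, so the line is only ever sliced at character boundaries.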

Historically, I started by writing the EBNF below. I may draw from it to write a real parser now.

<document> ::= <metadata> <eol> <eol> <content>? <eol>

/* METADATA */
<metadata> ::= <header> <eol> <extra>
             | <header>

<header> ::= <seriesheader> <eol> <series>
           | <oneshotheader> <eol> <oneshot>

<seriesheader> ::= "series " <pubdate>
<oneshotheader> ::= "oneshot " <pubdate>
<pubdate> ::= <date> | <draft>
<date> ::= <d> <d> <d> <d> "-" <d> <d> "-" <d> <d>
<draft> ::= "draft"

<oneshot> ::= <storytitle>
<series> ::= <storytitle> <eol> <chapterline>

<storytitle> ::= <text>
<chapterline> ::= <chapterno> ": " <chaptertitle>
<chapterno> ::= <dpos>
<chaptertitle> ::= <text>

<extra> ::= <extraline> <eol> <extra> | <extraline>
<extraline> ::= (<cws> | <prompt>)

<prompt> ::= "prompt: " <text>
<cw> ::= <text>
<cwlist> ::= <cw> | <cw> "," <cwlist>
<cws> ::= "cw: " <cwlist>

/* CONTENT */
<content> ::= <contentline>+
<contentline> ::= <comment>
                | <meta>
                | <timeskip>
                | <dialogueline>
                | <genericline>
                | <eol>
<comment> ::= "// " <text>
<meta> ::= "#"
<timeskip> ::= "---" (" " <text>)?
<dialogueline> ::= "[" <utf> "] " <text>
<genericline> ::= <fragment> (" " <fragment> | " ")*

<fragment> ::= <dialogue>
             | <ref>
             | <word>
<dialogue> ::= "<" <utf> ": " <text> ">"
<ref> ::= "{" <utf> ": " <text> "}"

/* basics */
<text> ::= " "* <utf> (<utf> | " ")*
<word> ::= <utf>+
<dpos> ::= "0"* [1-9] <d>*
<d> ::= [0-9]
<eol> ::= "\n"
/* meant as a generic placeholder for any kind of utf8 sequence; */
/* obviously not utf8 here */
<utf> ::= [a-z] | [A-Z] | <d>