~trs-80/ostrta-spec

75cabc7529cedf7e657b406a31d827b0e9d0cd66 — TRS-80 9 months ago 36ae36c
Add Data File section, flesh out Timestamp-ID specification

Also several small grammar & readability edits throughout.
4 files changed, 256 insertions(+), 40 deletions(-)

M README.md
M README.org
M Specifications.md
M Specifications.org
M README.md => README.md +6 -6
@@ 30,12 30,12 @@ If you are looking for the Emacs Lisp implementation of OSTRTA, it can (soon!<su

### Current Status

I have been developing and using many of these concepts (and additional ones besides) for quite some time on a personal basis and I think there is something worth sharing here.  So I decided to post it up publicly for feedback while I continue with further experimentation and development.
I have been developing and using many of these concepts (and additional ones besides) for quite some time on a personal basis, and they have worked well enough that I thought they may be useful to others.  So I decided to post them up publicly for discussion while I continue with further experimentation and development.

### Goals

-   To come up with some common set of useful, general, free and open standards that many different people can implement in whatever language(s) they prefer, such that we don't all have to keep re-inventing the same wheels over and over again all by ourselves.
    -   To have a single, easy to point to place on the Internet where said standard is all laid out (i.e., this page) or at least a place to focus discussion and further the development of ideas.
-   To come up with some common set of useful, general, free and open standards that many different people can implement in whatever language(s) they prefer, such that we don't all have to keep re-inventing the same wheels over and over again each by ourselves.
    -   To have a single, easy to point to place on the Internet where said standard is all laid out (i.e., this page).  Or at least a place to focus discussion and further develop ideas.

-   The end goal being able to improve the ability to store and recall various sorts of information in a more useful, reliable, and/or convenient way.



@@ 61,11 61,11 @@ In addition to the above (and for same reasons), we should also only use what I 

The essential notion of Controlled Vocabulary (CV) is to select items from some list rather than entering them free form, in order to eliminate typographical and other errors.  This mostly applies to things like tags but the concept could be extended to many others.

1.  Disambiguation notes live directly with CV items
1.  Disambiguation notes SHOULD live directly with CV items

    Disambiguation notes are simple reminders to yourself like "use tag1 for X" or "no tag2 is for Y, for Z use tag3 instead" etc.  It is a simple form of metadata management.
    Disambiguation notes are simple reminders to yourself like "use tag1 for X" or "no, tag2 is for Y; for Z use tag3 instead" etc.  It is a simple form of metadata management.
    
    Additional disambiguation notes should be contained directly in the same place you are choosing the CV item from.  This can be implemented in a language-specific data structure, or, preferably in a "CV file" which we propose as a (very simple text file) [specification](Specifications.md).
    Disambiguation notes SHOULD be contained directly in the same place you are choosing the CV item from.  This can be implemented in a language-specific data structure, or, preferably in a "CV file" which we propose as a (very simple text file) [specification](Specifications.org#cv-file-format).

2.  Gardening


M README.org => README.org +7 -7
@@ 28,16 28,16 @@ If you are looking for the Emacs Lisp implementation of OSTRTA, it can (soon!^TM
    :CUSTOM_ID:            current-status
    :END:

I have been developing and using many of these concepts (and additional ones besides) for quite some time on a personal basis and I think there is something worth sharing here.  So I decided to post it up publicly for feedback while I continue with further experimentation and development.
I have been developing and using many of these concepts (and additional ones besides) for quite some time on a personal basis, and they have worked well enough that I thought they may be useful to others.  So I decided to post them up publicly for discussion while I continue with further experimentation and development.

*** Goals
    :PROPERTIES:
    :CUSTOM_ID:            goals
    :END:

- To come up with some common set of useful, general, free and open standards that many different people can implement in whatever language(s) they prefer, such that we don't all have to keep re-inventing the same wheels over and over again all by ourselves.
- To come up with some common set of useful, general, free and open standards that many different people can implement in whatever language(s) they prefer, such that we don't all have to keep re-inventing the same wheels over and over again each by ourselves.

  - To have a single, easy to point to place on the Internet where said standard is all laid out (i.e., this page) or at least a place to focus discussion and further the development of ideas.
  - To have a single, easy to point to place on the Internet where said standard is all laid out (i.e., this page).  Or at least a place to focus discussion and further develop ideas.

- The end goal being able to improve the ability to store and recall various sorts of information in a more useful, reliable, and/or convenient way.



@@ 75,14 75,14 @@ In addition to the above (and for same reasons), we should also only use what I 

The essential notion of Controlled Vocabulary (CV) is to select items from some list rather than entering them free form, in order to eliminate typographical and other errors.  This mostly applies to things like tags but the concept could be extended to many others.

**** Disambiguation notes live directly with CV items
**** Disambiguation notes SHOULD live directly with CV items
     :PROPERTIES:
     :CUSTOM_ID:            disambiguation-notes-live-directly-with-cv-items
     :CUSTOM_ID:            disambiguation-notes-should-live-directly-with-cv-items
     :END:

Disambiguation notes are simple reminders to yourself like "use tag1 for X" or "no tag2 is for Y, for Z use tag3 instead" etc.  It is a simple form of metadata management.
Disambiguation notes are simple reminders to yourself like "use tag1 for X" or "no, tag2 is for Y; for Z use tag3 instead" etc.  It is a simple form of metadata management.

Additional disambiguation notes should be contained directly in the same place you are choosing the CV item from.  This can be implemented in a language-specific data structure, or, preferably in a "CV file" which we propose as a (very simple text file) [[file:Specifications.org::#cv-file-format][specification]].
Disambiguation notes SHOULD be contained directly in the same place you are choosing the CV item from.  This can be implemented in a language-specific data structure, or, preferably in a "CV file" which we propose as a (very simple text file) [[file:Specifications.org#cv-file-format][specification]].

**** Gardening
     :PROPERTIES:

M Specifications.md => Specifications.md +176 -13
@@ 1,11 1,12 @@
1.  [Specifications](#specifications)
    1.  [Controlled Vocabulary](#controlled-vocabulary)
        1.  [CV File Format](#cv-file-format)
    2.  [Filename](#filename)
    2.  [Data File](#orgc24304e)
    3.  [Filename](#filename)
        1.  [Minimum](#minimum)
        2.  [Full Filename Specification](#full-filename-specification)
    3.  [Filesystem](#filesystem)
    4.  [Timestamp-ID](#timestamp-id)
    4.  [Filesystem](#filesystem)
    5.  [Timestamp-ID](#timestamp-id)
        1.  [ostrta-id-N](#ostrta-id-n)

# Specifications


@@ 48,10 49,99 @@ Using common example of selecting tag(s), the plain text CV file implementation 
2.  In addition to the above:
    1.  Implementations SHOULD provide a user selectable option whether to limit selections strictly to the choices in CV file, or allow adding new items "on the fly."

## Data File

For tabular data meeting the following criteria:

1.  multiple fields / columns
2.  more complicated than what is possible with a [CV file](#cv-file-format)
3.  not nested (more than a few levels)
4.  nor otherwise complicated enough to require JSON

&#x2026;we propose a simple yet dramatic improvement to the common CSV file: a return to the basic ASCII control codes, which were expressly designed for the purpose and have none of the (mostly quote-related) parsing and escaping issues of CSV files.<sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">


<colgroup>
<col  class="org-left" />

<col  class="org-right" />

<col  class="org-left" />

<col  class="org-left" />

<col  class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Seq</th>
<th scope="col" class="org-right">Dec</th>
<th scope="col" class="org-left">Hex</th>
<th scope="col" class="org-left">Abbrev</th>
<th scope="col" class="org-left">Name</th>
</tr>
</thead>

<tbody>
<tr>
<td class="org-left">`^\`</td>
<td class="org-right">28</td>
<td class="org-left">1C</td>
<td class="org-left">FS</td>
<td class="org-left">File Separator</td>
</tr>


<tr>
<td class="org-left">`^]`</td>
<td class="org-right">29</td>
<td class="org-left">1D</td>
<td class="org-left">GS</td>
<td class="org-left">Group Separator</td>
</tr>


<tr>
<td class="org-left">`^^`</td>
<td class="org-right">30</td>
<td class="org-left">1E</td>
<td class="org-left">RS</td>
<td class="org-left">Record Separator</td>
</tr>


<tr>
<td class="org-left">`^_`</td>
<td class="org-right">31</td>
<td class="org-left">1F</td>
<td class="org-left">US</td>
<td class="org-left">Unit Separator</td>
</tr>
</tbody>
</table>

The table above and the quote below are from the Wikipedia article [C0 and C1 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#Field_separators).<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup>

> Can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it.

Therefore we propose:

1.  For many types of tabular data, it is enough to simply use US (`^_`) instead of the comma delimiter of CSV, and therefore that is what you SHOULD do.
    -   In which case, quoting is not required, nor escaping of quotes, eliminating all related parsing issues.
2.  Newline MAY be used as record (row) separator (not to be confused with the above ASCII RS character); in fact, it SHOULD be used in the common case of simple, flat tabular data.
3.  "Higher" levels (per the Wikipedia quote above) of control-character delimiters (e.g., RS, GS, FS) SHOULD be used only in cases where additional levels of depth / grouping are required.
4.  When depth / complexity (or other requirements) exceed what this can provide, other common, free and open, and widely supported data formats (e.g., JSON, etc.) SHOULD be used instead.
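The flat case described above (US between fields, newline between rows) can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names `dump_rows` and `load_rows` are hypothetical, and rejecting fields that contain US or a newline is an assumption about how an implementation might handle them.

```python
US = "\x1f"  # ASCII Unit Separator (decimal 31, hex 1F)


def dump_rows(rows):
    """Serialize rows (lists of strings) as US-delimited, newline-terminated text."""
    for row in rows:
        for field in row:
            # Assumption: a field may not contain either delimiter.
            if US in field or "\n" in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
    return "".join(US.join(row) + "\n" for row in rows)


def load_rows(text):
    """Parse US-delimited, newline-separated text back into rows."""
    # split("\n") is used on purpose: str.splitlines() would also break
    # lines on FS, GS, and RS, which this format reserves for deeper levels.
    return [line.split(US) for line in text.split("\n") if line]


rows = [["name", "quote"], ["Ann", 'She said "hi, there"']]
text = dump_rows(rows)
print(load_rows(text) == rows)
```

Note that the quotes and the comma in the second row need no escaping at all, which is the point of the proposal.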

## Filename

The filename spec is based upon (and closely related to) the [timestamp-ID](#timestamp-id) spec.

A simple example (in this case, a photo filename):

    YYYY-MM-DD-HHMM_description_text_here--tag1-tag2-tag3_with_spaces.jpg
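
As a rough illustration of how such a filename might be taken apart, here is a minimal Python sketch. It is based only on the simple example above, not on the full specification: the function name `parse_filename` is hypothetical, and it assumes that the first `--` starts the tag list, that tags are separated by single hyphens, and that underscores stand in for spaces.

```python
import os
import re

# Timestamp-ID with minute or second resolution, then an optional _description.
_HEAD_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{4}(?:\d{2})?)(?:_(.*))?$")


def parse_filename(name):
    """Split a filename into timestamp-id, description, tags, and extension."""
    stem, ext = os.path.splitext(name)
    # Assumption: the first "--" separates the description from the tags.
    head, sep, tag_part = stem.partition("--")
    match = _HEAD_RE.match(head)
    if match is None:
        raise ValueError(f"no timestamp-id at start of {name!r}")
    description = (match.group(2) or "").replace("_", " ")
    tags = [t.replace("_", " ") for t in tag_part.split("-") if t] if sep else []
    return {
        "id": match.group(1),
        "description": description,
        "tags": tags,
        "ext": ext,
    }


print(parse_filename("2021-03-04-1530_beach_trip--family-summer-road_trip.jpg"))
```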

### Minimum

The minimum file name considered to be following the spec would be a simple [ostrta-id-4](#ostrta-id-n) with no extension:


@@ 62,10 152,6 @@ In the Elisp implementation, this simple check is performed by the function `ost

### Full Filename Specification

A simple example (in this case, a photo filename):

    YYYY-MM-DD-HHMM_description_text_here--tag1-tag2-tag3_with_spaces.jpg

A much more detailed definition:

    timestamp-id [_description...] [--[tag...]-another_tag...] [.ext]


@@ 104,7 190,7 @@ A much more detailed definition:
    
    3.  The intention of this rule is to ensure the timestamp-id portion of the filename remains a reliable identifier.

Alternatively, you MAY leave the base timestamp-id there by itself (perhaps only along with the extension) and implement your metadata in another index file or even a database (although plain text files are always [preferred](README.md)).<sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup>
Alternatively, you MAY leave the base timestamp-id there by itself (perhaps only along with the extension) and implement your metadata in another index file or even a database (although plain text files are always [preferred](README.md)).<sup><a id="fnr.3" class="footref" href="#fn.3">3</a></sup>

## Filesystem



@@ 145,6 231,83 @@ One thing in particular I noticed so far is that having the intermediate month f

Related closely to the base [filename](#filename) spec, and vice-versa.

The Timestamp-ID specification is a very simple "ISO-like" timestamp:

    YYYY-MM-DD-HHMMSS

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">


<colgroup>
<col  class="org-left" />

<col  class="org-left" />

<col  class="org-left" />

<col  class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Token</th>
<th scope="col" class="org-left">Value</th>
<th scope="col" class="org-left">Format</th>
<th scope="col" class="org-left">Required?</th>
</tr>
</thead>

<tbody>
<tr>
<td class="org-left">**YYYY**</td>
<td class="org-left">the year</td>
<td class="org-left">4 digit</td>
<td class="org-left">MUST</td>
</tr>


<tr>
<td class="org-left">**MM**</td>
<td class="org-left">the month</td>
<td class="org-left">zero padded</td>
<td class="org-left">MUST</td>
</tr>


<tr>
<td class="org-left">**DD**</td>
<td class="org-left">the day</td>
<td class="org-left">zero padded</td>
<td class="org-left">MUST</td>
</tr>


<tr>
<td class="org-left">**HH**</td>
<td class="org-left">the hour</td>
<td class="org-left">24 hour</td>
<td class="org-left">MUST</td>
</tr>


<tr>
<td class="org-left">**MM**</td>
<td class="org-left">the minute</td>
<td class="org-left">zero padded</td>
<td class="org-left">MUST</td>
</tr>


<tr>
<td class="org-left">**SS**</td>
<td class="org-left">the second</td>
<td class="org-left">zero padded</td>
<td class="org-left">OPTIONAL</td>
</tr>
</tbody>
</table>

Time resolution smaller than one second MAY be defined, but so far there has been no need and thus no discussion what that might look like.
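The table above translates fairly directly into a validity check. The following Python sketch is one possible reading, not the Elisp implementation: the function name `parse_timestamp_id` is hypothetical, and using the standard library's `datetime.strptime` to reject impossible dates (such as month 13) is an assumption about how strict a checker should be.

```python
import re
from datetime import datetime

# YYYY-MM-DD-HHMM with optional SS, all fields zero padded.
_TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}-\d{4}(?:\d{2})?")


def parse_timestamp_id(text):
    """Return a datetime for a valid timestamp-ID, or raise ValueError."""
    if not _TS_RE.fullmatch(text):
        raise ValueError(f"not a timestamp-ID: {text!r}")
    # 17 characters means seconds are present; 15 means minute resolution.
    fmt = "%Y-%m-%d-%H%M%S" if len(text) == 17 else "%Y-%m-%d-%H%M"
    return datetime.strptime(text, fmt)


print(parse_timestamp_id("2021-03-04-1530"))
print(parse_timestamp_id("2021-03-04-153059"))
```

The regular expression enforces the zero padding and hyphen placement, while `strptime` checks that the digits form a real date and time.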

### ostrta-id-N

The notion of `-4` and `-6` comes from the size of the last group of digits in the timestamp:


@@ 190,13 353,13 @@ The notion of `-4` and `-6` comes from the size of the last group of digits in t

Therefore it is an expression of the level of time resolution (minute and second, respectively).

I suppose there MAY eventually be `-8` (or further) but I personally have not come across the need as of yet.
-   Historical note: At one point early on, I was using an underscore between day and time.  But then I realized we are still just talking about degrees of time.  And since they are all similar (time), I think we should simply stick with hyphens throughout.

-   Then we would also need to get into discussion of whether to use period, etc. for fractional seconds or what.  So I suppose we cross that bridge when we come to it.

Historical note: At one point early on, I was using an underscore between day and time.  But then I realized we are still just talking about degrees of time.  And since they are all similar (time), I think we should simply stick with hyphens throughout.
## Footnotes

<sup><a id="fn.1" href="#fnr.1">1</a></sup> Credit for this idea goes to denizens of `#emacs`, who turned me on to [this](https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/) excellent article.

## Footnotes
<sup><a id="fn.2" href="#fnr.2">2</a></sup> See also the [ASCII](https://en.wikipedia.org/wiki/ASCII) article, where there is more discussion in [Control characters](https://en.wikipedia.org/wiki/ASCII#Control_characters) section, and an even better [chart](https://en.wikipedia.org/wiki/ASCII#Control_code_chart) featuring additional helpful data.

<sup><a id="fn.1" href="#fnr.1">1</a></sup> In fact this is the approach I took in the (as yet unreleased) Meme Manager as some memes have far too much metadata to comfortably store in the filename.
<sup><a id="fn.3" href="#fnr.3">3</a></sup> In fact this is the approach I took in the (as yet unreleased) Meme Manager as some memes have far too much metadata to comfortably store in the filename.

M Specifications.org => Specifications.org +67 -14
@@ 53,6 53,40 @@ Using common example of selecting tag(s), the plain text CV file implementation 

   1. Implementations SHOULD provide a user selectable option whether to limit selections strictly to the choices in CV file, or allow adding new items "on the fly."

** Data File

For tabular data meeting the following criteria:

1. multiple fields / columns
2. more complicated than what is possible with a [[#cv-file-format][CV file]]
3. not nested (more than a few levels)
4. nor otherwise complicated enough to require JSON

...we propose a simple yet dramatic improvement to the common CSV file: a return to the basic ASCII control codes, which were expressly designed for the purpose and have none of the (mostly quote-related) parsing and escaping issues of CSV files.[fn:1]

|-----+-----+-----+--------+------------------|
| Seq | Dec | Hex | Abbrev | Name             |
|-----+-----+-----+--------+------------------|
| =^\=  |  28 | 1C  | FS     | File Separator   |
| =^]=  |  29 | 1D  | GS     | Group Separator  |
| =^^=  |  30 | 1E  | RS     | Record Separator |
| =^_=  |  31 | 1F  | US     | Unit Separator   |
|-----+-----+-----+--------+------------------|

The table above and the quote below are from the Wikipedia article [[https://en.wikipedia.org/wiki/C0_and_C1_control_codes#Field_separators][C0 and C1 control codes]].[fn:2]

#+begin_quote
Can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it.
#+end_quote

Therefore we propose:

1. For many types of tabular data, it is enough to simply use US (=^_=) instead of the comma delimiter of CSV, and therefore that is what you SHOULD do.
   - In which case, quoting is not required, nor escaping of quotes, eliminating all related parsing issues.
2. Newline MAY be used as record (row) separator (not to be confused with the above ASCII RS character); in fact, it SHOULD be used in the common case of simple, flat tabular data.
3. "Higher" levels (per the Wikipedia quote above) of control-character delimiters (e.g., RS, GS, FS) SHOULD be used only in cases where additional levels of depth / grouping are required.
4. When depth / complexity (or other requirements) exceed what this can provide, other common, free and open, and widely supported data formats (e.g., JSON, etc.) SHOULD be used instead.

** Filename
   :PROPERTIES:
   :CUSTOM_ID:            filename


@@ 60,6 94,12 @@ Using common example of selecting tag(s), the plain text CV file implementation 

The filename spec is based upon (and closely related to) the [[#timestamp-id][timestamp-ID]] spec.

A simple example (in this case, a photo filename):

#+begin_example
  YYYY-MM-DD-HHMM_description_text_here--tag1-tag2-tag3_with_spaces.jpg
#+end_example

*** Minimum
    :PROPERTIES:
    :CUSTOM_ID:            minimum


@@ 78,12 118,6 @@ In the Elisp implementation, this simple check is performed by the function =ost
    :CUSTOM_ID:            full-filename-specification
    :END:

A simple example (in this case, a photo filename):

#+begin_example
  YYYY-MM-DD-HHMM_description_text_here--tag1-tag2-tag3_with_spaces.jpg
#+end_example

A much more detailed definition:

#+begin_example


@@ 130,7 164,7 @@ A much more detailed definition:

   3. The intention of this rule is to ensure the timestamp-id portion of the filename remains a reliable identifier.

Alternatively, you MAY leave the base timestamp-id there by itself (perhaps only along with the extension) and implement your metadata in another index file or even a database (although plain text files are always [[file:README.org::#relying-strictly-on-floss-and-lowest-common-denominator-formats][preferred]]).[fn:1]
Alternatively, you MAY leave the base timestamp-id there by itself (perhaps only along with the extension) and implement your metadata in another index file or even a database (although plain text files are always [[file:README.org::#relying-strictly-on-floss-and-lowest-common-denominator-formats][preferred]]).[fn:3]

** Filesystem
   :PROPERTIES:


@@ 179,6 213,25 @@ One thing in particular I noticed so far is that having the intermediate month f

Related closely to the base [[#filename][filename]] spec, and vice-versa.

The Timestamp-ID specification is a very simple "ISO-like" timestamp:

#+begin_example
  YYYY-MM-DD-HHMMSS
#+end_example

|-------+------------+-------------+-----------|
| Token | Value      | Format      | Required? |
|-------+------------+-------------+-----------|
| *YYYY*  | the year   | 4 digit     | MUST      |
| *MM*    | the month  | zero padded | MUST      |
| *DD*    | the day    | zero padded | MUST      |
| *HH*    | the hour   | 24 hour     | MUST      |
| *MM*    | the minute | zero padded | MUST      |
| *SS*    | the second | zero padded | OPTIONAL  |
|-------+------------+-------------+-----------|

Time resolution smaller than one second MAY be defined, but so far there has been no need and thus no discussion what that might look like.

*** ostrta-id-N
    :PROPERTIES:
    :CUSTOM_ID:            ostrta-id-n


@@ 196,15 249,15 @@ The notion of =-4= and =-6= comes from the size of the last group of digits in t

Therefore it is an expression of the level of time resolution (minute and second, respectively).

I suppose there MAY eventually be =-8= (or further) but I personally have not come across the need as of yet.

- Then we would also need to get into discussion of whether to use period, etc. for fractional seconds or what.  So I suppose we cross that bridge when we come to it.
- Historical note: At one point early on, I was using an underscore between day and time.  But then I realized we are still just talking about degrees of time.  And since they are all similar (time), I think we should simply stick with hyphens throughout.

Historical note: At one point early on, I was using an underscore between day and time.  But then I realized we are still just talking about degrees of time.  And since they are all similar (time), I think we should simply stick with hyphens throughout.

** Footnotes
* Footnotes
   :PROPERTIES:
   :CUSTOM_ID:            footnotes
   :END:

[fn:1] In fact this is the approach I took in the (as yet unreleased) Meme Manager as some memes have far too much metadata to comfortably store in the filename.
[fn:1] Credit for this idea goes to denizens of =#emacs=, who turned me on to [[https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/][this]] excellent article.

[fn:2] See also the [[https://en.wikipedia.org/wiki/ASCII][ASCII]] article, where there is more discussion in [[https://en.wikipedia.org/wiki/ASCII#Control_characters][Control characters]] section, and an even better [[https://en.wikipedia.org/wiki/ASCII#Control_code_chart][chart]] featuring additional helpful data.

[fn:3] In fact this is the approach I took in the (as yet unreleased) Meme Manager as some memes have far too much metadata to comfortably store in the filename.