e5365b17d9ab856128fb67e9f3dabf109afdbf6f — Humaid AlQassimi 9 days ago fc42354 master
blog: added empty character post
1 files changed, 70 insertions(+), 0 deletions(-)

A content/blog/empty-char-go.md
A content/blog/empty-char-go.md => content/blog/empty-char-go.md +70 -0
@@ 0,0 1,70 @@
title: "Detecting the Empty Character in Go"
date: 2020-07-28

Recently, I have been working on an online ticketing system. I have been using
`strings.TrimSpace` for a while, and it worked well. I tested
it with the "empty character" from
[emptycharacter.com](https://emptycharacter.com/), and it failed to detect
whatever whitespace characters it was using.

I thought it was just `strings.TrimSpace` not detecting different types of
Unicode's empty characters. So I replaced it with
`strings.TrimFunc(s, unicode.IsSpace)`, and it still didn't clear the
empty character[^2].

Dissecting that empty character, we find it is actually made up of five
different characters:

- `U+200F`: Right-To-Left Mark
- `U+200F`: Right-To-Left Mark
- `U+200E`: Left-To-Right Mark
- `U+0020`: Regular Space
- `U+200E`: Left-To-Right Mark
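
To reproduce the problem locally, here is a minimal sketch that rebuilds the sequence from the list above and counts how many runes survive each trimming approach. The names `emptyChar` and `runesAfterTrim` are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// emptyChar is the five-code-point sequence listed above.
const emptyChar = "\u200F\u200F\u200E\u0020\u200E"

// runesAfterTrim reports how many runes are left after strings.TrimSpace
// and after strings.TrimFunc with unicode.IsSpace.
func runesAfterTrim(s string) (trimSpace, trimFunc int) {
	trimSpace = utf8.RuneCountInString(strings.TrimSpace(s))
	trimFunc = utf8.RuneCountInString(strings.TrimFunc(s, unicode.IsSpace))
	return trimSpace, trimFunc
}

func main() {
	a, b := runesAfterTrim(emptyChar)
	fmt.Println(a, b) // prints "5 5": neither approach removes anything
}
```

The regular space sits between directional marks, so it is never at the edge of the string where trimming could reach it.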

We can see that it is using a control character to prevent the regular space
from being trimmed.
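
To see how Go's `unicode` package classifies these code points, a small check like the following may help. The helper name `classify` is hypothetical and only exists for this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify reports whether r counts as a space, as a control character,
// and as a member of the Bidi_Control range table.
func classify(r rune) (space, control, bidi bool) {
	return unicode.IsSpace(r), unicode.IsControl(r), unicode.Is(unicode.Bidi_Control, r)
}

func main() {
	// The three distinct code points that make up the "empty character".
	for _, r := range "\u200F\u200E\u0020" {
		space, control, bidi := classify(r)
		fmt.Printf("%U space=%t control=%t bidi=%t\n", r, space, control, bidi)
	}
}
```

Both directional marks report `bidi=true` with `space` and `control` false, while `U+0020` is the only one reported as a space.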

However, Go doesn't list these characters as control characters[^1], so we
cannot use `unicode.IsControl`. They are, however, included in the
`unicode.Bidi_Control` subset. Here's my first solution:

func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

strings.TrimFunc(s, isImproperChar)

This would trim away bidirectional control characters from the stored value,
which is probably a really bad idea, especially in systems supporting Arabic,
Hebrew, or other right-to-left languages.
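
As a rough illustration of that risk, the sketch below trims a made-up Hebrew input that ends with a right-to-left mark, which can be used to keep trailing punctuation on the intended side when rendered. The input string and the wrapper `trimImproper` are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isImproperChar mirrors the first solution: spaces and bidi controls.
func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

// trimImproper is a hypothetical wrapper that returns the trimmed value,
// as a system would if it stored the result of the first solution.
func trimImproper(s string) string {
	return strings.TrimFunc(s, isImproperChar)
}

func main() {
	// Hypothetical input: Hebrew "shalom" + "!" + U+200F (Right-To-Left Mark).
	in := "\u05E9\u05DC\u05D5\u05DD!\u200F"
	out := trimImproper(in)
	fmt.Printf("%+q\n%+q\n", in, out) // the trailing mark is gone from out
}
```

The visible text is unchanged, but the stored value has lost a mark the user may have typed on purpose.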

So we can just trim it to measure the length, then discard the trimmed
string:

func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}

Try it out on [the Go playground](https://play.golang.org/p/S74NV_KP0Xv)!
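
For a quick local check, here is a self-contained sketch that runs `IsEmpty` over a few sample inputs. The inputs are chosen for illustration only.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// IsEmpty is repeated from the post so this sketch runs on its own.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}

func main() {
	inputs := []string{
		"",
		" \t\n",
		"\u200F\u200F\u200E\u0020\u200E", // the "empty character"
		"\u200Fhello\u200E",              // real text wrapped in marks
	}
	for _, s := range inputs {
		// %+q escapes non-ASCII runes so the invisible marks show up.
		fmt.Printf("%+q -> %t\n", s, IsEmpty(s))
	}
}
```

The first three inputs are reported as empty and the last one is not. Since only the length of the trimmed copy is inspected, the original string, marks included, is left as the user entered it.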

Have a better solution? Please let me know!

*This is my eighth post in the [#100DaysToOffload](https://100daystooffload.com)
challenge.*

[^1]: They are not listed in
[unicode/tables.go:7108](https://golang.org/src/unicode/tables.go#L7108) as
`pC` (control character); instead, they are included in the
[Bidi_Control](https://golang.org/src/unicode/tables.go#L5673) subset.
[^2]: I thought `unicode.IsSpace` wasn't detecting some types of spaces, but
after [some testing](https://play.golang.org/p/S6T9gK5f8lw), that doesn't
seem to be the case.