Natural language detection for Go.
Installation:
go get -u github.com/abadojack/whatlanggo
Simple usage example:
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Detect returns the most likely language, its script, and a confidence value.
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script], "Confidence:", info.Confidence)
}
Blacklisting and whitelisting example:

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Blacklist: exclude Yiddish so that Hebrew-script text cannot be reported as Yiddish.
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}
	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	// Whitelist: consider only Esperanto and Ukrainian as candidates.
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}
	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])
}
For more details, please check the documentation.
Requirements: Go 1.8 or higher.
The algorithm is based on trigram language models, which are a particular case of n-gram models. To understand the idea, see the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
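The ranking idea from Cavnar and Trenkle can be sketched in a few lines of Go. This is an illustrative sketch only, not whatlanggo's actual implementation: the function names (trigramProfile, outOfPlace, classify), the fixed penalty for missing trigrams, and the toy training sentences are all assumptions made for the example. Each text is reduced to a list of trigrams ordered by frequency, and the input is assigned to the model whose ordering it disagrees with least.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingPenalty is the cost of a trigram that a model does not contain.
// The value is an assumption chosen for this sketch.
const missingPenalty = 300

// trigramProfile returns the distinct trigrams of text, ordered from most
// to least frequent, with ties broken alphabetically.
func trigramProfile(text string) []string {
	// Lowercase, collapse whitespace, and pad so word edges form trigrams.
	normalized := " " + strings.ToLower(strings.Join(strings.Fields(text), " ")) + " "
	runes := []rune(normalized)
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	profile := make([]string, 0, len(counts))
	for t := range counts {
		profile = append(profile, t)
	}
	sort.Slice(profile, func(i, j int) bool {
		if counts[profile[i]] != counts[profile[j]] {
			return counts[profile[i]] > counts[profile[j]]
		}
		return profile[i] < profile[j]
	})
	return profile
}

// outOfPlace sums, for every trigram of input, how far its rank is from
// its rank in model. Trigrams missing from model cost missingPenalty.
func outOfPlace(input, model []string) int {
	rank := make(map[string]int, len(model))
	for i, t := range model {
		rank[t] = i
	}
	dist := 0
	for i, t := range input {
		r, ok := rank[t]
		switch {
		case !ok:
			dist += missingPenalty
		case r > i:
			dist += r - i
		default:
			dist += i - r
		}
	}
	return dist
}

// classify returns the name of the model closest to text.
func classify(text string, models map[string][]string) string {
	names := make([]string, 0, len(models))
	for name := range models {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order if two models tie
	input := trigramProfile(text)
	best, bestDist := "", -1
	for _, name := range names {
		if d := outOfPlace(input, models[name]); bestDist < 0 || d < bestDist {
			best, bestDist = name, d
		}
	}
	return best
}

func main() {
	// Toy models built from one sentence each; real models need far more text.
	models := map[string][]string{
		"english":   trigramProfile("the quick brown fox jumps over the lazy dog and the cat"),
		"esperanto": trigramProfile("la rapida bruna vulpo saltas super la mallaborema hundo kaj la kato"),
	}
	fmt.Println("closest model:", classify("la hundo kaj la kato", models))
}
```

A production detector would train each profile on a large corpus and truncate it to the most frequent trigrams; the single-sentence models here only demonstrate the mechanics.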
Whether a detection result is considered reliable depends on two factors, one of which is called rate in the code base. The two factors can therefore be represented as a 2D space, with a threshold function that splits it into "Reliable" and "Not reliable" areas. This threshold function is a hyperbola.
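A hyperbolic threshold of this kind can be sketched as follows. The sketch is hypothetical and is not taken from the whatlanggo source: it assumes that the second axis is the number of unique trigrams in the input, and the constants a and b as well as the names threshold and isReliable are invented for illustration. The point is only that short inputs require a much larger rate than long ones before a result counts as reliable.

```go
package main

import "fmt"

// Illustrative constants (assumptions, not values from the library).
const (
	a = 3.0   // controls how steeply the requirement grows for short inputs
	b = 0.015 // minimum rate required even for very long inputs
)

// threshold returns the minimum rate needed for a result to be treated as
// reliable, given the number of unique trigrams. It is a hyperbola in
// uniqueTrigrams: a/uniqueTrigrams + b.
func threshold(uniqueTrigrams int) float64 {
	return a/float64(uniqueTrigrams) + b
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies in the
// "Reliable" area, that is, above the threshold curve.
func isReliable(uniqueTrigrams int, rate float64) bool {
	if uniqueTrigrams <= 0 {
		return false // nothing to base a decision on
	}
	return rate > threshold(uniqueTrigrams)
}

func main() {
	fmt.Println(isReliable(10, 0.5))   // short text, large gap between candidates
	fmt.Println(isReliable(10, 0.2))   // short text, small gap
	fmt.Println(isReliable(300, 0.03)) // long text, small gap
}
```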
For more details, see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.
Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I took the idea and algorithms.