27 Jun 2017
By Tigran Galstyan and
A few months ago, we showed how effectively an LSTM network can perform text transliteration.
For humans, transliteration is a relatively easy and interpretable task, which makes it a good setting for examining what the network is doing and whether its approach resembles the way humans handle the same task.
In this post we’ll try to understand: What do individual neurons of the network actually learn? How are they used to make decisions?
About half of the billions of internet users speak languages written in non-Latin alphabets, like Russian, Arabic, Chinese, Greek and Armenian. Very often, they haphazardly use the Latin alphabet to write those languages.
Привет: Privet, Privyet, Priwjet, …
كيف حالك: kayf halk, keyf 7alek, …
Բարև Ձեզ: Barev Dzez, Barew Dzez, …
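To make the ambiguity concrete, here is a minimal sketch of how a single word fans out into several Latin spellings. The table `VARIANTS` and the function `romanizations` are hypothetical names invented for this illustration. The table covers only the letters of one Russian word and is not the mapping used by the network or by any real romanization standard.

```python
from itertools import product

# Hypothetical, deliberately tiny table: each Cyrillic letter maps to the
# Latin spellings people might type for it. Assumed for illustration only.
VARIANTS = {
    "п": ["p"],
    "р": ["r"],
    "и": ["i"],
    "в": ["v", "w"],
    "е": ["e", "ye", "je"],
    "т": ["t"],
}


def romanizations(word):
    """Return every Latin spelling that VARIANTS allows for `word`."""
    # One list of candidate spellings per letter of the lowercased word.
    options = [VARIANTS[ch] for ch in word.lower()]
    # The Cartesian product picks one candidate per letter.
    return ["".join(combo).capitalize() for combo in product(*options)]


variants = romanizations("Привет")
print(variants)
```

Even with only two ambiguous letters, the six-letter word already yields six spellings, including the three listed above. Recovering the original text means undoing this one-to-many mapping, which is part of why romanized text is hard to parse or search.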
So a growing share of user-generated text content is in these “Latinized” or “romanized” formats that are difficult to parse, search or even identify. Transliteration is