Skeyer gets a Jenkins server

Since my primary laptop was under repair, here is a quick update for the last couple of weeks:

I wrote a bunch of tests and benchmarks for Skeyer and set up a Jenkins server to run them for me on every git push. Since I couldn’t get the BitBucket’s Jenkins hook to work, I manually added a POST hook like this:

http://<username>:<user authentication key>@<jenkins server address>/job/<project name>/build?token=<build token>

I even picked up a few WebDev skills to write a shiny UI for the Test Reports, generated from QTestlib’s xml files (I tried looking for the existing tools, but none of them seem to be working):

 

Yep. I actually manually created the test cases for the top 1000 words in the english Dictionary.
I probably should clean this code up and release a QTestlib Reports tool. Later on….

 

 

By the end of this week, I hope to stop working on the “Engine” part. So next week I can actually fix the Maliit plugin to make it ready for an alpha build.

 

Cheers! 🙂

Skeyer gets a Jenkins server

Under the hood of Skeyer…

Here comes a little insight (and bragging!) into the internals of Skeyer – at least the way it works as of now…

If i give you a grid of keys – a fairly general assumption about most keyboards, and ask you to swipe the word “hey” on it, what would you do? You start at the H key then swipe till you reach the E key and then swipe till you reach the Y key. If you were to do that on your QWERTY keyboard, the keys you would have most probably pressed are : “HGFTRERTY”. Lets call this swipeHint of a word…

If you were to write a program to compute this, it would probably look like:


function swipeHint( word )
{
    if(word.length < 2)
        return word
    var result = ''
    for( var i = 0; i < word.length - 1; i++ ){
        result += word[i] + keys_between( word[i], word[i+1] )
    }
    result += word[word.length-1]
    return result
}

Pretty straight forward isn’t it? Now, the problem Skeyer was trying to solve boils down to that of a dictionary search for the input pattern. Instead of searching for words, we search for swipeHint(words), and then return the words related to the swipeHints.

I tried using an existing spelling corrector (hunspell) for this task… But it was of no use. Its main problems were:

  1. It was VERY slow… (Notice the VERY in caps!)
  2. It had no clue about what words are more “probable” than others.. Some one told me that I use the word “like” a lot. So if i mistype that as “loke”, it was suggesting me “luke” instead of “like”.
  3. It was very noisy… It had no clue about the layout of the keyboard, so it ended up suggesting many words which are totally unrelated to the word you swiped/intended to type. for eg. if i mistyped “meat” and typed “meay”, it was suggesting me “mean” instead of “meat”.

So I dug around on the internet a little and found Norvig’s Spell Checker, which was basically unusuable if you want to find words which differ by more than 3 characters. So, I wrote a very straightforward implementation of such a matcher myself, something along the lines of:


function findSimilarWords( input, dictionary, count )
{
    var results = []

    for(var word in dictionary){
        // Bayes Theorem
        // Our dictionary can also provide probabilit(word), based on the user's usage...
        var score = similarity( input, swipeHint(word) )*probability(word)
        if( score > threshold )
            results.add(word, score)
      }
    return results
}

Now the question is: How do we compute similarity( string1, string2 )?

Once again, the internet to the rescue and I found: Edit distanceNikita’s blog explains it a lot better.

I implemented an editDistance method whose cost function for substitution of two characters returns the distance between the characters you are trying to substitute on the keyboard. The only problem was that it was too.. slow.

The dynamic programming approach for it was an O(n*m) algorithm, where n, m are the lengths of patterns we are trying to compare.
To put things into perspective, my dictionary has 150,000 words. And an average word length of 30. Even if i were to search 10% of it, (you know… words starting wtih the same character as the first character of the input), that would be: 15000*30*30 = 13,500,000 operations, which is definitely waaay too much for your little phone.

Now a series of neat little tricks to make this work:

  1. Since we are checking for a threshold, as per Nikita’s blog, I stopped computing similarity after finding the threshold number of differences. So the similarity(string1, string2), now became O(k*min(m,n)). If we are looking for words not differing by more than 50%, for an average word length of 30, k would be 15, and the total number of operations would be 6750000. That’s still waaay too much.
  2. We are interested in finding no more than 5 words (Even that’s too much if you ask me, but lets keep that at 5 for now). So If you already found 5 words whose score is 0.8, there is no point computing the similarity of the input with words whose probability of occurance is 50% as score = probability*similarity.So skip computing the edit distance for a word, if it’s probability < minScore(result). For the word “beautiful”, whose swipeHint is “bvgfdrewsasdfghyuyttyuiuhgfgyuiol”, this brings down the cost to around 2700000.
  3. As we already know what the minimum score our word needs to enter the results, why still keep k at 50% of the input length? why not less? we are computing edit distance for not more than k differences anyway

    // we know that
    similarity = 1.0 - editDistance( string1, string2, k )/maxLength(string1,string2)


    // and that
    score = similarity(input, word, k)*probability(word)


    // As we want a score > results' minScore
    k = min( k, ( maxLength(input, word)*( 1.0 - results.minScore()/probability(word) ) ) )

    This brings down the cost of “beautiful” to ~1700000 operations (900msec) …

  4. Okay, so even that’s too expensive. On my 8 year old laptop with Core 2 Duo processor, that gives me results in about 900msec… and I assume it would take the same amount of time on your Jolla/Ubuntu touch devices…You don’t want to swipe 1 word per second do you? If the word lengths differ by more than k, it is obvious their edit distance would be more than k. So skip all the words whose lengths differ by more than k. As it turns out this helps, but not so much.“beautiful” now costs ~1500000 operations (700msec).
  5. I tried looking at other distance metrics for quickly estimating the similarity of two strings. The first nice one was Hamming Distance ( http://en.wikipedia.org/wiki/Hamming_distance ), but sadly it only applies to strings of equal length. So I tried computing Hamming Distance for the “signature” of two strings. i.e, if the a string A has the alphabets “C”, “A”, “T”, it’s
    signature = 1 << index(C) | 1 << index(A) | 1 << index(T)
    As it turns out this absolutely sucks as a measure of similarity for two normal strings. But the insight this gave me was that if the number of characters of each alphabet in the two strings differs by more than k, then their editDistance would obviously be more than k, so don’t bother computing edit distance for them… this can be done in O(n).
    “beautiful” now costs about about 132370 operations or ~101msec, with a lazy implementation of this, which isn’t too bad for a brute force matcher don’t you think? From 5 secs to 101 msec without too much effort 😀

 

Now, I m getting back to some more… less interesting tasks from my Todo list for Skeyer.. Trello

Btw. I am still looking for a job, So if you know anyone who is hiring for a role that you think suits me ( My Resume ), please let me know/let them know 🙂

Under the hood of Skeyer…

Skeyer gets a maliit plugin – Part 1

 

Hopefully, this image shall be a video for the next blogpost
Skeyer gets a maliit plugin – Part 1

Last week I started refactoring Skeyer into libskeyer and the demo.The main reason to do this is to:

1) Easily write automated tests to benchmark the performance and precision of Skeyer’s algorithm(s): I’ve realized I need a more objective way to measure the performance and precision of Skeyer than to manually swipe a few words and look at the suggestions and make a wild guess if it is good or not.

2) I have also started writing a Maliit plugin for Skeyer(based on the examples from Maliit-framework): I’ve realized it is important to build the maliit plugin first. That way, I’ll have better idea of constraints on Skeyer, based on my usage of it on a real device. And of course to easily show off Skeyer to everyone. A real world application seems to be much more…. interesting… than a tech demo.

However, this plugin is still a work in progress and needs a lot more work to be usable. Right now I seem to be facing weird flickering issues with the plugin window and can’t even interact with it. I have no clue where to even look for clues. So my immediate plan of action(for this weekend/next weekend) is to first finish off the Maliit plugin. (I’m looking at the Ubuntu Touch/Open WebOS keyboards for inspiration, help, code. Both of them use Maliit.)

 

I have also started looking at implementing a very interesting new feature for Skeyer, based on the way I’ve been using my Nokia N9 lately (Yup. No other phone seems to beat the N9 experience for me so far). I think this feature will make Skeyer a lot more useful. But I need to really build this feature first to test it’s worth. So I will talk more about this once I have something to show(off? 🙂 ).

To recap, my immediate priorities now are:

1) Finish off the maliit plugin (and dig more into the Ubuntu touch/Open WebOS virtual keyboard).

2) Write the tests/automated benchmarks.

3) Start looking into implementing the interesting feature for Skeyer. I’m thinking of releasing(AND open sourcing) Skeyer, once I have a set of tests and benchmarks. Hmm.. looks like it’s going to be busy weekends for a couple of months 🙂

Cheers!

 

Update: grrrrr… My hard disk crashed and I lost the work I did on the Refactoring and the Maliit plugin I was working on. So I am having to redo it again. So as usual “expect the delays”…

Skeyer gets a maliit plugin – Part 1

Skeyer gets a facelift

Work In Progress

1) Implemented a basic User Dictionary functionality

2) Cleaned up the Keyboard file format. I think I need to do this once more.

3) Experimented with a few more custom dictionaries generated from stardict dictionaries. The prediction performance is now satisfactory.

4) Slight improvements to the word prediction algorithm based on the perplexity. The prediction precision in general has improved quite a lot, except in the case of short words for which you have to swipe a long track (like “tfdsasdfghjijnbhgt” -> “taint”) where it is now abysmal.

5) The facelift for the UI has started. Something tells me this will be a lot trickier to get right though.

Next Tasks: ——————–

1) Implement the long press and extended keypress popup functionality for the keys.

2) Annoy Maliit folks to help me compile the Maliit plugin. (Update: DONE. Thanks to bfederau, I got to compile and run Maliit on my desktop.) – Now time to develop the Maliit plugin.

3) Fix the annoyances with the Prediction functionality.

4) Implement the state machine to take care of actual text input.

5) Finish off the UI facelift

Skeyer gets a facelift

On the way to Skeyer…

Recently I’ve had some free time on my hands , So I’ve started looking at writing a Maliit plugin for a gesture based keyboard.. I spent the last weekend coming up with a basic Demo of such a keyboard in Qt/QML … Apparently it’s really not that difficult.

This was really a quick and dirty project to test the waters and I’d say i’m pleasantly surprised how easy and how performant this was. Even on my 5 year old Samsung Galaxy Captivate( 1Ghz processor, 370MB RAM), there was absolutely no visible lag in predicting ~5 words from a dictionary of around 30000 words.

And here is the demo of what’s working so far:

What works:

1) A basic disambiguation algorithm, which says how likely “hgfdedfghjklo” is to be the word “Hello”.  At it’s core, this is a modified Levenshtein Distance algorithm.

2) A basic prediction function which ranks words like “Hello” “He” “Hell” … in the increasing likelihood that they generally  appear. ( Currently I’m using a basic Unigram generated from http://norvig.com/big.txt – written for his spell checker )

3) All the necessary infrastructure to parse keyboard definition files and create a virtual keyboard. ( I am designing this with multiple languages/keyboard layouts in mind )

TODOs :

1) Improve the word disambiguation algorithm and the feedback system. (quite interesting, probably – priority 1)

2) Improve the prediction precision by analyzing more real world text and analyzing user’s error model – priority 3)

3) Add a gesture to support inputting repeating characters. (priority 2)

4) Implement a proper state machine in engine to support things like backspace, arrow keys, auto capitalization etc.. (priority 1)

5) Make a Maliit plugin out of it. (priority 2 – mainly because it seems like it would be faster to develop and test the demo than the whole Maliit plugin)

6) Look into multi language support. (priority 5 – kind of too much for me)

7) Make it prettier (priority 4)

I’m hoping to make this FOSS… But I haven’t yet decided if I would want to put a price on this => the code stays behind closed doors for now. So if you don’t hear any updates from from me about this in 1-2 months from now (I’m only working on this during weekends, and spare time as of now…), feel free to ping me and I’ll open up the source code immediately.

Thanks to :

1) norvig.com/spell-correct.html
2) codeproject.com/Articles/36869/Fuzzy-Search
3) presage.sourceforge.net/
4) www.iconfinder.com/
5) invokeit.wordpress.com/frequency-word-lists/
6) abloz.com/huzheng/stardict-dic/dict.org/

Cheers

On the way to Skeyer…

Moving on…

Since I m long overdue for this blogpost, here are a few pictures to say it all……

Thanks a ton for the Maemo Community: The Puppy File Server won this gorgeous device in the Maemo Coding Competition 2012!
Got to have a really amazing conversations with really amazing people during the 3 days of interviews for Dream Industries in Moscow! I really wish them a very good luck with dreamindustriesunderattack.tumblr.com
It was really an exciting time for Luna: Apart from numerous improvements, Thanks to Michael Holub, Aditya Bhatt, Smit Shah, The upcoming VSXu 0.4 even got some OS X love!

2012 really was an year full of amazing conversations, really interesting ideas and interestingly interesting events for me….. since i might or might not ever talk about them again……. let me simply jot them down briefly:

a) Why the current state of Mobile Advertising is done horribly wrong… – a conversation that probably became one of the major reasons i quit my previous job:

b) How do we build an ideal music recommendation service: simply(and magically) suggest all the songs a user likes – then how would the user even know if he likes/dislikes a song? Adding a sense of progress/exploration?  A sense of flow? – a conversation that only lead to more questions…

c) Talked about even more interesting things I never thought I would even think about, with even more amazing people – which finally helped me choose the path i currently am on…

But as we all know, all talk, no work makes jack a dull boy. So

I’ve joined e-GITS as a Software Engineer.
Moving on…

4 weeks of unemployement

And its that time for a bit of looking back at things…

So the first week was a bit of cycling for pleasure and meeting up old friends I ve been wanting to meet for over a year…
And the next week I made the Puppy File Server and pushed it to the Nokia Store! 😀 (Visit the project page for more details)
And the Week After that I rewrote the forgotten VSXu-Tomahawk Integration to get it ready for primetime! 😀
And then…. Experimented a little bit with meteorjs, half finished a surprise pet project with VSXu, Resurrected my good (9 year) old machine because my fav. 5 year old laptop decided to die (fried up motherboard because of overheating :/ ) and umm… thats mostly it i guess…

And Then a week of tooooooootal slacking off….. i cant even find how/where the time leaked away ………….  So i guess its time i  start looking for a job … so let me know if you know of any interesting opportunities for me 🙂

4 weeks of unemployement