Tag Archives: Image Processing

OCR – Early results encouraging

Well, after a couple of days of effort, I’ve hit my first major milestone. Using screen grabs of Words With Friends boards on the iPhone, I’m now able to parse out the board. Given the input:

Image

My board parser produces the following output.

            V Q
           YE A
           SR R
            BAM
             D
       CLEFT AT
J     SO     GI
IS   ZOO KITTEN
BE   AXLE  RI N
?POOFS I  HAND?
 T  O  E CINE
    ROUSTER
   DARN  LEG

There are a couple of mistakes here where letters have been mis-recognised and, more worryingly, a couple of spots where the code hasn’t even worked out that a tile is present. Nevertheless, for a simple algorithm I’m pretty happy with this as a first pass!

The broad approach is to first locate tiles on the board using colour as a guide. For each tile found, we then try to recognise the character printed on it. Since there are only 26 distinct letter tiles, this is a reasonably straightforward task (compared, for example, with recognising handwriting!).
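To make the tile-finding step a bit more concrete, here is a minimal sketch of colour-based tile detection. It is not the code that produced the output above. It assumes the screen grab has already been cropped to the board, that the board is a 15x15 grid, and that tile faces are yellowish. The colour thresholds and the names `looks_like_tile` and `find_tiles` are made up for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels, already cropped to the board

BOARD_SIZE = 15                # assumed grid size


def looks_like_tile(pixel: Pixel) -> bool:
    """Hypothetical colour test: yellowish means high red and green, lower blue."""
    r, g, b = pixel
    return r > 180 and g > 140 and b < 140 and r - b > 60


def find_tiles(image: Image, min_fraction: float = 0.5) -> List[List[bool]]:
    """Return a BOARD_SIZE x BOARD_SIZE grid, True where a cell looks like a tile."""
    height, width = len(image), len(image[0])
    cell_h, cell_w = height / BOARD_SIZE, width / BOARD_SIZE
    occupied = []
    for row in range(BOARD_SIZE):
        y0, y1 = int(row * cell_h), int((row + 1) * cell_h)
        occupied_row = []
        for col in range(BOARD_SIZE):
            x0, x1 = int(col * cell_w), int((col + 1) * cell_w)
            total = (y1 - y0) * (x1 - x0)
            # Count the pixels in this cell that pass the colour test.
            hits = sum(
                looks_like_tile(image[y][x])
                for y in range(y0, y1)
                for x in range(x0, x1)
            )
            occupied_row.append(total > 0 and hits / total >= min_fraction)
        occupied.append(occupied_row)
    return occupied
```

Each cell flagged `True` would then be handed on to the character recogniser. A fixed RGB threshold like this is the simplest thing that could work, which is also why it is sensitive to any variation in tile colour across the board.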

Obviously I’m going to revisit the OCR code and train on more board positions (I only trained on 8 boards to get this level of recognition), but I think this validates the basic approach.

Beyond training, other things I might need to take a look at are:

1. Getting rid of the red circle with the score in it. This is definitely corrupting my character recognition. You can see the mis-recognised A was marred by the score.

2. If you kind of squint at the screen, you’ll see that there’s a colour gradient across the tiles. Tiles at the very top and bottom of the screen are darker yellow while tiles in the central band of the board are much lighter. Since my tile recogniser relies on detecting colours, it may be the paleness of these central tiles that’s causing them not to be detected.
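One possible way to cope with the gradient, sketched here as an untested idea rather than a fix I have verified, is to test hue instead of raw RGB values, since dark and pale yellows differ mostly in saturation and brightness. The hue window, the saturation floor and the function name `is_tile_hue` are all assumptions for illustration.

```python
import colorsys
from typing import Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255


def is_tile_hue(
    pixel: Pixel,
    hue_min: float = 0.08,         # assumed lower edge of the "yellow" hue band
    hue_max: float = 0.20,         # assumed upper edge of the "yellow" hue band
    min_saturation: float = 0.20,  # rejects near-white and near-grey pixels
) -> bool:
    """Return True if the pixel's hue falls in a yellowish band, pale or dark."""
    r, g, b = (channel / 255.0 for channel in pixel)
    hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue_min <= hue <= hue_max and saturation >= min_saturation
```

With these made-up thresholds, a dark yellow such as `(200, 160, 60)` and a pale yellow such as `(250, 225, 160)` both pass, while pure white does not. Whether the real tile colours behave this way would need checking against actual screen grabs.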

Anyway, it’s late now so I’m off to bed but will post a much more detailed description of the algorithms used to find tiles and recognise letters tomorrow.

For those of you who can’t wait, you might like to sneak a peek at Peter Frey and David Slate’s paper, Letter Recognition Using Holland-Style Adaptive Classifiers.


So today I had the idea that an interesting project would be to build an app which can provide ‘assistance’ with Scrabble-like games (or cheating, if you like). In particular, I’d like to set it loose on Words With Friends in the first instance. Words With Friends is a Zynga game which is very similar to Scrabble but has a different board layout and number of tiles (presumably for legal reasons). It’s available on mobile devices as well as via Facebook and is very widely played.

What I want to be able to do is take an image of the board position along with my tiles and recommend where to play for the maximum score. The output should be an image of the board as captured, with the tiles to be played and their locations indicated.

The basic pipeline is:

  1. Capture the image
  2. Parse the image into a data structure amenable to search and scoring
  3. Using the player’s tiles, and an appropriate dictionary, work out the highest scoring move possible
  4. Render an output image with the best move played.
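As a rough idea of what step 2 might produce, here is a small sketch that turns rows of recognised characters into a fixed-size grid and pulls out the words already on the board. The 15x15 size, the use of a space for an empty square, and the function names are assumptions for illustration, not a settled design.

```python
from typing import List

BOARD_SIZE = 15   # assumed board width and height
EMPTY = " "       # assumed marker for an empty square


def parse_board(text_rows: List[str]) -> List[List[str]]:
    """Pad or trim rows of recognised characters to a BOARD_SIZE square grid."""
    grid = []
    for row in text_rows[:BOARD_SIZE]:
        grid.append(list(row.ljust(BOARD_SIZE, EMPTY)[:BOARD_SIZE]))
    while len(grid) < BOARD_SIZE:
        grid.append([EMPTY] * BOARD_SIZE)
    return grid


def horizontal_words(grid: List[List[str]]) -> List[str]:
    """Collect every left-to-right run of two or more tiles."""
    words = []
    for row in grid:
        for run in "".join(row).split():
            if len(run) >= 2:
                words.append(run)
    return words


def vertical_words(grid: List[List[str]]) -> List[str]:
    """Collect top-to-bottom runs by transposing the grid first."""
    transposed = [list(column) for column in zip(*grid)]
    return horizontal_words(transposed)
```

A grid of single characters keeps both the search (step 3) and the rendering (step 4) simple, since each square maps directly to a row and column index.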

Moving beyond Words With Friends, once the initial framework is in place, it should be possible to snapshot a real-life Scrabble board and do the same kind of thing. If I can get the code fast enough, I’d ideally like to make this a real-time AR application, processing live video from a mobile phone camera and showing the move overlaid on video of the board.

But one step at a time… I’ve decided to start with Words With Friends because it’s very easy to capture an image directly from the iPhone, so I can focus on the harder parts of the code.

After a bit of Googling I’ve decided to build an implementation of Andrew Appel and Guy Jacobson’s paper The World’s Fastest Scrabble Program as my WWF solver engine. My target platform is going to be iPhone initially and the aim is to have the whole app onboard. Depending on dictionary size, this may be a little enthusiastic so I may end up requiring a server-side component, but we will see.
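The Appel and Jacobson paper stores its lexicon as a directed acyclic word graph (DAWG) and walks it letter by letter while consuming rack tiles. As a simplified illustration of that idea, the sketch below uses a plain trie (a DAWG without the suffix sharing) and only finds words that can be made from a rack. It ignores the board, blank tiles and scoring, and the class and function names are my own placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List


class TrieNode:
    """One node of a plain trie; a stand-in for a DAWG node."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> TrieNode:
    """Insert each word letter by letter, marking the node where it ends."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word.upper():
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def words_from_rack(root: TrieNode, rack: str) -> List[str]:
    """Return all dictionary words that can be spelled using the rack tiles."""
    available = Counter(rack.upper())
    found: List[str] = []

    def extend(node: TrieNode, prefix: str) -> None:
        if node.is_word:
            found.append(prefix)
        for ch, child in node.children.items():
            if available[ch] > 0:
                available[ch] -= 1      # use one tile
                extend(child, prefix + ch)
                available[ch] += 1      # put it back (backtrack)

    extend(root, "")
    return sorted(found)


# Toy word list, purely for illustration.
trie = build_trie(["cat", "act", "at", "tack", "cab"])
print(words_from_rack(trie, "TCA"))
```

Because the search follows only branches that exist in the dictionary, it never explores letter sequences that cannot lead to a word. The full algorithm adds board constraints on top of this, which I will need to work through when I implement the paper.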

Here’s my vision of how the app will work:

Picture of words with friends board and tile rack

Input Image including tiles

Picture of words with friends board showing best move

Output image showing best move