OCR | Ministry Of Touch

As promised yesterday, today I’m going to spend a little time describing how we parse out the Words with Friends board to work out what tiles have already been played.

The screen shot contains a lot of detail but for the purposes of this post, we’re only interested in the playing area itself. While we could spend a lot of time using computer vision techniques to locate the main playing area, for example locating horizontal and vertical lines, looking for the corners of the playing area etc., it’s really not worth it because for a given device, the board is always in the same part of the image. If we can snapshot the image on iPad, iPhone 3, 4 and 5 then we can just count pixels to work out where the board is.

Cells on the board

We think of the board as being made up of a number of cells – 15 x 15 to be exact. A cell is just a spot where we can play a tile. The main things to note about cells is that they are square and that there are exactly 15 of them filling the width of the screen. This means we can work out how big they are by dividing the width of the snapshot image by 15.

Once we’ve worked out these measurements, we can find the right part of the image for any given cell by simply using the offset from the top of the board area and adding in an appropriate X and Y offset based on the cell width.

The next task is working out whether there is a tile played at a given cell or not.

My first thought was that I could do this by looking for a mostly yellow cell at each location but there’s a problem with that. As mentioned yesterday there is a slight colour gradient across the board which means that tiles at the top of the board are a darker yellow than tiles in the middle of the board.

Also, where a tile is played on a double or triple word or letter score, the colour of the underlying cell bleeds through. This means that a tile in the upper half of the board on a double word score (red) has pretty much exactly the same colour as an empty triple word score tile (orange).

This doesn’t mean we can’t use colour at all, it just means that we need to be aware of where a tile is on the board and what’s underneath it. There are six types of cell, each of which has it’s own colour characteristics. These are:

Empty – mostly grey
Double letter – mostly blue
Triple letter – mostly green
Double word – mostly red
Triple word – mostly orange
Middle tile – a white cross on a darkish red background

There are two important facts here. Firstly, we know exactly where we expect to find these cells on the board and secondly, the colour gradient used on the tiles is not used on the board cells. This means every TW cell is the same colour as every other TW cell.

Tile average colours

We first calculate the mean colour across each type of cell. You can see these colours here. Note that the colours are expressed in L*a*b* rather than the normal RGB triplets which is why there are negative values.

Now when we look at a cell on the board, we can calculate the average colour across that cell and if it differs from the expected values by more than some threshold (I use 10%) then we assume that it has a tile on it.

Processing of tile E

Once we have the locations of each tile, we can extract that cell and do some further image processing to work out what the letter on that tile is. First I take the grey scale version of the tile and then threshold using Otsu’s method. This gives me a black and white version of the tile. From here, I calculate the bounding box of the letter itself (i.e. tripping out the score in the corner of the tile) before trying to recognise the letter.

OCR in general can be difficult and expensive however in this case, we have a very limited domain. There are only 26 letters to recognise and they are all positioned in the same way within a cell give or take a pixel or two.

This paper describes a set of 16 different measurements that can be made across a set of letters from a font to distinguish between them. I use these measurements, normalised to percentages, to build a descriptor of each possible letter A-Z and then or each tile in the space, I compare to the stored descriptor, calculating the Euclidean distance from each to select the nearest neighbour.

Problem Tiles

This gives pretty fair, though not perfect results as seen yesterday. One of the key problems is with the letter W. As you can see from the processing of that tile, the score in the corner of the tile actually merges with the letter itself in the processed tile. This may be making it more difficult to recognise. Similarly, the red circle containing the score of the last turn played can also corrupt the letter on a tile making it hard to recognise. The next step is to improve the pre-processing of tile images to hopefully get better recognition.

I’m going to attempt to improve these results by trying to tidy up the source images a little.

Well after a couple of days of effort, I’ve hit my first major milestone. Using screen grabs from Words with Friends boards on the iPhone, I’m now able to parse out the board. Given the input:

My board parser produces the following output.

            V Q
           YE A
           SR R
            BAM
             D
       CLEFT AT
J     SO     GI
IS   ZOO KITTEN
BE   AXLE  RI N
?POOFS I  HAND?
 T  O  E CINE
    ROUSTER
   DARN  LEG

Now there are a couple of mistakes here where letters have been mis-recognised and more worryingly, there are a couple of spots where the code hasn’t even worked out that there is a tile present. Nevertheless, for a simple algorithm I’m pretty happy with this as a first pass!

The broad approach here is to first locate tiles on the board using colour as a guideline. For each tile we then try to recognise the character represented on the tile. Since there are only 26 distinct tiles, this is a reasonably straightforward task (compared for example with recognising handwriting!)

Obviously I’m going to revisit the OCR code and train on more board positions (I only trained on 8 boards to get this level of recognition) but I think this validates the basic approach.

Beyond training, other things I might need to take a look at are:

1. Getting rid of the red circle with the score in it. This is definitely corrupting my character recognition. You can see the mis-recognised A was marred by the score.

2. If you kind of squint at the screen, you’ll see that there’s a colour gradient across the tiles. Tiles at the very top and bottom of the screen are darker yellow while tiles in the central band of the board are much lighter. Since my tile recogniser relies on detecting colours, it may be the paleness of these central tiles that’s causing them not to be detected.

Anyway, it’s late now so I’m off to bed but will post a much more detailed description of the algorithms used to find tiles and recognise letters tomorrow.

For those of you who can’t wait, you might like to sneak a peek at Peter Frey and David Slate’s paper, Letter Recognition Using Holland-Style Adaptive Classifiers.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Ministry Of Touch

Musings of a mobile app developer

Tag Archives: OCR

Parsing the WWF board

OCR – Early results encouraging