Tag Archives: iPhone Development

Parsing the WWF board

As promised yesterday, today I’m going to spend a little time describing how we parse out the Words with Friends board to work out what tiles have already been played.

The screen shot contains a lot of detail but for the purposes of this post, we’re only interested in the playing area itself. While we could spend a lot of time using computer vision techniques to locate the main playing area, for example locating horizontal and vertical lines, looking for the corners of the playing area etc., it’s really not worth it because for a given device, the board is always in the same part of the image. If we can snapshot the image on iPad, iPhone 3, 4 and 5 then we can just count pixels to work out where the board is.


Cells on the board

We think of the board as being made up of a number of cells – 15 x 15 to be exact. A cell is just a spot where we can play a tile. The main things to note about cells is that they are square and that there are exactly 15 of them filling the width of the screen. This means we can work out how big they are by dividing the width of the snapshot image by 15.

Once we’ve worked out these measurements, we can find the right part of the image for any given cell by simply using the offset from the top of the board area and adding in an appropriate X and Y offset based on the cell width.

The next task is working out whether there is a tile played at a given cell or not.

My first thought was that I could do this by looking for a mostly yellow cell at each location but there’s a problem with that. As mentioned yesterday there is a slight colour gradient across the board which means that tiles at the top of the board are a darker yellow than tiles in the middle of the board.

Also, where a tile is played on a double or triple word or letter score, the colour of the underlying cell bleeds through. This means that a tile in the upper half of the board on a double word score (red) has pretty much exactly the same colour as an empty triple word score tile (orange).

This doesn’t mean we can’t use colour at all, it just means that we need to be aware of where a tile is on the board and what’s underneath it. There are six types of cell, each of which has it’s own colour characteristics. These are:

  • Empty – mostly grey
  • Double letter – mostly blue
  • Triple letter – mostly green
  • Double word – mostly red
  • Triple word – mostly orange
  • Middle tile – a white cross on a darkish red background

There are two important facts here. Firstly, we know exactly where we expect to find these cells on the board and secondly, the colour gradient used on the tiles is not used on the board cells. This means every TW cell is the same colour as every other TW cell.

Tile average colours

Tile average colours

We first calculate the mean colour across each type of cell. You can see these colours here. Note that the colours are expressed in L*a*b* rather than the normal RGB triplets which is why there are negative values.

Now when we look at a cell on the board, we can calculate the average colour across that cell and if it differs from the expected values by more than some threshold (I use 10%) then we assume that it has a tile on it.


Processing of tile E

Once we have the locations of each tile, we can extract that cell and do some further image processing to work out what the letter on that tile is. First I take the grey scale version of the tile and then threshold using Otsu’s method. This gives me a black and white version of the tile. From here, I calculate the bounding box of the letter itself (i.e. tripping out the score in the corner of the tile) before trying to recognise the letter.

OCR in general can be difficult and expensive however in this case, we have a very limited domain. There are only 26 letters to recognise and they are all positioned in the same way within a cell give or take a pixel or two.

This paper describes a set of 16 different measurements that can be made across a set of letters from a font to distinguish between them. I use these measurements, normalised to percentages,  to build a descriptor of each possible letter A-Z and then or each tile in the space, I compare to the stored descriptor, calculating the Euclidean distance from each to select the nearest neighbour.


Problem Tiles

This gives pretty fair, though not perfect results as seen yesterday. One of the key problems is with the letter W. As you can see from the processing of that tile, the score in the corner of the tile actually merges with the letter itself in the processed tile. This may be making it more difficult to recognise. Similarly, the red circle containing the score of the last turn played can also corrupt the letter on a tile making it hard to recognise. The next step is to improve the pre-processing of tile images to hopefully get better recognition.

I’m going to attempt to improve these results by trying to tidy up the source images a little.


OCR – Early results encouraging

Well after a couple of days of effort, I’ve hit my first major milestone. Using screen grabs from Words with Friends boards on the iPhone, I’m now able to parse out the board. Given the input:


My board parser produces the following output.

            V Q
           YE A
           SR R
       CLEFT AT
J     SO     GI

Now there are a couple of mistakes here where letters have been mis-recognised and more worryingly, there are a couple of spots where the code hasn’t even worked out that there is a tile present. Nevertheless, for a simple algorithm I’m pretty happy with this as a first pass!

The broad approach here is to first locate tiles on the board using colour as a guideline. For each tile we then try to recognise the character represented on the tile. Since there are only 26 distinct tiles, this is a reasonably straightforward task (compared for example with recognising handwriting!)

Obviously I’m going to revisit the OCR code and train on more board positions (I only trained on 8 boards to get this level of recognition) but I think this validates the basic approach.

Beyond training, other things I might need to take a look at are:

1. Getting rid of the red circle with the score in it. This is definitely corrupting my character recognition. You can see the mis-recognised A was marred by the score.

2. If you kind of squint at the screen, you’ll see that there’s a colour gradient across the tiles. Tiles at the very top and bottom of the screen are darker yellow while tiles in the central band of the board are much lighter. Since my tile recogniser relies on detecting colours, it may be the paleness of these central tiles that’s causing them not to be detected.

Anyway, it’s late now so I’m off to bed but will post a much more detailed description of the algorithms used to find tiles and recognise letters tomorrow.

For those of you who can’t wait, you might like to sneak a peek at Peter Frey and David Slate’s paper, Letter Recognition Using Holland-Style Adaptive Classifiers.

So today I had the idea that an interesting project would be to build an app which can provide ‘assistance’ with scrabble like games — or cheating if you like. In particular I’d like to set it loose on Words With Friends in the first instance. Words With Friends is a Zynga game which is very similar to Scrabble but has a different board layout and number of tiles (presumably for legal reasons). It’s available on mobile devices as well as via Facebook and is very widely played.

What I want to be able to do is to take an image of the board position along with my tiles and make a recommendation as to where to play for a maximum score.  The output should be an image of the board as captured with the tiles to be played and their locations indicated.

The basic pipeline is:

  1. Capture the image
  2. Parse the image into a data structure amenable to search and scoring
  3. Using the players tiles, and an appropriate dictionary, work out the highest scoring move possible
  4. Render an output image with the best move played.

Moving beyond Words With Friends, once the initial framework is in place, it should be possible to snapshot a real life Scrabble board and do the same kind of thing. If I can get the code fast enough, ideally I’d like to be able to make this real time as an AR application processing realtime video from a mobile phone camera and showing the move overlaid on video of the board.

But one step at a time…I’ve decided to start with Words With Friends because it’s very easy to capture an image directly from the iPhone so I can focus on the harder parts of the code.

After a bit of Googling I’ve decided to build an implementation of Andrew Appel and Guy Jacobsen’s paper The World’s Fastest Scrabble Program as my WWF solver engine. My target platform is going to be iPhone initially and the aim is to have the whole app onboard. Depending on dictionary size, this may be a little enthusiastic so I may end up requiring a server side component but we will see.

Here’s my vision of how the app will work:

Picture of words with friends board and tile rack

Input Image including tiles

Picture of words with friends board showing best move

Output image showing best move