As promised yesterday, today I’m going to spend a little time describing how we parse out the Words with Friends board to work out what tiles have already been played.
The screen shot contains a lot of detail but for the purposes of this post, we’re only interested in the playing area itself. While we could spend a lot of time using computer vision techniques to locate the main playing area, for example locating horizontal and vertical lines, looking for the corners of the playing area etc., it’s really not worth it because for a given device, the board is always in the same part of the image. If we can snapshot the image on iPad, iPhone 3, 4 and 5 then we can just count pixels to work out where the board is.
We think of the board as being made up of a number of cells – 15 x 15 to be exact. A cell is just a spot where we can play a tile. The main things to note about cells is that they are square and that there are exactly 15 of them filling the width of the screen. This means we can work out how big they are by dividing the width of the snapshot image by 15.
Once we’ve worked out these measurements, we can find the right part of the image for any given cell by simply using the offset from the top of the board area and adding in an appropriate X and Y offset based on the cell width.
The next task is working out whether there is a tile played at a given cell or not.
My first thought was that I could do this by looking for a mostly yellow cell at each location but there’s a problem with that. As mentioned yesterday there is a slight colour gradient across the board which means that tiles at the top of the board are a darker yellow than tiles in the middle of the board.
Also, where a tile is played on a double or triple word or letter score, the colour of the underlying cell bleeds through. This means that a tile in the upper half of the board on a double word score (red) has pretty much exactly the same colour as an empty triple word score tile (orange).
This doesn’t mean we can’t use colour at all, it just means that we need to be aware of where a tile is on the board and what’s underneath it. There are six types of cell, each of which has it’s own colour characteristics. These are:
- Empty – mostly grey
- Double letter – mostly blue
- Triple letter – mostly green
- Double word – mostly red
- Triple word – mostly orange
- Middle tile – a white cross on a darkish red background
There are two important facts here. Firstly, we know exactly where we expect to find these cells on the board and secondly, the colour gradient used on the tiles is not used on the board cells. This means every TW cell is the same colour as every other TW cell.
We first calculate the mean colour across each type of cell. You can see these colours here. Note that the colours are expressed in L*a*b* rather than the normal RGB triplets which is why there are negative values.
Now when we look at a cell on the board, we can calculate the average colour across that cell and if it differs from the expected values by more than some threshold (I use 10%) then we assume that it has a tile on it.
Once we have the locations of each tile, we can extract that cell and do some further image processing to work out what the letter on that tile is. First I take the grey scale version of the tile and then threshold using Otsu’s method. This gives me a black and white version of the tile. From here, I calculate the bounding box of the letter itself (i.e. tripping out the score in the corner of the tile) before trying to recognise the letter.
OCR in general can be difficult and expensive however in this case, we have a very limited domain. There are only 26 letters to recognise and they are all positioned in the same way within a cell give or take a pixel or two.
This paper describes a set of 16 different measurements that can be made across a set of letters from a font to distinguish between them. I use these measurements, normalised to percentages, to build a descriptor of each possible letter A-Z and then or each tile in the space, I compare to the stored descriptor, calculating the Euclidean distance from each to select the nearest neighbour.
This gives pretty fair, though not perfect results as seen yesterday. One of the key problems is with the letter W. As you can see from the processing of that tile, the score in the corner of the tile actually merges with the letter itself in the processed tile. This may be making it more difficult to recognise. Similarly, the red circle containing the score of the last turn played can also corrupt the letter on a tile making it hard to recognise. The next step is to improve the pre-processing of tile images to hopefully get better recognition.
I’m going to attempt to improve these results by trying to tidy up the source images a little.