The photobomb
As my first semester of grad school comes to an end(!), I wanted to share a final post about something I think is super cool, and happens to be related to my research on Adaptive Randomized SVD. But before that, let's talk about photobombs.
One of the key ingredients of any good photobomb, besides the occasion you're bombing, is the photo, preferably a digital one so that you can share it on social media. To a computer, a digital photo is just a matrix of pixels. A black and white image is represented by a single matrix with elements ranging from 0 (black) to 255 (white), giving 256 different levels of gray. Color images use three such matrices: Red, Green and Blue, or RGB.
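You can check this for yourself in a couple of lines of Python with NumPy and Pillow (the filename here is just a placeholder for any picture you have lying around):

```python
import numpy as np
from PIL import Image

# "photo.png" is a placeholder -- point it at any image you have.
# Mode "L" converts the picture to black and white (grayscale).
img = Image.open("photo.png").convert("L")

# And there it is: the photo is just a matrix of integers 0-255.
A = np.asarray(img)
print(A.shape)           # (height, width)
print(A.min(), A.max())  # values live in [0, 255]
```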
The more focused and crisp an image, the more pixels it requires. More pixels means bigger matrices. And while we all like our photos to be their sharpest, this comes at a cost: bigger matrices take up more storage space on your computer or phone, and take your CPU longer to process. You can also imagine that companies like Google and Facebook, which have to store billions of pictures, need to deal with all these large matrices somehow.
There are many ways to handle this data problem. One is to compress the images. Now, when you compress, you lose some accuracy in the data, some clarity in the image. However, for many purposes you don't need the high resolution image (faux-artistic Insta aside). In fact, your iPhone gives you the option:
Since we know that images are simply matrices of numbers, one way of compressing images could be to remove some of these numbers and hope that the image looks the same. I mean, does every number serve a purpose? If not, then how many can we remove such that our image still looks decent? And is there a systematic way to do this?
This is one application of something called low-rank approximation. Low-rank approximation is used extensively in manipulating large datasets and reducing dimensionality. I've talked in the past about PCA, which is a common dimension reduction technique. Low-rank approximation is the more general idea behind it.
So let's get to it. Here is a photobomb, in black and white, that we want to compress:
It's about 1.6 MB in size, so pretty small but big enough that it could use some compression. But how? Using low-rank approximation, aka the maths.
First we need to factor our big 1.6 MB matrix into smaller matrices. There are many ways to do this, but the one I'll discuss today is called the Singular Value Decomposition. The SVD is one of those top-10 techniques that anyone who is serious about data needs to know. It involves factoring a matrix, call it A, into three others, which we call U, Σ and V*. Each of these matrices serves a specific and important role, the most important of which is Σ. This is a diagonal matrix that holds our "useful" information - this is what gives the original matrix A its structure. Mathy people call the useful information singular values, denote them by 𝜎, and order them from most useful to least: 𝜎1 ≥ 𝜎2 ≥ … ≥ 𝜎n ≥ 0.
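If you want to try this yourself, NumPy will happily compute the SVD for you. A minimal sketch, assuming A is the pixel matrix from the earlier snippet:

```python
import numpy as np

# Thin ("economy") SVD of the pixel matrix A: for an m x n
# matrix with m >= n, U is m x n, s holds the n singular
# values, and Vt is the n x n matrix V*.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# NumPy returns the singular values already sorted from
# most useful (largest) to least useful (smallest).
print(s[:5])
```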
The game is then to remove the least useful singular values. We can do this by setting all the singular values from k+1 to n to 0. We can call this new matrix Σk (the subscript k just denotes that our Σ is slightly different than before). Every singular value we set to 0 is like an extra dimension we "save". Then we just multiply back our matrices UΣkV* and form a new matrix, Ak, which has rank k, smaller than our original A.
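Continuing the sketch above: zeroing out the singular values from k+1 onward is the same as just slicing off the corresponding pieces of U, Σ and V*, so forming Ak is one line (the choice k = 50 is arbitrary, purely for illustration):

```python
k = 50  # how many singular values to keep (arbitrary, for illustration)

# Keep the first k columns of U, the first k singular values,
# and the first k rows of V*, then multiply back together.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A_k is our rank-k approximation of the original image.
```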
The idea is that removing a few singular values, and hence dimensions, probably won't change the clarity of the picture enough for the human eye to detect a difference. Though removing only a few dimensions probably doesn't save us much space either. Formally, we're solving something that looks like ||A - Ak||F < 𝛆, where the right hand side, 𝛆, is math-speak for "some small error we're willing to tolerate". Solving this explicitly could get hairy, so we omit the details. Instead, we opt for a more elegant approach - eyeball the image and pick a dimension that looks good enough.
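That said, there's a classical result (the Eckart–Young theorem) which says the error above is exactly the combined size of the singular values we threw away, so if you don't trust your eyeballs you can check the damage directly. Continuing the sketch, with s and k from before:

```python
# By the Eckart-Young theorem, the Frobenius-norm error of the
# rank-k approximation is exactly the norm of the discarded tail:
error = np.sqrt(np.sum(s[k:] ** 2))

# Dividing by ||A||_F gives a relative error, handy for picking k.
rel_error = error / np.sqrt(np.sum(s ** 2))
print(f"relative error at k={k}: {rel_error:.3f}")
```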
Our original, 1.6 MB (or 1600 KB) image is represented by a 2400 × 3300 matrix (our A matrix). Can we use the SVD to remove the less important singular values (from Σ), reconstruct the image (multiply back the matrices to form Ak), and still have our photobomb be recognizable? You be the judge:
As you can see, decreasing the dimensionality decreases the clarity of the image. But we certainly don't need all 1.6 MB to see that George totally owned Kruger. You may also wonder why the clarity gets so bad even though the size is still relatively large. This is because I'm keeping the dimensions of the image the same. If I shrank it down, it would look clearer.
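To see where the savings come from: instead of storing all 2400 × 3300 pixels, we only need the first k columns of U, the k singular values, and the first k rows of V*. A quick back-of-the-envelope check (the specific values of k here are just examples):

```python
m, n = 2400, 3300  # dimensions of our A matrix

def storage_ratio(k):
    # Rank-k factors cost: U is m x k, Sigma_k holds k values,
    # V* is k x n -- versus m * n numbers for the full image.
    return k * (m + n + 1) / (m * n)

for k in (10, 50, 100, 500):
    print(f"k = {k:>3}: {storage_ratio(k):.1%} of the original storage")
# At k = 100, for instance, we keep only about 7% of the numbers.
```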
So what did we learn:
- Images are stored on your computer in the form of matrices.
- We can manipulate matrices using math, and this in turn manipulates the image.
- One of the ways we can do this is called low-rank approximation. We can use Singular Value Decomposition to find a low-rank approximation to a big matrix.
- We use the SVD to remove the least useful data from our big matrix, and reconstruct it using less (but more useful) data. Now our image is compressed.
- Photobombs are not a new invention.
One final thing to note: decomposing the original matrix into UΣV* took about 2 minutes on my laptop. That's extremely slow. Imagine doing that operation on thousands of pictures. Or even a movie. No way would it work (not only would it take a long time, but it probably wouldn't fit into RAM). So are we S.O.L.? Of course not! We just have to find more clever ways of doing this stuff. This topic forms the motivation for my research. More to come on that, but I hope you've enjoyed learning about some cool applications of Machine Learning and Statistics. Looking forward to seeing what 2016 holds!
Until then!