Eigenstyle

[Cross-posted at The Hackerati.]

Principal Component Analysis and Fashion

Any set of images can be broken down with Principal Component Analysis. This has been done pretty successfully with faces. Here we’ll take a look at style.

Our dataset is 807 pictures of dresses from Amazon. The images share a standard size, but unfortunately the models do not share a standard pose (though they tend to be centered in the image similarly). Ideally, our principal components would only be about actual dress style, but here many of them will be concerned with model pose. Despite this, we can still do a lot with this dataset.

imageimage

This eigendress is the first principal component, which accounts for the most variation among all the dresses. Broadly, it’s looking at light-colored dresses vs dark-colored dresses.

imageimage

The second component seems to look at short dresses vs long dresses.

imageimage

Reddish colors vs blueish colors.

imageimage

Short hair and sleeveless vs long hair and long sleeves.

imageimage

Posing with legs close together vs posing with legs farther apart.

And so on.
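
For anyone who wants to see the mechanics, here is a minimal sketch of how components like these can be computed. It assumes each picture has been flattened into a row of pixel values, and it uses random numbers as a stand-in for the real photos; the array sizes and variable names are made up for illustration and are not taken from the project's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 50 "images" of 20x15 grayscale pixels, flattened to rows.
# (The real set would be 807 rows of equally sized dress photos.)
images = rng.random((50, 20 * 15))

# Center each pixel on its mean across all images.
mean_image = images.mean(axis=0)
centered = images - mean_image

# Rows of vt are the principal components ("eigendresses"), ordered by
# how much of the variance each one explains.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
components = vt
explained = singular_values**2 / np.sum(singular_values**2)

# Each component has one value per pixel, so it can be viewed as an image.
first_eigendress = components[0].reshape(20, 15)
```

Because every component has the same length as a flattened picture, reshaping it back to the image dimensions is what lets us look at an "eigendress" at all.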

Using components to recreate images

With a bunch of components like these, we can reduce an image from, e.g., 60,000 points of data (pixel values) to just a handful of numbers.

Let’s recreate this dress from its components.

image

The following pictures are created from one, two, four, nine, ten, fifteen, thirty, forty, and seventy components (respectively).

[Images: the dress recreated from one, two, four, nine, ten, fifteen, thirty, forty, and seventy components]

As you can see, the more components we have, the more accurate and detailed the dress recreation will be.
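
A rough sketch of that recreation step, again on synthetic stand-in data rather than the real dresses: project the picture onto the first k components to get k numbers, then rebuild it from those numbers. The helper name `reconstruct` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
images = rng.random((40, 100))  # synthetic stand-in for flattened photos
mean_image = images.mean(axis=0)
_, _, components = np.linalg.svd(images - mean_image, full_matrices=False)

def reconstruct(image, k):
    """Approximate `image` using only the first k principal components."""
    scores = components[:k] @ (image - mean_image)  # k numbers
    return mean_image + scores @ components[:k]

target = images[0]
component_counts = (1, 2, 4, 9, 10, 15, 30)
errors = [np.linalg.norm(target - reconstruct(target, k)) for k in component_counts]
```

The entries of `errors` never increase as k grows, since each extra component can only add detail to the approximation.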

The data for the middle dress above (the one built from ten components) now looks like this: [-17541.81, -12749.33, -3766.29, 2005.28, 4193.08, 6832.55, -6704.90, -2135.51, 1112.27, 7627.80].

So, if you have a million pictures you need to store, you can save a whole lot of space by saving just the component values instead of the values of every pixel of every dress.
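
To put a rough number on that, here is some back-of-the-envelope arithmetic. It assumes ten components per picture (as in the example values above) and counts stored values rather than bytes; it also counts the component images and the mean image, which have to be kept once in order to turn the numbers back into pictures.

```python
pixels_per_image = 60_000  # the "60,000 points of data" figure from earlier
n_images = 1_000_000
n_components = 10          # assumption: ten components, as in the example vector

raw_values = n_images * pixels_per_image

# Compressed form: the scores for every image, plus the shared pieces needed
# to decode them (the component images and the mean image).
compressed_values = n_images * n_components + (n_components + 1) * pixels_per_image

ratio = raw_values / compressed_values
print(f"{raw_values:,} values vs {compressed_values:,} values")
```

The saving is not free: the compression is lossy, and the fewer components we keep, the less detail survives.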

It even works for dresses that were not in the training set:

imageimage

imageimage

Though it works less well for patterns we haven’t seen before:

imageimage

imageimage

And it can’t recreate accessories that were not present in the training set (notice the sunglasses and handbag disappear):

imageimage

And even though the training set only contained dresses, the components do a decent job of recreating different types of clothing, such as suits and overalls:

imageimage

imageimage

Using components for prediction

I’ve also manually categorized the pictures as dresses I like (287 pictures) and ones I dislike (520 pictures).

Now we can use logistic regression on component data to predict whether or not I’ll like a dress.

Sorting all the dresses by predicted score shows the prettiest and ugliest dresses of the whole set.
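
Here is a minimal sketch of the idea, under loud assumptions: the component values and the like/dislike labels below are randomly generated stand-ins, and the logistic regression is fitted with plain gradient descent in NumPy rather than with any particular library. The learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in data: 200 dresses, each described by 10 component
# scores, with a made-up like/dislike label that depends on two of them.
scores = rng.normal(size=(200, 10))
signal = scores[:, 0] - 0.5 * scores[:, 2] + rng.normal(scale=0.3, size=200)
liked = (signal > 0).astype(float)

# Standardize so every component is on a comparable scale.
features = (scores - scores.mean(axis=0)) / scores.std(axis=0)

# Logistic regression fitted by plain gradient descent.
weights = np.zeros(features.shape[1])
bias = 0.0
for _ in range(2000):
    predicted = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
    gradient = predicted - liked
    weights -= 0.1 * features.T @ gradient / len(liked)
    bias -= 0.1 * gradient.mean()

# Probability of "like" for every dress, then rank from prettiest to ugliest.
probability = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
ranking = np.argsort(-probability)
prettiest, ugliest = ranking[:3], ranking[-3:]
accuracy = np.mean((probability > 0.5) == (liked == 1))
```

The model only ever sees the handful of component values per dress, never the pixels, which is what keeps the fit small and fast.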

The prettiest dresses:

[Images: Prettiest 0, Prettiest 1, Prettiest 2]

The ugliest dresses:

[Images: Ugliest 0, Ugliest 1, Ugliest 2]

Seems pretty spot on! I could now set up something to watch as new dresses are posted on Amazon, and to alert me to dresses it thinks I will really like.

The misclassifications are interesting too. Here are the three “ugliest pretty dresses”: those I classified as my style but that the program predicted I would really dislike.

[Images: the three “ugliest pretty dresses”]

It seems to be about that specific shade of blue.

And here are the “prettiest ugly dresses”: those I classified as dislikes but that the program predicted I would really like.

[Images: the three “prettiest ugly dresses”]

These aren’t that bad. I do kinda like them, but think they’d be nicer with some minor adjustments (slightly less form-fitting, slightly less loud pattern, slightly brighter color).

Creating new dresses

For creating pictures, there’s no reason we need to confine ourselves to already known dress component values. We can also choose random values for each component, and see what happens!

[Images: randomly generated dresses]

Completely new dresses! With more data and better data, this could actually be a viable dress design tool!
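
One way this could be sketched, with the caveat that the choice of random distribution here is my assumption for the example: draw each component value from a normal distribution whose spread matches what that component shows in the training set, then rebuild a picture from those values. The data is again a synthetic stand-in.

```python
import numpy as np

rng = np.random.default_rng(3)
images = rng.random((60, 12 * 8))  # synthetic stand-in for flattened photos
mean_image = images.mean(axis=0)
_, _, components = np.linalg.svd(images - mean_image, full_matrices=False)

k = 10
known_scores = (images - mean_image) @ components[:k].T  # shape (60, k)

# Draw each component value from a normal distribution matched to the spread
# seen in the training set, so the result stays in a plausible range.
random_scores = rng.normal(0.0, known_scores.std(axis=0))

# Rebuild a picture from the random values.
new_image = (mean_image + random_scores @ components[:k]).reshape(12, 8)

# Pixel values can fall slightly outside the valid range, so clip them.
new_image = np.clip(new_image, 0.0, 1.0)
```

Sampling within the observed spread is a design choice: values far outside what the training set covers tend to produce pictures that no longer look like clothing at all.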

Want to play? The code is on GitHub.

If you’d like to see more about neat applications of PCA, check out Joel Grus’s post!

[Edit 8/17/15: Additional material added based on the suggestions of Hacker News commenters leni536 and gwern. Thanks!]