Data Challenge
You are given a 300M dataset (~3,000,000 rows) that includes GPS information about each passenger’s desired pick-up and drop-off locations and their actual pick-up and drop-off locations. The goal is to find four spots that minimize some expected loss.
The entire problem description is a page long. Essentially, though, the problem can be cast as something similar to K-means clustering where, instead of the quadratic loss, a loss function supplied with the problem is used. I have no idea why I failed the data challenge. Maybe because I wrote my solution in Matlab? Who knows.
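To make the K-means analogy concrete, here is a minimal Python sketch of that kind of alternating procedure. It is not the solution I submitted, and the actual loss from the challenge is not reproduced here: `manhattan_loss` is a hypothetical stand-in, and the names `fit_spots`, `points`, `n_iter` and `seed` are my own. Because an arbitrary loss has no closed-form "mean" update, the sketch picks the best cluster member (a medoid) as each new spot, which makes it closer to K-medoids than to textbook K-means.

```python
import numpy as np


def manhattan_loss(points, spot):
    """Hypothetical stand-in loss: L1 distance from every point to one spot."""
    return np.abs(points - spot).sum(axis=1)


def fit_spots(points, k=4, loss=manhattan_loss, n_iter=20, seed=0):
    """Alternate assignment and medoid updates under a pluggable loss."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    spots = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to the spot with the smallest loss.
        costs = np.stack([loss(points, s) for s in spots], axis=1)  # shape (n, k)
        labels = costs.argmin(axis=1)
        new_spots = spots.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) == 0:
                continue  # keep the old spot if its cluster is empty
            # Update step: the member with the lowest total loss over its cluster.
            totals = [loss(members, c).sum() for c in members]
            new_spots[j] = members[int(np.argmin(totals))]
        if np.allclose(new_spots, spots):
            break  # no spot moved, so the assignments will not change either
        spots = new_spots
    # Mean loss of each point to its nearest spot, as an estimate of expected loss.
    costs = np.stack([loss(points, s) for s in spots], axis=1)
    return spots, costs.min(axis=1).mean()
```

The medoid update is quadratic in cluster size, so on something like 3,000,000 rows you would presumably run it on a random subsample or restrict the candidate spots. Like K-means, it only finds a local optimum, so several restarts with different seeds are worth trying.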