Prototyping real-time machine learning algorithms using cutting-edge research
Developing applications to visualize and manipulate huge amounts of data
Analysis of new data streams for inclusion in our real-time ad targeting engine
Scaling systems to handle many terabytes of data whilst still maintaining millisecond-level response times
Working to support the following steps in the analytical process with large (multi-million-record) data sets:
Basic data cleansing and preparation
Variable preprocessing/transformation
Performing statistical tests
Generation of graphical output
Preparation of data sets for predictive modeling
Robust predictive model building, validation, and application
Automation of statistical processes
What You Should Bring
Must-have
A minimum of a Bachelor’s degree in a mathematical discipline such as Computer Science, Applied Statistics, Maths, Engineering, or Physics from a respected university. A Ph.D. is a bonus.
4+ years' experience in Python and a good knowledge of R
At least 4 years' practical experience of univariate and multivariate statistical analysis in Python or R with large data sets (millions of records and many tens or hundreds of independent variables)
Good experience of variable transformation and data preprocessing techniques (such as binning, piecewise linear regression, and non-linear function transforms) used to extract maximum predictive power
Excellent practical knowledge of multivariate techniques such as Logistic Regression, Decision Trees, Random Forest, Naive Bayes, and Clustering, and a good grasp of the strengths and weaknesses of specific approaches