● Investigate new technologies, match them to proposed ideas, and apply them in quick prototypes.
● Work closely with researchers, data scientists, data architects, and ETL developers to identify, capture, collect, and format data from external sources, internal systems, and the data warehouse in order to extract features of interest.
● Deploy trained models and monitor their performance.
● Wrap models in a scalable backend and provide an API for external use.
● Professional data engineering experience focused on batch and real-time data pipelines using Spark, PySpark, and Python (Scala is a plus).
● Strong hands-on experience with the Big Data stack, including Spark and Python (pandas is a plus).
● Awareness of DevOps practices.
● Hands-on design and development experience in the data space: data processing and transformation using ETL tools, and data warehousing (data modeling, programming).
● Exposure to Google Cloud and Amazon Web Services.
● Experience working on high-load, high-performance distributed projects.
● Experience with both async code and multithreading/multiprocessing.
● Hands-on experience with SQL (e.g. PostgreSQL) and NoSQL databases (MongoDB, Redis, Elasticsearch, etc.).
● An open-minded person who is willing to work in a dynamic R&D environment.
● Experience with other languages, such as Go or Scala.
● Experience in performance analysis, load testing, and module/system optimization.
● Experience with deployment to the cloud.
● Ability to work on several ideas in parallel.
● Previous experience in R&D projects.