Graduate degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Roles & Responsibilities:
Design and develop business-critical backend systems using stream processors and high-quality data pipelines.
Work in a cross-functional team of machine learning engineers and data scientists to design and code large-scale batch and real-time data pipelines on AWS.
Assemble large, complex data sets that meet functional / non-functional business requirements.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
Build a cloud-native, real-time stream processing and data lake platform that scales to zettabytes and beyond.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
Perform code reviews, and lead by example on refactoring code for readability, extensibility, and testability.
Lead your products, with a focus on DevOps and robust automation.
Perform root cause analysis on external and internal processes and data to identify opportunities for improvement and answer questions.
Build processes that support data transformation, workload management, data structures, dependency management, and metadata.
Develop AutoML infrastructure for model selection and hyperparameter tuning.
Embrace change, and always be open to replacing what you built yesterday with something better today.
Skill Sets & Experience details:
Experience with big data tools: Hadoop, Spark, Kafka, etc.
Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
Experience with AWS cloud services: EC2, EMR, RDS, Redshift
Experience with stream-processing systems: Storm, Spark Streaming, etc.
Experience with object-oriented and functional scripting languages: Python, Java, Scala, etc.
Experience with microservice architecture and design.
Experience with machine learning toolkits such as Spark MLlib, H2O, scikit-learn, and R, and with common ML techniques.
Strong command of machine learning libraries such as PyTorch and TensorFlow, and knowledge of common integration patterns for serving inference with them.
Proven track record of building and optimizing data sets, ‘big data’ pipelines, and architectures
Excellent problem-solving and analytical skills for working with unstructured datasets
Experience delivering real-time data feedback loops and streams supporting highly available, scalable solutions with large transaction volumes on a 24×7 operational cycle
Experience communicating with users, other technical teams, and senior management to gather requirements, describe software product features, and review technical designs
Kindly submit your resume.