Responsibilities
2-4 years of experience in data engineering, including ad-hoc transformation of unstructured raw data
Use of orchestration tools
Design, build, and maintain workflows/pipelines that process continuous streams of data, with end-to-end experience designing and building both near-real-time and batch data pipelines.
Work closely with other data engineers and business intelligence engineers across teams to create data integrations and ETL pipelines, driving projects from initial concept to production deployment
Maintain and support incoming data feeds into the pipeline from multiple sources, ranging from external customer feeds in CSV or XML format to automatic Publisher/Subscriber feeds.
Knowledge of database structures, theories, principles and practices (both SQL and NoSQL).
Actively develop ETL processes and data pipelines using Python, PySpark, Spark, or other highly parallel technologies
Experience with Data Engineering
- Experience with data engineering technologies and tools such as Spark, Kafka, Hive, Oozie, NiFi, Impala, SQL, NoSQL, etc.
- Understanding of MapReduce and other data query processing and aggregation models
- Understanding of the challenges of transforming data across a distributed, clustered environment
- Experience with techniques for consuming, holding, and aging out continuous data streams
Improvement and Automation
Continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for customers
Ability to provide quick ingestion tools and corresponding access APIs for continuously changing data schemas, working closely with data engineers on specific transformation and access needs
Qualifications
1-2 years of experience developing applications with relational databases, preferably SQL Server and/or MySQL.
Some exposure to database optimization techniques for speed, complexity, normalization, etc.
Ability to build effective working relationships with all functional units of the organization
Excellent written, verbal and presentation skills
Excellent interpersonal skills
Ability to work as part of a cross-cultural team
Self-starter and self-motivated
Ability to work with minimal supervision
Works under pressure and is able to manage competing priorities.
Development Skills
- 3-7 years of development experience using Java, Python, PySpark, Spark, Scala, and object-oriented approaches to designing, coding, testing, and debugging programs
- Ability to create simple scripts and tools using Linux, Perl, and Bash
- Development of cloud-based, distributed applications
- Understanding of clustering and cloud orchestration tools
- Working knowledge of database standards and end user applications
- Working knowledge of data backup, recovery, security, integrity, and SQL
Technical Knowledge
- Familiarity with database design, documentation and coding
- Previous experience with DBA case tools (frontend/backend) and third-party tools
- Understanding of distributed file systems and their optimal use in the commercial cloud (HDFS, S3, Google File System, Databricks)
- Familiarity with programming language APIs
- Problem-solving skills and the ability to think algorithmically
- Working knowledge of RDBMS/ORDBMS systems such as MariaDB, Oracle, and PostgreSQL