Responsibilities
2-4 years of experience in data engineering, including ad-hoc transformation of unstructured raw data
Use of orchestration tools
Design, build, and maintain workflows/pipelines that process continuous streams of data, with end-to-end experience designing and building both near-real-time and batch data pipelines.
Work closely with other data engineers and business intelligence engineers across teams to create data integrations and ETL pipelines, driving projects from initial concept to production deployment
Maintain and support incoming data feeds into the pipeline from multiple sources, ranging from external customer feeds in CSV or XML format to automatic Publisher/Subscriber feeds.
Knowledge of database structures, theories, principles and practices (both SQL and NoSQL).
Actively develop ETL processes and data pipelines using Python, PySpark, Spark, or other highly parallel technologies
Experience with Data Engineering
- Experience with data engineering technologies and tools such as Spark, Kafka, Hive, Oozie, NiFi, Impala, SQL, NoSQL, etc.
- Understanding of MapReduce and other data query processing and aggregation models
- Understanding of the challenges of transforming data across a distributed, clustered environment
- Experience with techniques for consuming, holding, and aging out continuous data streams
Improvement and Automation
Continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for customers
Ability to provide quick ingestion tools and corresponding access APIs for continuously changing data schemas, working closely with data engineers on specific transformation and access needs
Qualifications
1-2 years of experience developing applications with relational databases, preferably SQL Server and/or MySQL.
Some exposure to database optimization techniques for speed, complexity, normalization, etc.
Ability to build effective working relationships with all functional units of the organization
Excellent written, verbal and presentation skills
Excellent interpersonal skills
Ability to work as part of a cross-cultural team
Self-starter and self-motivated
Ability to work with minimal supervision
Works under pressure and is able to manage competing priorities.
Development Skills
- 3-7 years of development experience using Java, Python, PySpark, Spark, Scala, and object-oriented approaches to designing, coding, testing, and debugging programs
- Ability to create simple scripts and tools using Linux, Perl, and Bash
- Development of cloud-based, distributed applications
- Understanding of clustering and cloud orchestration tools
- Working knowledge of database standards and end user applications
- Working knowledge of data backup, recovery, security, integrity, and SQL
Technical Knowledge
- Familiarity with database design, documentation and coding
- Previous experience with DBA case tools (frontend/backend) and third-party tools
- Understanding of distributed file systems and their optimal use in the commercial cloud (HDFS, S3, Google File System, Databricks)
- Familiarity with programming language APIs
- Problem-solving skills and the ability to think algorithmically
- Working knowledge of RDBMS/ORDBMS systems such as MariaDB, Oracle, and PostgreSQL