Data Engineer in Atlanta, GA at HUNTER Technical Resources

Date Posted: 2/25/2020

Job Description

Primary Responsibilities:
  • Develop and maintain scalable data pipelines, with a focus on writing clean, fault-tolerant code
  • Maintain various data stores and distributed systems, such as Spark, Hive, and Presto
  • Develop entity resolution and record linkage, using blocking techniques such as attribute clustering and Q-gram blocking, as well as string matching algorithms such as TF-IDF and Levenshtein distance
  • Optimize data structures for efficient querying of those systems
  • Perform data cleansing, such as garbage character cleanup and de-duplication
  • Work with internal and external data providers to ensure integrations are accurate, scalable and maintainable
  • Collaborate with data science team on implementing machine learning algorithms to facilitate audience intelligence and cross-brand personalization initiatives
  • Collaborate with business intelligence/analytics teams on data mart optimization, query tuning and database design
  • Execute proofs of concept to assess strategic opportunities and future data extraction and integration capabilities
  • Define data models, publish metadata, and establish best-practice querying standards

Required Skills:
  • 3+ years of data engineering experience and 5+ years of software development experience
  • Fluency in Python and PySpark
  • Fluency in SQL on Hadoop and related technologies (Hive, Presto, Spark)
  • Exceptional analytical, quantitative, problem-solving, and critical thinking skills
  • A collaborative work style and a strong desire to work in a dynamic, fast-paced environment that requires flexibility and the ability to manage multiple priorities

Desirable Skills:
  • Experience with AWS tools, especially Glue, EMR, S3, Lambda and Kinesis, as well as Kafka
  • Experience with GCP tools, e.g. BigQuery and Dataflow
  • Scala programming experience