Site Reliability Engineer in Alpharetta, GA at HUNTER Technical Resources

Date Posted: 11/25/2019

Job Snapshot

  • Employee Type:
    Contractor
  • Job Type:
  • Experience:
    Not Specified
  • Job ID:
    4738641

Job Description


Site Reliability Engineer

JOB DESCRIPTION:
  • Engage in and improve the whole lifecycle of software development services, from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews. 
  • Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.
  • Work with teams across the organization to ensure the reliability of core services and keep an eye on capacity and performance.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health in a 24x7 environment. 
  • Participate in 24x7x365 on-call support for multiple core platforms globally. Using a “Follow the Sun” model, we expect working patterns to include on-call duty as well as weekend and holiday season cover.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity. 
  • Practice sustainable incident response and blameless postmortems. 
  • Influence and create new designs, architecture, standards, and methods for large-scale systems. 
  • Bind and orchestrate the system infrastructure with the application layer to enable high availability/clustering, load balancing, and integration.
  • Provide technical guidance or support for the development or troubleshooting of systems.
  • Establish end-to-end monitoring and alerting on all critical aspects to ensure SLOs, SLIs, and SLAs are met and to get proactive notifications of possible issues for all systems.
  • Develop automated solutions to address potential problems before they result in a service interruption, and demonstrate a passion for automation, including CI/CD automation.
  • Establish performance baselines and capacity thresholds, correlate events, and define monitoring/alerting criteria.

Qualifications: 
  • Bachelor of Science degree in Computer Science, Engineering, or equivalent relevant experience.
  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems. 
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive; 
  • Ability to debug and optimize code and automate routine tasks; 
  • Overall 6+ years of experience in one or more of the following:
    • Experience building JavaEE applications using build tools like Maven/ANT, Subversion, JIRA, Jenkins, Bitbucket, and Chef;
    • Experience with continuous integration tools (Jenkins, SonarQube, JIRA, Nexus, Confluence, GIT-BitBucket, Maven, Gradle, RunDeck) is a plus;
    • You've created automation using Chef, Puppet, or another SCM tool; experience with Docker and container scheduler services such as ECS or Kubernetes is desirable;
    • You've worked with Nginx, Tomcat, HAProxy, Redis, Elastic Search, MongoDB, RabbitMQ, Kafka, and Zookeeper;
    • Experience as an SCM/release engineer, or in a position with similar skill sets and responsibilities (Software Engineer, Systems Engineer, Systems Administrator);
    • Experience performing source code control management with Subversion/GIT, including branching, merging, tagging, etc.;
    • Experience in configuring and administering JavaEE application servers (Tomcat, WebSphere, WebLogic, etc.); 
    • Experience with scripting languages (Unix shells such as bash and ksh, Python, Perl);
    • Experience in configuring, building, and supporting apps and operations in a public cloud environment (AWS, Azure, GCP); 
    • Experience with Monitoring and Logging tools (Elastic Search, ELK, AppDynamics, Splunk, etc.); 
    • Collaborate well with team members, developers, QA, and ownership teams to resolve issues; 
    • Knowledge of Agile / Scrum methodologies and principles; 
    • Possess excellent written and verbal communication skills with the ability to communicate with team members at various levels, including business leaders; 
    • A real passion for and the ability to learn new technologies.