Site Reliability Engineer (SRE)

Geospark Analytics is looking for an SRE to manage our operations during off hours. The ideal candidate will have an interest in keeping high-performance web-based applications running and in helping control costs in a 24x7 environment. Geospark Analytics is a fast-growing startup developing a global threat and risk assessment platform that is used by government, commercial, and NGO clients. This is the perfect opportunity for the successful candidate to become part of an innovative and dynamic team developing a revolutionary threat platform. We are looking for a team player who is ready to take on a challenging and rewarding opportunity. As the SRE, you will work directly with our CTO and Development/Product team to deliver a high-quality, reliable web and mobile experience and to ensure our backend data processing runs smoothly. We are looking for a candidate who can augment our USA-based development team in system monitoring and off-hours upgrades. Candidates from any location will be considered but will be expected to perform core duties outside normal US working hours.

Responsibilities

  • Monitor systems during off hours for a USA-based team
  • Manage operation of our Hyperion (SaaS) platform in AWS and ensure 24x7 monitoring
  • Collaborate with the Development team during deployments and provide ongoing maintenance during non-peak hours
  • Create a plan for escalating system issues to the relevant parties for remediation
  • Work in an Agile software environment delivering iterative solutions
  • Ensure solutions built by the Development team are sustainable, scalable, and cost-effective
  • Clearly and concisely communicate work status, methods, instructions, problems, requirements, options, and concerns with team members, managers, and customers through various means
  • Conduct security scans and assessments of the platform on an ongoing basis

Ideal Qualifications

  • Possess a Bachelor’s degree or higher in Computer Science or related discipline
  • At least 5 years of related experience in an operations environment
  • Possess AWS SysOps/DevOps Certification
  • Experience supporting cloud-based web applications (Vue / Node / Python / Elasticsearch / Lambda)
  • Experience working in an Agile environment and supporting activities in a Kanban environment
  • Experience in building/maintaining DevOps pipelines for automated testing and deployments
  • Experience using AWS and managing Serverless Frameworks
  • Experience managing Elasticsearch and tuning multiple Domains
  • Experience managing and reporting on end-to-end monitoring of systems
  • Strong problem-solving skills with an emphasis on product delivery
  • Excellent written and verbal communication skills for coordinating across teams
  • Strong interpersonal skills and the ability to work as part of a team
  • Ability to act as an escalation manager to ensure problems are resolved quickly
  • Ability to manage on-call support across a team rotation schedule in a 24/7 environment
  • Ability to clearly and concisely communicate work status, methods, instructions, problems, requirements, options, and concerns with team members, managers, and customers
  • Other desirable technological experience: Elasticsearch, Docker, Machine Learning, NLP
  • Strong ability to plan, lead, and complete major project initiatives as required