SRE
Keyway
About Keyway: Keyway is a Series A stage PropTech company revolutionizing the real estate industry through digital solutions. By leveraging cutting-edge artificial intelligence and machine learning, we've developed Keypilot, the first AI-powered real estate copilot designed to optimize the entirety of a real estate transaction. Our leadership includes a serial tech entrepreneur with a history of successful ventures and substantial venture capital backing from industry-leading investors.
Role Overview: As a Site Reliability Engineer (SRE), you will play a crucial role in ensuring the availability, reliability, and performance of Keyway's services. This position requires an understanding of system dynamics, a passion for automation, and a drive for continuous improvement in a high-stakes environment.
Responsibilities:
- System Reliability & Availability: Monitor, maintain, and enhance the reliability and performance of Keyway's critical services.
- Automation & Efficiency: Design and implement automation tools to minimize repetitive tasks and manual errors, thereby improving operational efficiency.
- Incident Management & Problem-Solving: Respond to critical incidents, ensuring swift resolution and conducting thorough postmortem analyses to prevent future occurrences.
- Monitoring & Alerts: Develop and maintain effective monitoring systems, setting alert thresholds to minimize false alarms and improve incident response.
- Resource Optimization: Analyze resource usage and optimize infrastructure to improve efficiency and reduce costs, ensuring proper utilization of cloud and on-premises tools.
- Security & Compliance: Work closely with engineering teams to ensure infrastructure compliance with security policies and regulations (e.g., SOC 2, GDPR).
- Infrastructure as Code (IaC): Manage infrastructure using IaC tools like Terraform, promoting consistent practices.
- Interdisciplinary Collaboration: Collaborate closely with development, operations, and security teams to integrate SRE best practices into the software development lifecycle (SDLC).
- Mentorship & Team Development: Mentor junior team members, fostering a culture of continuous learning and knowledge sharing.
- Capacity Planning & Scalability Strategies: Anticipate demand growth and plan necessary infrastructure capacity to ensure efficient and uninterrupted scaling.
About You:
- Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Experience: Minimum of 5 years in a site reliability engineering role or similar.
- Technical Proficiency: Strong command of infrastructure as code (IaC) tools (e.g., Terraform), automation tools, cloud services (AWS, Azure, Google Cloud), and robust monitoring solutions.
- Skills: Proficient in programming languages relevant to automation and infrastructure management (e.g., Python, Ruby, Bash).
- Certifications: Certifications in cloud architecture or security (e.g., AWS Certified, Microsoft Azure Certified) are highly desirable.
- Communication: Strong communication skills and proficiency in English.
Keyway’s Commitment to Diversity: At Keyway, we celebrate diversity and recognize the value it brings to our customers and employees.