Nationality
Any nationality
Gender
Not mentioned
Vacancies
1 vacancy
Job Description
Roles and Responsibilities
Reliability & Incident Management
- Lead high-severity incident response and drive post-incident reviews.
- Troubleshoot complex issues across applications, infrastructure, and networks.
- Improve MTTR through better monitoring, alerts, and diagnostic tooling.
- Participate in the on-call rotation supporting production systems.
Performance & Scalability
- Identify and resolve performance bottlenecks and scaling challenges.
- Conduct load testing and capacity planning for high-traffic scenarios.
Infrastructure & Operations
- Enhance cloud-native infrastructure, deployment processes, and automation.
- Improve resilience, fault-tolerance, and recovery mechanisms across systems.
Observability
- Build and refine dashboards, alerts, metrics, logs, and traces.
- Define SLIs/SLOs and improve visibility into system behavior.
Tooling & Automation
- Develop tools that reduce operational toil and increase reliability.
- Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.
Collaboration
- Work closely with engineering teams to ensure services are robust and production-ready.
- Mentor engineers on reliability, debugging, and operational best practices.
Required Skills
- Strong experience with Kubernetes, service mesh technologies, and cloud platforms (AWS/GCP/Azure).
- Deep understanding of Linux, networking, distributed systems, and load balancers.
- Hands-on experience with Terraform or similar IaC tools.
- Experience with Prometheus, Grafana, Loki, Mimir, Elastic, or similar observability tools.
- Proficiency in scripting/programming (Bash, Python, Go).
- Experience with CI/CD and GitOps.
- Strong debugging, incident response, and performance analysis skills.
Bonus Skills
- Background in large-scale, high-traffic systems.
- Experience with fault-tolerant design, DR, and HA patterns.
- Familiarity with SLOs, SLIs, and error budgets.
Company Industry
- Internet
- E-commerce
- Dotcom
Functional Area / Department
- IT Software
Keywords
- Senior Site Reliability Engineer (SRE)