Experience
3 - 8 years
Job location
Education
Bachelor of Technology/Engineering (Computers)
Nationality
Any nationality
Gender
Not mentioned
Number of vacancies
1 vacancy
Job description
Roles and responsibilities
You'll be responsible for outcomes, not just tasks. Here's what success looks like in this role:
You'll make reliability the default
- You'll design and maintain infrastructure that is highly available, fault-tolerant, and scalable
- You'll proactively identify and eliminate single points of failure before they become incidents
- You'll ensure our production systems remain stable, even under increasing scale and load
You'll own and optimize our cloud environments
- You'll manage and continuously improve workloads across AWS, GCP, or Azure
- You'll use Infrastructure as Code (Terraform) to standardize and scale infrastructure
- You'll optimize resource usage to balance performance and cost
You'll run and improve Kubernetes in production
- You'll operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
- You'll troubleshoot issues quickly and ensure smooth deployments and upgrades
- You'll ensure our containerized workloads perform reliably at scale
You'll build strong observability and respond to incidents
- You'll implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK
- You'll define alerting that is meaningful, not noisy
- You'll respond to incidents, lead root cause analysis, and ensure we learn from every failure
You'll automate everything that shouldn't be manual
- You'll write scripts and build tooling to eliminate repetitive operational work
- You'll continuously improve infrastructure efficiency through automation
- You'll promote a culture where manual work is a temporary state, not the norm
You'll collaborate to improve the entire system
- You'll work closely with DevOps and engineering teams to solve performance bottlenecks
- You'll contribute to CI/CD improvements and deployment reliability
- You'll help shape reliability best practices across the organization
What success looks like (First 90 Days)
First 30 days:
- You've built a strong understanding of our infrastructure, systems, and workflows
- You're contributing to day-to-day operations with support from the team
- You've started identifying areas for improvement in automation and reliability
By 90 days:
- You're independently managing infrastructure tasks and troubleshooting issues
- You're actively contributing to reliability and scalability improvements
- You've taken ownership of parts of our infrastructure and are improving them
Desired candidate profile
This is what will make you successful in this role:
- You've spent ~3 years working in SRE, DevOps, or infrastructure engineering, and you've seen what breaks at scale
- You're comfortable working in cloud environments like AWS, GCP, or Azure, and you understand how distributed systems behave
- You've worked hands-on with Kubernetes in production and know how to troubleshoot it when things go wrong
- You don't just fix issues - you ask why they happened and make sure they don't happen again
Technically, you likely:
- Use Terraform (or similar IaC tools) to manage infrastructure
- Work confidently with Docker and Kubernetes
- Write scripts in Python, Bash, or similar to automate workflows
- Understand CI/CD pipelines (Jenkins, GitHub Actions, Bitbucket Pipelines, etc.)
- Have a solid grasp of networking, load balancing, and high-availability design
When it comes to monitoring:
- You've implemented tools like Prometheus, Grafana, Datadog, or ELK
- You know the difference between useful alerts and noise
- You focus on signals that actually drive action
What sets you apart:
- You take ownership - you don't wait to be told something is broken
- You're calm under pressure and methodical during incidents
- You simplify complexity instead of adding to it
- You communicate clearly, even when explaining deeply technical issues
- You care about building systems that make other engineers more effective
Nice to Have (but not required)
- Experience with RabbitMQ or Redis in production
- Familiarity with Ansible or AWX
- Exposure to multi-cloud or hybrid environments
- Cloud certifications (AWS, GCP) or Linux certifications
- Background from ITI (Information Technology Institute)
Company industry
- IT - Software Services
Functional area / Department
- IT Software
Keywords
- Site Reliability Engineer
Lucidya
Lucidya is an AI-native platform for customer experience (CX) intelligence that manages entire customer lifecycles autonomously, from initial engagement through retention and growth. Unlike platforms that only surface insights and leave the action to you, Lucidya closes the loop with proprietary NLU technology built in-house and trained on millions of multilingual conversations. This enables marketing, support, CX, and research teams to deliver personalized experiences that drive measurable improvements in customer satisfaction, retention, and lifetime value. As we continue scaling globally, the reliability, performance, and resilience of our infrastructure become mission-critical to everything we do.