Bioinformatics Data Engineer
GenBio AI
Posted on 16 March
Experience
1 - 7 years
Job Location
Education
Bachelor of Science (Computers)
Nationality
Any Nationality
Gender
Not Mentioned
Number of Vacancies
1
Job Description
Roles and Responsibilities
As our data ingestion needs grow, we are looking for a Bioinformatics Data Engineer to act as the crucial bridge between raw biological data and our scalable infrastructure. Reporting to the Data Engineering Lead, you will leverage your deep biological domain expertise to build the initial scripts and processing logic for complex datasets, ensuring they are primed for large-scale foundation model training.
- Source & Acquire Biological Data - Identify, evaluate, and obtain high-quality bioinformatics datasets from public and partner sources (e.g., NCBI, PubChem, ENCODE, UniProt) to support research and model development initiatives.
- Deeply Understand Complex Datasets - Develop a comprehensive understanding of biological datasets, including data structures, schemas, metadata standards, entity relationships, and underlying biological context to ensure accurate interpretation and usage.
- Design & Implement Data Processing Pipelines - Develop robust preprocessing scripts and scalable data transformation workflows using Python, R, and other relevant tools. Leverage AI-assisted tools where appropriate to process, clean, normalize, and integrate complex biological data for foundation model training.
- Structure & Standardize Biological Data - Organize heterogeneous datasets into well-defined, interoperable formats aligned with internal infrastructure requirements and downstream AI training pipelines.
- Bioinformatics Data Analysis - Perform exploratory and statistical analysis of genomic, transcriptomic, proteomic, and other multi-omics datasets to assess data quality, uncover biological patterns, and generate insights that inform model development. Apply appropriate computational and statistical methods to validate assumptions and support downstream AI training and evaluation.
- Build Data Products - Create production-ready data assets, including standardized datasets, curated releases, dashboards, analytical reports, and technical documentation to enable efficient research and model evaluation.
- Ensure Data Quality & FAIR Compliance - Curate, annotate, validate, and standardize public and partner datasets in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) principles, ensuring long-term usability and reproducibility.
- Collaborate Cross-Functionally - Partner closely with research scientists and ML engineers to translate biological research needs into scalable data engineering solutions that support AI model training and evaluation.
- Knowledge Sharing & Documentation - Contribute domain expertise by documenting data methodologies, maintaining clear technical documentation, and sharing biological data insights across teams.
Desired Candidate Profile
- Educational Background: Bachelor's or Master's degree in Bioinformatics, Computational Biology, Computer Science, or a related field with a heavy focus on the life sciences.
- Biological Data: Deep, hands-on familiarity with multiple biomedical data modalities (e.g., genomics, transcriptomics, spatial omics, protein structure, biomedical imaging, clinical/phenotypic data).
- Biological Tools: Familiarity with Bioconda, Biopython, Bioconductor, samtools, bamtools, bcftools, gffutils, and similar tools.
- Scripting & Tooling: Strong programming skills in Python (Pandas, NumPy) and proficiency with standard bioinformatics workflow managers as well as distributed-computing and orchestration tools (e.g., Ray, Kubeflow).
- Engineering Handoff: Experience writing clean, modular code that can be easily picked up by core data engineers for optimization in cloud environments (AWS/GCP/HF) and containerized setups (Docker).
- AI/ML Awareness: A solid understanding of machine learning workflows and how biological data must be formatted and batched for deep learning frameworks (e.g., PyTorch).
Company Industry
- IT - Software Services
Functional Area / Department
- IT Software
Keywords
- Bioinformatics Data Engineer
GenBio AI
GenBio AI develops multiscale foundation models to decode and simulate human biology. Our team is accelerating towards an ambitious future in which scientists can tackle humanity's biggest challenges in drug discovery, healthcare, and fundamental research with AIDO (AI-Driven Digital Organism): a unified framework for predicting, simulating, and programming biology across all scales. Work on this vision begins today as we engineer the virtual cell to model and simulate the fundamental unit of life. This vision has brought together a talent-dense group of product-minded researchers and engineers dedicated to making it a reality. Our team prides itself on its strong engineering culture and its highly interdisciplinary, collaborative approach. We are based in Palo Alto, with satellite offices in Paris and Abu Dhabi.
https://jobs.lever.co/genbio/ba9d68d4-b928-42e9-a442-b5c03f0c95b6