CONTEXT
At SESAMm, we provide tools for the asset management industry based on our proprietary Big Data, Artificial Intelligence, and Natural Language Processing technologies. We analyze huge amounts of unstructured textual data extracted from millions of news articles, blogs, forums, and social networks in real time. We use this alternative data in combination with standard market data to provide innovative analytics on thousands of financial products across all asset classes, and to develop custom investment strategies using our in-house machine learning and statistical expertise. With more than EUR 8M raised since our founding in 2014, major clients across the world, numerous awards, and exponential team growth, we are expanding quickly across Western Europe, the Americas, and Asia.
Join SESAMm, an innovative and fast-growing FinTech company!
JOB DESCRIPTION
Overarching goal: you will build and scale the data components behind key SESAMm products, such as the raw data ingestion pipeline, job scheduling, and ETL design/optimization; drive the migration of the Product Data Platform toward cloud or on-premise solutions; and establish data development best practices for the rest of the tech team.
Communicate your team's work through weekly updates.
Key activities:
Design and implement the best data pipelines for our text-based products (ingestion, processing, exposition):
Test and design state-of-the-art data ingestion pipelines
Implement efficient streaming services
Lead the acquisition of new data sources
For each new data source, assess its feasibility and potential
Integrate the new data into the datalake
Develop data request tooling for Data Scientists and Technical teams
Streamline the new data request engine
Optimize current queries
Implement and maintain critical data systems
Process and integrate data into new databases or the datalake
Ensure maintainability and build update systems
Technologies used: Spark, AWS EMR, Kafka, SQL, MongoDB
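To give candidates a concrete flavor of this stack, below is a minimal, illustrative PySpark sketch of the kind of streaming ingestion job the role involves: it reads raw text events from Kafka and lands them in a datalake path. The broker address, topic name, and storage paths are placeholder assumptions for illustration, not SESAMm's actual pipeline.

```python
# Illustrative sketch only (requires Spark with the spark-sql-kafka connector).
# Broker, topic, and paths below are placeholders, not real SESAMm endpoints.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("raw-text-ingestion")
    .getOrCreate()
)

# Ingestion: read the raw event stream from a Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "raw-text-events")           # placeholder topic
    .load()
)

# Processing: Kafka delivers bytes, so decode key/value to strings.
events = raw.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("timestamp"),
)

# Exposition: append the decoded events to the datalake as Parquet.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://datalake/raw/text-events/")     # placeholder path
    .option("checkpointLocation", "s3://datalake/_chk/")  # placeholder path
    .outputMode("append")
    .start()
)

query.awaitTermination()
```

In production such a job would typically run on AWS EMR under a scheduler, with the Parquet output queryable by the Data Science and Technical teams.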