Using Data Science to Maximise Protein Production

The escalating demand for recombinant proteins has spurred the exploration of data science, along with engineering strategies, for selecting and optimising host cell lines. This encompasses comprehensive verification and sequence analysis of the target gene or protein, along with processes such as codon optimisation, vector construction, and clone/host selection, each requiring meticulous consideration of numerous variables. Cambridge Healthtech Institute’s Using Data Science to Maximise Protein Production explores high-throughput expression systems, elucidates data organisation methodologies, outlines data-driven design strategies, and addresses how to streamline the number of experiments, saving time and costs. Learn from seasoned, savvy protein and data scientists who are fostering wider adoption of deep learning models for cell line engineering, protein expression, and production.

Recommended Short Course*
Monday, 4 November, 14:00 – 17:00
SC3: Tools for Cell Line Engineering and Development
*Separate registration required. See short courses page for details. All short courses take place in-person only.

Tuesday, 5 November

07:30 Registration and Morning Coffee

LEVERAGING DATA AND MODELS TO KNOW YOUR PROTEIN

08:25

Chairperson's Remarks

Rivka Isaacson, PhD, Professor of Molecular Biophysics, Department of Chemistry, King’s College London

08:30

FEATURED PRESENTATION: Target 2035: The Goal to Develop a Pharmacological Modulator for Every Human Protein

Nicola Burgess-Brown, PhD, COO and Consultant, Protein Sciences, Structural Genomics Consortium; Visiting Scientist, University College London

The SGC, a global public-private partnership, uncovers novel human biology through structural genomics and chemical biology approaches. Target 2035 aims to develop tool molecules for every human protein by creating massive open datasets of high-quality protein-small molecule binding data, using DNA-encoded libraries and affinity selection mass spectrometry platforms. Models built from these data will allow prediction of new and more drug-like small molecule binders, which will be produced and tested experimentally. By 2030, the target is to find verified hits for thousands of proteins and to enable development of open algorithms for prediction of hits for the entire human proteome.

09:00

Picking the Right Proteins: Model-Derived Physicochemical Properties Can Predict Behaviour of Proteins in Vivo

Christopher Wood, PhD, Lecturer in Biotechnology, School of Biological Sciences, University of Edinburgh

In recent years, there have been huge advances in protein structure prediction methods, which have given us access to vast amounts of highly accurate structural data for previously intractable targets. We have found that properties derived from these models can be used to identify antibody designs that were highly produced in cells, and to highlight fundamental differences between natural and designed proteins. Finally, we uncovered systematic variations between the average properties of proteins in a range of organisms to such an extent that, when clustered using these data alone, we partially recovered the tree of life.

09:30

Product-Specific Solutions: Unlocking the Potential of Synthetic Signal Peptides

Adam J. Brown, PhD, CTO, SynGenSys Ltd.; Associate Professor, Chemical & Biological Engineering, University of Sheffield

Selection and/or design of signal peptide components is complicated by their product-specific functionality. This talk will introduce our signal peptide design platform, which can forward-engineer optimised, synthetic solutions for any new protein of interest. Underpinned by data from a wide range of cellular and molecular contexts, this tool enables precise, predictable control of product translocation rates, facilitating significant increases in recombinant protein titers.

10:00

From Screening to Large-Scale Purification: Versatility of Strep-TactinXT Magnetic Beads

Fabian Mohr, CSO, IBA Lifesciences

MagStrep beads address the evolving challenges of laborious, time-consuming protein purification by enabling a rapid and efficient purification process that also makes automation and scalability feasible. With their high binding capacity and specificity, these beads ensure superior purity and yield. MagStrep beads are a cutting-edge solution, providing unmatched efficiency, convenience, and performance in protein purification.

10:15 Selected Poster Presentation:

Optimizing a Mammalian Cell-Free Expression System Using Design of Experiments

Maximilian Goertz, Graduate Student, Biology, RWTH Aachen University

Cell-free expression (CFE) platforms of eukaryotic origin commonly suffer from low protein yield. Their performance relies on the function of numerous enzymes involved in the molecular pathways of protein synthesis. We describe the development and optimisation of a Chinese hamster ovary (CHO) cell-derived CFE system. The optimised CFE system reaches the yields of commercially available HeLa-based lysates and represents a considerable advancement in CHO cell-based lysate systems.

10:30 Grand Opening Coffee Break in the Exhibit Hall with Poster Viewing

APPLYING DATA SCIENCE FOR CONSTRUCT DESIGN

11:15

Bioinformatics and AI Approaches in Construct Design towards Soluble (and Crystallisable) Proteins

Christopher Cooper, PhD, Director and Head of Protein Sciences, CHARM Therapeutics

Construct design towards soluble protein fragments for biochemical, biophysical, and structural analyses has been greatly facilitated by algorithms predicting features such as domains, disorder, and secondary elements. The recent advent of AI tools such as AlphaFold2, however, has transformed in silico structural biology. Here we present practical tips for using bioinformatics and AI tools in construct design to help users improve the likelihood of obtaining functional proteins for their needs.

11:45

Biophysical Characterisation of Proteostasis Machinery

Rivka Isaacson, PhD, Professor of Molecular Biophysics, Department of Chemistry, King’s College London

Within the crowded environment of the cell, quality control machinery is vital for correct spatial and temporal protein distribution. I will discuss the optimisation of design and production for a variety of protein constructs that have allowed us to investigate these mechanisms and understand some of their roles in health and disease.

12:15 Attend Concurrent Track

12:45 Luncheon in the Exhibit Hall with Poster Viewing

APPLYING DATA SCIENCE TO ENHANCE PROTEIN EXPRESSION AND PRODUCTION

13:45

Chairperson's Remarks

Nicola Burgess-Brown, PhD, COO and Consultant, Protein Sciences, Structural Genomics Consortium; Visiting Scientist, University College London

13:50

Using Machine Learning to Predict Recombinant Protein Expression

Bradley Peter, PhD, Senior Research Scientist, Protein, Structure & Biophysics, AstraZeneca R&D

Identification of domain boundaries for optimal expression of proteins is essential for early drug discovery. We have developed and implemented a machine learning model to predict protein expression. The model was coupled to an in silico screening procedure that systematically designs and assesses thousands of constructs in a high-throughput manner. We will share how this is being used within our protein production platforms at AstraZeneca and some of the challenges faced.

14:20

Co-Presentation: A Deep-Learning Approach to Predict Optical Density at 600 Nanometers

Giovanna Scaramuzzino, Software Technical Department Engineer, HSG Engineering

Riccardo Vannacci, Managing Director, Operation, HSG Engineering

Optimising production of recombinant proteins is challenging due to the interaction of multiple process parameters. Expensive and time-consuming multivariable experiments are necessary to study these relationships. In this work, we propose a deep-learning approach using recurrent neural networks to predict optical density at 600 nanometers (OD600) values in real time. OD600 is a classical fermentation parameter that reflects bacterial concentration and is crucial for estimating recombinant protein production. Our model enables real-time detection of deviations from the canonical trend in fermentation processes. Key points to be presented include data preparation, deep learning model details, experiments, and results.

14:50

Leveraging Heterogeneous Datasets for Modelling Recombinant Protein Production

Evgeny Tankhilevich, Scientist, Andrew Leach Group, Chemical Biology Services, EMBL EBI

We have developed a machine learning model to predict recombinant protein expression, using a combination of in-house experiment results and publicly available data sets from SGC. Heterogeneity of these data sets presented a challenge during model development. Using a tailored model architecture and training algorithm has yielded an improvement in Area Under ROC Curve of X%. The model was experimentally validated on a carefully selected set of proteins.

15:20 Engineering mAb-Like Molecules From Leads to Drugs

Claes Gustafsson, Chief Commercial Officer & Co Founder, ATUM

The classic drug development funnel for mAb-like protein therapeutics starts with thousands of binders derived from a discovery engine, and each subsequent developability assay reduces the lead pipeline until only a handful of winners (hopefully) are left standing. Instead, ATUM's developability engineering approach relies on information-rich multidimensional testing of a modest number of lead variants. Systematic design of the variants enables the identification and characterization of causal vs simply correlating sequence-function information. The resulting data not only dictate the 'best' solution in the searched space, but also provide boundaries for developability attributes.

15:50 Refreshment Break in the Exhibit Hall with Poster Viewing

16:35

Employing Machine Learning for Cell Culture Optimisation

Bei-Wen Ying, PhD, Associate Professor, Life & Environmental Sciences, University of Tsukuba

Machine learning (ML) is an emerging technology with practical applications in improving cell culture in biotechnology such as protein production. Our research delves into the integration of ML techniques to enhance cell culture, demonstrating that ML can efficiently optimise culture media for bacterial or mammalian cells to increase cell growth and production. These success stories serve as compelling evidence of ML's potential to drive innovation in industry and research.

17:05

Optimizing Expression of Complex Proteins using Shallow and Deep Learning Approaches

Benjamin Fode, eleva GmbH

To predict optimal expression conditions, we adapted a miniaturized screening platform based on transient expression in a microtiter plate format. Using this platform, we screened for optimal combinations of regulatory elements in a design-of-experiments model. Moreover, set-up of a convolutional layer-based neural network for the moss-based production process enabled prediction of protein expression and comparison of such predictions with the transient expression data.

17:35 PANEL DISCUSSION:

Speaking the Same Language: Insights from Protein and Data Scientists

PANEL MODERATOR:

Nicola Burgess-Brown, PhD, COO and Consultant, Protein Sciences, Structural Genomics Consortium; Visiting Scientist, University College London

Data scientists view data in black and white while protein scientists consider the grey. Hear from both disciplines as they address:

  • Can we enhance protein production using machine learning? 
  • What are the main challenges?
  • What data to capture, in what format, and for what purpose?
  • How do we simplify data capture to encourage data entry and consistency?
  • How do we reduce the need to curate or “clean up” the data before applying ML?
  • What is enough data for protein production to apply ML algorithms?
  • The importance of including negative data!

PANELISTS:

Christopher Cooper, PhD, Director and Head of Protein Sciences, CHARM Therapeutics

Peter Schmidt, Director Protein Biochemistry, CSL Research, Melbourne, Australia

Evgeny Tankhilevich, Scientist, Andrew Leach Group, Chemical Biology Services, EMBL EBI

Bei-Wen Ying, PhD, Associate Professor, Life & Environmental Sciences, University of Tsukuba

18:35 Welcome Reception in the Exhibit Hall with Poster Viewing

19:35 Close of Using Data Science to Maximise Protein Production Conference