| Title : Mobilizing the Biocatalysis Community for Reproducible and Reusable Data Collection - Marques_2026_ACS.Catalysis_16_8858 |
| Author(s) : Marques SM , Planas-Iglesias J , Velecky J , Musil M , Asano Y , Borowski T , Brissos V , Cespugli M , Chorozian K , Dadashipour M , Erdem E , Ferrandi EE , Grigorakis K , Kluza A , Lawniczek J , Makryniotis K , Monti D , Nestl B , Ngo AC , Nikolaivits E , Patti S , Pentari C , Rodrigues CF , Schopper T , Seweryn-Ozog K , Szaleniec M , Taborda A , Tataruch M , Tischler D , Topakas E , Wang J , Wojcik P , Wojtkiewicz AM , Woodley JM , Zastawny O , Martins LO , Fraaije M , Pleiss J , Schnell S , Damborsky J , Mazurenko S , Bednar D |
| Ref : ACS Catalysis , 16 :8858 , 2026 |
|
Abstract :
Importance of Sharing Experimental Data. Science is an ever-evolving endeavor, with all new research grounded in knowledge gained in previous studies and publications. This applies not only at the level of theory and fundamental knowledge, but also at the level of specific data. In the context of enzyme research, such data include protein production and folding, solubility, stability, catalytic activity (together with specificity and stereoselectivity), regulatory effects such as activation and inhibition, and kinetics, all of which are crucial for multiple practical reasons. In biology and biochemistry, the availability of high-quality experimental data has already contributed to several breakthroughs. One example is AlphaFold 2 [1], released in 2021, a machine-learning tool that predicts the 3D structures of proteins with unprecedented accuracy. Its release represented a major breakthrough in structural biology, addressing a challenge that had persisted for decades. A key element in the success of AlphaFold was the large number of experimental protein structures available in the Protein Data Bank (ca. 159,000 in 2019). This was possible because, three decades before the AlphaFold release, the deposition of crystallographic, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM) structures into databases in a uniform format became the gold standard and a strict requirement for publication [3,4]. Thanks to the high quality and large volume of its data, the Protein Data Bank also enabled the development of molecular docking and other tools. Other examples are UniProt and BRENDA, databases that have contributed to functional prediction tools, metabolic modeling, and large-scale enzyme design efforts. Their success relies heavily on community contributions, data quality checks, and manual curation.
Bottleneck in Enzyme Engineering. Enzyme engineering and predictive biotechnology still face numerous challenges. Predicting enzyme activity, selectivity, stability, and solubility remains difficult, not only because of the complexity of the underlying physical processes, but also because high-quality experimental data are scarce and heterogeneous. Rapidly evolving machine-learning techniques must be trained on reliable data to generate accurate predictions. Traditional low-throughput methods often offer greater accuracy, reproducibility, and interpretability than current high-throughput techniques, and their contribution remains highly valuable. Most research publications report experimental results as tables and figures. Naturally, these results must be presented in a form that human readers can understand. However, search algorithms often miss such results, because they are either not available in a machine-readable format or are buried in the Supporting Information, which can be even harder to trace. Although several high-quality repositories exist, namely STRENDA DB and SABIO-RK for kinetic measurements, BRENDA for enzyme functional annotations, and domain-specific resources such as FireProtDB and SoluProtMutDB for stability and solubility, respectively, there is currently no universally mandated deposition venue across journals and no uniform reporting practice for stability and solubility data. This fragmentation leads to heterogeneous metadata and inconsistent conventions for units and uncertainty reporting, and it clearly represents a bottleneck in the development of future tools for engineering better biocatalysts. Our recommendations therefore aim to articulate a coordinated, interoperable pathway that bridges kinetic, stability, and solubility measurements under shared formats and vocabularies.
The situation is gradually improving owing to growing community awareness of the need for open science and for publishing data that adhere to the FAIR Data Principles (see below). High-quality data repositories can only emerge from community-wide agreement on how enzyme data are reported. This entails (i) consensus on standardized reporting across disciplines and journals and (ii) widespread adoption of established author guidelines for enzymology and biocatalysis, such as the STRENDA guidelines, which are now embedded in the author instructions of 55 peer-reviewed biochemistry journals. |
| PubMedSearch : Marques_2026_ACS.Catalysis_16_8858 |
| PubMedID: |
Marques SM, Planas-Iglesias J, Velecky J, Musil M, Asano Y, Borowski T, Brissos V, Cespugli M, Chorozian K, Dadashipour M, Erdem E, Ferrandi EE, Grigorakis K, Kluza A, Lawniczek J, Makryniotis K, Monti D, Nestl B, Ngo AC, Nikolaivits E, Patti S, Pentari C, Rodrigues CF, Schopper T, Seweryn-Ozog K, Szaleniec M, Taborda A, Tataruch M, Tischler D, Topakas E, Wang J, Wojcik P, Wojtkiewicz AM, Woodley JM, Zastawny O, Martins LO, Fraaije M, Pleiss J, Schnell S, Damborsky J, Mazurenko S, Bednar D (2026)
Mobilizing the Biocatalysis Community for Reproducible and Reusable Data Collection
ACS Catalysis
16 :8858