Original authors disagree: milestone study on identifying cancer through microbes is accused of significant data errors
On August 2, 2023 local time, an article in the journal Science described a heated recent debate: a milestone study may contain "significant errors," but the scientists accused of making them disagree.
On March 11, 2020 local time, the journal Nature published a paper titled "Microbiome analyses of blood and tissues suggest cancer diagnostic approach." Rob Knight of the University of California, San Diego, and colleagues reported that different types of cancer are associated with different microbial communities. They used artificial intelligence to pick out microbial DNA that could indicate specific cancers and proposed a new kind of microbiome-based cancer diagnostic tool.
According to the Science article, the study has received hundreds of citations to date, has provided data for more than a dozen other studies, and has supported the launch of at least one commercial venture that aims to use microbial sequences in human blood to reveal the presence of cancer.
However, on July 31, 2023 local time, Steven Salzberg of Johns Hopkins University in the United States and colleagues posted a preprint claiming that Knight et al.'s 2020 paper contained "significant data analysis errors." Salzberg et al. said that Knight et al. failed to properly filter human DNA out of the sequenced cancer tissue data, so that millions of human sequences were misclassified as microbial. "The main conclusion of this paper is completely incorrect," Salzberg said.
Knight disagrees with these criticisms. He points out that other scientists questioned his 2020 paper in January 2023 and that his laboratory responded in a preprint posted in February 2023. "The issues pointed out by this new preprint are really not yet publicly resolved," said Knight, who co-founded Micronoma in 2019 to develop microbiome-based cancer diagnostics. He emphasized that in September 2022 his team published a paper in the journal Cell titled "Pan-cancer analyses reveal cancer-type-specific fungal ecologies and bacteriome interactions," which used updated methods to analyze fungi and bacteria in tumors and reached conclusions similar to those of the earlier Nature paper.
However, researchers not involved in the dispute say that Salzberg and colleagues have identified further flaws in the Knight team's 2020 paper and that their argument is convincing. Julian Parkhill, a bacterial geneticist at the University of Cambridge in the UK, said, "This is a precise breakdown of the errors in the original paper."
According to the Science report, multiple researchers told the journal that microbiome science undoubtedly has biomedical promise and that many other research groups have linked microorganisms to specific cancers. Lesley Hoyle, a microbiologist and bioinformatician at Nottingham Trent University in the UK, said, "This debate provides a warning for microbiome research that heavily relies on computational methods. There is a lack of questioning about the content of publications, and we need someone to conduct such analysis."
Unexpected bacteria appeared in cancer tissue
Knight et al.'s paper used a database called The Cancer Genome Atlas (TCGA), which stores a large number of DNA sequences from human cancer samples. The database classifies sequences as human or non-human according to whether they match the human reference genome.
Knight et al. compared the "non-human" sequences in TCGA, along with sequences from dozens of people without cancer and 100 cancer patients, against DNA databases of bacteria, viruses, and other microorganisms, and reported that different types of cancer harbor their own characteristic microbial communities. They then fed the data into machine learning algorithms, which could predict the type of cancer, or whether cancer was present at all, solely from the microbial composition of a sample, with accuracy that sometimes approached 100%.
However, some scientists noticed puzzling findings in Knight et al.'s results. Although the work found many bacteria known to live in humans in cancer tissues, it also reported a mysterious seaweed bacterium, a marine hydrothermal vent bacterium associated with prostate cancer, and a coral bacterium associated with melanoma.
In the January 2023 preprint, researchers from the University of East Anglia in the UK said these findings may point to problems with the research methodology. They specifically noted that the unexpected microorganisms in cancer tissue may be the result of database errors, in which a sequence from one species is mistakenly labeled as belonging to another.
Parkhill explained that it is common for human DNA to end up in microbial databases by accident and to be mistakenly listed under microbial species names. Unless researchers filter human DNA out of human tissue sequencing data before comparing it with microbial databases, they may detect organisms that are not actually present in the tissue.
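The filtering problem Parkhill describes can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical: the sequences are made up, the "vent bacterium" database entry is invented to stand in for a contaminated record, and the k-mer matching is a simplified stand-in for real classification tools, not the pipeline used in either the 2020 paper or the preprint.

```python
def kmers(seq, k=8):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(read, database, k=8):
    """Return the label whose sequence shares the most k-mers with read, or None."""
    read_kmers = kmers(read, k)
    best_label, best_shared = None, 0
    for label, seq in database.items():
        shared = len(read_kmers & kmers(seq, k))
        if shared > best_shared:
            best_label, best_shared = label, shared
    return best_label


def count_microbial_hits(reads, microbial_db, human_reference=None, k=8):
    """Count reads per microbial label, optionally dropping human-like reads first."""
    counts = {}
    for read in reads:
        # Optional filtering step: skip any read sharing a k-mer with the human reference.
        if human_reference is not None and kmers(read, k) & kmers(human_reference, k):
            continue
        label = best_match(read, microbial_db, k)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up "human reference" sequence (illustrative only).
HUMAN_REF = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATTACAGGCATGAGCC"

# Hypothetical microbial database; the second entry is contaminated with human DNA.
microbial_db = {
    "gut_bacterium": "TTGACCGTAGGCTAAGCTTCGGATCCTAGGACTTAGC",
    "vent_bacterium_mislabeled": HUMAN_REF[10:40] + "GGGTTTAAACCC",
}

# Two reads that are really human, one that is really bacterial.
reads = [HUMAN_REF[12:32], HUMAN_REF[15:35], "GACCGTAGGCTAAGCTTCGG"]

unfiltered = count_microbial_hits(reads, microbial_db)
filtered = count_microbial_hits(reads, microbial_db, human_reference=HUMAN_REF)
print(unfiltered)  # the two human reads are counted under the mislabeled entry
print(filtered)    # only the genuine bacterial read remains
```

Without the filtering step, the two human reads are attributed to the contaminated entry and the toy "vent bacterium" appears to be present in the sample; with the filter, only the genuine bacterial read is counted.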
In a 27-page response, Knight and his colleagues questioned the importance of these observations and said that their 2022 Cell paper used a new method and replicated the conclusions of the 2020 Nature paper.
But this answer did not convince Salzberg, who developed some of the computational tools used in Knight et al.'s 2020 paper. Salzberg worked with the University of East Anglia researchers to download and reanalyze a subset of the data from Knight et al.'s study. Their analysis found that millions of sequences that Knight et al. took to be microbial were actually human. The latest preprint suggests that many of the microorganisms reported in the study are not present in the TCGA cancer samples at all.
Knight said that the exact identity of the sequences found in specific cancers, "which can be improved through technology and data sources," does not change his team's conclusion. He also pointed out that other studies, as well as a small analysis carried out by him and his colleagues, indicate that microbial differences persist even when human sequences are excluded more strictly.
Research that relies heavily on computational methods should be treated with caution
In the 2020 paper, because the tissue samples came from multiple medical centers and were collected at different times, Knight et al. used "normalization" techniques to try to remove that variability. But the new preprint says this process is problematic because it introduces a distinct electronic tag into the data for each cancer type. When the team fed the normalized data into their algorithm, the computer could quietly use the tags, rather than the microbial data, to determine which cancer type a sample came from.
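The kind of leakage the preprint describes can be shown with a deliberately exaggerated toy example. It is a sketch under stated assumptions, not a reconstruction of the normalization actually used in the 2020 paper: the `TAG` offsets, the single "abundance" feature, and the nearest-centroid classifier are all hypothetical. The raw values are pure noise with no link to cancer type, and the only thing separating the classes afterwards is the artifact added by the mock normalization step.

```python
import random
from statistics import mean


def loo_nearest_centroid_accuracy(values, labels):
    """Leave-one-out accuracy of a one-feature nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        centroids = {}
        for lab in set(labels):
            # Class mean computed without the held-out sample i.
            members = [v for j, (v, l) in enumerate(zip(values, labels))
                       if l == lab and j != i]
            centroids[lab] = mean(members)
        predicted = min(centroids, key=lambda lab: abs(x - centroids[lab]))
        correct += predicted == y
    return correct / len(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
labels = ["prostate"] * 100 + ["melanoma"] * 100

# Raw "microbial abundance": random noise, carrying no information about cancer type.
raw = [rng.gauss(0.0, 1.0) for _ in labels]

# Hypothetical normalization artifact: a different constant added per cancer type.
TAG = {"prostate": 0.0, "melanoma": 10.0}
normalized = [x + TAG[lab] for x, lab in zip(raw, labels)]

raw_accuracy = loo_nearest_centroid_accuracy(raw, labels)
normalized_accuracy = loo_nearest_centroid_accuracy(normalized, labels)
print(f"raw: {raw_accuracy:.2f}, normalized: {normalized_accuracy:.2f}")
```

On the raw noise the classifier can do no better than guessing, while on the "normalized" values it separates the two cancer types almost perfectly, even though no microbial signal was ever present. A near-100% accuracy figure alone therefore cannot show that a model has learned from biology rather than from a processing artifact.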
Knight said that his team disagrees with the latest preprint's analysis and stressed again that they had processed the data in different ways and reached the same conclusion. He added that his team has no particular motivation to work through the preprint's lengthy analysis or to respond to it on social media, even though it has caused a stir. "If they want to publish this article in a peer-reviewed journal, we will address it. I believe this is an appropriate way to conduct scientific research, just like in the past few centuries."
Micronoma CEO Sandrine Miller-Montgomery said in a statement, "We have developed additional human filtering and quality control methods to minimize human genomic DNA contamination, and found that doing so does not hinder the ability to diagnose the presence or type of cancer." For its lung cancer blood test, which is still in development, Micronoma has assembled an independent, proprietary microbial database based on non-human metagenomes.
At present, it is unclear whether work by other academic teams that used data from Knight et al.'s 2020 paper has been affected. "These are all very early stages, and this is a quite complex problem," said Eytan Ruppin of the National Cancer Institute in the United States, who used the dataset for a paper titled "Predicting cancer prognosis and drug response from the tumor microbiome," published in Nature Communications in May 2022.
"Now it is time to listen to the opinions of the original authors of the Nature paper. If they choose to respond, they can gain a potentially more balanced and fair perspective on this important topic," said Eytan Ruppin.
Some scientists say that academic journals should be more cautious about research that relies heavily on computational methods. Ivan Vujkovic-Cvijin, a microbiologist at Cedars-Sinai Medical Center in the United States, said, "The use of machine learning in microbiology lacks standards. I believe this scientific disagreement underscores the need to develop them."
Some scientists also hope for more discussion to address the issues in the original paper. "As scientists, we should accept the challenge," Parkhill said. "We should be able to objectively handle it and correct it when necessary."