
AI Researchers

The rapid evolution of artificial intelligence has led to an unprecedented intersection between AI development and scientific inquiry. The advent of large language models (LLMs) has provided researchers with powerful tools for generating and analyzing text. However, there has been skepticism about whether these models can produce genuinely original output, especially in the context of scientific research. Despite these doubts, new methodologies have emerged that leverage the capabilities of LLMs to generate novel research ideas, test them, and document the results in a structured manner. One such approach, known as AI Scientist [1], represents a groundbreaking agentic workflow designed to harness the creative potential of LLMs for advancing AI research.


AI Scientist: A Collaborative Endeavor


AI Scientist is the brainchild of a multidisciplinary team of researchers, including Chris Lu, Cong Lu, Robert Tjarko Lange, and their colleagues from various esteemed institutions such as Sakana AI, the University of Oxford, the University of British Columbia, the Vector Institute, and the Canadian Institute for Advanced Research. Their work focuses on utilizing LLMs not merely as tools for generating text but as autonomous agents capable of driving the entire research process—from ideation to experimentation and documentation.


This novel approach positions LLMs as active participants in the scientific process. It challenges the traditional view that AI can only mimic human thought processes without generating genuinely new ideas [2]. Instead, AI Scientist leverages the computational power and pattern recognition capabilities of LLMs to push the boundaries of what is possible in AI research.


Better still, if you want to use this tool to help write your own papers, you can download the code from GitHub [3] and run it yourself. It is written in Python and well documented, so getting started is straightforward.
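
As a rough illustration, the snippet below wraps the setup steps in Python's subprocess module (equivalent to running the same commands in a shell). The repository URL points to the official SakanaAI project; the launch_scientist.py entry point and its --model, --experiment, and --num-ideas flags are assumptions based on the project's README and may have changed, so treat them as illustrative rather than authoritative.

```python
# A minimal, hedged sketch: clone the repository, install its dependencies,
# and launch a small run. The entry-point script and its flags are assumptions
# based on the project's README and may differ in the current version.
import subprocess

subprocess.run(
    ["git", "clone", "https://github.com/SakanaAI/AI-Scientist.git"],
    check=True,
)
subprocess.run(
    ["pip", "install", "-r", "requirements.txt"],
    cwd="AI-Scientist",
    check=True,
)
subprocess.run(
    [
        "python", "launch_scientist.py",   # assumed entry point
        "--model", "gpt-4o-2024-05-13",    # any supported model
        "--experiment", "nanoGPT_lite",    # assumed template name
        "--num-ideas", "2",                # keep the first run small
    ],
    cwd="AI-Scientist",
    check=True,
)
```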


How AI Scientist Operates: A Multi-Stage Workflow


The AI Scientist workflow is a meticulously structured process that involves several stages, each designed to ensure that the output generated by the LLMs is both novel and scientifically rigorous. The workflow employs four prominent LLMs: Claude 3.5 Sonnet [7], GPT-4o [8], DeepSeek Coder [9], and Llama 3.1 405B [10]. These models are tasked with generating research in three key areas: diffusion image modeling, transformer-based language modeling, and "grokking", the phenomenon in which deep neural networks suddenly generalize long after fitting their training data, which the authors use to study generalization and learning speed.


The following chart summarizes the overall process.


The AI Scientist is an end-to-end LLM-driven process that generates, tests, and reviews scientific ideas autonomously, including coding, executing experiments, and summarizing results.

1. Idea Generation


The first step in the AI Scientist workflow involves prompting the LLM to generate "the next creative and impactful idea for research" within one of the specified categories. This prompt is intentionally broad to encourage the model to explore a wide range of possibilities. Once an idea is generated, the workflow calls an API that searches existing scientific papers to determine whether the idea is genuinely novel. If the model cannot ascertain the novelty of the idea, it is instructed to generate a search query to find related works. This iterative process of idea generation and validation continues until the LLM is confident that it has produced a novel concept.


This approach draws inspiration from evolutionary computation and open-ended research paradigms, where the generation of ideas is seen as an iterative process. Each idea generated by the AI Scientist includes a description, an experimental plan, and self-assessed scores on novelty, feasibility, and interest. By connecting the LLM with tools like the Semantic Scholar API[11], the AI Scientist can refine its ideas based on the current state of research, ensuring that its proposals are not only innovative but also grounded in existing knowledge.
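
To make this loop concrete, here is a minimal Python sketch of how such an ideation step could be wired together, using an OpenAI-style chat API and the public Semantic Scholar search endpoint. The prompt wording, the JSON schema, the choice of gpt-4o, and the very crude novelty heuristic are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch of the ideation loop: ask an LLM for a new research idea,
# then query the Semantic Scholar API to check whether similar work exists.
import json

import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
SEMANTIC_SCHOLAR_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def generate_idea(topic: str) -> dict:
    """Ask the model for an idea with an experiment plan and self-assessed scores."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Propose the next creative and impactful research idea in {topic}. "
                "Reply as JSON with keys: description, experiment_plan, "
                "novelty, feasibility, interest (scores 1-10), and search_query."
            ),
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def looks_novel(search_query: str, max_hits: int = 5) -> bool:
    """Very crude check: treat the idea as novel if the search returns no papers."""
    hits = requests.get(
        SEMANTIC_SCHOLAR_SEARCH,
        params={"query": search_query, "limit": max_hits, "fields": "title,year"},
        timeout=30,
    ).json().get("data", [])
    return len(hits) == 0

idea = None
for _ in range(5):  # iterate until a plausibly novel idea is found
    candidate = generate_idea("diffusion image modeling")
    if looks_novel(candidate["search_query"]):
        idea = candidate
        break
```

In the actual workflow, the model itself inspects the retrieved papers and judges novelty, rather than relying on a hard-coded rule like the one above.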


2. Experiment Design and Execution


Once a novel idea is confirmed, the workflow moves to the next stage, where the LLM is prompted to design a series of experiments to test the hypothesis. This is where Aider [4], an LLM-based coding assistant, comes into play. The LLM uses Aider to implement the experiments, analyze the results, and generate data visualizations. Starting from an existing, working code template helps ensure that the figures and results produced accurately represent the experimental outcomes.


The AI Scientist's approach to experimentation is robust, with the system designed to handle errors and unexpected outcomes by iteratively refining its code. This capability allows the AI Scientist to explore various avenues within its experimental framework, leading to potentially novel insights that might not have been discovered through a more linear approach. Moreover, the AI Scientist is capable of implementing entirely new plots and metrics that were not part of the initial templates, showcasing its ability to adapt and innovate within the experimental process.
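
The sketch below illustrates this execute-and-refine pattern in Python: run the generated experiment script, and if it crashes, feed the error output back to the model so the script can be revised and retried. In the real workflow the code editing is delegated to Aider [4]; the ask_assistant_to_fix helper, the --out_dir flag, and the retry limit shown here are hypothetical stand-ins.

```python
# Illustrative execute-and-refine loop (not the project's code).
import subprocess

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
MAX_ATTEMPTS = 4

def ask_assistant_to_fix(script_path: str, error_log: str) -> None:
    """Hypothetical helper: ask the model for a corrected script and write it
    back to disk. The real workflow delegates this editing step to Aider [4]."""
    source = open(script_path).read()
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "This experiment script failed. Return the full corrected "
                f"script and nothing else.\n\nScript:\n{source}\n\n"
                f"Error output:\n{error_log}"
            ),
        }],
    )
    with open(script_path, "w") as f:
        f.write(reply.choices[0].message.content)

def run_experiment(script_path: str) -> bool:
    """Run the experiment script; on failure, feed the error back and retry."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = subprocess.run(
            ["python", script_path, "--out_dir", f"run_{attempt}"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return True  # results, logs, and plots are now on disk
        ask_assistant_to_fix(script_path, result.stderr)
    return False
```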


3. Documentation and Paper Writing


The final stage involves synthesizing the findings into a cohesive research paper. The LLM is guided through this process by being prompted to write one section of the paper at a time, incorporating the experimental results, data visualizations, and related work citations. The prompts also incorporate paper-writing advice drawn from an existing guide [12]. The workflow then includes steps for refining the document: removing redundancies, reducing verbosity, and formatting the paper according to academic standards.
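
A minimal sketch of that section-by-section drafting, again assuming an OpenAI-style chat API, might look like this. The section list, prompt wording, and example topic are illustrative; the real workflow also passes in the generated figures, experiment logs, and retrieved citations, and edits a LaTeX template rather than assembling plain strings.

```python
# Illustrative section-by-section drafting loop.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
SECTIONS = ["Abstract", "Introduction", "Background", "Method",
            "Experimental Setup", "Results", "Conclusion"]

def draft_section(section: str, idea_description: str, results_summary: str) -> str:
    """Draft one section, given the idea and a summary of the experiments."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Write the {section} section of a research paper, in LaTeX, "
                f"about the following idea:\n{idea_description}\n\n"
                f"Experimental results:\n{results_summary}\n"
                "Be concise and do not repeat material from other sections."
            ),
        }],
    )
    return reply.choices[0].message.content

# Example topic and results summary are placeholders, not real outputs.
paper_body = "\n\n".join(
    draft_section(s, "adaptive noise schedules for diffusion models",
                  "(summary of metrics and plots from the experiment runs)")
    for s in SECTIONS
)
```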


This stage is particularly significant because it mirrors the traditional scientific publication process, which involves not just the presentation of results but also the articulation of ideas in a structured and coherent manner. By automating this process, the AI Scientist demonstrates the potential for LLMs to contribute meaningfully to the scientific discourse, producing papers that are not only innovative but also adhere to the rigorous standards of academic writing.


Results and Evaluation


The effectiveness of the AI Scientist workflow was rigorously evaluated by a GPT-4o-based agent named "The AI Scientist reviewer", designed around the review criteria for submissions to the Neural Information Processing Systems (NeurIPS) conference [5], one of the most prestigious venues for AI research. These criteria include an overall score ranging from 1 (very strong reject) to 10 (award quality: flawless and groundbreaking), along with a binary decision to accept or reject the paper.
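
Such a reviewer can be approximated with a single structured prompt. The sketch below assumes an OpenAI-style chat API; the rubric wording and the JSON fields are illustrative assumptions rather than the authors' exact reviewer prompt.

```python
# Illustrative automated reviewer returning a NeurIPS-style verdict as JSON.
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def review_paper(paper_text: str) -> dict:
    """Return an overall score (1-10) and an accept/reject decision."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "You are reviewing a submission to NeurIPS. Return JSON with "
                "keys: overall (integer 1-10, where 1 is a very strong reject "
                "and 10 is award quality), decision ('accept' or 'reject'), "
                "strengths, and weaknesses.\n\nPaper:\n" + paper_text
            ),
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(reply.choices[0].message.content)

# Example usage: review a LaTeX manuscript stored on disk.
verdict = review_paper(open("paper.tex").read())
print(verdict["overall"], verdict["decision"])
```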


The results of this evaluation were intriguing. Of the four LLMs involved, Claude 3.5 Sonnet emerged as the top performer, with its highest-scoring papers achieving a score of 6, which is considered a "weak accept" by NeurIPS standards. One of Claude's most notable achievements was in the domain of diffusion modeling, where the LLM not only identified a promising research direction but also proposed a comprehensive experimental plan and successfully executed it, yielding positive results. The authors provide an archive of Claude's output [6], so you can inspect it yourself.


Violin plots showing the distribution of scores generated by The AI Scientist reviewer for AI-generated papers across three domains and four foundation models. Scores on the y-axis refer to NeurIPS ratings, which range from 2 (Strong Reject) to 6 (Weak Accept).

GPT-4o, another prominent model in the workflow, ranked second, with its best paper receiving a score of 5, a "borderline accept." However, despite these successes, the overall performance of the LLMs was less than stellar. Across all models and categories, the average score was 4.05 or lower, with 4 being classified as a "borderline reject." This indicates that while the AI Scientist workflow has the potential to produce scientifically valid and innovative research, there are still significant challenges to be addressed.


One of the most pressing issues identified by the researchers was the frequent failure of the LLMs to fully implement their ideas. This often resulted in incomplete experiments or fabricated results, undermining the credibility of the generated papers. Additionally, the models sometimes failed to cite the most relevant existing research, further detracting from the scientific rigor of the output.


The Broader Implications


Despite the mixed results, the AI Scientist project represents a significant step forward in the development of agentic workflows in AI research. These workflows, which break down complex tasks into more manageable subtasks, are becoming an increasingly prominent theme in AI development. By dividing the process of conducting AI research into distinct stages—idea generation, experimentation, and documentation—LLMs can be guided through the scientific process in a way that mirrors the workflow of human researchers.


An illustration of an AI Agent. Source [13]

The potential applications of this approach are vast. For instance, agentic workflows could be used to accelerate the pace of research in fields that require large-scale data analysis, such as genomics or climate science. By automating the more routine aspects of the research process, scientists could focus their efforts on the more creative and interpretive aspects of their work, leading to faster and potentially more impactful discoveries.


Moreover, as LLMs continue to evolve and improve, the quality of the research they produce is likely to increase. The AI Scientist project highlights the importance of continuous refinement and iteration in the development of AI-driven research tools. By addressing the current limitations—such as the tendency to fabricate results or overlook key references—future iterations of the AI Scientist workflow could produce research that not only meets but exceeds the standards of leading academic conferences and journals.


Expanding the Scope


While the AI Scientist has shown promise in automating research in machine learning, its underlying framework could be applied to a broad range of scientific disciplines. The principles of ideation, experimentation, and documentation are not confined to AI research; they are the bedrock of scientific inquiry in fields as diverse as biology, chemistry, and physics. The modular nature of the AI Scientist's workflow allows it to be adapted to different research domains, provided the necessary tools and datasets are available for experimentation.


For example, in materials science, the AI Scientist could be used to explore new compounds with desirable properties by generating hypotheses about molecular structures, running simulations to test these hypotheses, and documenting the results in scientific papers. Similarly, in the field of genomics, the AI Scientist could assist in identifying new gene functions or interactions by analyzing large datasets, designing experiments, and publishing the findings.


The potential for this technology to democratize access to scientific research is profound. By lowering the barriers to entry for conducting high-quality research, the AI Scientist could empower researchers in under-resourced regions or institutions to contribute to the global scientific community. This democratization of research could lead to a more diverse and inclusive scientific landscape, where ideas and discoveries emerge from a wider array of perspectives.


Challenges and Ethical Considerations


While the potential benefits of the AI Scientist are significant, the technology also presents a number of challenges and ethical considerations that must be addressed. One of the primary concerns is the reliability and accuracy of the research produced by LLMs. As mentioned earlier, the AI Scientist has been known to fabricate results or overlook key references, which could lead to the dissemination of false or misleading information.


Check our other articles to learn more about interpretability in ML [14] and bias issues with LLMs [15].

To mitigate these risks, it is essential to develop more robust validation and verification processes for AI-generated research. This could involve integrating human oversight at critical stages of the workflow, such as during the evaluation of experimental results or the final review of the manuscript. Additionally, as the AI Scientist continues to evolve, it will be important to establish ethical guidelines for the use of AI in scientific research, particularly concerning issues such as data privacy, intellectual property, and the potential for bias in AI-generated content.


Another challenge lies in the potential impact of AI-generated research on the academic publishing ecosystem. As the AI Scientist becomes more capable of producing publishable papers, there is a risk that the volume of submissions to academic journals could increase dramatically, potentially overwhelming the peer review process and diluting the quality of published research. To address this, academic institutions and journals may need to develop new policies and standards for evaluating AI-generated research, ensuring that it meets the same rigorous criteria as human-authored work.


Conclusion


The AI Scientist project is a bold and ambitious attempt to harness the creative and analytical capabilities of large language models for scientific research. While the current results are mixed, the potential of agentic workflows to revolutionize the way research is conducted cannot be overstated. As AI technology continues to advance, we may soon see a future where AI-driven research is not just a novelty but a standard practice in scientific inquiry. By automating the research process, AI has the potential to democratize access to scientific knowledge, enabling researchers from around the world to contribute to the advancement of human understanding in ways that were previously unimaginable.


The road ahead is undoubtedly challenging, with many technical and ethical hurdles to overcome. However, the AI Scientist project has laid a solid foundation for future developments in this exciting field. As we move forward, the continued collaboration between AI researchers, computer scientists, and domain experts will be crucial in realizing the full potential of AI as a tool for scientific discovery. The journey has just begun, and the possibilities are truly limitless.


References


[1] Lu, C., Lu, C., Lange, R. T., Foerster, J., Clune, J., & Ha, D. (2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv preprint arXiv:2408.06292.


[2] Boden, M. A. (1998). Creativity and artificial intelligence. Artificial Intelligence, 103(1-2), 347-356.


[3] The AI Scientist code repository, SakanaAI, GitHub: https://github.com/SakanaAI/AI-Scientist
[4] Paul Gauthier. Aider, 2024. https://aider.chat


[5] Conference on Neural Information Processing Systems (NeurIPS). https://neurips.cc
[7] Claude 3.5 Sonnet, Anthropic


[8] GPT-4o, OpenAI


[9] DeepSeek Coder, DeepSeek


[10] Llama Team. The Llama 3 Herd of Models, 2024


[11] Semantic Scholar API, Allen Institute for AI, https://api.semanticscholar.org


[14] Interpretability in ML, Transcendent AI


[15] Biases in LLMs, Transcendent AI, 2024
