
Generative AI in literature research

General overview of generative AI


Although artificial intelligence (AI) has been around for a long time and has already found its way into scientific information retrieval and writing support, the emergence of generative AI (GenAI) based on large language models (LLMs) has given application development a major boost.

Generative AI produces new content such as text, images, video, music, and other types of data. It does this with models that are pre-trained on large datasets to learn patterns and connections, which they then use to generate new content in response to an initial prompt. Further supervised fine-tuning on example conversations turns such a model into an assistant, and reinforcement learning is then used to add reasoning capabilities. The models are context-aware, so providing context within a prompt will produce a more fitting response.

GenAI is very good at communicating in natural language, which makes these tools very accessible and convincing. They can also handle large amounts of data and generate summaries of it.

Bias in GenAI systems

As LLMs are trained on large amounts of human-generated text (or images, music, etc.), the predominant views in that material, and hence its biases, can become part of the LLM. Bias can also arise when the data sources that are selected or available for training an LLM are themselves biased.

Hallucinations in GenAI

When GenAI produces a response, that response is generated from associations and predictions. Because this process is not grounded in established knowledge, the result can sometimes be a very convincing but wrong answer. As this phenomenon is inherent to the architecture of GenAI, all systems using LLMs suffer from it. Although measures are in place to reduce the effect as much as possible, it cannot be fully prevented. It is very difficult to tell when a model is hallucinating and when it is not, which is why knowledge of the topic is so important in these situations.

GenAI robustness

Because a response is generated anew each time the system is queried, the same or a similar prompt can produce different responses when it is submitted at different times.
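To illustrate why repeated prompts can give different answers, here is a toy sketch in Python. It assumes, as is common in chat tools (although settings vary per platform), that each next word is sampled from a probability distribution rather than always being the single most likely word. The words and scores in NEXT_WORD_SCORES are invented for illustration, and softmax and sample_next_word are hypothetical helper names. No real language model is involved.

```python
import math
import random

# Invented scores a model might assign to possible next words after a prompt
# such as "The results of this study are ..." (illustration only).
NEXT_WORD_SCORES = {"significant": 2.0, "promising": 1.5, "inconclusive": 1.0, "robust": 0.5}


def softmax(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    exps = {word: math.exp(score / temperature) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def sample_next_word(scores, temperature=1.0, rng=random):
    """Draw one word at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run can differ, like repeated chat queries
    for attempt in range(5):
        print(attempt + 1, sample_next_word(NEXT_WORD_SCORES, temperature=1.0, rng=rng))
```

Running the script several times will usually print different words, even though the "prompt" and scores never change. Lowering the temperature makes the most likely word dominate. Fixing the random seed (for example random.Random(42)) makes the output repeatable.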


Privacy

Many GenAI platforms require a registered account and may also use the information users provide for further training. When working with sensitive data, it is therefore essential to be very careful about uploading information to a chat platform. The University of Amsterdam provides more information on its specific policy: https://www.uva.nl/en/about-the-uva/about-the-university/ai/ai-policy/ai-policy.html

Best Practices: Do's and Don'ts

Do's

  • Validate Information: Cross-check outputs with primary scientific articles to avoid relying on hallucinated or incorrect data.
  • Disclose AI Use: When applicable, acknowledge generative AI in your methodology.
  • Use Secure Platforms: Avoid sharing sensitive or unpublished data on publicly accessible AI systems.

Don'ts

  • Over-rely on LLMs: These tools associate information but lack formalized scientific knowledge.
  • Bypass Peer Review Standards: Do not upload confidential manuscripts to LLMs during the review process.
  • Create Figures with AI: Generative AI should not replace original visualizations or artwork.


Limitations

Understanding the Limitations

While LLMs offer summaries, their methods for querying records remain opaque, and hallucinations are a known issue. Always review original articles and consider the feasibility of suggestions.

LLMs and "Knowledge"

Unlike humans, LLMs operate through associative learning, synthesizing patterns from data. They lack true understanding and depend on users for critical thinking.

Ethics

Ethical Considerations

Researchers should:

By balancing innovation with responsibility, LLMs can revolutionize the research process, making it more efficient and accessible.

E-module: AI literacy

The UvA has developed a short online AI literacy course to help you understand:

  • what Generative AI is
  • how it works
  • what its limitations are
  • how to use it responsibly during your studies

This interactive e-module consists of texts and short quizzes and takes around 45 to 60 minutes to complete.