Prompt Optimisation for Black-Box LLMs

Pretrained language models (PLMs) have revolutionized natural language processing (NLP), demonstrating remarkable capabilities, particularly in low-resource scenarios like few-shot learning. However, fine-tuning large PLMs can be computationally expensive. An efficient solution to this challenge is prompt tuning, a technique that biases the behavior of a pretrained model using a small number of tunable variables, typically inserted as soft (continuous) or hard (discrete) prompts. In particular, derivative-free optimization (DFO) techniques have gained attention as a means to optimize prompts for black-box PLMs without needing access to their internal parameters.
However, existing DFO-based methods come with preconditions, such as the availability of auxiliary models or predefined manual prompts. To overcome these limitations, we introduce a novel approach based on a genetic algorithm (GA) that evolves discrete prompts from scratch, called Genetic Algorithm for Predictive Probability Guided Prompting (GAP3).
The GAP3 Approach
GAP3 aims to eliminate the preconditions required by previous DFO-based methods, such as extra APIs or manual prompts. The main idea behind GAP3 is to start from an empty prompt and evolve it with a genetic algorithm to better fit a downstream task. The evolution is guided by predictive probabilities derived from the language model itself, which are used to evaluate the fitness of prompt candidates.
Here’s how GAP3 works:
- Initial Population: GAP3 begins with an empty prompt template and generates an initial population of prompt candidates through random mutations.
- Fitness Evaluation: Each individual (prompt) is evaluated based on its performance on a few-shot training set using the language model's predictive probabilities.
- Crossover and Mutation: A genetic algorithm is applied, where the top-performing prompts (elite individuals) are selected for crossover and mutation. During mutation, a mask token is inserted, and the language model is used to predict the best token to replace the masked token.
- Evolution: This process repeats over multiple iterations, evolving the prompt until a highly effective one is found for the downstream task (a minimal sketch of the whole loop follows this list).
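To make these steps concrete, here is a minimal, illustrative sketch of such a fitness-guided evolution loop with a HuggingFace masked LM (roberta-large). It is not the authors' implementation: the sentiment template, the "great"/"terrible" verbalizers, the tiny few-shot set, the population and iteration sizes, and the helper names (`fitness`, `mutate`, `crossover`) are all assumptions made purely for illustration.

```python
# Illustrative sketch of a predictive-probability-guided genetic loop (not the authors' code).
# Assumptions: roberta-large backbone, a toy sentiment task with the verbalizers
# " great"/" terrible", a two-example few-shot set, and small population/iteration counts.
import random
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large").eval()

VERBALIZERS = {"positive": " great", "negative": " terrible"}   # hypothetical label words
FEWSHOT = [("a gripping, beautifully shot film.", "positive"),  # hypothetical few-shot set
           ("a dull and lifeless sequel.", "negative")]

def label_logprob(prompt_tokens, text, label):
    """Predictive-probability signal: log P(verbalizer) at the mask position of the template."""
    template = f"{text} {' '.join(prompt_tokens)} It was {tokenizer.mask_token}."
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    label_id = tokenizer(VERBALIZERS[label], add_special_tokens=False).input_ids[0]
    return torch.log_softmax(logits, dim=-1)[label_id].item()

def fitness(prompt_tokens):
    """Average log-probability of the gold label over the few-shot training set."""
    return sum(label_logprob(prompt_tokens, x, y) for x, y in FEWSHOT) / len(FEWSHOT)

def mutate(prompt_tokens):
    """Insert a mask at a random position and let the LM itself propose the new token."""
    pos = random.randint(0, len(prompt_tokens))
    candidate = prompt_tokens[:pos] + [tokenizer.mask_token] + prompt_tokens[pos:]
    inputs = tokenizer(" ".join(candidate), return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    new_token = tokenizer.decode([int(logits.argmax())]).strip()
    return prompt_tokens[:pos] + [new_token] + prompt_tokens[pos:]

def crossover(parent_a, parent_b):
    """Single-point crossover of two parent prompts."""
    if not parent_a or not parent_b:
        return list(parent_a or parent_b)
    return parent_a[:random.randint(0, len(parent_a))] + parent_b[random.randint(0, len(parent_b)):]

# Evolution: start from the empty prompt, then iterate selection, crossover, and mutation.
population = [mutate([]) for _ in range(8)]
for generation in range(10):
    elites = sorted(population, key=fitness, reverse=True)[:2]   # keep the best prompts
    children = [mutate(crossover(*random.sample(elites, 2))) for _ in range(6)]
    population = elites + children

best = max(population, key=fitness)
print("evolved prompt:", " ".join(best))
```

The point of the sketch is only that the model's own predictive probabilities can serve both as the fitness signal and as the mutation proposal, with no gradients, no auxiliary models, and no manually written starting prompt; the actual selection, crossover, and mutation details in GAP3 differ.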
The key benefits of GAP3 include:
- No Predefined Manual Prompts: Unlike previous methods that require manually designed prompts, GAP3 evolves them from scratch.
- No Extra APIs: GAP3 works directly with discrete tokens, eliminating the need for API calls to inject continuous embeddings into the model.
- Model Independence: GAP3 can be used with various PLMs, including those that provide predictive probabilities for masked tokens, such as masked language models (MLMs) or encoder-decoder models like T5.
Experimental Results
We conducted experiments on 7 benchmark datasets with two different backbones: RoBERTa (LARGE) and GPT-2 (LARGE). The results demonstrate that GAP3 outperforms existing DFO-based methods, including BBT, GPS, and GRIPS, with significant improvements in accuracy and F1-score. Specifically, GAP3 achieves at least a 2.9% improvement in average performance for RoBERTa (LARGE) and 2.4% for GPT-2 (LARGE), compared to previous methods.
Furthermore, GAP3 performs competitively even when compared to gradient-based methods like Prompt Tuning (PT) and Full-Model Fine-Tuning (FT), achieving results close to PT for GPT-2 (LARGE) and surpassing it for RoBERTa (LARGE).
Advantages of GAP3
- Precondition-Free: GAP3 removes the need for manual prompt design and additional APIs for injecting vectors into the model, making it a fully automated approach.
- Cost-Effective: GAP3 is more computationally efficient than other methods that rely on auxiliary models, making it a viable option for real-world, large-scale applications.
- General Applicability: GAP3 can be applied to a wide range of PLMs, whether they are causal or masked language models, ensuring its broad usability (a short scoring sketch for a causal LM follows this list).
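To illustrate the last point, the snippet below sketches how the same predictive-probability fitness signal can be read from a causal LM such as GPT-2, using the next-token distribution instead of a mask position. The template, the verbalizer, and the function name are assumptions made for illustration, not the paper's exact setup.

```python
# Illustrative sketch only: the same fitness signal from a causal LM (GPT-2 large),
# read from the next-token distribution rather than from a mask position.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

gpt2_tok = AutoTokenizer.from_pretrained("gpt2-large")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()

def causal_label_logprob(prompt_tokens, text, verbalizer=" great"):
    """Log-probability of a (hypothetical) label word as the continuation of the template."""
    context = f"{text} {' '.join(prompt_tokens)} It was"
    inputs = gpt2_tok(context, return_tensors="pt")
    with torch.no_grad():
        logits = gpt2(**inputs).logits[0, -1]          # distribution over the next token
    label_id = gpt2_tok(verbalizer, add_special_tokens=False).input_ids[0]
    return torch.log_softmax(logits, dim=-1)[label_id].item()

print(causal_label_logprob(["in", "short,"], "a gripping, beautifully shot film."))
```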
Conclusion
GAP3 offers a significant advancement in the field of prompt tuning for language models. By leveraging a genetic algorithm, we can evolve high-performing prompts without relying on additional model dependencies or predefined manual prompts. This approach not only simplifies the prompt tuning process but also improves the efficiency and effectiveness of large-scale PLMs in downstream tasks.
We believe that GAP3 has great potential for further advancements in the automation of prompt tuning and could be a valuable tool for anyone working with large-scale language models in real-world applications.
The full article on GAP3: https://www.ijcai.org/proceedings/2023/0588.pdf
Learn more about GAP3: GAP3 Code and Materials
@inproceedings{10.24963/ijcai.2023/588,
author = {Zhao, Jiangjiang and Wang, Zhuoran and Yang, Fangchun},
title = {Genetic Prompt Search via Exploiting Language Model Probabilities},
year = {2023},
booktitle = {Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence},
pages = {5296--5305},
location = {Macao, P.R.China}
}