About Prompt
- Prompt Type – Dynamic
- Prompt Platform – ChatGPT, Grok, Deepseek, Gemini, Copilot, Midjourney, Meta AI and more
- Niche – Machine Learning
- Language – English
- Category – Training
- Prompt Title – AI Model Trainer Prompt
Prompt Details
A filled-out example for a sentiment analysis task follows the template to show how it is used in practice.
---
### **Optimized Dynamic AI Prompt Template for ML Training Data Generation**
This template is designed to instruct an AI to generate high-quality, structured training data for a specified machine learning task. Fill in the bracketed `[Placeholder]` variables to customize it for your specific needs.
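Filling the placeholders by hand works, but a small script makes it harder to leave one behind. The following is a minimal sketch, not part of the template itself: `fill_template` and `PLACEHOLDER` are hypothetical names, and it assumes every placeholder is written in upper case with underscores, as in the template below.

```python
import re

# Matches upper-case placeholders such as [ML_TASK_TYPE]. Markers containing
# spaces or lower-case letters (e.g. [START OF PROMPT]) are left untouched.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [PLACEHOLDER] with its value; fail if any value is missing."""
    missing = sorted({n for n in PLACEHOLDER.findall(template) if n not in values})
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    # A function replacement avoids backslash-escape surprises in the values.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Illustrative snippet only; in practice, pass the full template text.
snippet = "The task is: **`[ML_TASK_TYPE]`**. Generate `[DESIRED_OUTPUT_COUNT]` instances."
filled = fill_template(
    snippet,
    {"ML_TASK_TYPE": "Three-Class Sentiment Analysis", "DESIRED_OUTPUT_COUNT": "10"},
)
print(filled)
```

Raising on a missing value is a deliberate choice here: a prompt that still contains a literal `[DOMAIN_CONTEXT]` tends to produce generic data without any obvious error.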
---
**[START OF PROMPT]**
**1. ### ROLE & GOAL ###**
You are an expert-level AI Data Scientist and a specialized Synthetic Data Generator. Your primary goal is to create diverse, accurate, and structured training data for a machine learning model. You must adhere strictly to the specified format, rules, and examples provided. Your output will be used directly to train a production-level AI, so precision and quality are paramount.
**2. ### CORE TASK DEFINITION ###**
The machine learning task for which you are generating data is: **`[ML_TASK_TYPE]`**.
*(Example: “Named Entity Recognition (NER)”, “Multi-class Text Classification”, “Question Answering”, “Text Summarization”, “Instruction Following Fine-Tuning”)*
**3. ### DOMAIN CONTEXT ###**
The data must be highly relevant to the following domain: **`[DOMAIN_CONTEXT]`**.
*(Example: “Customer support tickets for a SaaS company specializing in project management tools”, “Clinical trial notes for oncology research”, “Financial news headlines related to the US stock market”, “User reviews for mobile gaming apps”)*
All generated text, entities, and scenarios should be plausible and consistent within this specific domain.
**4. ### INPUT & OUTPUT SPECIFICATION ###**
You will generate a set of `[DESIRED_OUTPUT_COUNT]` unique data instances. Each instance must strictly conform to the following input-output structure.
**4.1. Input Data Schema:**
The “input” part of the data represents the information that the future ML model will receive. The key input variable is:
* **`[INPUT_VARIABLE_NAME]`**: `[DESCRIPTION_OF_INPUT_VARIABLE]`
*(Example: `review_text`: A string containing a user’s review of a product.)*
**4.2. Output Data Schema & Formatting:**
The “output” part is the corresponding label or target that the ML model must learn to predict. The required output format is **`[OUTPUT_FORMAT]`** (e.g., JSON, JSONL, CSV, XML).
The schema for each data instance must be as follows:
```[OUTPUT_FORMAT_SCHEMA]
{
  "instance_id": "string (unique identifier, e.g., 'train_001')",
  "input": {
    "[INPUT_VARIABLE_NAME]": "[Data type, e.g., string, integer]"
  },
  "output": {
    "[OUTPUT_VARIABLE_1_NAME]": "[Data type and description]",
    "[OUTPUT_VARIABLE_2_NAME]": "[Data type and description]",
    …
  }
}
```
*(Example for a classification task):*
```json
{
  "instance_id": "string",
  "input": {
    "text": "string"
  },
  "output": {
    "label": "string (must be one of [CLASS_LABELS])"
  }
}
```
*(Example for an NER task):*
```json
{
  "instance_id": "string",
  "input": {
    "sentence": "string"
  },
  "output": {
    "entities": [
      {
        "text": "string (the extracted entity)",
        "label": "string (the entity type)",
        "start_char": "integer",
        "end_char": "integer"
      }
    ]
  }
}
```
**5. ### GENERATION RULES & CONSTRAINTS ###**
Adhere to these rules without exception:
* **Diversity:** Generate a wide variety of examples. Cover common cases, edge cases, and nuanced scenarios. Avoid repetition.
* **Realism:** The data should mimic real-world examples from the specified domain. Use appropriate terminology, tone, and complexity.
* **Bias Mitigation:** **`[BIAS_MITIGATION_RULES]`**. *(Example: “Ensure a balanced representation of all specified class labels. Do not generate text that relies on gender, racial, or cultural stereotypes. The tone should be varied, from formal to informal.”)*
* **Negative Examples:** **`[NEGATIVE_EXAMPLES_REQUIREMENT]`**. *(Example: “If applicable, include examples where no entities are present or the text belongs to a ‘neutral’ or ‘out-of-scope’ class.”)*
* **Data Integrity:** Ensure all generated data is internally consistent. For example, in an NER task, the `start_char` and `end_char` must correctly map to the `text` of the entity within the source sentence.
* **Complexity Level:** The complexity of the generated text should be **`[COMPLEXITY_LEVEL]`** (e.g., simple, intermediate, expert-level, mixed).
**6. ### FEW-SHOT EXAMPLES (IN-CONTEXT LEARNING) ###**
Here are a few high-quality examples of the desired input/output format. Emulate the style, structure, and quality of these examples precisely.
**Example 1:**
```[OUTPUT_FORMAT]
[EXAMPLE_1_FULL_OUTPUT]
```
**Example 2:**
```[OUTPUT_FORMAT]
[EXAMPLE_2_FULL_OUTPUT]
```
**(Optional) Example 3:**
```[OUTPUT_FORMAT]
[EXAMPLE_3_FULL_OUTPUT]
```
**7. ### FINAL INSTRUCTION ###**
Based on all the provided rules, context, and examples, generate **`[DESIRED_OUTPUT_COUNT]`** new, unique, and high-quality training data instances. Present the output as a single block of code in the specified **`[OUTPUT_FORMAT]`** format. Do not include any commentary or explanations outside of the generated data itself.
**[END OF PROMPT]**
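The Data Integrity rule in section 5 asks that `start_char` and `end_char` map onto the entity `text`, and generated offsets are worth checking before training. The sketch below is one way to do that for instances shaped like the NER schema in section 4.2. It assumes `end_char` is exclusive (Python slice convention), which the template does not specify; `check_entity_spans` and the sample sentence are made up for illustration.

```python
def check_entity_spans(instance: dict) -> list[str]:
    """Return a list of problems; an empty list means every span matches its text."""
    sentence = instance["input"]["sentence"]
    problems = []
    for ent in instance["output"]["entities"]:
        start, end = ent["start_char"], ent["end_char"]
        # Assumption: end_char is exclusive, so sentence[start:end] is the entity.
        if not (0 <= start < end <= len(sentence)):
            problems.append(f"{ent['text']!r}: offsets {start}-{end} are out of range")
        elif sentence[start:end] != ent["text"]:
            problems.append(
                f"{ent['text']!r}: offsets {start}-{end} point at {sentence[start:end]!r}"
            )
    return problems


# Made-up instance in the support-ticket domain used as an example above.
good = {
    "instance_id": "train_001",
    "input": {"sentence": "TaskFlow outage reported by Maria Lopez."},
    "output": {
        "entities": [
            {"text": "TaskFlow", "label": "PRODUCT", "start_char": 0, "end_char": 8},
            {"text": "Maria Lopez", "label": "PERSON", "start_char": 28, "end_char": 39},
        ]
    },
}
print(check_entity_spans(good))  # → []
```

If the generator was told to use inclusive end offsets, the slice would need to be `sentence[start:end + 1]` instead.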
---
### **Example Prompt in Practice**
Here is the above template filled out for generating training data for a **three-class sentiment analysis model** for a fictional mobile app.
---
**[START OF PROMPT]**
**1. ### ROLE & GOAL ###**
You are an expert-level AI Data Scientist and a specialized Synthetic Data Generator. Your primary goal is to create diverse, accurate, and structured training data for a machine learning model. You must adhere strictly to the specified format, rules, and examples provided. Your output will be used directly to train a production-level AI, so precision and quality are paramount.
**2. ### CORE TASK DEFINITION ###**
The machine learning task for which you are generating data is: **Three-Class Sentiment Analysis**.
**3. ### DOMAIN CONTEXT ###**
The data must be highly relevant to the following domain: **User-generated reviews for a fictional social networking mobile app called “ConnectSphere”.** The app’s features include photo sharing, direct messaging, and a news feed.
All generated text should be plausible and consistent with typical app reviews, ranging from bug reports and feature requests to general feedback.
**4. ### INPUT & OUTPUT SPECIFICATION ###**
You will generate a set of **10** unique data instances. Each instance must strictly conform to the following input-output structure.
**4.1. Input Data Schema:**
The “input” part of the data represents the information that the future ML model will receive. The key input variable is:
* **`review_text`**: A string of 20-100 words containing a user’s review of the “ConnectSphere” app.
**4.2. Output Data Schema & Formatting:**
The “output” part is the corresponding label that the ML model must learn to predict. The required output format is **JSONL (JSON Lines)**, where each line is a valid JSON object.
The schema for each data instance must be as follows:
```json
{
  "instance_id": "string (unique identifier, e.g., 'cs_rev_001')",
  "input": {
    "review_text": "string"
  },
  "output": {
    "sentiment": "string (must be one of 'Positive', 'Negative', 'Neutral')"
  }
}
```
**5. ### GENERATION RULES & CONSTRAINTS ###**
Adhere to these rules without exception:
* **Diversity:** Generate a mix of reviews. Include comments about app performance (speed, crashes), UI/UX design, specific features (messaging, photo filters), and customer support.
* **Realism:** Use informal language, including slang, typos, and varying grammar/punctuation, as is common in real app reviews.
* **Bias Mitigation:** Ensure a roughly balanced representation of all three sentiment labels (Positive, Negative, Neutral). Do not generate text that contains offensive language or personal attacks.
* **Negative Examples:** “Neutral” reviews should be objective, such as simple feature requests or questions, without strong positive or negative emotion (e.g., a question about how to turn off notifications for likes).
* **Data Integrity:** Ensure the `sentiment` label accurately reflects the emotional tone of the `review_text`.
* **Complexity Level:** The complexity of the generated text should be **mixed**, from short, simple sentences to more detailed paragraphs.
**6. ### FEW-SHOT EXAMPLES (IN-CONTEXT LEARNING) ###**
Here are a few high-quality examples of the desired input/output format. Emulate the style, structure, and quality of these examples precisely.
**Example 1:**
```json
{"instance_id": "cs_rev_001", "input": {"review_text": "The latest update is amazing! The new photo filters are so much fun to use and the app feels way faster now. Keep up the great work, ConnectSphere team!"}, "output": {"sentiment": "Positive"}}
```
**Example 2:**
```json
{"instance_id": "cs_rev_002", "input": {"review_text": "Ever since the last update, the app crashes every time I try to upload a video. It's become completely unusable. Please fix this bug ASAP. I'm on Android 12."}, "output": {"sentiment": "Negative"}}
```
**Example 3:**
```json
{"instance_id": "cs_rev_003", "input": {"review_text": "Is there a way to sort the news feed chronologically instead of by the algorithm? I can't find the setting anywhere in the app."}, "output": {"sentiment": "Neutral"}}
```
**7. ### FINAL INSTRUCTION ###**
Based on all the provided rules, context, and examples, generate **10** new, unique, and high-quality training data instances. Present the output as a single block of code in the specified **JSONL** format. Do not include any commentary or explanations outside of the generated data itself.
**[END OF PROMPT]**
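Because the prompt asks for JSONL with a fixed schema and roughly balanced labels, the model's reply can be checked mechanically before it goes anywhere near a training run. The following is a rough sketch under that assumption: `parse_reviews`, `label_counts`, and the two sample lines are illustrative and not real model output.

```python
import json
from collections import Counter

ALLOWED_SENTIMENTS = {"Positive", "Negative", "Neutral"}


def parse_reviews(jsonl_text: str) -> list[dict]:
    """Parse JSONL output and check each record against the schema in section 4.2."""
    records, seen_ids = [], set()
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        instance_id = record["instance_id"]
        review = record["input"]["review_text"]
        sentiment = record["output"]["sentiment"]
        if instance_id in seen_ids:
            raise ValueError(f"line {line_no}: duplicate instance_id {instance_id!r}")
        if not isinstance(review, str) or not review.strip():
            raise ValueError(f"line {line_no}: review_text must be a non-empty string")
        if sentiment not in ALLOWED_SENTIMENTS:
            raise ValueError(f"line {line_no}: unexpected sentiment {sentiment!r}")
        seen_ids.add(instance_id)
        records.append(record)
    return records


def label_counts(records: list[dict]) -> Counter:
    """Count instances per sentiment label to eyeball class balance."""
    return Counter(r["output"]["sentiment"] for r in records)


# Made-up lines standing in for a model reply.
sample_output = "\n".join([
    '{"instance_id": "cs_rev_004", "input": {"review_text": "Love the new filters, app feels faster."}, "output": {"sentiment": "Positive"}}',
    '{"instance_id": "cs_rev_005", "input": {"review_text": "Crashes on every upload since the update."}, "output": {"sentiment": "Negative"}}',
])
records = parse_reviews(sample_output)
print(label_counts(records))
```

A count that is heavily skewed toward one label suggests rerunning the prompt or tightening the Bias Mitigation rule rather than training on the batch as is.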