AI Model Trainer Prompt

About Prompt

  • Prompt Type – Dynamic
  • Prompt Platform – ChatGPT, Grok, DeepSeek, Gemini, Copilot, Midjourney, Meta AI and more
  • Niche – Machine Learning
  • Language – English
  • Category – Training
  • Prompt Title – AI Model Trainer Prompt

Prompt Details

Here is a dynamic AI prompt template designed for an AI Model Trainer in the Machine Learning niche. It is structured to work across the major AI platforms and follows common prompt-engineering practices.

Following the template, you will find a specific, filled-out example for a sentiment analysis task to demonstrate its practical application.

### **Optimized Dynamic AI Prompt Template for ML Training Data Generation**

This template is designed to instruct an AI to generate high-quality, structured training data for a specified machine learning task. Fill in the bracketed `[Placeholder]` variables to customize it for your specific needs.
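As a minimal sketch of how the placeholders might be filled programmatically, the Python helper below substitutes each bracketed variable and raises an error if one is left without a value. The function name `fill_template`, the regular expression, and the shortened template string are illustrative assumptions, not part of the template itself.

```python
import re

# Matches upper-case placeholders such as [ML_TASK_TYPE]. Markers that contain
# spaces or lower-case letters, such as [START OF PROMPT], are left untouched.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [PLACEHOLDER] with its value; fail if any has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"no value supplied for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Shortened stand-in for the full template text (hypothetical).
template = "The task is: [ML_TASK_TYPE]. Generate [DESIRED_OUTPUT_COUNT] instances."
prompt = fill_template(
    template,
    {"ML_TASK_TYPE": "Three-Class Sentiment Analysis", "DESIRED_OUTPUT_COUNT": "10"},
)
print(prompt)
```

Failing on a missing value, rather than leaving the bracket in place, keeps a half-filled prompt from being sent to the model unnoticed.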

**[START OF PROMPT]**

**1. ### ROLE & GOAL ###**
You are an expert-level AI Data Scientist and a specialized Synthetic Data Generator. Your primary goal is to create diverse, accurate, and structured training data for a machine learning model. You must adhere strictly to the specified format, rules, and examples provided. Your output will be used directly to train a production-level AI, so precision and quality are paramount.

**2. ### CORE TASK DEFINITION ###**
The machine learning task for which you are generating data is: **`[ML_TASK_TYPE]`**.
*(Example: “Named Entity Recognition (NER)”, “Multi-class Text Classification”, “Question Answering”, “Text Summarization”, “Instruction Following Fine-Tuning”)*

**3. ### DOMAIN CONTEXT ###**
The data must be highly relevant to the following domain: **`[DOMAIN_CONTEXT]`**.
*(Example: “Customer support tickets for a SaaS company specializing in project management tools”, “Clinical trial notes for oncology research”, “Financial news headlines related to the US stock market”, “User reviews for mobile gaming apps”)*

All generated text, entities, and scenarios should be plausible and consistent within this specific domain.

**4. ### INPUT & OUTPUT SPECIFICATION ###**
You will generate a set of `[DESIRED_OUTPUT_COUNT]` unique data instances. Each instance must strictly conform to the following input-output structure.

**4.1. Input Data Schema:**
The “input” part of the data represents the information that the future ML model will receive. The key input variable is:
* **`[INPUT_VARIABLE_NAME]`**: `[DESCRIPTION_OF_INPUT_VARIABLE]`
*(Example: `review_text`: A string containing a user’s review of a product.)*

**4.2. Output Data Schema & Formatting:**
The “output” part is the corresponding label or target that the ML model must learn to predict. The required output format is **`[OUTPUT_FORMAT]`** (e.g., JSON, JSONL, CSV, XML).

The schema for each data instance must be as follows:
```[OUTPUT_FORMAT_SCHEMA]
{
  "instance_id": "string (unique identifier, e.g., 'train_001')",
  "input": {
    "[INPUT_VARIABLE_NAME]": "[Data type, e.g., string, integer]"
  },
  "output": {
    "[OUTPUT_VARIABLE_1_NAME]": "[Data type and description]",
    "[OUTPUT_VARIABLE_2_NAME]": "[Data type and description]",

  }
}
```
*(Example for a classification task):*
```json
{
  "instance_id": "string",
  "input": {
    "text": "string"
  },
  "output": {
    "label": "string (must be one of [CLASS_LABELS])"
  }
}
```
*(Example for an NER task):*
```json
{
  "instance_id": "string",
  "input": {
    "sentence": "string"
  },
  "output": {
    "entities": [
      {
        "text": "string (the extracted entity)",
        "label": "string (the entity type)",
        "start_char": "integer",
        "end_char": "integer"
      }
    ]
  }
}
```

**5. ### GENERATION RULES & CONSTRAINTS ###**
Adhere to these rules without exception:

* **Diversity:** Generate a wide variety of examples. Cover common cases, edge cases, and nuanced scenarios. Avoid repetition.
* **Realism:** The data should mimic real-world examples from the specified domain. Use appropriate terminology, tone, and complexity.
* **Bias Mitigation:** **`[BIAS_MITIGATION_RULES]`**. *(Example: “Ensure a balanced representation of all specified class labels. Do not generate text that relies on gender, racial, or cultural stereotypes. The tone should be varied, from formal to informal.”)*
* **Negative Examples:** **`[NEGATIVE_EXAMPLES_REQUIREMENT]`**. *(Example: “If applicable, include examples where no entities are present or the text belongs to a ‘neutral’ or ‘out-of-scope’ class.”)*
* **Data Integrity:** Ensure all generated data is internally consistent. For example, in an NER task, the `start_char` and `end_char` must correctly map to the `text` of the entity within the source sentence.
* **Complexity Level:** The complexity of the generated text should be **`[COMPLEXITY_LEVEL]`** (e.g., simple, intermediate, expert-level, mixed).

**6. ### FEW-SHOT EXAMPLES (IN-CONTEXT LEARNING) ###**
Here are a few high-quality examples of the desired input/output format. Emulate the style, structure, and quality of these examples precisely.

**Example 1:**
```[OUTPUT_FORMAT]
[EXAMPLE_1_FULL_OUTPUT]
```

**Example 2:**
```[OUTPUT_FORMAT]
[EXAMPLE_2_FULL_OUTPUT]
```

**(Optional) Example 3:**
```[OUTPUT_FORMAT]
[EXAMPLE_3_FULL_OUTPUT]
```

**7. ### FINAL INSTRUCTION ###**
Based on all the provided rules, context, and examples, generate **`[DESIRED_OUTPUT_COUNT]`** new, unique, and high-quality training data instances. Present the output as a single block of code in the specified **`[OUTPUT_FORMAT]`** format. Do not include any commentary or explanations outside of the generated data itself.

**[END OF PROMPT]**
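The Data Integrity rule in section 5 of the template can be checked mechanically once the model returns NER data. The sketch below assumes that `end_char` is exclusive (the template does not say whether it is inclusive or exclusive), and the function name `span_errors`, the sample sentence, and the `ORG` and `DATE` labels are hypothetical.

```python
def span_errors(instance: dict) -> list[str]:
    """Return one message per entity whose offsets do not reproduce its text."""
    sentence = instance["input"]["sentence"]
    errors = []
    for entity in instance["output"]["entities"]:
        start, end = entity["start_char"], entity["end_char"]
        # Assumption: end_char is exclusive, so sentence[start:end] is the entity.
        found = sentence[start:end]
        if found != entity["text"]:
            errors.append(
                f"{instance['instance_id']}: offsets {start}-{end} give "
                f"{found!r}, expected {entity['text']!r}"
            )
    return errors


# Hypothetical instance following the NER schema from section 4.2.
good = {
    "instance_id": "train_001",
    "input": {"sentence": "Acme Corp opened a ticket on Monday."},
    "output": {
        "entities": [
            {"text": "Acme Corp", "label": "ORG", "start_char": 0, "end_char": 9},
            {"text": "Monday", "label": "DATE", "start_char": 29, "end_char": 35},
        ]
    },
}

# Same sentence with the second span shifted by one character.
bad = {
    "instance_id": "train_002",
    "input": {"sentence": "Acme Corp opened a ticket on Monday."},
    "output": {
        "entities": [
            {"text": "Monday", "label": "DATE", "start_char": 28, "end_char": 34},
        ]
    },
}

for message in span_errors(good) + span_errors(bad):
    print(message)
```

Language models often miscount character offsets, so a check of this kind is worth running before any generated NER instance is added to a training set.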


### **Example Prompt in Practice**

Here is the above template filled out for generating training data for a **three-class sentiment analysis model** for a fictional mobile app.

**[START OF PROMPT]**

**1. ### ROLE & GOAL ###**
You are an expert-level AI Data Scientist and a specialized Synthetic Data Generator. Your primary goal is to create diverse, accurate, and structured training data for a machine learning model. You must adhere strictly to the specified format, rules, and examples provided. Your output will be used directly to train a production-level AI, so precision and quality are paramount.

**2. ### CORE TASK DEFINITION ###**
The machine learning task for which you are generating data is: **Three-Class Sentiment Analysis**.

**3. ### DOMAIN CONTEXT ###**
The data must be highly relevant to the following domain: **User-generated reviews for a fictional social networking mobile app called “ConnectSphere”.** The app’s features include photo sharing, direct messaging, and a news feed.

All generated text should be plausible and consistent with typical app reviews, ranging from bug reports and feature requests to general feedback.

**4. ### INPUT & OUTPUT SPECIFICATION ###**
You will generate a set of **10** unique data instances. Each instance must strictly conform to the following input-output structure.

**4.1. Input Data Schema:**
The “input” part of the data represents the information that the future ML model will receive. The key input variable is:
* **`review_text`**: A string of 20-100 words containing a user’s review of the “ConnectSphere” app.

**4.2. Output Data Schema & Formatting:**
The “output” part is the corresponding label that the ML model must learn to predict. The required output format is **JSONL (JSON Lines)**, where each line is a valid JSON object.

The schema for each data instance is shown below across several lines for readability; in the generated output, each instance must occupy a single line:
```json
{
  "instance_id": "string (unique identifier, e.g., 'cs_rev_001')",
  "input": {
    "review_text": "string"
  },
  "output": {
    "sentiment": "string (must be one of 'Positive', 'Negative', 'Neutral')"
  }
}
```

**5. ### GENERATION RULES & CONSTRAINTS ###**
Adhere to these rules without exception:

* **Diversity:** Generate a mix of reviews. Include comments about app performance (speed, crashes), UI/UX design, specific features (messaging, photo filters), and customer support.
* **Realism:** Use informal language, including slang, typos, and varying grammar/punctuation, as is common in real app reviews.
* **Bias Mitigation:** Ensure a roughly balanced representation of all three sentiment labels (Positive, Negative, Neutral). Do not generate text that contains offensive language or personal attacks.
* **Negative Examples:** “Neutral” reviews should be objective, such as simple feature requests or questions, without strong positive or negative emotion (e.g., “How do I turn off notifications for likes?”).
* **Data Integrity:** Ensure the `sentiment` label accurately reflects the emotional tone of the `review_text`.
* **Complexity Level:** The complexity of the generated text should be **mixed**, from short, simple sentences to more detailed paragraphs.

**6. ### FEW-SHOT EXAMPLES (IN-CONTEXT LEARNING) ###**
Here are a few high-quality examples of the desired input/output format. Emulate the style, structure, and quality of these examples precisely.

**Example 1:**
```json
{"instance_id": "cs_rev_001", "input": {"review_text": "The latest update is amazing! The new photo filters are so much fun to use and the app feels way faster now. Keep up the great work, ConnectSphere team!"}, "output": {"sentiment": "Positive"}}
```

**Example 2:**
```json
{"instance_id": "cs_rev_002", "input": {"review_text": "Ever since the last update, the app crashes every time I try to upload a video. It’s become completely unusable. Please fix this bug ASAP. I’m on Android 12."}, "output": {"sentiment": "Negative"}}
```

**Example 3:**
```json
{"instance_id": "cs_rev_003", "input": {"review_text": "Is there a way to sort the news feed chronologically instead of by the algorithm? I can’t find the setting anywhere in the app."}, "output": {"sentiment": "Neutral"}}
```

**7. ### FINAL INSTRUCTION ###**
Based on all the provided rules, context, and examples, generate **10** new, unique, and high-quality training data instances. Present the output as a single block of code in the specified **JSONL** format. Do not include any commentary or explanations outside of the generated data itself.

**[END OF PROMPT]**
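Because the prompt asks for JSONL that feeds a training pipeline directly, it may help to validate the model's response before using it. The following is a rough sketch rather than a definitive validator: the function name `validate_jsonl` and the second sample record are hypothetical, and the checks mirror only the rules stated in the example prompt (unique `instance_id`, one of three sentiment labels, 20-100 words per review).

```python
import json
from collections import Counter

ALLOWED_LABELS = {"Positive", "Negative", "Neutral"}


def validate_jsonl(raw: str) -> tuple[list[dict], list[str]]:
    """Parse JSONL text and return (records that passed, problem messages)."""
    records, problems, seen_ids = [], [], set()
    for line_no, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between records
        try:
            record = json.loads(line)
            instance_id = record["instance_id"]
            text = record["input"]["review_text"]
            sentiment = record["output"]["sentiment"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            problems.append(f"line {line_no}: malformed record ({exc})")
            continue
        issues = []
        if instance_id in seen_ids:
            issues.append(f"duplicate instance_id {instance_id!r}")
        seen_ids.add(instance_id)
        if sentiment not in ALLOWED_LABELS:
            issues.append(f"unexpected sentiment {sentiment!r}")
        if not isinstance(text, str) or not 20 <= len(text.split()) <= 100:
            issues.append("review_text is not a string of 20-100 words")
        if issues:
            problems.extend(f"line {line_no}: {issue}" for issue in issues)
        else:
            records.append(record)
    return records, problems


# One record taken from Example 3 above, plus a hypothetical invalid record.
sample = "\n".join(
    json.dumps(record)
    for record in [
        {
            "instance_id": "cs_rev_003",
            "input": {
                "review_text": "Is there a way to sort the news feed chronologically "
                "instead of by the algorithm? I can't find the setting anywhere in the app."
            },
            "output": {"sentiment": "Neutral"},
        },
        {
            "instance_id": "cs_rev_004",
            "input": {"review_text": "meh"},
            "output": {"sentiment": "Mixed"},
        },
    ]
)

records, problems = validate_jsonl(sample)
print(Counter(record["output"]["sentiment"] for record in records))
for problem in problems:
    print(problem)
```

The label counts give a quick view of whether the "roughly balanced" requirement was met. If the model wraps its answer in a code fence, the fence lines would be reported as malformed records, so they should be stripped before calling the function.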