AI Agents With CrewAI: Training And Testing

In this post, I will cover:

Training and Testing a Multi AI-Agent System using CrewAI
Practice: Training and Testing a Sequential Process
Practice: Training and Testing a Hierarchical Process

If you want to learn more about AI Agents using CrewAI, click here to see other content.

Training and Testing a Multi AI-Agent System using CrewAI

Having the ability to train your Agents can be crucial for delivering better results. These improvements may involve adjustments to the Task’s output format, the inclusion of relevant information that was previously overlooked, or even the incorporation of a line of reasoning you want the agents to follow.

CrewAI lets you train a multi AI-Agent system through human feedback on each task. In training mode, the crew runs the flow as usual but requests feedback at the end of each task. What is learned from these responses is stored in two .pkl files for use in future executions of the system.

One of these files is training_data.pkl, which CrewAI creates automatically and which stores the raw human feedback along with the Task output. The other file, which you name yourself, contains a summary generated by an LLM from all interactions in the training process.
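Since both files are ordinary pickle files, they can be inspected with the standard library. The snippet below is a minimal sketch: the helper name load_pickle is mine, and the file name feedback.pkl is only an example of the name you might choose. Only unpickle files that you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Return the object stored in the pickle at `path`, or None if the file is missing."""
    file_path = Path(path)
    if not file_path.exists():
        return None
    with file_path.open("rb") as file:
        return pickle.load(file)


# "feedback.pkl" is an assumed name; use whatever filename you passed to training.
for name in ("training_data.pkl", "feedback.pkl"):
    content = load_pickle(name)
    status = "not found" if content is None else type(content).__name__
    print(f"{name}: {status}")
```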

Below is a pipeline illustrating the training process.

CrewAI also offers the ability to test the performance of your multi AI-Agent system. In test mode, you can run the system N times, and in each run, the result of every Task is sent to an evaluator LLM, which returns an individual performance score. At the end, you can view the performance of each Task separately, the overall performance of the crew, the total execution time, and the average performance across the N iterations. Below is an image illustrating the test process.
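The arithmetic behind such a report can be sketched in a few lines. This is only an illustration with hypothetical scores: the function name summarize_runs is mine, and the way CrewAI aggregates and rounds its scores internally is an assumption.

```python
from statistics import mean


def summarize_runs(task_scores_per_run):
    """Summarize N runs, where each run is a list of per-task scores (1-10)."""
    # Crew score of a run: the mean of that run's task scores (assumed aggregation).
    crew_scores = [mean(run) for run in task_scores_per_run]
    return {
        "crew_scores": crew_scores,
        "average": round(mean(crew_scores), 1),
    }


# Two hypothetical runs of a crew with two Tasks each.
summary = summarize_runs([[7.0, 9.0], [8.0, 8.0]])
print(summary)
```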

Check out the official documentation in the training and testing links. Now, let’s get our hands dirty.

Practice: Training and Testing a Sequential Process

In this blog post, the use case will be lead segmentation. The goal is to identify the best profiles of people to offer a course. This is the same use case from the previous post, where I covered the hierarchical process — you can check it out by clicking here.

Lead list:

High school student
University student
Postgraduate student
Technology student
Nursing student
High school teacher
General practitioner
Clinical psychologist
Hospital nurse
Beautician
Personal trainer
Elderly caregiver
Software developer
Software engineer
Front-end developer
IT technician
Data analyst
Clothing salesperson
Sales representative
Sales manager
Digital marketing consultant
Food business owner
Neighborhood retailer
Real estate agent
Executive secretary
Administrative assistant
HR manager
Career coach
Call center operator
App driver
Clinic receptionist
Nanny
Night guard
Waiter
Graphic designer
Freelance photographer
Content producer
Cultural producer
Visual artist
Digital influencer
Freelance manicurist
Beautician
Labor lawyer
Accountant
Civil engineer
Auto mechanic
Wall painter
Rural worker
Chef
Retiree
Artisan
University professor
Master in computer science
Doctor of education
Pedagogical coordinator
Pedagogy intern
Head nurse
Nursing technician
Dental assistant
Physiotherapist
Clinical nutritionist
Biomedical scientist
Full stack developer
Data scientist
Cybersecurity specialist
Software architect
IT support technician
E-commerce manager
Door-to-door salesperson
Sales analyst
Sales supervisor
Marketing assistant
Franchisee
Online store owner
Financial assistant
HR analyst
Project manager
Administrative coordinator
Office assistant
Telemarketing operator
Customer service agent
Doorman
General services assistant
Pharmacy clerk
Barista
Video editor
Copywriter
Screenwriter
Art director
Podcast host
Performing arts student
Hairdresser
Professional makeup artist
Tattoo artist
Public defender
Judge
Tax analyst
Electrical engineer
Professional bricklayer
Residential electrician
Agronomist
Beekeeper

⚠️ Versions Used:

Python 3.11.9
crewai 0.108.0

For this process, a single Agent was used: the sales representative. Below, you can find the YAML file with this Agent’s definition.

senior_salesperson:
  role: >
    Sales Representative
  goal: >
    Identify the leads most likely to be interested in and purchase the course {course}.
  backstory: >
    You are a Sales Representative with over 10 years of experience in identifying
    high-potential leads. Throughout your career, you have developed a strategic eye
    for recognizing behavioral patterns, interests, and professional profiles with a
    higher likelihood of conversion, allowing you to direct your approaches with
    precision and assertiveness.

Note the {course} placeholder, which is replaced with the course name when the process runs.
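The idea behind the placeholder can be illustrated with plain Python string formatting. This is a sketch of the concept only: CrewAI performs its own interpolation of the inputs, and the variable names below are mine.

```python
# Illustration only: mimics placeholder substitution with str.format.
goal_template = (
    "Identify the leads most likely to be interested in "
    "and purchase the course {course}."
)

# The same kind of dictionary that is passed as `inputs` when the crew is executed.
inputs = {"course": "MLOps"}

goal = goal_template.format(**inputs)
print(goal)
```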

Below is the YAML file with the only Task defined: find_leads. Note that, although CrewAI automatically uses the training phase feedback during process executions, I included this feedback in the Task’s prompt (via the {feedback} placeholder) so that the Agent follows the line of reasoning indicated by the human. Later, I will go into detail about how this variable was constructed.

find_leads:
  description: >
    Your mission is to carefully analyze the available options in the file and identify the
    leads with the highest likelihood of purchasing the course {course}. Focus on precision
    and relevance — only select leads that have a strong match with the course content and
    clear potential to convert.

    You must apply the reasoning and selection criteria outlined in the feedback below for
    any course you are analyzing. The feedback serves as a set of general rules for identifying
    leads with the highest likelihood of purchasing a course. Focus on precision and relevance
    in your selections, adhering strictly to the principles laid out in the feedback:
    {feedback}
  expected_output: >
    Return only a bullet point list of the selected leads.
    The leads selected must match the ones in the file provided.
    Do not include any explanation, justification, or additional text.
  agent: senior_salesperson

Below is the crew.py file, which defines the Agent, the Task, and the crew. Note that, in this example, the Tool responsible for reading the leads (FileReadTool) is attached to the Task.

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import FileReadTool
from dotenv import load_dotenv

load_dotenv()


@CrewBase
class TrainAndTest():

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"


    @agent
    def senior_salesperson(self) -> Agent:

        return Agent(
            config=self.agents_config["senior_salesperson"],
            verbose=True
        )
    

    @task
    def find_leads(self) -> Task:

        return Task(
            config=self.tasks_config["find_leads"],
            tools=[FileReadTool(file_path="leads.txt")],
            output_file="leads.md"
        )


    @crew
    def crew(self) -> Crew:

        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True
        )

Finally, here is the main.py file, which is responsible for executing the entire system.

from crew import TrainAndTest
from warnings import filterwarnings
from os.path import exists
import pickle
import json
 
filterwarnings("ignore")


def run(course, all_feedbacks):
        
    inputs = {
        "course": course,
        "feedback": str(all_feedbacks)
    }

    TrainAndTest().crew().kickoff(inputs=inputs)


def train(course, all_feedbacks):

    inputs = {
        "course": course,
        "feedback": str(all_feedbacks)
    }

    TrainAndTest().crew().train(
        n_iterations=5,
        filename="feedback.pkl",
        inputs=inputs
    )

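    # Read the feedback summary that the training run wrote to feedback.pkl.
    with open("feedback.pkl", "rb") as file:
        feedback = pickle.load(file)
        # The top-level key is the Agent's role, including its trailing newline.
        feedback = feedback["Sales Representative\n"]["suggestions"]
        feedback_data = [{"course": course, "suggestions": feedback}]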
    
    if exists("all_feedbacks.json"):
        with open("all_feedbacks.json", "r", encoding="utf-8") as file:
            all_feedbacks = json.load(file)
            found_column = False
            for i, data in enumerate(all_feedbacks):
                if data["course"] == course:
                    all_feedbacks[i]["suggestions"] = feedback_data[0]["suggestions"].copy()
                    found_column = True
            if not found_column:
                all_feedbacks += feedback_data
    else:
        all_feedbacks = feedback_data.copy()
    
    with open("all_feedbacks.json", "w", encoding="utf-8") as file:
        json.dump(all_feedbacks, file, ensure_ascii=False, indent=4)


def test(course, all_feedbacks):

    inputs = {
        "course": course,
        "feedback": str(all_feedbacks)
    }

    TrainAndTest().crew().test(
        n_iterations=3,
        eval_llm="gpt-4o",
        inputs=inputs
    )


if __name__ == "__main__":

    course = "MLOPs"

    if exists("all_feedbacks.json"):
        with open("all_feedbacks.json", "r", encoding="utf-8") as file:
            all_feedbacks = json.load(file)
    else:
        all_feedbacks = []

    op = "run"

    if op == "run":
        run(course, all_feedbacks)
    elif op == "train":
        train(course, all_feedbacks)
    else:
        test(course, all_feedbacks)

This file consists of three functions: run, train, and test. Each receives course and all_feedbacks as parameters: the offered course and the feedback collected during the training phase, respectively. Below, I explain each function in detail:

run: this is the actual execution of the multi AI-Agent system (a call to kickoff), as already covered in previous posts.
train: this function handles the training of the system.
- The training runs for 5 iterations (n_iterations=5), so the process collects 5 feedback entries.
- The summary of the provided feedback is stored in the “feedback.pkl” file, defined by the filename parameter. (Note: I will present the output of this file next.)
- To make use of the feedback generated during training, a file named all_feedbacks.json is created. This file stores the feedback history for different courses, allowing the system to be trained with multiple inputs. This logic is plain Python written specifically for this use case. The steps are:
  1. Read the feedback.pkl file to obtain the summary of the feedback from that training session.
  2. If the all_feedbacks.json file already exists, the new feedback is merged into it. If the course has already been trained, the old suggestions are replaced by the new ones. If it is a new course, a new entry is added with the course name and its feedback.
  3. If the all_feedbacks.json file does not exist yet, it is created with the course name and the suggestions read from feedback.pkl, since this is the system’s first training.
  4. The all_feedbacks.json file is then saved.
test: this function performs the testing of the system.
- Note that 3 iterations are used (n_iterations=3). As mentioned in the theoretical section, the system is executed 3 times. In each run, every Task is evaluated individually, and an overall performance average is calculated.
- The evaluation LLM chosen was gpt-4o (eval_llm="gpt-4o").
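The update rules for all_feedbacks.json described above can also be expressed as a small pure function, which makes them easy to test without touching any files. This is a sketch of the same logic, not the code used in the post; the name merge_feedback and the sample suggestions are hypothetical.

```python
def merge_feedback(all_feedbacks, course, suggestions):
    """Return a new list in which `course` has exactly one entry holding `suggestions`."""
    merged = [dict(entry) for entry in all_feedbacks]  # copy, so the input is not mutated
    for entry in merged:
        if entry["course"] == course:
            # Course already trained: replace the old suggestions.
            entry["suggestions"] = list(suggestions)
            return merged
    # New course: add a new entry.
    merged.append({"course": course, "suggestions": list(suggestions)})
    return merged


history = [{"course": "Introduction to Programming", "suggestions": ["old rule"]}]
history = merge_feedback(history, "Introduction to Programming", ["new rule"])
history = merge_feedback(history, "MLOps", ["another rule"])
print(history)
```

Starting from an empty list covers the first training session, so the same function handles the case where all_feedbacks.json does not exist yet.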

To train the model, “Introduction to Programming” was used as the offered course. It was not necessary to train with other courses, as the results were already satisfactory when testing different ones after this initial training.

Below is the training_data.pkl file, which, as explained in the theoretical section, is automatically generated by the training process and contains the raw data, that is, the outputs generated by the model and the feedback provided.

{
  "91fe79fb-76f0-4438-8ace-e09a4326c201": {
    "0": {
      "initial_output": "- Technology student\n- Software developer\n- Software engineer\n- Front-end developer\n- IT technician\n- Data analyst\n- Full stack developer\n- Data scientist\n- Cybersecurity specialist\n- Software architect",
      "human_feedback": "It's not good. To maximize the chances of a sale, you have to analyze the level of the course offered. For example, for this course, Software developer/engineer, Front-end developer, Full stack developer, Data scientist and Software architect already know how to program, and since the course is introductory, the chance of them buying it is very low.",
      "improved_output": "- High school student\n- University student\n- Postgraduate student\n- Technology student\n- IT technician\n- Data analyst\n- Master in computer science"
    },
    "1": {
      "initial_output": "- High school student\n- University student\n- Postgraduate student\n- Technology student\n- IT technician\n- Nursing student\n- High school teacher\n- General practitioner\n- Clinical psychologist\n- Hospital nurse\n- Beautician\n- Personal trainer\n- Elderly caregiver\n- Data analyst\n- Clothing salesperson\n- Sales representative\n- Sales manager\n- Digital marketing consultant\n- Food business owner\n- Neighborhood retailer\n- Real estate agent\n- Executive secretary\n- Administrative assistant\n- HR manager\n- Career coach\n- Call center operator\n- App driver\n- Clinic receptionist\n- Nanny\n- Night guard\n- Waiter\n- Graphic designer\n- Freelance photographer\n- Content producer\n- Cultural producer\n- Visual artist\n- Digital influencer\n- Freelance manicurist\n- Labor lawyer\n- Accountant\n- Civil engineer\n- Auto mechanic\n- Wall painter\n- Rural worker\n- Chef\n- Retiree\n- Artisan\n- University professor\n- Master in computer science\n- Doctor of education\n- Pedagogical coordinator\n- Pedagogy intern\n- Head nurse\n- Nursing technician\n- Dental assistant\n- Physiotherapist\n- Clinical nutritionist\n- Biomedical scientist\n- Cybersecurity specialist\n- IT support technician\n- E-commerce manager\n- Door-to-door salesperson\n- Sales analyst\n- Sales supervisor\n- Marketing assistant\n- Franchisee\n- Online store owner\n- Financial assistant\n- HR analyst\n- Project manager\n- Administrative coordinator\n- Office assistant\n- Telemarketing operator\n- Customer service agent\n- Doorman\n- General services assistant\n- Pharmacy clerk\n- Barista\n- Video editor\n- Copywriter\n- Screenwriter\n- Art director\n- Podcast host\n- Performing arts student\n- Hairdresser\n- Professional makeup artist\n- Tattoo artist\n- Public defender\n- Judge\n- Tax analyst\n- Electrical engineer\n- Professional bricklayer\n- Residential electrician\n- Agronomist\n- Beekeeper",
      "human_feedback": "It's not good. You need to focus on the quality of leads, not the quantity. You selected many leads that are not in the area of the course offered. Therefore, the chances of purchase are also very low. The focus should be on people who are similar to the level of the course and the area of the course.",
      "improved_output": "- High school student\n- University student\n- Postgraduate student\n- Technology student\n- IT technician\n- Administrative assistant\n- Call center operator\n- Clinic receptionist\n- Graphic designer\n- Content producer\n- Digital influencer\n- E-commerce manager\n- Marketing assistant\n- HR analyst\n- Project manager"
    },
    "2": {
      "initial_output": "- High school student\n- University student\n- Postgraduate student\n- Technology student\n- IT technician\n- Administrative assistant\n- Call center operator\n- Clinic receptionist\n- Office assistant\n- Pharmacy clerk",
      "human_feedback": "Almost there. In your answer, Postgraduate student doesn't make much sense because the course is introductory and there's a good chance that this lead already knows how to program and won't buy the course. In addition, Administrative assistant, Call center operator, Clinic receptionist, Office assistant and Pharmacy clerk are completely different areas from the course offered, and therefore, as I said, the chances of buying the course are also low.",
      "improved_output": "- High school student\n- University student\n- Technology student\n- IT technician"
    },
    "3": {
      "initial_output": "- High school student\n- University student\n- Technology student\n- IT technician\n- Data analyst",
      "human_feedback": "That's it. This answer is perfect.",
      "improved_output": "- High school student\n- University student\n- Technology student\n- IT technician\n- Data analyst"
    },
    "4": {
      "initial_output": "- High school student\n- University student\n- Technology student\n- IT technician\n- Digital marketing consultant\n- General services assistant",
      "human_feedback": "Except for General services assistant, the answer is good as that lead is from a different area. But the previous answer you provided is better.",
      "improved_output": "- High school student\n- University student\n- Technology student\n- IT technician\n- Digital marketing consultant"
    }
  }
}
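The nesting shown above (a top-level identifier, then one record per iteration) can be walked with a short helper. This is a minimal sketch that assumes the structure displayed here; the function name collect_human_feedback is mine and the sample data is an abbreviated, hypothetical stand-in for the real file.

```python
def collect_human_feedback(training_data):
    """Return (top_level_key, iteration, human_feedback) tuples in iteration order."""
    rows = []
    for top_key, iterations in training_data.items():
        # Iteration keys may be ints or numeric strings, so sort them numerically.
        for iteration in sorted(iterations, key=int):
            record = iterations[iteration]
            rows.append((top_key, int(iteration), record["human_feedback"]))
    return rows


# Hypothetical, shortened data with the same shape as the dump above.
sample = {
    "some-id": {
        "1": {"initial_output": "- B", "human_feedback": "Almost there.", "improved_output": "- C"},
        "0": {"initial_output": "- A", "human_feedback": "It's not good.", "improved_output": "- B"},
    }
}

rows = collect_human_feedback(sample)
for top_key, iteration, text in rows:
    print(f"[{top_key}] iteration {iteration}: {text}")
```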

Now, here is the feedback.pkl file, which, as explained in the theoretical section, contains a summary of the feedback provided.

{
  "Sales Representative\n": {
    "suggestions": [
      "Analyze the level of the course being offered and tailor leads accordingly. For introductory courses, avoid targeting leads with advanced knowledge in the field.",
      "Focus on identifying and retaining leads that closely match the course's target audience and skill level.",
      "Avoid overwhelming with quantity; instead, prioritize the quality of leads that are most likely to convert.",
      "Exclude leads from vastly different professional areas unless there's clear alignment with the course's objectives.",
      "In cases where feedback suggests a lead has existing skills, reassess their suitability for an introductory course and exclude if necessary."
    ],
    "quality": 9,
    "final_summary": "To improve future agent performance: 1) Thoroughly analyze the course level to align target audience; introductory courses should shy away from advanced learners. 2) Concentrate on quality leads closely related to the course subject matter and skill requirements. 3) Avoid adding excessive and unrelated professional categories to lead lists, instead curate a focused and relevant lead group. 4) Regularly reevaluate lead lists in response to specific feedback, removing or retaining individuals based on their alignment with the course goals."
  }
}
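Notice that the top-level key ends with a newline ("Sales Representative\n"). This presumably comes from the role being written as a folded YAML scalar (>), which keeps one trailing newline. A lookup that ignores surrounding whitespace avoids hard-coding that detail. The helper below is a sketch; the name suggestions_for_role is hypothetical.

```python
def suggestions_for_role(feedback, role):
    """Return the suggestions stored for `role`, ignoring surrounding whitespace in keys."""
    wanted = role.strip()
    for key, value in feedback.items():
        if key.strip() == wanted:
            return value.get("suggestions", [])
    return []


# Shortened, hypothetical content with the same shape as feedback.pkl above.
feedback = {
    "Sales Representative\n": {
        "suggestions": ["Analyze the level of the course being offered."],
        "quality": 9,
    }
}

print(suggestions_for_role(feedback, "Sales Representative"))
```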

Finally, I present the all_feedbacks.json file, which is the file actually sent to the Task’s prompt.

[{
  "course": "Introduction to Programming",
  "suggestions": [
    "Analyze the level of the course being offered and tailor leads accordingly. For introductory courses, avoid targeting leads with advanced knowledge in the field.",
    "Focus on identifying and retaining leads that closely match the course's target audience and skill level.",
    "Avoid overwhelming with quantity; instead, prioritize the quality of leads that are most likely to convert.",
    "Exclude leads from vastly different professional areas unless there's clear alignment with the course's objectives.",
    "In cases where feedback suggests a lead has existing skills, reassess their suitability for an introductory course and exclude if necessary."
  ]
}]

To test the results, I present below a comparison between the leads selected for the Introduction to Digital Marketing course, before and after training, respectively.

Before Training

University student
Postgraduate student
Technology student
Software developer
Software engineer
Front-end developer
IT technician
Data analyst
Digital marketing consultant
Sales representative
Sales manager
E-commerce manager
Marketing assistant
Online store owner

After Training

Digital marketing consultant
Sales representative
Sales manager
Marketing assistant
Data analyst
Online store owner
E-commerce manager

Below is the table generated by the test process. As mentioned earlier, 3 iterations were performed, returning the score of each Task, the execution time, and the overall average performance.

                                    Tasks Scores
                               (1-10 Higher is better)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━┯━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Tasks/Crew/Agents  │ Run 1 │ Run 2 │ Run 3 │ Avg. Total │ Agents                 ┃
┠────────────────────┼───────┼───────┼───────┼────────────┼────────────────────────┨
┃ Task 1             │  7.0  │  9.0  │  8.0  │    8.0     │ - Sales Representative ┃
┃                    │       │       │       │            │                        ┃
┃ Crew               │ 7.00  │ 9.00  │ 8.00  │    8.0     │                        ┃
┃ Execution Time (s) │   4   │   3   │   7   │     4      │                        ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━┷━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┛

One more test: below are the results and the table generated for a more advanced course — MLOps.

Before Training

Postgraduate student
Technology student
Software developer
Software engineer
Front-end developer
IT technician
Data analyst
Software architect
Data scientist
Cybersecurity specialist
Full stack developer
Master in computer science

After Training

Software developer
Software engineer
Front-end developer
Data analyst
Full stack developer
Data scientist
Cybersecurity specialist
Software architect
IT support technician

                                    Tasks Scores
                               (1-10 Higher is better)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━┯━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Tasks/Crew/Agents  │ Run 1 │ Run 2 │ Run 3 │ Avg. Total │ Agents                 ┃
┠────────────────────┼───────┼───────┼───────┼────────────┼────────────────────────┨
┃ Task 1             │  8.5  │  8.0  │  8.0  │    8.2     │ - Sales Representative ┃
┃                    │       │       │       │            │                        ┃
┃ Crew               │ 8.50  │ 8.00  │ 8.00  │    8.2     │                        ┃
┃ Execution Time (s) │   5   │   4   │   4   │     4      │                        ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━┷━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┛

⚠️ It is important to note that when trying to reproduce the results, variations may occur — which is natural in the context of AI Agents.

Practice: Training and Testing a Hierarchical Process

For this process, the YAML file defining the Agent is the same as the one used in the sequential process. However, with regard to the Task, the paragraph where the Agent is instructed to follow the reasoning of the feedback provided has been modified. The reason for this is that, for the hierarchical process, the raw data from the training process was used, meaning the data from the training_data.pkl file. Below is the YAML file of the Task.

find_leads:
  description: >
    Your mission is to carefully analyze the available options in the file and identify the
    leads with the highest likelihood of purchasing the course {course}. Focus on precision
    and relevance — only select leads that have a strong match with the course content and
    clear potential to convert.

    You must analyze and apply the reasoning demonstrated in the feedback history below to
    evaluate any new course. The feedback is not meant to provide specific leads to reuse,
    but to teach you how to reason about which profiles are truly relevant based on each course's
    unique context. Learn the underlying logic from the examples, then apply that same logic—adapted
    to the current course—to identify leads with the highest likelihood of interest. Focus entirely
    on reasoning, not replication:
    {feedback}
  expected_output: >
    Return only a bullet point list of the selected leads.
    The leads selected must match the ones in the file provided.
    Do not include any explanation, justification, or additional text.

Below is the crew.py file. Notice that the FileReadTool is defined directly in the Agent. Additionally, to ensure that training and testing work properly, the LLM used by the manager was configured with the LLM class provided by CrewAI. Instead of creating a specific Agent to act as the manager, I used an LLM for this role (manager_llm=manager_llm), which closely resembles the third approach presented in the post about the hierarchical process.

from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import FileReadTool
from dotenv import load_dotenv

load_dotenv()


@CrewBase
class TrainAndTest():

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"


    @agent
    def senior_salesperson(self) -> Agent:

        return Agent(
            config=self.agents_config["senior_salesperson"],
            tools=[FileReadTool(file_path="leads.txt")],
            allow_delegation=False,
            verbose=True
        )
    

    @task
    def find_leads(self) -> Task:

        return Task(
            config=self.tasks_config["find_leads"],
            output_file="leads.md"
        )


    @crew
    def crew(self) -> Crew:

        manager_llm = LLM(
            model="gpt-4o",
            temperature=0.1
        )

        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.hierarchical,
            manager_llm=manager_llm,
            verbose=True
        )

Below is the main.py file.

from crew import TrainAndTest
from warnings import filterwarnings
from os.path import exists
import pickle
import json
 
filterwarnings("ignore")


def run(course, all_feedbacks):

    inputs = {
        "course": course,
        "feedback": str(all_feedbacks)
    }

    TrainAndTest().crew().kickoff(inputs=inputs)


def train(course, all_feedbacks):

    inputs = {
        "course": course,
        "feedback": str(all_feedbacks)
    }

    TrainAndTest().crew().train(
        n_iterations=1,
        filename="feedback.pkl",
        inputs=inputs
    )

    # Read the raw training data (outputs and human feedback) from training_data.pkl.
    with open("training_data.pkl", "rb") as file:
        training_data = pickle.load(file)
        first_key = list(training_data.keys())[0]
        content = training_data[first_key][0]
        feedback = [f"output: {value}" if "output" in name else f"feedback: {value}" for name, value in content.items()][:-1]
        feedback_data = [{"course": course, "history": feedback}]
    
    if exists("all_feedbacks.json"):
        with open("all_feedbacks.json", "r", encoding="utf-8") as file:
            all_feedbacks = json.load(file)
            found_column = False
            for i, data in enumerate(all_feedbacks):
                if data["course"] == course:
                    all_feedbacks[i]["history"] += feedback_data[0]["history"]
                    found_column = True
            if not found_column:
                all_feedbacks += feedback_data
    else:
        all_feedbacks = feedback_data.copy()

    with open("all_feedbacks.json", "w", encoding="utf-8") as file:
        json.dump(all_feedbacks, file, ensure_ascii=False, indent=4)


def test(course, all_feedbacks):

    inputs = {
        "course": course,
        "feedback": str(all_feedbacks)
    }

    TrainAndTest().crew().test(
        n_iterations=1,
        eval_llm="gpt-4o",
        inputs=inputs
    )


if __name__ == "__main__":

    course = "MLOPs"

    if exists("all_feedbacks.json"):
        with open("all_feedbacks.json", "r", encoding="utf-8") as file:
            all_feedbacks = json.load(file)
    else:
        all_feedbacks = []

    op = "run"

    if op == "run":
        run(course, all_feedbacks)
    elif op == "train":
        train(course, all_feedbacks)
    else:
        test(course, all_feedbacks)

Some particularities in comparison to the sequential process should be mentioned.

For both training and testing, the number of iterations is set to 1 (n_iterations=1), since Agents cannot use tools during these phases. Because the Manager Agent relies on several tools, such as asking questions and delegating Tasks, attempting to execute either phase more than once results in an error.
Since I’m using the training_data.pkl file as the source of feedback, the code responsible for extracting that content had to be adjusted. A key point is line 38 of main.py, where I build the list with a list comprehension and then slice it to drop the last element ([:-1]). This is because, when running the training with only one iteration, the Agent generates a new result after receiving the feedback, but that new result cannot be evaluated in the same run. That evaluation can only happen in a subsequent run.
Finally, unlike the sequential process, the feedback is not overwritten in each training session; it is appended. This is because the raw training data is treated as a history, so each new session extends that history.
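The extraction and append behavior described above can be sketched with two small functions. Instead of slicing off the last element, this variant names the two fields it keeps, which is an alternative I am suggesting rather than the code used in the post. The function names and the sample record are hypothetical; the field names follow the training_data.pkl structure shown earlier.

```python
def extract_history(record):
    """Keep the initial output and the human feedback; skip the unevaluated improved output."""
    return [
        f"output: {record['initial_output']}",
        f"feedback: {record['human_feedback']}",
    ]


def append_history(all_feedbacks, course, history):
    """Extend the history of `course` in place, creating the entry if it does not exist."""
    for entry in all_feedbacks:
        if entry["course"] == course:
            entry["history"].extend(history)
            return all_feedbacks
    all_feedbacks.append({"course": course, "history": list(history)})
    return all_feedbacks


# Hypothetical record from a single training iteration.
record = {
    "initial_output": "- Software developer",
    "human_feedback": "It's not good.",
    "improved_output": "- High school student",
}

all_feedbacks = []
append_history(all_feedbacks, "Introduction to Programming", extract_history(record))
append_history(all_feedbacks, "Introduction to Programming", extract_history(record))
print(len(all_feedbacks[0]["history"]))
```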

For this process, I used 2 courses for training: Introduction to Programming and Basic Psychology. Below is the all_feedbacks.json file, which contains the entire training history sent to the Task prompt.

[
    {
        "course": "Introduction to Programming",
        "history": [
            "output: - High school student\n- University student\n- Postgraduate student\n- Technology student\n- Software developer\n- Software engineer\n- Front-end developer\n- IT technician\n- Data analyst\n- Full stack developer\n- Data scientist\n- Cybersecurity specialist\n- Software architect\n- IT support technician\n- Master in computer science",
            "feedback: It's not good. You must analyze the level of the course offered and select leads that match that level. For example, developers, data scientists, masters, etc., already know how to program. Therefore, since it is an introductory course in this area, it doesn't make sense to offer this course, because the chance of this type of lead purchasing is very low.",
            "output: - High school student\n- University student\n- Postgraduate student\n- Technology student\n- Nursing student\n- High school teacher\n- General practitioner\n- Clinical psychologist\n- Hospital nurse\n- Beautician\n- Personal trainer\n- Elderly caregiver\n- Clothing salesperson\n- Sales representative\n- Sales manager\n- Digital marketing consultant\n- Food business owner\n- Neighborhood retailer\n- Real estate agent\n- Executive secretary\n- Administrative assistant\n- HR manager\n- Career coach\n- Call center operator\n- App driver\n- Clinic receptionist\n- Nanny\n- Night guard\n- Waiter\n- Graphic designer\n- Freelance photographer\n- Content producer\n- Cultural producer\n- Visual artist\n- Digital influencer\n- Freelance manicurist\n- Beautician\n- Labor lawyer\n- Accountant\n- Civil engineer\n- Auto mechanic\n- Wall painter\n- Rural worker\n- Chef\n- Retiree\n- Artisan\n- University professor\n- Doctor of education\n- Pedagogical coordinator\n- Pedagogy intern\n- Head nurse\n- Nursing technician\n- Dental assistant\n- Physiotherapist\n- Clinical nutritionist\n- Biomedical scientist\n- E-commerce manager\n- Door-to-door salesperson\n- Sales analyst\n- Sales supervisor\n- Marketing assistant\n- Franchisee\n- Online store owner\n- Financial assistant\n- HR analyst\n- Project manager\n- Administrative coordinator\n- Office assistant\n- Telemarketing operator\n- Customer service agent\n- Doorman\n- General services assistant\n- Pharmacy clerk\n- Barista\n- Video editor\n- Copywriter\n- Screenwriter\n- Art director\n- Podcast host\n- Performing arts student\n- Hairdresser\n- Professional makeup artist\n- Tattoo artist\n- Public defender\n- Judge\n- Tax analyst\n- Electrical engineer\n- Professional bricklayer\n- Residential electrician\n- Agronomist\n- Beekeeper",
            "feedback: Not good. In addition to the level of the course, you must analyze the area of activity. In other words, the course is related to programming, therefore, you must select leads that are from this area, as leads from other areas do not make sense and the chance of selling the course is very low.",
            "output: - High school student\n- University student\n- Postgraduate student\n- Technology student\n- Nursing student\n- High school teacher\n- General practitioner\n- Clinical psychologist\n- Hospital nurse\n- Beautician\n- Personal trainer\n- Elderly caregiver\n- Clothing salesperson\n- Sales representative\n- Sales manager\n- Digital marketing consultant\n- Food business owner\n- Neighborhood retailer\n- Real estate agent\n- Executive secretary\n- Administrative assistant\n- HR manager\n- Career coach\n- Call center operator\n- App driver\n- Clinic receptionist\n- Nanny\n- Night guard\n- Waiter\n- Graphic designer\n- Freelance photographer\n- Content producer\n- Cultural producer\n- Visual artist\n- Digital influencer\n- Freelance manicurist\n- Beautician\n- Labor lawyer\n- Accountant\n- Civil engineer\n- Auto mechanic\n- Wall painter\n- Rural worker\n- Chef\n- Retiree\n- Artisan\n- University professor\n- Doctor of education\n- Pedagogical coordinator\n- Pedagogy intern\n- Head nurse\n- Nursing technician\n- Dental assistant\n- Physiotherapist\n- Clinical nutritionist\n- Biomedical scientist\n- E-commerce manager\n- Door-to-door salesperson\n- Sales analyst\n- Sales supervisor\n- Marketing assistant\n- Franchisee\n- Online store owner\n- Financial assistant\n- HR analyst\n- Project manager\n- Administrative coordinator\n- Office assistant\n- Telemarketing operator\n- Customer service agent\n- Doorman\n- General services assistant\n- Pharmacy clerk\n- Barista\n- Video editor\n- Copywriter\n- Screenwriter\n- Art director\n- Podcast host\n- Performing arts student\n- Hairdresser\n- Professional makeup artist\n- Tattoo artist\n- Public defender\n- Judge\n- Tax analyst\n- Electrical engineer\n- Professional bricklayer\n- Residential electrician\n- Agronomist\n- Beekeeper",
            "feedback: Your answer is still bad. You must return leads that are in the same area as the course offered. You are selecting too many leads that don't make sense.",
            "output: - High school student\n- University student\n- Postgraduate student\n- Technology student",
            "feedback: Almost there. Graduate students are very likely to already know how to program and therefore the chances of selling are very low. Besides, aren't there other leads that you can return?",
            "output: - High school student\n- University student\n- Technology student",
            "feedback: Ok. It's perfect"
        ]
    },
    {
        "course": "Basic Psychology",
        "history": [
            "output: - High school student\n- University student\n- Postgraduate student\n- High school teacher\n- General practitioner\n- Nursing student\n- Elderly caregiver\n- Clinic receptionist\n- University professor\n- Doctor of education\n- Pedagogical coordinator\n- Clinical nutritionist\n- Biomedical scientist\n- Hospital nurse\n- Head nurse\n- Nursing technician\n- Physiotherapist\n- Performing arts student",
            "feedback: Almost there. Profiles such as postgraduates, doctors, masters, have little chance of purchasing the course, as it is a basic level psychology course. Also, Leads from areas unrelated to the course such as Performing arts student have little chance of purchasing.",
            "output: - High school student\n- University student\n- Nursing student\n- High school teacher\n- General practitioner\n- Clinical psychologist\n- Hospital nurse\n- Elderly caregiver\n- Clinic receptionist\n- Nursing technician\n- Physiotherapist",
            "feedback: Perfect. That's it"
        ]
    }
]
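
To read a history like the one above more easily, each output can be paired with the feedback that followed it. The snippet below is only a minimal sketch: it assumes the "output: " and "feedback: " prefixes seen in the entries above, the pair_rounds helper is a hypothetical name (not part of CrewAI), and the sample data is an abbreviated copy of the Basic Psychology history.

```python
def pair_rounds(history):
    """Pair each "output: ..." entry with the "feedback: ..." entry that follows it."""
    rounds = []
    pending_output = None
    for entry in history:
        # Split "kind: text" at the first ": " only; the text may contain more colons.
        kind, _, text = entry.partition(": ")
        if kind == "output":
            pending_output = text
        elif kind == "feedback" and pending_output is not None:
            rounds.append((pending_output, text))
            pending_output = None
    return rounds


# Abbreviated copy of the "Basic Psychology" history (lists and feedback shortened).
history = [
    "output: - High school student\n- University student\n- Postgraduate student",
    "feedback: Almost there.",
    "output: - High school student\n- University student\n- Nursing student",
    "feedback: Perfect. That's it",
]

rounds = pair_rounds(history)

# Leads from the last output, i.e. the one that received the final feedback.
final_leads = [
    line[2:] if line.startswith("- ") else line
    for line in rounds[-1][0].split("\n")
]

print(f"{len(rounds)} feedback rounds")
print(final_leads)
```

Counting the rounds per course gives a rough idea of how quickly the crew converged on an answer the trainer accepted.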

Note that, in this process, the feedback.pkl file was empty after training and the training_data.pkl file is always overwritten, so there is no point in presenting either of them here.
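
If you want to check for yourself whether one of these files is empty, a small script using Python's standard pickle module is enough. This is a rough sketch, not CrewAI code: load_pickle and is_empty are hypothetical helper names, nothing is assumed about the structure CrewAI stores inside the files, and the demo writes a throwaway file in a temporary directory instead of touching real training data.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Return the object stored in a pickle file, or None if the file is missing or has zero bytes."""
    path = Path(path)
    if not path.exists() or path.stat().st_size == 0:
        return None
    # Only unpickle files you created yourself: pickle can execute code on load.
    with path.open("rb") as fh:
        return pickle.load(fh)


def is_empty(obj):
    """True for None and for empty containers such as {} or []."""
    return obj is None or (hasattr(obj, "__len__") and len(obj) == 0)


# Demo with a throwaway file standing in for feedback.pkl.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "feedback.pkl"
    with demo_path.open("wb") as fh:
        pickle.dump({}, fh)  # an empty dict, used here only as a stand-in
    demo_is_empty = is_empty(load_pickle(demo_path))
    missing_is_empty = is_empty(load_pickle(Path(tmp) / "missing.pkl"))

print(demo_is_empty, missing_is_empty)
```

To inspect your own files, call load_pickle with the path to training_data.pkl or to the file you named during training.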

As with the sequential process, the same courses were used for testing. Below are the results for the Introduction to Digital Marketing course.

Before Training

Digital marketing consultant
Sales representative
Sales manager
E-commerce manager
Marketing assistant
Online store owner
Franchisee

After Training

High school student
University student
Technology student
Sales representative
Sales manager
Digital marketing consultant
Marketing assistant
Online store owner

                                      Tasks Scores
                                 (1-10 Higher is better)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━┯━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Tasks/Crew/Agents  │ Run 1 │ Avg. Total │ Agents         │  │                        ┃
┠────────────────────┼───────┼────────────┼────────────────┼──┼────────────────────────┨
┃ Task 1             │  7.5  │    7.5     │ - Crew Manager │  │                        ┃
┃                    │       │            │                │  │ - Sales Representative ┃
┃ Crew               │ 7.50  │    7.5     │                │  │                        ┃
┃ Execution Time (s) │  15   │     15     │                │  │                        ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━┷━━━━━━━━━━━━━━━━━━━━━━━━┛

Here are the results for the MLOps course.

Before Training

Technology student
Software developer
Software engineer
Front-end developer
IT technician
Data analyst
Master in computer science
Full stack developer
Data scientist
Cybersecurity specialist
Software architect
IT support technician

After Training

Software developer
Software engineer
Data analyst
Full stack developer
Data scientist
Cybersecurity specialist
Software architect
IT technician

                                      Tasks Scores
                                 (1-10 Higher is better)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┯━━┯━━━━━━━━━━━━━━━━┓
┃ Tasks/Crew/Agents  │ Run 1 │ Avg. Total │ Agents                 │  │                ┃
┠────────────────────┼───────┼────────────┼────────────────────────┼──┼────────────────┨
┃ Task 1             │  9.0  │    9.0     │ - Sales Representative │  │                ┃
┃                    │       │            │                        │  │ - Crew Manager ┃
┃ Crew               │ 9.00  │    9.0     │                        │  │                ┃
┃ Execution Time (s) │  15   │     15     │                        │  │                ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┷━━┷━━━━━━━━━━━━━━━━┛
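
To see exactly what training changed for the MLOps course, the two lists above can be compared directly. The snippet below is just an illustrative sketch in plain Python; the variable names are mine, and the lead lists are copied from the Before Training and After Training results.

```python
# Leads returned for the MLOps course, copied from the results above.
before = [
    "Technology student", "Software developer", "Software engineer",
    "Front-end developer", "IT technician", "Data analyst",
    "Master in computer science", "Full stack developer", "Data scientist",
    "Cybersecurity specialist", "Software architect", "IT support technician",
]
after = [
    "Software developer", "Software engineer", "Data analyst",
    "Full stack developer", "Data scientist", "Cybersecurity specialist",
    "Software architect", "IT technician",
]

# List comprehensions keep the original order, which a set difference would not.
removed = [lead for lead in before if lead not in after]
added = [lead for lead in after if lead not in before]

print("Removed after training:", removed)
print("Added after training:", added)
```

For this course, training only removed leads (Technology student, Front-end developer, Master in computer science, and IT support technician) and added none.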

These results suggest that training a multi AI-Agent system on the reasoning its Tasks require is an effective way to improve its output. 🚀

Edvaldo Melo