Software: Python

Northeastern University’s Khoury College offers several large classes supported by dozens of teaching assistants (TAs). These include Intro to Programming with Data, Discrete Structures, and Fundies. These courses have many lab practicums that need to be supported by the course TAs. Allocating TAs to labs is a difficult but interesting resource-allocation challenge. While a simple sign-up sheet might be a solution, the results are often suboptimal. In general, our goal is to assign TAs to labs that are consistent with their availability and preferences while making sure we assign enough TAs to each lab. Performing this scheduling task by hand takes many hours of the instructor’s time and it is unlikely that they are coming up with the best possible solution. Solving this problem would be a great service to Khoury College. In this project, we take on the challenge!

Purpose:

  • Use evolutionary computing to solve an interesting resource allocation problem

  • Help Khoury College faculty with managing TA / lab allocation for large classes

  • Explore an important field of AI: Intelligent Decision-Support

  • Practice functional programming and test-driven development (TDD)

Two tables of data have been provided using real data from DS2000: Intro to Programming with Data.

sections.csv: A table of information about each lab/section. There are 17 sections numbered 0-16. The critical columns are min_ta and max_ta, the minimum and maximum number of TAs that need to be assigned to that section. The min/max values are determined by the number of students enrolled in that section.

tas.csv: A table of information about each TA, how many sections they are willing to support (max_assigned) and their availability and preferences for each section (U = Unavailable, W = Willing, P = Preferred).

In this project, we will use evolutionary computing to assign TAs to specific recitation sections. The main tasks are:

  1. Define functions for each objective. Each function should be written in a functional style. Avoid loops and assignments to the extent possible.

  2. Implement at least one unit test for each objective function to verify that the objectives are working correctly.

  3. Define agents that modify existing solutions and create new solutions. These can be simple or complex. Code up at least four distinct agents.

  4. Run your optimizer for at most 5 minutes. There is no limit on how many agent invocations you perform or how many solutions you produce. Your only limit is time.

  5. Report your best, non-dominated set of solutions.

  1. Minimize overallocation of TAs (overallocation): Each TA specifies how many labs they can support (max_assigned column in tas.csv). If a TA requests at most 2 labs and you assign to them 5 labs, that’s an overallocation penalty of 3. Compute the objective by summing the overallocation penalty over all TAs. There is no minimum allocation.

  2. Minimize time conflicts (conflicts): Minimize the number of TAs with one or more time conflicts. A time conflict occurs if you assign a TA to two labs meeting at the same time. If a TA has multiple time conflicts, still count that as one overall time conflict for that TA.

  3. Minimize under-support (undersupport): If a section needs at least 3 TAs and you only assign 1, count that as 2 penalty points. Minimize the total penalty score across all sections. There is no penalty for assigning too many TAs. You can never have enough TAs.

  4. Minimize the number of times you allocate a TA to a section they are unwilling to support (unwilling). You could argue this is really a hard constraint, but we will treat it as an objective to be minimized instead.

  5. Minimize the number of times you allocate a TA to a section where they said “willing” but not “preferred” (unpreferred). In effect, we are trying to assign TAs to sections that they prefer. But we want to frame every objective a minimization objective. So, if your solution score has unwilling = 0 and unpreferred = 0, then all TAs are assigned to sections they prefer!

To do this, I created four code files: evo.py, profiler.py, assignta.py, and test_assignta.py.

evo.py: An evolutionary computing framework (outline created in DS3500 class)

Within this file, I created an Evo class, which included functions like:

  • __init__ – Instantiate self.pop (evaluation → solution), self.fitness (name → objective function), and self.agents (name → (operator function, num_solutions_input)

  • add_fitness_criteria – Register an objective with the environment

  • add_agent – Register an agent with the environment. The operator (op) defines how the agent tweaks a solution. k defines the number of solutions input to the agent.

  • add_solution – Add a solution to the population

  • run_agent – Invoke a named agent on the population

  • dominates – p = evaluation of one solution: ((obj1, score 1), (obj2, score2), ...), q = evaluation of another solution: ((obj1, score 1), (obj2, score2), ...)

  • reduce_nds – Reduce non-dominated solutions in population

  • remove_nondominated – Remove non-dominated solutions from population

  • evolve – Run random agents until program hits time limit.

    • dom: How frequently to remove dominated solutions

    • status: How frequently to output the current population

    • time_limit: How long to evolve the population in seconds (300 seconds/5 minutes)

  • __str__ – Output the solutions in the population

profiler.py: A profiler class that uses decorators to support optimizer profiling (demonstrate solutions were produced in five minutes or less)

Within this file, I created a Profiler class to keep track of function calls and running time.

assignta.py: Assign TAs to specific recitation sections using evolutionary computing

Within this file, I defined five objective functions, one for each optimizer objective, to calculate objective values for each solution in the population:

  • minimize_overallocation, minimize_conflicts, minimize_undersupport, minimize_unwilling, minimize_unpreferred

Next, I defined seven agents to modify solutions in the population. Five of these agents targeted a specific optimizer objective, while two aimed to simply mutate/shuffle solutions for randomness. The agents included:

  • overallocation_minimizer, conflicts_minimizer, undersupport_minimizer, unwilling_minimizer, unpreferred_minimizer (five objective agents)

  • shuffle_solutions – Agent to randomly shuffle TA assignments

  • mutate_solutions – Agent to mutate TA assignments

I also created a function to write the final CSV file containing a summary table of objective scores for all non-dominated Pareto-optimal solutions, write_scores_csv.

This file also has a main() function where an environment is created, objective functions are added, agents are registered, the optimizer is run with a profiling report, and the CSV file is written for the final population.

test_assignta.py: Unit test for each objective function to verify that objectives are working correctly

This file includes fixtures for three sample solutions and unit tests for each objective function to ensure the test solution score matches the expected score for each objective (calculated beforehand). The tests were named as follows:

  • test_minimize_overallocation

  • test_minimize_conflicts

  • test_minimize_undersupport

  • test_minimize_unwilling

  • test_minimize_unpreferred

The file pytest_report.txt demonstrates that the objective functions I created were scoring the three sample solutions correctly (all tests passed):

pytest_report.txt

After running assignta.py wherein the Evo optimizer ran for five minutes, the following CSV summary table of non-dominated Pareto-optimal solutions was created:

non_dominated_solution_scores.csv

The profiling_report.txt file shows that these solutions were created within the five minute (300 second) time limit:

profiling_report.txt

After looking at the non-dominated solutions in my summary table, I decided that the following solution was best:

non_dominated_solution_scores.csv

As noted in the best_solution_details.txt file, the evaluation scores of my chosen solution are: (('overallocation', 0), ('conflicts', 2), ('undersupport', 5), ('unwilling', 0), ('unpreferred', 4)).

The assignments of my chosen solution are:

In looking at the different objectives, I felt as though 'overallocation' and 'unwilling' should be prioritized as these pertain to needs directly expressed by TAs regarding their workload capacity. With this in mind, a failure to abide by these expressed needs during scheduling will overextend the TA (something that is unfair given being a TA is voluntary and their needs were clearly expressed before scheduling) and possibly hurt their overall perception of the Khoury TA program. The solution above achieves 0 overallocations so no TA is responsible for supporting more labs than they are capable of as well as 0 unwilling assignments so no TA is assigned to a section they are unwilling to support. In doing so, TAs won't be assigned to labs that violate their basic scheduling needs.

While this solution has 5 undersupported sections, there are still four TAs that are assigned to no sections, as shown above. Therefore, Khoury still may have the option to assign these TAs to one (or two) of these undersupported sections if the TA agrees. While this is not ideal (unassigned TAs may not agree to help) and I think undersupport is an important objective to consider, this solution had the lowest undersupport objective score of the solutions that had 0 overallocations and 0 unwilling assignments (which I believe are the most important objectives to maintaining TA happiness and satisfaction), leading to my selection of this solution.

Finally, this solution had 4 unpreferred assignments. Despite this score being tied for highest unpreferred score for all non-dominated solutions, I felt that this objective was least important to consider because it represents an ideal scenario. In other words, while it would be nice to assign all TAs to sections they prefer, there are some cases where I believe it is worth assigning a TA to a section they are only willing (not unwilling) to support for the sake of minimizing conflicts or undersupport as these objectives relate directly to the quality of TA support students will be receiving (need TAs available at class time to help). While it is important to prioritize TA satisfaction with assignments as mentioned earlier, being a TA also means working when help is most needed, which may not always be at preferred times. For all these reasons, I consider this solution to be my best solution.

This project was a deep exploration of evolutionary computing and its application to real-world resource allocation challenges; by addressing TA assignments for lab sections, I contributed to developing an intelligent decision-support system to aid Khoury College faculty. Implementing the evolutionary framework helped me understand how iterative optimization balances complex priorities (in this case, minimizing overallocation, undersupport, and unpreferred assignments) within strict time constraints. The emphasis on functional programming and test-driven development taught me the importance of clean, modular code, as well as verifying that your code is accurate as you work (as opposed to at the very end of a file).

Furthermore, designing agents to address specific objectives required balancing tradeoffs, such as reducing overallocation without creating time conflicts, and demonstrated the complexity of optimizing competing constraints. Presenting a Pareto-optimal set of solutions and analyzing tradeoffs after running the optimizer deepened my technical and critical thinking skills as well. While my best solution achieved a balanced tradeoff, the project underscored the challenges of achieving perfection in real-world scenarios as even my best solution had compromises. Overall, this experience strengthened my problem-solving skills and encouraged me to think through the purpose and function of all elements of my code. Numerous deliverables and competing objectives made it difficult to know what to prioritize at times, but this project sparked a lot of growth in my programming skills and coding approach that I will use in the future.

Previous
Previous

Lyricool NLP Framework

Next
Next

Exploring the Relationship Between Lifestyle and Diabetic Outcomes