Commit add3f85

Merge pull request #34 from MIT-Emerging-Talent/meeting_notes
milestone 5 - Update meeting minutes for Milestones 4 and 5
2 parents 3aae9c5 + 008d989

3 files changed: +241 −8 lines
meeting_minutes/README.md

…By the end of Milestone 1, the project established its scope, research framework…
**Timeline:** October 15 – November 6, 2025

With the research framework and scope finalized in Milestone 1, **Milestone 2** focused on preparing the experimental environment and defining how sustainability metrics would be measured. This phase involved setting up tools such as **CodeCarbon**, **CarbonTracker**, and **Eco2AI** to monitor energy and carbon usage, and exploring **Water Usage Effectiveness (WUE)** datasets from major cloud providers like AWS, Microsoft, and Google.

The team also planned to configure testing environments for small open-source models (e.g., **Mistral**, **LLaMA-2**) using **Hugging Face Transformers**, **PyTorch**, and GPU-enabled platforms such as **Colab**. Another core deliverable was the **experimental design document**, which outlined the metrics (energy, carbon, water, and accuracy), workflows, and methodology diagrams guiding the model evaluation process.

By the end of Milestone 2, the team completed the technical setup, finalized the measurement pipeline, and validated that all tracking tools operate consistently across model types—ensuring a smooth transition into Milestone 3, where the full experiments would be executed.
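At their core, trackers like CodeCarbon combine measured power draw, runtime, and regional grid carbon intensity into an emissions estimate. A minimal sketch of that accounting is below; the function name and every figure are illustrative assumptions, not the actual API or measurements of any of these tools.

```python
# Minimal sketch of the accounting such trackers perform:
#   energy (kWh)       = average power draw (W) * runtime (h) / 1000
#   emissions (kgCO2e) = energy (kWh) * grid carbon intensity (kgCO2e/kWh)
# All figures below are illustrative placeholders, not real measurements.

def estimate_emissions_kg(runtime_hours: float,
                          avg_power_watts: float,
                          grid_intensity_kg_per_kwh: float) -> float:
    """Convert runtime and average power draw into estimated kg CO2e."""
    energy_kwh = avg_power_watts * runtime_hours / 1000.0
    return energy_kwh * grid_intensity_kg_per_kwh

# Example: a 2-hour run at ~300 W on a grid emitting 0.4 kgCO2e per kWh.
print(estimate_emissions_kg(2.0, 300.0, 0.4))  # ≈ 0.24 kg CO2e
```

The real tools refine each factor (sampling hardware power counters, looking up regional grid intensity), but the multiplication above is the shape of the estimate.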
## 📊 Milestone 3 – Model Benchmarking & Data Collection

**Timeline:** November 7 – November 18, 2025

Milestone 3 marked the beginning of the full experimental phase. Using the measurement pipeline and tooling established in Milestone 2, the team ran benchmark tasks on both proprietary and open-source models to collect data on **accuracy** and **environmental impact**. This included tracking **energy consumption and carbon emissions** for each model tested under consistent test conditions.

During this phase, the team also validated accuracy results on selected reasoning and summarization tasks, investigated irregular outputs, and updated evaluation scripts when needed. Additional observations such as **inference time, token throughput**, and **hardware utilization** were recorded to support later analysis.

By the end of Milestone 3, the project had produced a complete experimental dataset covering sustainability metrics and accuracy scores for all evaluated models, providing a strong foundation for **Milestone 4**, which focused on human evaluation and qualitative assessment.
## 🧪 Milestone 4 – Human Evaluation & Survey Analysis

**Timeline:** November 19 – December 3, 2025

Milestone 4 centered on incorporating **human judgment** into the benchmarking process and concluded successfully. The team prepared and published a Google Form survey to compare model outputs side-by-side, and participants evaluated **clarity, coherence, informativeness, factuality,** and **overall preference**.

To improve participation and focus, the survey scope was refined to eight questions across four categories—**Reasoning, Summarization, Creative Writing,** and **Paraphrasing**—and the **Retrieval/RAG** category was excluded due to its emphasis on factual lookup rather than generative quality.

Once responses were collected, the team analyzed the results by aggregating scores, assessing agreement among reviewers, and comparing human preferences. Initial insights, including distributional patterns and respondent demographics, were reviewed via Google Forms visualizations, and notable alignments and divergences between human judgments and quantitative metrics were documented to guide interpretation in the final analysis.

By the end of Milestone 4, the project integrated the human evaluation results into the broader dataset, consolidated the confirmed question set and model pairings, and prepared materials for downstream reporting. This provided a more nuanced understanding of model performance, completing the human evaluation phase and setting up the transition into **Milestone 5**.
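The score aggregation and reviewer-agreement steps described above can be sketched roughly as follows. This is a minimal illustration with invented reviewer scores; the survey's actual models, rating scales, and response counts differ.

```python
# Hypothetical reviewer scores (1-5) per model; the names and values are
# invented for illustration and are not the project's actual survey data.
scores = {
    "open_source_model": [4, 5, 3, 4],
    "commercial_model":  [5, 4, 4, 5],
}

# Aggregation step: mean score per model.
means = {model: sum(vals) / len(vals) for model, vals in scores.items()}

# Simple agreement step: per reviewer, which model did they rate higher,
# and what fraction of reviewers share the most common preference?
pairs = list(zip(scores["open_source_model"], scores["commercial_model"]))
prefs = ["commercial" if c > o else "open_source" if o > c else "tie"
         for o, c in pairs]
agreement = max(prefs.count(p) for p in set(prefs)) / len(prefs)

print(means)       # mean score per model
print(agreement)   # share of reviewers backing the majority preference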
## 📣 Milestone 5 – Communication of Results & Final Presentation

**Timeline:** December 4 – ongoing

Milestone 5 focuses on packaging and communicating the project’s findings while completing the final presentation and releasing the full set of artifacts. The team is synthesizing the human evaluation results into a coherent analysis narrative, drafting and editing the presentation and the article for publication, and finalizing an infographic and visual summary that will be embedded in both the article and the presentation.

In parallel, the repository is being cleaned and organized to publish the code, data, and analysis notebooks with clear usage notes and data access instructions. Everything will be finalized on December 7.
---
<!-- markdownlint-disable MD024 -->
<!-- Disabled MD024 (Multiple headings with the same content) rule
because repeated headings (Summary, Action Items) are
intentionally used across multiple sections for structural clarity.
-->
# Milestone 4 Meeting Minutes

## Meeting 18

**Date:** November 19, 2025 (Wednesday, 1:00 PM EST)
**Attendees:** Amro, Aseel, Banu, Caesar, Reem, Safia

### Summary

- The research question was refined with Evan's help:
  - During the group conversation with Evan on Slack, the team initially drafted
    a precise question focusing on whether optimized open-source models (e.g.,
    via recursive editing, distillation) could become environmentally and
    functionally viable alternatives to commercial models.
  - However, Evan advised that the ELO2 project should remain **open-ended**,
    shifting toward a broader guiding question:
    **“How can we achieve similar results to large private models on smaller
    devices and with less power consumption?”**
  - As a result, the final deliverable will be a **comprehensive portfolio** of
    experiments, benchmarks, comparisons, and promising directions—rather than
    a single definitive answer.
- Based on Evan’s feedback, the upcoming **Google Form** will include both
  **commercial** and **open-source** model responses.
- Initially, the plan was to pair small open-source SLMs with commercial models
  of similar sizes, but this was not feasible due to limited access. The team
  instead decided to **use accessible commercial LLMs (ChatGPT, Claude,
  Gemini).**
- **All questions** across all categories will be used in the **Google Form**.
- Finalized model assignments:
  - **Aseel** → ChatGPT
  - **Caesar** → Claude Haiku 4.5
  - **Amro** → Gemini Pro 3
  - **Banu** → Gemini Fast (Flash 2.5)
  - **Reem** → Gemini Flash 2.5 Lite (via API/HuggingFace)

### Action Plan

- Each member will generate **responses for all question prompts** using their
  assigned model.
- All responses must be uploaded to the
  [shared document](https://docs.google.com/document/d/1CBYpsLvkeE5aLKp1-6vPaiizDXVf80o2uH42XL5gXtw/edit?tab=t.0)
  **by tomorrow**.
- In tomorrow’s meeting, the team will review all open-source and commercial
  model outputs and **select the final answers** to include in the Google Form.

---
## Meeting 19

**Date:** November 20, 2025 (Thursday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Caesar, Reem, Safia

### Summary

- The team revisited the original plan for the evaluation form. Initially,
  **all 21 questions** across all task categories were intended to be included.
- However, because a 21-question survey would be too long for participants, the
  group agreed to **select only two questions per category** to keep the form
  manageable.
- During this selection process, the team decided to **exclude the
  Retrieval/RAG category** entirely, since its questions require factual lookup
  (dates, names, quantities), which does not align well with the survey’s goal
  of evaluating reasoning or generation quality.
- As a result, the form will include **four categories**—Reasoning,
  Summarization, Creative Writing, Paraphrasing—with **two questions per
  category**, for a total of **8 questions**.
  _(Selected Q&As can be found
  [here](https://docs.google.com/document/d/1CBYpsLvkeE5aLKp1-6vPaiizDXVf80o2uH42XL5gXtw/edit?tab=t.ugqqnecewdh7).)_
- The group reaffirmed that **each task category will be represented by one
  model pair**, ensuring all models contribute to the study.
- The team decided to **pair each open-source model with the closest commercial
  model** for comparative evaluation.
- The final task–model pairings were confirmed as:
  - **Reasoning:** Gemma ↔ Claude Haiku 4.5
  - **Summarization:** LaMini ↔ Gemini Flash
  - **Creative Writing:** Mistral ↔ Gemini Pro 3
  - **Paraphrasing:** Qwen ↔ ChatGPT

### Action Plan

- Add the selected model responses to the **Google Form** initially created by
  Banu and finalize the form.
- Submit the form to **Evan** for feedback, then incorporate any revisions.
- **Publish** the finalized form to the cohort group and **collect responses
  until November 30**.

---
## Meeting 20

**Date:** November 25, 2025 (Tuesday, 2:30 PM EST)
**Attendees:** Amro, Caesar, Reem, Banu

### Summary

- While the form was still running, the team reviewed the remaining
  deliverables for both **ELO2/Graduation requirements** and **the project
  itself**.

  1. For **ELO2 and graduation**, the deliverables were revisited and confirmed
     as:
     - Repository
     - Presentation
     - 1000-word final testimonial _(individual)_
     - 1000-word ELO2 retrospective _(individual)_
     - Exit Survey _(individual)_

  2. For the **Green AI project’s final outputs**, the deliverables were
     identified as:
     - Repository
     - Article
     - Presentation
     - Form analysis
- The team also discussed the expected article format and structure.
- The article should narrate the project process by explaining motivations and
  the overall journey, with roughly **5–10%** on initial ideas, most of the
  content focusing on the work done, and a concluding section with findings
  and potential future directions.
- Reem volunteered to create an infographic or visual summary that can be used
  for both the article and the presentation.
- It was also noted that **on November 28 (Friday)**, support may be requested
  from Evan to help boost form participation through an announcement.
- Since the form was published later than anticipated, the team decided to
  **close it on December 2nd instead of November 30th.**

### Action Plan

- **Caesar and Reem** to begin drafting the article for team review.
- **Amro** to work on repository updates and refine the main README draft
  prepared by Banu earlier.
- **Reem** to create an infographic using a **visualization tool** to summarize
  project results for the article, presentation, and Medium or similar
  platforms _(after the form closes)_.
- The survey form will be **closed on December 2nd**, after which data analysis
  will begin.
- **Banu, Safia, and Aseel** to work on the presentation, building on the
  initial draft previously prepared by Banu.
- An announcement request **may be sent to Evan on November 28 (Friday)** to
  encourage more form responses.

---
## Meeting 21

**Date:** December 2, 2025 (Tuesday, 12:30 PM EST)
**Attendees:** Caesar, Reem, Aseel

### Summary

- The team discussed the status of the survey form and agreed to close it by
  the end of the day. After reviewing its current performance, they noted that
  most initial insights and demographics could already be observed through the
  Google Forms visualization tools, which provided a general overview of
  respondent characteristics and early trends.
- The group revisited the remaining requirements and clarified what still needs
  to be completed for the final deliverables. A key part of the discussion
  focused on how the results will be structured and presented in both the
  article and the visual summary. This included considering how best to
  translate the survey findings into a clear narrative and an accompanying
  infographic or visualization.
- The team confirmed that an additional meeting would be held on December 3rd
  to examine the survey results in more depth. During that session, they will
  identify any notable or unexpected findings that may require special emphasis
  or separate formats within the final outputs.

### Action Plan

- Close the survey form by end of day on December 2.
- Begin outlining how survey results will be integrated into the article and
  visual materials.
- Continue exploring the data using Google Forms visualizations to prepare for
  deeper analysis.
- Meet again on December 3rd to review detailed results and determine standout
  findings or sections that require special formatting.
---
<!-- markdownlint-disable MD024 -->
<!-- Disabled MD024 (Multiple headings with the same content) rule
because repeated headings (Summary, Action Items) are
intentionally used across multiple sections for structural clarity.
-->
# Milestone 5 Meeting Minutes

## Meeting 22

**Date:** December 4, 2025 (Thursday, 1:00 PM EST)
**Attendees:** Amro, Aseel, Banu, Caesar, Safia

### Summary

- The team reviewed the overall project status, noting that only a few days
  remained and both the **presentation** and the **article** were close to
  completion.
- The group discussed the structure of the final presentation, agreeing on a
  **category-based layout** (reasoning, summarization, creative generation,
  paraphrasing) supported by charts, visuals, and short explanations.
- Aseel led the design direction, and the team confirmed a **road-themed
  visual style** for consistency across slides.
- The group highlighted key research findings:
  - Open-source models performed surprisingly well in **reasoning** and
    **creative generation** tasks.
  - Commercial LLMs still hold advantages for **highly specialized or complex
    domains**.
  - Promising future directions include experimenting with larger parameter
    counts, fine-tuning methods, and evaluating performance under different
    context window sizes.

### Action Plan

- **Banu** will compile and document all experiment findings (questions,
  answers, charts) and upload the file to the **'findings' folder** of the
  GitHub repo.
- **Caesar** will continue to work on the **article and 'experiment' folder**
  of the repo.
- **Amro** will finalize the **main README**.
- **Aseel** will continue leading the presentation design and **finalize the
  slides**.
- **Reem** will complete the **visual summaries** for the article and
  presentation.
- **Safia** will work on the **conclusion slide**.
- The team will meet again on **December 7** to review all components and
  finalize the project.

---
