GenAI Studio: News, Tools, and Teaching & Learning FAQs
These sixty-minute weekly sessions – facilitated by technologists and pedagogy experts from the CTLT – are designed for faculty and staff at UBC who are using, or thinking about using, Generative AI tools as part of their teaching, research, or daily work. Each week we discuss the news of the week, highlight a specific tool for use within teaching and learning, and then hold a question-and-answer session for attendees.
They run on Zoom every Wednesday from 1pm – 2pm and you can register for upcoming events on the CTLT Events Website.
News of the Week
Each week we discuss several news items from the Generative AI space over the past seven days. There’s usually a flood of AI-adjacent news every week – this industry is moving fast – so we highlight the stories most relevant to the UBC community.
In this week’s tech news, Microsoft unveiled Magentic-One, a multi-agent AI system designed to coordinate tasks across multiple specialized agents. Epoch AI introduced a new math benchmark, FrontierMath, which challenges both LLMs and math PhDs with its high level of difficulty. Mistral AI launched a moderation API to identify and filter undesirable content. A growing trend shows researchers increasingly opting for local AI models over larger, web-based LLMs. In the medical field, AI systems achieved competitive accuracy in grading OSCE tests, while a new Slack survey reveals that global workforce excitement for AI is plateauing despite strong investment interest from executives. Repomix, a new tool, enables users to package code repositories into AI-friendly file formats. A post by Jacob Kaplan-Moss discusses ethical AI use in the public sector, emphasizing responsible and fair applications. Finally, Decart released Oasis, an open-world video game generated entirely by AI, with gameplay that mimics Minecraft.
Here’s this week’s news:
Microsoft Launches Magentic-One Multi-Agent System
Microsoft has released Magentic-One, a powerful multi-agent AI system designed to handle complex tasks by coordinating multiple specialized agents. With agents for file handling and web surfing, a coder agent, and an orchestrator to manage their interactions, Magentic-One represents a shift toward distributed, task-specific models over large, monolithic AI systems, hinting at a possible future direction for AI systems.
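To make the coordination pattern concrete, here is a minimal, hypothetical sketch of the orchestrator idea in Python. This is purely illustrative and is not Microsoft’s actual Magentic-One code or API; the agent names and routing scheme are assumptions for demonstration.

```python
# Illustrative sketch of multi-agent coordination -- NOT the Magentic-One API.
# A coordinator routes subtasks to specialized agents and collects results.

class Agent:
    """Base class for a task-specific agent."""
    def handle(self, task: str) -> str:
        raise NotImplementedError

class WebSurferAgent(Agent):
    def handle(self, task: str) -> str:
        return f"[web] results for: {task}"  # stand-in for a real browsing tool

class CoderAgent(Agent):
    def handle(self, task: str) -> str:
        return f"[code] patch for: {task}"   # stand-in for code generation

class Coordinator:
    """Routes each subtask to the agent registered for its category."""
    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents

    def run(self, plan: list[tuple[str, str]]) -> list[str]:
        return [self.agents[kind].handle(task) for kind, task in plan]

coordinator = Coordinator({"web": WebSurferAgent(), "code": CoderAgent()})
print(coordinator.run([("web", "find the docs"), ("code", "fix the parser")]))
```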
New Benchmark Highlights Math Struggles for AIs and PhDs
A recent benchmark released by Epoch AI shows that large language models (LLMs) struggle significantly with complex mathematics: many leading models score less than two percent on the test. This highlights a limitation of models optimized for language tasks rather than specialized fields like mathematics. The benchmark also challenged PhD mathematicians, underscoring its difficulty for humans and models alike. Because its questions are unpublished, avoiding data contamination, the benchmark aims to serve as an effective standard for measuring generalized reasoning in large language models.
Mistral’s Moderation API for Enhanced Content Filtering
Mistral.ai has introduced a moderation API built on an LLM classifier, using smaller language models fine-tuned for specific content categories to detect and filter harmful content. The API excels at filtering content such as personally identifiable information (PII), hate speech, and self-harm references, showing potential as a powerful moderation tool, especially in educational contexts.
Read more about the Moderation API here.
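For those curious what calling such an API looks like, below is a rough Python sketch against Mistral’s REST moderation endpoint. The endpoint path, model name (mistral-moderation-latest), and response fields are assumptions to verify against Mistral’s current documentation.

```python
# Hedged sketch: calling Mistral's moderation endpoint over REST.
# Endpoint path, model name, and response fields are assumptions --
# check https://docs.mistral.ai for the current API reference.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/moderations",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-moderation-latest",
        "input": ["Some user-submitted text to screen."],
    },
    timeout=30,
)
resp.raise_for_status()
# Each result carries per-category flags (e.g., hate speech, PII, self-harm).
for result in resp.json()["results"]:
    print(result["categories"])
```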
Researchers Opt for Local AIs to Enhance Privacy
Researchers and tech enthusiasts are increasingly adopting local versions of large language models (LLMs), bypassing web-based AI tools like ChatGPT for greater privacy, cost-effectiveness, and control. Researchers run these smaller, open-weight LLMs, such as Mistral and Llama, on personal devices to streamline tasks, from summarizing medical records to annotating genetic data, while ensuring data privacy and reproducibility. This shift is supported by advances in open models small enough to run on consumer-grade hardware while approaching the capabilities of larger, cloud-based systems.
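As a concrete example of this workflow, one popular way to run an open-weight model locally is Ollama, which serves models through a small HTTP API on your own machine. A minimal sketch, assuming Ollama is installed and a model such as llama3 has already been pulled:

```python
# Minimal sketch: querying a locally hosted open-weight model via Ollama's
# HTTP API (assumes `ollama pull llama3` has been run on this machine).
# Nothing leaves the local machine, which is the privacy point.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize this medical record: ...",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```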
Large Language Models for Medical OSCE Assessment
Researchers explored the use of large language models (LLMs) for assessing Objective Structured Clinical Examinations (OSCEs) focused on medical student communication. By analyzing over 2,000 OSCE transcripts, they found that models like GPT-4 closely aligned with human graders, achieving high accuracy in evaluating students’ ability to summarize medical histories. The study suggests that LLMs could streamline and enhance the OSCE grading process while reducing costs, with open-source models also demonstrating strong potential for widespread, privacy-preserving applications in medical education.
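The study’s actual prompts and pipeline are not reproduced here, but the general pattern, asking a model to score a transcript against a rubric, can be sketched with the OpenAI Python SDK as follows. The model name, rubric, and transcript are placeholders, not the study’s materials.

```python
# Hedged sketch of rubric-based grading with an LLM -- not the study's
# actual pipeline. Model name, rubric, and transcript are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = """Score 0-5 for how completely the student summarizes the
patient's medical history. Reply with the score and one sentence of
justification."""

def grade_transcript(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

print(grade_transcript("Student: Can you tell me when the pain started? ..."))
```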
Slack Survey Shows Cooling AI Enthusiasm in the Workforce
According to Slack’s Fall 2024 Workforce Index, AI adoption in the U.S. workforce is widespread but faces skepticism. While 99% of executives are investing in AI, only about 30% of employees report using it daily, and many harbor concerns. Around half of desk workers worry that relying on AI could make them appear lazy, diminish their perceived competence, or be seen as “cheating.” The findings suggest a gap between executive ambitions and practical adoption, highlighting the need for leadership to align AI implementations with realistic workforce expectations.
Repomix: AI-Friendly Repository Packager
Repomix is a tool designed to streamline codebase preparation for large language models (LLMs) like ChatGPT, Claude, and Gemini by packaging entire repositories into AI-optimized files. The tool simplifies token counting, configuration, and selective file inclusion, making it easier to work within LLM token limits. Originally called Repopack, Repomix is geared towards developers who need efficient ways to input complex codebases into AI applications.
Check out the Repomix GitHub repository.
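In practice, Repomix is run from the command line. A hedged sketch of driving it from Python is shown below; it assumes Node.js is available, and the --output flag should be verified against the Repomix documentation.

```python
# Hedged sketch: packaging a repository with the Repomix CLI from Python.
# Assumes Node.js is installed; the --output flag is an assumption to
# verify against the Repomix documentation.
import subprocess

subprocess.run(
    ["npx", "repomix", "--output", "repo-for-llm.txt"],
    cwd="path/to/your/repo",  # placeholder path to the repository
    check=True,
)

# The packed file can then be pasted into an LLM chat or attached via API.
with open("path/to/your/repo/repo-for-llm.txt") as f:
    print(f.read()[:500])  # preview the first 500 characters
```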
Ethical Use of AI in the Public Sector
In this post, Jacob Kaplan-Moss discusses ethical considerations for AI in the public sector, arguing that AI should enhance rather than replace human judgment. Building on Arvind Narayanan’s work, he breaks AI down into three main applications: perception (e.g., facial recognition), judgment (e.g., automated grading), and prediction, and adds generative AI as a fourth category. He warns against predictive uses, such as forecasting crime or social outcomes, due to inherent biases. Kaplan-Moss advocates for “assistive AI,” where technology aids human decision-making, over “automated AI,” which risks undermining fairness and accountability in public services.
Oasis: The First Entirely AI-Generated Video Game
Decart has launched Oasis, a game generated entirely by AI with no underlying game code, using a transformer model trained on Minecraft gameplay data. Unlike traditional games, Oasis generates each frame based on the previous frame and the user’s input, resulting in a surreal, continuously evolving world. Though the AI-generated approach makes it fascinating, experts view Oasis as an experimental leap rather than a conventional gaming experience, hinting at future applications of generative AI in game design.
Read the article here, and try Oasis here.
Tool of the Week
Tool of the Week: Llama OCR
What is Llama OCR?
Llama OCR is a tool designed to extract data from non-structured documents, such as handwritten notes, smudged labels, or irregularly formatted PDFs, by converting them into a markdown-friendly format. This technology represents an advancement in document processing, especially for hard-to-read formats.
How is it used?
Llama OCR pairs a vision-capable language model with tooling for reading diverse and challenging document types. Users feed scanned documents or images to Llama OCR, which interprets and structures the data, making it easily accessible and editable for further analysis or storage.
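To illustrate the underlying pattern rather than the library itself, the sketch below sends a base64-encoded image to a vision-capable Llama model through an OpenAI-compatible endpoint and asks for markdown. The provider URL and model name are assumptions; the official interface is the npm library linked below.

```python
# Hedged sketch of the OCR-to-markdown pattern behind Llama OCR: send a
# base64 image to a vision-capable Llama model and ask for markdown.
# The endpoint and model name are placeholders for an OpenAI-compatible
# provider; the real tool is the npm library linked below.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # assumed OpenAI-compatible host
    api_key="YOUR_API_KEY",                  # placeholder
)

with open("scanned_note.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",  # assumed name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Transcribe this document into clean markdown."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```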
What is it used for?
This tool is particularly valuable in scenarios where data is “trapped” in unconventional formats, such as medical records, legal paperwork, or older documents. By transforming these into accessible data, Llama OCR aids organizations in capturing insights from previously inaccessible information.
For additional information, explore Llama OCR and its npm library.
Note: without a Privacy Impact Assessment (PIA), instructors cannot require students to use a tool or service unless they provide alternatives that do not require the use of students’ private information.
Questions and Answers
Each studio ends with a question-and-answer session in which attendees can ask questions of the pedagogy experts and technologists who facilitate the sessions. We have published a full FAQ section on this site. If you have other questions about GenAI usage, please get in touch.
Assessment Design using Generative AI
Generative AI is reshaping assessment design, requiring faculty to adapt assignments to maintain academic integrity. The GenAI Assessment Scale guides AI use in coursework, ranging from study aid to full collaboration, and helps educators create assessments that balance AI integration with skill development, fostering critical thinking and fairness in learning.
How can I use GenAI in my course?
The integration of GenAI offers a multitude of applications within your courses. Below is a detailed table categorizing various use cases, outlining the specific roles they play, their pedagogical benefits, and the potential risks associated with their implementation. A complete breakdown of each use case and the original image can be found here. At […]