Automated Large Language Model Evaluation & APLCOLLAB

Posted Aug 20, 2023

By Thomas Armstrong


While at JHU APL, I developed a script that lets GPT-3.5 Turbo hold a directed dialog with either Dolly 12B or StableVicuna 13B. The script was designed to replace the manual testing of large language models previously done by 15 staff members, saving hundreds of hours of manual effort.
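The script itself isn't included in this post. As a rough illustration of what a directed dialog between two models can look like, here is a minimal Python sketch built on a few assumptions. Each model is treated as a plain function that takes a chat history and returns a reply; in practice this might wrap the OpenAI API for GPT-3.5 Turbo and a locally hosted model for Dolly or StableVicuna. The `DirectedDialog` class, the prompt wording, and the `DONE` stop sentinel are hypothetical names chosen for the example, not the original implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A message is a (role, text) pair; a model maps a history to a reply.
# Hypothetical interface: real code would wrap API or local-model calls here.
Message = Tuple[str, str]
Model = Callable[[List[Message]], str]


@dataclass
class DirectedDialog:
    examiner: Model   # steering model, e.g. GPT-3.5 Turbo
    subject: Model    # model under test, e.g. Dolly 12B or StableVicuna 13B
    objective: str    # behavior the examiner is instructed to probe
    max_turns: int = 3
    transcript: List[Message] = field(default_factory=list)

    def run(self) -> List[Message]:
        # Seed the examiner with its testing objective (hypothetical prompt).
        examiner_view: List[Message] = [
            ("system", f"You are testing another model. Objective: {self.objective}")
        ]
        subject_view: List[Message] = []
        for _ in range(self.max_turns):
            question = self.examiner(examiner_view)
            if question.strip() == "DONE":  # hypothetical stop sentinel
                break
            subject_view.append(("user", question))
            answer = self.subject(subject_view)
            subject_view.append(("assistant", answer))
            # From the examiner's side, the subject's answer is the "user" turn.
            examiner_view += [("assistant", question), ("user", answer)]
            self.transcript += [("examiner", question), ("subject", answer)]
        return self.transcript


# Stub models so the sketch runs without any API keys or GPUs.
def stub_examiner(history: List[Message]) -> str:
    n = sum(1 for role, _ in history if role == "user")
    return f"Question {n + 1}: what is {n + 1} + {n + 1}?"


def stub_subject(history: List[Message]) -> str:
    return f"Echo: {history[-1][1]}"


if __name__ == "__main__":
    dialog = DirectedDialog(stub_examiner, stub_subject, "basic arithmetic", max_turns=2)
    for role, text in dialog.run():
        print(f"{role}: {text}")
```

Keeping each model behind a single function signature means the examiner can be paired with either subject model without changing the dialog loop, and the recorded transcript is what a human reviewer (or a scoring step) would inspect instead of chatting with the model by hand.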

I also built APLCOLLAB, a standardized, Docker-based machine learning environment that includes PyTorch and TensorFlow and is available to all APL employees. To keep it reliable, I established a thorough, dynamic GitLab CI/CD pipeline that verifies the environment remains stable across a variety of GPUs.
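The post doesn't describe what the pipeline's jobs actually run. One plausible building block, sketched here purely as an assumption, is a Python smoke test that each GPU-specific CI job could execute inside the container to confirm that both frameworks import and can see a GPU. `torch.cuda.is_available()`, `torch.cuda.device_count()`, and `tf.config.list_physical_devices("GPU")` are real PyTorch and TensorFlow calls; the script layout, the function names, and the `min_gpus` threshold are hypothetical.

```python
import importlib
import sys
from typing import Callable, Dict, List, Optional

Probe = Callable[[object], int]


# Each probe receives the imported module and returns how many GPUs it sees.
def probe_torch(torch) -> int:
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


def probe_tensorflow(tf) -> int:
    return len(tf.config.list_physical_devices("GPU"))


PROBES: Dict[str, Probe] = {"torch": probe_torch, "tensorflow": probe_tensorflow}


def smoke_test(probes: Dict[str, Probe] = PROBES) -> Dict[str, Optional[int]]:
    """Import each framework and record its visible GPU count (None if it won't import)."""
    results: Dict[str, Optional[int]] = {}
    for name, probe in probes.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None  # framework missing from the image
            continue
        results[name] = probe(module)
    return results


def failures(results: Dict[str, Optional[int]], min_gpus: int = 1) -> List[str]:
    """Frameworks that failed to import or saw fewer than min_gpus GPUs."""
    return [name for name, gpus in results.items() if gpus is None or gpus < min_gpus]


if __name__ == "__main__":
    results = smoke_test()
    for name, gpus in results.items():
        status = "not importable" if gpus is None else f"{gpus} GPU(s)"
        print(f"{name}: {status}")
    # A nonzero exit code marks the CI job as failed.
    sys.exit(1 if failures(results) else 0)
```

Run as the last step of a job on each GPU runner, a nonzero exit would fail that job, so an image in which either framework cannot see the runner's GPU would be caught before it reaches users.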

Research, Internship

This post is licensed under CC BY 4.0 by the author.

