Benchmarking of gpt-3.5-turbo compliance breach predictions #111

Open
ojus1 opened this issue Jun 1, 2023 · 1 comment

Comments

ojus1 commented Jun 1, 2023

I've been playing around with the ChatGPT API (gpt-3.5-turbo) using many variations of prompts. It is spitting out complete nonsense for both the reasoning and the breach predictions. Has any benchmarking been done for the compliance breach prediction pipeline? If so, I would love to look at the documentation.

@harshithere
Contributor

Short answer: Nope, no benchmarking has been done.
The consensus on using this comes from the prototype breach reports that @lepisma was generating last month. The idea was that we want a system that provides some mechanism better than the current method (randomly looking through calls or relying on client feedback), so a large number of false positives (FPs) is not a concern. Providing baseline accuracy numbers and tuning the system for better performance were kept as future tasks.
Nonetheless, I'm curious what the 'complete nonsense' reasoning looks like. Let's schedule a call for that.
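For what it's worth, here is a minimal sketch of what such a baseline benchmark could look like, assuming a hand-labeled set of calls with breach/no-breach labels. The `predict_breach` function and the data format are hypothetical placeholders, not the actual pipeline:

```python
# Hypothetical baseline benchmark for breach predictions over hand-labeled calls.
# `predict_breach` is a stand-in for the gpt-3.5-turbo pipeline's prediction step.

from typing import List, Tuple


def predict_breach(transcript: str) -> bool:
    """Placeholder for the LLM-based breach predictor (hypothetical)."""
    raise NotImplementedError


def benchmark(labeled_calls: List[Tuple[str, bool]]) -> dict:
    """Compute accuracy, precision, and recall over (transcript, is_breach) pairs."""
    tp = fp = fn = tn = 0
    for transcript, is_breach in labeled_calls:
        pred = predict_breach(transcript)
        if pred and is_breach:
            tp += 1
        elif pred and not is_breach:
            fp += 1
        elif not pred and is_breach:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision drops when FPs dominate, which is the tolerated failure mode here.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall is the number we actually care about: how many real breaches we catch.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Even a small labeled sample (a few dozen calls) run through something like this would give us the baseline numbers to compare prompt variations against.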
