Your correspondent hasn’t written much about OpenAI’s o1 models, and for good reason. When they first came out, o1-mini and o1-preview were slow and gave unnecessarily long responses to prompts. They were, in a word, annoying.

Since the initial release, some things have changed. We now have access to the full o1 model (no longer in preview) and even an o1 pro model if you don’t mind coughing up $200/month. I do mind coughing up that money, so I have continued exploring some of the construction safety use cases that the base o1 model excels at.

However, I still use Claude Sonnet 3.5 as my "daily driver" because I like its writing style and persona. It doesn't lecture me too much, and it has learned that I want its answers to be succinct and direct.

So then, where does o1 shine regarding construction safety? I’ve found at least a few areas.

Complex Regulatory Guidance

OpenAI o1 can digest and interpret complex regulations, providing step-by-step compliance advice tailored to unique job sites. In my jurisdiction, there are numerous regulations that I work within regularly that can be difficult to interpret and apply. One such Regulation describes the minimum levels of first aid required at a workplace. I put o1 up against 4o with the following prompt:

 “Tell me my minimum first aid requirements given the following:

  • My site has 200 workers
  • The project is high-hazard
  • Some places are less accessible in my workplace
  • My workplace is not remote

Here is the Regulation that applies to this scenario: [Entire section of Regulation pasted].”

o1 did an excellent job of arriving at the correct answer, which involved determining the class of my workplace – a function of its remoteness and accessibility – and then referencing a specific table in the Regulation. Once in the table, the number of workers and hazard rating determined which cell the requirements would be in. o1 had no problem with this task and provided the correct answer and a summary of its reasoning.

4o also arrived at the correct table and cell; however, it hallucinated an extra requirement, stating that the first aid transportation must be capable of transporting two workers. I find this baffling because 4o found the correct table and then added a requirement that didn’t exist. When questioned, 4o stood by its original answer. Only after I said that I had checked the table and that no such requirement exists did it correct itself. That is a mark in favor of o1 and a reminder that, regardless of the model, we still can’t trust them unquestioningly.
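For readers who want to check a model's answer by hand, the two-step lookup described above (work out the workplace class, then find the cell for the worker count and hazard rating) can be sketched in a few lines of Python. This is only an illustration of the procedure. The class names, hazard labels, worker bands, and table entries below are all placeholders I made up for the sketch, not content from any actual Regulation.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    workers: int
    hazard: str            # "low", "moderate", or "high" (assumed labels)
    remote: bool
    less_accessible: bool


def workplace_class(site: Site) -> str:
    """Step 1: derive the class from remoteness and accessibility.

    The class names are invented for this sketch; use the Regulation's own.
    """
    classes = {
        (False, False): "class-A",
        (True, False): "class-B",
        (False, True): "class-C",
        (True, True): "class-D",
    }
    return classes[(site.remote, site.less_accessible)]


# Step 2 data: for each (class, hazard) pair, worker-count bands stored as
# (upper bound, requirements). Every entry is a placeholder, NOT real
# regulatory content; only one cell group is filled in for the example.
TABLE = {
    ("class-C", "high"): [
        (50, "placeholder: requirements for up to 50 workers"),
        (300, "placeholder: requirements for 51-300 workers"),
    ],
}


def minimum_first_aid(site: Site) -> str:
    """Step 2: find the band that contains the worker count."""
    bands = TABLE[(workplace_class(site), site.hazard)]
    upper_bounds = [upper for upper, _ in bands]
    # Index of the first band whose upper bound is >= the worker count.
    index = bisect_left(upper_bounds, site.workers)
    if index == len(bands):
        raise ValueError("worker count exceeds the bands in this sketch")
    return bands[index][1]


# The scenario from the prompt: 200 workers, high hazard, not remote,
# some less accessible areas.
site = Site(workers=200, hazard="high", remote=False, less_accessible=True)
print(workplace_class(site), "->", minimum_first_aid(site))
```

Writing the lookup out this way makes it obvious that the answer can only contain what is in the chosen cell, which is exactly the check that caught 4o's extra requirement.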

Mathematics

OpenAI touts o1 for its superior coding and mathematical abilities. When it comes to complex calculations, o1 really shows its strengths. I tested both models with a challenging vapor pressure problem I found in a Certified Industrial Hygienist (CIH) practice exam. The problem required applying Raoult's Law, which says that each component of an ideal liquid mixture contributes a partial vapor pressure equal to its mole fraction multiplied by its vapor pressure as a pure liquid.

Both models provided detailed, step-by-step solutions, but o1's approach was notably more structured and precise. It began by clearly restating the problem and organizing the given data in a table, which made the setup easy to follow.

The final answers differed by roughly a factor of ten: o1 calculated a benzene concentration in air of 5.1%, while 4o arrived at 51.65%. A discrepancy that large points to errors in 4o's approach to these problems. This example illustrates why o1 might be the better choice when dealing with complex calculations in safety-critical situations, such as determining exposure limits or analyzing hazardous material concentrations. Again, and at the risk of sounding like a broken record, I would still never wholly trust any large language model to solve safety-critical problems on its own. It remains an assistant that requires you (the expert) at the controls.
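One way to stay at the controls is to run the arithmetic yourself. Below is a minimal Python sketch of a Raoult's Law calculation: the partial pressure of a component is its mole fraction in the liquid multiplied by its pure vapor pressure, and dividing by total pressure gives the saturated concentration in air. The inputs are illustrative assumptions (a liquid that is 50 mol% benzene, a pure benzene vapor pressure of roughly 75 mmHg near room temperature, and 760 mmHg total pressure). They are not the values from the practice exam, so the result is not meant to reproduce either model's answer.

```python
def partial_pressure_mmhg(mole_fraction: float, pure_vapor_pressure_mmhg: float) -> float:
    """Raoult's Law: p_i = x_i * P_i(pure), for an ideal liquid mixture."""
    if not 0.0 <= mole_fraction <= 1.0:
        raise ValueError("mole fraction must be between 0 and 1")
    return mole_fraction * pure_vapor_pressure_mmhg


def vapor_percent(mole_fraction: float,
                  pure_vapor_pressure_mmhg: float,
                  total_pressure_mmhg: float = 760.0) -> float:
    """Saturated vapor concentration in air, as percent by volume."""
    partial = partial_pressure_mmhg(mole_fraction, pure_vapor_pressure_mmhg)
    return 100.0 * partial / total_pressure_mmhg


def percent_to_ppm(percent: float) -> float:
    """1% by volume equals 10,000 ppm."""
    return percent * 10_000.0


# Illustrative inputs only (assumptions, not the exam problem's data).
x_benzene = 0.5          # mole fraction of benzene in the liquid
p_pure_benzene = 75.0    # mmHg, approximate value near room temperature

percent = vapor_percent(x_benzene, p_pure_benzene)
print(f"{percent:.2f}% by volume, about {percent_to_ppm(percent):,.0f} ppm")
```

With these assumed inputs the result is about 4.9% by volume. The point is less the number than the sanity check: a saturated concentration can never exceed the ratio of the pure vapor pressure to total pressure, so an answer far above that ceiling is a sign that something went wrong.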

Conclusion

So, where does this leave us? While o1 clearly shows advantages in regulatory interpretation and mathematical calculations, it will not necessarily be the most effective model for everything a safety professional does. o1 excels in specific niches, particularly when dealing with complex regulatory frameworks or calculations that require precise, step-by-step analysis. However, models like Claude Sonnet 3.5 provide superior writing ability and solid reasoning skills for day-to-day safety management tasks.

The key takeaway isn't about which model is "better" - it's about understanding when to deploy specific tools for specific tasks. If you're working through complex exposure calculations or need to interpret multi-layered regulations, o1 might be worth considering. But remember - whether you're using o1, 4o, or any other AI model, these tools are assistants, not replacements for professional judgment.

Ultimately, the best approach might be to maintain access to multiple models and use each where it shows specific strength. Don't expect any of them to be your safety fairy—that's still your job.