Emma Hooper, Author at AEC Magazine

AI: Information Integrity

Emma Hooper — Wed, 16 Apr 2025 05:00:12 +0000

As AI reshapes how we engage with information, Emma Hooper, head of information management strategy at RLB Digital, explores how we can refine large language models to improve accuracy, reduce bias, and uphold data integrity — without losing the essential human skill of critical thinking

In a world where AI is becoming an increasingly integral part of our everyday lives, the potential benefits are immense. However, as someone with a background in technology — having spent my career producing, managing or thinking about information — I continue to contemplate how AI will alter our relationship with information and how the integrity and quality of data will be managed.

Understanding LLMs

AI is a broad field focused on simulating human intelligence, enabling machines to learn from examples and apply this learning to new situations. As we delve deeper into its sub-types, we become more detached from the inner workings of these models, and the statistical patterns they use become increasingly complex. This is particularly relevant with large language models (LLMs), which generate new content based on training data and user instructions (prompts).

An LLM uses a transformer, a specific type of neural network. These models learn patterns and connections between words and phrases, so the more examples they are fed, the more accurate they tend to become. Consequently, they require vast amounts of data and significant computational power, which puts considerable pressure on the environment. These models power tools such as ChatGPT, Gemini, and Claude.

Find this article plus many more in the March / April 2025 Edition of AEC Magazine


The case of DeepSeek-R1

DeepSeek-R1, which has recently been in the news, demonstrates how constraints can drive innovation through good old-fashioned problem-solving. This open-source LLM uses rule-based reinforcement learning, making it cheaper and less compute-intensive to train than more established models.

However, as an LLM, it still faces limitations in output quality. When it comes to accuracy, LLMs are statistical models that operate on probabilities, so their responses are limited to what they’ve been trained on. They perform well when operating within their dataset, but if there are gaps or a request goes out of scope, inaccuracies or hallucinations can occur.
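To make “statistical models that operate on probabilities” concrete, here is a deliberately tiny stand-in: a bigram model that estimates the next word purely from word-pair counts in its training sentences. This is an illustrative assumption, not how a transformer works internally, and the corpus is hypothetical, but it shows the same limitation: where the training data has no examples, the model has nothing to go on.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each other word in the training sentences."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the counts into probabilities for the word that follows `word`."""
    following = counts.get(word.lower())
    if not following:
        # Out of scope: this context never appeared in the training data.
        return {}
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}

# Hypothetical three-sentence training set.
corpus = [
    "the boiler heats water",
    "the boiler serves the space",
    "the pipe connects the boiler",
]
model = train_bigram(corpus)
print(next_word_probs(model, "the"))     # {'boiler': 0.6, 'space': 0.2, 'pipe': 0.2}
print(next_word_probs(model, "tunnel"))  # {}: nothing was learned about tunnels
```

A real LLM generalises far better than word-pair counts, but its outputs are still probability estimates shaped by its training data.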

Inaccurate information is problematic when reliability is crucial, but trust in quality isn’t the only issue. General LLMs are trained on internet content, but much domain-specific knowledge isn’t captured online or sits in downloadable documents or behind paywalls, so we’re missing out on a significant chunk of knowledge.

Training LLMs: the built environment

Training LLMs is resource-intensive and requires vast amounts of data. However, data sharing in the built environment is limited, and ownership is often debated. This raises several questions in my mind: Where does the training data come from? Do trainers have permission to use it? How can organisations ensure their models’ outputs are interoperable? Are SMEs disadvantaged due to limited data access? How can we reduce bias from proprietary terminology and data structures? Will the vast variation hinder the ability to spot correct patterns?

With my information manager hat on: without proper application and understanding, it’s not just rubbish in, rubbish out; it’s rubbish out on a huge scale, all of it artificial, and enough to completely overwhelm us.

How do we improve the use of LLMs?

Techniques such as retrieval-augmented generation (RAG) use vector databases to retrieve relevant information from a specific knowledge base. This information is included in the LLM prompt so that outputs are much more relevant and up to date. Having more control over the knowledge base also helps ensure the sources are known and reliable.
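The retrieve-then-prompt flow can be sketched in a few lines. This is a minimal illustration under loud assumptions: the “embedding” is a bag-of-words count vector rather than a learned model, the linear scan stands in for a vector database’s nearest-neighbour search, and the knowledge-base passages and function names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (real RAG uses learned embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (a vector database does this at scale)."""
    q = embed(query)
    return sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def build_prompt(query: str, knowledge_base: list[str], k: int = 2) -> str:
    """Place the retrieved passages in the prompt so the LLM answers from known sources."""
    sources = "\n".join(f"- {doc}" for doc in retrieve(query, knowledge_base, k))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"

knowledge_base = [
    "IFC is published as ISO 16739",
    "ISO 19650 covers information management using BIM",
    "Uniclass is a classification system used in the UK",
]
print(build_prompt("Which standard covers information management?", knowledge_base, k=1))
```

Because the prompt only contains curated passages, the answer can be traced back to a known source, which is the control the paragraph above describes.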

This leads to an improvement, but the machine still doesn’t fully understand what it’s being asked. By introducing more context and meaning, we might achieve better outputs. This is where returning to information science and using knowledge graphs can help.

A knowledge graph is a collection of interlinked descriptions of things or concepts. It uses a graph-structured data model within a database to create connections – a web of facts. These graphs link many ideas into a cohesive whole, allowing computers to understand real-world relationships much more quickly. They are underpinned by ontologies, which provide a domain-focused framework to give formal meaning. This meaning, or semantics, is key. The ontology organises information by defining relationships and concepts to help with reasoning and inference.
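As a hedged sketch of how an ontology supports inference, the example below stores facts as (subject, predicate, object) triples and follows a `subClassOf` chain to work out types that were never stated directly. Production knowledge graphs typically use RDF, OWL ontologies and a dedicated reasoner; the plain tuples, class names and identifiers here are hypothetical.

```python
# Each fact is a (subject, predicate, object) triple; all names are hypothetical.
GRAPH = {
    # Ontology: class-level meaning
    ("Boiler", "subClassOf", "HeatSource"),
    ("HeatSource", "subClassOf", "Equipment"),
    # Instance data: the web of facts
    ("B-01", "type", "Boiler"),
    ("B-01", "connectedTo", "P-07"),
    ("B-01", "locatedIn", "Plant Room 1"),
}

def superclasses(cls, graph):
    """All classes that `cls` is a subclass of, following subClassOf transitively."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def inferred_types(entity, graph):
    """Stated types plus every type implied by the ontology."""
    stated = {o for s, p, o in graph if s == entity and p == "type"}
    inferred = set(stated)
    for cls in stated:
        inferred |= superclasses(cls, graph)
    return inferred

print(sorted(inferred_types("B-01", GRAPH)))  # ['Boiler', 'Equipment', 'HeatSource']
```

Only “B-01 is a Boiler” was stated; the ontology supplies the rest, which is the kind of reasoning a flat list of documents cannot offer.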

Knowledge graphs enhance the RAG process by providing structured information with defined relationships, creating more context-rich prompts. Organisations across various industries are exploring how to integrate knowledge graphs into their enterprise data strategies, so much so that knowledge graphs have even appeared on the Gartner Hype Cycle, on the slope of enlightenment.
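One simple way to picture graph-enriched RAG, offered as an assumption rather than any particular product’s method, is to gather the facts within a couple of relationships of the entity a question mentions and place them in the prompt. The graph, entity names and helper functions below are hypothetical.

```python
def neighbourhood(entity, graph, hops=2):
    """Collect the triples within `hops` relationships of `entity`."""
    seen, frontier, facts = {entity}, {entity}, set()
    for _ in range(hops):
        next_frontier = set()
        for s, p, o in graph:
            if s in frontier or o in frontier:
                facts.add((s, p, o))
                for node in (s, o):
                    if node not in seen:
                        seen.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return sorted(facts)

def graph_enriched_prompt(question, entity, graph, hops=2):
    """Add the connected facts to the prompt as explicit, structured context."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in neighbourhood(entity, graph, hops))
    return f"Known facts:\n{facts}\n\nQuestion: {question}"

graph = [
    ("B-01", "connectedTo", "P-07"),
    ("B-01", "locatedIn", "Plant Room 1"),
    ("P-07", "partOf", "Heating System"),
    ("Heating System", "serves", "Zone A"),
]
print(graph_enriched_prompt("Which system is boiler B-01 part of?", "B-01", graph))
```

Two hops are enough here to connect the boiler to its system through the pipe; the hop limit is a design choice that trades context against prompt length.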

The need for critical thinking

From an industry perspective, semantics is not just where the magic lies for AI; it is also crucial for sorting out the information chaos in the industry. The tools discussed can improve LLMs, but the results still depend on a backbone of good information management. This includes having strategies in place to ensure information meets the needs of its original purpose and implementing strong assurance processes to provide governance.

Therefore, before we move too far ahead, I believe it’s crucial for the industry to return to the theory and roots of information science. By understanding this, we can lay strong foundations that all stakeholders can work from, providing a common starting point and a sound base to meet AI halfway and derive the most value from it.

Above all, it’s important not to lose sight of the fact that this begins and ends with people, and one of the greatest things we can ever do is to think critically and keep questioning!

The post AI: Information Integrity appeared first on AEC Magazine.

IFC: what is it and why is it needed?

Emma Hooper — Mon, 11 Jul 2022 14:56:30 +0000

Article #3 of 8 from AEC Magazine’s IFC Special Report

Emma Hooper, Associate Director and Head of R&D at Bond Bryan Digital, provides a useful overview of the IFC data model specification

Over the course of a facility’s life, information is created and goes on a journey in which it is constantly exchanged by people using technology.

From the initial idea to construct a building to the deletion of this asset from a map following its demolition, a building creates a trail of information that follows it from cradle to grave.

This trail is invisible. Some call it a ‘golden thread’. I prefer to call it an ‘information layer’, which forms part of an information management ecosystem. But whatever you call it, this trail is currently fragmented and, quite frankly, a mess.

The purpose of information management is to view information as an asset in its own right. To get the full value from information, it must be rationalised and joined up – both processes entirely separate from software.

Two layers at work

The information management ecosystem is made up of two layers. First, there’s the management layer, which includes recurring cycles of information management activities, based on appointments. This is covered by ISO 19650.

Second, there’s the information layer, where the many facets of information are broken down, structured, ordered and joined up to provide a base data language for the activities in the management layer outlined above, and for technology to plug into.

The information layer is complex. There is no escaping that. Try describing one component in a facility: its type, performance, materials, location, name and all the other data related to it, plus the data about the data. And that’s just one component.

Now, multiply this to cover tens of thousands or millions of components and how they all connect to one another. The task is utterly mind-blowing in its complexity! So, the only way we can produce connected, machine-interpretable data is to use data models as part of the information layer.

What is a data model, anyway?

Essentially, a data model is a way of structuring and joining up data. It creates order and enables complex connections to be made. A data model is not a BIM model in the traditional sense, and it doesn’t have to contain geometry.

But we also need a standardised data model to provide a single data language throughout, otherwise we quickly encounter interoperability issues. Do we have something already? We do! It’s called Industry Foundation Classes, or IFC.

IFC is an off-the-shelf data model specification. It is managed by buildingSMART International (see buildingSMART article) and is an international standard, ISO 16739.

IFC provides a data framework for most parts of the AEC industry, allowing information to be connected. For example, a boiler might be connected to a pipe and associated with a particular system, along with the space and building in which it is located, a construction programme, commissioning certificates, performance properties, a cost plan, classification and so on. In fact, I could go on and on. What’s important is that there is nothing in the industry, besides IFC, that can accomplish so much in terms of connecting information across so many domains.

IFC is a digital representation of a built asset for a computer to understand.

Why do we need IFC?

Each proprietary software application has its own data model running in the background. These are typically packaged up in custom file formats for exchange purposes.

But these data models are bespoke and often poorly created, with the sole objective of serving the software. Therefore, when we exchange data between software packages, we run into interoperability issues, because these packages speak different languages. If software packages can read and write to a standard data model, they only have to create the mapping once, rather than a point-to-point solution for every permutation of software exchange.
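The saving from a shared data model is easy to quantify. Assuming each translator works in both directions, connecting every pair of n applications directly needs n(n-1)/2 translators, whereas mapping each application once to a standard needs only n. (Counting one translator per direction doubles both figures to n(n-1) and 2n; the conclusion is the same.)

```python
def point_to_point_translators(n: int) -> int:
    """One two-way translator for every pair of applications: n(n-1)/2."""
    return n * (n - 1) // 2

def standard_mappings(n: int) -> int:
    """One two-way mapping per application, to and from the shared data model."""
    return n

for n in (5, 10, 20):
    print(n, point_to_point_translators(n), standard_mappings(n))
# 5 10 5
# 10 45 10
# 20 190 20
```

The point-to-point count grows quadratically while the standard-based count grows linearly, which is why a common data model pays off as more software enters the exchange.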

It’s also not just in delivery and the exchange of design information that IFC can play a part. Going back to the information management ecosystem, IFC sits at the heart of the information layer as the standardised data model, so it can provide the data foundations that underpin ISO 19650 activities. IFC can be used to structure exchange information requirements, to deliver against them, to assure the delivered data against the original requirements, and to store data during and after the project (see Gen Zero project article).

Because the data model is so big, it has to be broken down to exchange information. This is all done using filtered parts of the IFC schema called model view definitions. This approach is being redeveloped by buildingSMART to make it more flexible using information delivery specifications, or IDS.
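The idea behind checking delivered data against requirements can be sketched as an applicability rule plus a property requirement. This is a simplified, hypothetical model: a real IDS is an XML document with several facet types, and the dictionaries below stand in for objects read from an IFC file. The property set `Pset_ManufacturerTypeInformation` is a standard IFC set; the manufacturer value is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PropertyRequirement:
    """Hypothetical, simplified requirement: every `entity` object must carry pset.prop."""
    entity: str   # applicability, e.g. "IfcBoiler"
    pset: str     # property set name
    prop: str     # property name

def check_delivery(requirements, objects):
    """Return (object name, missing property) pairs for every unmet requirement."""
    failures = []
    for req in requirements:
        for obj in objects:
            if obj["entity"] != req.entity:
                continue  # the requirement does not apply to this object
            value = obj.get("psets", {}).get(req.pset, {}).get(req.prop)
            if value is None:
                failures.append((obj["name"], f"{req.pset}.{req.prop}"))
    return failures

requirements = [PropertyRequirement("IfcBoiler", "Pset_ManufacturerTypeInformation", "Manufacturer")]
delivered = [
    {"entity": "IfcBoiler", "name": "B-01",
     "psets": {"Pset_ManufacturerTypeInformation": {"Manufacturer": "ExampleCo"}}},
    {"entity": "IfcBoiler", "name": "B-02", "psets": {}},
    {"entity": "IfcWall", "name": "W-01"},
]
print(check_delivery(requirements, delivered))
# [('B-02', 'Pset_ManufacturerTypeInformation.Manufacturer')]
```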

The more we digitise, the more data models organisations will create. If we don’t have a standardised starting point for these, they will be structured in completely different ways and, as a result, sharing information between them will be as difficult as it is now between authoring software, just on a much bigger scale. Technology will not provide a magic solution!

IFC basics

The IFC standard is free and can be accessed via the buildingSMART website. There are currently two official versions:

IFC2x3 TC1 (IFC2x3) – this is aligned to ISO 16739:2005.
IFC4 ADD2 TC1 (IFC4) – this is aligned to ISO 16739-1:2018.

IFC2x3 is the predominant version used in the UK. However, IFC4 implementation within software has recently accelerated and, with the release of IFC 4.3 proposed for 2022/2023, we as an industry need to start the transition to IFC 4.3 within the next year (see IFC 4.3 article).

Communication of the data model is carried out using a schema. This provides a data modelling language to represent a data model, often graphically, enabling a viewer to see what the data model contains and work out which parts are connected.

IFC can be visualised using several schemas. Currently, the principal one is EXPRESS-G, but the plan is to move to UML (unified modelling language) in IFC5.

On top of this, when transferring data from a data model, you need an exchange format to transport it. IFC typically uses the STEP physical format (SPF), which is text-based. (Because SPF files carry the ‘.ifc’ file extension, this has led to the misconception that IFC is just a file format.) Being text-based means that model files can be opened in a standard text editor such as Notepad.
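Because SPF is plain text, even a few lines of code can pick instances out of an IFC file. The sketch below is a simplification that assumes one instance per line and only splits raw arguments; real files can wrap instances across lines and need a full parser such as IfcOpenShell. The sample fragment and its GlobalId strings are made up for illustration.

```python
import re

# One SPF instance per line, e.g.  #10=IFCBOILER(...);
INSTANCE = re.compile(r"^#(\d+)\s*=\s*([A-Za-z0-9_]+)\s*\((.*)\)\s*;$")

def split_args(body: str) -> list[str]:
    """Split an argument list on top-level commas, ignoring commas in strings or nested lists."""
    if not body.strip():
        return []
    args, current, depth, in_string = [], [], 0, False
    for ch in body:
        if ch == "'":
            in_string = not in_string  # '' (an escaped quote) toggles twice, so it is safe
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                args.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    args.append("".join(current).strip())
    return args

def parse_instances(text: str) -> dict[int, tuple[str, list[str]]]:
    """Map instance number -> (ENTITY NAME, raw argument strings)."""
    instances = {}
    for line in text.splitlines():
        match = INSTANCE.match(line.strip())
        if match:
            number, entity, body = match.groups()
            instances[int(number)] = (entity.upper(), split_args(body))
    return instances

# Hypothetical fragment; #20 would be the containing space, omitted here.
SAMPLE = """DATA;
#10=IFCBOILER('2O2Fr$t4X7Zf8NOew3FLOH',$,'Boiler 01',$,$,$,$,$,.WATER.);
#11=IFCRELCONTAINEDINSPATIALSTRUCTURE('0Kw9x$Zq1Dkh5h1bR6yZ5a',$,$,$,(#10),#20);
ENDSEC;"""

for number, (entity, args) in parse_instances(SAMPLE).items():
    print(number, entity, args)
```

Note how `#10` inside the second instance is a reference to the boiler: the text format itself carries the connections between entities.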

Other exchange formats include XML and JSON, and there are others in development. These include RDF/XML, Turtle and JSON-LD, where the emphasis is less on exchanging files and more on exchanging the data.

IFC data model composition

In simple terms, IFC is made up of three parts: entities, attributes and relationships.

Entities are the main classes and, in the data model, act like nodes. In other words, it’s the entities that get connected. Most entities can be considered as objects – not just physical objects such as walls and boilers, but also objects such as geometry, processes, properties, materials and so on. This means there is potential to use IFC for cost schedules, resource planning and construction.

IFC representation of a boiler

A particularly important entity is IfcBuildingElementProxy, which can be used where there is no appropriate entity. This acts like a template entity, identifying all the appropriate attributes and relationships. There is also the ability here to define the object further (see section below on predefined types).

Attributes define entities further by including basic data such as ‘Name’, ‘Description’ and ‘GlobalId’. Attributes also allow connections to be made to other entities by acting like hooks.

Relationships connect entities via attributes, and in the IFC schema, are objects themselves. It is the relationships that are key and will become even more important as we move into a more connected future.
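The sketch below illustrates relationships as objects in their own right. It is a simplification: in the IFC schema each relationship type has its own attribute names (for example RelatingStructure and RelatedElements for spatial containment, RelatingGroup and RelatedObjects for grouping), which are collapsed here into generic `relating` and `related` fields. The GlobalId values are placeholders, not valid IFC GUIDs.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_type: str   # e.g. "IfcBoiler"
    global_id: str     # placeholder, not a real IFC GUID
    name: str

@dataclass
class Relationship:
    """A relationship object linking one 'relating' entity to many 'related' ones."""
    rel_type: str
    relating: Entity
    related: list[Entity] = field(default_factory=list)

def relating_entities(element, relationships, rel_type):
    """Follow relationships of one type from an element to the entity it is related to."""
    return [r.relating for r in relationships
            if r.rel_type == rel_type and any(e is element for e in r.related)]

boiler = Entity("IfcBoiler", "guid-boiler", "B-01")
plant_room = Entity("IfcSpace", "guid-space", "Plant Room 1")
heating = Entity("IfcDistributionSystem", "guid-system", "Heating")

relationships = [
    Relationship("IfcRelContainedInSpatialStructure", plant_room, [boiler]),
    Relationship("IfcRelAssignsToGroup", heating, [boiler]),
]

print([e.name for e in relating_entities(boiler, relationships, "IfcRelContainedInSpatialStructure")])
# ['Plant Room 1']
print([e.name for e in relating_entities(boiler, relationships, "IfcRelAssignsToGroup")])
# ['Heating']
```

Because the boiler itself stores no links, new kinds of connection can be added as new relationship objects without changing the entity.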

Watch Emma Hooper’s NXT BLD 2022 talk on Information Models and the future of IFC. Register here

Predefined types, properties & external references

There are a few more terms with which users need to familiarise themselves.

For example, one important attribute is the predefined type. This allows an entity to be described further; for example, for IfcSanitaryTerminalType, predefined types include TOILETPAN, SINK, WASHHANDBASIN and so on. These are listed in capitals, just as they are on the predefined pick-list.

The USERDEFINED predefined type should be used only where there is no appropriate predefined type.

USERDEFINED still needs to be entered as the predefined type, but the entity can then be defined further using the ElementType attribute (for types) or the ObjectType attribute (for occurrences).
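The predefined-type rule can be expressed as a small lookup. This is an illustrative sketch: the enumeration below is only a subset of IfcSanitaryTerminalTypeEnum, the `effective_type` helper is hypothetical, and “EYEWASHSTATION” is an invented user-defined value.

```python
from enum import Enum
from typing import Optional

class SanitaryTerminalType(Enum):
    """Subset of the IfcSanitaryTerminalTypeEnum pick-list (not the full list)."""
    TOILETPAN = "TOILETPAN"
    SINK = "SINK"
    WASHHANDBASIN = "WASHHANDBASIN"
    USERDEFINED = "USERDEFINED"
    NOTDEFINED = "NOTDEFINED"

def effective_type(predefined: SanitaryTerminalType, element_type: Optional[str] = None) -> str:
    """Resolve what a sanitary terminal type actually is."""
    if predefined is SanitaryTerminalType.USERDEFINED:
        # USERDEFINED is still recorded, but the real meaning lives in ElementType.
        if not element_type:
            raise ValueError("USERDEFINED needs an ElementType value")
        return element_type
    return predefined.value

print(effective_type(SanitaryTerminalType.SINK))                           # SINK
print(effective_type(SanitaryTerminalType.USERDEFINED, "EYEWASHSTATION"))  # EYEWASHSTATION
```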

IFC also enables properties to be associated with objects. Before the association can take place, the property has to be assigned to a property set. A property set is a container of properties that have something in common; within the IFC schema, property sets are characterised using the ‘Pset_’ prefix.

Custom properties can also be added using custom property sets, but it is first important to check that the properties don’t already exist in industry dictionaries or lexicons.
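A short sketch of property sets as named containers on an object. It assumes, as a convention check, that the ‘Pset_’ prefix should be kept for property sets defined in the IFC schema, so custom sets use their own prefix. `STANDARD_PSETS` is a tiny illustrative subset, and the custom set name and values are hypothetical.

```python
# A small, illustrative subset of standard property set names from the IFC schema.
STANDARD_PSETS = {"Pset_BoilerTypeCommon", "Pset_ManufacturerTypeInformation"}

def assign_property(psets: dict, pset_name: str, prop: str, value) -> dict:
    """Place a property in a named property set on an object, creating the set if needed."""
    if pset_name.startswith("Pset_") and pset_name not in STANDARD_PSETS:
        # Assumed convention: 'Pset_' marks schema-defined sets; custom sets need another prefix.
        raise ValueError(f"{pset_name!r} uses the Pset_ prefix but is not a known standard set")
    psets.setdefault(pset_name, {})[prop] = value
    return psets

boiler_psets: dict = {}
assign_property(boiler_psets, "Pset_BoilerTypeCommon", "Reference", "B-TYPE-01")
assign_property(boiler_psets, "ExampleCo_BoilerData", "ServiceInterval", "12 months")  # hypothetical custom set
print(boiler_psets)
```

Before reaching for a custom set like this, the paragraph above advises checking industry dictionaries first; the check here only guards the naming, not whether the property already exists elsewhere.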

Finally, let’s look at external references. IFC recognises that not all information will be captured within IFC models, so it also has the ability to associate externally referenced sources of information to IFC objects. The three external references are:

Classification, which allows classification systems such as Uniclass to be associated to objects.
Libraries, which allow data from external databases to be associated to objects (for example, manufacturers’ product data).
Documents, which allow documents to be associated with objects (for example, a commissioning certificate can be associated with a boiler).

In summary, I would not claim that IFC is perfect – but as an industry, we need to team up and help to support, improve and evolve IFC across an ever-changing digital landscape. Those working in the digital information space need to know the basics. But the majority of people shouldn’t even know it’s there, because it operates seamlessly in the background. In fact, I’d go as far as to say that the more I understand IFC, the more I come to think of it as one of the greatest achievements in digital construction.

IFC’s benefits

Free, off-the-shelf and ready to be used for almost any purpose, IFC brings big benefits. These include its ability to:
Provide the data framework for information management activities
Enable repeatable processes and software configurations during delivery
Deliver longevity and sustainability of data
Side-step intellectual property issues and vendor lock-in
Support more complex querying, via relationships, providing better insight for decision making
Provide easier connection to external data sets via standardisation, for complex use cases like smart cities
Accelerate advancements like machine learning


This article is part of AEC Magazine’s IFC Special Report – Enabling interoperability in the AEC industry.

The other articles in this report are listed below.

Industry convergence
From sustainability to new business models, and from wellness to emerging technologies, IFC can be a force for good, driving the AEC industry to new levels of achievement

Inside buildingSMART
What is buildingSMART and what can it offer industry practitioners?

IFC for Infrastructure
Perhaps the most significant update to the IFC standard is the inclusion of extensions for infrastructure entities in IFC 4.3

Native OpenBIM, and the rise of open source in AEC
OpenBIM can deliver on the promise of a digital world for the built environment where information and data are truly valued

IFC at Hinkley Point C
By Tim Davies, digital engineering manager, BYLOR JV – Hinkley Point C

Tackling the Gen Zero Project
The UK Department for Education’s Gen Zero project showcases how IFC can be used as the underlying data standard for a large, complex project, from start to finish

buildingSMART certification
By Phil Read, program lead at bSUKI and managing director, Man and Machine

The post IFC: what is it and why is it needed? appeared first on AEC Magazine.

IFC for infrastructure

Emma Hooper — Mon, 11 Jul 2022 13:42:11 +0000

Article #4 of 8 from AEC Magazine’s IFC Special Report

Perhaps the most significant update to the IFC standard is the inclusion of extensions for infrastructure entities in IFC 4.3, as Emma Hooper, associate director and head of R&D at Bond Bryan Digital, explains

IFC has received many updates over the years. This year sees the finalisation of the much-anticipated IFC version 4.3, a major update of the IFC4 schema.

It’s a significant milestone in the history of IFC and has been a huge team effort, involving many countries and organisations. Highlights of the updated standard include:

A more agile process and, for the end user, full transparency of the live schema throughout development;
Updated IFC documentation (found online), with much clearer definitions and a new search function;
The inclusion of extensions for infrastructure entities – the most significant update and the main focus of this article.

From an infrastructure perspective, IFC provides the standardised digital language to be used throughout the facility’s lifespan, as it already does for buildings. This will help to reduce the variation in conventions that currently exists across the globe (for example, in rail alignment).

Ultimately, it will mean wider collaboration and knowledge-sharing, particularly for cross-border projects. It gives a standardised method for information exchange and managing processes.

Despite the common perception that infrastructure is just ‘a building on its side’, it really isn’t. It’s so much more. Infrastructure, in fact, is what joins up the vertical world of buildings.

There is also a big focus in IFC 4.3 on integration between IFC and open GIS (geographic information system) standards.

Previous versions of the IFC schema could be used for infrastructure projects to a certain extent; for example, IfcBuildingElementProxy could be used where no suitable predefined entity existed. However, fundamental updates to the schema were needed to make it more infrastructure-inclusive. These include alignment, entities with specific relationships, and a review of the overall hierarchy, which was previously very building-focused.

Models courtesy of Autodesk

Work to date

IFC Alignment was critical to establish early on in the journey, in order to extend IFC into the infrastructure sector, enabling linear definition of horizontal assets, such as the centreline of a road, the kerbline or rail track. This allows offsets for associated assets to be defined, as not all positioning in infrastructure is carried out using Cartesian (x, y, z) coordinates. For example, an engineer can place street furniture such as a road sign a set distance to the right of the centreline of the road rather than giving the coordinates. Should the road profile move, the sign (and other elements such as lighting and barriers) can subsequently be repositioned according to that offset, and not by calculation of new cartesian coordinates.
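Positioning by chainage and offset, rather than by coordinates, can be sketched as follows. This is a simplified illustration that assumes a horizontal alignment made only of straight segments; IfcAlignment itself also supports arcs, transition curves, a vertical profile and cant. The function name and the road geometry are hypothetical.

```python
import math

Point = tuple[float, float]

def position_from_alignment(alignment: list[Point], chainage: float, offset: float) -> Point:
    """(x, y) of a point `chainage` metres along the alignment and `offset` metres to its right."""
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(alignment, alignment[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if travelled + length >= chainage:
            t = (chainage - travelled) / length               # fraction along this segment
            ux, uy = (x1 - x0) / length, (y1 - y0) / length   # unit direction of travel
            px, py = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            # Right-hand normal: the direction of travel rotated 90 degrees clockwise.
            return px + offset * uy, py - offset * ux
        travelled += length
    raise ValueError("chainage is beyond the end of the alignment")

# Hypothetical road centreline and a sign 40 m along it, 3.5 m to the right.
centreline = [(0.0, 0.0), (100.0, 0.0)]
print(position_from_alignment(centreline, 40.0, 3.5))  # (40.0, -3.5)

# The road moves 10 m north: the sign keeps its chainage and offset, so it follows.
moved = [(0.0, 10.0), (100.0, 10.0)]
print(position_from_alignment(moved, 40.0, 3.5))       # (40.0, 6.5)
```

Only the sign’s chainage and offset are stored, so its coordinates never need recalculating by hand when the centreline changes.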

Similarly, the use of an alignment definition helps where linear and vertical constructions intersect; for example, a road/railway bridge. The bridge construction is analogous to a building, where there are retaining walls and beams to support the deck – but all of this has to follow the profile of the road/rail that it is supporting. If engineers decide to move the road/rail, then through use of a shared alignment, the bridge moves to meet the new position of the road/rail.

IfcAlignment has been substantially updated during the development of IFC 4.3 from its early 4.1 release to reflect new considerations such as cant, segment, horizontal and vertical alignment (see box, Further Information).

The Infrastructure Room led a series of further collaborative projects involving industry specialists, owner representatives, software providers and buildingSMART experts. During the early definition of requirements for the infrastructure extensions, IFC Rail was identified as a substantial domain and, owing to strong engagement from owners, was spun off into a separate Railway Room. Significant work also took place between the Infrastructure and Railway Rooms to define common schema elements such as embankments and drainage.

Figure 1 – IFC extensions

A final necessity was to modify the vertical extensions. For example, IfcBuilding has moved down the hierarchy: its former position is now taken by the new entity IfcFacility, of which IfcBuilding is a subtype, as seen in figure 1.
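The change can be pictured as a class hierarchy. The sketch below models only the relevant slice of the IFC 4.3 spatial structure, truncated at IfcSpatialElement, with empty Python classes standing in for schema entities. In IFC4, by contrast, IfcBuilding sat directly under IfcSpatialStructureElement.

```python
# A slice of the IFC 4.3 spatial hierarchy as empty classes (no attributes modelled).
class IfcSpatialElement: ...
class IfcSpatialStructureElement(IfcSpatialElement): ...
class IfcFacility(IfcSpatialStructureElement): ...
class IfcBuilding(IfcFacility): ...
class IfcBridge(IfcFacility): ...
class IfcRoad(IfcFacility): ...
class IfcRailway(IfcFacility): ...
class IfcMarineFacility(IfcFacility): ...

def ancestry(cls):
    """Entity names from `cls` up to the top of this slice."""
    return [c.__name__ for c in cls.__mro__ if c is not object]

print(ancestry(IfcBuilding))
# ['IfcBuilding', 'IfcFacility', 'IfcSpatialStructureElement', 'IfcSpatialElement']
```

Because buildings and infrastructure facilities now share a supertype, software can treat a bridge and a building consistently wherever it expects a facility.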

Only the start

This is only the start, and the work will keep developing. Further work on the IFC Tunnel project is underway, as is work to improve interoperability between IFC and geotechnics so that XML data from geotechnical models can be used, in particular OGC Geoscience Markup Language (GeoSciML). There is also work underway on aligning properties, feeding into data dictionaries and developing more extensions.

For IFC 4.3 to be finalised as a buildingSMART standard (hopefully by autumn 2022), a final project is underway to develop the base MVDs (model view definitions). These enable the IFC schema to fulfil defined exchanges, so that implementation within software is consistent and software can subsequently be certified against the schema.

Software vendors have been part of the IFC 4.3 process throughout; for example, Autodesk Civil 3D has beta support available for IFC 4.3 import and export, which can be updated to meet agreed exchanges once the MVD project is finalised, and then certified accordingly.

A further dependency necessary before clients can confidently specify IFC 4.3 for project information exchanges is for it to complete its ISO 16739 approval. This process is currently underway and is expected to be completed in 2023.

The IFC 4.3 series of extensions has been a huge collaborative effort that has led to some owners in Europe and China preparing to adopt it.

For example, in the European rail sector, the likes of ÖBB of Austria and SBB of Switzerland have been very active, as has the China Rail BIM Institute, where there is a plan to complete 30,000km of high-speed rail investment in a timescale an order of magnitude faster than has been achieved before. By adopting standardisation of rail element definitions, which IFC 4.3 provides, these owners and their projects can reduce risk through greater consistency and reliable exchange of information.

There is a huge opportunity in the UK to use IFC 4.3 for infrastructure. To date, IFC2x3 has been successfully trialled on larger infrastructure projects, such as Hinkley Point C and HS2. With the introduction of infrastructure-specific definitions, the benefits seen in these pilots can increase substantially. In fact, because infrastructure projects are procured through alliances and frameworks, there is an opportunity for the infrastructure community to really embrace IFC and take it to new levels.

Over the next 18 months, the official release of IFC 4.3 and its ISO certification will be the catalyst the industry needs to move beyond trials with the IFC2x3 schema, with infrastructure helping to lead the way. buildingSMART UK&I sees itself as fundamental to helping the UK and Ireland through this transition.

Acknowledgement: Author Emma Hooper would like to thank Marek Suchocki, global business development executive at Autodesk and Lawrence Chapman, lead information manager on HS2, for their input on this article.


The IFC timeline

The following dates indicate roughly when work was finished, but it’s worth remembering that each task took many years of hard work to complete.

2011 – IFC for infrastructure project is conceived

2013 – BuildingSMART InfraRoom is established

2015 – IFC Alignment is developed and published as IFC 4.1

2016 – Collaboration with the Open Geospatial Consortium (OGC) brings alignment between the IFC and GIS schemas (in particular OGC LandInfra / InfraGML)

2017 – IFC Alignment is updated as IFC 4.1 v1.1

2017 – Work is undertaken to update parts of the existing IFC schema that share common definitions with infrastructure

2017 – BuildingSMART Railway Room is established

2018 – A common schema is established to harmonise IFC 4.3 infrastructure extensions

2022 – IFC extensions are developed for IFC 4.3, including IFC Rail, IFC Road, IFC Bridge and IFC Ports & Waterways (the IFC Tunnel extension, for IFC 4.4, is currently in progress)

2022 – buildingSMART International’s final IFC 4.3 standard is established

2023 – ISO 16739 release 3 status expected for IFC 4.3

Further information

buildingSMART InfraRoom

buildingSMART Railway Room

If you would like to understand more about the technical aspects of IFC for infrastructure, this whitepaper is also useful.