Dr Sabrina Caldwell is in battle mode against deceptive images and deepfakes, and more specifically, the negative impacts that image manipulation and artificial intelligence generated content (AIGC) have on the public.
“AIGC and other manipulated images can be standardised so that people can interact with them without being deceived,” she said.
As a computer scientist with an enduring mission to serve the public good, she co-chairs the Joint Photographic Experts Group (JPEG) Trust group, an international team developing a standard for identifying the trust indicators of media and embedding symbols in images that mark them as authentic photographs, as fakes, or as something in between.
We spoke to her about that mission, its status, and how she became aware of its dire necessity.
ANU CECC
You are playing a pivotal role in an emerging domain of mounting importance to the world. What was the path that led you to that role?
Dr Sabrina Caldwell
My specific discipline is human/computer interaction, but my real focus is on how we use the computing power at our disposal to support a healthy knowledge economy. When I was doing my first PhD, I realised that we would soon be in danger of losing the ability to trust the images we were looking at.
It seems ludicrous to say it now because everybody knows that images are often untrustworthy. However, back then [2006], this was not the case. Images were trustworthy unless they were in an advertisement, for instance.
ANU CECC
Before the term “Photoshopped.”
Dr Sabrina Caldwell
Yeah, well, Photoshop was around then.
ANU CECC
But before it was a verb.
Dr Sabrina Caldwell
Yeah, exactly. I was very concerned about Photoshopped images. There was a particular photo that specifically concerned me, a photograph by Adnan Hajj of a bombing in Beirut. He had used Photoshop to multiply the plumes of smoke across the city. And it was released by Reuters. It was at that moment that — and I know it sounds quite dramatic — I realised that Photoshopped images could be a matter of life and death.
If a government took a military decision based on a photograph that misrepresents what is happening, people could die. And since then, I have heard of situations where people have in fact died because of manipulated photographs.
ANU CECC
Five people died at the United States Capitol because of a big lie.
Dr Sabrina Caldwell
Yes. Deception at scale is a real problem. We’re living in a world where people believe whatever they want to believe because they’re not sure what’s real. Our knowledge artefacts are not as robust as they used to be.
ANU CECC
Where were you when this epiphany hit you?
Dr Sabrina Caldwell
I was here at ANU at the College of Social Sciences doing my PhD on the impact of technology on authors and artists. I then came to the School of Computing to do my second PhD, which was on image credibility.
I needed to identify what the technological drivers were for this problem, and then look for technological solutions. I thought perhaps I could help build rigour into the knowledge artefact machine we all use.
ANU CECC
Well, we do all use JPEG.
Dr Sabrina Caldwell
Yeah. It’s the most widely used image format.
ANU CECC
Not everyone knows JPEG is also an organisation.
Dr Sabrina Caldwell
Yes. The Joint Photographic Experts Group, which is associated with Standards Australia and the International Organization for Standardization (ISO). JPEG defines the standards and criteria that facilitate interoperability between different companies in different countries. They support things like lossless compression, for example.
And recently, JPEG gave us a charter saying: okay, we recognise that this problem of manipulated images is a real threat, we need to move forward, we need to have standards. And so now we are developing the JPEG Trust International Standard.
ANU CECC
If you fix it for JPEG, what about the formats that might not have fixed it? Are you hoping they will follow JPEG’s example?
Dr Sabrina Caldwell
We need to establish a beachhead saying: In this zone, we can trust what we are looking at. Here, AIGC and other manipulated images can be standardised so that people can interact with them without being deceived. After that, hopefully the safe zone can expand.
ANU CECC
But this is not easy, as your previous work shows.
Dr Sabrina Caldwell
Right. I did research that investigated whether people could discern if they were looking at manipulated images. The answer was no, they can’t, unless there’s something extraordinarily obvious. Even then, not everybody picks it up — things like a cow sitting on a car.
Bizarre things that nobody should believe, some people will believe them anyway. And if it looks even a little bit reasonable, most people will.
ANU CECC
What year was that?
Dr Sabrina Caldwell
That was 2015. We did eye-tracking on the subjects. We found that when people view photographs, there are two things they look at. One of them is luminance. So, if there is something bright in the image, people see that first.
The second, or simultaneous, thing that people see is the semantics of the image. What does this picture mean? Not whether it is authentic, not whether it is true, but what it means.
So, if you have a photograph that presents true or false information, even if it is a completely fake image, you are still communicating with the person who is consuming that image.
We just want people to know — if it purports to be an actual photograph, that yes, it is an actual photograph. There is nothing wrong with photo art or AI-generated images, as long as we do not mislead people.
So that is what JPEG Trust is looking at: securing the metadata, identifying what kind of metadata is necessary for us to understand what we are looking at, and figuring out how to communicate with the public, keeping in mind that most people look at an image for no more than two or three seconds.
We also want organisations to be able to trust that they can use images in an ethical way that would not mislead people, even inadvertently.
If someone drives down a mountain road after seeing a manipulated photo, and then they fall off the edge of a cliff, you have these indirect liability issues. So we need to help restore trust for people and organisations.
We want to make it easy and interoperable, so that organisations understand what they have to do to document and preserve the chain of information about what has happened with an image.
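To make that chain of information concrete: one common technique for tamper-evident provenance is to have each edit record commit to a hash of the record before it. The sketch below is purely illustrative, with invented record fields; it is not the JPEG Trust manifest format, just the general hash-chaining idea in minimal Python.

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    """Hash a provenance record deterministically (sorted keys, UTF-8)."""
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()

def append_step(chain: list, action: str, tool: str) -> None:
    """Append an edit step that commits to the previous record's hash."""
    prev = record_hash(chain[-1]) if chain else None
    chain.append({"action": action, "tool": tool, "prev_hash": prev})

def chain_is_intact(chain: list) -> bool:
    """Verify each record still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == record_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

provenance = []
append_step(provenance, "capture", "camera-firmware")
append_step(provenance, "rotate-2-degrees", "photo-editor")
append_step(provenance, "crop", "photo-editor")
assert chain_is_intact(provenance)

# Quietly rewriting an earlier step breaks every later link.
provenance[1]["action"] = "add-smoke-plumes"
assert not chain_is_intact(provenance)
```

A real standard layers digital signatures and an agreed vocabulary of edit types on top of this basic chaining, but the tamper-evidence comes from the same idea.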
ANU CECC
It’s not hard to imagine that a computer can be trained to detect if there has been manipulation of an image or a video. But, when an ordinary person looks at it, what would it be that tells them? Is it something in the corner?
Dr Sabrina Caldwell
That’s a part we’re working on right now. There is a subgroup of us within the overall international standard team working very hard on figuring out what sort of symbols we could use, something along the lines of a green tick next to the image, but nothing so simple as that, of course.
ANU CECC
Or you could put it inside the frame of the photo so that people can’t miss it.
Dr Sabrina Caldwell
Yeah. That’s a good way. Something that’s easily interpretable, something that’s instant. In that two to three second window, we’re asking them to spend an additional few milliseconds to look at this symbol and decide whether or not they agree with us. If they see a green tick, what does that mean? Does it mean that this is a genuine photo? What if someone has used Photoshop to remove the red eye effect? Is it no longer genuine?
So, we need to come up with a sophisticated yet extremely simple stable of symbols to which people can become accustomed such that it feels intuitive. We need to communicate that an image or video is trustworthy, and also, in what way it is trustworthy.
ANU CECC
You could have a symbol, like a camera, if it’s a genuine photo. And if it’s manipulated, the symbol is an easel. And then you could have an in-between image that says, “This is a manipulated photo, but the manipulation does not change its meaning.”
Dr Sabrina Caldwell
Yeah, yeah, that’s good. Because that middle ground of manipulation is complicated. If you delete things or put things in, that might be viewed as a lot more manipulative than if you rotated the image to straighten out the frame.
I was talking to a group of students who were doing electron microscopy and they viewed removing red eye as a seriously problematic manipulation. Most people would be like, “What’s the problem?”
You know, it’s just removing the reflection from the back of the eye. Nobody wants their photo looking like a demon.
But from the perspective of the people who are doing electron microscopy, and I would say this might be the case for law enforcement as well, removing red eye changes the colour of the person’s eyes and the way that you perceive what their eyes actually look like.
It’s like every other piece of communication. People look at images for different reasons and with different filters based on their own experiences and interests. If your hobby is collecting watches, and you see a picture of a person with an unusual watch on their wrist, that’s the first thing you see. I would not even notice the watch, but a watch collector would.
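The camera-and-easel idea from the exchange above can be made concrete as a small data structure. Everything below is hypothetical, invented for illustration; none of these categories or rules come from the JPEG Trust drafts.

```python
from enum import Enum

class TrustMark(Enum):
    """Hypothetical display symbols for an image's provenance status."""
    CAMERA = "authentic photograph, unmodified since capture"
    CAMERA_PLUS = "edited photograph whose meaning is unchanged"
    EASEL = "composited, AI-generated, or semantically altered image"
    UNKNOWN = "no verifiable provenance available"

def classify(has_provenance: bool, edited: bool,
             semantic_change: bool) -> TrustMark:
    """Map coarse provenance facts onto a display symbol."""
    if not has_provenance:
        return TrustMark.UNKNOWN
    if not edited:
        return TrustMark.CAMERA
    return TrustMark.EASEL if semantic_change else TrustMark.CAMERA_PLUS

# Red-eye removal counts as an edit, but arguably not a semantic change
# (unless you ask the electron microscopists mentioned above).
print(classify(has_provenance=True, edited=True, semantic_change=False))
# TrustMark.CAMERA_PLUS
```

The hard part, as the interview makes clear, is not writing such rules down but agreeing on where the boundary between "edited" and "semantically changed" actually lies.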
ANU CECC
It’s more complicated than one would think.
Dr Sabrina Caldwell
Yes. There are a lot of people attempting to solve this problem, because it is a hugely difficult problem. But we’re not creating an international standard out of thin air. First, we worked out what the requirements would be, the use cases. We put out a call for proposals. We got six proposals back. We identified three that had really important components.
The people behind those proposals, together with the rest of the group, have been working to merge them into a standard for about half a year now.
ANU CECC
JPEG is the right organisation to do it. If you were to announce that, from this day forward, if it’s a JPEG photo or video, you will be able to wave your mouse over the upper right corner of the frame, or touch it with your finger, and there will be a symbol to confirm its authenticity, do you think that could restore order to all this chaos?
Dr Sabrina Caldwell
That’s exactly it, because all the metadata and provenance of a particular image is there. The question is just how do you put that together so a person can ingest it in a second? I love your idea of the little camera and the easel and I will be talking about that at the next meeting. Thank you!
ANU CECC
No problem.
Dr Sabrina Caldwell
But the other challenge is: how do you computationally identify that you have enough metadata and provenance to confidently put a little camera there?
Camera manufacturers are designing systems such that, when a photograph is taken, the image goes to the cloud and is registered the moment it is created, and all the metadata is there: the aperture of the camera, the date and time, and so on. The challenge is, how do you make sure that the media can’t be scrubbed? There are a lot of technological approaches that build in security of knowledge.
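The registration she describes rests on a standard technique: a digital signature computed over the image bytes and metadata together, so neither can be altered or scrubbed without invalidating the signature. Here is a minimal sketch using Ed25519 signatures from the third-party Python `cryptography` package; the metadata fields are invented for the example, and a real deployment would keep the key in secure camera hardware and verify against a public registry.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# In a real camera this key would live in tamper-resistant hardware.
camera_key = ed25519.Ed25519PrivateKey.generate()

image_bytes = b"...jpeg bytes..."  # stand-in for the actual file
metadata = {"aperture": "f/2.8", "captured": "2024-05-01T09:30:00Z"}

# Sign image and metadata together: changing either one, or
# stripping the metadata, invalidates the signature.
payload = image_bytes + json.dumps(metadata, sort_keys=True).encode("utf-8")
signature = camera_key.sign(payload)

# A verifier (for example, a cloud registry) holds only the public key.
public_key = camera_key.public_key()
try:
    public_key.verify(signature, payload)
    print("intact: matches the capture-time signature")
except InvalidSignature:
    print("tampered: do not show a trust mark")
```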
Trust is very complicated too: it can be quite different across different individuals, communities, organisations and governments. We’re working through accommodating that aspect of trust for images now, and for video, which is a series of many images, soon after. It’s pretty exciting.