"Left-hand-drive cars, but traffic keeps to the left" -- yet the picture doesn't hint at which side traffic drives on.
"Language on the shop fascia looks like a Latin alphabet business name rather than Spanish or Portuguese" -- I'm sorry, what alphabet are Spanish and Portuguese written in?
It's worth mentioning that this is a different scenario to the reasoning models though. Reasoning models use the generated text to arrive at an answer, in a sense, it cannot lie until it gives the answer. That answer may express a reasoning that was not the reasoning used. That bit is the lie.
You can actually take this further when you consider deepseek style reinforcement. While the reasoning text may appear to show the thought process used in readable language, the model is trained to say whatever it needs to generate the right answer, that may or may not be what that text means to an outside observer. In theory it could encode extra information in word lengths or even evolve it's own Turing complete gobbledegook. There are many degrees of likelihood in the options available. Perhaps one more likely is some rarely used word has some poorly trained side-effect that gives the context a kick in the right direction right before it was going to take a fork going the wrong way. Kind of a SolidGoldMagikarp spanking.
People do this all the time too! Cat scans show that people make up their minds quickly, showing activations in one part of the brain that makes snap judgements, and then a fraction of a second later the part that shows rational reasoning begins to activate. People in sales have long known this, wanting to give people emotional reasons to make the right decision, while also giving them the rational data needed to support it. [1]
I remember seeing this illustrated ourselves when our team of 8 or so people was making a big ERP purchasing decision between Oracle ERP and Peoplesoft long ago. We had divided what our application needed to do into over 400 feature areas, and in each feature area had developed a very structured set of evaluation criteria for each area. Then we put weights on each of those to express how important it was to us. We had a big spreadsheet to rank the things.
But along the way of the 9 month sales process, we really enjoyed working with the Oracle sales team a lot better. We felt like we'd be able to work with them better. In the end, we ran all the numbers, and Peoplesoft came out on top. And we sat there and soberly looked each other in the eyes, and said "We're going with Oracle." (Actually I remember one lady on the team when asked for her vote said, "It's gotta be the big O.")
Salespeople know that ultimately it's a gut decision, even if the people buying things don't realize that themselves.
The real answer is it's completely domain-specific. If you're trying to search for something that you'll instantly know when you see it, then something that can instantly give you 5 wrong answers and 1 right answer is a godsend and barely worse than something that is right 100% of the time. If the task is to be an authoritative designer of a new aeroplane, it's a different story.
The other day I meme-ified a photo with ChatGPT. Pleased with the style I fed that into Midjourney's "Describe" feature which aims to write an image generation prompt based on the image supplied. Midjourney did include a location as part of its output description and this was indeed accurate to the original photographic source material - this is all in spite of the image fed into the system being a ChatGPT-generated caricature, with what I thought was a generic looking background.
The lesson here is that these are still algorithmically generated images - and although it may not be obvious to us, even heavily stylised images may still give away a location through the inclusion of unremarkable landmarks. In my case it appears that the particular arrangement of mountains in the background was specific to a single geographic region.
While I think your story is entirely plausible, I wonder if there could be something else going on. Maybe ChatGPT puts the prompt (or an assumed location) in the image's metadata?
Not ruling it out, but this would mean both ChatGPT to put the metadata in the file, and then Midjourney read that metadata and put it into the img2txt output. (Midjourney produces 4 sets of text outputs from the single input image, two contained location information, naming the specific mountain chains it "saw" in the caricature image.)
Assuming it's not the metadata, it's a powerful use of AI, but also not one that I would not be too surprised about. It can be a useful investigative tool, or simply a fun way to hide clues for a puzzle.
Generative AIs just patch together memorized data. So parts of the original data can sometimes get back out like victim's hairs out of a monster's mouth.
Worked so-so for me. Took a picture from my street, and cropped it a bit to leave out some significant landmark in the distance. It missed by around 500 km, but deduced a lot of things correctly.
Then I used the uncropped picture, and it spent 3 minutes trying to look at the features of said landmark. It get hung up on some similar (and much more famous) island which is even further away from here.
Lastly I used a google image photo of said landmark (which is an island with a lighthouse) - which was quite clear. But it insisted on being the same island as the previous try.
I showed it a picture of a street in Rome from our last holiday and the thinking traces show it was bang on but halfway through the output it just deletes it all and says it's against policy.
Well, this is why I personally like open source since if something cool is found, it can't be taken away.
I think the openai team is putting some restrictions on the model because such reverse location could put a bad paint on their reputation. But if Openai was actually open source, they could've done nothing about it. But that's not the world we live in.
If a trendy thing to do with AI goes viral, people feel peer pressure to do the trendy thing. If you block the trendy thing from your free version, but keep it available in your paid version, maybe you can covert some free users to paid.
AI models cost money to develop and run: that much electricity and that many processors do not come cheaply. Open Source has yet to find ways to pay for that.
Presumably, they don’t want people using the tool to dox others. It’s a bit moot since it can still be done by humans but that requires time, effort, skill, etc., which is otherwise outsourced to the AI.
15 years ago I'd have agreed with you. However we've managed to make almost everything about life, happen on the Internet. Not just the "fun" things, but identification (and verification thereof), communication, payment, bureaucracy, hell even medicine in many ways.
We cannot lull ourselves into this idea of "technopia" where everyone - 100% of everyone - is acutely aware of privacy risks and proper security and safety posture on the internet. That will never happen, and only puts disadvantaged individuals at risk - especially since we've effectively forced them to use a system they're not capable of operating safely.
No, you very clearly are not if you think it is "insane" to even just talk about this as a privacy issue. The location of an EXIF stripped image can easily be inferred, automatically, at scale, with high accuracy. If that isn't a privacy issue, I don't know what is. You may not feel it is a big concern, or you may have given up on privacy, but don't gaslight me into thinking that this is not a privacy concern. That would be literally insane. Why are you so upset with this being talked about as a privacy issue?
And I don't know why you feel compelled to give this disclaimer. Sounds like the bad faith "as a diehard conservative/liberal, this conservative/liberal policy goes too far" when in fact they are not from that side at all. "Privacy is harder now on the internet, so it is impossible, just give up" is what everyone who profits from personal data collection wants you to think.
Sadly that is part of the internet commenter's rhetorical toolbox now. Every argument sounds a little stronger if you say that you are usually part of one team on this side of the argument, but you think your team has gone too far on this one. Classic enlightened centrism. What an independent thinker! If only it were true.
Never in my life have a shared a picture and thought "I hope nobody knows where this was taken." Nor have I ever thought about people sharing pictures of me "I hope they keep the location private." It's on me to wear a mask if I don't want to be identified. My assumption is that any time I'm in public my life is public. To assume otherwise is folly.
I took a crapshot at asking chatgpt how I'd set the clock on my car radio by giving it a picture. Not only did it tell me the correct method, but it identified my radio as a "typical factory radio installed in early 2000 insert make here vehicles."
I just played a full round of Geoguessr world with Gemini 2.5 and got a score of 22k / 25k (so a silver medal). This puts in the realm of a "pretty good" player.
It was shockingly accurate with its guesses of Essen, Germany and Sheffield, UK, but faltered a bit in Italy (it thought Genoa was Siena) and Russia (it guessed Samara but it was actually a small town about 400 miles to the west). It also guessed Orlando when it was Tampa.
Still this was only giving it a single image to work off of, where any player would be able to move around for a few minutes.
The version of using reasoning to do geoguesser to find approximate locations is fine. But we should fully expect this tech to reasonably soon be able to rapidly vector search satellite imagery or even non satellite imagery to pinpoint locations based on landmarks that should feel unusable to us humans.
We’re going to create a fuzzy visual index for every location in the world.
A photo taken on my street (no exif) "only" gives the correct town in chatgpt and gemini, and then incorrectly guesses the precise neighbourhood/street when pushed. Gemini claimed to have done a reverse image search, but I'm not convinced it did. An actual Google reverse image search found similar photos, taken a bit further along the same street or in a different direction, labelled with the correct street (no LLM required).
I've been digitising family photos using this. I scanned the photo itself and the text on it, then passed that to an LLM for OCR and used tools to get the caption verbatim, the location mentioned and the date in a standard format. That was going to be the end of it, but the OpenAI docs https://platform.openai.com/docs/guides/function-calling?lan... suggest letting the model guess coordinates instead of just grabbing names, so I did both and it was impressive. My favourite was taking a picture looking out to sea from a pier and pinpointing the exact pier.
I showed the model a picture and any text written on that picture and asked it to guess a latitude/longitude using the tool use API for structured outputs.
That was in addition to having it transcribe the hand written text and extracting location names, which was my original goal until I saw how good it was at guessing exact coordinates. It would guess within ~200km on average, even on pictures with no information written on them.
At first, it's unsure, but also mention that there are a lot of riverside cafes in Southeast Asia that have this view. Then I said it was in Vietnam, and it was immediately concluded that this was taken at the Han River in Da Nang city, which was correct.
I can see that there is some actual analysis skill here. I'm not 100% convinced, but I'm still impressed.
I took a screenshot of your image and this prompt "play the game geoguesser and guess where this image was taken"
Putting those pieces together, the most likely spot is one of the cafés on the east bank just north of Dragon Bridge. A popular candidate with a very similar railing/table setup is Bridgespan Café (also called Bridge Cafe) at ≈ 16.0645 N, 108.2292 E.
Location guess: A second‑floor riverside café on Trần Hưng Đạo street, east bank of the Hàn River, Đà Nẵng, Vietnam (looking southwest toward Dragon Bridge).
Approx. coordinates: 16.064 °N, 108.229 °E
Confidence level: 70 %
The bridge‑light pattern and cruise‑boat LEDs strongly suggest Đà Nẵng, but several cafés share almost identical views, so the exact establishment is harder to pin down.
I’ve found it surprisingly good, but has anyone verified that it’s not just using EXIF geolocation data embedded in the photo? I haven’t bothered to strip it.
I've given it a screenshot and it got close enough. So, it's not looking at exif data.
I've been playing a bit with just giving it random things from Google's streetview.
It figured out a landmark in Dresden pretty easily. But then I gave it a generic view of some street in Nuremberg and it took a long time. It extracted lots of information from the screenshot. Including the logo of a travel agency and some very pixelated house numbers. Amusingly I actually triggered some AB testing thing where it was using two models in parallel side by side.
I just let it do its thing and it looked like it got stuck in some loop. Eventually both models produced an answer. They both correctly identified the city but then both models came up with two wrong streets. The travel agency has offices across Germany and probably multiple in Nuremberg as well. There was no street name readable and no house numbers. It based this off just the name of the travel agency and the fact it was next to some hairdresser.
It seems to be using python scripts to zoom in on different bits of the image and search engines to query clues it finds. It also looks at architecture and other aspects to deduce things about country, regions, etc. Overall, I'm pretty impressed with how far it got.
These were taken without EXIF for GPS (because the cameras did not have GPS), and it was still very good. With anything turisty it got them immediately, and with my photo of a house on the Muese river it got it nearly perfect. When I added a few more photos I was able to nail it down to a couple hundred meters.
I did it all with 4o (the old model), the only issue is that if it starts to search the internet it will then lie to you that its not able to read pictures.
Its pretty good I had used 4o many months back with a picture of me deccades back in nalanda, and mind you not the iconic huge wall that most people associate with it, but another corner of the place and it knew where I was back then.
Working backwards -- it seems like a good idea to use geoguessr in the training set for SFT or sorts. I would imagine it would generalise well to other aspects.
It's not inconceivable, given that smartphones became widespread about a decade ago. Somehow, we endured for hundreds of thousand of years without recording every single moment of our lives to feed into the social media data mining machine.
Not to rain on anyone's parade. This does seem fun, but I have been doing this with Google Lens or Gemini on my phone for a while now and it's usually pretty good already?
I mean a while like Google Lens has been able to do this for a long time...
Yeah I wanted to post this, reverse image search can already do this easily. Sometimes when one of my friends posts a pic in a foreign land, I try to geoguess, it once told me the name of the hotel the view was taken from
The example includes the following "reasoning":
"Left-hand-drive cars, but traffic keeps to the left" -- yet the picture doesn't hint at which side traffic drives on.
"Language on the shop fascia looks like a Latin alphabet business name rather than Spanish or Portuguese" -- I'm sorry, what alphabet are Spanish and Portuguese written in?
LLMs lie about their reasoning: https://www.anthropic.com/research/tracing-thoughts-language...
It's worth mentioning that this is a different scenario from the reasoning models, though. Reasoning models use the generated text to arrive at an answer; in a sense, they cannot lie until they give the answer. That answer may express a reasoning that was not the reasoning actually used. That bit is the lie.
You can actually take this further when you consider DeepSeek-style reinforcement. While the reasoning text may appear to show the thought process in readable language, the model is trained to say whatever it needs to generate the right answer, and that may or may not be what the text means to an outside observer. In theory it could encode extra information in word lengths, or even evolve its own Turing-complete gobbledegook. There are many degrees of likelihood among the options available. Perhaps one of the more likely: some rarely used word has a poorly trained side effect that gives the context a kick in the right direction just before it was about to take a fork in the wrong direction. Kind of a SolidGoldMagikarp spanking.
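To make the word-length idea concrete, here is a toy sketch of that kind of side channel: bits hidden in whether each word's length is even or odd. Purely illustrative; no model is known to actually do this, and the word lists are made up.

```python
# Toy illustration: hide bits in word lengths (even length = 0, odd = 1).
def decode_bits(text: str) -> str:
    """Recover the hidden bit string from word-length parity."""
    return "".join(str(len(w) % 2) for w in text.split())

def encode_bits(bits: str, evens: list, odds: list) -> str:
    """Pick an even-length word for each 0 bit, an odd-length word for each 1."""
    return " ".join(evens[i % len(evens)] if b == "0" else odds[i % len(odds)]
                    for i, b in enumerate(bits))

msg = encode_bits("1011", evens=["very", "some"], odds=["the", "a", "hello"])
print(msg, "->", decode_bits(msg))  # "the some hello the" -> "1011"
```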
> LLMs lie about their reasoning
People do this all the time too! Brain scans show that people make up their minds quickly: a part of the brain that makes snap judgements activates first, and a fraction of a second later the part associated with rational reasoning begins to activate. People in sales have long known this, wanting to give people emotional reasons to make the right decision while also giving them the rational data needed to support it. [1]
I remember seeing this illustrated firsthand when our team of eight or so people was making a big ERP purchasing decision between Oracle ERP and PeopleSoft long ago. We had divided what our application needed to do into over 400 feature areas, and for each area we developed a very structured set of evaluation criteria. Then we put weights on each of those to express how important it was to us. We had a big spreadsheet to rank it all.
But along the way of the nine-month sales process, we simply enjoyed working with the Oracle sales team more; we felt we'd be able to work with them better. In the end, we ran all the numbers, and PeopleSoft came out on top. And we sat there, soberly looked each other in the eyes, and said, "We're going with Oracle." (Actually, I remember one lady on the team, when asked for her vote, said, "It's gotta be the big O.")
Salespeople know that ultimately it's a gut decision, even if the people buying things don't realize that themselves.
[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC6310859/
> People do this all the time too
I wish people would stop comparing AI to humans, honestly.
I know humans are flawed. We all know.
The appeal of computer systems is that they are consistent. The ideal software is bug-free, with zero flaws.
Creating human-like computer systems is so worthless. Why would we want to make them less predictable and less consistent?
Language models happen to share human flaws, but like humans they can amplify their abilities and reliability by building and using reliable tools.
I actually prefer a system that's correct half of the time at thousands of times the cost & speed.
The real answer is it's completely domain-specific. If you're trying to search for something that you'll instantly know when you see it, then something that can instantly give you 5 wrong answers and 1 right answer is a godsend and barely worse than something that is right 100% of the time. If the task is to be an authoritative designer of a new aeroplane, it's a different story.
Because we can still do things computers can't and that's interesting
Hey, maybe they're in a really old part of town! ...like really REALLY old. https://en.wikipedia.org/wiki/Iberian_scripts
I'm pretty sure this extends beyond ChatGPT.
The other day I meme-ified a photo with ChatGPT. Pleased with the style, I fed it into Midjourney's "Describe" feature, which aims to write an image-generation prompt based on the image supplied. Midjourney included a location as part of its output description, and this was indeed accurate to the original photographic source material, all in spite of the image fed into the system being a ChatGPT-generated caricature with what I thought was a generic-looking background.
The lesson here is that these are still algorithmically generated images - and although it may not be obvious to us, even heavily stylised images may still give away a location through the inclusion of unremarkable landmarks. In my case it appears that the particular arrangement of mountains in the background was specific to a single geographic region.
While I think your story is entirely plausible, I wonder if there could be something else going on. Maybe ChatGPT puts the prompt (or an assumed location) in the image's metadata?
Not ruling it out, but this would require both ChatGPT putting the metadata in the file, and Midjourney then reading that metadata and putting it into the img2txt output. (Midjourney produces 4 sets of text outputs from the single input image; two contained location information, naming the specific mountain chains it "saw" in the caricature image.)
Assuming it's not the metadata, it's a powerful use of AI, but also not one that I would be too surprised about. It can be a useful investigative tool, or simply a fun way to hide clues for a puzzle.
Generative AIs just patch together memorized data, so parts of the original data can sometimes come back out, like a victim's hairs out of a monster's mouth.
Worked so-so for me. I took a picture from my street and cropped it a bit to leave out a significant landmark in the distance. It missed by around 500 km, but deduced a lot of things correctly.
Then I used the uncropped picture, and it spent 3 minutes trying to examine the features of said landmark. It got hung up on a similar (and much more famous) island which is even further away from here.
Lastly I used a Google Images photo of said landmark (which is an island with a lighthouse), and that was quite clear. But it insisted it was the same island as in the previous try.
As always when there's a new trend, it refuses me.
I showed it a picture of a street in Rome from our last holiday and the thinking traces show it was bang on but halfway through the output it just deletes it all and says it's against policy.
Well, this is why I personally like open source: if something cool is found, it can't be taken away.
I think the OpenAI team is putting some restrictions on the model because such reverse-location lookups could stain their reputation. But if OpenAI were actually open source, they couldn't have done anything about it. That's not the world we live in, though.
If a trendy thing to do with AI goes viral, people feel peer pressure to do the trendy thing. If you block the trendy thing from your free version but keep it available in your paid version, maybe you can convert some free users to paid.
AI models cost money to develop and run: that much electricity and that many processors do not come cheaply. Open Source has yet to find ways to pay for that.
> Open Source has yet to find ways to pay for that.
I mean, self-hosted models would call that an electricity bill.
The same is true if something dangerous is found.
> it's against policy
Presumably, they don’t want people using the tool to dox others. It’s a bit moot since it can still be done by humans but that requires time, effort, skill, etc., which is otherwise outsourced to the AI.
"New privacy risk" what the hell.
The whole internet is a privacy risk from the start. Don't want any risk? Don't publish anything. Go live on an island. Be a random.
I'm fond of boosting privacy issue awareness, but jumping directly to "booh new privacy risk" every time is insane.
> Don't want any risk? Don't publish anything.
I don't, and still my data was shared by other people, because they posted something or gave my private number to FB and WhatsApp.
It’s a new privacy risk and it’s legit to name it every time it is one.
Do you say the same every time they find a new cause of cancer?
15 years ago I'd have agreed with you. However, we've managed to make almost everything about life happen on the Internet. Not just the "fun" things, but identification (and verification thereof), communication, payment, bureaucracy, hell, even medicine in many ways.
We cannot lull ourselves into this idea of "technopia" where everyone - 100% of everyone - is acutely aware of privacy risks and proper security and safety posture on the internet. That will never happen, and only puts disadvantaged individuals at risk - especially since we've effectively forced them to use a system they're not capable of operating safely.
> I'm fond of boosting privacy issue awareness
No, you very clearly are not, if you think it is "insane" to even talk about this as a privacy issue. The location of an EXIF-stripped image can easily be inferred, automatically, at scale, with high accuracy. If that isn't a privacy issue, I don't know what is. You may not feel it is a big concern, or you may have given up on privacy, but don't gaslight me into thinking that this is not a privacy concern. That would be literally insane. Why are you so upset with this being talked about as a privacy issue?
And I don't know why you feel compelled to give this disclaimer. It sounds like the bad-faith "as a diehard conservative/liberal, this conservative/liberal policy goes too far" move, when in fact the speaker is not from that side at all. "Privacy is harder now on the internet, so it is impossible, just give up" is what everyone who profits from personal data collection wants you to think.
Sadly that is part of the internet commenter's rhetorical toolbox now. Every argument sounds a little stronger if you say that you are usually part of one team on this side of the argument, but you think your team has gone too far on this one. Classic enlightened centrism. What an independent thinker! If only it were true.
Never in my life have I shared a picture and thought, "I hope nobody knows where this was taken." Nor have I ever thought about people sharing pictures of me, "I hope they keep the location private." It's on me to wear a mask if I don't want to be identified. My assumption is that any time I'm in public, my life is public. To assume otherwise is folly.
Naive. This is risk at a whole new level - and should be raised in an article such as this, as it was.
I took a shot in the dark and asked ChatGPT how I'd set the clock on my car radio by giving it a picture. Not only did it tell me the correct method, it identified my radio as a "typical factory radio installed in early-2000s [insert make here] vehicles."
I just played a full round of GeoGuessr World with Gemini 2.5 and got a score of 22k / 25k (a silver medal). This puts it in the realm of a "pretty good" player.
It was shockingly accurate with its guesses of Essen, Germany and Sheffield, UK, but faltered a bit in Italy (it thought Genoa was Siena) and Russia (it guessed Samara but it was actually a small town about 400 miles to the west). It also guessed Orlando when it was Tampa.
Still, this was only giving it a single image to work from, whereas any player would be able to move around for a few minutes.
Hmmm this could be really problematic tbh.
Using reasoning to play GeoGuessr and find approximate locations is fine. But we should fully expect this tech, reasonably soon, to be able to rapidly vector-search satellite imagery, or even non-satellite imagery, to pinpoint locations based on landmarks that would seem unusable to us humans.
We’re going to create a fuzzy visual index for every location in the world.
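A minimal sketch of what such a fuzzy visual index could look like, assuming a CLIP-style embedding model (via the open_clip package) and a FAISS nearest-neighbour index. The file names and the tiny reference set are illustrative assumptions, not anyone's actual pipeline:

```python
# Sketch: build a fuzzy visual index of geotagged images and query it.
# Assumes `pip install open_clip_torch faiss-cpu pillow torch`.
import faiss
import numpy as np
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
model.eval()

def embed(path: str) -> np.ndarray:
    """Return a unit-length CLIP embedding for one image."""
    image = preprocess(Image.open(path)).unsqueeze(0)
    with torch.no_grad():
        vec = model.encode_image(image)
    vec = vec / vec.norm(dim=-1, keepdim=True)
    return vec.squeeze(0).numpy().astype("float32")

# Hypothetical reference set: images with known coordinates.
reference = [("eiffel.jpg", (48.8584, 2.2945)),
             ("dragon_bridge.jpg", (16.0614, 108.2277))]
index = faiss.IndexFlatIP(512)  # cosine similarity on unit vectors
index.add(np.stack([embed(p) for p, _ in reference]))

# Query an unlocated photo; its nearest neighbours suggest a location.
scores, ids = index.search(embed("mystery.jpg")[None, :], k=2)
for score, i in zip(scores[0], ids[0]):
    print(reference[i][1], f"similarity={score:.3f}")
```

At scale this is exactly a "fuzzy visual index": every Street View or satellite tile becomes a vector, and any uploaded photo becomes a query.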
Google probably has this from their street view repository.
A photo taken on my street (no EXIF) "only" gives the correct town in ChatGPT and Gemini, and then an incorrect guess at the precise neighbourhood/street when pushed. Gemini claimed to have done a reverse image search, but I'm not convinced it did. An actual Google reverse image search found similar photos, taken a bit further along the same street or in a different direction, labelled with the correct street (no LLM required).
I've been digitising family photos using this. I scanned each photo and the text on it, then passed that to an LLM for OCR and used tool calls to get the caption verbatim, the location mentioned, and the date in a standard format. That was going to be the end of it, but the OpenAI docs https://platform.openai.com/docs/guides/function-calling?lan... suggest letting the model guess coordinates instead of just grabbing names, so I did both, and it was impressive. My favourite was it taking a picture looking out to sea from a pier and pinpointing the exact pier.
Hmm, not sure I understand how you made use of OpenAI to guess the location of a photo. Could you expand on that a bit? Thanks!
I showed the model a picture and any text written on it, and asked it to guess a latitude/longitude using the tool-use API for structured outputs. That was in addition to having it transcribe the handwritten text and extract location names, which was my original goal until I saw how good it was at guessing exact coordinates. It would guess within ~200 km on average, even on pictures with no information written on them.
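A rough sketch of how that setup could look with the OpenAI Python SDK's function-calling interface. The tool name, schema, model choice, and file name here are my assumptions about the approach described, not the commenter's actual code:

```python
# Sketch: ask a vision model for coordinates via a structured tool call.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
import base64
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "report_location",  # hypothetical tool name
        "description": "Report the most likely place a photo was taken.",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
                "place_name": {"type": "string"},
            },
            "required": ["latitude", "longitude", "place_name"],
        },
    },
}]

with open("family_photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Guess where this photo was taken. Caption: 'Pier, 1974'."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
    tools=tools,
    # Force the tool call so the output is always structured.
    tool_choice={"type": "function", "function": {"name": "report_location"}},
)

args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(args["place_name"], args["latitude"], args["longitude"])
```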
I gave it this picture https://i.imgur.com/HyfVxiD.jpeg
At first it was unsure, but it also mentioned that there are a lot of riverside cafes in Southeast Asia with this view. Then I said it was in Vietnam, and it immediately concluded that this was taken at the Han River in Da Nang city, which was correct.
I can see that there is some actual analysis skill here. I'm not 100% convinced, but I'm still impressed.
I took a screenshot of your image and used this prompt: "play the game geoguesser and guess where this image was taken"
Putting those pieces together, the most likely spot is one of the cafés on the east bank just north of Dragon Bridge. A popular candidate with a very similar railing/table setup is Bridgespan Café (also called Bridge Cafe) at ≈ 16.0645 N, 108.2292 E.
Location guess: A second‑floor riverside café on Trần Hưng Đạo street, east bank of the Hàn River, Đà Nẵng, Vietnam (looking southwest toward Dragon Bridge).
Approx. coordinates: 16.064 °N, 108.229 °E
Confidence level: 70 %
The bridge‑light pattern and cruise‑boat LEDs strongly suggest Đà Nẵng, but several cafés share almost identical views, so the exact establishment is harder to pin down.
I tried just now. It got one image exactly and proposed reasonably good but wrong guesses for the other two.
Makes me appreciate the insane level of skill that humans on GeoGuessr-style subreddits have.
There are GeoGuessr-specific AIs which are able to outperform the top players consistently. It's just surprising a generalised AI can do it.
I’ve found it surprisingly good, but has anyone verified that it’s not just using EXIF geolocation data embedded in the photo? I haven’t bothered to strip it.
Edit: just saw defrost’s link to the earlier threads, and one comment did just that https://news.ycombinator.com/item?id=43724063
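For anyone who wants to run the same check on their own photos, here is a minimal sketch of testing for GPS EXIF and stripping it before uploading, using Pillow; the file names are placeholders:

```python
# Sketch: check whether a photo carries GPS EXIF, then strip all metadata.
from PIL import Image

img = Image.open("photo.jpg")
exif = img.getexif()
print("GPSInfo present:", 0x8825 in exif)  # 0x8825 is the GPSInfo IFD tag

# Re-saving only the pixel data drops EXIF (and all other metadata).
clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))
clean.save("photo_noexif.jpg")
```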
I've given it a screenshot and it got close enough. So it's not looking at EXIF data.
I've been playing a bit with just giving it random things from Google's Street View.
It figured out a landmark in Dresden pretty easily. But then I gave it a generic view of some street in Nuremberg, and it took a long time. It extracted lots of information from the screenshot, including the logo of a travel agency and some very pixelated house numbers. Amusingly, I actually triggered some A/B-testing thing where it was using two models in parallel, side by side.
I just let it do its thing, and it looked like it got stuck in some loop. Eventually both models produced an answer. They both correctly identified the city, but then both came up with two wrong streets. The travel agency has offices across Germany, and probably multiple in Nuremberg as well. There was no street name readable and no house numbers. It based this on just the name of the travel agency and the fact that it was next to some hairdresser.
It seems to use Python scripts to zoom in on different bits of the image, and search engines to query the clues it finds. It also looks at architecture and other aspects to deduce things about country, region, etc. Overall, I'm pretty impressed with how far it got.
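For illustration, the zooming step appears to be roughly this kind of operation. This is a guess at the behaviour with made-up coordinates and file names, not the tool code the model actually runs:

```python
# Sketch: crop a region of interest and upscale it, as the model seems
# to do when it "zooms in" on a shop sign or a house number.
from PIL import Image

img = Image.open("street_view_screenshot.png")  # hypothetical input

# Region of interest in pixels: (left, top, right, bottom).
sign = img.crop((820, 410, 1020, 470))

# Upscale 4x so small text becomes legible to the vision model.
zoomed = sign.resize((sign.width * 4, sign.height * 4), Image.LANCZOS)
zoomed.save("sign_zoomed.png")
```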
Just create a screenshot of the pic and upload that.
Edit the location metadata to be obviously false (Alaska coordinates for a tropical beach photo) then see what it says?
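One way to run that experiment, sketched with the piexif library; the coordinates (Anchorage, Alaska) and file name are made up for the example:

```python
# Sketch: overwrite a photo's GPS EXIF with deliberately false coordinates
# before uploading a tropical beach photo.
# Assumes `pip install piexif`.
import piexif

def to_dms_rationals(deg: float):
    """Convert decimal degrees to EXIF (degrees, minutes, seconds) rationals."""
    d = int(deg)
    m = int((deg - d) * 60)
    s = round(((deg - d) * 60 - m) * 60 * 100)
    return ((d, 1), (m, 1), (s, 100))

exif = piexif.load("beach.jpg")
exif["GPS"] = {
    piexif.GPSIFD.GPSLatitudeRef: b"N",
    piexif.GPSIFD.GPSLatitude: to_dms_rationals(61.2181),   # Anchorage
    piexif.GPSIFD.GPSLongitudeRef: b"W",
    piexif.GPSIFD.GPSLongitude: to_dms_rationals(149.9003),
}
piexif.insert(piexif.dump(exif), "beach.jpg")
# If the model now guesses Alaska, it was reading EXIF; if it still says
# "tropical beach", it's working from the pixels.
```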
Use Google Maps and take a screenshot.
I used it to place some old vacation photos.
These were taken without GPS EXIF (because the cameras did not have GPS), and it was still very good. It got anything touristy immediately, and with my photo of a house on the Meuse river it was nearly perfect. When I added a few more photos, I was able to nail it down to a couple hundred meters.
I did it all with 4o (the old model); the only issue is that if it starts to search the internet, it will then lie to you that it's not able to read pictures.
It's pretty good. I had used 4o many months back with a picture of me decades ago in Nalanda, and mind you, not the iconic huge wall that most people associate with it, but another corner of the place, and it knew where I was back then.
Earlier on HN:
ChatGPT now performs well at GeoGuesser (flausch.social)
131 points | 8 hours ago | 113 comments https://news.ycombinator.com/item?id=43723408
Working backwards -- it seems like a good idea to use GeoGuessr-style data in the training set for SFT or the like. I would imagine it would generalise well to other tasks.
With a sample size of 1, Gemini 2.5 Pro (Experimental) did a great job of this (and was considerably faster than o3).
People taking pictures with you in them without your permission, whether intentional or not, are invading your privacy.
you're saying you've NEVER taken a picture of anyone in the background without their permission?
It's not inconceivable, given that smartphones became widespread about a decade ago. Somehow, we endured for hundreds of thousands of years without recording every single moment of our lives to feed into the social media data-mining machine.
Not to rain on anyone's parade, and this does seem fun, but I have been doing this with Google Lens or Gemini on my phone for a while now, and it's usually pretty good already?
I mean a while as in Google Lens has been able to do this for a long time...
Yeah, I wanted to post this: reverse image search can already do this easily. Sometimes when one of my friends posts a pic in a foreign land, I try to geoguess it; it once told me the name of the hotel the view was taken from.