People (as a group) cannot tell whether someone is trans or cis as accurately as they think they can. It is basically the toupée fallacy applied to trans people. The shortest version of it would be: "I have never seen a trans person that wasn't clearly trans!".
Side note: It's possible that this already exists and I am just unaware. If it was done before, I would be interested in comparing the results against newer/updated studies to see how things have changed (maybe because of media coverage or because of the internet, etc).
The simplest setup would be to have individual pictures appear on the screen. The subject then answers the question: "Is this person trans?". Yes or no? The next simplest option would be to show two images side by side and have the subject pick which one is trans, or mark: one, two, both, none, etc. A logical extension would be to include a group picture (maybe with numbers above each person) and have the subject pick who is or is not trans (if anyone). Another version I could think of is to have images of the same person pre-transition and post-transition (for lack of a better term), either individually or side by side, again having the subject pick which person in the image(s) is trans (hint: they would both be trans). For a further curve-ball you could offer more options than just the cis/trans binary (non-binary, etc). I think people would already fail at the binary case, and their responses would get less accurate the more options are added.
So the data known for each image would, at a minimum, be: trans/cis/non-binary, gender. Further details that I think would be very pertinent are: age of the person in the image, transition length (if trans), age at transition start (if trans), wearing makeup/context (I am thinking at a gala vs on the street), ethnicity. The biggest one I think would be ethnicity, because people have a difficult time with ethnicities they are less familiar with, which may skew the results heavily.
For the subject responding I think having some of the same information would be useful. Age, trans/cis/etc, ethnicity, gender, confidence in ability (what % do they think they can tell).
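To make the lists above concrete, here is a minimal sketch of how the per-image and per-respondent data could be recorded. All class and field names are my own placeholders, not part of any existing study, and the categories are just the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRecord:
    """What is known about the person shown in one image (hypothetical fields)."""
    image_id: str
    identity: str                                  # "trans", "cis" or "non-binary"
    gender: str
    age: Optional[int] = None
    transition_years: Optional[float] = None       # only meaningful if trans
    age_at_transition_start: Optional[int] = None  # only meaningful if trans
    wearing_makeup: Optional[bool] = None
    context: Optional[str] = None                  # e.g. "gala", "street"
    ethnicity: Optional[str] = None


@dataclass
class Respondent:
    """What is known about the person answering (hypothetical fields)."""
    respondent_id: str
    age: int
    identity: str
    gender: str
    ethnicity: str
    self_rated_accuracy: float  # percent, 0-100: how often they think they can tell


@dataclass
class Response:
    """One answer to "Is this person trans?" for one image."""
    respondent_id: str
    image_id: str
    answered_trans: bool
```

Only the minimum fields are required for an image; everything else is optional so that incomplete records can still be used.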
It would be important to include trans men as well as trans women! It would also be important to have a large number of responses as well as images. Further, it would be important to have a variety of people (big/small/pretty/non-conforming/etc).
Depending on the setup you would have more or less information. Assuming the best case scenario where you had all the possible criteria (and maybe more), the data could be compared along many axes. There are too many to list in full, but for example: each individual's 'correct' percent, whether the subject's age/gender/ethnicity has an effect, and whether the format (individual images/group images/side by side images) has an effect. The false positives could be broken down by the same metrics. I think the false positives would be very interesting to see.
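As a rough sketch of how the 'correct' percent and the false positives could be tallied along any one of those axes: the function names and the row layout below are assumptions of mine, and a false positive is taken to mean a cis person marked as trans.

```python
from collections import defaultdict


def score(responses):
    """responses: iterable of (is_trans, answered_trans) boolean pairs.

    Returns (accuracy, false_positive_rate). A false positive here is a
    cis person marked as trans. Either value is None when it is undefined.
    """
    total = correct = cis_total = false_pos = 0
    for is_trans, answered_trans in responses:
        total += 1
        correct += (is_trans == answered_trans)
        if not is_trans:
            cis_total += 1
            false_pos += answered_trans
    accuracy = correct / total if total else None
    fpr = false_pos / cis_total if cis_total else None
    return accuracy, fpr


def score_by(rows, key):
    """Group rows (dicts with 'is_trans', 'answered_trans' and `key`) and score each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append((row["is_trans"], row["answered_trans"]))
    return {name: score(pairs) for name, pairs in groups.items()}
```

The same `score_by` call could be pointed at image format, respondent age band, ethnicity and so on, as long as each row carries that column.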
There are quite a few things I can think of here. The first would be about ethics. I don't know why but it just feels icky, and I don't have the background to say why. Something about having people judge others based on their looks, but with consent maybe that is fine.
Perhaps the biggest issue I can think of is: each image has to be relatively the same. What I mean is that a head-shot is not comparable to a full body image, and context matters too (dressed for a gala vs the beach vs casual). Would the results skew depending on this? I think so, but I don't think it would be as clear as just an increase in 'correct' %; the false positives would shift as well.
Another issue is that images are static but 'passing' isn't really a static thing. Running the same study but replacing static images with video would probably give widely different results. Using recordings (audio only) would give yet another result.
The last drawback would be bigots misusing the results (this happens anyways). They may cherry-pick the data to show that: "see trans people are easily seen!" or worse: "trans people aren't easily seen so they need to be marked somehow!". This is more of an afterthought than a concern with the study though.
I think that people would rate themselves higher than their actual percent 'correct'. I further think people would be shocked to see the false-positive percent. I also think that the people with the biggest gap between perceived and actual ability would be the most likely to write off this disparity, and would probably be the most discriminatory people.
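The perceived vs actual gap is simple to put a number on. This is only an illustrative sketch; the function name and the example figures are made up.

```python
def overconfidence_gap(self_rated_percent, n_correct, n_answered):
    """Self-rated accuracy minus measured accuracy, in percentage points.

    Positive values mean the respondent overestimated their ability.
    """
    if n_answered == 0:
        raise ValueError("no answers to score")
    actual_percent = 100.0 * n_correct / n_answered
    return self_rated_percent - actual_percent
```

For example, someone who claims 90% and gets 12 of 20 right (60%) has a gap of 30 points.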
There are also many other things that I can think of that aren't as fully formed that would make this a pain in the ass to implement. Lucky for me I am not a researcher and wouldn't have to figure out how large a scale to make the study.
I stated it earlier in the pre/post transition side by side idea, but the concept of being trans is often thought of as someone who is transitioning. That isn't always the case though. People might not be able to transition in the way they would like but they are still trans. Same for the people who don't follow the lay-person's idea of a common transition. I think it's important to recognize that but really I would rather respect the individual's definition of themselves over assigning one to them.
The study would also show that everyone is hurt by these perceptions, not just trans people.