In response to a comment on this project's TIGSource development log (thanks to Tiblanc for commenting!), this is my rundown on the aforementioned really, really naive websocket performance tests I've done.
As someone who's working on a game with a similar elevator pitch, I'm interested in seeing where you take this project.
Out of curiosity, did you benchmark websockets against a REST web service? I know websockets drag REST in the mud for efficiency, but is it significant enough to warrant the pain? You made me doubt my decision of going with REST for my game thinking I could be lazy and get away with it. I never did any stress test assuming I would be hit by a few hundred calls every second at most with Jetty being a superhero in all this, so I'm curious to see what kind of throughput you're getting/needing here.
I didn't bother with a REST benchmark because, as Tiblanc said, there's really no contest unless you massively fail at implementing your websocket communication. It was a no-brainer simply because I only plan on targeting Android, and the support is there (well... not as easily as I thought, but still there).
My extremely naive benchmark involved sending requests from a set number of simultaneous processes (10, 20, 30, or 40), each with a fixed per-process request count (1k, 5k, 10k, or 20k). The problems here are obviously numerous: it doesn't account for WAN latency, it doesn't account for the reality of data processing and memory access, it ignores the additional latency that may come from disparate machines hosting the service and the data, etc. It was intended mostly as a baseline to see where I'm starting from; real-world throughput won't come anywhere close to these numbers, but I can at least see the upper limit.
What the benchmark does do is send a JSON request, which is parsed and replied to by the server via normal routing, using a dict that is created normally and then JSON encoded (though no database or cache connection of any kind is established).
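As a rough sketch of what the server side does (the framework routing and field names here are my assumptions for illustration; the post doesn't spell them out), the handler just decodes the incoming JSON, builds an ordinary dict, and encodes it back:

```python
import json

def handle_message(raw):
    # Parse the incoming JSON request. (The "cmd" field is a
    # hypothetical example; the actual message shape isn't shown.)
    request = json.loads(raw)

    # Build a plain dict reply -- no database or cache connection
    # is touched, exactly as in the benchmark -- and JSON-encode it.
    reply = {"ok": True, "echo": request.get("cmd")}
    return json.dumps(reply)
```

That's the entire unit of work being measured per request: one decode, one dict, one encode.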
Results below. This is run within a Debian VM running on Parallels on OS X using 4 cores and 8GB RAM. I'd initially run it from another machine on the network; since I failed to save the results (oops) and don't really want to start another machine, this is run from the same machine. So that's another black mark, but I recall the numbers being close to this anyway (regardless, it's LAN, so it's not particularly valid in any case).
There is no throttling of requests in any way. Processes are started via multiprocessing.Process, given a request count, and told to do their best.
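The fan-out looks roughly like this (a runnable sketch; the actual websocket round-trip is replaced by a local JSON encode/decode stand-in so the structure is clear without a server):

```python
import json
import multiprocessing
import time

def worker(n_requests):
    # Stand-in for one benchmark process. The real worker opens a
    # websocket connection and fires n_requests JSON messages at the
    # server; here we fake the round-trip locally.
    for _ in range(n_requests):
        json.loads(json.dumps({"cmd": "ping"}))  # hypothetical payload

def run_benchmark(n_procs, requests_per_proc):
    procs = [multiprocessing.Process(target=worker,
                                     args=(requests_per_proc,))
             for _ in range(n_procs)]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == "__main__":
    elapsed = run_benchmark(10, 1000)
    print("10000 total requests over 10 processes")
    print("Elapsed time:", elapsed, "s")
```

Each process runs flat out with no pacing, which is what "told to do their best" means in practice.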
10000 total requests over 10 processes
Elapsed time: 0.114934921265 s
50000 total requests over 10 processes
Elapsed time: 2.99072003365 s
100000 total requests over 10 processes
Elapsed time: 12.3157320023 s
200000 total requests over 10 processes
Elapsed time: 47.2823381424 s
20000 total requests over 20 processes
Elapsed time: 0.238799095154 s
100000 total requests over 20 processes
Elapsed time: 3.96140599251 s
200000 total requests over 20 processes
Elapsed time: 29.5805599689 s
400000 total requests over 20 processes
Elapsed time: 105.174664974 s
30000 total requests over 30 processes
Elapsed time: 0.408610105515 s
150000 total requests over 30 processes
Elapsed time: 3.89557290077 s
300000 total requests over 30 processes
Elapsed time: 41.0506842136 s
600000 total requests over 30 processes
Elapsed time: 151.851080894 s
40000 total requests over 40 processes
Elapsed time: 0.406050920486 s
200000 total requests over 40 processes
Elapsed time: 4.46408104897 s
400000 total requests over 40 processes
Elapsed time: 45.580463171 s
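To put those timings in perspective, here's the arithmetic for the 10-process runs as requests per second (note how throughput collapses as the total request count grows):

```python
# Convert the 10-process runs above into rough requests-per-second
# figures (numbers copied from the timings in this post).
runs = [(10000, 0.114934921265),
        (50000, 2.99072003365),
        (100000, 12.3157320023),
        (200000, 47.2823381424)]

for total, elapsed in runs:
    print(f"{total:>6} requests: {total / elapsed:,.0f} req/s")
```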
Python being Python, you could easily get better numbers in a more performant language, if you're comfortable in one (I'd love to be using Clojure for this, but I haven't dedicated nearly enough time to it to be comfortable in that environment). That said, even one percent of this throughput from a single server in production would probably be pretty good.