gogtrial34987: FWIW, I also still have areas where my search is worse than GOG. The big one I know about is dealing with spelling mistakes / unfinished words. I deliberately didn't implement any of that, as I wasn't certain what my performance would be under load. Now that it's proven to be satisfactory, I'll look into improving there as well.
My search should now handle basic spelling mistakes, and also deal with (very) basic transliteration: mobius, moebius and möbius will all return both "Möbius Front '83" and "Moebius: Empire Rising" among the top results.

I'm not yet handling unfinished input ("strongho"), and am not certain I want to (I need to come to grips with n-grams a bit more first). If anyone knows a case besides that where GOG is still returning something significantly better than I am, please let me know.
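(For the curious, the transliteration side is roughly the idea sketched below - not my actual code, and the fold_variants helper is made up purely for illustration.)

import unicodedata

# Illustrative only: map German digraphs before stripping accents, so that
# "mobius", "moebius" and "möbius" all share at least one folded form.
TRANSLITERATIONS = {"ö": "oe", "ä": "ae", "ü": "ue", "ß": "ss"}

def fold_variants(text):
    """Return the set of normalized spellings this query or title word can match under."""
    lowered = text.lower()
    digraphs = "".join(TRANSLITERATIONS.get(c, c) for c in lowered)
    stripped = "".join(c for c in unicodedata.normalize("NFKD", lowered)
                       if not unicodedata.combining(c))
    return {lowered, digraphs, stripped}

print(fold_variants("möbius") & fold_variants("Moebius"))  # {'moebius'}
print(fold_variants("möbius") & fold_variants("Mobius"))   # {'mobius'}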
Post edited May 09, 2025 by gogtrial34987
mrkgnao: Apparently GOG has introduced dynamically-priced bundles:
https://support.gog.com/hc/en-us/articles/26981125840925-Dynamic-Pricing-for-Bundles

Could you perhaps add a filter for these? I have no idea how to find them, but perhaps your API scraper does.
Just had a first look, now that these bundles have been re-enabled. They've added a bundleType field, with values "standard_bundle" or "partial_bundle" (that seems to be the new type). It's possible there are other types as well? I haven't looked exhaustively yet.
I don't think I'll do anything over the weekend, but should be able to add a filter for these bundles sometime in the next week or two.
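(Conceptually the filter would boil down to something like the sketch below - the product dicts are just stand-ins for whatever the API actually returns.)

# Sketch only: each product is assumed to be a dict holding the scraped API fields.
def is_dynamic_bundle(product):
    # "partial_bundle" appears to be the new dynamically-priced type; anything
    # without the field (or with an unknown value) is treated as not matching.
    return product.get("bundleType") == "partial_bundle"

products = [
    {"title": "Some Bundle", "bundleType": "standard_bundle"},
    {"title": "Some Dynamic Bundle", "bundleType": "partial_bundle"},
    {"title": "Some Game"},  # non-bundles may lack the field entirely
]

print([p["title"] for p in products if is_dynamic_bundle(p)])  # ['Some Dynamic Bundle']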
Post edited May 09, 2025 by gogtrial34987
mrkgnao: Apparently GOG has introduced dynamically-priced bundles:
https://support.gog.com/hc/en-us/articles/26981125840925-Dynamic-Pricing-for-Bundles

Could you perhaps add a filter for these? I have no idea how to find them, but perhaps your API scraper does.
gogtrial34987: Just had a first look, now that these bundles have been re-enabled. They've added a bundleType field, with values "standard_bundle" or "partial_bundle" (that seems to be the new type). It's possible there are other types as well? I haven't looked exhaustively yet.
I don't think I'll do anything over the weekend, but should be able to add a filter for these bundles sometime in the next week or two.
Thanks!
Hello, gogtrial34987! Thank you very much for your work on this. I am sure it is the result of many months of effort so far. It is great and quite useful.

There are quite a few things I am still figuring out. For example, selecting games that have never been discounted lists many free games. That is technically correct but not very useful. However, I can then filter this by price, so it is not really a problem.

But I would like to ask why you chose Germany to stand in for the Eurozone, as I believe some games are not available there, yet I can find "Wolfenstein" and "Carmageddon" with prices in Euros. I am sorry if it is a silly question.
Gede: There are quite a few things I am still figuring out. For example, selecting games that have never been discounted lists many free games. That is technically correct but not very useful. However, I can then filter this by price, so it is not really a problem.
Yeah, I'm aware that's not ideal. I don't want to apply too much magic, or assume too much about what a visitor might want, as when I get it wrong the logic only becomes more confusing. The only realistic alternative I've thought of - and dismissed - would be to sort by price rather than by release date whenever sorting by "price improvement" has no effect. But then I'd have to put the most expensive items first (since starting with all the free games would be worse in this case), which isn't ideal either. So I'd rather keep the sorting consistent with the default for all other filters, which is price improvement followed by release date on GOG.
(Since it's a multi-select filter, it gets worse when someone picks both "never" and "rarely", as then I definitely want to put the currently on sale items from the "rarely" category first.)
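(For concreteness, the default ordering amounts to roughly the sketch below - the field names are made up for illustration, not the actual data model.)

# Illustrative field names only: default sort is "price improvement" first,
# then GOG release date, newest first.
def sort_key(game):
    return (-game["price_improvement_percent"], -game["gog_release_timestamp"])

games = [
    {"title": "A", "price_improvement_percent": 0,  "gog_release_timestamp": 1715000000},
    {"title": "B", "price_improvement_percent": 80, "gog_release_timestamp": 1600000000},
    {"title": "C", "price_improvement_percent": 0,  "gog_release_timestamp": 1700000000},
]

print([g["title"] for g in sorted(games, key=sort_key)])  # ['B', 'A', 'C']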

Gede: But I would like to ask why you chose Germany to stand in for the Eurozone, as I believe some games are not available there, yet I can find "Wolfenstein" and "Carmageddon" with prices in Euros. I am sorry if it is a silly question.
There are indeed many games not available in Germany - but luckily they still all have a price in Germany, which is indeed the same as in (most of) the rest of the eurozone, so the lack of availability in Germany isn't important. Beyond that, Germany is simply the biggest country within the eurozone, so no obviously better choice for a stand-in presented itself.
If I can ever gather prices for many more countries without the risk of being banned from the API for it, I'll need to rethink the country select. My current thinking is that I'd then probably do two (or more?) "euro-sign" flags, with some indication of which countries within the eurozone each of them applies to. (Basically the old magog "zones" solution, but trying to cram the UI for it into something small-ish.)

I really appreciate you (and others) asking about such things, btw! It helps me re-examine choices I made, and reminds me to take a step back every so often to try and look at things from different perspectives. I try to do that anyway, but all the same I'm so deeply immersed in the subject that it's hard.
My site is suffering from some extremely aggressive and badly behaved scrapers. I'm putting in various mitigations, and although I'm trying to do so in a way which won't cause real people to suffer from it, it's possible I might make a mistake somewhere. If you find yourself unable to access gamesieve, please let me know, and I'll get it fixed.
Post edited May 11, 2025 by gogtrial34987
gogtrial34987: My site is suffering from some extremely aggressive and badly behaved scrapers. I'm putting in various mitigations, and although I'm trying to do so in a way which won't cause real people to suffer from it, it's possible I might make a mistake somewhere. If you find yourself unable to access gamesieve, please let me know, and I'll get it fixed.
Welcome to the world of website maintenance...
*sigh*
gogtrial34987: There are indeed many games not available in Germany - but luckily they still all have a price in Germany,(...)
Oh, I was not expecting that. But I did notice that the website is quite mature and, from the discussions here, that you really seem to have given great thought to your choices.

gogtrial34987: I really appreciate you (and others) asking about such things, btw! It helps me re-examine choices I made, and reminds me to take a step back every so often to try and look at things from different perspectives. I try to do that anyway, but all the same I'm so deeply immersed in the subject that it's hard.
That is understandable and, in my opinion, highly desirable. It is easy to end up with tunnel vision, unable to look beyond the current problem and its technical difficulties. Other people (like project managers) just care about how useful things are. It may feel ungrateful at times, but it really leads to a better product.
The important thing is being polite and respectful.

It is nice that you show the game tags. GOG, sadly, does not make use of them on the purchased game list. gogdb also does not have support for game tags, but it does allow us to download the game info. I feel like I could almost put together a tool to help me choose what game to play next.
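Something along these lines, I imagine - assuming the downloaded game info ends up as one JSON file per game with a "tags" field; the layout here is purely a guess:

import json
from pathlib import Path

# Hypothetical layout: one JSON file per owned game, each with a "title"
# and a list of "tags".
def games_with_tags(directory, wanted_tags):
    wanted = {t.lower() for t in wanted_tags}
    for path in Path(directory).glob("*.json"):
        game = json.loads(path.read_text(encoding="utf-8"))
        tags = {t.lower() for t in game.get("tags", [])}
        if wanted <= tags:
            yield game["title"]

# e.g. pick something short and story-driven for the weekend:
for title in games_with_tags("my_games", ["point-and-click", "short"]):
    print(title)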
gogtrial34987: My site is suffering from some extremely aggressive and badly behaved scrapers. I'm putting in various mitigations, and although I'm trying to do so in a way which won't cause real people to suffer from it, it's possible I might make a mistake somewhere. If you find yourself unable to access gamesieve, please let me know, and I'll get it fixed.
mrkgnao: Welcome to the world of website maintenance...
*sigh*
From what I have been reading, it has been a common occurrence for some time. Is robots.txt being disregarded, or is it simply that the number of crawlers is much greater?
Post edited May 12, 2025 by Gede
Gede: From what I have been reading, it has been a common occurrence for some time. Is robots.txt being disregarded, or is it simply that the number of crawlers is much greater?
There are still some "good" bots out there, who properly identify themselves and respect robots.txt (and other) permissions --- e.g. Google --- but they are quickly becoming a minority.

There are also those who ignore permissions, but still identify themselves or use a fixed IP address --- e.g. ChatGPT --- so one can easily block them.

The real problem lies with a third group of bots that ignore permissions and further obfuscate their identity by constantly changing their user agent string and IP address with every single access. These are the real pain, and if aggressive enough they can even crash one's website (it happened to me twice in 2025). This group has become much larger in the last couple of years --- part of the AI race, I suspect. I have developed heuristics to identify these and attempt to reduce their impact, but one has to be careful not to overdo it, so as not to affect legitimate users.
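(As a toy illustration of the kind of heuristic I mean - not my actual rules - one can, for example, count cookieless, referer-less hits on deep URLs within a sliding window:)

import time
from collections import deque

# Toy illustration only: count requests that arrive with no cookies, no
# Referer, and a deep URL; if too many of those show up within a minute,
# answer further ones with a cheap challenge page instead of real content.
WINDOW_SECONDS = 60
THRESHOLD = 300
suspicious_times = deque()

def looks_suspicious(request):
    # request is a plain dict here, standing in for whatever your framework provides
    return (not request.get("cookies")
            and not request.get("referer")
            and request.get("path", "").count("/") > 2)

def should_challenge(request, now=None):
    if not looks_suspicious(request):
        return False
    now = time.time() if now is None else now
    suspicious_times.append(now)
    while suspicious_times and now - suspicious_times[0] > WINDOW_SECONDS:
        suspicious_times.popleft()
    return len(suspicious_times) > THRESHOLD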
mrkgnao: The real problem lies with a third group of bots that ignore permissions and further obfuscate their identity by constantly changing their user agent string and IP address with every single access.
Rather than trying to profile your attackers, simply enforce global limits on all your users based on expected usage patterns. That means mostly rate limiting, but not only that. If anyone behaves badly, be it bots or humans, they'll trip the limits. Equality for all :P.
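(A minimal token-bucket sketch of what I mean by global limits - the rate and burst numbers are placeholders you would tune to expected usage:)

import time

# Minimal token-bucket sketch: every client key (IP, session, whatever you
# key on) gets the same request budget, refilled at a fixed rate.
class TokenBucket:
    def __init__(self, rate_per_sec=2.0, burst=20):
        self.rate = rate_per_sec
        self.burst = burst
        self.state = {}  # key -> (tokens_left, last_seen)

    def allow(self, key):
        now = time.time()
        tokens, last = self.state.get(key, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.state[key] = (tokens, now)
            return False  # over budget: reject, delay, or challenge the request
        self.state[key] = (tokens - 1.0, now)
        return True

limiter = TokenBucket()
print(limiter.allow("203.0.113.7"))  # True until the burst budget runs out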
mrkgnao: The real problem lies with a third group of bots that ignore permissions and further obfuscate their identity by constantly changing their user agent string and IP address with every single access.
WinterSnowfall: Rather than trying to profile your attackers, simply enforce global limits on all your users based on expected usage patterns. That means mostly rate limiting, but not only that. If anyone behaves badly, be it bots or humans, they'll trip the limits. Equality for all :P.
How would I limit a bot user that changes its IP with every access?
mrkgnao: The real problem lies with a third group of bots that ignore permissions and further obfuscate their identity by constantly changing their user agent string and IP address with every single access.
WinterSnowfall: Rather than trying to profile your attackers, simply enforce global limits on all your users based on expected usage patterns. That means mostly rate limiting, but not only that. If anyone behaves badly, be it bots or humans, they'll trip the limits. Equality for all :P.
That won't do any good against bots coming over Tor, changing exit nodes and user-agent strings every 5-10 requests while pointlessly gathering 1000+ pages (of which ~500 are virtually identical) within a couple of minutes. By the time they trip any rate limit that wouldn't inconvenience regular humans, they're already using their next identity. (1000 pages isn't so bad, but once they've parsed those, they'll have discovered nearly 1M new links to explore...)

It's been a few years since I did active server administration, and the challenge has grown significantly. Yet another reason to loathe everything "AI", with their insane data hunger. Can't wait for that particular bubble to burst.
Post edited May 12, 2025 by gogtrial34987
mrkgnao: How would I limit a bot user that changes its IP with every access?
With every access? Damn. Well, one way is to have a website entry-point that sets a unique cookie token for that user, and have all other entry-points check it and redirect to the original entry-point if it's not present. From there on, you have a way to uniquely identify your user. Whether that's a feasible or user-friendly way to design a website is another question. You may still get traffic even like this, of course, but the bots won't be able to harvest anything useful (and now you can rate limit them if they decide they do need something useful).
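(Roughly like this, if one were doing it in Flask - just a sketch of the idea, not production code:)

import secrets
from flask import Flask, request, redirect, make_response

app = Flask(__name__)
COOKIE = "visitor_token"

@app.route("/")
def entry_point():
    # The one "front door" page: hand out a unique token if the visitor has none.
    resp = make_response("Welcome - pick a filter to get started.")
    if not request.cookies.get(COOKIE):
        resp.set_cookie(COOKIE, secrets.token_urlsafe(32), httponly=True)
    return resp

@app.route("/results")
def results():
    # Every other page: no token means back to the entry point, no data served.
    if not request.cookies.get(COOKIE):
        return redirect("/")
    return "filtered results for token holders"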
Post edited May 12, 2025 by WinterSnowfall
mrkgnao: How would I limit a bot user that changes its IP with every access?
WinterSnowfall: With every access? Damn. Well, one way is to have a website entry-point that sets a unique cookie token for that user, and have all other entry-points check it and redirect to the original entry-point if it's not present. From there on, you have a way to uniquely identify your user. Whether that's a feasible or user-friendly way to design a website is another question. You may still get traffic even like this, of course, but the bots won't be able to harvest anything useful (and now you can rate limit them if they decide they do need something useful).
Unless I'm missing something, wouldn't that make it impossible to go directly to any page (via search engine or bookmark)?
And I'm thinking that a decent AI bot would find a way to create a fake token anyway.
Cavalary: Unless I'm missing something, wouldn't that make it impossible to go directly to any page (via search engine or bookmark)?
... on the first access, but subsequent accesses would work, once you have a valid token. Not the best design, as I said before, but a valid one if keeping bots in check is a priority.

Cavalary: And I'm thinking that a decent AI bot would find a way to create a fake token anyway.
Not even "AI bots" can defeat cryptography, so as long as you have a decent hashing algorithm in place, any sort of token guessing should be practically impossible.
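(For example, with an HMAC-signed token, guessing a valid one means forging the signature, which is not feasible without the server's secret key - a rough sketch:)

import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # lives on the server only

def issue_token():
    visitor_id = secrets.token_urlsafe(16)
    signature = hmac.new(SECRET_KEY, visitor_id.encode(), hashlib.sha256).hexdigest()
    return f"{visitor_id}.{signature}"

def verify_token(token):
    try:
        visitor_id, signature = token.split(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SECRET_KEY, visitor_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)

print(verify_token(issue_token()))      # True
print(verify_token("forged.deadbeef"))  # False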