Yepoleb: Databases aren't made to store big chunks of binary data. MongoDB is probably even worse than PostgreSQL, because it splits them into a massive amount of tiny chunks. If you want to store files with a unique key, use a filesystem, that's what they're designed for. Metadata can be kept in json files that are much easier to manage than database columns. Whatever you do, please just don't store files in a database.
Magnitus: In my experience, filesystems do a lot less content-integrity checking than databases: files on the filesystem get corrupted all the time without the knowledge of the storage engine until you try to access that particular file. Databases are of course not immune to this (they too store their files on a regular filesystem in most scenarios), but they are by their nature a lot more proactive at letting you know that your content was corrupted, at which point you can take action rather than unknowingly sitting on corrupted data.

Also, overall, I have a lot of pre-existing knowledge of MongoDB (I took a bunch of online courses and have a double certification). It's relatively easy for me to add/remove replicas for storage redundancy or just to retire or add a storage medium.

For me, putting files into and retrieving them from a remote database is a lot less hassle and more portable than having to manage, replicate and move around a filesystem directory structure.

In a previous place of employment, we originally stored media files in a directory structure and that proved to be a mess to maintain. Moving it to a database proved to be a net gain for us (from there, you could remotely access files, replicate them, shard them and overall handle them just like other database documents).

Now, they were not 1GB+ files and you are probably right that if you want to do a lot of manipulation on the entire file, the chunks may be a drawback, but for storage and retrieval? Why not?
You'll need specialized tools for absolutely everything instead of being able to use the massive amount of existing programs to manage your files. Performance will be garbage, RAM usage through the roof and your time wasted on gluing together components that were never designed for that purpose. But if you decide to reinvent the wheel and make it square, there's no way I can stop you.
Yepoleb: You'll need specialized tools for absolutely everything instead of being able to use the massive amount of existing programs to manage your files.
I don't need a crazy amount of third-party tooling which isn't already provided by the DB. I'm not doing image processing, I'm storing files...

Yepoleb: Performance will be garbage,
Not my experience storing user-generated files in GridFS, but I haven't really done a lot of empirical tests strictly comparing the two (especially with very large files), so I can't prove you wrong.

One thing that I'll say from my experience storing to GridFS behind a Hapi server is that you can stream the file all the way in both directions, which is pretty cool (very RAM efficient).
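Something like this minimal pymongo sketch shows the streaming idea (not my actual Hapi/Node.js setup; the connection string, database and file names are made up):

import gridfs
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')  # assumed local instance
bucket = gridfs.GridFSBucket(client['media'])      # 'media' database name is hypothetical

# Stream a file into GridFS in chunks instead of loading it fully into RAM.
with open('backup.bin', 'rb') as src:
    file_id = bucket.upload_from_stream('backup.bin', src)

# Stream it back out to disk the same way.
with open('restored.bin', 'wb') as dst:
    bucket.download_to_stream(file_id, dst)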

Either way, though, Nginx-level static file serving performance is not what I'm looking for here.

Yepoleb: RAM usage through the roof
For the DB, sure (it always tends to use the majority of the system RAM for caching), though you can mitigate this by having separate instances for your files and capping their RAM usage.
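For reference, the cap I'm talking about is just a mongod.conf setting (the 1 GB value below is purely illustrative):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1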

Yepoleb: and your time wasted on gluing together components that were never designed for that purpose. But if you decide to reinvent the wheel and make it square, there's no way I can stop you.
I feel like we are coming from 2 very different places with this. I wouldn't recommend this to a generic user who knows j*ck sh*t about database administration, but it's a good use-case for me in terms of leveraging what I already know (namely, the inherent integrity and portability of a replicated database) and certainly less administration work than managing those files directly on the filesystem.
Post edited June 18, 2017 by Magnitus
Magnitus: I feel like we are coming from 2 very different places with this. I wouldn't recommend this to a generic user who knows j*ck sh*t about database administration, but it's a good use-case for me in terms of leveraging what I already know (namely, the inherent integrity and portability of a replicated database) and certainly less administration work than managing those files directly on the filesystem.
To each his own, but I'll have to agree with Yepoleb on this one. Most modern file systems are already not unlike database systems optimized for file storage.

If you really miss integrity checks and replication so much, you can always set up a RAID 1 array (either hardware or software) and use a hashing program and/or Reed-Solomon codes to further protect data at the filesystem level.

I also have some database administration experience and I still would not think of doing what you're trying to do. But that's not necessarily relevant, so good luck :).
WinterSnowfall: If you really miss integrity checks and replication so much, you can always set up a RAID 1 array (either hardware or software) and use a hashing program and/or Reed-Solomon codes to further protect data at the filesystem level.
RAID has some hardware limitations. I've implemented some data integrity checks in Python in the past using hashing, but I always felt like I was reimplementing something I get for free with a DB.
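Roughly the kind of thing I mean, as an untested sketch (the manifest file name is arbitrary):

import hashlib
import json
import os

def hash_file(path, chunk_size=1 << 20):
    # Hash in chunks so large installers don't need to fit in RAM.
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root, manifest_path='manifest.json'):
    # Record one digest per file; re-run later and diff to detect silent corruption.
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[path] = hash_file(path)
    with open(manifest_path, 'w') as f:
        json.dump(manifest, f, indent=2)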

WinterSnowfall: I also have some database administration experience and I still would not think of doing what you're trying to do. But that's not necessarily relevant, so good luck :).
Well, I'm doing it anyway. I've had wet dreams for years about having my replicated GridFS MongoDB backups for games, music and rips of my movie DVDs and it's gonna be grand, no matter what the naysayers say :P.

Anyway, maybe of greater interest: I'm making good progress on a general-purpose Node.js GOG API library and am almost done with the login.

I'm not sure I'll implement the whole API (I'm mainly interested in logging in, getting game info and downloading games), but I'll publish what I have once it's done.

PS:

GOG should be ashamed that they don't have a public API or an easy way to log in via scripts that isn't dependent on scraping page content.

Seriously, this is an area where Steam has them beat: https://developer.valvesoftware.com/wiki/Steam_Web_API

To quote: https://youtu.be/Cn-m90q9o-A?t=757

GOG is doing a sh*t job at leveraging its community. Seriously, it's the 21st century, get a public API.
Post edited June 19, 2017 by Magnitus
Steam definitely has a better public API, but the stuff SteamDB shows isn't publicly accessible either. I recently found example code on Github called galaxy-demo-app that suggests that registered developers can generate their own OAuth2 IDs and secrets and possibly register allowed redirect URLs, so you can use the web login for your own applications. That's quite a nice way to do it, as clients never get access to user credentials. Google's APIs work the same way. They'd just need to make it public instead of restricting it to a few selected studios.
Magnitus: In my experience, filesystems do a lot less content-integrity checking than databases: files on the filesystem get corrupted all the time without the knowledge of the storage engine until you try to access that particular file. Databases are of course not immune to this (they too store their files on a regular filesystem in most scenarios), but they are by their nature a lot more proactive at letting you know that your content was corrupted, at which point you can take action rather than unknowingly sitting on corrupted data.
A lot has been said about this already, I just wanted to add something to this specific piece... which highly depends on the filesystem in question. Some filesystems are much closer to databases than others. If you are storing your files on FAT then yes, you should worry about silent corruption. Btrfs on RAID1 with a regular scrub, on the other hand, IS immune to this due to checksums; it will loudly complain if it detects a checksum failure. It also has its own RAID implementation without the need for another layer to sit between the FS and the hardware, and that one should be able to recover from almost anything short of a direct meteor impact (though from the mailing list I get the feeling that RAID-n for n>1 is still immature and not recommended for general use). Including a meteor impact, if you regularly "send" differences to an offsite copy (a diff/rsync analog that works with filesystem metadata and knows what changed since the last sync without needing to read the whole file to hash its contents).

I've been meaning to move my private data storage to something like this, when I find enough time to do it. And I get to access my files over SMB or NFS or SSHFS instead of having to first download them to a local disk before I can use them (highly disk-space inefficient, considering some installers on GOG are in the 20-30 GB range, spread over many files).
huan: A lot has been said about this already, I just wanted to add something to this specific piece... which highly depends on the filesystem in question. Some filesystems are much closer to databases than others. If you are storing your files on FAT then yes, you should worry about silent corruption. Btrfs on RAID1 with a regular scrub, on the other hand, IS immune to this due to checksums; it will loudly complain if it detects a checksum failure. It also has its own RAID implementation without the need for another layer to sit between the FS and the hardware, and that one should be able to recover from almost anything short of a direct meteor impact (though from the mailing list I get the feeling that RAID-n for n>1 is still immature and not recommended for general use). Including a meteor impact, if you regularly "send" differences to an offsite copy (a diff/rsync analog that works with filesystem metadata and knows what changed since the last sync without needing to read the whole file to hash its contents).

I've been meaning to move my private data storage to something like this, when I find enough time to do it. And I get to access my files over SMB or NFS or SSHFS instead of having to first download them to a local disk before I can use them (highly disk-space inefficient, considering some installers on GOG are in the 20-30 GB range, spread over many files).
Ideally, I want something that is somewhat robust against corruption while operating on cheaper, non-specialized commodity hardware. I'll pay a premium for a good laptop (I spend so much time on it that it is just a good investment), but otherwise, I can be rather cheap with my infrastructure.

I'm very much sold on the horizontal scalability premise of just throwing more "lower grade" hardware at a problem as needed rather than paying $$$ for better and better vertical scaling.

My working home setup is a cheap set of "servers" (one of them acquired after someone discarded their old computer) and two 8 TB external hard drives bought during a sale (still from a reputable brand, mind you)... that I'm currently leveraging to experiment with Kubernetes, Docker Swarm and DB replication/sharding (I figured I might as well use them for backups too).

RAID 1 requires at the very least a machine with 2 sizable hard drives. If you want to increase capacity, then you need bigger hard drives or a strategy to split your files across machines.

With GridFS, you can have a replica set across 2 separate drives for redundancy and you can shard across additional sets of drives if you need more space.
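As a rough illustration (assuming a sharded cluster is already up and you're connected to its mongos router; the host and 'media' database name are made up), the sharding side boils down to something like:

from pymongo import MongoClient

client = MongoClient('mongodb://mongos-host:27017')  # assumed router address

# Spread the GridFS chunks collection across shards; {files_id: 1, n: 1}
# is the shard key commonly recommended for fs.chunks.
client.admin.command('enableSharding', 'media')
client.admin.command('shardCollection', 'media.fs.chunks',
                     key={'files_id': 1, 'n': 1})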

Yepoleb: Steam definitely has a better public API, but the stuff SteamDB shows isn't publicly accessible either. I recently found example code on Github called galaxy-demo-app that suggests that registered developers can generate their own OAuth2 IDs and secrets and possibly register allowed redirect URLs, so you can use the web login for your own applications. That's quite a nice way to do it, as clients never get access to user credentials. Google's APIs work the same way. They'd just need to make it public instead of restricting it to a few selected studios.
I suppose it makes sense that they would give special access to game devs.

However, for a website whose premise is that you "own" your games, I find that their tooling for efficiently keeping an offline backup of your installers leaves much to be desired.

At the very least, they could maintain a well-documented, targeted public API that uses API keys to facilitate the creation of third-party tooling and libraries to achieve this.
Post edited June 23, 2017 by Magnitus
damn, how did I miss this thread.
somebody should sticky this :)

Browsing through the documentation, it seems you have it pretty much complete (and way more stuff than I know).
Just two tiny additions that I don't see mentioned:


GET /userData.json - Information about the logged in user.

This can also take a list of game IDs as an argument and will then return the games' price information.
Region/currency can be controlled via the 'gog_lc' cookie.
Dummy Python code:

import requests

data = {'data': {'product_ids': [2060341617, 1403945378], 'serien_ids': []}}
cookie = {'gog_lc': 'UK_GBP_en-US'}
r = requests.post('https://www.gog.com/userData.json', json=data, cookies=cookie)
don't know what the 'serien_ids' are for



GET api.gog.com/products/(int: product_id)

This supports another parameter besides 'expand': 'locale',
i.e. it does exactly what you think it would do :)
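Dummy Python code again (product ID reused from the earlier example; 'de-DE' is just my guess at the locale format):

import requests

r = requests.get('https://api.gog.com/products/2060341617',
                 params={'expand': 'downloads', 'locale': 'de-DE'})
print(r.json())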
Is it possible to add a new route that returns the latest games added by GOG? Basically all the games in the "NEW" tab at the homepage of GOG.com. It would also be great if it did support the limit query parameter.
Post edited July 03, 2017 by Do0msDay
Do0msDay: Is it possible to add a new route that returns the latest games added by GOG? Basically all the games in the "NEW" tab at the homepage of GOG.com. It would also be great if it did support the limit query parameter.
Just use /games/ajax/filtered?sort=new. It supports the limit parameter and everything else you know from the games page.
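A quick untested Python sketch (the 'products' and 'title' fields in the response are from memory, so treat them as assumptions):

import requests

r = requests.get('https://www.gog.com/games/ajax/filtered',
                 params={'sort': 'new', 'limit': 10})
for product in r.json().get('products', []):
    print(product.get('title'))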
Do0msDay: Is it possible to add a new route that returns the latest games added by GOG? Basically all the games in the "NEW" tab at the homepage of GOG.com. It would also be great if it did support the limit query parameter.
Yepoleb: Just use /games/ajax/filtered?sort=new. It supports the limit parameter and everything else you know from the games page.
Thanks, that's exactly what I needed :D
This seems like a good place to ask. Is it possible to access the offline installer files with galaxy included through api.gog.com ? Or is it only possible through gog.com/accounts ?
Kalanyr: This seems like a good place to ask. Is it possible to access the offline installer files with galaxy included through api.gog.com ? Or is it only possible through gog.com/accounts ?
Use the products endpoint with downloads expanded and follow the downlink to get the chunklist and file URL.

Example:
https://api.gog.com/products/1433856545?expand=downloads
Iterate the files to get the downlink
https://api.gog.com/products/1433856545/downlink/installer/en1installer0
Request it to get the file url, make sure you're authenticated
https://cdn.gog.com/secure/unreal_tournament_2004_ece/pc/setup_ut2004_2.0.0.6.exe?a3c669cc2530fcf... (Wrong file for the product ID, but you get the point)
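Untested sketch of that flow in Python (assumes 'session' already carries valid auth cookies/tokens; the response field names are assumptions on my part):

import requests

session = requests.Session()  # authentication is assumed to be set up elsewhere

product = session.get('https://api.gog.com/products/1433856545',
                      params={'expand': 'downloads'}).json()

# Walk the installer entries and resolve each downlink to a signed CDN URL.
for installer in product['downloads']['installers']:
    for f in installer['files']:
        link = session.get(f['downlink']).json()
        print(installer['os'], link.get('downlink'))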

Edit: Crap, I misunderstood the question. Sorry for that.
Post edited July 15, 2017 by Yepoleb
Kalanyr: This seems like a good place to ask. Is it possible to access the offline installer files with galaxy included through api.gog.com ?
No, only the installers without Galaxy.
Kalanyr: This seems like a good place to ask. Is it possible to access the offline installer files with galaxy included through api.gog.com ? Or is it only possible through gog.com/accounts ?
Yepoleb: Use the products endpoint with downloads expanded and follow the downlink to get the chunklist and file URL.

Example:
https://api.gog.com/products/1433856545?expand=downloads
Iterate the files to get the downlink
https://api.gog.com/products/1433856545/downlink/installer/en1installer0
Request it to get the file url, make sure you're authenticated
https://cdn.gog.com/secure/unreal_tournament_2004_ece/pc/setup_ut2004_2.0.0.6.exe?a3c669cc2530fcf... (Wrong file for the product ID, but you get the point)
This only seems to yield the installers without Galaxy included?