
How to transform a chess database into a new one with additional parameters?

Hello everyone, I have a very specific question:

Let's say you have a normal chess database (like the Mega Database from ChessBase, for instance) but you ONLY want to keep the games with precisely 18 moves. How would you do it?

I also have another question:
Lichess only allows you to import .pgn files that contain at most 32 games, so do you know any way I could split a .pgn file with 1280 games into 40 smaller .pgn files containing 32 games each?

Thank you very much for reading, it would really mean a lot to me if anyone could answer !!
<Comment deleted by user>
To keep games with exactly 18 moves, you can, for example, filter games in Scid vs. PC by minimum 36 plies and maximum 36 plies (1 ply = 1/2 move). Or use pgn-extract with the options --minply and --maxply, or --minmoves and --maxmoves (for example, pgn-extract --maxply 36 --minply 36 input.pgn -o output.pgn).
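
If you would rather script the filter than install a tool, here is a minimal Python sketch of the same idea: count the plies in each game and keep only those with exactly 36. It is not how pgn-extract or Scid do it internally. The names split_games, count_plies and keep_exact_plies are made up for this example, and it assumes every game starts with an [Event ...] tag and that comments and variations are well formed. Note that 36 plies means both sides completed 18 moves; a game that ends on White's 18th move has 35 plies.

```python
import re
from typing import Iterable, Iterator

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one game at a time, assuming each game starts with an [Event ...] tag."""
    current = []
    for line in lines:
        if line.startswith("[Event ") and current:
            yield "".join(current)
            current = []
        if line.strip() or current:  # skip blank lines before the first game
            current.append(line)
    if current:
        yield "".join(current)


def count_plies(game: str) -> int:
    """Count half-moves in the main line of a single game (rough tokenizer)."""
    lines = game.splitlines()
    i = 0
    # Skip the tag section (lines starting with "[") and blank lines before the moves.
    while i < len(lines) and (lines[i].startswith("[") or not lines[i].strip()):
        i += 1
    movetext = "\n".join(lines[i:])
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)  # {brace comments}
    movetext = re.sub(r";[^\n]*", " ", movetext)    # ; rest-of-line comments
    while True:                                     # (variations), possibly nested
        stripped = re.sub(r"\([^()]*\)", " ", movetext)
        if stripped == movetext:
            break
        movetext = stripped
    movetext = re.sub(r"\$\d+", " ", movetext)      # numeric annotation glyphs
    plies = 0
    for token in movetext.split():
        token = re.sub(r"^\d*\.+", "", token)       # "1." -> "", "1.e4" -> "e4"
        if token and token not in RESULTS:
            plies += 1
    return plies


def keep_exact_plies(games: Iterable[str], plies: int = 36) -> Iterator[str]:
    """Yield only the games whose main line has exactly `plies` half-moves."""
    return (g for g in games if count_plies(g) == plies)
```

With a real file you would pass an open file object, e.g. split_games(open("input.pgn", encoding="utf-8")), and write the kept games to a new file. For anything serious, pgn-extract or Scid vs. PC will be more robust than this tokenizer.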

Before splitting into small files, I would suggest splitting by ECO code first. (Again, pgn-extract would do it.)
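
For the plain "32 games per file" split, a short Python sketch could look like the one below. It assumes every game starts with an [Event ...] tag and that the file is UTF-8; split_games, chunked, write_chunks and the part_001.pgn naming are all invented for this example. As far as I remember, pgn-extract also has a -# option for writing a fixed number of games per output file, so check its help page before scripting anything.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one game at a time, assuming each game starts with an [Event ...] tag."""
    current = []
    for line in lines:
        if line.startswith("[Event ") and current:
            yield "".join(current)
            current = []
        if line.strip() or current:  # skip blank lines before the first game
            current.append(line)
    if current:
        yield "".join(current)


def chunked(games: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group games into lists of at most `size` games."""
    it = iter(games)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_chunks(input_path, out_dir, size: int = 32) -> List[Path]:
    """Write part_001.pgn, part_002.pgn, ... with at most `size` games each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # The input is read line by line, so only one chunk is held in memory at a time.
    with open(input_path, encoding="utf-8") as src:
        for n, chunk in enumerate(chunked(split_games(src), size), start=1):
            target = out / f"part_{n:03d}.pgn"
            # One blank line between games, one trailing newline.
            text = "\n\n".join(game.strip() for game in chunk) + "\n"
            target.write_text(text, encoding="utf-8")
            written.append(target)
    return written
```

Calling write_chunks("input.pgn", "parts") on a file with 1280 games should give 40 files of 32 games each. If your export is not UTF-8 (older databases are often Latin-1), change the encoding argument in the open call.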
@polar-fr said in #4:
> if the database has a number_of_moves field you would write an SQL (Structured Query Language) operation (I don't know the exact syntax - something like)

This is not how ChessBase databases work.
@kajalmaya said in #5:
> To keep games with exactly 18 moves, you can, for example, filter games in Scid vs. PC by minimum 36 plies and maximum 36 plies (1 ply = 1/2 move). Or use pgn-extract with the options --minply and --maxply, or --minmoves and --maxmoves (for example, pgn-extract --maxply 36 --minply 36 input.pgn -o output.pgn).
>
> Before splitting into small files, I would suggest splitting by ECO code first. (Again, pgn-extract would do it.)

Thanks so much!!! I downloaded pgn-extract and did what you advised me to do. However, I can't figure out how to filter by Elo now... I only want games in my PGN where both players are rated above 2500 Elo. I tried the -t "tag" option with different syntaxes, but I really can't figure it out... It would really mean the world to me if you could tell me how to do it, please!!
Create a text file, say tag-file. Write something like the following in the file:

Date > "2015.12.31"
WhiteElo >= "2500"
BlackElo >= "2500"

Then call pgn-extract like this

pgn-extract -t tag-file input.pgn -o output.pgn

pgn-extract will combine the options with AND.
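
To show what that AND amounts to, here is a rough Python sketch of the Elo part of the filter: a game is kept only if both WhiteElo and BlackElo are present, numeric, and at least 2500. This is only an illustration, not pgn-extract's actual matching code. The names split_games, parse_tags and both_elo_at_least are invented for this example, and it assumes every game starts with an [Event ...] tag.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches a PGN tag pair such as: [WhiteElo "2600"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$', re.MULTILINE)


def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one game at a time, assuming each game starts with an [Event ...] tag."""
    current = []
    for line in lines:
        if line.startswith("[Event ") and current:
            yield "".join(current)
            current = []
        if line.strip() or current:  # skip blank lines before the first game
            current.append(line)
    if current:
        yield "".join(current)


def parse_tags(game: str) -> Dict[str, str]:
    """Return the tag pairs of one game as a dict, e.g. {"WhiteElo": "2600"}."""
    return dict(TAG_RE.findall(game))


def both_elo_at_least(game: str, threshold: int = 2500) -> bool:
    """True only if WhiteElo AND BlackElo are both numeric and >= threshold."""
    tags = parse_tags(game)
    try:
        return (int(tags["WhiteElo"]) >= threshold
                and int(tags["BlackElo"]) >= threshold)
    except (KeyError, ValueError):
        # Missing tag or a non-numeric value such as "?" or "": reject the game.
        return False
```

You would use it as kept = [g for g in split_games(open("input.pgn", encoding="utf-8")) if both_elo_at_least(g)]. Games with a missing or unknown rating are dropped here, which is a choice made for this sketch.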

I think some tag filters can be specified on the command line, and some other tag filters can be specified from a tags file.

Look at the pgn-extract help page. It lists lots of options. Or run "pgn-extract --help". I often don't remember many of the options myself. But it is a very powerful tool, and very reliable as well. It does precisely what the author says on the website. The author is also very helpful. But sometimes exporting from Scid or Scid vs. PC is quicker.

May I ask why you want games with precisely 18 moves?
It takes good hardware to work with large modern databases ... there's no way around it ...

Best is a large SSD or a NAS of at least 2 TB ... you'll want modern silicon too, with as many CPUs and as high a clock rate as you can afford (4.8 GHz or better), and at least 32 GB of RAM on a newer front-end bus ...

Also, you'll want a fast IP connection too, so you won't have to wait for days to d/l a typical Lichess games collection ...

Then, which tools will you use once the compressed raw games file is on your local storage ?

Tools like pgn-extract require a plain-text (ASCII or UTF-8) PGN file as input ...

On your local machine, if you fire up a decompressor for one of the monthly Lichess collections, for instance, you're going to wind up with a very large ascii encoded PGN games collection ... on the order of 100 - 200 GB ...

Playing with this data is a big-boy computing task ...
