Topic starter 13/11/2021 4:16 pm
What is the fastest way I can do a sort on a 20-gigabyte file?
Should I split the file into smaller files (under 2 GB each), run sort -u on each piece, merge them, then split again at a different size and sort -u again, and so on? Or is there a sorting script or program that can handle files this big with my limited amount of RAM?
13/11/2021 11:20 pm
Using the sort command will probably be the fastest option. GNU sort already performs an external merge sort internally: it breaks the input into chunks that fit in memory, sorts each chunk to a temporary file, and merges the results, so you don't need to split the file by hand.
sort --parallel=2 -u -o file-sorted.txt file.txt
"--parallel=n"
Set the number of sorts run in parallel to n. By default, n is set to the number of available processors but limited to 8, as there are diminishing performance gains after that. Note also that using n threads increases the memory usage by a factor of log n.
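Two other flags worth knowing are -S, which caps the in-memory buffer (past that size sort spills sorted chunks to disk), and -T, which places those temporary chunk files on a disk with enough free space. A minimal sketch, using a tiny demo file as a stand-in for the real 20 GB input (the filenames and sizes here are illustrative, not from the original post):

```shell
# Create a small demo input (stand-in for the real 20 GB file).
printf 'banana\napple\nbanana\ncherry\napple\n' > big.txt

# -S caps the memory buffer; with a real 20 GB file, sort spills
#    sorted chunks to temp files once this limit is reached.
# -T points the temp files at a disk with room for the chunks.
# --parallel and -u work as in the command above.
sort -S 100M -T /tmp --parallel=2 -u -o big-sorted.txt big.txt

cat big-sorted.txt
```

For a 20 GB input, make sure the -T directory has at least that much free space, since the temporary chunks together are roughly the size of the input.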