Making Bash scripts parallel with xargs
Anyone who has tried to write parallel Bash scripts probably knows the -P parameter of the xargs utility. Combined with find, it lets you execute a specified command for each file and distribute these commands among several processes. This scheme is perfect when you need to, say, convert all files in a given directory.
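As a minimal sketch of that scheme, the snippet below compresses every .txt file in a directory with up to four gzip processes at a time. The scratch directory, the sample files and the choice of gzip are assumptions made for illustration; -P and -0 are extensions found in GNU and BSD xargs rather than POSIX options.

```shell
#!/usr/bin/env bash
# Hypothetical setup: a scratch directory with a few sample text files.
workdir=$(mktemp -d)
for name in a b c d; do
    printf 'sample %s\n' "$name" > "$workdir/$name.txt"
done

# Compress every .txt file, running up to 4 gzip processes at once.
# -print0 / -0 keep file names with spaces or newlines intact;
# -n 1 passes one file per gzip invocation so the work can be spread out.
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 -n 1 -P 4 gzip

# Show the result: the directory now holds the compressed files.
ls "$workdir"
```

This works nicely as long as the per-file job is a single command, which is exactly the limitation the rest of the post deals with.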
When I started writing such scripts, a certain inconvenience became apparent. The commands that have to be given to xargs are usually more complicated than a single call of some utility with arguments. You often need to pass a set of commands, especially when progress indication and error reporting are required. In the end, you create a separate script for processing a single item in addition to the script that contains the xargs call.
I looked into how to combine these two scripts into one. A good solution would be to pass a Bash function to xargs, but that doesn't work: xargs runs external commands and knows nothing about functions defined in the calling shell. Simple logic helps: if I have nothing but my script, then the script itself should be passed to xargs. The first argument of the script is used to distinguish processing a single item from processing a list of items. If it is greater than zero, it is the number of processes for parallel list handling; if it is equal to zero, a single item is processed.
If you want to use this method, it's often sufficient to insert this code into your script:
function process {
    default_np=4
    np="$1"; shift
    if [ -z "$np" ]; then np=$default_np; fi
    if [ "$np" == "0" ]
    then
        process_item "$@"
    else
        process_list "$np" "$@"
    fi
}
process "$@"
Before that, you should define two functions: process_item for processing a single file and process_list for generating and processing the list of files. process_list receives all arguments of the script starting with the second one, and it should call xargs, passing it the script itself with 0 as the first argument. Download the demo script to see it in action.
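The scheme described above can be sketched as follows. This is not the demo script mentioned in the post, only an illustration under stated assumptions: the file name upcase.sh, the upper-casing task and the bodies of process_item and process_list are all hypothetical. The snippet writes the script to a temporary directory first, so that the script can reliably call itself through "$0".

```shell
#!/usr/bin/env bash
# Write a hypothetical self-invoking script to a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/upcase.sh" <<'EOF'
#!/usr/bin/env bash

# Process one file: write an upper-cased copy next to it (illustrative task),
# reporting progress on stdout and errors on stderr.
function process_item {
    local file="$1"
    if tr '[:lower:]' '[:upper:]' < "$file" > "$file.upper"; then
        echo "done: $file"
    else
        echo "failed: $file" >&2
        return 1
    fi
}

# Build the list of files and hand each one back to this very script.
# $1 is the number of processes; $2 is the directory to scan.
# The literal 0 tells the child invocation to process a single item.
function process_list {
    local np="$1" dir="$2"
    find "$dir" -type f -name '*.txt' -print0 |
        xargs -0 -n 1 -P "$np" bash "$0" 0
}

# Dispatcher from the post: 0 means single item, anything else means list.
function process {
    default_np=4
    np="$1"; shift
    if [ -z "$np" ]; then np=$default_np; fi
    if [ "$np" == "0" ]
    then
        process_item "$@"
    else
        process_list "$np" "$@"
    fi
}
process "$@"
EOF

# Sample input, then a run with 2 parallel processes.
printf 'hello\n' > "$demo_dir/one.txt"
printf 'world\n' > "$demo_dir/two.txt"
bash "$demo_dir/upcase.sh" 2 "$demo_dir"
```

Calling the child through bash "$0" avoids depending on the script's executable bit. The order of the "done" lines may vary between runs, because the items are processed concurrently.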
Any suggestions are welcome!
Nikita Melnichenko.
Comments
GNU Parallel can deal with scripts given on the command line:
ls *.zip | parallel 'mkdir {.}; cd {.}; unzip ../{}'
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
Watch the intro videos for GNU Parallel to learn more: http://nd.gd/039