Making a Bash script parallel with xargs

September 16, 2010
Communication

Anyone who has tried to write parallel Bash scripts probably knows the -P parameter of the xargs utility. Combined with find, it lets you execute a specified command for each file and distribute those commands among several processes. This scheme is perfect when you need to, say, convert all files in a given directory.
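As a minimal sketch of this find and xargs combination, the snippet below compresses every .txt file in a scratch directory, running up to four gzip processes at once. The scratch directory, the file names and the choice of gzip as the "conversion" are assumptions made only for illustration; -print0, -0 and -P are supported by GNU and BSD tools but are not guaranteed by every xargs implementation.

```shell
#!/usr/bin/env bash
# Create a scratch directory with a few sample files (illustrative data only).
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
	echo "sample $i" > "$workdir/file$i.txt"
done

# -print0 / -0 keep file names with spaces or newlines intact;
# -n 1 passes one file to each gzip call; -P 4 runs up to 4 calls at once.
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 -n 1 -P 4 gzip

# Every .txt file should now have been replaced by a .txt.gz file.
ls "$workdir"
```

This works well as long as the per-file job is a single command such as gzip.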

When I started writing such scripts, an inconvenience emerged. The command given to xargs is usually more complicated than a single call of some utility with arguments. You often need to pass a set of commands, especially when progress indication and error reporting are required. In the end, you create a separate script for processing a single item in addition to the script that contains the xargs call.

I looked into how to combine these two scripts into one. A good solution would be to pass a Bash function to xargs, but that doesn't work, because xargs can only run external commands. Simple logic helps: if I can't have anything but my script, then the script itself should be passed to xargs. The first argument of the script is used to distinguish between processing a single item and processing a list of items. If it is greater than zero, it is the number of processes for parallel list handling; if it is equal to zero, a single item is processed.
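The problem with passing a function can be reproduced with a few lines. In this sketch, process_one_demo is a hypothetical function name picked so that it does not clash with any real command. Calling it from the shell works, while xargs fails because it looks for an external program of that name.

```shell
#!/usr/bin/env bash
# Hypothetical function, defined only inside this shell.
process_one_demo() { echo "processing $1"; }

# Calling the function directly works.
process_one_demo a.txt

# xargs executes external programs, so the function is invisible to it
# and the call fails with a non-zero exit status.
status=0
printf '%s\n' a.txt b.txt | xargs -n 1 process_one_demo 2>/dev/null || status=$?
echo "xargs exit status: $status"
```

A script file, unlike a function, is an external command, which is why handing the script itself to xargs gets around this limitation.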

If you want to use this method, it's often sufficient to insert this code into your script:

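function process
{
	# Number of parallel processes used when the first argument is missing or empty.
	default_np=4
	np="$1"; shift
	if [ -z "$np" ]; then np=$default_np; fi
	if [ "$np" == "0" ]
	then
		# np = 0: called back by xargs to process a single item.
		process_item "$@"
	else
		# np > 0: generate the list and process it with np parallel processes.
		process_list "$np" "$@"
	fi
}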

process "$@"

You should define two functions before that: process_item for processing a single file, and process_list for generating and processing the list of files. process_list receives all arguments of the script starting with the second one, and it should call xargs. Download the demo script to see a complete example.
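For readers without the demo script at hand, here is one possible way to fill in the two functions. It is a sketch under assumptions, not the author's demo: process_item only reports the size of a file as placeholder work, and process_list treats the second script argument as a directory to scan. The script is written to a temporary file so that it can call itself through "$0", and the scratch data exists only to make the example runnable when pasted into a terminal.

```shell
#!/usr/bin/env bash
# Write the demo script to a temporary file so that it can re-invoke itself.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/env bash

# Handle one file; reporting its size stands in for the real work.
process_item()
{
	local file="$1"
	if [ ! -r "$file" ]; then
		echo "error: cannot read $file" >&2
		return 1
	fi
	echo "$(basename "$file"): $(wc -c < "$file") bytes"
}

# Build the list of files and pass each one back to this script with np = 0.
process_list()
{
	local np="$1" dir="${2:-.}"
	find "$dir" -type f -print0 | xargs -0 -n 1 -P "$np" bash "$0" 0
}

function process
{
	default_np=4
	np="$1"; shift
	if [ -z "$np" ]; then np=$default_np; fi
	if [ "$np" == "0" ]
	then
		process_item "$@"
	else
		process_list "$np" "$@"
	fi
}

process "$@"
EOF

# Try it on a scratch directory with three small files, two processes at a time.
data=$(mktemp -d)
for name in a b c; do
	echo "content of $name" > "$data/$name.txt"
done
output=$(bash "$script" 2 "$data")
echo "$output"
```

The lines may appear in any order, because the files are processed in parallel. In a real script saved to disk, the heredoc wrapper is unnecessary: the functions go directly into the script, and an executable script can pass "$0" to xargs without the explicit bash.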

Any suggestions are welcome!

Nikita Melnichenko.

Comments

05.12.2011, 05:34

GNU Parallel can deal with scripts given on the command line:

ls *.zip | parallel 'mkdir {.}; cd {.}; unzip ../{}'


You can install GNU Parallel simply by:

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel

Watch the intro videos for GNU Parallel to learn more: http://nd.gd/039
