Tools are meant to maximize productivity and reduce repetition, and repetitive tasks are among the ones I detest most. Yesterday I was asked to do some SEO work that involved the Google Webmasters Disavow Tool. I was handed a spreadsheet of 6,000 URLs to process, and the full request was to disavow the domains entirely, not just the individual URLs.

Now that we have our scope, these are the solutions we have to build:

1) Process an entire text file list into a list of domains only (http://example.com/index.html => example.com)
2) Remove duplicate domains from the list
3) Prefix each domain with “domain:” to match Google’s format, in a newline-separated text file
4) Build this into a reusable tool for future use

First we are going to define our class, call it, and add a constructor and a couple of private variables for later use.
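As a rough idea of what that outline could look like, here is a minimal sketch. The class name, property names, and file names are all assumptions made for illustration, not the names used in the published script.

```php
<?php

// Hypothetical skeleton; the class, property, and file names are illustrative.
class Disavow
{
    /** @var string Path of the URL list to read (assumed name). */
    private $inputFile = 'urls.txt';

    /** @var string Path of the disavow file to write (assumed name). */
    private $outputFile = 'disavow.txt';

    /** @var string[] Unique domains collected so far. */
    private $domains = [];

    public function __construct()
    {
        // The processing steps get wired in here as they are written.
    }
}

// Calling the class is just a matter of instantiating it.
$disavow = new Disavow();
```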

Now that we have a rough outline and a few variables that can be used throughout the class, we need to start writing the methods that do the work. The first ones check whether any arguments were passed on the command line and print the help menu, either on request or by default when the user passes no arguments.

Special note: $argv[1] is accessed rather than $argv[0] because $argv[0] is the script itself. Technically, you are accessing the second argument passed to php.
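The argument check and help menu might be sketched like this. The function names and the help text are hypothetical, and the example uses a hard-coded argument list in place of the real $argv so it can be run anywhere.

```php
<?php

// Hypothetical helpers; the names and help text are illustrative.

/** Returns true only when the first script argument is "run". */
function shouldRun(array $argv): bool
{
    // $argv[0] is the script name, so the first real argument is $argv[1].
    return isset($argv[1]) && $argv[1] === 'run';
}

/** Builds the help menu shown when no valid argument is given. */
function helpText(string $script): string
{
    return "Usage: php {$script} run" . PHP_EOL
        . "  run   Process the URL list and write the disavow file" . PHP_EOL;
}

// Example dispatch with no arguments passed, so the help menu is printed.
$args = ['disavow.php'];
echo shouldRun($args) ? 'Running...' . PHP_EOL : helpText($args[0]);
```

Using isset() before reading $argv[1] avoids an undefined-offset notice when the script is started without arguments.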

Now that we accept a ‘run’ argument or otherwise fall back to the help menu, we need to make sure the script’s requirements are met. Luckily this is a small script, so we only need to know that the file exists and that we can access its contents. The next step is to read those contents.
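One way to do both checks and the read in a single place is sketched below. The function name is hypothetical, and throwing an exception on failure is an assumption made for this example; printing an error and exiting would work just as well.

```php
<?php

/**
 * Hypothetical helper: returns the non-empty lines of $path,
 * or throws if the file is missing or unreadable.
 */
function readUrlList(string $path): array
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Cannot read input file: {$path}");
    }

    // One array element per line, without trailing newlines or blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Failed to load input file: {$path}");
    }

    return $lines;
}
```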

Next we take each URL, parse it into an array, and check whether the [‘host’] key exists. If it does, the host is passed on to be added to the unique domains array.
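A minimal sketch of that parsing step, assuming PHP’s built-in parse_url() is what produces the array with the ‘host’ key. The function name is hypothetical, and lowercasing the host is an addition of this example so that differently cased copies of a domain compare equal later.

```php
<?php

/**
 * Hypothetical helper: returns the host part of each URL that has one.
 */
function extractHosts(array $urls): array
{
    $hosts = [];
    foreach ($urls as $url) {
        $parts = parse_url(trim($url));
        // parse_url() returns false for seriously malformed URLs and
        // omits 'host' when there is no scheme (e.g. "example.com/page").
        if (is_array($parts) && isset($parts['host'])) {
            // Lowercasing is an assumption of this sketch, not the post.
            $hosts[] = strtolower($parts['host']);
        }
    }
    return $hosts;
}

print_r(extractHosts(['http://example.com/index.html', 'example.net/page']));
```

Note that entries without a scheme are skipped here, so a list that mixes bare domains with full URLs would need extra handling.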

This function verifies that each domain is unique and, if it is, pushes it onto the domains array.
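The uniqueness check could be as small as the sketch below. The function name and the pass-by-reference array are assumptions; inside the class this would more likely operate on a private property.

```php
<?php

/**
 * Hypothetical helper: appends $domain to $domains only if it is new.
 * Returns true when the domain was added.
 */
function addUniqueDomain(array &$domains, string $domain): bool
{
    // Strict comparison avoids loose matches between unrelated values.
    if (in_array($domain, $domains, true)) {
        return false;
    }
    $domains[] = $domain;
    return true;
}

$domains = [];
foreach (['example.com', 'example.org', 'example.com'] as $host) {
    addUniqueDomain($domains, $host);
}
```

in_array() scans the whole array on every call, which is fine for a few thousand URLs. For much larger lists, using the domain as an array key or calling array_unique() once at the end would likely be faster.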

Finally we need a newline-separated string of all our domains, written to a file. This is the file that can be uploaded to Google, since it is already in the correct format when written. You’ll notice each line is prepended with ‘domain:’, as this is the required syntax for the Google Disavow Tool.
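Building the string and writing it out might look like this. Both function names are hypothetical, and ending the file with a trailing newline is a choice made for this sketch.

```php
<?php

/** Hypothetical helper: builds the newline-separated disavow body. */
function formatDisavow(array $domains): string
{
    $lines = [];
    foreach ($domains as $domain) {
        $lines[] = 'domain:' . $domain;
    }
    return implode("\n", $lines) . "\n";
}

/** Writes the body to $path and returns the number of bytes written. */
function writeDisavowFile(string $path, array $domains): int
{
    $bytes = file_put_contents($path, formatDisavow($domains));
    if ($bytes === false) {
        throw new RuntimeException("Could not write output file: {$path}");
    }
    return $bytes;
}
```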

With the hard work done, I just need to print some success messages to the user and update our construct function to call the methods in the correct order. Our final output will just show the number of URLs processed, the number of unique domains, and the time taken to run the script.
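The summary output could be produced along these lines. The function name, the wording of the messages, and the counts in the usage example are made up for illustration.

```php
<?php

/** Hypothetical helper: formats the summary printed at the end of a run. */
function formatStats(int $urlCount, int $domainCount, float $seconds): string
{
    return sprintf(
        "URLs processed: %d\nUnique domains: %d\nTime taken: %.4f seconds\n",
        $urlCount,
        $domainCount,
        $seconds
    );
}

$start = microtime(true);
// ... the processing steps would run here ...
echo formatStats(3, 2, microtime(true) - $start);
```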

In the updated __construct() function, the $start variable tracks the starting time so we can measure how long the script runs, as seen in the showStats() method.
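To show how the pieces fit together, here is a compact end-to-end sketch. Only $start, __construct(), and showStats() are names taken from this post; the class name, the other method names, and the constructor parameters are assumptions, and the published script may be organized differently.

```php
<?php

// Hypothetical end-to-end sketch, not the published script.
class DisavowSketch
{
    private $start;
    private $urls = [];
    private $domains = [];

    public function __construct(string $inputFile, string $outputFile)
    {
        $this->start = microtime(true);   // start the clock first
        $this->loadUrls($inputFile);      // 1. read the URL list
        $this->collectDomains();          // 2. parse hosts, keep unique ones
        $this->writeFile($outputFile);    // 3. write the "domain:" lines
        $this->showStats();               // 4. report counts and elapsed time
    }

    private function loadUrls(string $path): void
    {
        $lines = is_readable($path)
            ? file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
            : false;
        if ($lines === false) {
            throw new RuntimeException("Cannot read {$path}");
        }
        $this->urls = $lines;
    }

    private function collectDomains(): void
    {
        foreach ($this->urls as $url) {
            $host = parse_url(trim($url), PHP_URL_HOST);
            if (is_string($host) && $host !== ''
                && !in_array($host, $this->domains, true)) {
                $this->domains[] = $host;
            }
        }
    }

    private function writeFile(string $path): void
    {
        $body = '';
        foreach ($this->domains as $domain) {
            $body .= 'domain:' . $domain . "\n";
        }
        file_put_contents($path, $body);
    }

    private function showStats(): void
    {
        printf(
            "URLs: %d, unique domains: %d, time: %.4f seconds\n",
            count($this->urls),
            count($this->domains),
            microtime(true) - $this->start
        );
    }
}
```

With this layout, running the tool is a single line such as `new DisavowSketch('urls.txt', 'disavow.txt');`, where both file names are placeholders.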

That’s it. The entire script/project is MIT licensed and can be found as Desavouer on the Gunn/Jerkens GitHub.