I recently worked on a project that required extracting bug details from a Bugzilla database; we were attempting to reproduce the results of an academic paper whose authors tried to establish a link between object-oriented software metrics and program bugginess. In that study, the bugginess of a class was determined from bug data pulled out of a Bugzilla database. The original authors obtained a copy of the entire bug database and ran queries against it directly. We could not get a copy of the entire database, but fortunately it is now available on-line. We started by using the Bugzilla web search interface to get a list of bugs for the time period covered in the original research. We then fed that list of bug IDs into the script below to get the details we needed: which files, or classes, were associated with each bug.
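The input file is nothing fancy: just one bug ID per line, saved as something like bug_ids-one_per_line.txt. For example (the IDs here are placeholders, not bugs from the actual study):

123456
234567
345678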
This demonstrates the power of Unix shell scripting, which is one of the many reasons I stay away from the Windows environment; commands like awk, grep, sed, and tr make it possible to perform fairly sophisticated data manipulations quickly.
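To give a sense of the approach before diving into the full script, here is the basic chain it leans on, applied to a made-up JSON fragment (the field values and URL are invented purely for illustration):

echo '{"id":12345,"ref":"https://example.org/attachment/67890","size":42}' | \
	tr "," "\n" | \
	grep '"ref"' | \
	awk -F: '{print $2":"$3}' | \
	tr -d '"'
# prints: https://example.org/attachment/67890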
#!/bin/bash
#
# bugdetails_from_ids.sh
# Christopher Stoll, 2013-11-26
#
# Get Bugzilla bug patches given the bug IDs
#

if [ ! -e "$1" ]; then
	echo -e "Usage:\n  $0 bug_ids-one_per_line.txt"
	exit 1
fi

echoerr() { echo "$@" 1>&2; }

# read bug IDs into an array, one ID per line
declare -a bugids=(`cat "$1"`)

((i=0))
while [ "${bugids[i]}" ]; do
	bugid="${bugids[i]}"
	if [ ! -z "$bugid" ]; then
		#echoerr "~~~ $bugid"

		bugdetail=""
		# get the bug details page
		bugdetail=`curl -fLs --max-time 60 \
			https://api-dev.bugzilla.mozilla.org/latest/bug/$bugid/attachment`

		if [ ! -z "$bugdetail" ]; then
			# They give us some nice JSON,
			# which we are going to just hack apart:
			#   convert commas to newlines,
			#   find lines which contain "ref" (links to the attachments),
			#   grab the http and the rest of the URL,
			#   remove quotation marks
			atchids=()
			atchids=(`echo "$bugdetail" | \
				tr "," "\n" | \
				grep '"ref"' | \
				awk -F: '{print $2":"$3}' | \
				tr -d '"'`)

			((j=0))
			while [[ ${atchids[j]} ]]; do
				atchid="${atchids[j]}"
				if [ ! -z "$atchid" ]; then
					#echoerr "~~~ $bugid ~~~ $atchid"

					atchdetail=""
					# get the attachment details page
					atchdetail=`curl -fLs --max-time 60 \
						"$atchid?attachmentdata=1"`

					if [ ! -z "$atchdetail" ]; then
						# They give us some nice JSON, again we hack it apart:
						#   convert commas to newlines,
						#   find lines which contain "data" (this is the patch data),
						#   grab the actual data,
						#   remove quotation marks,
						#   decode the data from base64
						patchdata=`echo "$atchdetail" | \
							tr "," "\n" | \
							grep '"data"' | \
							awk -F: '{print $2}' | \
							tr -d '"' | \
							base64 --decode`

						# keep only the +++ lines of the patch,
						# which name the files the patch touches
						atchfile=`echo "$patchdata" | grep '+++'`
						# prefix each file with the bug ID it belongs to
						echo "$atchfile" | sed -e 's/^/'$bugid'/'
					fi
				fi
				sleep 1
				((j=j+1))
			done
		fi
	fi
	sleep 1
	((i=i+1))
done
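With the IDs saved to a file, running the script and capturing its output is straightforward; each output line is a bug ID immediately followed by the +++ line naming a patched file (the path below is made up for illustration):

./bugdetails_from_ids.sh bug_ids-one_per_line.txt > bug_files.txt
# example line in bug_files.txt:
# 123456+++ b/some/path/SomeClass.cpp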