I recently worked on a project that required extracting bug details from a Bugzilla database; we were attempting to reproduce the results of an academic paper whose authors tried to establish a link between object-oriented software metrics and program bugginess. In that study, the bugginess of a class was determined from bug data pulled out of a Bugzilla database. The original authors obtained a copy of the entire bug database and ran queries against it directly. We could not get a copy of the entire database, but fortunately it is now available on-line. We started by using the Bugzilla web search interface to get a list of bugs for the time period covered in the original research. We then fed that list of bug IDs into the script below to get the details we needed: which files, or classes, were associated with each bug.
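The input file is nothing fancy: just one bug ID per line, saved as something like bug_ids-one_per_line.txt. For example (the IDs here are placeholders, not bugs from the actual study):

123456
234567
345678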
This demonstrates the power of Unix shell scripting, which is one of the many reasons I stay away from the Windows environment; commands like awk, grep, sed, and tr make it possible to perform fairly sophisticated data manipulations quickly.
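To give a sense of the approach before diving into the full script, here is the basic chain it leans on, applied to a made-up JSON fragment (the field values and URL are invented purely for illustration):

echo '{"id":12345,"ref":"https://example.org/attachment/67890","size":42}' | \
	tr "," "\n" | \
	grep '"ref"' | \
	awk -F: '{print $2":"$3}' | \
	tr -d '"'
# prints: https://example.org/attachment/67890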
#!/bin/bash
#
# bugdetails_from_ids.sh
# Christopher Stoll, 2013-11-26
#
# Get Bugzilla bug patches given the bug IDs
#

if [ ! -e "$1" ]; then
	echo -e "Usage:\n  $0 bug_ids-one_per_line.txt"
	exit 1
fi

echoerr() { echo "$@" 1>&2; }

# read bug IDs into an array, one ID per line
declare -a bugids=(`cat "$1"`)

((i=0))
while [ "${bugids[i]}" ]; do
	bugid="${bugids[i]}"
	if [ ! -z "$bugid" ]; then
		#echoerr "~~~ $bugid"

		bugdetail=""
		# get the bug details page
		bugdetail=`curl -fLs --max-time 60 \
			https://api-dev.bugzilla.mozilla.org/latest/bug/$bugid/attachment`

		if [ ! -z "$bugdetail" ]; then
			# They give us some nice JSON,
			# which we are going to just hack apart:
			#   convert commas to newlines,
			#   find lines which contain "ref" (links to the attachments),
			#   grab the http and the rest of the URL,
			#   remove quotation marks
			atchids=()
			atchids=(`echo "$bugdetail" | \
				tr "," "\n" | \
				grep '"ref"' | \
				awk -F: '{print $2":"$3}' | \
				tr -d '"'`)

			((j=0))
			while [[ ${atchids[j]} ]]; do
				atchid="${atchids[j]}"
				if [ ! -z "$atchid" ]; then
					#echoerr "~~~ $bugid ~~~ $atchid"

					atchdetail=""
					# get the attachment details page
					atchdetail=`curl -fLs --max-time 60 \
						"$atchid?attachmentdata=1"`

					if [ ! -z "$atchdetail" ]; then
						# They give us some nice JSON, again we hack it apart:
						#   convert commas to newlines,
						#   find lines which contain "data" (this is the patch data),
						#   grab the actual data,
						#   remove quotation marks,
						#   decode the data from base64
						patchdata=`echo "$atchdetail" | \
							tr "," "\n" | \
							grep '"data"' | \
							awk -F: '{print $2}' | \
							tr -d '"' | \
							base64 --decode`

						# keep only the +++ lines of the patch,
						# which name the files the patch touches
						atchfile=`echo "$patchdata" | grep '+++'`
						# prefix each file with the bug ID it belongs to
						echo "$atchfile" | sed -e 's/^/'$bugid'/'
					fi
				fi
				sleep 1
				((j=j+1))
			done
		fi
	fi
	sleep 1
	((i=i+1))
done
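With the IDs saved to a file, running the script and capturing its output is straightforward; each output line is a bug ID immediately followed by the +++ line naming a patched file (the path below is made up for illustration):

./bugdetails_from_ids.sh bug_ids-one_per_line.txt > bug_files.txt
# example line in bug_files.txt:
# 123456+++ b/some/path/SomeClass.cpp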