A couple of days ago I needed to automate file downloading from a service, a very trivial task in every programming language (or using wget).
The small complication was the web-based authentication mechanism (userid/password) needed to access the files.
Determining which files to download and their usage (unzipping and picking files) required some specific business logic, nothing really complicated but very annoying.
After a while I realized this job could be done using JavaScript and XPCOM, and here I would like to share the solution based on nsIChannel.
What it does
- the service uses a userid and password to log in
- log in to the service with an HTTP POST request that simulates an HTML <form/> submission
- store the cookies sent by the server. They contain the login credentials; cookie support is a prerequisite in our scenario
- reuse the login data to download protected resources
Example usage
Suppose you want to automate the download of extensions stored in AMO's sandbox. You must log in first, so this use case is perfect for us.
Some may consider this approach ugly, since web services or the Remora API are better, but here I only want to demonstrate how to use nsIChannel.
Let’s start…
You can write the JavaScript code shown below to download my extension RichFeedButton (dropped in the sandbox a year ago)
var amoUsername = "dafi@localhost";
var amoPassword = "my_secret_code";
downloadProtectedResource(
    "https://addons.mozilla.org/it/firefox/users/login",
    // note the "=" after each field name
    "data[Login][email]=" + amoUsername + "&data[Login][password]=" + amoPassword,
    "https://addons.mozilla.org/en-US/firefox/downloads/file/33926/richfeedbutton-0.0.21-fx.xpi",
    "/tmp/richfeedbutton-0.0.21-fx.xpi");
where our downloadProtectedResource function signature is shown below
function downloadProtectedResource(loginUrl, postData, resourceUrl, destPathName) { ... }
Nothing special: we simply need to know the HTML input names used for the userid and password (i.e. data[Login][email] and data[Login][password]) and pass them in the postData argument.
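Since the userid and password are concatenated into postData as plain strings, characters such as & or = inside them would corrupt the request body. A minimal sketch of one way to avoid this is shown here; buildPostData is a hypothetical helper and is not part of the original code. It relies only on the standard encodeURIComponent function, which encodes a space as %20 rather than the + a browser form would send, something servers normally accept as well.

```javascript
// Hypothetical helper (not in the original code): builds an
// application/x-www-form-urlencoded body from an object of
// field names and values, escaping both sides of each pair.
function buildPostData(fields) {
    var pairs = [];
    for (var name in fields) {
        if (Object.prototype.hasOwnProperty.call(fields, name)) {
            pairs.push(encodeURIComponent(name) + "=" +
                       encodeURIComponent(fields[name]));
        }
    }
    return pairs.join("&");
}

// Example with a made-up password containing reserved characters
var examplePostData = buildPostData({
    "data[Login][email]": "dafi@localhost",
    "data[Login][password]": "p&ss=word"
});
```

The resulting string can be passed as the postData argument in place of the hand-built one.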
The downloadProtectedResource function interacts with nsIChannel (nsIHttpChannel) and other XPCOM objects
function downloadProtectedResource(loginUrl, postData, resourceUrl, destPathName) {
    var httpChannel = makeHttpChannel(loginUrl); // create an nsIHttpChannel object
    var stream = makeStringStream(postData); // create an nsIStringInputStream object
    setChannelPostData(httpChannel, stream); // attach the POST data using nsIUploadChannel
    // the downloader saves the data to disk
    var downloader = new Downloader(resourceUrl, destPathName);
    // logs in, then passes the cookies to the downloader object
    var cookieListener = new CookieRetrieverListener(downloader);
    // start authentication and download
    httpChannel.asyncOpen(cookieListener, null);
}
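The three helpers called above (makeHttpChannel, makeStringStream and setChannelPostData) are not listed in this post. What follows is only a sketch of how they might look, not the actual helpers from the complete code. It assumes the legacy three-argument nsIIOService.newChannel(spec, charset, baseURI) form and the form-urlencoded content type; both are my assumptions.

```javascript
// Hypothetical implementations of the helpers used by
// downloadProtectedResource; they run only inside a Gecko/XPCOM host.

function makeHttpChannel(url) {
    var ioService = Components.classes["@mozilla.org/network/io-service;1"]
        .getService(Components.interfaces.nsIIOService);
    // Assumed legacy signature: spec, origin charset, base URI
    var channel = ioService.newChannel(url, null, null);
    return channel.QueryInterface(Components.interfaces.nsIHttpChannel);
}

function makeStringStream(data) {
    var stream = Components.classes["@mozilla.org/io/string-input-stream;1"]
        .createInstance(Components.interfaces.nsIStringInputStream);
    stream.setData(data, data.length);
    return stream;
}

function setChannelPostData(httpChannel, stream) {
    var uploadChannel = httpChannel
        .QueryInterface(Components.interfaces.nsIUploadChannel);
    // -1 lets the channel work out the content length from the stream
    uploadChannel.setUploadStream(stream,
        "application/x-www-form-urlencoded", -1);
    // setUploadStream switches the method to PUT, so POST is set afterwards
    httpChannel.requestMethod = "POST";
}
```

The order of the last two statements matters: setting requestMethod before setUploadStream would leave the request as a PUT.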
The Downloader and CookieRetrieverListener objects implement the nsIStreamListener interface.
After obtaining the cookies, the cookieListener aborts the request because we don't need the rest of the server output, then it calls the downloader.
function CookieRetrieverListener(downloader) {
    this.downloader = downloader;
    this.cookies = "";
}
CookieRetrieverListener.prototype = {
    onStartRequest: function(request, ctx) {
        var channel = request.QueryInterface(Components.interfaces.nsIHttpChannel);
        this.cookies = channel.getRequestHeader("Cookie");
        // no more data is needed, so abort the request
        throw Components.results.NS_ERROR_ABORT;
    },
    onDataAvailable: function(request, context, inputStream, offset, count) {
        // intentionally empty: the response body is ignored
    },
    onStopRequest: function(request, ctx, status) {
        // called after the abort too: hand the cookies over and start the download
        this.downloader.cookies = this.cookies;
        this.downloader.start();
    }
}
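The Downloader object is not listed in this post. As a rough sketch only, assuming it sends the stored cookies back in a Cookie request header and buffers the response before writing it to disk, it might look like the code below. Everything in it is hypothetical and may differ from the complete code; the chunks property, the file flags and the legacy newChannel call are my assumptions.

```javascript
// Hypothetical sketch of Downloader; runs only inside a Gecko/XPCOM host.
function Downloader(resourceUrl, destPathName) {
    this.resourceUrl = resourceUrl;
    this.destPathName = destPathName;
    this.cookies = "";   // filled in by CookieRetrieverListener
    this.chunks = [];    // binary strings read from the response
}

Downloader.prototype = {
    start: function() {
        var ioService = Components.classes["@mozilla.org/network/io-service;1"]
            .getService(Components.interfaces.nsIIOService);
        var channel = ioService.newChannel(this.resourceUrl, null, null)
            .QueryInterface(Components.interfaces.nsIHttpChannel);
        // Send back the cookies obtained at login
        channel.setRequestHeader("Cookie", this.cookies, false);
        channel.asyncOpen(this, null);
    },

    onStartRequest: function(request, ctx) {
        this.chunks = [];
    },

    onDataAvailable: function(request, ctx, inputStream, offset, count) {
        var bis = Components.classes["@mozilla.org/binaryinputstream;1"]
            .createInstance(Components.interfaces.nsIBinaryInputStream);
        bis.setInputStream(inputStream);
        this.chunks.push(bis.readBytes(count));
    },

    onStopRequest: function(request, ctx, status) {
        if (!Components.isSuccessCode(status)) {
            return; // download failed, nothing to save
        }
        var file = Components.classes["@mozilla.org/file/local;1"]
            .createInstance(Components.interfaces.nsILocalFile);
        file.initWithPath(this.destPathName);
        var fos = Components.classes["@mozilla.org/network/file-output-stream;1"]
            .createInstance(Components.interfaces.nsIFileOutputStream);
        // 0x02 write only, 0x08 create, 0x20 truncate; permissions rw-r--r--
        fos.init(file, 0x02 | 0x08 | 0x20, parseInt("644", 8), 0);
        var bos = Components.classes["@mozilla.org/binaryoutputstream;1"]
            .createInstance(Components.interfaces.nsIBinaryOutputStream);
        bos.setOutputStream(fos);
        var data = this.chunks.join("");
        bos.writeBytes(data, data.length);
        bos.close();
    }
};
```

Buffering the whole response in memory keeps the sketch short; for large files, writing each chunk as it arrives in onDataAvailable would be the more economical choice.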
Another use for this code is downloading localizations from BabelZilla, as shown below.
BabelZilla requires many parameters on query string 😕
var bzUsername = "dafi_duck";
var bzPassword = "my_secret_code";
var bzItemId = "88";
var bzExtId = "4432";
downloadProtectedResource(
    "http://www.babelzilla.org/index.php",
    "op2=login&lang=english&message=0"
        + "&option=ipblogin&task=login&0b14737c5ade1f7697a8f81b33b0bacf=1"
        + "&option=com_frontpage&Itemid=1"
        + "&username=" + bzUsername
        + "&passwd=" + bzPassword,
    "http://www.babelzilla.org/index.php?option=com_wts&type=downloadtar"
        + "&Itemid=" + bzItemId
        + "&extension=" + bzExtId,
    "/tmp/vsw.tar.gz");
nsIChannel.asyncOpen
Accessing the cookies received from the server requires an nsIStreamListener, which is available only with asynchronous open calls.
The download must therefore start only once the cookies have definitely been retrieved. This is achieved using nsIRequestObserver.onStopRequest; any better idea is much appreciated.
Complete code
The complete code contains a few helper functions (reading a binary stream, saving a file) and is available on SVN. It is ready to be executed as a Komodo macro once you set the userid and password.