Bad signature formatting in Outlook

Do you have trouble with your signature having the wrong format in Outlook? Mine had the "paragraph" style instead of "normal", so it looked different from the rest of my messages.

Well, I figured out that you can edit the HTML in the files where the signatures are stored. Just take out the paragraph tags and everything works much better!

I found the files in
C:\Documents and Settings\<username>\Application Data\Microsoft\Signatures
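
If you have several signature files, the cleanup can be scripted instead of hand-edited. Here is a minimal Python sketch, assuming the signatures are Word-generated HTML where demoting the <p> tags to <div> is the fix you want (check a backup copy of your own .htm files first):

import glob
import os
import re

# %APPDATA% resolves to the Application Data folder mentioned above.
sig_dir = os.path.expandvars(r'%APPDATA%\Microsoft\Signatures')

for path in glob.glob(os.path.join(sig_dir, '*.htm')):
    with open(path, encoding='cp1252') as f:  # Word HTML is usually cp1252
        html = f.read()
    # Swap paragraph tags for plain divs so the text picks up the
    # "normal" style instead of "paragraph".
    html = re.sub(r'<p\b[^>]*>', '<div>', html)
    html = html.replace('</p>', '</div>')
    with open(path, 'w', encoding='cp1252') as f:
        f.write(html)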

Example Application

I recently wanted some data to help select program committee members for a conference. I decided to estimate which authors had had the most impact with papers published at the conference in the past. After some experimentation, here is what I wanted to do:

Use Google to estimate how many pages refer to each paper published at the conference, then analyze this data to find the most cited authors.

Here's what I did:

1) First I needed a list of papers. I initially thought of scraping HTML from DBLP, but then I noticed that DBLP contains bibtex items, so I used WinHTTrack Website Copier to download bibtex files from DBLP for that conference. This was not too hard, but I did have to play with the configuration a fair bit to get it to work, because the bibtex files are stored on a different server. (A scripted alternative is sketched below.)
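
Here is that scripted alternative: a Python sketch that fetches the pages directly, assuming you already have the list of bibtex-page URLs in a file (the file and output names are made up):

import urllib.request

# Hypothetical input: one DBLP bibtex-page URL per line, collected
# however you like (e.g., from the conference's table of contents).
with open('bibtex-urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open(f'page{i:04d}.html', 'wb') as out:
        out.write(data)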

2) I had to get the bibtex out of the HTML pages. I decided to write an XSLT script to do this, but had a problem because HTML is not valid XML. After a few false starts, I was able to download, compile and run the .NET Html Agility Pack to clean up the HTML. Then I concatenated all the files and ran this XSLT script over them:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="ascii" method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//pre">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
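
If you want to reproduce this step without a .NET XSLT processor, lxml can apply the same stylesheet from Python (the file names here are made up):

from lxml import etree

# The stylesheet above, saved to disk, applied to the tidied,
# concatenated pages from the Html Agility Pack step.
transform = etree.XSLT(etree.parse('extract-pre.xsl'))
result = transform(etree.parse('all-pages.xml'))

with open('all.bib', 'w') as f:
    f.write(str(result))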

3) I then had a bunch of bibtex files and needed to extract the titles and authors, so I decided to convert the bibtex to XML. After a few false starts, I found bib2XML and converted my files to XML. However, I later found that bib2XML does not properly translate special characters (accented characters, superscripts, trademark symbols). Rather than fix the code, I fixed the files by hand using a few regular expression substitutions.
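
The substitutions themselves depend on what bib2XML garbles in your particular files; the ones below are only examples of the kind of LaTeX escapes to clean up:

import re

# Example fixups for LaTeX-encoded special characters; extend the
# table with whatever your own files actually contain.
FIXUPS = [
    (r'\{\\"a\}', 'ä'), (r'\{\\"o\}', 'ö'), (r'\{\\"u\}', 'ü'),
    (r"\{\\'e\}", 'é'),
    (r'\\&', '&'),
]

def clean(text):
    for pattern, replacement in FIXUPS:
        text = re.sub(pattern, replacement, text)
    return text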

4) Then I wrote a .NET application to drive the Google web services API. This was fairly straightforward, except that the Google server for registering to use the API was down for a few days. Also, when I passed null for a default parameter, the call failed without a useful explanation; I tried using empty strings instead and it worked.
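
That SOAP search API is long gone, and my code was .NET, but the empty-strings point looked roughly like the Python sketch below (using SOAPpy, the common Python SOAP client at the time). I am reconstructing the parameter list from memory of the WSDL, so treat it as an assumption:

from SOAPpy import WSDL

license_key = 'YOUR-KEY-HERE'  # mailed to you by the registration server

proxy = WSDL.Proxy('http://api.google.com/GoogleSearch.wsdl')
result = proxy.doGoogleSearch(
    license_key,
    '"Some Paper Title"',  # the query: a quoted paper title
    0, 1,                  # start index, max results; only the count matters
    False,                 # filter
    '',                    # restrict -- empty string, not null!
    False,                 # safeSearch
    '',                    # lr (language restrict) -- empty string again
    'latin1', 'latin1')    # ie, oe encodings
hits = result.estimatedTotalResultsCount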

5) My program created a tab-delimited file (TSV) with a line for each author of a paper, which I then loaded into Excel for analysis. I used the data analysis wizard to count and sum the number of hits for publications. But now I want to do some more sophisticated analysis.
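
The wizard's count-and-sum step is also easy to reproduce in a script, as a starting point for that more sophisticated analysis. A sketch, assuming the TSV columns are author, title, hits (the layout here is an assumption):

import csv
from collections import Counter

papers = Counter()  # papers per author
hits = Counter()    # total estimated hits per author

with open('results.tsv') as f:
    for author, title, n in csv.reader(f, delimiter='\t'):
        papers[author] += 1
        hits[author] += int(n)

# Top twenty authors by total hits, with their paper counts.
for author, total in hits.most_common(20):
    print(f'{author}\t{papers[author]}\t{total}')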

You might say this is overkill for the purpose I had in mind (or even that it is irrelevant). But I sometimes try to automate processes like this just to find out how hard it is. This one was pretty difficult.

Batched Futures

Here is an older paper that is a real gem:

Reducing cross domain call overhead using batched futures
Phillip Bogle and Barbara Liskov
OOPSLA '94
http://portal.acm.org/citation.cfm?id=191133

Techniques that reduce the cost of latency are becoming increasingly important. This approach is quite simple, but it doesn't eliminate all the latency from a program: it can only delay sends based on the history of calls; it cannot base the call pattern on calls that have not been made yet. Some more global analysis is needed for that.
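
To make the mechanism concrete, here is a toy Python sketch of the idea (nothing like the paper's actual implementation, and the run_batch server interface is invented): calls return futures immediately, and the first time anyone forces a value, everything queued so far goes to the server in one round trip.

class Future:
    """Placeholder for the result of a call that hasn't been sent yet."""
    def __init__(self, batch):
        self._batch = batch
        self._value = None
        self._ready = False

    def value(self):
        # Forcing any future flushes everything queued so far, since
        # later calls in the batch may depend on earlier results.
        if not self._ready:
            self._batch.flush()
        return self._value


class BatchingProxy:
    """Queues cross-domain calls and ships them as a single batch."""
    def __init__(self, server):
        self._server = server
        self._pending = []  # list of (operation, args, future)

    def call(self, op, *args):
        f = Future(self)
        self._pending.append((op, args, f))
        return f  # the caller keeps running; no round trip yet

    def flush(self):
        calls, self._pending = self._pending, []
        # One round trip for N calls instead of N round trips.
        results = self._server.run_batch([(op, args) for op, args, _ in calls])
        for (_, _, f), r in zip(calls, results):
            f._value, f._ready = r, True

# Usage, given some server object that accepts batches:
#   proxy = BatchingProxy(server)
#   a = proxy.call('lookup', 1)
#   b = proxy.call('lookup', 2)  # still no round trip
#   print(a.value())             # one round trip sends both calls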

Why not start with a nit?

Have you ever noticed that online maps usually don't point to the right house or business on a street? I was looking at one today and noticed that Google Maps pointed to the wrong house. The addresses on my side of the street go from 1 to 13, and my address is 11, so I am near the north end of the block. But Google pointed to the south end of the block. I suspect that they assume the block goes from 1 to 99 and place the pointer 11% of the way up the block for an address that ends in "11". They should look at the highest number actually on the block and divide by that instead. Just a nit, but Google could do it...
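
The arithmetic behind my guess, spelled out:

address, highest_on_block = 11, 13
assumed_max = 99  # my guess at what the geocoder assumes

print(address / assumed_max)       # ~0.11 up the block: the south end, where the pin was
print(address / highest_on_block)  # ~0.85 up the block: where the house actually is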