<?xml version='1.0' encoding='utf-8' ?>
<!--  If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/  -->
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:media='http://search.yahoo.com/mrss/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>Daniel Einspanjer&apos;s journal</title>
  <link>http://daniele.livejournal.com/</link>
  <description>Daniel Einspanjer&apos;s journal - LiveJournal.com</description>
  <lastBuildDate>Tue, 17 Nov 2009 06:26:21 GMT</lastBuildDate>
  <generator>LiveJournal / LiveJournal.com</generator>
  <lj:journal>daniele</lj:journal>
  <lj:journalid>454686</lj:journalid>
  <lj:journaltype>personal</lj:journaltype>
  <atom10:link rel='hub' href='http://pubsubhubbub.appspot.com/' />
  <image>
    <url>http://l-userpic.livejournal.com/18541170/454686</url>
    <title>Daniel Einspanjer&apos;s journal</title>
    <link>http://daniele.livejournal.com/</link>
    <width>74</width>
    <height>100</height>
  </image>

<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/78409.html</guid>
  <pubDate>Tue, 17 Nov 2009 06:26:21 GMT</pubDate>
  <title>Performance of Rhino JS engine and Janino library in Kettle</title>
  <link>http://daniele.livejournal.com/78409.html</link>
  <description>My friend Roland Bouman made an interesting &lt;a href=&quot;http://rpbouman.blogspot.com/2009/11/pentaho-data-integration-javascript.html&quot;&gt;blog post regarding the performance of a bit of JavaScript for Kettle&lt;/a&gt; that he saw on a different blog.&lt;br /&gt;&lt;br /&gt;Given the large amounts of data that I am shoving through Kettle every day, I tend to be extremely concerned about performance.  Even a small inefficiency can lead to dramatic slowdowns.  Hence, when I saw his post, I got to thinking about how I would approach the problem if it were within the realm of the large data sets I work with and hence required extreme optimization.&lt;br /&gt;&lt;br /&gt;I didn&apos;t have a lot of spare time to dedicate to this experiment, so I opted for a screen-cast instead of a nicely formatted blog post.  That said, I think there is a certain benefit in being able to see the work flow of someone who is very comfortable with Kettle.&lt;br /&gt;&lt;br /&gt;The screen-cast is currently in Apple QuickTime format. Bleh.  I need to get a new Ogg Theora transcoder because the one that I tried to use last time is not happy with me and I didn&apos;t have time to fiddle with it.&lt;br /&gt;&lt;br /&gt;So, if you use Kettle and are interested in these things, here is the screen-cast.  Be warned it is 30 minutes long and probably not extremely exciting to anyone outside of the ETL field.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://bit.ly/PDI_example&quot; target=&quot;_blank&quot; title=&quot;http://people.mozilla.com/~deinspanjer/KettleJSPerformance.mov&quot;&gt;Kettle string transformation optimization walk-through&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If you are familiar with developing plug-ins for Kettle and you&apos;d like to take a look at the User Defined Java Class plug-in I demonstrated at the end of the screen-cast, you can pick it up from the Pentaho SVN plugins repository. 
Just wear gloves because it has rough edges.&lt;br /&gt;&lt;a title=&quot;svn://source.pentaho.org/svnkettleroot/plugins/UserDefinedJavaClass/branches/3.2x&quot; href=&quot;http://bit.ly/UDJC_SVN&quot;&gt;User Defined Java Class plug-in&lt;/a&gt;</description>
  <comments>http://daniele.livejournal.com/78409.html</comments>
  <category>kettle</category>
  <lj:mood>tired</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/78239.html</guid>
  <pubDate>Thu, 24 Sep 2009 20:10:50 GMT</pubDate>
  <title>Advice regarding using Travelex Cash Passport cards for travel</title>
  <link>http://daniele.livejournal.com/78239.html</link>
  <description>If you are traveling to Europe and considering getting one of these debit cards to make life easier for you while there, my advice is &lt;strong&gt;don&apos;t&lt;/strong&gt;!  Figure out which of your credit cards charges the lowest fees for international usage and just use it.&lt;br /&gt;&lt;br /&gt;&lt;a name=&quot;cutid1&quot;&gt;&lt;/a&gt;&lt;br /&gt;First, a general complaint:&amp;nbsp; interacting with Travelex is a hassle. Their phone menus are a maze of twisty little passages, all alike.&amp;nbsp; When you finally manage to reach a person, you&apos;ll have to give them all the exact same information you entered into the phone menu.&amp;nbsp; Calling them when you are on your trip means calling a long-distance number, plus additional charges if you must use your cell phone or hotel phone or pay phone.&amp;nbsp; The customer service agents are not rude though.&amp;nbsp; I&apos;ll give them that.&lt;br /&gt;&lt;br /&gt;Most importantly, the card does not work everywhere that MasterCard is accepted.  If you give it to a merchant whose terminal is not compatible with it, the credit card processing machine might decline the card but the money can still be withheld from you for seven days.  If the merchant runs the card through three or four times, each attempt will withhold the funds again.  A nice &amp;euro;60 meal can turn into a &amp;euro;180 disaster.&lt;br /&gt;&lt;br /&gt;If you get into this situation, don&apos;t expect help from anyone.  The merchant can&apos;t do anything because it was declined on his side.  When you call Travelex (paying international long distance or international roaming fees), they will tell you that it isn&apos;t their fault and that the merchant did something wrong. Ignore the fact that you had to turn around and use a different MasterCard with the merchant and that one went through just fine.  Furthermore, Travelex will tell you that your only option is to wait seven business days and see if the hold disappears.  
If it does, move on to the next challenge.  If it doesn&apos;t, then expect the following additional hassle:&lt;ol&gt;&lt;li&gt;request a dispute form from Travelex&lt;/li&gt;&lt;li&gt;wait for it to be delivered by mail&lt;/li&gt;&lt;li&gt;fill it out and send them copies of all the decline receipts and a letter from the merchant stating that they did not receive the money that is disputed and that you have already paid them through other means&lt;/li&gt;&lt;li&gt;wait.  After pulling teeth with one customer service agent, I was able to get the reassurance that it should definitely take less than a year to resolve.  Likely just a few months.&lt;/li&gt;&lt;/ol&gt;Once you have the funds back on your card, you may well be in the same boat as me, with your trip to Europe already over!  So now, how do you get the money back? You have a few options:&lt;ul&gt;&lt;li&gt;Withdraw it from an ATM. You will pay:&lt;ul&gt;&lt;li&gt;&amp;euro;1.75 ATM fee converted to USD at a worse-than-market rate&lt;/li&gt;&lt;li&gt;$x to the owner of the ATM for using a foreign ATM card&lt;/li&gt;&lt;li&gt;Whatever remainder you can&apos;t take out through the ATM will be taken by Travelex after 12 months&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Close the card at a Travelex branch. You will pay:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;$20 administration fee&lt;/li&gt;&lt;li&gt;Worse-than-market conversion rate (note the fine print that the &amp;quot;Currency Return Guarantee&amp;quot; doesn&apos;t apply to the money on the card!)&lt;/li&gt;&lt;li&gt;Another arbitrary fee/commission that varies from branch to branch and (I suspect) the mood of the employee&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Leave the funds on the card until your next trip. 
You will pay:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&amp;euro;2.30 Monthly inactivity fee each month after a year.&lt;/li&gt;&lt;li&gt;The unavailability of the money for the duration (Travelex will happily take advantage of the money while you aren&apos;t using it though!)&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;</description>
  <comments>http://daniele.livejournal.com/78239.html</comments>
  <category>personal</category>
  <lj:mood>angry</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/77979.html</guid>
  <pubDate>Tue, 08 Sep 2009 14:16:12 GMT</pubDate>
  <title>Ubuntu screen-profiles customization</title>
  <link>http://daniele.livejournal.com/77979.html</link>
  <description>I recently loaded Ubuntu server 9.04 onto a new machine and encountered Ubuntu&apos;s screen-profiles.&lt;br /&gt;In general, I like it.  I had one problem and one customization that I wanted to share:&lt;br /&gt;&lt;br /&gt;I use Mac OS X&apos;s Terminal.app to connect to my remote machines, and by default, it has custom mappings for F1 through F4.  I have no idea what those keybindings mean, but they prevent screen-profiles&apos;s keybindings from working.  It took a little fiddling to figure out how to fix them.  Basically, you need to:&lt;ol&gt;&lt;li&gt;Open up the preferences dialog for Terminal.app&lt;/li&gt;&lt;li&gt;Go to the Settings pane&lt;/li&gt;&lt;li&gt;Click on the Keyboard tab button&lt;/li&gt;&lt;li&gt;Edit the action for each of the F1 through F4 keys&lt;/li&gt;&lt;li&gt;When editing, click the &amp;quot;delete one character&amp;quot; button twice to erase the characters currently in there (leave the \033 escape)&lt;/li&gt;&lt;li&gt;Type the following characters: [ 1 1 ~ (11 is F1, 12 is F2, 13 is F3, and 14 is F4, so substitute the matching number for each key)&lt;/li&gt;&lt;li&gt;The new entries should look just like the F5 through F8 actions.&lt;/li&gt;&lt;/ol&gt;Once I was able to use the F2, F3, and F4 keys, I decided that they weren&apos;t that useful to me.  I prefer to use a combination of screen regions and windows.  The window commands are very easy for me, but I&apos;ve always found the split, focus, and remove keybindings to be uncomfortable so I figured those would be great commands to map to F2, F3, and F4.  
Here is how I did that:&lt;ol&gt;&lt;li&gt;sudo cp /usr/share/screen-profiles/keybindings/common /usr/share/screen-profiles/keybindings/regions&lt;/li&gt;&lt;li&gt;sudo vi /usr/share/screen-profiles/keybindings/regions&lt;/li&gt;&lt;li&gt;replace the first four entries with the new entries below&lt;/li&gt;&lt;li&gt;save and close the file&lt;/li&gt;&lt;li&gt;In screen, hit F9 to bring up the menu&lt;/li&gt;&lt;li&gt;Select the option for &amp;quot;Change keybinding set&amp;quot;&lt;/li&gt;&lt;li&gt;Select the new &amp;quot;regions&amp;quot; entry&lt;/li&gt;&lt;li&gt;Hit F5 to reload your screen-profile and pick up the new keybindings.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;code&gt;register n &amp;quot;^aS^a^i^a^c^aA&amp;quot;                                 #     | Goes with the F2 definition&lt;br /&gt;bindkey -k k2 process n                                 # F2  | Create new region and window (and name it)&lt;br /&gt;bindkey -k k3 focus                                     # F3  | Next region&lt;br /&gt;bindkey -k k4 remove                                    # F4  | Remove region&lt;br /&gt;&lt;/code&gt;</description>
  <comments>http://daniele.livejournal.com/77979.html</comments>
  <category>ubuntu</category>
  <category>screen</category>
  <lj:mood>contemplative</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/77587.html</guid>
  <pubDate>Thu, 30 Jul 2009 05:55:16 GMT</pubDate>
  <title>Shell script analytics</title>
  <link>http://daniele.livejournal.com/77587.html</link>
  <description>I just made a rather lengthy post on the &lt;a href=&quot;http://blog.mozilla.com/data/&quot;&gt;Mozilla blog of data&lt;/a&gt; about &lt;a href=&quot;http://blog.mozilla.com/data/2009/07/29/shell-script-analytics/&quot;&gt;shell script analytics&lt;/a&gt;.&amp;nbsp; I&apos;ll try hard not to cross post stuff like this too often, but I thought I&apos;d allow myself the spam this time around because using Bash and AWK to do things like this really is an important part of who I am personally as a geek in addition to what I do for Mozilla. :)&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/77587.html</comments>
  <category>work</category>
  <category>data</category>
  <category>mozilla</category>
  <category>etl</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/77528.html</guid>
  <pubDate>Fri, 29 May 2009 15:10:39 GMT</pubDate>
  <title>I&apos;ve always thought my job was fun.  Now I hear it is sexy too!</title>
  <link>http://daniele.livejournal.com/77528.html</link>
  <description>I just finished reading this lovely little post from the company &lt;a href=&quot;http://dataspora.com&quot;&gt;dataspora&lt;/a&gt; titled &lt;a href=&quot;http://dataspora.com/blog/sexy-data-geeks/&quot;&gt;The Three Sexy Skills of Data Geeks&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;By far, my favorite quote was,&amp;nbsp;&amp;quot;A good data munger excels at turning coffee into regular expressions and parsers&amp;quot;.&amp;nbsp; That certainly describes me to a tee. :)&lt;br /&gt;&lt;br /&gt;I&apos;ve always found each of these three facets of working with data fascinating.&amp;nbsp; One of the comments mentioned that decision making was an important missing trait.&amp;nbsp; I could go either way there.&amp;nbsp; I feel it is good to be able to tell a compelling story with the data that helps others to understand it, and then those people take the understanding you imparted to them and make decisions based on it.&lt;br /&gt;&lt;br /&gt;It is incredibly hard to find a person who is skilled in even one or two of these facets.&amp;nbsp; When you find the data geek who has all three, then you count yourself lucky.&amp;nbsp; Expecting someone who has that caliber of devotion to data to also be capable of making decisions like a CEO is a bit unrealistic in my opinion.&lt;br /&gt;&lt;br /&gt;Anyway, the article is a good, quick read.&amp;nbsp; It also quite nicely summarizes the major passions in my professional life right now.&lt;br /&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/77528.html</comments>
  <category>work</category>
  <category>data</category>
  <category>visualization</category>
  <lj:mood>busy</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/77084.html</guid>
  <pubDate>Wed, 27 May 2009 14:49:42 GMT</pubDate>
  <title>Interesting crowd-sourced solution sites</title>
  <link>http://daniele.livejournal.com/77084.html</link>
  <description>A good friend of mine runs the site &lt;a href=&quot;http://bug.gd&quot;&gt;bug.gd&lt;/a&gt; (and its more professional pseudonym, &lt;a href=&quot;http://errorhelp.com&quot;&gt;errorhelp.com&lt;/a&gt;).&amp;nbsp; This service provides something that is slightly missing from the typical Google search for an error to find a solution.&amp;nbsp; It allows you to enter the full text of the error message or stack trace instead of just a couple of keywords, and it provides rich community feedback on solutions.&amp;nbsp; You can even tip people for their solutions through &lt;a href=&quot;http://tipjoy.com&quot;&gt;tipjoy.com&lt;/a&gt; integration.&lt;br /&gt;&lt;br /&gt;I recently came across two other nice sites created by a different company that provide a similar and complementary service:&lt;br /&gt;&lt;a href=&quot;http://stackoverflow.com&quot;&gt;stackoverflow.com&lt;/a&gt; - A site dedicated to crowd-sourcing answers to programming questions&lt;br /&gt;&lt;a href=&quot;http://serverfault.com&quot;&gt;serverfault.com&lt;/a&gt; - A site dedicated to crowd-sourcing answers to system administration questions&lt;br /&gt;&lt;br /&gt;I think it is very helpful to have a list of these sites where you can post a question and hopefully get an answer that will even be moderated by the community to help you determine the value of the answer.&amp;nbsp; This is something that typically takes a lot longer if you search for a forum or mailing list site and post there.&amp;nbsp; While it is less immediate than IRC, the moderation and ability to leave a question and get an answer &amp;quot;soon&amp;quot; are nice features you are less likely to see in IRC (although I&apos;ve always gotten great results from #java, #sql, #mysql, and #bash).&lt;br /&gt;&lt;br /&gt;As you can tell, I&apos;m a big fan of crowd-sourcing.&amp;nbsp; I&amp;nbsp;have run a couple of contests on &lt;a href=&quot;http://99designs.com&quot;&gt;99designs.com&lt;/a&gt; and have been 
incredibly pleased with the results that came out of that community of freelance graphic designers.&lt;br /&gt;&lt;br /&gt;Check these places out and see if they can help you or if you can help them!&lt;br /&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/77084.html</comments>
  <category>programming</category>
  <category>crowd-sourcing</category>
  <category>errors</category>
  <category>sysadmin</category>
  <lj:mood>cheerful</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/77049.html</guid>
  <pubDate>Fri, 17 Apr 2009 22:40:59 GMT</pubDate>
  <title>Ways to visualize and share data</title>
  <link>http://daniele.livejournal.com/77049.html</link>
  <description>Mozilla needs to be able to provide useful extracts of data such as download trends, etc. and allow the community to perform their own analysis on them, so I&apos;m always keeping a lookout for useful tools to further that goal.&lt;br /&gt;&lt;br /&gt;When Tony Wright posted the blog entry &lt;a href=&quot;http://www.tonywright.com/2009/just-how-important-is-the-valley-lets-look-at-some-data/&quot;&gt;Just How Important is the Valley? Let&amp;rsquo;s Look at some Data&lt;/a&gt; on April 17th 2009, he was kind enough to publish the data set (it needs an attribution / license though) and the data looked interesting so I thought I&apos;d spend a little time playing with it using some tools that I&apos;ve been keeping my eye on.&lt;br /&gt;&lt;br /&gt;First, I slurped the table into &lt;a href=&quot;http://www.dabbledb.com&quot;&gt;DabbleDB&lt;/a&gt;, a website that is very well suited to messing with this type of data (i.e. sourced from the web, might need a bit of cleanup, etc.).  You can view and edit the data I imported to DabbleDB here: &lt;a href=&quot;https://yipyip.dabbledb.com/page/yipyip/uqFxSObU&quot;&gt;Acquired Startups Data&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;DabbleDB does a great job at allowing a user to sort, filter, group, and modify data using a simple interface, but it does not have a large array of visualizations.  
For that, we head over here to the IBM AlphaWorks lab&apos;s project, &lt;a href=&quot;http://manyeyes.alphaworks.ibm.com/wikified&quot;&gt;Many Eyes Wikified&lt;/a&gt;.&amp;nbsp; I created a quick wiki dashboard for throwing together a few visualizations: &lt;a href=&quot;http://manyeyes.alphaworks.ibm.com/wikified/acquired_startups/Main Page&quot;&gt;Acquired Startups Visualizations&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This was just a quick break from real work I&apos;ve been doing, so I spent less than an hour on this.&amp;nbsp; I only took about 20 minutes with DabbleDB: importing the data, cleaning the dollar values, then creating two new views that group the data by country or by state for visualization.&amp;nbsp; Then I moved over to Many Eyes and played with a few visualizations to try to find some interesting views of the data and threw them into the dashboard and two sub pages.&lt;br /&gt;&lt;br /&gt;Being able to quickly extract, transform, and visualize this data is the big win for DabbleDB and Many Eyes in my opinion.&amp;nbsp; With both applications having open licensing of the data and collaboration as a key focus, they are tools that I hope to be able to take advantage of at Mozilla soon.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;https://yipyip.dabbledb.com/page/yipyip/uqFxSObU&quot;&gt;&lt;img width=&quot;694&quot; height=&quot;265&quot; src=&quot;http://content.screencast.com/users/DEinspanjer/folders/Jing/media/e7f222f2-8953-47f8-a6f7-9409c93c5036/00000068.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://manyeyes.alphaworks.ibm.com/wikified/acquired_startups/Main%20Page&quot;&gt;&lt;img width=&quot;733&quot; height=&quot;253&quot; src=&quot;http://content.screencast.com/users/DEinspanjer/folders/Jing/media/3742ed48-cc68-4241-9d41-a1788c68b712/00000067.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/77049.html</comments>
  <category>manyeyes</category>
  <category>data</category>
  <category>visualization</category>
  <category>dabbledb</category>
  <lj:mood>cheerful</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/76695.html</guid>
  <pubDate>Mon, 30 Mar 2009 16:17:52 GMT</pubDate>
  <title>Counting unique visitors in SQL</title>
  <link>http://daniele.livejournal.com/76695.html</link>
  <description>A lot of web metrics solutions out there like NetTracker or Omniture allow you to perform analysis on the number of unique visitors over time.  This is a pretty important metric to a lot of companies, and I recently needed to perform such an analysis, but it was on data stored in a SQL database rather than in one of these proprietary solutions&apos; data stores.&lt;br /&gt;&lt;br /&gt;Doing any sort of distinct counting on a large volume of data in SQL can be very costly, both in terms of storage of the raw data (since you can&apos;t aggregate it), and in query performance since there are relatively few optimizations that can be performed on the table or the query.&lt;br /&gt;&lt;br /&gt;&lt;a name=&quot;cutid1&quot;&gt;&lt;/a&gt;&lt;br /&gt;Fortunately for me, our data warehouse is stored in &lt;a href=&quot;http://www.vertica.com&quot;&gt;Vertica&lt;/a&gt;, and while the queries weren&apos;t blindingly fast, I was able to get the analysis done in a very reasonable time frame.&lt;br /&gt;&lt;br /&gt;I was dealing with a week&apos;s worth of traffic (about 80m requests per day), and one of the biggest challenges I had was how to determine what constituted a &amp;quot;unique visitor&amp;quot; (in this case, it is actually more of a unique ping or requestor since there isn&apos;t really a person involved).&lt;br /&gt;&lt;br /&gt;I didn&apos;t have a cookie that I could use, so that left me with the less desirable course of using a combination of IP address and User Agent string.  The problem with this is that the solution will undercount one class of requests, and overcount a different class.  Here are the details:&lt;br /&gt;&lt;br /&gt;1. If a request comes from a host that receives its public IP address via DHCP (e.g. a cable modem or DSL) and that service provider has their DHCP configured to force a change of IP addresses when the host renews its lease, then when the IP address changes, the requestor will be considered &amp;quot;new&amp;quot;.  e.g. 
HostX makes a request on Monday with IP 1.2.3.4 and a request Tuesday with IP 1.2.3.4. Then, on Wednesday, their IP address changes.  Later Wednesday, HostX makes a request with its new IP 2.3.4.5.  When we perform analysis on this week, we will see one distinct requestor on Monday and Tuesday, but a new requestor on Wednesday.  In the worst case, if a new host, HostY, is assigned IP 1.2.3.4 which HostX used to have and HostY also makes a request using the same OS version, we will mistakenly believe HostY on Wednesday through Saturday is the same distinct requestor as HostX from Monday and Tuesday.&lt;br /&gt; &lt;br /&gt; 2. If several hosts are on the same LAN (e.g. an office), then the public IP address will likely be the same for each of those hosts.  I use the partial user agent string to help mitigate this problem.  I am pulling the OS and locale details out of the user agent string and using that in addition to the IP address to determine uniqueness.  Unfortunately, there are a lot of machines running Windows XP with en-US, so this is only partially helpful.  Any host with the same IP + OSversion + locale will be treated as a single distinct requestor in this analysis.&lt;br /&gt; &lt;br /&gt; 3. When I worked on this IP+UA strategy, I originally tested using the full UA (user agent) string which includes the browser version number.  This might make sense for many other websites, but unfortunately, what I saw in the test cases that I used was that we would &amp;quot;forget&amp;quot; the distinctness whenever the browser is upgraded, or in some cases, even when certain plugins or extensions are installed (ones that modify the UA) [I&apos;m glaring at you, MegaUpload!].&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;So, with this strategy in place, I ran the numbers and while I could see an unfortunate amount of under counting (i.e. 
multiple requests being counted as the same distinct requestor when they likely should have been separate), it was as good as I was going to get.&lt;br /&gt;&lt;br /&gt;The last thing I needed to do was to write a SQL statement that added up the number of distinct requestors grouped by the number of days in the week that requests were made.  Here is the SQL I wrote to do that.  This was just my first stab at it, I got my answer, and it didn&apos;t take more than a few minutes, so I left it at that.  I&apos;d still be interested in hearing if anyone else has a better way. :)&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;SELECT (d15 + d16 + d17 + d18 + d19 + d20 + d21) AS RequestsPerWeek&lt;br /&gt;, COUNT(*) AS NumDistinctRequestors&lt;br /&gt;FROM (&lt;br /&gt;    SELECT&lt;br /&gt;      MAX(CASE WHEN d.date = &apos;2009-03-15&apos; THEN 1 ELSE 0 END) AS d15&lt;br /&gt;    , MAX(CASE WHEN d.date = &apos;2009-03-16&apos; THEN 1 ELSE 0 END) AS d16&lt;br /&gt;    , MAX(CASE WHEN d.date = &apos;2009-03-17&apos; THEN 1 ELSE 0 END) AS d17&lt;br /&gt;    , MAX(CASE WHEN d.date = &apos;2009-03-18&apos; THEN 1 ELSE 0 END) AS d18&lt;br /&gt;    , MAX(CASE WHEN d.date = &apos;2009-03-19&apos; THEN 1 ELSE 0 END) AS d19&lt;br /&gt;    , MAX(CASE WHEN d.date = &apos;2009-03-20&apos; THEN 1 ELSE 0 END) AS d20&lt;br /&gt;    , MAX(CASE WHEN d.date = &apos;2009-03-21&apos; THEN 1 ELSE 0 END) AS d21&lt;br /&gt;    FROM distinct_requests a1&lt;br /&gt;    JOIN dates d ON a1.utc_date_id = d.date_id&lt;br /&gt;    GROUP BY a1.ip_ua_id&lt;br /&gt;) x&lt;br /&gt;GROUP BY (d15 + d16 + d17 + d18 + d19 + d20 + d21)&lt;br /&gt;ORDER BY (d15 + d16 + d17 + d18 + d19 + d20 + d21)&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/76695.html</comments>
  <category>work</category>
  <category>vertica</category>
  <category>data</category>
  <category>sql</category>
  <lj:mood>working</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/76336.html</guid>
  <pubDate>Mon, 09 Mar 2009 16:17:37 GMT</pubDate>
  <title>TinyArro.ws URLs</title>
  <link>http://daniele.livejournal.com/76336.html</link>
  <description>A friend just released a URL-shrinking service that I enjoy:&amp;nbsp; &lt;a href=&quot;http://tinyarro.ws&quot;&gt;tinyarro.ws&lt;/a&gt; (more nifty when written as &lt;a href=&quot;http://➡.ws&quot;&gt;➡.ws&lt;/a&gt;).&lt;br /&gt;It has a few great features over the current mainstream shrinkers:&lt;br /&gt;&lt;br /&gt;1. Cool/fun URLs (e.g. http://➽.ws/囨 for my website)&lt;br /&gt;2. Very short URLs due to Unicode suffixes (great for Twitter!)&lt;br /&gt;3. Preview by default! (no tweak to the URL to remember)&lt;br /&gt;4. Option to enter your own custom suffix&amp;nbsp;(TinyURL now has this, but it was too useful to not mention).&lt;br /&gt;5. &lt;a href=&quot;http://›.ws/☺&quot;&gt;A Ubiquity command &amp;rsaquo;.ws/☺&lt;/a&gt; (eventually to be integrated directly on the site)&lt;br /&gt;&lt;br /&gt;Some news about the site:&lt;br /&gt;&lt;a href=&quot;http://news.ycombinator.com/item?id=507982&quot; rel=&quot;nofollow&quot;&gt;TinyArro.ws: 10 new unicode domains. Defaulting previews to ON.&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://news.ycombinator.com/item?id=498051&quot;&gt;Ask HN: Thoughts on TinyArro.ws? Tiniest urls in the world (or your money back)&lt;/a&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/76336.html</comments>
  <category>fun</category>
  <lj:mood>amused</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/76164.html</guid>
  <pubDate>Thu, 12 Feb 2009 19:33:08 GMT</pubDate>
  <title>Willingness to be a little evil</title>
  <link>http://daniele.livejournal.com/76164.html</link>
  <description>I have been a supporter of Firefox and Mozilla for several years now, and while I don&apos;t write patches and fix bugs, a major part of that support is educating people about Mozilla, open source, and user empowerment whenever a conversation about technology allows for it.&lt;br /&gt;&lt;br /&gt;I&apos;ve found that people who use proprietary software and operating systems often fall into two broad categories for rationalizing that choice:&lt;br /&gt;1. They are told to do so by some authority (usually their employer, sometimes their social tech support person, and in some cases, just because they were told it was the right thing to do by an ad or magazine article).&lt;br /&gt;2. They started using it for some reason (typically reason #1 above) a long time ago and are now just accustomed to it.&lt;br /&gt;&lt;br /&gt;I&apos;m sure all this is going to be old news to most people reading this, but I bring it up because of an interesting article I read today.&lt;br /&gt;&lt;br /&gt;In the 1960s and early 1970s, psychologist Stanley Milgram performed &lt;a href=&quot;http://en.wikipedia.org/wiki/Milgram_experiment&quot;&gt;a series of famous experiments&lt;/a&gt; that tested the willingness of people to do something they would normally object to on moral grounds when they are in a strictly controlled environment and instructed to do so by an authority figure.&lt;br /&gt;&lt;br /&gt;More recently, psychologist Jerry Burger had the opportunity to perform a series of similar experiments.&amp;nbsp; &lt;a href=&quot;http://www.alternet.org/module/printversion/126492&quot;&gt;This alternet article&lt;/a&gt; describes the story and discusses the findings.&amp;nbsp; As I read the results and Dr. 
Burger&apos;s statements regarding the findings, I started thinking about how easy it is for people to choose to give up their freedom to a piece of proprietary software for reasons similar to the ones described in these experiments.&lt;br /&gt;&lt;br /&gt;In a green field, these people would normally opt for software that provided them with more freedom and in many cases, subjectively better security, but because they are instructed by an authority figure, or because they got started with it a long time ago and just slid deeper and deeper in, those preferences are not enough by themselves to prompt the person to change their behavior.&lt;br /&gt;&lt;br /&gt;Now even this thought in and of itself would not be enough to prompt me to blog about this topic.&amp;nbsp; We&apos;re still well in the territory where the people who haven&apos;t gotten lost in a Wikipedia article about toothbrush hygiene they found when they clicked my first link are saying, &amp;quot;um, DUH!&amp;quot;&amp;nbsp; So here is my point:&lt;br /&gt;&lt;br /&gt;At the end of the article, Dr. Burger focuses on an interesting finding of both experiments.&amp;nbsp; &lt;em&gt;When a person is instructed to do something &amp;quot;wrong&amp;quot;, they are significantly less likely to do so if they are surrounded by peers who object first&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;So when you talk to someone who is sighing about how much they hate product X but they don&apos;t have a choice, don&apos;t hate on them and don&apos;t deride them for not having a backbone, but just tell them and show them how you chose to stand up for your freedom and your security.&amp;nbsp; An example can go a long way toward giving them the courage to listen to that little voice inside saying, &amp;quot;I want something better!&amp;quot;&lt;br /&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/76164.html</comments>
  <lj:mood>working</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/76011.html</guid>
  <pubDate>Thu, 05 Feb 2009 21:43:45 GMT</pubDate>
  <title>Bash functions for going up to a directory</title>
  <link>http://daniele.livejournal.com/76011.html</link>
  <description>Sometimes, if I&apos;m in a really deep directory, I don&apos;t want to cd from / nor do I want to cd ../../../..&lt;br /&gt;I just want to either go up 5 directories, or maybe I want to go up to the parent directory &amp;quot;src&amp;quot; when I&apos;m in /home/dre/src/projects/foo/bar/classes/org/apache/blah&lt;br /&gt;&lt;br /&gt;This set of Bash functions lets me do that.&lt;br /&gt;The first, up(), changes your directory. The second, upstr(), just prints the resulting directory name instead.&amp;nbsp; This makes it easy to mv a file to a higher directory, for example.&lt;br /&gt;&lt;br /&gt;If you pass no arguments, it just goes up one directory.&lt;br /&gt;If you pass a numeric argument, it goes up that number of directories.&lt;br /&gt;If you pass a string argument, it looks for a parent directory with that name and goes up to it.&lt;br /&gt;(Note, there is a small display bug there. If you give it an invalid name, cd reports the &amp;quot;No such file or directory&amp;quot; error, which is good, but it has a bogus path.  Since you can&apos;t know what path they were actually trying to go to, it should just say &amp;quot;No such parent directory: ${yourbogusname}&amp;quot;.  I don&apos;t have time to figure that out right now though.)&lt;br /&gt;&lt;br /&gt;Just put these functions in your ~/.bashrc file and don&apos;t forget to source it ( source ~/.bashrc ).&lt;br /&gt;&lt;pre&gt;

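# Go up N directories, or up to the nearest parent directory with the given name.
function up()
{
    # Keep the helper variables out of the interactive shell's namespace.
    local dir=&amp;quot;&amp;quot; x=0
    if [ -z &amp;quot;$1&amp;quot; ]; then
        # No argument: go up one level.
        dir=..
    elif [[ $1 =~ ^[0-9]+$ ]]; then
        # Numeric argument: build a ../../ path of that depth.
        while [ &amp;quot;$x&amp;quot; -lt &amp;quot;$1&amp;quot; ]; do
            dir=${dir}../
            x=$((x+1))
        done
    else
        # String argument: cut the path off after the nearest parent with that name.
        dir=${PWD%/$1/*}/$1
    fi
    cd &amp;quot;$dir&amp;quot;
}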

function upstr()
{
    echo &amp;quot;$(up &amp;quot;$1&amp;quot; &amp;amp;&amp;amp; pwd)&amp;quot;;
}
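&lt;/pre&gt;One possible way to address the display bug mentioned above is to check that the name really is a parent directory before calling cd. This is only a minimal sketch, assuming the argument should match a whole path component; the function name up_checked is made up for illustration.&lt;br /&gt;&lt;pre&gt;

function up_checked()
{
    local dir=&amp;quot;&amp;quot; x=0
    if [ -z &amp;quot;$1&amp;quot; ]; then
        dir=..
    elif [[ $1 =~ ^[0-9]+$ ]]; then
        while [ &amp;quot;$x&amp;quot; -lt &amp;quot;$1&amp;quot; ]; do
            dir=${dir}../
            x=$((x+1))
        done
    else
        # Only build the path if /name/ really appears above the current directory.
        case &amp;quot;$PWD&amp;quot; in
            */&amp;quot;$1&amp;quot;/*) dir=${PWD%/&amp;quot;$1&amp;quot;/*}/$1 ;;
            *) echo &amp;quot;No such parent directory: $1&amp;quot; &amp;gt;&amp;amp;2; return 1 ;;
        esac
    fi
    cd &amp;quot;$dir&amp;quot;
}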
&lt;/pre&gt;</description>
  <comments>http://daniele.livejournal.com/76011.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/75621.html</guid>
  <pubDate>Fri, 19 Dec 2008 16:18:13 GMT</pubDate>
  <title>All hail Ken Kovash!</title>
  <link>http://daniele.livejournal.com/75621.html</link>
  <description>It may be showing my ignorance, but I was unaware until recently of the officially recognized day for celebrating the man, the myth, and the math that is &lt;a href=&quot;http://www.kenkovash.com/&quot;&gt;Ken Kovash&lt;/a&gt;.&amp;nbsp; To think that all the time leading up to this point, I had just been satisfied with the joyous feeling in my heart every day I interacted with him.&lt;br /&gt;&lt;br /&gt;Ken can be a harsh task-master sometimes. &lt;br /&gt;&quot;Daniel, where are my &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/11/19/using-firefox-after-eating-turkey/&quot;&gt;numbers from yesterday&lt;/a&gt;?&quot; &lt;br /&gt;&quot;Daniel, why are the &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/11/20/we-shipped-funnelcake03/&quot;&gt;funnelcake&lt;/a&gt; trends low here and high there? Your data are wrong, go find it and fix it!&quot; &lt;br /&gt;But the pain is worth it when I see him take my crude raw data and masterfully sculpt it into bounteous bevies of &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/09/16/do-ads-driving-firefox-downloads-affect-firefox-downloads/&quot;&gt;tables&lt;/a&gt;, raging rivers of &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/08/21/a-first-look-at-the-uninstall-survey/&quot;&gt;trend lines&lt;/a&gt;, triumphant towers of &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/07/23/where-will-firefox-reach-50-market-share/&quot;&gt;bar charts&lt;/a&gt;, overwhelming ontologies of &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/11/06/firefox-usage-and-europe/&quot;&gt;pie graphs&lt;/a&gt;, and &lt;span&gt;gilt-edged grids &lt;/span&gt;of &lt;a href=&quot;http://blog.mozilla.com/metrics/2008/09/04/visualizing-data-in-new-ways/&quot;&gt;treemaps&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;One must weep to behold it. 
&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&lt;br /&gt;&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/75621.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/75318.html</guid>
  <pubDate>Tue, 09 Dec 2008 16:09:55 GMT</pubDate>
  <title>Performance improvements at the cost of complexity</title>
  <link>http://daniele.livejournal.com/75318.html</link>
  <description>I discovered something that I feel is a bit of a bug in the Sun Java implementation.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;If you pass a string to the method InetAddress.getByName(), it does a bunch of testing to see if it is a domain name or a literal IP address.&lt;br /&gt;If it is an IPv4 address, it will then use String.split() to split the four parts.&amp;nbsp; String.split() uses regexes to do its work.&lt;br /&gt;&lt;br /&gt;That means that if you are querying for hundreds or millions of addresses in a tight loop (as I&apos;ve been doing), the JVM is spawning and compiling hundreds or millions of regex objects, in addition to a String array and four String objects per call.&lt;br /&gt;&lt;br /&gt;So at first, I just worked around it by doing basic substringing instead of splitting.&amp;nbsp; That gave me about a 100x performance improvement. But then I realized I was still generating four String objects for every call.&lt;br /&gt;&lt;br /&gt;So I came up with this mapping method and it runs about 1000x faster with a near constant minimal memory footprint.&lt;br /&gt;&lt;br /&gt;I pre-calculate a multidimensional array of shorts where each element is indexed by the digits making up a number from 0 to 255, using each digit&apos;s character value minus 48 as the index.&lt;br /&gt;&lt;br /&gt;With that array available, at run time, I can do a simple lookup of the short value and then do the math to get the long representation of the IP address.&amp;nbsp; I&apos;m still generating a couple of references and a few intermediate int values, but the JIT optimizer can make quick work of that.&lt;br /&gt;&lt;br /&gt;Linked is the test program I created to play with the different methods:&amp;nbsp; &lt;a href=&quot;http://people.mozilla.com/%7Edeinspanjer/InetAddressParse.zip&quot;&gt;InetAddressParse test&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a 
href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/75318.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>8</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/75038.html</guid>
  <pubDate>Fri, 28 Nov 2008 04:16:37 GMT</pubDate>
  <title>Don&apos;t listen to bash, it will lie to you!</title>
  <link>http://daniele.livejournal.com/75038.html</link>
  <description>Remember folks,&amp;nbsp; if you mv a directory, and there is a bash shell currently in that directory, the bash prompt will not update to reflect the new name until you cd out of the directory and then back in.&lt;br /&gt;&lt;br /&gt;I just spent way too long making changes and being frustrated because the changes weren&apos;t having any effect.&amp;nbsp; I was clearing caches and restarting applications and monitoring log files.&amp;nbsp; It wasn&apos;t until I happened to do a :pwd in vim while editing the file for the umpteenth time that I finally noticed that the file I had been editing was actually in a backup of the folder that I had just made.&lt;br /&gt;&lt;br /&gt;Sheesh.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/75038.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/74929.html</guid>
  <pubDate>Sat, 25 Oct 2008 01:37:41 GMT</pubDate>
  <title>The best DHTML date range picker I&apos;ve ever seen</title>
  <link>http://daniele.livejournal.com/74929.html</link>
  <description>&lt;big&gt;&lt;big&gt;&lt;a href=&quot;http://www.filamentgroup.com/lab/update_date_range_picker_with_jquery_ui/&quot;&gt;&lt;strong&gt;Filament Group&apos;s Date Range Picker&lt;/strong&gt;&lt;/a&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It uses &lt;a href=&quot;http://jquery.com&quot;&gt;jQuery&lt;/a&gt; and a JavaScript date parsing library by the name of &lt;a href=&quot;http://www.datejs.com/&quot;&gt;Date.js&lt;/a&gt;.&amp;nbsp; This thing is simply amazing.&amp;nbsp; Some of the reasons I think so:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The developer can configure start and end date limits based on what is valid for the system (e.g. if you only have data going back to 1999, no sense in letting the user choose a date in 3000 BC)&lt;/li&gt;&lt;li&gt;The developer can configure a set of predefined ranges such as &quot;Last week&quot;, &quot;Month to date&quot;, &quot;Year to date&quot;.&lt;/li&gt;&lt;li&gt;If the developer allows it, the user can use any combination of preconfigured ranges, a single date, an arbitrary range of dates, or they can use the back and forward arrows to roll the current date range forward or back.&lt;/li&gt;&lt;li&gt;It is smooth and crisp, able to be easily themed, and seems pretty extensible/tweakable.&lt;/li&gt;&lt;/ul&gt;It is still a work in progress (they just released it today), but I think it is still usable.&amp;nbsp; The only downside that I&apos;ve found so far is that the back and forward arrows in this very first released version can produce some unexpected ranges.&amp;nbsp; They are currently strictly math based, so if you do something like select the current month and then hit the back arrow thinking it will select the previous month, you&apos;ll probably get something slightly different since most adjacent months don&apos;t have the same number of days.&lt;br /&gt;&lt;br /&gt;I&apos;m also pretty sure it has an off-by-one error in it that I suspect they&apos;ll fix 
shortly.&amp;nbsp; If you select Sunday to Saturday of a week and then scroll backward, the next range is actually Monday to Sunday and the next Tuesday to Monday...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Ignore these nitpicks and go check it out right away if your website needs a date picker though.&amp;nbsp; To get such a fantastic widget in the very first release can only mean that it is going to be the bee&apos;s knees after a little public beta testing.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/74929.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>4</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/74668.html</guid>
  <pubDate>Tue, 21 Oct 2008 16:06:31 GMT</pubDate>
  <title>Open Source Hardware</title>
  <link>http://daniele.livejournal.com/74668.html</link>
  <description>I thought that this article in Wired about &lt;a href=&quot;http://www.wired.com/techbiz/startups/magazine/16-11/ff_openmanufacturing?currentPage=all&quot;&gt;Open Source Hardware&lt;/a&gt; was a fun read and worth sharing.&lt;br /&gt;There is an interesting similarity between the way that &lt;a href=&quot;http://www.arduino.cc/&quot;&gt;Arduino&lt;/a&gt; open sources its designs but reserves the trademark to preserve brand quality and the way Mozilla handles the Firefox trademark.&lt;br /&gt;&lt;br /&gt;If you like reading about geeks going against the status quo in their industry and trying to make the world a better place, give the article a read.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/74668.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/74481.html</guid>
  <pubDate>Sat, 11 Oct 2008 05:56:56 GMT</pubDate>
  <title>Good bye Mountain View</title>
  <link>http://daniele.livejournal.com/74481.html</link>
  <description>It has been a great two weeks out here in the office.&amp;nbsp; I&apos;ve gotten to see a lot of people face to face and had some useful meetings about my projects.&amp;nbsp; I just kicked off another round of massive data loads to run over the weekend while I&apos;m out of pocket. Hopefully they will run smoothly and deliver me high quality data.&lt;br /&gt;&lt;br /&gt;There are some really exciting things coming up this quarter:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I&apos;ll be working on one of the largest data sets yet, our AMO data.&amp;nbsp; We have several really cool mechanisms for visualizing individual extension projects hosted on AMO. The developer has control over whether to make the statistics public or not.&amp;nbsp; As an example, you can take a look at the &lt;a href=&quot;https://addons.mozilla.org/en-US/statistics/addon/1865&quot;&gt;statistics for Adblock Plus&lt;/a&gt;.&amp;nbsp; I&apos;ll be working on ways to be able to integrate data across projects so we can get a better understanding of the extension community that means so very much to Mozilla.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I&apos;ll hopefully be blogging a little more about the complexities of processing the large amount of data that I have to crunch through.&lt;/li&gt;&lt;li&gt;I&apos;ll be making several pieces of my Pentaho Data Integration (Kettle for those of you in the know) ETL scripts available in an open source repository.&amp;nbsp; It will help with the blogging, they might be useful to other people doing similar things, and who knows, maybe some people will even have suggestions for improvements!&lt;/li&gt;&lt;li&gt;Later in the quarter, I&apos;ll be working on an exciting new project to take some of the aggregated data that Mozilla has, such as the number of downloads of Firefox for given time periods, and making it available publicly for the community to explore and visualize.&amp;nbsp; At the moment, I&apos;m leaning toward trying to use the &lt;a 
href=&quot;http://www.many-eyes.com/&quot;&gt;Many-Eyes&lt;/a&gt; project from IBM AlphaWorks.&amp;nbsp; If anyone has any better ideas, please let me know.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/74481.html</comments>
  <category>bi</category>
  <category>kettle</category>
  <lj:mood>accomplished</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/74074.html</guid>
  <pubDate>Sat, 30 Aug 2008 02:14:28 GMT</pubDate>
  <title>Been a long difficult week</title>
  <link>http://daniele.livejournal.com/74074.html</link>
  <description>I have wonderful results to show for it though.&lt;br /&gt;&lt;br /&gt;I have gotten the second large data source flowing through our metrics system and have the first report hooked up to it.&lt;br /&gt;&lt;br /&gt;It&apos;s going to be interesting comparing the performance of the two data sources.  Both have similar volume, but this second one is in a much cleaner looking star schema as opposed to the extremely denormalized single table format.  Vertica handles both of these formats well, so I&apos;m eager to figure out how close the performance is.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://kettle.pentaho.org/&quot;&gt;Kettle&lt;/a&gt; (a.k.a. Pentaho Data Integration) is a real winner here as far as enabling me to develop and maintain these very complex ETL processes.  The ETL for the previous data source working against the single table clips along at over 30,000 records per second.  This new ETL is a good bit slower, both because of a difference in the file structure of what I&apos;m parsing, and because I have seven dimensions that I am doing foreign key lookups in.  There is lots of room for optimization in this ETL too though.  &lt;br /&gt;&lt;br /&gt;It is somewhat difficult to optimize the throughput of the transformation for a headless server or when running a clustered transformation in Kettle.  
Pentaho is supposed to be coming out with some new management tools that will hopefully streamline things there.&lt;br /&gt;&lt;br /&gt;One of the interesting things I ran into was the fact that because Kettle runs each step in a separate thread and these steps are passing around rows of data as array objects, certain server class hardware can actually perform much slower than desktop class hardware.&lt;br /&gt;A case in point:  a very simple transformation that does nothing more than generate several million records of data and pass them through a few steps can run at more than 700,000 records per second on my MacBook Pro with a 2.5 GHz Intel Core 2 Duo processor.  The exact same transformation running on an HP blade with dual quad core 2.5 GHz Intel Xeon processors and 16 GB of ECC memory tops out at about 350,000 records per second.  Let me tell you, that was pretty depressing to witness!  Of course, the saving grace here is that when there is a lot more work to be done than just passing pages of memory around between cores, the server can do a lot more work, faster.  That is another thing that I&apos;m hoping some R&amp;amp;D at Pentaho is going to help solve.&lt;p class=&quot;scribefire-powered&quot;&gt;Powered by &lt;a href=&quot;http://www.scribefire.com/&quot;&gt;ScribeFire&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://daniele.livejournal.com/74074.html</comments>
  <category>vertica</category>
  <category>kettle</category>
  <category>pentaho</category>
  <category>metrics</category>
  <category>data warehousing</category>
  <category>mozilla</category>
  <category>business intelligence</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/73605.html</guid>
  <pubDate>Wed, 20 Aug 2008 21:45:45 GMT</pubDate>
  <title>A post about personal data</title>
  <link>http://daniele.livejournal.com/73605.html</link>
  <description>Mitchell Baker, the Chairperson of Mozilla Foundation and Mozilla Corporation, recently posted a series of blog entries about data:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a title=&quot;Thinking About Data&quot; href=&quot;http://blog.lizardwrangler.com/2008/07/21/thinking-about-data/&quot;&gt;Thinking About Data&lt;/a&gt;&lt;br /&gt;&lt;li&gt;&lt;a title=&quot;Framework for discussing “data”&quot; href=&quot;http://blog.lizardwrangler.com/2008/07/21/framework-for-discussing-data/&quot;&gt;Framework for discussing “data”&lt;/a&gt;&lt;br /&gt;&lt;li&gt;&lt;a title=&quot;Why focus on data?&quot; href=&quot;http://blog.lizardwrangler.com/2008/07/22/why-focus-on-data/&quot;&gt;Why focus on data?&lt;/a&gt;&lt;br /&gt;&lt;li&gt;&lt;a title=&quot;Data Relating to People&quot; href=&quot;http://blog.lizardwrangler.com/2008/07/23/data-relating-to-people/&quot;&gt;Data Relating to People&lt;/a&gt;&lt;br /&gt;&lt;li&gt;&lt;a title=&quot;Data — getting to the point&quot; href=&quot;http://blog.lizardwrangler.com/2008/07/24/data-getting-to-the-point/&quot;&gt;Data — getting to the point&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This discussion is something I&apos;ve been looking forward to seeing at Mozilla since I started back in March.  In the work that I do, I make every effort to safeguard data and make sure that what I process and store can&apos;t turn around and bite me later.&lt;br /&gt;&lt;br /&gt;One thing that I felt could use a different approach is the listing of the different forms of personal data that people are likely to generate or come across in the web world.&lt;br /&gt;&lt;br /&gt;To me, the best way to categorize these types of personal data is with a matrix.  I&apos;ve created one below that has the origin of the data as the rows and the classification of the data as the columns.  
Inside each cell, I&apos;ve placed a few examples that I think represent that intersection of data.&lt;br /&gt;&lt;br /&gt;I&apos;d encourage anyone interested in this to comment on other origins, classifications, or examples of personal data.  The more we have defined, the easier it will be to make sure that our discussions about data don&apos;t leave anything out.&lt;br /&gt;&lt;br /&gt;I&apos;ve also saved this document on docs.google.com (&lt;a href=&quot;http://docs.google.com/Doc?id=dhjztbd8_147fgqkz6fq&quot;&gt;Personal data types matrix&lt;/a&gt;).&lt;br /&gt;If anyone wishes to collaborate with me on enhancing it, please just let me know in the comments and I&apos;ll send you a collaboration invitation.&lt;br /&gt;&lt;br /&gt;&lt;table width=&quot;719&quot; border=&quot;1&quot; bordercolor=&quot;#000000&quot; cellpadding=&quot;4&quot; cellspacing=&quot;0&quot; style=&quot;page-break-before: always; page-break-inside: avoid&quot;&gt; 	&lt;col width=&quot;67&quot;&gt; 	&lt;col width=&quot;143&quot;&gt; 	&lt;col width=&quot;161&quot;&gt; 	&lt;col width=&quot;139&quot;&gt; 	&lt;col width=&quot;167&quot;&gt; 	&lt;tr valign=&quot;TOP&quot;&gt; 		&lt;td width=&quot;67&quot;&gt; 			&lt;p&gt;&lt;br&gt; 			&lt;/p&gt; 		&lt;/td&gt; 		&lt;td colspan=&quot;2&quot; width=&quot;312&quot;&gt; 			&lt;p align=&quot;CENTER&quot;&gt;Identifying&lt;/p&gt; 		&lt;/td&gt; 		&lt;td colspan=&quot;2&quot; width=&quot;314&quot;&gt; 			&lt;p align=&quot;CENTER&quot;&gt;Characterizing&lt;/p&gt; 		&lt;/td&gt; 	&lt;/tr&gt; 	&lt;tr valign=&quot;TOP&quot;&gt; 		&lt;td width=&quot;67&quot;&gt; 			&lt;p&gt;&lt;br&gt; 			&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;143&quot;&gt; 			&lt;p align=&quot;CENTER&quot;&gt;Potential&lt;a class=&quot;sdfootnoteanc&quot; name=&quot;sdfootnote1anc&quot; href=&quot;#sdfootnote1sym&quot;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;161&quot;&gt; 			&lt;p align=&quot;CENTER&quot;&gt;Definite&lt;/p&gt; 		&lt;/td&gt; 		
&lt;td width=&quot;139&quot;&gt; 			&lt;p align=&quot;CENTER&quot;&gt;Self&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;167&quot;&gt; 			&lt;p align=&quot;CENTER&quot;&gt;Relationships&lt;/p&gt; 		&lt;/td&gt; 	&lt;/tr&gt; 	&lt;tr valign=&quot;TOP&quot;&gt; 		&lt;td width=&quot;67&quot;&gt; 			&lt;p&gt;Elicited&lt;a class=&quot;sdfootnoteanc&quot; name=&quot;sdfootnote2anc&quot; href=&quot;#sdfootnote2sym&quot;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;143&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Name/Address (partial)&lt;/p&gt; 			&lt;p&gt;IP address&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;161&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Contact information 			(comprehensive)&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;SSN&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;E-mail address&lt;a class=&quot;sdfootnoteanc&quot; name=&quot;sdfootnote3anc&quot; href=&quot;#sdfootnote3sym&quot;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Blog URL&lt;/p&gt; 			&lt;p&gt;Credit card information&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;139&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Demographics&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Location&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Interests&lt;/p&gt; 			&lt;p&gt;Website filters&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;167&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Friend invitations&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Friends list&lt;/p&gt; 			&lt;p&gt;Friends watched/followed&lt;/p&gt; 		&lt;/td&gt; 	&lt;/tr&gt; 	&lt;tr valign=&quot;TOP&quot;&gt; 		&lt;td width=&quot;67&quot;&gt; 			&lt;p&gt;Published&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;143&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Blog posts&lt;a 
class=&quot;sdfootnoteanc&quot; name=&quot;sdfootnote4anc&quot; href=&quot;#sdfootnote4sym&quot;&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt; 			&lt;p&gt;&lt;br&gt; 			&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;161&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;PGP key&lt;/p&gt; 			&lt;p&gt;Contact information (comprehensive)&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;139&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Interests&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Blog posts&lt;a class=&quot;sdfootnoteanc&quot; name=&quot;sdfootnote5anc&quot; href=&quot;#sdfootnote5sym&quot;&gt;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt; 			&lt;p&gt;Wishlists&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;167&quot;&gt; 			&lt;p&gt;Friends list&lt;/p&gt; 		&lt;/td&gt; 	&lt;/tr&gt; 	&lt;tr valign=&quot;TOP&quot;&gt; 		&lt;td width=&quot;67&quot;&gt; 			&lt;p&gt;Harvested&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;143&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;cookies&lt;/p&gt; 			&lt;p&gt;Personal search terms&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;161&quot;&gt; 			&lt;p&gt;&lt;br&gt; 			&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;139&quot;&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;Extrapolated interests&lt;/p&gt; 			&lt;p style=&quot;margin-bottom: 0.2in&quot;&gt;clickstream in site&lt;/p&gt; 			&lt;p&gt;Web history&lt;/p&gt; 		&lt;/td&gt; 		&lt;td width=&quot;167&quot;&gt; 			&lt;p&gt;People watched/followed&lt;/p&gt; 		&lt;/td&gt; 	&lt;/tr&gt; &lt;/table&gt; &lt;p style=&quot;margin-bottom: 0in&quot;&gt;&lt;br&gt; &lt;/p&gt; &lt;div&gt; 	&lt;p class=&quot;sdfootnote&quot; style=&quot;margin-bottom: 0.2in&quot;&gt;&lt;a class=&quot;sdfootnotesym&quot; name=&quot;sdfootnote1sym&quot; href=&quot;#sdfootnote1anc&quot;&gt;1&lt;/a&gt;Multiple 	pieces of potential identifying information are usually needed to 	make definite identification or direct contact&lt;/p&gt; &lt;/div&gt; 
&lt;div&gt; 	&lt;p class=&quot;sdfootnote&quot; style=&quot;margin-bottom: 0.2in&quot;&gt;&lt;a class=&quot;sdfootnotesym&quot; name=&quot;sdfootnote2sym&quot; href=&quot;#sdfootnote2anc&quot;&gt;2&lt;/a&gt;Data 	may be elicited as a requirement for interaction with the data 	collector (e.g. IP address required to view a web page or shipping 	information required for a purchase) or it may be optional (e.g. a 	blog comment form requesting your URL).&lt;/p&gt; &lt;/div&gt; &lt;div&gt; 	&lt;p class=&quot;sdfootnote&quot; style=&quot;margin-bottom: 0.2in&quot;&gt;&lt;a class=&quot;sdfootnotesym&quot; name=&quot;sdfootnote3sym&quot; href=&quot;#sdfootnote3anc&quot;&gt;3&lt;/a&gt;E-mail 	address is a definite identification because it immediately allows a 	person to contact you directly&lt;/p&gt; &lt;/div&gt; &lt;div&gt; 	&lt;p class=&quot;sdfootnote&quot; style=&quot;margin-bottom: 0.2in&quot;&gt;&lt;a class=&quot;sdfootnotesym&quot; name=&quot;sdfootnote4sym&quot; href=&quot;#sdfootnote4anc&quot;&gt;4&lt;/a&gt;Blog 	posts talking about who you are or where you live are potentially 	identifying.&lt;/p&gt; &lt;/div&gt; &lt;div&gt; 	&lt;p class=&quot;sdfootnote&quot; style=&quot;margin-bottom: 0.2in&quot;&gt;&lt;a class=&quot;sdfootnotesym&quot; name=&quot;sdfootnote5sym&quot; href=&quot;#sdfootnote5anc&quot;&gt;5&lt;/a&gt;Blog 	posts talking about topics that interest you or things you do are 	characterizing.&lt;/p&gt; &lt;/div&gt;&lt;/ul&gt;</description>
  <comments>http://daniele.livejournal.com/73605.html</comments>
  <category>privacy</category>
  <category>data</category>
  <category>mozilla</category>
  <lj:mood>working</lj:mood>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/73325.html</guid>
  <pubDate>Tue, 29 Jul 2008 16:27:21 GMT</pubDate>
  <title>Mozilla 2008 Summit</title>
  <link>http://daniele.livejournal.com/73325.html</link>
  <description>I&apos;m in Whistler, B.C., Canada attending the Mozilla 2008 Summit.  It is a huge crowd of people.  Should be lots of fun.  More later.&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/73325.html</comments>
  <category>work</category>
  <category>mozilla</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/72983.html</guid>
  <pubDate>Fri, 11 Jul 2008 20:45:56 GMT</pubDate>
  <title>Finally. It took work to get this apache front-end configured properly</title>
  <link>http://daniele.livejournal.com/72983.html</link>
  <description>I&apos;ve spent most of the day trying to get an Apache 2.2 server set up to do both LDAP authentication and AJP proxying to a Tomcat back-end.&lt;br /&gt;&lt;br /&gt;The trickiest part was translating the directives from Apache 2.0&apos;s implementation of auth_ldap.&lt;br /&gt;&lt;br /&gt;In 2.0, the following directory directives were needed to do group based authentication:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;        AuthType Basic&lt;br /&gt;        AuthName      &quot;Use your ldap username/password&quot;&lt;br /&gt;        AuthLDAPBindDN xxx&lt;br /&gt;        AuthLDAPBindPassword xxx&lt;br /&gt;        AuthLDAPURL ldap://server/o=xxx?xxx&lt;br /&gt;        Require group cn=xxx,ou=groups&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;However, in 2.2, the syntax changed slightly and the following was what it took for me to get it going:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;        AuthType Basic&lt;br /&gt;        AuthName      &quot;Use your ldap username/password&quot;&lt;br /&gt;        #AuthBasicProvider defaults to file so it is required if you aren&apos;t loading mod_authn_file&lt;br /&gt;        AuthBasicProvider ldap&lt;br /&gt;        AuthLDAPBindDN xxx&lt;br /&gt;        AuthLDAPBindPassword xxx&lt;br /&gt;        AuthLDAPURL ldap://server/o=xxx?xxx&lt;br /&gt;        #It is now ldap-group instead of just group&lt;br /&gt;        Require ldap-group cn=xxx, ou=groups&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Before I put the AuthBasicProvider ldap directive in place, I was getting an error in the logs:&lt;br /&gt;&lt;code&gt;configuration error:  couldn&apos;t check user.  No user file?: /&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It took me longer to figure out the ldap-group vs group problem.  
In the logs, I was seeing incorrect password attempts being logged properly, but if I typed the right password, there was nothing in the log and the authorization dialog was just redisplayed in the browser.&lt;br /&gt;I imagine that if I tweaked some log settings somewhere, I&apos;d find that it was possible to see the Require directive failing.&lt;br /&gt;&lt;br /&gt;Also note that most auth_ldap examples show putting the directives in a &amp;lt;Directory&amp;gt; section.  Well, if you are using mod_proxy or mod_proxy_ajp, there is no directory, so you put the auth directives in a &amp;lt;Location /&amp;gt; section instead.</description>
  <comments>http://daniele.livejournal.com/72983.html</comments>
  <category>work</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/72816.html</guid>
  <pubDate>Thu, 03 Jul 2008 14:16:14 GMT</pubDate>
  <title>Status update</title>
  <link>http://daniele.livejournal.com/72816.html</link>
  <description>I released the first alpha of my project, and I&apos;ve been pretty happy with the results so far.  I&apos;ve gotten some good feedback, and there is lots more work to be done.&lt;br /&gt;&lt;br /&gt;The project is based on Pentaho and uses a Vertica cluster as the DB backend.  I&apos;ve gotten pretty amazing results out of the combination.&lt;br /&gt;&lt;br /&gt;I&apos;ve been spending a lot of time working with two community additions to Pentaho, the &lt;a href=&quot;http://wiki.pentaho.com/display/COM/CBF+-+Community+Build+Framework&quot;&gt;Community Build Framework (CBF)&lt;/a&gt; and the &lt;a href=&quot;http://wiki.pentaho.com/display/COM/Community+Dashboard+Framework&quot;&gt;Community Dashboard Framework (CDF)&lt;/a&gt;.  These two amazing projects are being driven by &lt;a href=&quot;http://webdetails.pt/&quot;&gt;Pedro Alves&lt;/a&gt;, a BI consultant specializing in Pentaho.  They have really allowed my project to move along rapidly in the direction I wanted to take it.&lt;br /&gt;&lt;br /&gt;The other exciting thing I hope to blog about further in the near future is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Choropleth_map&quot;&gt;choropleth map&lt;/a&gt; I managed to implement in Pentaho.  It was based on an &lt;a href=&quot;http://crschmidt.net/mapping/choropleth.html&quot;&gt;example&lt;/a&gt; from &lt;a href=&quot;http://crschmidt.net/&quot;&gt;Chris Schmidt&lt;/a&gt;.  While writing this post, I discovered that he lives nearby. I think I might have to take him out to lunch as a treat for the help he&apos;s given me. :)&lt;br /&gt;&lt;br /&gt;I need to look into integrating the &lt;a href=&quot;http://simile.mit.edu/timeplot/&quot;&gt;Simile Timeplot&lt;/a&gt; widget into my Pentaho dashboards.  I really need the ability to provide rich annotations for momentary or duration events.&lt;br /&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/72816.html</comments>
  <category>work</category>
  <category>kettle</category>
  <category>pentaho</category>
  <category>business intelligence</category>
  <category>vertica</category>
  <category>mozilla</category>
  <category>data warehousing</category>
  <category>metrics</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/72505.html</guid>
  <pubDate>Tue, 17 Jun 2008 01:28:22 GMT</pubDate>
  <title>Nyah, I didn&apos;t really want to release today.</title>
  <link>http://daniele.livejournal.com/72505.html</link>
  <description>So I was doing final cleanup and preparation for releasing the first version of my project to a wider audience at work, and one of my machines developed a bad case of fingers in ears.  By that I mean it stopped listening to its cluster brothers and then it stopped listening to me.  The load average kept going higher and higher, and while I was able to log in to the machine, commands I issued would frequently just hang, and no amount of breaking or kill -9 would help.&lt;br /&gt;&lt;br /&gt;When I gave it up for lost and tried the reboot command, I discovered to my chagrin that it didn&apos;t work either.&lt;br /&gt;&lt;br /&gt;So, until one of the IT people (who are all swamped with FF3 release stuff) can get around to logging in to the console and killing it, I&apos;m going to have to find other things to occupy myself with. :/&lt;br /&gt;&lt;br /&gt;I did come across this interesting comment on linuxquestions.org talking about what can cause a process to disregard kill -9:&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://www.linuxquestions.org/questions/linux-general-1/what-to-do-when-kill-9-pid-doesnt-work-641497/#post3151549&quot;&gt;What to do when kill -9 doesn&apos;t work&lt;/a&gt;&lt;br /&gt;</description>
  <comments>http://daniele.livejournal.com/72505.html</comments>
  <category>work</category>
  <category>vertica</category>
  <category>linux</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/72414.html</guid>
  <pubDate>Tue, 10 Jun 2008 14:24:45 GMT</pubDate>
  <title>Hopefully this will be a thing of the past after Firefox 3 releases...</title>
  <link>http://daniele.livejournal.com/72414.html</link>
  <description>&lt;a href=&quot;http://icanhascheezburger.com/2008/06/10/funny-pictures-restore-session-yn/&quot;&gt;Firefox Crash « Lolcats ‘n’ Funny Pictures of Cats - I Can Has Cheezburger?&lt;/a&gt;</description>
  <comments>http://daniele.livejournal.com/72414.html</comments>
  <category>work</category>
  <category>mozilla</category>
  <category>firefox</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>http://daniele.livejournal.com/72109.html</guid>
  <pubDate>Wed, 04 Jun 2008 03:41:33 GMT</pubDate>
  <title>It is so easy to ignore the facts...</title>
  <link>http://daniele.livejournal.com/72109.html</link>
  <description>I watched this movie about consumerism.  It is depressing how deep in the throes of it I am.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://www.storyofstuff.com/&quot;&gt;The Story of Stuff with Annie Leonard&lt;/a&gt;</description>
  <comments>http://daniele.livejournal.com/72109.html</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
</channel>
</rss>
