Pierre’s Posterous

 

Fuck bookmarks

I've gotten into the habit of reading books without the use of bookmarks, and if you know me at all you'll know that I cringe at any crease found in a book – so dog-earing is not an option. Instead I just try to remember the last page I read by coming up with a crazy association between the digits of the page number. These usually involve taking two of the digits and getting to the third. Here are a few examples:

84: easy, all I have to remember is that 4 is in the "second" (2) position, and 8/2 = 4
651: 6 and 5 are separated by 1
265: 25 = 5^2 and 6 - 5 = 1 and 25 + 1 = 26 (i.e. 265 = (5^2) + (6-5) [prepended to] 5)
850: there are 3 numbers total and 5+3 = 8
135: just remember 3 and 5 and that 3 is in the middle of their product (i.e. 3 is in the middle of 3*5)

That last one was the last page I just finished reading in my current book (Of Time and the River, Thomas Wolfe) but the rest I had randomly generated by Ruby.
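The generating script itself isn't shown, so here is only a minimal sketch of how a few random page numbers could be produced in Ruby. The method name `random_pages`, the 999-page cap, and the optional seed are assumptions made for illustration.

```ruby
# Generate a handful of random page numbers for a book of up to 999 pages.
# The seed is optional and only makes the output repeatable.
def random_pages(count, max_page: 999, seed: nil)
  rng = seed ? Random.new(seed) : Random.new
  Array.new(count) { rng.rand(1..max_page) }
end

puts random_pages(4).join(', ')
```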

You'll notice that most of these associations are loose and contrived, and therefore probably harder to remember than the actual number. I think the act of coming up with the association is what really makes it stick. Also, I don't usually read books with more than 1,000 pages, so my contrivances only have to deal with 3-digit numbers. It also helps that I usually have a fairly good idea of what page I was on: I won't think I'm on page 800 when I've only been reading the book for a few days. And of course you only have to remember the number/association for a few days at most if you read regularly.

I'm always trying to make myself rely on fewer things if possible (and fun). This is just one way I get it done.


Filed under // literature math

Dealing personally with mechanical turk workers

I recently wrote about some ways of dealing with Mechanical Turk workers through Amazon's APIs, but sooner or later you're going to have to deal with these people on a more personal level. Lately I've been getting emails directly from Mechanical Turk workers asking why some of their jobs are being rejected.

The most useful tool when dealing with these kinds of inquiries is good logging. Every time I reject or approve an assignment I log the worker ID, assignment ID, HIT ID, HIT type ID, some representation of the answer given, and the reason for rejection if it was rejected. The Ruby standard library provides a great logging tool called Logger, which I use for all my logging needs. The ruby-aws library also does its own logging, which is really helpful. You can set the log path to wherever you want and add daily rotation with the following monkey patch:

<code>
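require 'ruby-aws'
require 'logger'
# Point ruby-aws's logger at a custom path with daily rotation.
module Amazon::Util::Logging
  @@AmazonLogger = Logger.new '/some/path', 'daily'
end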
</code>
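For the per-decision logging described above, here is a minimal sketch using only the standard library's Logger. The method name `log_decision`, the field names, and the key=value line format are illustrative assumptions, not the format the author actually used.

```ruby
require 'logger'
require 'stringio'

# Write one line per approve/reject decision so a worker's question can be
# answered later by searching the log for their worker ID or assignment ID.
def log_decision(logger, decision:, worker_id:, assignment_id:, hit_id:,
                 hit_type_id:, answer:, reason: nil)
  fields = {
    decision: decision, worker: worker_id, assignment: assignment_id,
    hit: hit_id, hit_type: hit_type_id, answer: answer.inspect
  }
  fields[:reason] = reason.inspect if reason
  logger.info(fields.map { |key, value| "#{key}=#{value}" }.join(' '))
end

# Example with an in-memory log; pass a file path and 'daily' to
# Logger.new for a rotating log file instead.
io = StringIO.new
logger = Logger.new(io)
log_decision(logger, decision: :rejected, worker_id: 'W1', assignment_id: 'A1',
             hit_id: 'H1', hit_type_id: 'T1', answer: 'asdf',
             reason: 'answer is not a description')
puts io.string
```

Keeping every field on one line means a single grep for a worker ID turns up everything needed to answer their email.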

I stressed the importance of being fair to workers, and these logs only help that cause. Remember that some of these people rely on this service for a good part of their livelihoods, so needlessly rejecting assignments could greatly hurt them, and when you're automatically rejecting their work you're bound to make some mistakes. With good logging you'll be able to refine your automatic rejections to reduce false positives, clarify instructions on jobs, and give these workers better explanations for incorrect answers. Hope this helps!

Filed under // mechturk

That explains it!

Anatomy of a Moon Halo

The ring that appears around the moon arises from light passing through six-sided ice crystals high in the atmosphere. These ice crystals refract, or bend, light in the same manner that a camera lens bends light. The ring has a radius of about 22°, and sometimes, if you are lucky, it is also possible to detect a second, fainter ring with a radius of about 46°. Thin high cirrus clouds lofting at 20,000 feet or more contain tiny ice crystals that originate from the freezing of supercooled water droplets. These crystals behave like jewels refracting and reflecting in different directions.

Cloud crystals are varieties of hexagonal prisms, (6 sides) and range in shapes from long columns to thin plate-like shapes that have different face sizes.

Saw the moon with a halo tonight and was wondering why. 'Cause knowledge is power!


Penny-Arcade's still got it

Usually their funnier strips are not about video games at all.


Tips for working with mechanical turk

For those who haven't heard, Mechanical Turk is a "marketplace that enables computer programs to co-ordinate the use of human intelligence to perform tasks which computers are unable to do" (via Wikipedia). You can think of it as an online wanted ad page where the jobs look something like "I'll give you 2¢ if you tell me if there is a dog in this picture: [picture that may or may not contain a dog]." The basic idea is that while computers are really good at a lot of things (modular arithmetic, matrix operations) they are also really terrible at a bunch of things people happen to be good at (pattern recognition, anything involving social context). You can read more about it on the wikipedia page or at the main page.

I've been using said service to describe some pictures I'm using in my current project and I have to say that I'm very happy with the service. It would take me hours to do what the service was able to do in just a few minutes and I can focus on the guts of the project instead of the bitch work (no offense mechturk workers!). However, I quickly found that you can't always trust the answers provided.

I'm not sure if it's because some people are trying to take advantage of the system by using bots to answer questions (which obviously goes against the whole idea of the service) or because some of the workers are just lazy and hope you won't notice that you're paying them to do nothing. Either way, it comes down to really terrible answers. Luckily, there are a few ways to prevent them.

Job properties

The first thing you're going to want to do is set your qualifications high. I would suggest requiring an approval rating of at least 80% (meaning only workers with an approval rating of 80% or higher can answer your questions). Also set a low assignment duration, something like 15 minutes. The assignment duration is the amount of time the worker has to answer your question once he/she has started it.

I got these tips from one of the people that worked on some of my jobs. Setting these qualifications will make it more difficult for bots to answer your jobs. Apparently some of these bots grab 400+ jobs before they start working through them. This blocks legitimate workers from taking those jobs (and giving you good results). However, with a 15 minute duration if the bot can't get to your job in less than 15 minutes then the job is released and the bot is no longer able to work on it.
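As a rough sketch, these two settings could be collected into a Hash before creating a job. The parameter names follow the Mechanical Turk CreateHIT operation as I recall it, and the qualification type ID for "percent of assignments approved" is an assumption; check both against the current API reference before using them. The method name `hit_properties` is made up for this example.

```ruby
# System qualification ID for "percent of assignments approved".
# This value is an assumption; verify it in the Mechanical Turk docs.
PERCENT_APPROVED_QUALIFICATION = '000000000000000000L0'

# Build the job properties discussed above: a minimum approval rating
# and a short assignment duration.
def hit_properties(min_approval_percent: 80, duration_minutes: 15)
  {
    :AssignmentDurationInSeconds => duration_minutes * 60,
    :QualificationRequirement => {
      :QualificationTypeId => PERCENT_APPROVED_QUALIFICATION,
      :Comparator => 'GreaterThanOrEqualTo',
      :IntegerValue => min_approval_percent
    }
  }
end

p hit_properties
```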

Automated rejections

If you're releasing 300+ jobs at a time you're going to want to automate your rejections. Before releasing the jobs make sure you have some ways of automatically rejecting ridiculous answers. You also want to stress the criteria for rejection in the instructions for the job so that you have something to point at in the rejection comments.

Stressing the criteria for rejection also means fewer false positives. I've rejected a few submissions that just missed the criteria and felt pretty terrible about it.
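A minimal sketch of what automated rejection might look like for free-text answers. The three criteria are made-up examples; real ones depend on the job and should match what the instructions tell workers.

```ruby
# Each criterion pairs a reason (reusable as the rejection comment) with a
# check that returns true when the answer should be rejected.
REJECTION_CRITERIA = [
  ['answer is empty', ->(text) { text.strip.empty? }],
  ['answer has fewer than 3 words', ->(text) { text.split.size < 3 }],
  ['answer repeats a single word', ->(text) { text.downcase.split.uniq.size == 1 }]
].freeze

# Returns the first matching rejection reason, or nil if the answer passes.
def rejection_reason(text)
  match = REJECTION_CRITERIA.find { |_reason, check| check.call(text) }
  match && match.first
end

p rejection_reason('dog dog dog')
p rejection_reason('a brown dog lying on the grass')
```

Keeping the reason next to the check gives you a ready-made rejection comment that points back at the instructions.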

Multiple assignments

Mech Turk gives you the option of allowing multiple submissions for any job. Use this feature to get information from many different people. I figured that at least 1/3 of the submissions were legitimate and so have started allowing 3 submissions for each job. Also, for every rejected submission I just add another assignment (which allows another submission) and hope that the new one will yield better results.
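The post doesn't say how the multiple submissions get combined. For short, categorical answers (like the dog/no-dog example), one option is a simple agreement check, sketched below. The method name `consensus` and the two-vote threshold are assumptions.

```ruby
# Returns the most common normalized answer if at least `min_votes`
# submissions agree on it, otherwise nil (a cue to add another assignment).
def consensus(answers, min_votes: 2)
  counts = answers.each_with_object(Hash.new(0)) do |answer, tally|
    tally[answer.strip.downcase] += 1
  end
  answer, votes = counts.max_by { |_answer, count| count }
  votes && votes >= min_votes ? answer : nil
end

p consensus(['Dog', 'dog ', 'cat'])
p consensus(['dog', 'cat', 'bird'])
```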

Decoys

For those of you with a budget too small for 3 assignments per job try setting up a decoy question for each job. This is just a question that you already know the answer to and so can automatically reject any submissions that miss this question. I've stopped using this method because it feels a little mean-spirited and could lead to increased false-positives, but generally it worked.

Silly requests

One way to make sure that the instructions are carefully read is to include a silly request with each answer. For free response answers it can be something like "make sure the last word in your response is 'watermelon'." I got this idea from Van Halen. Again this could increase false-positives, but it will make sure that the workers are reading carefully.
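Checking the silly request automatically is straightforward. Here's a minimal sketch, assuming the requested word is 'watermelon' and that case and trailing punctuation shouldn't count against the worker (to keep false positives down).

```ruby
# True when the final word of the response, ignoring case and punctuation,
# is the requested word.
def follows_silly_request?(response, word = 'watermelon')
  last = response.split.last.to_s.downcase.gsub(/[^a-z]/, '')
  last == word
end

p follows_silly_request?('A dog in a park. Watermelon.')
p follows_silly_request?('A dog in a park.')
```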

Check spelling

Plan on checking the spelling of words in answers because people tend to make plenty of mistakes. In my last batch of jobs I had a total of 632 submissions from workers, the sum of all words in the answers for all the submissions was 6,233 and 2,182 of these were misspelled. That means that 35% of the words used to answer my questions did not exist. Again, be warned: check your workers' spelling. Here's a simple script written in Ruby to get you started.
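As a rough illustration of the idea (this is not the script linked above), a word-list check might look like the following. The method name `misspelling_stats` and the tiny inline dictionary are assumptions for the example.

```ruby
require 'set'

# Count how many words in the answers are missing from the dictionary.
# Returns [misspelled_count, total_count].
def misspelling_stats(answers, dictionary)
  words = answers.flat_map { |answer| answer.downcase.scan(/[a-z']+/) }
  misspelled = words.reject { |word| dictionary.include?(word) }
  [misspelled.size, words.size]
end

# A tiny inline dictionary for illustration; in practice load a word list,
# for example one word per line from a file such as /usr/share/dict/words.
dictionary = Set.new(%w[a dog is in the picture brown])
misspelled, total = misspelling_stats(['A brown dog', 'teh dog is in the pictur'], dictionary)
puts "#{misspelled} of #{total} words (#{(100.0 * misspelled / total).round}%) look misspelled"
```

A plain word list will flag proper nouns and slang as misspellings, so treat the count as a signal for review rather than grounds for rejection on its own.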

Ask for feedback

I would suggest that you put in a non-required question asking workers for feedback. And don't just throw those answers away: take a look at all of them and take them into account when pushing out the next batch of jobs. You never know what kinds of problems your instructions have until you have to act on those instructions with no knowledge of the purpose of the job.

Conclusion

In the end you want to be as fair to the legitimate workers as possible. This means reducing false-positives, asking good questions, and listening to their feedback. Also when starting out I'd suggest you set a long auto-approve time (I started with 3 days) to give you time to adequately review and save the submissions. Anyway I hope this helps!


Filed under // mechturk

Software I use

Just wanted to share a little list of software and services that make my life easier. Slightly categorized but in no particular order.

Blogging/Lifestreaming

Entertainment

Software Development

Productivity

Other

Filed under // software

Mechanical Turk GetFileUploadURL with ruby-aws

Didn't see this documented anywhere. Install ruby-aws with 'gem install ruby-aws'. If you are using soap4r and ruby 1.9 you may need to follow these instructions if they haven't updated soap4r yet.

In order to get the file uploaded as an answer to a Mech Turk request do the following:

<code>
require 'ruby-aws'

@mturk = Amazon::WebServices::MechanicalTurkRequester.new :Host => :Prod # Setup your mturk object

assignments = @mturk.getAssignmentsForHITAll( :HITId => id, :AssignmentStatus => 'Submitted') # id is the HIT id

assignments.each do |assignment|
  answer = @mturk.simplifyAnswer(assignment[:Answer])
  download_link = @mturk.getFileUploadURL(:AssignmentId => assignment[:AssignmentId], :QuestionIdentifier => 'the question id')
end
</code>

"download_link" will be a Hash that looks like {:Request=>{:IsValid=>"True"}, :FileUploadURL=>"a really long url pointing to somewhere in s3"}
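To get from that Hash to the URL itself, something like the following should do. This is a minimal sketch that assumes the response always has the shape shown above; the method name `file_upload_url` is made up for the example.

```ruby
# Pull the S3 URL out of a getFileUploadURL response, or return nil when
# the request was not valid. The hash shape is the one shown above.
def file_upload_url(response)
  return nil unless response.dig(:Request, :IsValid) == 'True'
  response[:FileUploadURL]
end

sample = { :Request => { :IsValid => 'True' }, :FileUploadURL => 'a really long url pointing to somewhere in s3' }
puts file_upload_url(sample)
```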

What got me was the ":QuestionIdentifier" parameter to getFileUploadURL. Why would they make it ":QuestionIdentifier" when the other parameter is ":AssignmentId"? It seems silly to use the abbreviation for one and not the other. Anyway this works for me and I hope it works for you.

Filed under // aws howto mechturk ruby

Starting with Pressure

I've recently discovered the importance of pressure to the starting of a startup. I applied for funding from YCombinator and hoped for the best. My friends and I had a huge list of beautiful features for the product that would meet the lofty goals we had set. We figured we'd have something to show people by January and maybe even make some money off the product in about a year. Things were going great and we were working hard and slowly checking things off our "it's ready" to-do list.

We didn't get the money.

I quickly realized that a long wishlist of features isn't going to make us any money or impress anyone. We needed to make money now and so we needed to ship now. We've since completely destroyed that wishlist (and with it a bunch of "cool" features), planned to ship much sooner, and taken the project in a whole new direction.

I've always been a fan of the Agile method (always have a working product) and can't imagine truly using it without this kind of pressure. It almost seems like the Waterfall method (never have a working product) came out of a lack of pressure.

So what sorts of pressures cause startups to automatically go into "agile mode"? The most obvious to me is making money. If you have no other source of income, then your product has to make you money. This pressure is probably the most difficult to cope with these days. Not only does this pressure make you work much harder, it also forces you to get real creative. How do you get your users to pay for something that only does X? Maybe you need a new business model. Besides providing the pressure necessary to be agile, many believe that making money early leads to greater success later.

Another is pleasing your investors. This doesn't necessarily mean making the product/company profitable though. The Facebook and Twitter investors seem to be more than happy dishing out a few million every year or so. Of course this kind of pressure won't set in until a little later in the startup's life, and I'd be lying if I said I knew anything about that.

The other pressure that is important to early-stage startups is validation. Will it work? Will users like it? You won't find out until they use it. This isn't just important for the founders, but also for prospective investors. If you have something that people love and are willing to pay for then you'll probably have investors begging to give you money. I'm sure companies never stop feeling this pressure as long as they're constantly innovating and implementing new features.

Lack of funding has caused us to really feel the money and validation pressures: money because we all have loans to pay and stuff we don't need but really want, and validation because at least one investor doesn't think we have potential. Not getting that funding might be the best thing that's happened to this project, but it's way too early to tell. I'm just hoping this pressure forces us to make a better product rather than one that just looks rushed.

Filed under // startup

Absolutely Love Mockingbird's Registration Dialog

Go to gomockingbird.com (it'll take about a minute to load) and then go to "Save->Save as..." put in anything for the project name. It'll ask you to "Login or Register". Type in your email address and as soon as you tab into the password field it will recognize that you aren't a user and then change into a "register" dialog. Really sharp.

Not only are they using the whole "try before you buy" strategy, but the "buy" (i.e. signup) part is done really well. Really smart use of AJAX.

Also the product itself is fantastic. Mocked up my current little project's UI in about 5 minutes. Very responsive and easy to use.

Filed under // mockingbird software ui

Securing Couchdb

I recently managed to secure my couchdb server to my liking and I haven't seen any website that explains all the methods I used in one place, so I'm going to do that here. I'm assuming that you are running couchdb from localhost on port 5984 and all communication with couchdb will be done with curl and couchapp.

I am using couchdb version 0.11.0a, the one that comes with macports "couchdb-devel" port at the time of this writing. I installed it using

$sudo port install icu erlang spidermonkey curl couchdb-devel

Once everything is nice and installed edit the local.ini and default.ini files. If you installed like above they should be in /opt/local/etc/couchdb. If it's not there then it might be in /usr/local/etc/couchdb.

In local.ini add the line

authentication_handler = {couch_httpd_auth, cookie_authentication_handler}

under the "[httpd]" section. Uncomment the "secret = " line like so:

[couch_httpd_auth]
secret =  suparsecret

Then add an admin under the [admins] section like so:
[admins]
admin = pass

In default.ini add a new authentication handler under the [httpd] section so that the "authentication_handlers" list looks something like this:

authentication_handlers = {couch_httpd_auth, cookie_authentication_handler}, {couch_httpd_auth, default_authentication_handler}

Then point the "authentication_db" variable under the [couch_httpd_auth] section to the database of your choosing like this:

authentication_db = mydb

Make sure "require_valid_user" is set to false. Also note that if you have your database running then you'll need to restart it now for the rest to work correctly.

Create the database you are going to be authenticating users from:

$curl -X PUT http://admin:pass@localhost:5984/mydb

You should get the response {"ok":true}

Now we're going to make a basic user. Couchdb expects user passwords to be hashed using sha1, and I'm going to be using the following ruby script (called 'hash.rb') to do the hashing:
<code>
#!/usr/bin/env ruby
require 'digest/sha1'

print Digest::SHA1.hexdigest(ARGV.first)
</code>

First hash the password "mypassword":

$ruby hash.rb mypassword

should return 91dfd9ddb4198affc5c194cd8ce6d338fde470e2.

Create a user document using the following command:

$curl -X PUT http://admin:pass@localhost:5984/mydb/myuser -d '{"type":"user", "hashed_password":"91dfd9ddb4198affc5c194cd8ce6d338fde470e2"}'

You should get a response that looks like {"ok":true,"id":"myuser","rev":"1-90e61ef93d1bf2f691f64cb423126218"}
What that command does is create a new document in our mydb database with attributes "type" and "hashed_password". The password for myuser is "mypassword", but will be stored in the database hashed in case someone gets access.

Now we need to create a new design document to handle user authentication. For simplicity I'm going to be using couchapp. Create a new couchapp called "_auth" from the command line.

$couchapp generate _auth

Create a new view called "users". This means making a _auth/views/users/map.js file. What you want to do is map all user documents to a dictionary of "password_sha", "salt", "secret", and "roles" keyed by username. Mine looks like this:
<code>
function (doc) {
    if (doc.type == 'user') {
        emit(doc._id, {password_sha: doc.hashed_password, salt: "", secret: 'suparsecret', roles: ['user']});
    }
}
</code>
We didn't use a salt in creating the hashed password, so we just include an empty string for the salt but generally passwords should be salted and that's where it's done. Also notice that we used the secret "suparsecret" that we set in the local.ini file earlier.
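For completeness, here is a minimal sketch of salting the hash in Ruby. It assumes the server hashes the password followed by the salt, which is my understanding of how password_sha is computed but should be checked against your couchdb version. The method name `salted_hash` is made up for the example.

```ruby
require 'digest/sha1'
require 'securerandom'

# Hash a password together with a salt. The password-then-salt order is an
# assumption about how couchdb combines them; check it against your version.
def salted_hash(password, salt = SecureRandom.hex(16))
  { salt: salt, password_sha: Digest::SHA1.hexdigest(password + salt) }
end

p salted_hash('mypassword')
```

The user document would then need to store the salt alongside the hash, and the view would emit the stored salt instead of an empty string.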

We also want to add some validation so that non-users' action are restricted. Create a file in _auth called validate_doc_update.js with the following code

<code>
function (newDoc, oldDoc, user) {
    var isAdmin = (user.roles.indexOf('_admin') != -1);
    var isUser = (user.roles.indexOf('user') != -1);

    // must be admin or user to update any doc
    if (!isAdmin && !isUser) {
        throw({unauthorized: 'must be admin or user.'});
    }
}
</code>
You will probably want to add more stuff to this file later.

Push the design document to the server by executing the following from within the "_auth" directory

$couchapp push _auth http://admin:pass@localhost:5984/mydb

Now if you try to do something like create a new document without being authenticated you should get an error:

$curl -X PUT http://localhost:5984/mydb/dummy -d '{"type":"something", "info":"Some more information"}'
{"error":"unauthorized","reason":"must be admin or user."}

Now to get authenticated. I'm using the cookies authentication method to create sessions between clients and the couchdb server. All you have to do is POST to /_session with the username and password and you will be given an authentication token to use for later requests.

First authenticate, here is the command and some of the output you should see (note the new "-v" argument to curl, it's important!)

$curl -vX POST http://localhost:5984/_session -d 'username=myuser&password=mypassword'

* About to connect() to localhost port 5984 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 5984 (#0)
> POST /_session HTTP/1.1
> Host: localhost:5984
> Accept: */*
> Content-Length: 35
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 200 OK
< Set-Cookie: AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP; Version=1; Path=/; HttpOnly
< Server: CouchDB/0.11.0a (Erlang OTP/R13B)
< Date: Wed, 04 Nov 2009 22:08:38 GMT
< Content-Type: text/plain;charset=utf-8
< Content-Length: 12
< Cache-Control: must-revalidate
{"ok":true}
* Connection #0 to host localhost left intact
* Closing connection #0

The important part is the "Set-Cookie" header in the response. We're going to use that in future communications. Now we can add that new record:
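If you're scripting this instead of copying the token by hand, pulling it out of the header is a one-liner. This is a minimal sketch; the method name `auth_session_token` is made up, and the header value is the one from the output above.

```ruby
# Extract the AuthSession token from a Set-Cookie header value.
# Returns nil when the header carries no AuthSession cookie.
def auth_session_token(set_cookie)
  match = set_cookie.match(/AuthSession=([^;\s]+)/)
  match && match[1]
end

header = 'AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP; Version=1; Path=/; HttpOnly'
token = auth_session_token(header)
puts "Cookie: AuthSession=#{token}"
```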

$curl -vX PUT http://localhost:5984/mydb/dummy -d '{"type":"something", "info":"Some more information"}' -H "Cookie: AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP" -H "X-CouchDB-WWW-Authenticate: Cookie" -H "Content-Type: application/x-www-form-urlencoded"

{"ok":true,"id":"dummy","rev":"1-b2c3fd668db420b31478176059e2c7ff"}

You can also ask the server to return the authenticated user's username and roles with a GET to /_session

$curl -X GET http://localhost:5984/_session -H "Cookie: AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP" -H "X-CouchDB-WWW-Authenticate: Cookie" -H "Content-Type: application/x-www-form-urlencoded"

{"ok":true,"name":"myuser","roles":["user"]}

Conclusion

This authentication method may not be for everyone. Some may have a problem with the fact that anyone can see any document in the database. What I've described above only prevents certain types of updates to documents, i.e. writing to the database. Also, the validation described above is far from perfect, which is why I suggest adding some more logic (like not allowing users to delete other user accounts). There's a good wiki on turning your couchdb into a translucent database so that the whole "anyone can see my db" issue is not such a big problem. Supposedly it is possible to delete a session by sending /_session a DELETE with the correct cookie header, but it hasn't worked for me so far. Anyway, I hope this helps!

Filed under // couchdb howto