Podcast #42

Summary: This is the 42nd episode of the StackOverflow podcast, where Joel and Jeff discuss ethical email, backup strategies, how to learn new programming languages, and dealing with underperforming developers.

The Conversations Network, a non-profit organization that graciously underwrites the bandwidth costs of this and many other great podcasts, is looking for a sponsor. Email us at podcast@stackoverflow.com if you know of any.

We finally rolled out email support at Stack Overflow. If you haven’t been to the site in 7 days, and have provided a valid email address, we send you an email that includes all the responses to your questions and answers (if any) from that period. And of course there is a true one-click unsubscribe. We’re still tweaking the parameters of how it works: what is the optimal email relationship between a user and a website?

Sending email these days is a bit of a minefield. How do you avoid instantly going into people’s spam folder? One key piece is having a reverse DNS (PTR) record, which is set up at the ISP level. There’s a whole “Deliverability” industry around sending email to people.

We’re encouraged by the emerging standard of entering your OpenID provider’s address as your OpenID login. For example, “yahoo.com” works for Yahoo OpenIDs, and eventually “gmail.com” will work for Google. (Today you must use “google.com/accounts/o8/id” for Google, which is not optimal for hopefully obvious reasons.) Microsoft is also coming on board, though their OpenID support is in private beta.

We also implemented gold and silver tag-based badges, based on upvotes within a tag. It’s a way of rewarding people who participate heavily in certain topic areas. We did have to rule out discussion-based questions for this algorithm to work. We are also considering a tag leaderboard, as suggested by Greg Hewgill.

Our backup strategy has been half-hearted so far. To improve this, we invested in an inexpensive embedded Linux-based 1U Network Attached Storage device, the QNAP 409u.
This will become our dedicated backup device. It has four drive bays and supports RAID 6 (dual parity). Kind of a neat little device; there’s a whole subculture of inexpensive NAS devices I hadn’t explored until now. Drobo, for example.

As it turns out, the cost of bandwidth ends up being the gating factor for us when dealing with our daily multi-gigabyte database backups. Jim Gray had an eye-opening piece on the economics of bandwidth and the surprising effectiveness of “sneakernets”, even today. How likely is it that your datacenter is going to explode? Unless you have a fancy multiple datacenter setup for redundancy, it might be more effective to do some trickle uploads to services like Amazon S3, or even some monthly datacenter driving runs to copy data off using a cheap USB 2.5″ hard drive. Luckily, one of our team members lives a mile from the data center, so that’s the approach we’ll be using.

We had some semi-serious issues with our IBM ServeRAID 8k controller, having to do with write-through versus write-back caching. Write-through blocks on actual disk writes, whereas write-back writes to a fast RAM buffer, returns very rapidly, and spools the writes to disk over time (i.e. “lazy writes”). The performance of write-back is dramatically better, but we were seeing some eventual system-wide I/O blocking under heavy write load with write-back caching on. Supposedly this is normal for some RAID controllers, but we opted to downgrade to write-through because the nightly backups would always trigger this behavior for us.

Speaking of blocking: it’s funny how many of the techniques discussed on the High Scalability blog boil down to hashtables in memory. Memory is one of the fastest things you have in a computer, and it almost never blocks for any significant amount of time. Unlike, say, hard disks or the network. The act of [...]
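
The write-through versus write-back distinction described above can be sketched with a toy model. This is only an illustration, not how the ServeRAID 8k (or any real controller) works: the function name `write_latencies`, the time costs (`disk_cost`, `ram_cost`), and the buffer size (`buffer_slots`) are all made-up assumptions. It shows why write-back looks fast at first and can then stall under sustained write load once the RAM buffer fills.

```python
from collections import deque


def write_latencies(policy, n_writes, gap, disk_cost=10, ram_cost=1, buffer_slots=4):
    """Toy model: latency seen by one client issuing n_writes writes.

    All numbers are hypothetical time units, not measurements.
    gap is the minimum time between two writes being issued.
    """
    disk_free_at = 0   # time at which the disk finishes all queued work
    pending = deque()  # completion times of buffered, not-yet-flushed writes
    latencies = []
    clock = 0          # time at which the previous write returned
    for i in range(n_writes):
        issued = max(clock, i * gap)
        t = issued
        if policy == "write-through":
            # The caller blocks until the data is actually on disk.
            t = max(t, disk_free_at) + disk_cost
            disk_free_at = t
        elif policy == "write-back":
            # Forget buffered writes that the disk has already flushed.
            while pending and pending[0] <= t:
                pending.popleft()
            # Buffer full: stall until the oldest buffered write is flushed.
            if len(pending) >= buffer_slots:
                t = pending.popleft()
            # Queue a lazy flush on the disk and return after the RAM copy.
            flush_done = max(t, disk_free_at) + disk_cost
            disk_free_at = flush_done
            pending.append(flush_done)
            t += ram_cost
        else:
            raise ValueError(f"unknown policy: {policy}")
        latencies.append(t - issued)
        clock = t
    return latencies


# Light load: writes arrive slower than the disk can flush them.
print(write_latencies("write-back", 8, gap=20))    # [1, 1, 1, 1, 1, 1, 1, 1]
# Heavy load: back-to-back writes, as during a nightly backup.
print(write_latencies("write-back", 8, gap=0))     # [1, 1, 1, 1, 7, 10, 10, 10]
print(write_latencies("write-through", 8, gap=0))  # [10, 10, 10, 10, 10, 10, 10, 10]
```

In this model the write-back cache never does worse than write-through; it simply degrades to disk speed once the buffer is full. The system-wide blocking we saw on the real controller was worse than that, which this sketch does not attempt to reproduce.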
