jcoehoorn

StackOverflow Watch

Posting highlights from StackOverflow.com since 2009


Reflections on the Oracle/Sun merger: Cautiously Optimistic

  • Apr 23, 2009

It's been a few days now, so I've had time to think about what this will mean.  My interest in Sun lies primarily with MySQL, Java, VirtualBox, and Solaris.  As the title indicates, I'm cautiously optimistic. 

First I want to consider MySQL.  On the one hand, Oracle certainly knows databases.  If they choose to, they can really take MySQL to the next level.  On the other hand, this wouldn't be the first time Oracle has purchased a better and cheaper competitor just to kill it off **cough**PeopleSoft**cough**.

The reason I'm optimistic that they'll do the former here is that MySQL will be a lot harder to kill off than PeopleSoft.  There's a recent open source version available, and they can't take that back.  If they try to kill it, they'll find that they don't have as much power in that area as they'd hoped.  The best they can do is take it closed source and then release a few compelling updates, such that much of the community follows those updates and the open source clones can't keep up.  Only then can they kill it, and even then I'll be surprised if there's not still a viable open source fork available.

Now on to Java.  Under Sun, Java was starting to fall behind.  Oracle relies on Java as the natural language choice for its database, so it has every motivation to make sure the language stays relevant.  That said, it will likely continue to be the same people driving development.  I am a little concerned that Oracle will decide it needs to make more money from Java, but not too concerned.  After all, there are a number of good open source Java tools available now, so, just like with MySQL, the only way Oracle can really hurt Java is if they innovate at a faster pace than the open source community can emulate.  And that's not necessarily a bad thing.

I'm a little concerned that VirtualBox will cease to be free and become more like VMware.  There's enough room to move here that Oracle can do what it wants.  It's also very possible that Oracle will have no interest in continuing the product and will just turn it over to the community or sell it to someone else.

Finally, Solaris.  Solaris is pretty much irrelevant.  For Oracle, it gives them a native platform for their database without dependence on anyone else... if they want to go that route.  In practice, Solaris is nothing if it doesn't take advantage of open source libraries and rely on open source applications.  I won't be surprised if Oracle kills Solaris, but I don't think it will matter much if they do.  Those who used to use it can easily migrate to Linux.

Tags: java, mysql, open source, oracle, virtualbox

Delimited Strings, Take 2

  • Mar 17, 2009

Follow-up to this post.


I found this answer on StackOverflow this morning, which shows an even clearer way to concatenate delimited strings. I'm surprised I hadn't thought of this myself. The first part of that answer, while clever, isn't really important to the discussion.  It turns a DataReader into an IEnumerable<string>.  For my purpose it's better to assume we've already come that far.  So given the IEnumerable, all you need is this (shown as an extension method):

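public static string JoinWith(this IEnumerable<string> strings, string separator)
{
    // ToArray() comes from System.Linq; the method must live in a static class to be an extension method.
    return String.Join(separator, strings.ToArray());
}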

Note that it might mean an extra iteration of the enumerable, because you must iterate once to produce the array and again to produce the string.  It also means you will need the array, whereas with the other methods you may only ever need to produce the completed string.  But for many situations the improved clarity is worth it.

Of course the other thing to take away from this is that once you enter the realm of extension methods you could just as easily use them to hide away the (faster) code used in the previous post as well.
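For readers on the Java side, here is a rough analogue of the same trade-off. The class and method names are made up for illustration. `joinViaArray` mirrors the two-step shape of the .Net helper (copy into an array, then join), while `joinWith` makes a single pass with `java.util.StringJoiner`, which only inserts the separator between elements. Java 8 and later also offer `String.join(CharSequence, Iterable)`, which accepts an iterable directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical Java counterparts of the JoinWith extension method.
final class JoinSketch {
    private JoinSketch() {}

    // Two passes, like the array-based version: copy everything, then join the array.
    static String joinViaArray(Iterable<String> strings, String separator) {
        List<String> copy = new ArrayList<>();
        for (String s : strings) {
            copy.add(s);
        }
        return String.join(separator, copy.toArray(new String[0]));
    }

    // One pass: StringJoiner adds the separator only between elements.
    static String joinWith(Iterable<String> strings, String separator) {
        StringJoiner joiner = new StringJoiner(separator);
        for (String s : strings) {
            joiner.add(s);
        }
        return joiner.toString();
    }
}
```

Both return `"a, b, c"` for the list `a`, `b`, `c` with `", "` as the separator, and an empty string for an empty list.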

Tags: stackoverflow, ienumerable

DataTable To JSON and Naming Conventions

  • Jan 26, 2009

Most .Net programmers who need to use JSON can either pull in a third-party library or, if they're using .Net 3.5, rely on the built-in JavaScriptSerializer.  However, I recently found myself needing to turn a .Net DataTable into JavaScript Object Notation without either of those options available.


My first thought was that something must already be out there that I could use.  Indeed, there are some really great libraries for this.  Unfortunately, my situation is such that I can't pull in a whole library; I needed some purpose-specific code.  I did find some specific implementations as well, but I found them all somehow lacking and ended up rolling my own.  The code itself would take up about 2 printed pages, so I'll content myself with posting the link for now.
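The linked code isn't reproduced here, but the core idea is small enough to sketch. The following is a rough Java illustration, not the JSONHelper.FromDataTable code itself: it assumes the table has already been reduced to a list of column names plus rows of `Object[]` (a hypothetical stand-in for a DataTable), and writes each row as a JSON object keyed by column name. Only nulls, numbers, booleans and strings are handled.

```java
import java.util.List;

// Hypothetical sketch: rows of values plus column names -> JSON array of objects.
final class JsonSketch {
    private JsonSketch() {}

    static String jsonFromRows(List<String> columns, List<Object[]> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) sb.append(',');
            Object[] row = rows.get(r);
            sb.append('{');
            for (int c = 0; c < columns.size(); c++) {
                if (c > 0) sb.append(',');
                appendString(sb, columns.get(c));
                sb.append(':');
                appendValue(sb, row[c]);
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    private static void appendValue(StringBuilder sb, Object value) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Number || value instanceof Boolean) {
            sb.append(value); // NaN and Infinity are not valid JSON; not handled here
        } else {
            appendString(sb, value.toString());
        }
    }

    // Quotes a string and escapes the characters JSON requires.
    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch)); // other control characters
                    } else {
                        sb.append(ch);
                    }
            }
        }
        sb.append('"');
    }
}
```

For columns `id` and `name` with rows `(1, "Ann")` and `(2, null)`, this produces `[{"id":1,"name":"Ann"},{"id":2,"name":null}]`.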

Instead, this leaves me space to talk about naming.  Notice that I titled the post "DataTable To JSON".  If you read the code itself you'll find I actually turned that name around:  "JSONHelper.FromDataTable".  In the past it's always been more natural for me to think in terms of transforming type 1 to type 2, and names used in my code prior to this reflect that.

However, I recently found this post about naming conventions on StackOverflow, written by no less a personage than site co-founder Joel Spolsky.  He makes a very good case for the "Type2FromType1" syntax: namely it puts the return type next to the variable receiving it and the source type next to the parameter accepting it, rather than the complete opposite.  I decided to try this for a while, and while it felt awkward at first it's growing on me very quickly.
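A small illustration of why the "Type2FromType1" order helps, using made-up temperature conversions. When conversions are chained, each "From" sits directly beside a value of the matching type, so a mismatch is easy to spot by eye: in `kelvinFromCelsius(celsiusFromFahrenheit(fahrenheit))`, "Celsius" meets "celsius" and "Fahrenheit" meets "fahrenheit".

```java
// Illustrative conversions named in the "Type2FromType1" style.
final class NamingSketch {
    private NamingSketch() {}

    static double celsiusFromFahrenheit(double fahrenheit) {
        return (fahrenheit - 32.0) * 5.0 / 9.0;
    }

    static double kelvinFromCelsius(double celsius) {
        return celsius + 273.15;
    }

    static double kelvinFromFahrenheit(double fahrenheit) {
        // Reading inward: kelvin <- Celsius, celsius <- Fahrenheit, fahrenheit <- fahrenheit.
        return kelvinFromCelsius(celsiusFromFahrenheit(fahrenheit));
    }
}
```

At the call site the same adjacency holds: `double kelvin = kelvinFromFahrenheit(fahrenheit);` puts the result type next to the variable receiving it and the source type next to the argument.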

Tags: json, naming conventions, stackoverflow

Creating Delimited Strings, or: Clean Code vs. Readable Code

  • Jan 22, 2009

Today's question: How do I append a newline character for all lines except the last one?

What I really want to talk about is the subtle difference between clean code and readable code.  Take these two samples:

boolean first = true;
StringBuilder builder = new StringBuilder();

for (Map.Entry<MyClass.Key, String> entry : data.entrySet()) {
    if (first) {
        first = false;
    } else {
        builder.append("\n"); // Or whatever break you want
    }
    builder.append(entry.getKey())
           .append(": ")
           .append(entry.getValue());
}

StringBuilder builder = new StringBuilder();
String newline = "";

for (Map.Entry<MyClass.Key, String> entry : data.entrySet()) {
    builder.append(newline)
           .append(entry.getKey())
           .append(": ")
           .append(entry.getValue());

    newline = "\n"; // every entry after the first is preceded by a break
}
Obviously the 2nd snippet shows cleaner code, because there's no messy if/else condition.  But is it more readable?
I ask this because, although the if/else code is "messy", almost any developer understands it instantly.  On the other hand, you may have to look at the 2nd snippet for a moment to understand how it works.  Additionally, the 2nd snippet doesn't show the actual delimiter until the end of the loop body, rather than at the beginning.

Personally I use the latter construct, because the readability difference isn't all that large.  The pattern just isn't that difficult.  But I consider it the lesser of two evils.  Also, it shows that it might be possible to cross a line where cleaner code becomes less desirable, even if it works.

Tags: stackoverflow

SpreadsheetML

  • Jan 16, 2009

This is something I stumbled on a while ago and have known how to do in a basic sense.  But now, thanks to this StackOverflow question, I also know what it's called.  This will make a huge impact on my ability to use it in the future, as before I couldn't really search Google or MSDN to get help with it.


SpreadsheetML is just an XML schema that Excel understands.  It's important because it's the best way to export Excel data from a web app.  In the past your choices were OLE, COM Interop, and HTML tables.  OLE never worked reliably, as it depended on a registry setting to know the folder where your document is.  COM Interop doesn't scale well and isn't licensed for use with a web server.  HTML tables were limited in what they could do: only a single worksheet and no formulas, for example.

SpreadsheetML scales, will work anywhere, and supports nearly all of Excel's features.  You can use it when you need to send an Excel file through a text-only medium like e-mail or chat.  About the only downside is that it's not supported in Excel 2000 and earlier, though my experience is that it will work on Excel 2000 in a limited way, much the same as using an HTML table.
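Because it's plain XML, a SpreadsheetML workbook can be written with nothing more than string building. Below is a rough Java sketch, based on my understanding of the Excel 2002/2003 XML format: the root `Workbook` element uses the `urn:schemas-microsoft-com:office:spreadsheet` namespace, each `Worksheet` holds a `Table` of `Row`/`Cell`/`Data` elements, and the `mso-application` processing instruction tells Windows to open the file with Excel. The class and method names are hypothetical, and styles, dates and formulas are left out. Multiple worksheets, which HTML tables can't do, are just repeated `Worksheet` elements.

```java
import java.util.List;
import java.util.Map;

// Hypothetical minimal SpreadsheetML writer: one worksheet per map entry.
final class SpreadsheetMlSketch {
    private SpreadsheetMlSketch() {}

    private static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Number cells are typed as numbers, null cells are left empty, everything else is text.
    static String workbookFromSheets(Map<String, List<List<Object>>> sheets) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>\n");
        xml.append("<?mso-application progid=\"Excel.Sheet\"?>\n");
        xml.append("<Workbook xmlns=\"").append(NS).append("\" xmlns:ss=\"").append(NS).append("\">\n");
        for (Map.Entry<String, List<List<Object>>> sheet : sheets.entrySet()) {
            xml.append(" <Worksheet ss:Name=\"").append(escape(sheet.getKey())).append("\">\n");
            xml.append("  <Table>\n");
            for (List<Object> row : sheet.getValue()) {
                xml.append("   <Row>");
                for (Object cell : row) {
                    if (cell == null) {
                        xml.append("<Cell/>");
                        continue;
                    }
                    String type = (cell instanceof Number) ? "Number" : "String";
                    xml.append("<Cell><Data ss:Type=\"").append(type).append("\">")
                       .append(escape(cell.toString()))
                       .append("</Data></Cell>");
                }
                xml.append("</Row>\n");
            }
            xml.append("  </Table>\n");
            xml.append(" </Worksheet>\n");
        }
        return xml.append("</Workbook>\n").toString();
    }

    // Escapes characters that may not appear raw in XML text or attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Pass a `LinkedHashMap` if the worksheet order matters, since the sheets are written in the map's iteration order.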

Tags: excel, stackoverflow, spreadsheetml

How Virtualization will impact Scalability and Reliability.

  • Jan 16, 2009

Today I came across a question about virtualizing web servers and databases.  It caught my eye because I'd asked a similar question back in November.  


The question got me thinking about scalability.  Whatever problems running a database or any other application on a virtualized server has now, they will eventually be solved, or at least mitigated enough to make virtualization worth the penalty.

At this point we'll start seeing server applications that not only support virtualization, but require it.  A virtual server can provide a standard environment that an operating system on traditional hardware doesn't quite match, and allows you to tune the resources available to a server without shutting down the machine.  Imagine virtualization software that allows a live transfer of a system from one physical machine to another with no shutdown.  This is already happening.  What's interesting is where we go from here.  I think the next logical step is clustering/load balancing and cloning.  

These are features that are available now but that smaller businesses largely ignore.  But I think that once all your major applications are on virtual servers, the clustering problem becomes much simpler.  The load balancer will be a function of the virtualization software itself.  Of course the individual application will have to support it as well, but this will become a core feature of any serious server application, even in the entry level editions.  Setting up a cluster will be as simple as clicking a check box and typing in the cluster name.  Maybe you won't even need to click the check box: you'll just give two or more servers the same DNS name and the virtualization software will know what it means.

At this point, a small business with one server can add additional horsepower simply by buying a server (any server), plugging it into the network, and telling their current virtual system to clone itself to the new machine.  Large businesses can do the same thing.

For a practical example, let's look at the new StackOverflow servers.  One database and two smaller web servers.  Under what I'm proposing each of those physical servers would carry a database instance and a web instance, each in a three-server cluster.  If one goes down, the other two keep chugging happily away.  Of course, if the current database server were to go down the site would be hurting.  
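The "if one goes down, the other two keep chugging" behavior boils down to a load balancer that rotates through instances and skips any that are down. Here is a rough Java sketch of that idea; the class, its methods, and the notion of marking an instance down are all hypothetical, and it is not thread-safe. Real virtualization or clustering products do far more (health probes, session affinity, draining).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical cluster front end: round-robin across instances, skipping ones marked down.
final class ClusterSketch {
    private final List<String> instances;
    private final Set<String> down = new HashSet<>();
    private int next = 0;

    ClusterSketch(List<String> instances) {
        this.instances = new ArrayList<>(instances);
    }

    void markDown(String instance) { down.add(instance); }

    void markUp(String instance) { down.remove(instance); }

    // Returns the next live instance; throws only if every instance is down.
    String pick() {
        for (int tried = 0; tried < instances.size(); tried++) {
            String candidate = instances.get(next);
            next = (next + 1) % instances.size();
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live instances");
    }
}
```

With three instances `a`, `b`, `c` and `b` marked down, successive calls return `a`, `c`, `a`: the failure shrinks capacity but doesn't stop the site.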

But to be fair the planning would be different under my proposal, and Jeff probably would have purchased four machines more like the web servers, rather than one database and two web servers.  This way he ends up with about the same power for about the same money.  Perhaps even more power because he can add more memory cheaply.  One server failure isn't a big deal now.   

Even more significantly, as Jeff's traffic increases it becomes very easy to scale up the site infrastructure to match.  All he needs to do to add a server is drop it in the rack and tell the virtualization software to create a new instance and hook it up.  As it is, he's looking at having to do a major overhaul of his infrastructure in about two years.

Tags: virtualization, stackoverflow

Re-purposed

  • Jan 16, 2009

This blog has been feeling a bit neglected recently.  So in an effort to get it going again I'm re-purposing it.  I'll still post original content from time to time, but effective immediately this place is now primarily dedicated to highlighting interesting questions I come across at StackOverflow.com.  Essentially, I'm leveraging the community there to help me create content here.  The important point is that I won't normally just repost a question.  If I write about it here it's because I have something to add that may not be as appropriate for the other site.


Coming soon:
  • DataTable to JSON conversion
  • Creating Delimited Strings 
  • On Scalability


The Cloud Computing I Really Want

  • Nov 6, 2008

Cloud Computing has arrived.  Whether it's the new Windows Azure, Google's AppEngine, Amazon's S3/EC2, or something a little less obvious like the Salesforce platform, you have options available.  Unfortunately, they all have one thing in common: none of them really take advantage of a cloud.  In every single one of these offerings your app ultimately runs on a single server in a traditional datacenter.  All they've done is abstract things away a bit so that you don't need to know which server it is.  That's neat, but it's not really what I want from a cloud system.


What I'd like to see is a cloud platform that's designed for a smaller scale: your corporate LAN.  As it is, a company provides each office worker with a desktop computer.  After the next refresh at your company, these computers will all have at least a dual core processor and 120GB hard drives.  That's a very powerful system, and it's at the low end.  Unfortunately, most of this capacity sits idle.  What would really be cool would be a system that lets you harness this idle power.

Let's think for a minute about how this would work.  You would still want a traditional server, but the purpose of this server would be coordination.  You wouldn't ask it to do any heavy lifting on its own.  Then you would need to deploy a client application to every computer in the company, and this class of application would likely need to be specifically supported by the operating system.  It could be patched in via a mechanism like a virtual device driver, but that's messy.  If it turns out kernel mode access is required, then operating system support would be preferred.  Fortunately this already exists, in the form of virtual machine support.

Once installed and configured, the application would cordon off a segment of each computer's hard drive and make it available to the coordinating server.  The server then turns around and exposes this space to the network as a normal file share.  It needs to be smart enough to tell a remote system to send data to the desired location directly, rather than have to retransmit things itself, but that's just a simple logistical problem.  So the first service I'm talking about is a SAN.  I'll build on this to do other useful things, but let's talk about the characteristics of the SAN first.

Because individual desktops are unreliable, you would need large amounts of redundancy.  You wouldn't want the failure of a switch serving a 20 node work group to cut off the entire company from parts of its data, for example.  To solve this issue, two things have to happen.

The first is that it needs to take a "local machine first" approach. If the data is already there, don't retransmit it over the network.  Whenever a user requests data from the server's share that isn't already on the local machine that data is forwarded to the local machine (all messages encrypted, of course).  If the user needs that data again, now it will be there.  This should make the network bandwidth required for keeping data synced manageable, because now for many requests data never needs to enter the network at all.  This should also make access time much faster than existing SANs.
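The "local machine first" rule is essentially a read-through cache on each client. A rough Java sketch follows; the class is hypothetical, the remote fetch is a stand-in `Function` rather than any real network API, and encryption and invalidation of stale copies after writes are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side store: serve reads locally when possible, else fetch once and keep a copy.
final class LocalFirstStore {
    private final Map<String, byte[]> local = new HashMap<>();
    private final Function<String, byte[]> fetchRemote; // stand-in for a request to another node
    private int remoteFetches = 0;

    LocalFirstStore(Function<String, byte[]> fetchRemote) {
        this.fetchRemote = fetchRemote;
    }

    byte[] read(String key) {
        byte[] cached = local.get(key);
        if (cached != null) {
            return cached; // already on this machine: no network traffic at all
        }
        remoteFetches++;
        byte[] data = fetchRemote.apply(key);
        local.put(key, data); // the next request for this key stays local
        return data;
    }

    int remoteFetches() {
        return remoteFetches;
    }
}
```

Reading the same document twice costs one network fetch; the second read is served from the local disk, which is where the bandwidth savings come from.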

The second is that the server administrator will need to divide the clients into an appropriate number of groups during setup.  Each client will need to belong to a group, and each group will need a complete copy of the data.  A key here is that the groups are only for redundancy: a client can retrieve data from any group and should not have a preference for its own, except in the case where the data is on the local machine.  This will allow the server to load balance the system, such that you should always get an efficient response for requests.  And that will free up more bandwidth to use for syncing data across groups.
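One way the coordination server could apply these rules is sketched below in Java. Everything here is an assumption for illustration: each file gets exactly one replica in every group (so every group holds a full copy), a stable hash of the file id picks which node within a group, and reads prefer a local copy and otherwise rotate across groups using a caller-supplied request counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replica placement and read selection for the grouped SAN idea.
final class PlacementSketch {
    private PlacementSketch() {}

    // groups.get(g) lists the node names in group g; returns one node per group for this file.
    static List<String> replicasFor(String fileId, List<List<String>> groups) {
        List<String> replicas = new ArrayList<>();
        int h = fileId.hashCode() & 0x7fffffff; // String.hashCode is deterministic; mask keeps it non-negative
        for (List<String> group : groups) {
            replicas.add(group.get(h % group.size()));
        }
        return replicas;
    }

    // Any group may serve a read: use the local copy if there is one, else rotate to spread load.
    static String chooseReplica(List<String> replicas, String localNode, int requestCounter) {
        if (replicas.contains(localNode)) {
            return localNode;
        }
        return replicas.get(Math.floorMod(requestCounter, replicas.size()));
    }
}
```

A real system would also need to move replicas when nodes join or leave; simple modulo hashing reshuffles a lot of data when a group's size changes, which is one reason consistent hashing is often used instead.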

For very small networks (up to, say, 20 nodes) one group would probably be enough.  For larger networks many groups may be needed.  Special care will be needed for WAN networks: your intuition would tell you to put a group at each WAN site, but this would be wrong.  It would force every change to go through the WAN connection.  Similarly, you would need to ensure that there is at least one group NOT represented at any WAN site, or a loss of the WAN connection would take the entire company down rather than just the remote site.  The server would also need to be smart enough to notice when it's down to one complete group and be able to take steps to fix that situation.
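The two safety rules in that paragraph can be expressed as simple checks the coordination server might run. This is a hypothetical Java sketch: "complete" is simplified to "every node in the group is online" (a real system would check that each file still has a live copy in the group), and node locations are assumed to be known as a node-to-site map.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical health checks for the grouping rules.
final class GroupHealthSketch {
    private GroupHealthSketch() {}

    // Counts groups whose members are all online (simplified notion of "complete").
    static int completeGroups(List<List<String>> groups, Set<String> online) {
        int complete = 0;
        for (List<String> group : groups) {
            if (online.containsAll(group)) {
                complete++;
            }
        }
        return complete;
    }

    // True if at least one group lives entirely at the main site, so losing
    // the WAN link still leaves a full copy of the data there.
    static boolean survivesWanLoss(List<List<String>> groups, Map<String, String> siteOf, String mainSite) {
        for (List<String> group : groups) {
            boolean allAtMain = true;
            for (String node : group) {
                if (!mainSite.equals(siteOf.get(node))) {
                    allAtMain = false;
                    break;
                }
            }
            if (allAtMain) {
                return true;
            }
        }
        return false;
    }
}
```

If `completeGroups` drops to 1, the server would start re-replicating onto other nodes; if `survivesWanLoss` is false, the group layout itself needs to change.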

I want to consider backups as well.  First of all, under this scheme it's not necessary to have historical backups.  I'll mention it briefly later, but for what I envision document versioning will be built in.  That means you only need to make sure you have adequate redundancy on the current system.  To make the backup itself, an administrator could simply set up a server with adequate hard disk space in its own group, and clone that server's disk at whatever frequency he wants.

There's still the limitation that write operations can be slow, because the coordination server needs to know about every write and coordinate with nodes from other groups to ensure redundancy.  To make things worse, this would need to be transactional.
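To see why transactional writes are slow, consider a two-phase-commit style sketch in Java: a write is staged on one replica in every group, and only if all of them accept is it committed; otherwise the staged copies are aborted. The `Replica` interface and the coordinator are hypothetical, and failures during the commit phase (which real two-phase commit must handle) are ignored here. The cost is visible in the structure: every write waits on a round trip to a node in each group before it can finish.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replica endpoint used by the coordinator.
interface Replica {
    boolean prepare(String key, byte[] data); // stage the write; false if it cannot be accepted

    void commit(String key);

    void abort(String key);
}

final class WriteCoordinator {
    private WriteCoordinator() {}

    // All-or-nothing write across one replica per group.
    static boolean write(String key, byte[] data, List<Replica> onePerGroup) {
        List<Replica> prepared = new ArrayList<>();
        for (Replica r : onePerGroup) {
            if (r.prepare(key, data)) {
                prepared.add(r);
            } else {
                for (Replica p : prepared) {
                    p.abort(key); // undo copies that were already staged
                }
                return false;
            }
        }
        for (Replica r : onePerGroup) {
            r.commit(key);
        }
        return true;
    }
}
```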

So now we have a high performance SAN where data is often cached right on the local machine, and we only need a little setup and one simple server- better performance and more space at potentially a fraction of the cost.  The only downside is the potential for more network traffic and slower writes.  I don't really know how much impact these issues will have, but I believe they can be overcome.

Now a SAN is nice.  If well implemented, this SAN system by itself would be worth a fortune.  But we can do more.  We want to be able to build real applications.  Of course, these applications would need to be highly parallel in nature, but fortunately the three applications I have in mind just happen to fit the bill.

The first application is, of course, a database.  By building a database directly into the platform you can make everything else much more efficient.  All files and data streams are just records in this database, and they can be indexed, cached, or anything else you might use a database to do.

Next there's search.  As long as you have individual machines shunting documents (records) all over the place, they might as well index them while they're at it.  This work will of course be spread over individual clients, making the process much faster.  We can even have the database structure all prepared.  Then the client program can provide an interface into your new document repository, as well as allow you to attach categories (and versions!) to files.  It's an instant enterprise document library and distributed source control.
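Spreading the indexing work over clients could look roughly like this Java sketch: each client builds a small inverted index (word to document ids) for the documents that pass through it, and the coordinator merges the partial indexes. The class and the crude tokenizing (lower-case, split on non-word characters) are assumptions for illustration only.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical distributed indexing: clients build partial indexes, the coordinator merges them.
final class IndexSketch {
    private IndexSketch() {}

    // Runs on a client: docs maps document id -> text.
    static Map<String, Set<String>> indexDocuments(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // split can yield a leading empty token
                }
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // Runs on the coordinator: union the document sets for each word.
    static Map<String, Set<String>> merge(Iterable<Map<String, Set<String>>> partials) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> partial : partials) {
            for (Map.Entry<String, Set<String>> e : partial.entrySet()) {
                merged.computeIfAbsent(e.getKey(), w -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }
}
```

If one client indexes `doc1` ("Budget report") and another indexes `doc2` ("budget memo"), the merged index maps `budget` to both documents.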

Finally, there'd be a built-in web server.  This is the platform on which developers would create their own applications, because web pages are naturally parallel: each page access can run in its own process.  So much the better if that process happens to run on the exact computer that requested the page.

So this is the cloud system I'm waiting for.  Not something that runs in someone else's data center, but the ability to take advantage of the hardware infrastructure that's already available in the office.  It provides instant redundancy and scales automatically with your organization.  

It would ideally be implemented to run existing technologies (i.e., an open source version would adapt MySQL and Apache; Microsoft would adapt SQL Server/IIS) so that businesses could easily move their existing intranet to the system.  There's some real power.  There are a few admitted weaknesses: slow write times, and a single point of failure at the coordination server.  The client application would need to be configured to not use too much CPU or memory.  But I think these concerns can be overcome.

Now if only I had the low-level development experience to even try to implement it.

Tags: cloud

Spell checking with CAPTCHAs

  • Oct 28, 2008

I've been using the new StackOverflow.com web site a lot.  One of the less attractive "features" there is a CAPTCHA that crops up every so often.  This CAPTCHA is based on reCAPTCHA, a project that pulls problem text from book scanning projects.  It uses words that the scanners couldn't handle for part of the captcha.  It's a great concept: if the legitimate scanners were unable to recognize the word, it's likely that attempted cracks will have the same trouble.  Additionally, as a user I at least feel like the annoying captcha has some extra purpose.

One of my personal character traits is that I nitpick my writing.  I'll probably edit this blog post at least a dozen times after I post it before I move on (9 or 10 of those edits will fix dumb typos).   I therefore hit the StackOverflow captcha fairly often.  That's where it happened; I completed a routine edit, received a captcha prompt, and typed what I saw without thinking.  Except this time what I typed wasn't exactly what the captcha showed.  The captcha had a typo, and I subconsciously entered the word correctly.  I realized my "mistake" a moment after I submitted the response, and mentally braced for the rejection.  But instead of a rejection, the captcha passed. 

Now I have to believe that the captcha requires more than one "opinion" on each snippet before accepting it into the final work.  But I wonder, how often does this happen?  Could this system actually be correcting spelling or typesetting mistakes in old books via some human psychological phenomenon?  Probably not: most often the captcha will be entered as shown.  But the possibility is at least intriguing.
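To make the "more than one opinion" guess concrete, here is a hypothetical Java sketch of a consensus rule: a reading of an unknown word is accepted once some number of independent answers agree. The threshold and the simple vote counting are assumptions for illustration, not reCAPTCHA's actual policy. Under a rule like this, a typo in the scanned text would only be "corrected" if most people made the same subconscious fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consensus rule for one unknown word: accept a reading once enough answers agree.
final class ConsensusSketch {
    private final int votesNeeded;
    private final Map<String, Integer> votes = new HashMap<>();

    ConsensusSketch(int votesNeeded) {
        this.votesNeeded = votesNeeded;
    }

    // Records one user's reading; returns it once it reaches the threshold, otherwise null.
    String vote(String reading) {
        int count = votes.merge(reading, 1, Integer::sum);
        return count >= votesNeeded ? reading : null;
    }
}
```

With a threshold of 3, one user typing the misprint `teh` and three typing `the` leads to `the` being accepted on the third matching vote.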

Tags: captcha stackoverflow spell...

SQL Server Management Studio 2008 Express

  • Oct 13, 2008

I've been using SQL Server Management Studio almost since it came out, and before we even had a SQL Server 2005 database for it to talk to.  I know a lot of people don't like it, but most of the negatives just didn't apply to me.  The two biggest are probably the .Net framework requirements and that it can be very slow.  I already had the .Net framework installed, and while it is a little slow to start up the first time after restarting Windows, after that it's been fairly responsive.

So I've been watching the SQL Server 2008 launch and patiently waiting for the Express Edition of the management studio to come out.  And waiting, and waiting.  Every once in a while I'd go back and look for it, only to find the 2005 version instead.  Today, I finally found it.  I'm still downloading it, so I can't tell you how it does, but here is a partial list of new features.  And here is a link to try it for yourself.  Note that it now requires .Net 3.5 and Windows Installer 4.5.
