jcoehoorn

Renaissance Programmer


Idea for Simple Multi-threaded Software

  • 3 days ago
  • 2 comments

As multi-core CPUs become more and more common, the ability to write quality software that takes advantage of them will become more important.  The tools currently available are not adequate to this task.  We are seeing a few things emerge to help solve the problem.  One good example is the shift back to web applications hosted on a server.  Web applications automatically run in parallel; each request can get its own thread.  However, there's still a shortage of simple, effective techniques for building parallel software on the desktop.

I had an idea for one place where it would be very easy to build a parallel approach into the programming language, so that developers can take advantage of a multi-core CPU in certain situations without having to do extra work.  This idea is obvious enough that it probably isn't original to me, but I don't think I've read about it anywhere else and so I wanted to write it out here.

What I want to do is update the standard "foreach" loop that's included in most modern programming languages.  .Net, Java, PHP, Python, and more all have this simple loop.  In a foreach loop, you specify an operation to be performed on every item in a collection of items.  Current implementations of this loop always run in serial: process one item, and when that one is finished get the next and process it.  However, much of the time this could be done in parallel.  My idea is that you could bake this concept into a programming language by making a simple change to your loop declaration.  For example, take this simple C# loop declaration.  Instead of this:

foreach (object Item in MyCollection)

Do this:

forevery (object Item in MyCollection)

Change just one keyword and the compiler knows it can create separate threads for the contents of the loop.

Of course, there are situations where these loops should not be run in parallel.  Maybe you need to break early.  Or maybe you're building a count as you go.  But this should be relatively easy for the programmer to determine, and they can always use the old behavior when needed.
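
To make the idea concrete, here is a minimal sketch in Python rather than C#.  The helper name for_every and its max_workers default are invented for illustration; they are not a keyword or library function in any language I know of.  The sketch leans on the standard library's ThreadPoolExecutor to run the loop body once per item, and it assumes the iterations don't depend on each other.

```python
from concurrent.futures import ThreadPoolExecutor


def for_every(items, body, max_workers=4):
    """Run body(item) for every item, possibly at the same time.

    Hypothetical helper: results come back in the same order as the
    input items, even if the calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(body, items))


def square(n):
    # Stand-in for the real work done inside the loop body.
    return n * n


results = for_every([1, 2, 3, 4], square)
print(results)  # [1, 4, 9, 16]
```

Note that threads in standard CPython mostly help when the loop body waits on I/O; a compiler-level forevery as described above would presumably spread CPU-bound work across cores as well.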

Are there any languages out there that already implement this concept?


VB.Net vs C#, Round 2: Partial Namespaces

  • 7 days ago

I've talked about this before, and I don't want to go over the same issues a second time.  However, I recently had a project where I finally spent a few weeks in C#, with no VB work at all.  At last, I had a chance to develop a deeper familiarity with C#.  Maybe I would learn something new. 

The result?  C# moves closer to VB.Net in my estimation, but doesn't quite pass it.  All of the short-comings when compared to VB.Net still exist in my mind, but now I've had the chance to get a feel for what C# does about them.  I can't put my finger on it, but there are little things here and there in C# that make up for a lot of what I complained about before.  For example, I still prefer seeing "End If", "End Sub", "End Class", etc to the more ambiguous "}".  But now that I've used C# more it's not as big a deal.  I like the VB way, but I'm not as handicapped by C# as I was.  The gap is still there, but it's not as wide.

The experience did bring up one new shortcoming in C# that I didn't write about before.  VB.Net supports partial namespaces; C# does not.  Let me explain.  Imagine you want to read from a file.  In .Net, that means using the System.IO namespace.  With Visual Basic, System is imported by default and child namespaces are automatically resolved.  So, for example, to see if a file exists I can just say something like this:

If IO.File.Exists("foo.bar") Then

That won't work in C#.  You have to either type out System.IO.File or add a using directive for System.IO at the top of the file.  Now, for the IO namespace this isn't a big deal.  You're probably going to use the classes from the namespace a dozen times if you use them once, and it's not a lot to type otherwise.  It does pollute your intellisense namespace though, and it starts to become annoying when you also need StringBuilder and have to import System.Text but aren't using anything else from that namespace.  Or maybe you need a single DataSet, but nothing else from System.Data.  You can quickly accumulate several using directives that only exist to support one class declaration.  The point is that the class library hierarchy in .Net is rather flat, and there are a lot of little things you might want in C# that all require a using directive where VB.Net does not.


So C# doesn't let you use a partial namespace in a declaration.  Big deal.  Well, this example only illustrates the least of my complaints.  The worst things about what I've shown so far are that it leads to a polluted intellisense prompt and that it breaks your flow to have to jump to the top of the page to add the declaration and then jump back versus simply typing a shorter name.  If this was all there was to it I would just keep my mouth shut and deal with it.  But there are other, more important manifestations.  I'll give two examples.

Say you're working on a project that involves a lot of XML.  You'll probably import the System.Xml namespace.  In C# you may also need to import one or more of System.Xml.Schema, System.Xml.XPath, System.Xml.Serialization, or System.Xml.Xsl.  And now you're using a whole bunch of different classes with no reference in the code for which specific namespace each class came from.  In VB.Net you can just preface the class names of classes not directly in the Xml namespace with only the missing child namespace.  For example, if I choose not to import System.Xml.Schema but already have System.Xml I can still just say "Schema.XmlSchema" instead of "System.Xml.Schema.XmlSchema". 

That sounds a lot like the same complaint I had earlier, and it usually ends up requiring a little more typing than importing all the namespaces once.  What's new, though, is that in my opinion this has the potential to make the code easier to understand.  The specific example of "Schema.XmlSchema" is pretty redundant, but there are plenty of cases where having one level of the namespace with the class would add clarity to the declaration.  This is especially true for junior developers who may not be totally familiar with the framework.  Used correctly, it can provide just a little bit of important context for each of your declarations.

Now for the next example.  After all, the Xml namespace is pretty well understood.  Also, my XML example wasn't very good; what do we need the extra context for?  Fair enough.  Let's look at something even more relevant.  What if you're working on a project where you want to use a third party library?  In this case, being able to see a small amount of context for each class may have a little more value.  And what if you've never used this library before? 

SharpZipLib comes to mind as a reasonable example.  Imagine your project involved using the library to untar some files.  In C# you would import ICSharpCode.SharpZipLib.Tar and then type class names as usual.  However, there is no help from the IDE in finding out what those classes are without re-typing the entire namespace every time.  This shows one final reason why partial namespaces are useful.  In VB.Net you just import ICSharpCode.SharpZipLib.  Now, if you only type "Tar." you get an intellisense list of members of that namespace as soon as you hit the period key.  For this reason, I find VB.Net is much easier to work with when learning the ins and outs of a new library.

In summary, I really like VB.Net's ability to use partial namespaces in declarations.  There's probably a more official name for the feature, but I don't know it.  It might even be something you can just turn on as an option for a project in C#, in which case I hope someone will tell me where to find that option.  I think the feature promotes code that is more readable, doesn't break the flow of the programmer as often, keeps your intellisense namespace clearer, and aids in learning new libraries.



Make your Code as Useful as Possible

  • Jul 16, 2008

I don't know how many times I've seen code similar to the following on programming help forums:

Function MyMethod(ByVal InputParameter() As String) As String()
    'Do Stuff here that returns a different string array
End Function

That's okay.  We can make that work just fine.  I mean, it could be worse; they could have used an ArrayList.  But we can do better, too.  Now look at this code:

Function MyNewMethod(ByVal InputParameter As IEnumerable(Of String)) As StringCollection
     'Do stuff here that returns the string collection
End Function

This code is a drop in replacement for the code above.  By that I mean that anywhere you call the first method, you could replace it with the 2nd method and your code will still work.  You don't have to change anything else except the type of the variable that accepts the result.  So if it works the same, why change?  I mean, it takes a little more thought to read it and therefore you could argue it takes more to maintain.  What do you gain?  The answer is that you've just made the function more useful and flexible.

Let's start with the InputParameter.  Perhaps right now you always have a string array when you call it.  But what if later you start working with something like an ArrayList or the generic List(Of String)?  A List(Of String) will work with that method right now, with no other changes, and a non-generic ArrayList only needs a call to .Cast(Of String)() first.  You can even put Xml or datatables through there with a little work.  IEnumerable(Of String) will accept anything that can give you a string in a For Each loop.  So just by changing the type I've instantly made the code more powerful.

Now for the return type.  I could have used IEnumerable(Of String) here as well.  However, in this case that would actually limit the capabilities of the function.  You'd lose the ability to look at values by index.  What I want to do is expand the capabilities of the function.  String() already implies IEnumerable(Of String).  However, by moving from the  array up to a StringCollection I not only keep that ability but gain the ability to easily add or remove items from the collection.  I also get some bonuses like the nice .Contains() and .IndexOf() methods.  So again, I've expanded what the function can do, and therefore made it more valuable. 

For example, you might now be able to use it somewhere that before would have required a separate (but very similar) function.  Or the switch to the new function might save you having to write a for each loop on the return value because of additional capabilities in the collection like the .Contains() method.  It will enable you to get the same work done in less code.

One other point is that the new version should perform about the same as the old one.  There might be a very small loss, but any difference is likely to be minor relative to other considerations in your code.  Worrying about it certainly falls under the heading of "premature optimization."

In general, we can think of IEnumerable(Of String) as a wider type, and StringCollection as a more powerful type.  A good rule of thumb is to accept a wider type for input and return a more powerful type for output.  In this way you will make your code more useful, and in the long run that's probably a good thing.
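
The same rule of thumb carries over to other languages.  Here is a rough sketch in Python instead of VB.Net, using type hints to play the role of the declared types.  The function name shout and its behavior are made up for the example; the point is only that the parameter is declared as the wide Iterable[str] while the return value is a concrete list.

```python
from typing import Iterable, List


def shout(words: Iterable[str]) -> List[str]:
    """Accept anything that yields strings; return a concrete list.

    Hypothetical example function: the wide input type lets callers pass
    lists, tuples or generators, and the list result supports indexing,
    membership tests and in-place changes.
    """
    return [w.upper() for w in words]


from_list = shout(["a", "b"])
from_tuple = shout(("a", "b"))
from_generator = shout(w for w in "a b".split())

# The richer return type saves the caller a loop.
print("B" in from_list)       # True
print(from_list.index("B"))   # 1
```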


The Power of Declarative Code in ASP.Net

  • Jul 11, 2008

ASP.Net is often mistaken for a simple update to Classic ASP.  Just take ASP, throw in some .Net classes, and you're done.  You get a better IDE and you might get a small performance boost from using pre-compiled code, but not much else.  Of course, you could use ASP.Net like Classic ASP.  If that's as far as you go then I suppose everything above would be true.  In fact, one of the nice things about ASP.Net is its ability to work with you as your skills grow, and give at least small benefits right away.  But if you stop here you're missing out on the best parts of the platform.

ASP.Net allows you to be more declarative in how you lay out your page.  If an SQL SELECT query is the classic example of declarative code, then you can think of each server control as a little SELECT query.  It's a way to declare what you want to the framework and let it worry about how to actually do it rather than listing out every little step.  The advantages of this approach are numerous: your code base is smaller, busy-work is reduced, you see pages from a higher level, you get better separation of concerns for designers and developers...   I could go on.  You might be surprised to hear that there's a huge potential performance benefit as well.

If that thought does surprise you, it shouldn't.  After all, it's what makes SQL fast.  Take an SQL cursor, for example.  A cursor is usually the slowest way to accomplish any task in a database.  Why?  Because it's procedural.  If you can re-write the cursor to use declarative statements you will nearly always see a significant performance improvement.  This is because the database can now use its cache and indexes, and even execute the operation in parallel.  The same concept applies to a web page, and for the same reasons.  I don't know to what extent, if any, ASP.Net applies these concepts.  But it should be theoretically possible.

First we'll look at caching/indexing.  With declarative code the server can get a rough picture of what every instance of a page will look like.  It can use this picture to create and cache a pre-loaded version of the page, where all the declared controls and HTML are loaded and put in the initial state defined by the aspx code.  This is a huge improvement, because the amount of work left to do for each request is greatly reduced.  Classic ASP would have to start from the beginning for every single request and work its way through all of the page code to be sure of getting the correct result, like any other procedural code.  The ASP.Net environment can take a few shortcuts.

Now let's move on to parallelism.  Server controls in ASP.Net ultimately boil down to plain old XML, and every XML document is a tree structure of tags.  The nature of XML means you have a certain amount of independence between siblings at any given level of the tree; the contents of one sibling aren't really relevant to the contents of another sibling.  That means that each sibling can be processed in parallel.  In practice ASP.Net controls can have code that modifies other parts of the page, so this ability is not absolute.  However, it should be possible for the compiler to analyze the code and build a dependency tree for a page, and in this way get some advantage.  The important thing here is that less procedural code means a more straightforward dependency tree and a greater potential for parallelism.

As CPUs with more cores become more and more common a built-in mechanism to render a page in parallel will become more and more significant.  This is true even though a web server may already utilize multiple cores by  processing separate requests in parallel.  For example, while one node on a page waits for a request to a database to complete, rather than blocking the entire page other nodes can continue to process.  In this way individual requests can still be served faster.
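
Here is a toy sketch of what rendering siblings in parallel could look like, written in Python.  It is not how ASP.Net works (as far as I know), and the (tag, children) tuple format, render_serial and render_page are all invented for the example.  It assumes the top-level siblings are fully independent, renders each one on a worker thread, and joins the pieces back together in document order.

```python
from concurrent.futures import ThreadPoolExecutor


def render_serial(node):
    """Render a node the ordinary way, one child after another.

    A node is either a plain string or a (tag, children) tuple.
    """
    if isinstance(node, str):
        return node
    tag, children = node
    inner = "".join(render_serial(child) for child in children)
    return "<" + tag + ">" + inner + "</" + tag + ">"


def render_page(root, max_workers=4):
    """Render the root's direct children in parallel.

    Only the top level is parallelized; each subtree is still rendered
    serially, which keeps worker threads from waiting on each other.
    """
    tag, children = root
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the markup stays in document order.
        parts = list(pool.map(render_serial, children))
    return "<" + tag + ">" + "".join(parts) + "</" + tag + ">"


page = ("div", [("h1", ["Title"]), ("p", ["Body ", ("b", ["bold"])])])
print(render_page(page))
```

A real implementation would first need the dependency analysis described above to decide which siblings are safe to render at the same time.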

I need to repeat that I have no idea if these concepts are currently implemented.  I suspect at the moment they are not, and it's a shame if that's so.  But the possibility and potential here are certainly interesting.


SQL Injection, Part 2

  • Jun 24, 2008

I first wrote about an SQL Injection attack way back in April.  It died down for a while, but that attack is still going on.  It broadened in scope to even hit some PHP and ASP.Net sites, and this week I've noticed several new requests for assistance.  If you're wondering how your site would fare, you should check out this article.  Here are the steps to take if you need to fix the issue:

  1. Take down the site.  As it stands now, the site is actively serving malware to its users.  This is not a situation you want to be in.  You can put up a temporary page to tell the users what is happening, but you should not allow the site to continue operating until it's fixed.
  2. Fix the vulnerabilities that allowed the breach in the first place.  The nature of the attack is that the site will be infected again inside of a week unless the vulnerabilities are closed.  It could be as simple as replacing a single apostrophe with two apostrophes on a few form fields or it could be much more complicated, but it must be done.  The article I linked to above has some tools that can help.
  3. Fix the database.  Now we can finally begin to undo the damage.  You have a couple options here, including restoring from backup, though that may not be necessary.  Instead, I modified the code used for the attack to help with the cure.  If this seems cryptic it's because I only changed what was necessary to make it work.  Note that you should not run this code if the database contains valid instances of the text "<script" anywhere:

DECLARE @T varchar(255),@C varchar(255)
DECLARE Table_Cursor CURSOR FOR select a.name,b.name from sysobjects a,syscolumns b where a.id=b.id and a.xtype='u' and (b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)
OPEN Table_Cursor FETCH NEXT FROM Table_Cursor INTO @T,@C
WHILE(@@FETCH_STATUS=0) BEGIN
exec('update ['+@T+'] set ['+@C+']=LEFT(['+@C+'], CHARINDEX(''<script'', ['+@C+'])-1)
WHERE CHARINDEX(''<script'', ['+@C+']) >0')
FETCH NEXT FROM Table_Cursor INTO @T,@C
END
CLOSE Table_Cursor
DEALLOCATE Table_Cursor

Now, finally, you can put the site back up and things should be back to normal.
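
For step 2, doubling apostrophes is the smallest possible patch.  A more robust approach, and not the one described in the steps above, is to stop building SQL from strings at all and pass user input as a bound parameter.  The sketch below shows that technique in Python with the standard library's sqlite3 module and an in-memory database, purely as a stand-in for whatever database and platform your site uses.  The users table and the find_user function are hypothetical.

```python
import sqlite3

# In-memory stand-in for the real site database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])


def find_user(conn, name):
    """Look up a user by exact name using a bound parameter.

    The ? placeholder means the driver treats name strictly as data,
    so quotes inside it cannot change the shape of the query.
    """
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]


print(find_user(conn, "alice"))        # ['alice']
print(find_user(conn, "' OR '1'='1"))  # []
```

The second call is a classic injection string.  With a bound parameter it is simply a name that matches nobody.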


Firefox

  • Jun 17, 2008

Don't forget to download the new Firefox today. 

... if you can, that is.  It seems that the folks over at Mozilla vastly underestimated the amount of traffic their world-record try would bring.  At least, the server isn't responding at the moment, and hasn't worked for me since the official start time nearly two hours ago.

You may still be able to get the new version using this link:
http://download.mozilla.org/?product=firefox-3.0&os=win&lang=en-US

However, this link isn't 100% reliable either, and I don't know if it will get counted towards the record.  I have used it to install the new Firefox on the three systems I use, so if you can't wait and the link doesn't work on the first try then give it a few moments and try again.

Update:  The site appears to be working again.


Excel Column Names

  • Jun 12, 2008

I ran across a request in a forum today to create an Excel column name from an index.  It sounds simple, but it's harder than it looks.

The obvious solution here is to think about a column name as a base 26 number, with A-Z for digits.  Unfortunately, it doesn't quite work like that.  The '0' digit is broken.  For example, counting the column names from A you wrap around to AA after reaching Z. If this were base 10 it would be like counting from 1 to 9 and then getting 11 instead of 10, or counting from 0 to 9 and then getting 00 instead of 10, depending on whether you treat A as 0 or 1. So it's tricky.

I thought I could get around that but that it would take more work than it's worth, so I decided to look around online.  Surely there would be something already out there.  What I found was a bunch of over-complicated implementations that all break somewhere on one of the boundaries I described.  Even the Microsoft support example doesn't work well.  What a disappointment.

So I ended up writing a new version after all.  This one will scale, and it's not even that complicated. I just had to get a little recursive:

Function ColumnName(ByVal index As Integer) As String
        Static chars() As Char = {"A"c, "B"c, "C"c, "D"c, "E"c, "F"c, "G"c, "H"c, "I"c, "J"c, "K"c, "L"c, "M"c, "N"c, "O"c, "P"c, "Q"c, "R"c, "S"c, "T"c, "U"c, "V"c, "W"c, "X"c, "Y"c, "Z"c}

        index -= 1 'adjust so it matches 0-indexed array rather than 1-indexed column

        Dim quotient As Integer = index \ 26 'normal / operator rounds. \ does integer division, which truncates
        If quotient > 0 Then
               ColumnName = ColumnName(quotient) & chars(index Mod 26)
        Else
               ColumnName = chars(index Mod 26)
        End If
End Function

That still needs some basic bounds and error checking, but it works well for a quick sample.  It's only 11 lines of code as it appears in my IDE (curse the vox formatter!) so it's pretty easy to follow.  It should perform well too, since it would be very odd to have more than one or two recursive calls.  Now hopefully Google can index this page better than all those bad implementations I saw out there, but I'm not holding my breath.
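
For anyone who doesn't read VB, here is a rough port of the same recursive approach to Python, with the simplest possible bounds check added.  The name column_name and the choice to raise ValueError are mine, not part of the VB version.

```python
def column_name(index: int) -> str:
    """Convert a 1-based column index to an Excel-style column name."""
    if index < 1:
        raise ValueError("index must be 1 or greater")

    index -= 1  # shift to 0-based so that A..Z map to remainders 0..25

    # divmod does truncating integer division, like the VB \ operator.
    quotient, remainder = divmod(index, 26)
    letter = chr(ord("A") + remainder)

    if quotient > 0:
        # The quotient is itself a 1-based index for the leading letters.
        return column_name(quotient) + letter
    return letter


for i in (1, 26, 27, 52, 702, 703):
    print(i, column_name(i))
# 1 A
# 26 Z
# 27 AA
# 52 AZ
# 702 ZZ
# 703 AAA
```

The boundary cases at 26/27 and 702/703 are exactly where the "broken zero digit" shows up, so they make good test values.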

 


Oops

  • Jun 12, 2008

So it's been a while since my last update.  In truth, I just haven't felt like writing.  It never fails that the same time I put the update schedule in the title is when I finally cave- though I did make it for several months.  Anyway, I do have a few topics coming, but I don't know how often I'll be updating so I took the schedule down.


Screen Scraping ASP.Net

  • May 14, 2008

I consider myself to be pretty good at scraping web sites.  I've been able to get into sites others thought were impossible.  I've even made it through some pretty tricky login verification.  My tool of choice to accomplish this is VB.Net and a simple class I've written that re-implements much of System.Net.WebClient, and extends it to support a few additional functions.  Unfortunately, I'm not ready to release that class here yet; there are still some issues with it I want to work out first.

The real key to scraping a web site isn't the technology, anyway.  Scraping a given page, once you have it, is and always will be rather trivial.  It's when you're scraping a site where you may have to make several requests in sequence to get the server to create the page you want that things can get tricky.  With that in mind, the key to successfully scraping a web site is simply to study the client source for the site until you can accurately reproduce HTTP requests that are the same or sufficiently similar to those issued by a web browser under the command of a normal user.  This may mean parsing some very nasty javascript now and then, but that's the way it works.  Of course, there are tools that can help with this, but when it comes down to it you usually just need to be able to read the code.

Today I was helping someone scrape an ASP.Net site.  This was my first time scraping ASP.Net, which surprised me considering it's my web platform of choice.  I was also shocked to discover that ASP.Net can be unusually difficult to scrape.  Perhaps in hindsight I should have known this, but it caught me unaware this morning. 

You see, ASP.Net pages include a few extra things by default that must go with every request.  The hidden __VIEWSTATE field, for example.  The server normally does some basic validation on the application state, so just sending an empty view state may not cut it.  Also, most server controls send requests using very cryptic IDs via a __doPostBack() javascript function.  It's actually quite difficult to follow.  More than that, since it's so easy to push the work for simple controls to the server, it's very easy to obfuscate what a particular link is really doing.  So easy that you may even end up hiding things by accident.

I figure after I get a little more experience scraping these pages I'll discover there's a trick to it, and once you know the trick they may even turn out to be easier.  In fact, I would expect that knowing a site uses ASP.Net would allow you to make certain assumptions about what fields you need to submit and how to submit them.
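
As a starting point, here is a minimal sketch of that idea in Python (my own scraping class is VB.Net and isn't shown here).  It assumes the usual WebForms pattern: copy every hidden input from the page you were served, then fill in __EVENTTARGET and __EVENTARGUMENT the way __doPostBack() would.  The class and function names, the sample markup, and the control ID ctl00$Link1 are all made up for the example, and a real site may need more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build a form-encoded body that imitates a __doPostBack() call."""
    parser = HiddenFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)  # echo __VIEWSTATE and friends back unchanged
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# Hypothetical page fragment, standing in for a real response body.
SAMPLE = """
<form method="post" action="Default.aspx">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" value="xyz" />
<input type="text" name="q" value="ignored" />
</form>
"""

print(build_postback(SAMPLE, "ctl00$Link1"))
```

The resulting string is what you would send as the body of the POST request, along with any visible form fields the page expects.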

So if you have a web site and you want to protect it from scrapers, well... there's really not much you can do.  Once a page is sent to a web browser a smart programmer will always be able to decipher it.  But you could do worse than to choose ASP.Net.


The End is Near

  • May 5, 2008

Get out your tinfoil hats.  It turns out that a small part of Windows was written by a machine.  You have to read through most of a rather boring post to see what I'm talking about, and if you blink you might even miss it, but it's there.  This isn't an official Microsoft statement, but it is an officially sanctioned blog of a senior Microsoft engineer.  I'm sensationalizing this more than a little, but I'm sure there are those who will see this as a sort of slippery slope and wonder where it ends.
