2010-07-26

Rob Pike on a False Dichotomy

Rob Pike's keynote at OSCON describes what I think is a very valid niche: the need for a modern statically typed programming language. Of course, Go is his answer, but at 8:55 in, he makes the case that folks have been set up to adopt a false dichotomy: that static typing is necessarily bad and dynamic typing is necessarily good.

2010-07-23

2nd Fundamental Law of EnScript

The Second Fundamental Law of EnScript is more complicated than the First or the Third:

All simple objects are ref-counted, but only root NodeClass objects are ref-counted. NodeClass objects in a list or tree are not ref-counted.

If you want to keep from crashing EnCase with your scripts, you must thoroughly understand this law. Unfortunately, it's a bit tricky.

What does ref-counted mean?

Like most languages in use these days, EnScript features garbage collection. When you create new objects in EnScript, those objects take up space in RAM. Rather than making you, the programmer, tell EnScript when that memory should be released, EnScript safely decides for you. It's really less like having your trash picked up once a week and much more like maid service.

An example of garbage collection in EnScript:

class MainClass {
  void Main(CaseClass c) {
    NameListClass strings = new NameListClass();
    strings.Parse("w h a t e v e r", " ");
    NameListClass strings2 = strings;
    Console.WriteLine(strings2.Count());
  }
}


In this example, you can see that I've allocated at least one NameListClass, and have a second reference. What you don't see is me calling delete, as you would if this were C++, or free, if this were C. Folks with Java, Python, or Perl experience should be used to this.

What happens is that EnScript gets to the end of the Main() function and realizes that the NameListClass object will never be used again. It's at that time that it is destroyed and the memory is released.

Reference counting is the underlying mechanism that makes this work. The key is to understand that the strings and strings2 variables in the above code refer to a NameListClass object; they don't contain the object itself. If you've programmed in Java or Python, this is normal. If you're coming from a C/C++ background, references are essentially pointers that prohibit you from doing fun pointer arithmetic or things like recasting 17 to an object. In the example above, the strings reference is created and assigned the memory location of the new NameListClass object. The Parse() method is called on the object, creating several child objects. The strings2 reference is then created and assigned the memory location held by the strings reference; now both references refer to the same object.

Because objects can only be manipulated through references in EnScript, EnScript can track how many references point to each object. When a new object is created, a counter variable is created along with it, and when the object's memory location is assigned to a reference, the counter is incremented. When a reference is destroyed or has another object assigned to it, it decrements the counter of the object it had been pointing to. Here's the magic part: if the counter is zero after the decrement, the reference then says, whoa, I'm the last guy out of the room so I should turn out the lights... and it destroys the object.
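Here's a toy model of that bookkeeping in Python, just to make the counting concrete. Counted and Ref are names I made up for this sketch; this is not how EnScript is implemented internally, only the increment/decrement/destroy-at-zero rule described above.

```python
class Counted:
    """A toy object that carries its own reference counter."""
    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.alive = True


class Ref:
    """A toy reference: the only way to reach a Counted object."""
    def __init__(self, target=None):
        self.target = None
        self.assign(target)

    def assign(self, target):
        # Increment the new target first, so self-assignment is safe.
        if target is not None:
            target.refs += 1
        self.release()
        self.target = target

    def release(self):
        old, self.target = self.target, None
        if old is not None:
            old.refs -= 1
            if old.refs == 0:
                # Last one out of the room turns out the lights.
                old.alive = False


obj = Counted("strings")
strings = Ref(obj)           # count is 1
strings2 = Ref(obj)          # count is 2
strings.release()            # count is 1, the object survives
still_alive = obj.alive
strings2.release()           # count is 0, the object is destroyed
```

The object outlives the first release and dies on the second, which is all that reference counting promises.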

Java uses a more advanced form of garbage collection called "mark-and-sweep" where a background thread runs periodically and tries to figure out which objects can be destroyed. Reference counting is used in Python and can also be used in C++ via the shared_ptr<> template (added to the standard library in the TR1 update; use Boost's shared_ptr<> if you don't have it, since they're functionally equivalent). The upside to reference counting, though, is that it's deterministic; you know that an object will be destroyed when the last reference goes away, which allows you considerable control over memory usage.
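You can watch that determinism in Python. This sketch assumes CPython, where an object is reclaimed as soon as its count hits zero; other Python implementations may wait. Thing and probe are placeholder names.

```python
import weakref


class Thing:
    pass


a = Thing()
b = a                    # second reference to the same object
probe = weakref.ref(a)   # a weak reference does not keep the object alive

del a
alive_after_first_del = probe() is not None    # b still holds the object

del b
alive_after_second_del = probe() is not None   # on CPython, already gone
```

The weak reference goes dead immediately after the last strong reference is deleted, with no collector pass in between.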

There are a couple of advantages to references and garbage collection. The first is that you don't have to worry about managing memory. The second is that you can't hose yourself with bad pointers the way you can in C or C++. A reference is either null (dereferencing it causes a runtime error, which will kill your script but won't crash EnCase) or it refers to a valid object. That's the theory, at least...

Great, what does this have to do with "root NodeClass objects"?

In our example above, we know that strings and strings2 both refer to the same NameListClass object, and at the end of the Main() function the reference count of the object should therefore be 2. It is. What about the child objects that were created by Parse()?

class MainClass {
  void Main(CaseClass c) {
    NameListClass strings = new NameListClass();
    strings.Parse("w h a t e v e r", " ");
    Console.WriteLine(strings.GetChild(2).Name());
    NameListClass strings2 = strings;
    Console.WriteLine(strings2.Count());
  }
}


Well, as it turns out, child objects in a NodeClass are not ref-counted. They're contained by the root NodeClass object. When the root is destroyed, so, too, are the children.

Okay... why aren't all objects reference counted?

Because. The main reason is that most everything you see in EnCase is derived from NodeClass (as pointed out in the First Law, which we'll revisit in the Third Law). Reference-counting has some overhead and, like most Windows apps, EnCase is written in C++ and can manage memory manually. So, internally, EnCase doesn't use reference counting for all those child objects, since that would be inefficient when working with millions of objects. Rather than make NodeClass work completely differently for EnScript objects than it does for EnCase's internal data, the semantics are the same.

Why do I need to keep this in mind as a Fundamental Law of EnScript?

Because this can hose you. You can write a script that crashes EnCase, killing not just your script, but any work it's done and any unsaved work in the case.

Consider this script:

class MainClass {
  typedef NameListClass[] ListArray;

  void Main(CaseClass c) {
    NameListClass found; // null reference;
    NameListClass strings = new NameListClass();
    strings.Parse("w h a t e v e r", " ");

    found = strings.Find("e"); // found points to child 5
    Console.WriteLine(found.Name());
    strings.Close();           // destroy all children


    // this code causes a lot more memory to be used
    ListArray array();
    for (uint i = 0; i < 1000; ++i) {
      array.Add(new NameListClass());
      array[i].Parse("g a r b a g e", " ");
    }


    if (found != null) { // totally true
      found.SetName("buffer overflow"); // can crash
      Console.WriteLine(found.Name());
    }
  }
}



Now, if you run this script... it may work; it just might print out "buffer overflow" to the console without any problems. But it's not guaranteed to work.

The problem is that found is set to refer to the fifth child of strings, and then we delete all the child objects. found, though, still thinks it's pointing to the fifth object... but it's really pointing to the memory formerly occupied by that object, and now that memory could be used for something else. And since we ran through some code in the meantime that created 1,000 new lists, each with child objects, chances are high that that memory is now being used for something else.
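Python won't let you touch freed memory, so the best I can do there is a simulation of the ownership rule. In this sketch Root, Node, and the destroyed flag are all invented for illustration; the flag stands in for "this memory has been released."

```python
class Node:
    """Toy child node; 'destroyed' stands in for freed memory."""
    def __init__(self, name):
        self.name = name
        self.destroyed = False


class Root:
    """Toy root that owns its children outright (no per-child counting)."""
    def __init__(self):
        self.children = []

    def parse(self, text, sep):
        self.children = [Node(part) for part in text.split(sep)]

    def find(self, name):
        for child in self.children:
            if child.name == name:
                return child
        return None

    def close(self):
        # The root destroys its children no matter who still points at them.
        for child in self.children:
            child.destroyed = True
        self.children = []


strings = Root()
strings.parse("w h a t e v e r", " ")
found = strings.find("e")
index_of_found = strings.children.index(found)   # zero-based position
strings.close()

# The handle is still non-null, but what it points at is gone.
stale = found is not None and found.destroyed
```

The null check passes and the object is still gone, which is exactly why the `found != null` test in the script protects nothing.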

The term of art for what happens when we call SetName() on found is "undefined behavior." We can't be sure.

If you're running a recent version of EnCase, what will probably happen is that the script will terminate and EnCase will report an error, but keep on running. If you're running an older version of EnCase, it'll probably crash and burn. Newer versions of v6 ask Windows to monitor for crashes that occur in EnScript and give EnCase a chance to recover from them. This is similar to the crash protection in the gallery view.

Even with the crash protection, though, you can still never be sure that you didn't hose everything. That's why you need to be careful when manipulating references to child NodeClass objects, and you need to understand how EnScript's memory model works.

2010-07-20

3 Laws of EnScript

The other day I stumbled across an old presentation (uploaded by Mark Morgan) that I'd created a few years ago. The presentation was my guide through a 1-3 day workshop/seminar that I led a few times for some clients, with time taken throughout for exploring example scripts and working in teams.

In the presentation, I make reference—with tongue firmly in my cheek—to "Stewart's 3 Fundamental Laws of EnScript." The three laws are:
  1. Data structures should almost always be composed [sic] from NodeClass.
  2. All simple objects are ref-counted, but only root NodeClass objects are ref-counted. NodeClass objects in a list or tree are not ref-counted. (link)
  3. Most of the EnScript classes are auto-generated by handlers from the EnCase view. WYSIWYG. (link)
I don't really explain these three laws in the presentation. Let me rectify that situation. Today, I'll cover the first law.

First Fundamental Law of EnScript
Data structures should almost always be derived from NodeClass


What this means:

class FooClass: NodeClass { // use NodeClass as parent of FooClass

  uint AnInt;

  FooClass(NodeClass parent = null):
    NodeClass(parent),
    AnInt = 7
  {}

}


The important bits are the NodeClass parent in the class declaration and the NodeClass(parent) call in the constructor. This defines our own class, FooClass, declares that it inherits from NodeClass, and defines a constructor for FooClass (you are, of course, free to define other constructors). The constructor takes a reference to another NodeClass object (parent), and calls the NodeClass constructor for the object in the constructor initializer list, passing in the parent reference.

What is NodeClass and why should our own classes inherit from it?

I'm glad you asked. NodeClass is the quintessence of EnCase, the manifestation of the central metaphor: everything in EnCase is a tree. (If you haven't noticed that before, look around; everything in EnCase is a tree.) To use some software development language from the GoF book, NodeClass represents the Composite Design Pattern.

NodeClass has a lot of methods, so let's focus on the important ones:


class NodeClass {

  NodeClass(NodeClass parent = null); // Constructor

  NodeClass FirstChild();
  NodeClass LastChild();
  NodeClass Next();
  NodeClass Parent();

}

What this says is that a NodeClass object can access the first item in a child list, the last item in a child list, the next item in its own list, and its parent. A picture is helpful:



This picture shows an item in green, and how it relates to other NodeClass items in the same tree. Each NodeClass object has just enough data to form a tree, if we link them together. For example, let's consider things from the perspective of "First Child" in the picture above. We'll rename it "Item", rename the old "Item" to be "Parent", blank out the other labels, and make the old links dotted lines.



Essentially, NodeClass allows you to construct lists of lists. You can see in the second image that we can have another child list, and because of the Next and Parent links, it can be linked into the overall tree.
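If the pictures don't do it for you, here is a rough Python sketch of the same four links. It's my own toy Node class, not the EnScript API; children() and descendants() are only loose analogues of foreach() and forall(), and I'm not modeling whether forall() visits the root itself.

```python
class Node:
    """Toy tree node with the four links described above."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.first_child = None
        self.last_child = None
        self.next = None
        if parent is not None:
            parent.insert(self)

    def insert(self, child):
        # Append child to the end of this node's child list.
        child.parent = self
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next = child
        self.last_child = child

    def children(self):
        # Walk one child list via first_child / next.
        node = self.first_child
        while node is not None:
            yield node
            node = node.next

    def descendants(self):
        # Depth-first walk of the whole subtree.
        for child in self.children():
            yield child
            yield from child.descendants()


root = Node("root")
a = Node("a", root)
b = Node("b", root)
a1 = Node("a1", a)
flat = [n.name for n in root.children()]
deep = [n.name for n in root.descendants()]
```

Four references per node are enough to reach anything in the tree, which is the whole trick.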

Almost all of the API classes in EnScript inherit from NodeClass: EntryClass, BookmarkClass, NameListClass, FileSignatureClass, etc.

So, why should my classes inherit from NodeClass?

For a few reasons:
  1. foreach() and forall() loops work with NodeClass. If you have multiple objects of your class, you can form them into a list using NodeClass (by passing in the parent to the constructor, or by calling Insert() on the parent) and then iterate over the list using foreach(). You can be extra-fancy and create trees, too, and then iterate over every item in the tree using forall().
  2. Some user interface widgets only work with NodeClass. For example, ListEditClass, TreeEditClass, and TreeTableEditClass can only display objects which inherit from NodeClass.
  3. Properties only work with NodeClass. EnScript supports a limited form of reflection, somewhat similar to Python or Java. Given an object, foo, you can use the typeof() keyword to ask for a SymbolClass object that describes the class of foo. You can then ask the SymbolClass for all properties of foo, and PropertyClass::GetProperty() and PropertyClass::SetProperty() allow you to get and set, respectively, the underlying property values of foo. That is, you can access the data indirectly, without knowing what the class is or calling the property methods directly. This is great for writing your own object-relational mapper (ORM) like Ruby on Rails' ActiveRecord, auto-serializing data out to XML, and implementing a limited form of duck-typing.
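For comparison, this is roughly what the reflection trick in the third point looks like in Python, using the built-in vars(), getattr(), and setattr(). Foo and to_xml are hypothetical names for this sketch, and the serializer does no XML escaping.

```python
class Foo:
    def __init__(self):
        self.name = "example"
        self.an_int = 7


def to_xml(obj):
    """Serialize an object's attributes without knowing its class in advance."""
    tag = type(obj).__name__
    fields = "".join(
        f"<{key}>{getattr(obj, key)}</{key}>" for key in sorted(vars(obj))
    )
    return f"<{tag}>{fields}</{tag}>"


foo = Foo()
setattr(foo, "an_int", 8)   # set a property by name
xml = to_xml(foo)
```

to_xml() never mentions Foo, name, or an_int; it discovers them at runtime, which is the same idea as walking the properties of a SymbolClass.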
This obviously isn't a complete introduction to EnScript, or even NodeClass, but I hope it clarifies the First Law. I'll post about the Second Law in the next few days.

2010-07-15

Link files run malware

Brian Krebs has news of a new vulnerability in Windows 7, where link files on a thumb drive can execute arbitrary code when encountered by Windows Explorer. The link files themselves do not need to be clicked on by the user.

AND the malware they describe seems to target SCADA systems. I used to think that stego was a non-issue and that no one in the history of computer forensics ever came across legitimate use of it... but then there were the Russians. Similarly, I used to think that all the hype about securing SCADA systems from malware was a non-issue...

2010-07-12

Beer Can Charcoal Chimney

Later today I'm going to try my hand at barbecuing pork ribs, using Sifty/Pelaccio's recipe. The problem I've had with barbecuing using my Weber grill, though, is the part in every barbecue recipe where it says "add more fuel as needed."

How do you add a few briquettes of lit charcoal to a grill? For today, the answer is to slide it down the side, near the handle of the grill plate, where there's enough of a hole for a briquette to fit through (presumably, the ribs will be on the other side of the grill). How to get it started? That's what this post answers.

I have a normal charcoal chimney, which I love dearly. Works fast, no need for lighter fluid. However, you need to have ~10 briquettes at minimum for it to work. Given its large diameter, you don't get much of a chimney effect if you have fewer briquettes. For barbecuing a modest amount of meat on a modest grill, I need something to let me light three briquettes at a time. This is my solution:



I took a Miller Lite beer can and turned it into a charcoal mini-chimney. I used a variety of cutting/stripping pliers to chew the top off and to cut small holes around the base. I then drilled quarter-inch holes along the side, and right in the middle of the base. I left half of the tapered top on, to serve as a sort of guard when dumping out the briquettes, with the guard facing up. It is easy to hold and carry using kitchen tongs.

Does it work? The answer is: mostly.

In a test run this morning, I was able to light three briquettes successfully, and learned some lessons that I will apply this evening. Like any chimney, this one wants air, and lots of it. To get it started, I thought I'd put it on top of something, burn some newspaper underneath it, and be good. I had a terra cotta holder for those nifty mosquito coils, and tried using it.


The problem was that the mosquito coil holder couldn't draw in enough air to burn the newspaper I'd stuffed into it. A little lighter fluid fixed that problem...

After the lighter fluid burned off, though, a lot of somewhat charred newspaper remained, unburned, in the terra cotta thingy. It was blocking airflow into the chimney. Since I didn't have the grill going, I decided to take the chimney off the terra cotta pot and place it directly on the grill.

You can see there's a little bit of ash already. I let it sit on the grill and did not attempt to relight it. This worked well. 15-20 minutes later, the briquettes were ready.


Dumping them out was easy.


The can was smudged, but otherwise fine. So, I established:
  • you can create a charcoal mini-chimney with a beer can
  • with about a dozen holes in it, it seems to draw in enough air, provided it's on the proper surface
  • the can withstands the heat well enough
Some open questions remain, however. Can I get the coals started without using lighter fluid? Can I find a better surface for the chimney to sit on? What I'm going to try next is placing the top of the terra cotta holder (which has holes in the center and a nice lip that fits well with the base of the can) on some bricks and lighting a bit of newspaper beneath it (and maybe putting some shredded paper in the chimney with the briquettes). I think the fire and then the chimney will both get sufficient airflow.

2010-07-11

Forensics and Forgery

A fascinating article in The New Yorker about Peter Paul Biro, an art dealer in Montreal who uses fingerprint analysis to authenticate disputed works of art...

...or does he? Be sure to read the whole thing.

2010-07-09

autocrap tools

Love it. Anyone ever notice that they're slow, too? In 2010, why does it take 30s to scan my system for the same old crap, every single time?

Gratz to 'gesh

I am unsurprised to see that my old partner Yogesh provided a correct answer to the latest SANS network forensics contest. Say what you will, but between Lance, Yogesh, Geoff Black, Jamie Levy, and many others, there was once an impressive collection of talent at a particular forensics company.

IBM keyboards

Unicomp bought IBM/Lexmark's keyboard factory... and they still make the same old keyboards, with the new hotness, USB.

2010-07-02

2010-07-01

looping

I want a loop construct like Python's join(). That is:


Iterator cur = start;
if (cur != end) {
  foo(*cur);
  while (++cur != end) {
    bar(*cur);
    foo(*cur);
  }
}
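Lacking the construct, the closest thing I can sketch is a helper that hides the "first item is special" dance. This is Python, and each_with_separator, visit, and between are names invented here; visit plays the role of foo and between the role of bar.

```python
def each_with_separator(items, visit, between):
    """Call visit(item) for every item, and between(item) only in the gaps."""
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return                      # empty input: nothing to do
    visit(first)
    for item in iterator:
        between(item)               # runs before every item except the first
        visit(item)


out = []
each_with_separator(["a", "b", "c"], out.append, lambda _item: out.append(", "))
joined = "".join(out)
```

It works for any iterable, but it's still a function taking two callbacks rather than a real loop with two bodies.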