2010-12-04

Wanted: strpbrk()

I'm looking for a good EnScript API method for testing whether a string contains any characters in a second string.

That is, I want the standard C function strpbrk().

For some reason, I feel like earlier versions of EnCase had a function that did this. (By earlier versions, I mean v5 or v4.)

String::Trim() would almost do the job, except that it replaces the occurrences, which isn't what I want. The second parameter of Trim() is an int for passing in a combination of the String::TRIMOPTIONS enums. So, I tried passing in 0 instead of one of the enums, hoping this would find all occurrences but not replace them. Sadly, passing in 0 finds no occurrences.

You can use SearchClass with a character class regexp (e.g., [abc]) or NameListClass::Parse(), but both of them are pretty heavyweight.
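For reference, here is the behavior I'm after, as a minimal sketch in Python rather than EnScript. The name find_any is made up for illustration, and it returns an index (or -1) where C's strpbrk() returns a pointer (or NULL).

```python
def find_any(haystack: str, accept: str) -> int:
    """Return the index of the first character in haystack that also
    appears in accept, or -1 if there is none (strpbrk-style)."""
    wanted = set(accept)              # constant-time membership tests
    for i, ch in enumerate(haystack):
        if ch in wanted:
            return i
    return -1


print(find_any("forensics", "xyz"))   # no shared characters
print(find_any("forensics", "nse"))   # 'e' is the first character that matches
```

Nothing is replaced or copied; the function only reports where the first match is.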

In happier news, though, you can use EnScript to convert doubles to strings in scientific notation.

2010-11-23

Android Overview

Tim Bray's current job with Google centers around being an evangelist/catalyst for the Android platform. He's often accused of being a shill; his title is "Developer Advocate."

I'd argue that there is no better way to advocate to developers than to provide them with clear, concise overview documentation. Whether you're a coder, a forensicator, both, or neither (what are you doing reading this blog if you're neither?), you should check out his recent blog post, What Android Is.

2010-11-19

A Cheap Trick for quadratic loops

The other day I blogged about the perils of a hash value-related filter that exhibits quadratic time. Sometimes there's a hack you can use to make the simplicity of a quadratic nested loop a bit more palatable, performance-wise. Here's a script I wrote a couple of weeks ago to give me the unique list of folders containing selected files:


class MainClass {
  void Main(CaseClass c) {
    if (c) {
      EntryClass root = c.EntryRoot();
      NameListClass list();
      String folder,
             last;
      forall (EntryClass e in root) {
        if (!e.IsFolder() && e.IsSelected()) {
          folder = e.FullPath().GetFilePath();
          if (last != folder && !list.Find(folder)) {
            new NameListClass(list, folder);
          }
          last = folder;
        }
      }
      foreach (NameListClass f in list) {
        Console.WriteLine(f.Name());
      }
    }
  }
}


This code uses a flat NameListClass for keeping a unique list of the folders and NameListClass::Find() does a sequential scan of the list. But, it still runs in a reasonable time.

The trick is that the script remembers the folder of the previous entry. Chances are the current entry is in the same folder, so the script doesn't need to scan the list. In general, when you're looping over a tree with forall(), there's probably an ordering principle to the data in the tree. It's easy to cache a variable or two from the last item processed, and it can save you a lot of work. If the cache misses, then, yes, your code is slow... but it's also simple and fast to write. Obviously it's not appropriate for every situation, but it's a good trick to keep in mind.
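The same trick, sketched in Python so the effect of the one-item cache can be measured. The function name, the "/"-separated paths, and the scans counter are assumptions for illustration; the input list stands in for entries visited in tree order.

```python
def unique_folders(paths):
    """Collect unique parent folders, preserving first-seen order."""
    folders = []      # flat list; the `in` test below is a sequential scan
    last = None       # one-item cache: folder of the previous file
    scans = 0         # how often we fell back to the slow scan
    for path in paths:
        folder = path.rsplit("/", 1)[0]
        if folder != last:            # cache miss: only now pay for the scan
            scans += 1
            if folder not in folders:
                folders.append(folder)
        last = folder
    return folders, scans


paths = [
    "C/docs/a.txt", "C/docs/b.txt", "C/docs/c.txt",
    "C/pics/x.jpg", "C/pics/y.jpg",
    "C/docs/d.txt",
]
folders, scans = unique_folders(paths)
print(folders, scans)
```

Six files only cost three scans here, because consecutive files usually share a folder.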

2010-11-17

Fast Hash Matching

There's a new EnScript filter that Lance blogged about, written by Oliver Höpli. The filter will connect to a relational database and then only show files in EnCase whose hash values match what's in the database. It's pretty useful; you should grab it.

There's a filter that ships with EnCase which does the same thing. Ever have a little mistake slip by you, and then find you have to live with it forever? That's what happened with that filter, and I'm sorry: it's my fault. As Oliver and others point out, the filter maintains its set of hash values as a flat NameListClass. For every file, it looks through the list for that file's hash. If you have n files and m hash values, that means the filter will run in O(nm) time. If n ≈ m, then that means it runs in O(n²) time; it's quadratic.

While the filter is simple, that's not enough to account for the fact that it takes an unacceptable—and needlessly so—amount of time to run. So, a long time ago, I wrote a different version, that uses a binary search tree, built out of NameListClass as well, to do the lookups. This reduces the running time of the algorithm to O(n log n). As Lance points out, log n of 1,000,000 is approximately 20 (if 2 is the base of the logarithm, not 10). If you have a million files, it will take 20,000,000 operations to run the filter using the binary search tree. That's a lot better than 1,000,000,000,000 (1 trillion) operations.
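The arithmetic is easy to check. This is a quick Python sketch that assumes, as above, a million files and roughly as many hash values, and rounds log2(n) up to a whole number of comparisons per lookup.

```python
import math

n = 1_000_000                         # files, and (assumed) as many hash values
quadratic = n * n                     # flat list: scan up to n hashes per file
tree = n * math.ceil(math.log2(n))    # balanced tree: about log2(n) steps per file

print(f"log2(n)    = {math.log2(n):.2f}")
print(f"quadratic  = {quadratic:,}")
print(f"tree-based = {tree:,}")
print(f"ratio      = {quadratic // tree:,}x")
```

These are operation counts, not timings, but the gap is wide enough that constant factors don't change the conclusion.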

In the comments, someone posted the code from the filter. It relies on the binary search tree implementation in GSI_Basic.EnScript, which is in the default Include/ folder. As it turns out, the better filter had once shipped with EnCase (I think; I may be wrong), but the filters were accidentally replaced during v6 development, prior to its release. The sad thing is that I knew about it, and should have spent 30 minutes to fix it, but it was always lower priority than other problems. Unfortunately, what would have cost me 30 minutes has ended up costing everyone weeks (months? years?) of processing time.

(As an aside, this is a good example of why life shouldn't be handled as a priority queue. Sometimes there's low-hanging fruit that you can't get to if you always work on the highest priority problem, and that low-hanging, but lower priority, fruit might be tasty and nourishing.)

While it's true that the binary search tree approach is faster than Oliver's filter, I'd encourage you to use Oliver's filter instead. While a database will almost never be the fastest approach, it won't be bad. Compared to hours and hours of processing time for the quadratic filter, the difference between 3 minutes and 20 seconds is negligible. And for that additional 2 minutes and 40 seconds of trade-off, you'll get the ability to use more hash values than you have RAM, something the binary search tree filter can't handle, and you'll be able to manage your hash values conveniently in a database.

(If you have billions and billions of hash values, more exotic solutions may be necessary.)

The binary search tree filter isn't even the best possible way. The code on Lance's blog converts every hash value to a string and relies on lexicographical sorting of hash values. This adds a lot of overhead. It performs a lot of memory allocations since a new NameListClass object must be allocated for every hash value. These objects may be located in different parts of RAM, so you lose the benefits of L1 and L2 cache. What's more, you ignore some of the properties of hash values themselves... you end up spending too much time maintaining the order of the binary search tree when you could take advantage of how a hash works.

For example, what you could do is take a guess at the number of hash values you'll have. For Unique Files by Hash, you know you'll have an upper bound of the number of files in the case. So, create a MemoryFileClass buffer. Multiply 16 (the size of an MD5 hash in bytes) by the total number of files, and then add an extra 50%. Now divide that buffer into 65,536 chunks, one for each possible value of the first two bytes of a hash. Instead of creating a binary search tree, put all the hash values that have the same first two bytes into the same chunk. You'll have more of some than others, but since we added our fudge factor, you'll still have room, and it won't vary too much because of the cryptographic properties of MD5 (if things were lopsided, it'd be a bad hash function). At the beginning of the chunk, store the number of hashes in that chunk. Then sort the hashes in each chunk.

Now, as you go, you take the first two bytes of any hash value. Seek directly to that chunk of the MemoryFileClass buffer. Congratulations! You have now brought all those sorted, closely related hash values into L2 cache (if not L1). Additionally, you've eliminated roughly 99.998% of the possible hash values you have to consider. You now only need to binary search the hashes in the relevant chunk, and this will be fast because you're dealing with contiguous bytes in RAM, instead of objects that store hashes as strings. This would be much, much, much faster than the binary search tree approach.
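Here's a rough Python sketch of that idea, not the EnScript/MemoryFileClass version. It takes one liberty: instead of fixed-size chunks with a fudge factor and a count at the front of each chunk, it packs the sorted hashes back to back and keeps a separate table of where each two-byte prefix starts. The names build_table and contains are made up for illustration.

```python
import hashlib

HASH_LEN = 16          # bytes in an MD5 digest
NUM_BUCKETS = 1 << 16  # one bucket per possible two-byte prefix


def build_table(hashes):
    """Pack unique hashes into one sorted buffer plus a bucket offset table."""
    ordered = sorted(set(hashes))          # byte-wise sort groups equal prefixes
    buf = b"".join(ordered)
    # offsets[p] is the record index where bucket p starts; offsets[p + 1] is its end
    offsets = [0] * (NUM_BUCKETS + 1)
    for h in ordered:
        offsets[int.from_bytes(h[:2], "big") + 1] += 1
    for p in range(NUM_BUCKETS):
        offsets[p + 1] += offsets[p]       # running total turns counts into positions
    return buf, offsets


def contains(table, h):
    """Binary search only the bucket that shares h's first two bytes."""
    buf, offsets = table
    prefix = int.from_bytes(h[:2], "big")
    lo, hi = offsets[prefix], offsets[prefix + 1]
    while lo < hi:
        mid = (lo + hi) // 2
        record = buf[mid * HASH_LEN:(mid + 1) * HASH_LEN]
        if record == h:
            return True
        if record < h:
            lo = mid + 1
        else:
            hi = mid
    return False


known = [hashlib.md5(str(i).encode()).digest() for i in range(10_000)]
table = build_table(known)
print(contains(table, hashlib.md5(b"42").digest()))       # in the set
print(contains(table, hashlib.md5(b"unicorn").digest()))  # not in the set
```

With 10,000 hashes spread over 65,536 buckets, most lookups touch zero or one record. Python won't show the cache benefits, but the data layout is the point.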

Or you can just use Oliver's filter, which I encourage you to do.

2010-10-23

An EnScript - Count Selected...

I've rewritten this EnScript more times than Methuselah had years. All it does is recurse through your case, count separately the number of files and folders selected, sum the logical sizes of the files, and write the output to the console. The Dixon box is great for telling you how many items in EnCase you have selected, but it's not enough when you're working on conditions and other criteria for eDiscovery matters.

You can get this info in EnCase by right-clicking and choosing Copy Folders, but I hate right-clicking and modal dialogs, and their combination, however trivial seeming, is enough to make me write a script.


class MainClass {
  void Main(CaseClass c) {
    if (c) {
      long numFiles,
           numFolders,
           numBytes;
      forall (EntryClass e in c.EntryRoot()) {
        if (e.IsSelected()) {
          if (e.IsFolder()) {
            ++numFolders;
          }
          else {
            ++numFiles;
            numBytes += e.LogicalSize();
          }
        }
      }
      Console.WriteLine(String::FormatInt(numBytes, int::DECIMAL, String::COMMAS) + " selected, "
        + String::FormatInt(numFiles, int::DECIMAL, String::COMMAS) + " files, "
        + String::FormatInt(numFolders, int::DECIMAL, String::COMMAS) + " folders");
    }
  }
}

2010-10-17

How can I find unicorns in deleted ?

Hello-
I am a newbie so please be nice !! (sometimes mean people are not so nice) what I am wondering is how to find unicorns in a DELETED file? This file that I am investigating, it has been un-allocated by bad guy to hide the unicorns with stegnozgraphy..

1. even tho the file is deleted, could unicorns copy be of SLACK area?

2. Would any nice peoeple send me an X-Ways s/n#? I here X-Ways has good stegnozgraphy code.

It is very important I find unicorns so please help me.

thanks

2010-10-15

Everything can be evaluated as a boolean value

I was rereading jwz's seminal essay, "java sucks.", and it made me think of something that sucks about Java, the syntax for testing references for whether they're null:

Foo foo = bar.getFoo();
if (foo != null) {
  //
}

This is something that EnScript gets right: pretty much every EnScript expression can be evaluated as a boolean value, without having to use boolean operators.

References
Foo foo = bar.getFoo();
if (foo) { // true if foo is non-null
  //
}

Integers
int i = 0;
if (i) { // true if i is non-zero
  //
}

Strings
String s = "";
if (s) { // true if s is not empty
  //
}

Once you get used to this [COMPLETELY OBVIOUS] shorthand, it's hard going back to a language that thinks there's something magical about boolean expressions.

2010-10-13

3rd Fundamental Law of EnScript

The Third Fundamental Law of EnScript is:

Most of the EnScript classes are auto-generated by handlers from the EnCase view. WYSIWYG.

What does this mean? What is a handler? In a nutshell, if there's a view/table of data you can see in EnCase, whether filesystem "Entries," Keywords, File Signatures, and so on, then there's probably an EnScript class that corresponds to the type of data in the table.

For example, consider the Entries view. There's an EnScript class named EntryClass. The Entries view has a Description column; EntryClass has a Description property. Entries view also has a "Deleted" column; EntryClass has an IsDeleted property.

To be clear: If you can see it, you can [probably] program it.

Most of the classes you see in the EnScript Types view are very simple classes that just expose data from different views. Consequently, only a relatively small number of classes do anything interesting, that is, requiring documentation. For example, KeywordClass simply holds the data fields you see in the Keywords view and does little else.

So, what's a handler? In EnScript, there's HandlerClass, and what it allows you to do is create an intermediary between an EnScript class (one of your own if you'd like) and the table view, giving you control over the columns and how they're displayed. If we were more interested in philosophy than forensics, we could discuss how a HandlerClass is like the notion of an interpretant in Charles Sanders Peirce's theory of semiotics. (For an excellent treatment of Peirce, check out The Metaphysical Club by Louis Menand.)

In a similar manner, HandlerClass objects are used by the EnScript engine to expose access to data in EnCase. When you work with an EntryClass object in EnScript, you are working through a lightweight handler to manipulate the actual C++ object in EnCase. The important things to keep in mind are that these internal handlers are more-or-less automatic and idiomatic, making basic data access reliable, and that you are working directly with data you see in EnCase, not with a copy or query results or somesuch.

There's quite a bit more to write about HandlerClass and reflection in EnScript and I'm excited to tackle those subjects.

2010-10-05

EnScript parser error in control statements

This will compile:

class MainClass {
  void Main(CaseClass c) {
    forall (EntryClass e in c.EntryRoot()) {
      if (e.IsSelected(), 20) { // a comma is legal???
        Console.WriteLine("so bad");
      }
    }
  }
}


I'm not sure which is worse, that this compiles, or that I found such a syntax error in my own code.

There are free tools available to generate a parser from a formal grammar. To get started, check out the bison manual.

2010-09-28

How to use your own .NET DLLs in EnScript

My buddy Geoff Black has a good overview of how to create your own .NET DLLs so that you can use them in EnScript. He's also put the code on GitHub.

A couple of quick notes about COM support in EnScript:
  • stick to passing simple data types (ints, Strings, etc.)
  • APIs with COM events (i.e., callbacks) are not supported (e.g., Outlook's Find method)
  • to the best of my knowledge, an EnScript class cannot implement a COM interface
  • On one of the COM classes, call the static SetHaltException() method that appears in EnScript and pass it false. This will keep COM exceptions from propagating into EnScript; you can use SystemClass::LastError() to handle them instead.
    • Yes, this means you need to call LastError() after anything that could possibly fail
  • And, yes, you need to check every return value for nullity before de-referencing it
In general, I much prefer using command-line utilities from EnScript to writing tightly-integrated COM code.

2010-09-17

For those who like short, great musics

Item: Geoff Black has started blogging.

Item: I keep looking up Peter Norvig's handy table of how long it takes a modern computer to do various things.

Item: I attended the ACM Northeast Forensic Exchange (NeFX) at Georgetown University this week. There's a vast, vast gulf between investigators and researchers, and they need to interact more. Fortunately, there are some ex-investigators-turned-researchers, like Nicole Beebe and Gilbert Peterson. I thought I was the only forensicator who knew the term "Latent Dirichlet Allocation," but turns out Dr. Beebe and Dr. Peterson are all over it, and schooled me appropriately at happy hour.

Item: EnCase eDiscovery v4 was released. Hopefully those responsible are getting some needed R&R, although I've heard that a search criteria wizard is still nowhere to be found. If I can make a corollary to Joel's Things You Should Never Do, it'd be, "when you find yourself in Borland's shoes, don't fire Dale Fuller."

2010-08-11

Notes on raw FileVault access, for live system forensics

Yesterday, I stupidly/accidentally deleted a source file that I hadn't yet checked in. Unfortunately, I was unable to recover the data, but I did learn how to go about doing so, given that the file was stored in my home directory, protected by FileVault.

The first thing to know is that FileVault is essentially an encrypted DMG file. The DMG is also "sparse," which means that it looks really big to the computer (to get around being limited to a fixed size), but only stores blocks which have actually been allocated.

This sparse DMG image essentially looks like a physical disk. It's then formatted with a partitioning scheme, and the primary partition is attached to the filesystem at your home path, e.g. /Users/jon, not unlike how a normal /home partition would be mounted under Linux.

The cool thing is that the encryption/decryption happens at the physical device layer, not at the partition layer. So, if you happen to lose a file in OS X in a FileVault-protected directory, here's what you should do:

1. Open the terminal

2. Run "hdiutil info". This will spit out information about any mounted DMGs you have open:

framework       : 283
driver          : 10.6v283
================================================
image-path      : /Users/.jon/jon.sparsebundle
image-alias     : /Users/.jon/jon.sparsebundle
shadow-path     :
icon-path       : /System/Library/PrivateFrameworks/DiskImages.framework/Resources/CDiskImage.icns
image-type      : sparse bundle disk image
system-image    : TRUE
blockcount      : 1258291200
blocksize       : 512
writeable       : TRUE
autodiskmount   : TRUE
removable       : TRUE
image-encrypted : TRUE
mounting user   : root
mounting mode   : -rwx------
process ID      : 646
/dev/disk1    Apple_partition_scheme   
/dev/disk1s1    Apple_partition_map   
/dev/disk1s2    Apple_HFS    /Users/jon

You can see here that /dev/disk1 is the "physical" device associated with the DMG, and that the second partition, /dev/disk1s2 is formatted Apple_HFS and mounted at /Users/jon. Make a note of the path of this partition. It's a valid HFS+ partition, by the way; running the sleuthkit on it works just fine.

3. cd to somewhere that's outside of the mount point, e.g., /Users.

4. Run "sudo hdiutil unmount -force <mount point>", e.g., "sudo hdiutil unmount -force /Users/jon". This unmounts the filesystem in the DMG, but still leaves the DMG open.

5. Now you can do whatever you'd like with /dev/disk1s2, or /dev/disk1. You can dd it, grep it, run recovery tools against it, whatever. You should be warned, however, that because it's a sparse image, it will appear much bigger than it is. But it will be decrypted.
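If you'd rather not pick the device out of the "hdiutil info" output by eye, the device lines are easy to parse. This is a small Python sketch that assumes the layout shown in step 2 (device lines start with /dev/, columns separated by whitespace, mount point optional); parse_devices is a made-up helper, and I haven't checked it against other OS X versions.

```python
def parse_devices(info_text):
    """Return (device, content_hint, mount_point) tuples from hdiutil info text."""
    devices = []
    for line in info_text.splitlines():
        if not line.startswith("/dev/"):
            continue                      # skip the key : value header lines
        parts = line.split(None, 2)       # third column may contain spaces
        device = parts[0]
        hint = parts[1] if len(parts) > 1 else ""
        mount = parts[2].strip() if len(parts) > 2 else ""
        devices.append((device, hint, mount))
    return devices


SAMPLE = """\
image-path      : /Users/.jon/jon.sparsebundle
image-encrypted : TRUE
/dev/disk1    Apple_partition_scheme
/dev/disk1s1    Apple_partition_map
/dev/disk1s2    Apple_HFS    /Users/jon
"""

for device, hint, mount in parse_devices(SAMPLE):
    if mount:
        print(f"{device} ({hint}) is mounted at {mount}")
```

The sample text is a trimmed copy of the output above; on a live system you would feed in the real output instead.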

Nota Bene: All of this information is based on my own experimentation yesterday. Please let me know if I've made any mistakes.

2010-08-09

Cross Channel Scripting (XCS) Attacks

The August issue of Communications of the ACM has an interesting article about cross channel scripting (XCS) attacks. The authors describe, among other things, how to attack NAS devices by uploading files with JavaScript filenames that then hijack admin sessions in the management web app. Cool stuff.

Peak Oil => Peak Food

Here's an AP article I saw in the Washington Post, headlined US Official: Poor nations must learn to grow food.

Y'know, the whole "teach someone to fish" argument explains this pretty well. Still, it's a bit of a reversal of long-standing U.S. policy, where we have the world's most productive farms, generate huge surpluses, and then sell/give surplus grain to other countries. The U.S. Gov't has long propped up farms through subsidies. Why wouldn't we want to keep supplying grain to other countries, then, especially in a world where we're knocking down barriers to global trade?

There are good development arguments to be made about why countries should grow their own food. However, the single biggest reason U.S. farms became so productive is the widespread use of fertilizer. Making fertilizer requires oil. In large measure, we've been eating oil.

The use of fertilizer and refinement of seeds and agricultural methods in the 20th century were a sort of Moore's Law. But the high rate of productivity improvement has probably run its course. The U.S. isn't going to be able to feed the rest of the world, especially if we're going to stop eating oil. I think this is part of that acknowledgment.

2010-08-05

DialogClass::ActivateEvent

While I labor over my post for my 3rd law of EnScript, here's an interesting problem. A former colleague of mine, Jeff, emailed me with a sample script. He wanted his user interface to be a wizard, and wanted a subsequent tab to update what it displayed based upon what the user put into the previous tab.

If the only update that needs to occur is to redisplay data from some variable, then this will happen automatically. When the user clicks "Next" on a tab, the widgets on that dialog will write their data to their corresponding variables. When the next tab is displayed, that dialog will automatically re-read its variables. If variables are being shared between widgets on the different dialogs, then everything will appear to be in sync.

However, if you need to do something more complex (a simple example being to update the text in a StaticTextClass, which doesn't have a linked variable), then you need to override DialogClass::ActivateEvent(). ActivateEvent() is called whenever a DialogClass object is displayed, i.e., has become active. There's also WindowClass::Setup(), which is called when a dialog, or a parent dialog, is first displayed because of Execute() or Wizard().

Meanwhile, CanClose() corresponds to the user clicking "Ok" or one of "Next" or "Previous". Returning false from CanClose() will cause the user to remain looking at the current dialog; the idea is that you could perform some validation in CanClose() and if it failed, call ErrorMessage() and return false.

With DialogClass/WindowClass virtual functions, it is generally necessary to call the parent function. If you don't, bad things will happen. The exception to this is CheckControls().

Below is a modified version of Jeff's test script that demonstrates how this works. Try modifying one of the ActivateEvent() functions to return false.


class MainClass {
  String test, test2;

  int  myint;

  void Main() {
    myint = 7;
    test = "text";
    test2 = "text2";
    SystemClass::ClearConsole();
    MasterDialogClass dlg(this);
    dlg.Wizard();
  }
}


class PageDialogClass1: DialogClass {
  StaticTextClass Message;

  PageDialogClass1(DialogClass parent, MainClass &m):
    DialogClass(parent, "First Page"),
    Message(this, m.test, START, START, DEFAULT, DEFAULT, READONLY)
  {
  }
}



class PageDialogClass2: DialogClass {
  StringEditClass Str_Text;
  IntEditClass    Int;

  PageDialogClass2(DialogClass parent, MainClass &m):
    DialogClass(parent, "Second Page"),
    Str_Text (this, "Text", START, START, 80, DEFAULT, 0, m.test2, 80, WindowClass::REQUIRED),
    Int(this, "Int", SAME, NEXT, 100, DEFAULT, 0, m.myint, 0, 30, 0)
  {
  }

  virtual void Setup() {
    Console.WriteLine("Page2 Setup");
    DialogClass::Setup();
  }

  virtual bool ActivateEvent() {
    Console.WriteLine("Page2 ActivateEvent");
    return DialogClass::ActivateEvent();
  }

  virtual bool CanClose() {
    Console.WriteLine("Page2 CanClose");
    return DialogClass::CanClose();
  }
}


class PageDialogClass3: DialogClass {
  MainClass       Main;
  StaticTextClass Message;
  IntEditClass    Int;


  PageDialogClass3(DialogClass parent, MainClass &m):
    DialogClass(parent, "Third Page"),
    Main = m,
    Message(this, m.test2, START, START, 80, 12, READONLY), //I want this to show the text entered on Page2   
    Int(this, "Int", SAME, NEXT, 100, DEFAULT, 0, m.myint, 0, 30, 0)
  {
  }

  virtual void Setup() {
    Console.WriteLine("Page3 Setup");
    DialogClass::Setup();
  }

  virtual bool ActivateEvent() {
    Console.WriteLine("Page3 ActivateEvent");
    Message.SetText(Main.test2);
    return DialogClass::ActivateEvent();
  }

  virtual bool CanClose() {
    Console.WriteLine("Page3 CanClose");
    return DialogClass::CanClose();
  }
}


class MasterDialogClass: DialogClass {

  PageDialogClass1          Page1;
  PageDialogClass2          Page2;
  PageDialogClass3          Page3;

  MasterDialogClass(MainClass m):
    DialogClass(null, ""),
    Page1(this, m),
    Page2(this, m),
    Page3(this, m)
  {}

  virtual void Setup() {
    Console.WriteLine("Master Setup");
    DialogClass::Setup();
  }

  virtual bool ActivateEvent() {
    Console.WriteLine("Master ActivateEvent");
    return DialogClass::ActivateEvent();
  }

  virtual bool CanClose() {
    Console.WriteLine("Master CanClose");
    return DialogClass::CanClose();
  }
}

2010-07-26

Rob Pike on a False Dichotomy

Rob Pike's keynote at OSCON describes what I think is a very valid niche: the need for a modern statically typed programming language. Of course, Go is his answer, but at 8:55 in, he makes the case that folks have been set up to adopt a false dichotomy, that static typing is necessarily bad and dynamic typing is necessarily good.

2010-07-23

2nd Fundamental Law of EnScript

The Second Fundamental Law of EnScript is more complicated than the First or the Third:

All simple objects are ref-counted, but only root NodeClass objects are ref-counted. NodeClass objects in a list or tree are not ref-counted.

If you want to keep from crashing EnCase with your scripts, you must thoroughly understand this law. Unfortunately, it's a bit tricky.

What does ref-counted mean?

Like most languages in use these days, EnScript features garbage collection. When you create new objects in EnScript, those objects take up space in RAM. Rather than making you, the programmer, tell EnScript when that memory should be released, EnScript safely decides for you. It's really less like having your trash picked up once a week and much more like maid service.

An example of garbage collection in EnScript:

class MainClass {
  void Main(CaseClass c) {
    NameListClass strings = new NameListClass();
    strings.Parse("w h a t e v e r", " ");
    NameListClass strings2 = strings;
    Console.WriteLine(strings2.Count());
  }
}


In this example, you can see that I've allocated at least one NameListClass, and have a second reference. What you don't see is me calling delete, as you would if this were C++, or free, if this were C. Folks with Java, Python, or Perl experience should be used to this.

What happens is that EnScript gets to the end of the Main() function and realizes that the NameListClass object will never be used again. It's at that time that it is destroyed and the memory is released.

Reference counting is the underlying mechanism that makes this work. The key is to understand that the strings and strings2 variables in the above code are variables that refer to a NameListClass object; they don't contain the objects themselves. If you've programmed in Java or Python, this is normal. If you're coming from a C/C++ background, references are essentially pointers that prohibit you from doing fun pointer arithmetic or things like recasting 17 to an object. In the example above, the strings reference is created and assigned the memory location of the new NameListClass object. The Parse() method is called on the object, creating several child objects. The strings2 reference is then created and assigned the memory location within the strings reference; now both references refer to the same object.

Because objects can only be manipulated through references in EnScript, EnScript can track how many references point to each object. When a new object is created, a counter variable is created along with it, and when the object's memory location is assigned to a reference, the counter is incremented. When a reference is destroyed or has another object assigned to it, it decrements the counter of the object it had been pointing to. Here's the magic part: if the counter is zero after the decrement, the reference then says, whoa, I'm the last guy out of the room so I should turn out the lights... and it destroys the object.
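To make the bookkeeping concrete, here's a toy model in Python. It is not how EnScript is implemented internally; Counted and Ref are made-up classes that just follow the rules described above: assigning bumps the counter, releasing or reassigning drops it, and whoever drops it to zero destroys the object.

```python
class Counted:
    """An object that carries its own reference counter."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.destroyed = False


class Ref:
    """A reference: the only way to reach a Counted object."""
    def __init__(self, target=None):
        self.target = None
        self.assign(target)

    def assign(self, target):
        if target is not None:
            target.count += 1            # take the new object first
        old, self.target = self.target, target
        if old is not None:
            old.count -= 1
            if old.count == 0:           # last one out turns off the lights
                old.destroyed = True

    def release(self):
        self.assign(None)


obj = Counted("list")
strings = Ref(obj)        # first reference
strings2 = Ref(obj)       # second reference to the same object
strings.release()         # one reference left; object still alive
print(obj.count, obj.destroyed)
strings2.release()        # last reference gone; object destroyed
print(obj.count, obj.destroyed)
```

Incrementing before decrementing in assign() means that assigning a reference to the object it already holds can never destroy it by accident.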

Java uses a more advanced form of garbage collection called "mark-and-sweep" where a background thread runs periodically and tries to figure out which objects can be destroyed. Reference counting is used in Python and can also be used in C++ via the new shared_ptr<> template in the standard library (new in the TR1 update; use Boost's shared_ptr<> if you don't have it, since they're functionally equivalent). The upside to reference-counting, though, is that it's deterministic; you know that an object will be destroyed when the last reference goes away, which allows you considerable control over memory usage.

There are a couple of advantages to references and garbage collection. The first is that you don't have to worry about managing memory. The second is that you can't hose yourself with bad pointers the way you can in C or C++. A reference is either null (dereferencing it causes a runtime error, which will kill your script but won't crash EnCase) or it refers to a valid object. That's the theory, at least...

Great, what does this have to do with "root NodeClass objects"?

In our example above, we know that strings and strings2 both refer to the same NameListClass object, and at the end of the Main() function the reference count of the object should therefore be 2. It is. What about the child objects that were created by Parse()?

class MainClass {
  void Main(CaseClass c) {
    NameListClass strings = new NameListClass();
    strings.Parse("w h a t e v e r", " ");
    Console.WriteLine(strings.GetChild(2).Name());
    NameListClass strings2 = strings;
    Console.WriteLine(strings2.Count());
  }
}


Well, as it turns out, child objects in a NodeClass are not ref-counted. They're contained by the root NodeClass object. When the root is destroyed, so, too, are the children.

Okay... why aren't all objects reference counted?

Because. The main reason is that most everything you see in EnCase is derived from NodeClass (as pointed out in the First Law, which we'll revisit in the Third Law). Reference-counting has some overhead and, like most Windows apps, EnCase is written in C++ and can manage memory manually. So, internally, EnCase doesn't use reference counting for all those child objects, since that would be inefficient when working with millions of objects. Rather than make NodeClass work completely differently for EnScript objects than it does for EnCase's internal data, the semantics are the same.

Why do I need to keep this in mind as a Fundamental Law of EnScript?

Because this can hose you. You can write a script that crashes EnCase, killing not just your script, but any work it's done and any unsaved work in the case.

Consider this script:

class MainClass {
  typedef NameListClass[] ListArray;

  void Main(CaseClass c) {
    NameListClass found; // null reference;
    NameListClass strings = new NameListClass();
    strings.Parse("w h a t e v e r", " ");

    found = strings.Find("e"); // found points to child 5
    Console.WriteLine(found.Name());
    strings.Close();           // destroy all children


    // this code causes a lot more memory to be used
    ListArray array();
    for (unsigned int i = 0; i < 1000; ++i) {
      array.add(new NameListClass());
      array[i].Parse("g a r b a g e", " ");
    }


    if (found != null) { // totally true
      found.SetName("buffer overflow"); // can crash
      Console.WriteLine(found.Name());
    }
  }
}



Now, if you run this script... it may work; it just might print out "buffer overflow" to the console without any problems. But it's not guaranteed to work.

The problem is that found is set to refer to the fifth child of strings, and then we delete all the child objects. found, though, still thinks it's pointing to the fifth object... but it's really pointing to the memory formerly occupied by that object, and that memory could now be used for something else. And since we ran through some code in the meantime that created 1,000 new lists, each with child objects, chances are high that the memory is now being used for something else.
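Python won't let you dangle a pointer, but you can model the effect. In this sketch, Arena is a made-up toy allocator that hands freed slots back out, the way freed RAM gets reused, and found is a raw slot number rather than a counted reference. It's an analogy, not a picture of EnCase's actual allocator.

```python
class Arena:
    """A toy allocator: freed slots are handed out again, like freed RAM."""
    def __init__(self):
        self.slots = []
        self.free = []

    def alloc(self, value):
        if self.free:
            index = self.free.pop()      # reuse the most recently freed slot
            self.slots[index] = value
        else:
            index = len(self.slots)
            self.slots.append(value)
        return index                     # a raw "address", not a counted reference

    def release(self, index):
        self.free.append(index)          # contents stay until overwritten


arena = Arena()
children = [arena.alloc(ch) for ch in "w h a t e v e r".split()]
found = children[4]                      # the first "e", the fifth child
print(arena.slots[found])                # still valid here

for index in children:                   # like strings.Close(): free every child
    arena.release(index)

for ch in "g a r b a g e".split():       # unrelated allocations reuse the slots
    arena.alloc(ch)

print(arena.slots[found])                # stale handle now reads someone else's data
```

Here the stale read is merely wrong. In EnCase the same mistake writes into memory that belongs to something else, which is where the crash comes from.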

The term of art for what happens when we call SetName() on found is "undefined behavior." We can't be sure.

If you're running a recent version of EnCase, what will probably happen is that the script will terminate and EnCase will report an error, but keep on running. If you're running an older version of EnCase, it'll probably crash and burn. Newer versions of v6 ask Windows to monitor for crashes that occur in EnScript and give EnCase a chance to recover from them. This is similar to the crash protection in the gallery view.

Even with the crash protection, though, you can still never be sure that you didn't hose everything. That's why you need to be careful when manipulating references to child NodeClass objects, and you need to understand how EnScript's memory model works.
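
For readers who think in C++, here is a rough analogy of the same ownership rule. It is only a sketch: ListNode and childSurvivesClose() are hypothetical names, and nothing here claims to show how EnCase is implemented. The parent owns its children, and the handle we keep to the fifth child does not keep that child alive. Unlike the EnScript reference, a std::weak_ptr can be asked whether its target still exists, which makes the problem visible instead of undefined.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a NameListClass-style list; not an EnCase type.
struct ListNode {
  std::string name;
  std::vector<std::shared_ptr<ListNode>> children;  // the parent owns its children

  explicit ListNode(std::string n) : name(std::move(n)) {}

  // Analogue of Close(): drop every child.
  void close() { children.clear(); }
};

// Returns true if a handle taken before close() still refers to a live child afterwards.
bool childSurvivesClose() {
  ListNode strings("root");
  for (const char* s : {"w", "h", "a", "t", "e", "v", "e", "r"}) {
    strings.children.push_back(std::make_shared<ListNode>(s));
  }

  // Non-owning handle to the fifth child ("e"); it does not keep the child alive.
  std::weak_ptr<ListNode> found = strings.children[4];

  strings.close();  // destroys all children, including the one found refers to

  // lock() yields an empty pointer once the child is gone, so this check is safe.
  return found.lock() != nullptr;
}
```

Here childSurvivesClose() returns false. In the EnScript version there is no equivalent check: found != null stays true after Close(), which is exactly why the script can crash.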

2010-07-20

3 Laws of EnScript

The other day I stumbled across an old presentation (uploaded by Mark Morgan) that I'd created a few years ago. The presentation was my guide through a 1-3 day workshop/seminar that I led a few times for some clients, with time taken throughout for exploring example scripts and working in teams.

In the presentation, I make reference—with tongue firmly in my cheek—to "Stewart's 3 Fundamental Laws of EnScript." The three laws are:
  1. Data structures should almost always be composed [sic] from NodeClass.
  2. All simple objects are ref-counted, but only root NodeClass objects are ref-counted. NodeClass objects in a list or tree are not ref-counted. (link)
  3. Most of the EnScript classes are auto-generated by handlers from the EnCase view. WYSIWYG. (link)
I don't really explain these three laws in the presentation. Let me rectify that situation. Today, I'll cover the first law.

First Fundamental Law of EnScript
Data structures should almost always be derived from NodeClass


What this means:

class FooClass: NodeClass { // use NodeClass as parent of FooClass

  uint AnInt;

  FooClass(NodeClass parent = null):
    NodeClass(parent),
    AnInt = 7
  {}

}


The important bits are the NodeClass pieces: the base class in the declaration, the parent parameter, and the NodeClass(parent) call in the initializer list. This defines our own class, FooClass, declares that it inherits from NodeClass, and defines a constructor for FooClass (you are, of course, free to define other constructors). The constructor takes a reference to another NodeClass object (parent), and calls the NodeClass constructor for the object in the constructor initializer list, passing in the parent reference.

What is NodeClass and why should our own classes inherit from it?

I'm glad you asked. NodeClass is the quintessence of EnCase, the manifestation of the central metaphor: everything in EnCase is a tree. (If you haven't noticed that before, look around; everything in EnCase is a tree.) To use some software development language from the GoF book, NodeClass represents the Composite Design Pattern.

NodeClass has a lot of methods, so let's focus on the important ones:


class NodeClass {

  NodeClass(NodeClass parent = null); // Constructor

  NodeClass FirstChild();
  NodeClass LastChild();
  NodeClass Next();
  NodeClass Parent();

}

What this says is that a NodeClass object can access the first item in a child list, the last item in a child list, the next item in its own list, and its parent. A picture is helpful:



This picture shows an item in green, and how it relates to other NodeClass items in the same tree. Each NodeClass object has just enough data to form a tree, if we link them together. For example, let's consider things from the perspective of "First Child" in the picture above. We'll rename it "Item", rename the old "Item" to be "Parent", blank out the other labels, and make the old links dotted lines.



Essentially, NodeClass allows you to construct lists of lists. You can see in the second image that we can have another child list, and because of the Next and Parent links, it can be linked into the overall tree.
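
If the pictures don't do it for you, the four links can be modeled in a few lines of C++. This is a sketch under my own assumptions, not EnCase's implementation: TreeNode, Tree, collect(), and walkDemo() are hypothetical names, and the walk is only a guess at the kind of traversal forall() performs.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the four links described above.
struct TreeNode {
  std::string name;
  TreeNode* parent = nullptr;
  TreeNode* firstChild = nullptr;
  TreeNode* lastChild = nullptr;
  TreeNode* next = nullptr;

  explicit TreeNode(std::string n) : name(std::move(n)) {}
};

// Owns every node so the raw links stay valid for the lifetime of the tree.
class Tree {
 public:
  // Creates a node and, if a parent is given, appends it to the parent's child list.
  TreeNode* add(const std::string& name, TreeNode* parent = nullptr) {
    nodes_.push_back(std::make_unique<TreeNode>(name));
    TreeNode* n = nodes_.back().get();
    if (parent != nullptr) {
      n->parent = parent;
      if (parent->lastChild != nullptr) {
        parent->lastChild->next = n;  // link after the current last child
      } else {
        parent->firstChild = n;       // first child in this list
      }
      parent->lastChild = n;
    }
    return n;
  }

 private:
  std::vector<std::unique_ptr<TreeNode>> nodes_;
};

// Depth-first walk over everything below node, using only firstChild and next.
void collect(const TreeNode* node, std::vector<std::string>& out) {
  for (const TreeNode* c = node->firstChild; c != nullptr; c = c->next) {
    out.push_back(c->name);
    collect(c, out);
  }
}

// Builds root -> (a -> (a1, a2), b) and returns the names in visiting order.
std::vector<std::string> walkDemo() {
  Tree tree;
  TreeNode* root = tree.add("root");
  TreeNode* a = tree.add("a", root);
  tree.add("b", root);
  tree.add("a1", a);
  tree.add("a2", a);

  std::vector<std::string> names;
  collect(root, names);
  return names;
}
```

walkDemo() visits a, a1, a2, b: each child list is just a chain of next links, and each child can carry a list of its own. That's the "lists of lists" idea.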

Almost all of the API classes in EnScript inherit from NodeClass: EntryClass, BookmarkClass, NameListClass, FileSignatureClass, etc.

So, why should my classes inherit from NodeClass?

For a few reasons:
  1. foreach() and forall() loops work with NodeClass. If you have multiple objects of your class, you can form them into a list using NodeClass (by passing in the parent to the constructor, or by calling Insert() on the parent) and then iterate over the list using foreach(). You can be extra-fancy and create trees, too, and then iterate over every item in the tree using forall().
  2. Some user interface widgets only work with NodeClass. For example, ListEditClass, TreeEditClass, and TreeTableEditClass can only display objects which inherit from NodeClass.
  3. Properties only work with NodeClass. EnScript supports a limited form of reflection, in a manner somewhat similar to Python or Java. Given an object, foo, you can use the typeof() keyword to ask for a SymbolClass object that describes the class of foo. You can then ask the SymbolClass for all properties of foo, and PropertyClass::GetProperty() and PropertyClass::SetProperty() allow you to get and set, respectively, the underlying property values of foo. That is, you can access the data indirectly, without knowing what the class is or calling the property methods directly. This is great for writing your own object-relational mapper (ORM) like Ruby on Rails' ActiveRecord, auto-serializing data out to XML, and implementing a limited form of duck-typing.
This obviously isn't a complete introduction to EnScript, or even NodeClass, but I hope it clarifies the First Law. I'll post about the Second Law in the next few days.

2010-07-15

Link files run malware

Brian Krebs has news of a new vulnerability in Windows 7, where link files on a thumb drive can execute arbitrary code when encountered by Windows Explorer. The link files themselves do not need to be clicked on by the user.

AND the malware they describe seems to target SCADA systems. I used to think that stego was a non-issue and that no one in the history of computer forensics ever came across legitimate use of it... but then there were the Russians. Similarly, I used to think that all the hype about securing SCADA systems from malware was a non-issue...

2010-07-12

Beer Can Charcoal Chimney

Later today I'm going to try my hand at barbecuing pork ribs, using Sifty/Pelaccio's recipe. The problem I've had with barbecuing using my Weber grill, though, is the part in every barbecue recipe where it says "add more fuel as needed."

How do you add a few briquettes of lit charcoal to a grill? For today, the answer is to slide it down the side, near the handle of the grill plate, where there's enough of a hole for a briquette to fit through (presumably, the ribs will be on the other side of the grill). How to get it started? That's what this post answers.

I have a normal charcoal chimney, which I love dearly. Works fast, no need for lighter fluid. However, you need to have ~10 briquettes at minimum for it to work. Given its large diameter, you don't get much of a chimney effect if you have fewer briquettes. For barbecuing a modest amount of meat on a modest grill, I need something to let me light three briquettes at a time. This is my solution:



I took a Miller Lite beer can and turned it into a charcoal mini-chimney. I used a variety of cutting/stripping pliers to chew the top off and to cut small holes around the base. I then drilled quarter-inch holes along the side, and right in the middle of the base. I left half of the tapered top on, to serve as a sort of guard when dumping out the briquettes, with the guard facing up. It is easy to hold and carry using kitchen tongs.

Does it work? The answer is: mostly.

In a test run this morning, I was able to light three briquettes successfully, and learned some lessons that I will apply this evening. Like any chimney, this one wants air, and lots of it. To get it started, I thought I'd put it on top of something, burn some newspaper underneath it, and be good. I had a terra cotta holder for those nifty mosquito coils, and tried using it.


The problem was that the mosquito coil holder couldn't draw in enough air to burn the newspaper I'd stuffed into it. A little lighter fluid fixed that problem...

After the lighter fluid burned off, though, a lot of somewhat charred newspaper remained, unburned, in the terra cotta thingy. It was blocking airflow into the chimney. Since I didn't have the grill going, I decided to take the chimney off the terra cotta pot and place it directly on the grill.

You can see there's a little bit of ash already. I let it sit on the grill and did not attempt to relight it. This worked well. 15-20 minutes later, the briquettes were ready.


Dumping them out was easy.


The can was smudged, but otherwise fine. So, I established:
  • you can create a charcoal mini-chimney with a beer can
  • with about a dozen holes in it, it seems to draw in enough air, provided it's on the proper surface
  • the can withstands the heat well enough
Some open questions remain, however. Can I get the coals started without using lighter fluid? Can I find a better surface for the chimney to sit on? What I'm going to try next is using the top of the terra cotta holder (which has holes in the center and a nice lip that fits well with the base of the can) and place it on some bricks, starting a bit of newspaper beneath it (and maybe putting some shredded paper in the chimney with the briquettes). I think the fire and then the chimney will both get sufficient airflow.

2010-07-11

Forensics and Forgery

A fascinating article in The New Yorker about Peter Paul Biro, an art dealer in Montreal who uses fingerprint analysis to authenticate disputed works of art...

...or does he? Be sure to read the whole thing.

2010-07-09

autocrap tools

Love it. Anyone ever notice that they're slow, too? In 2010, why does it take 30s to scan my system for the same old crap, every single time?

Gratz to 'gesh

I am unsurprised to see that my old partner Yogesh provided a correct answer to the latest SANS network forensics contest. Say what you will, but between Lance, Yogesh, Geoff Black, Jamie Levy, and many others, there was once an impressive collection of talent at a particular forensics company.

IBM keyboards

Unicomp bought IBM/Lexmark's keyboard factory... and they still make the same old keyboards, with the new hotness, USB.

2010-07-02

2010-07-01

looping

I want a loop construct like Python's join(). That is:


Iterator cur = start;
if (cur != end) {
  foo(*cur);
  while (++cur != end) {
    bar(*cur);
    foo(*cur);
  }
}
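
In the meantime, the pattern can at least be wrapped up once and reused. Here's a minimal C++ sketch; for_each_joined() and join() are names I made up for illustration, not anything from a standard library. The first callable plays the role of foo() above and the second plays the role of bar().

```cpp
#include <string>
#include <vector>

// Hypothetical helper: calls each(x) for every element and between(x) before
// every element except the first, mirroring the hand-written loop above.
template <typename Iterator, typename Each, typename Between>
void for_each_joined(Iterator cur, Iterator end, Each each, Between between) {
  if (cur != end) {
    each(*cur);
    while (++cur != end) {
      between(*cur);
      each(*cur);
    }
  }
}

// Example use: a Python-style join() built on top of the helper.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
  std::string out;
  for_each_joined(parts.begin(), parts.end(),
                  [&](const std::string& s) { out += s; },   // foo: emit the element
                  [&](const std::string&) { out += sep; });  // bar: emit the separator
  return out;
}
```

With this, join({"a", "b", "c"}, ", ") gives "a, b, c", and an empty list gives an empty string with no stray separator.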

2010-06-17

Cyberiffic

An interesting debate regarding whether The Cyberwar has been greatly exaggerated, featuring Bruce Schneier and Admiral Mike McConnell. Intelligence Squared's poll gives the debate to McConnell and Zittrain, but it's likely tainted by ye olde selection bias.

2010-06-10

Smallthought to Twitter

Smallthought Systems has been acquired by Twitter. I met Avi Bryant very briefly (doubt he remembers little old me) at OSCON in 2006, and he struck me as one of the smartest developers I'd ever met. Good for him, and good for Twitter.

Because I can't resist

How to write like Malcolm Gladwell

2010-06-04

Don't forget "inline" and "register"

The conventional wisdom these days is that compilers are, like the Great and Powerful Oz, all-knowing, all-powerful, and all-optimizing. Instead of foolishly trying to make your code faster, you, mere mortal developer, should simply choose good algorithms, use standard libraries, and specify "-O3" to gcc.


The other day, on a lark, I decided to declare one of my functions "inline", since it was used in an inner loop of my main routine, and I also prefixed some of its local variables with the "register" keyword (I made these edits separately, so I could see whether either made a difference). My thought was, well, the compiler's probably chopping this code up six ways past Sunday, but, what the hell?, can't hurt to try. In return for the 5 minutes of time I spent doing this (including compiling and testing), I received a 20% performance improvement.

I'd already been specifying "-O3". I'd removed all the heap allocations. The code was fast enough that Shark couldn't find any obvious hot-spots. 20%.

So, after you write well-factored code, and choose good algorithms, and use profiling to find the hot-spots, and wonder where the next improvement will come from... remember your friends, "inline" and "register" and take 5 minutes to see whether they improve things for you.

2010-05-11

BOD: Bytes On Disk

My introduction to computer forensics was of the sink-or-swim variety, a 4-day training class followed by a rough tasking of "hey, kid, here are some EnScripts, make 'em better."

It didn't take me long, though, to figure out one of the primary rules of computer forensics. It is thus:

Always refer to the bytes on disk.

I'll admit I've never had to image a disk for a real case, and I never could get interested in imaging and its arcana. I've seen enough of it to know that it's harder than it sounds, often due to Murphy's Law, and there's no substitute for experience when you encounter an old drive on an old system. But, to me, my rule above feels more fundamental.

I am all for push-button forensics. Rote tasks should be automated, and expertise should be modeled in code. The results of automation are all for naught, though, if they do not refer to the relevant bytes on disk. It's a variation of citing your sources in a paper; if you can't refer back to the bytes on disk from which your results stem, it's awfully hard to verify them.

The Sausage Factory illustrated this rule rather neatly today. There was a simple bug in the script that reads results from C4P back into EnCase. It was easy to resolve, though, because the results referred back to the relevant bytes on disk. It's to Trevor's credit that he includes this information in the results. If he didn't, the fix wouldn't be so simple.

Always, always, always refer back to the Bytes On Disk.

2010-05-10

Damian Conway's Modest Proposal

http://www.csse.monash.edu.au/~damian/papers/HTML/ModestProposal.html

A proposal to create a new language on top of C++. Unfortunately, it's co-authored by a noted Perltard, and some of the operators are... icky.

They lost me at "+:=". Also, the argument of mathematical consistency for using "=" instead of "==" is bone-headed. Maybe it's that Dave Mark was especially skilled at explaining C when I taught myself (16 years ago...), but I have never, ever been confused by "==". Since the only way their proposal could meet with success is if it was embraced by existing C++ programmers, they should have left the operators alone.

2010-05-04

Xcode with SCons

I'm not a big fan of using IDEs for project builds, but IDEs excel at source-level debugging. The SCons wiki has information on how to use Apple's Xcode with SCons. This works great.

I tend to agree with folks who advocate unit testing and print/log debugging over stepping through code. I've seen my share of debugger-obsessed developers (to the point of writing custom plugins for Visual Studio) who could have spent a lot less time in the debugger if they'd spent more time writing clean code with TDD. However, I've seen the opposite sort of developer: writing test after test without solving the bug. If you can narrow down where the bug is occurring, stepping through code, line by line, will often reward the patient programmer and quickly rule out the wild, paranoid theories about compiler bugs that everyone develops in the absence of hard evidence.

2010-04-28

Annoyed with C++

The bloom is off the rose. I've been loyal; Lord knows, it hasn't been easy sticking with C++ in the face of all the dynamic language bigotry (seriously, guys, I understand your slow-ass languages have their place, just don't give me a hard time about me wanting my own programs to be fast, m'kay?).

But. This is asinine:

#include <algorithm>

// somewhere in a function

size_t aNumber = 5; // perfectly fine
...
return std::min(1, aNumber); // compile-error

This gives me a compile error because I'm passing mismatched types to std::min, and template argument deduction can't settle on a single type. That is to say, it thinks that "1" is an int, and only an int. The variable, on the other hand, is not an int, it's a "size_t" (AKA "unsigned long" on my platform, varying from an int in size and sign). However, when the type has been declared, the compiler has no problem assigning a constant to the size_t.

The solution?

return std::min(1ul, aNumber);

Lame. I don't think you needed to do this back in the day, when compilers were worse. Of course, you did have to make the call to std::min(1, aNumber). Progress...
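
One caveat on the 1ul fix: it only matches where size_t really is unsigned long (on 64-bit Windows it's unsigned long long). Two alternatives that don't depend on that are sketched below. The function names are hypothetical, just wrappers to make the snippet self-contained.

```cpp
#include <algorithm>
#include <cstddef>

// Name the template argument explicitly, so both operands convert to std::size_t.
std::size_t minWithOneExplicit(std::size_t aNumber) {
  return std::min<std::size_t>(1, aNumber);
}

// Or give the constant the right type before the call, so deduction sees one type.
std::size_t minWithOneTyped(std::size_t aNumber) {
  const std::size_t one = 1;
  return std::min(one, aNumber);
}
```

Both return 1 for aNumber = 5 and 0 for aNumber = 0. Neither is pretty, but neither bakes a platform assumption into a literal suffix.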

2010-03-27

Would Descartes have programmed in Pascal?

My man Umberto on Catholicism and the Cult of the Mac. What's fantastic is that this was written before the return of the Messiah. Can you imagine what he could write now if he chose to? iPhones as rosary beads? Vista as the decline of mainline denominations? malware and exorcism?

2010-03-25

David Foster Wallace on bumming code

A gem I stumbled upon this morning...





 

My pet theory of DFW's suicide is that he must have been researching Java programming.

2010-03-23

Twitter Feed Aggregators

Dear Lazy Web,

Here is what I need: a web service which allows me to input various twitter streams. The service then republishes these tweets in an aggregated manner, so that instead of giving me many tweets to read, I can read them in bulk, in a handful of Atom entries. Surely this exists.

Any help?

Monetization at the Point of Value

It's hard to believe that it's been six years since Jonathan Schwartz published a blog post about how Sun monetizes Java. With Sun's not-quite-untimely demise, I've been thinking back to this, especially since Schwartz was one of the first technology/business blogs I read (the venerable Cringely being numero uno).

First, the cheap shot: nowhere in his post does Schwartz say, Customer X will pay me $Y for Z, which is only possible because of Java.

(A second cheap shot: maybe you backed the wrong horse? Flame on.)

I'm not smart enough to write a complete treatise with good conclusions, but I do have some enumerated observations:
  • There's a distinction to be made between selling commodities and monetization at the point of value, but it can be made arbitrarily fine. Gasoline is a commodity but is not monetized at the point of value; if it were, you'd be charged for it incrementally as your gas tank emptied. On the other hand, you can think of gasoline as being monetized at the point of value, because you're willing to pay for it when your gas tank is empty. You can argue semantics all the livelong day.
  • Marketing, to a large extent, is about communicating clearly to customers.
  • Clear communication is predicated on the assumption that customers are likely not paying a lot of attention to you; they're paying attention to their own jobs, instead.
  • Product pricing is an extraordinarily important aspect of marketing.
  • Sun's main products, computer servers, were traditionally sold upfront. In fact, the entire industry operated this way.
  • After all, it's a collection of physical parts, with a very real cost, and what it's used for is running code, which is far more abstract and varies in value to a great degree.
  • Those collections of physical parts were increasingly commoditized.
  • Sun's downfall came about because it could not figure out how to ride the wave of server commoditization; this is ironic, as Schwartz points out his admiration for the railroads and the oil companies. It is also ironic, because Sun had way more talented engineers than anyone (although that's a double-edged sword...) and therefore had an advantage for finding ways to profit from commoditization.
  • The market of folks who want to buy computation (i.e., time on someone else's servers) is probably different from the market of those who want to buy servers upfront. Amazon serves a different market than the one Sun served.
  • From disastrous experience, I've seen that customers hate having to choose between different pricing schemes for the same product. They don't want to think about it.
  • Customers also hate having risk shifted onto them. An upfront price for a server is nicer than trying to figure out the value of running a particular application for a period of time, assuming you can afford the price of the server.
  • Having a choice between upfront pricing and monetization at the point of value is not only bad marketing, it's bad management, when you factor in sales incentives. How do you fairly compensate your sales staff for both scenarios? You don't, and your sales staff wastes its time figuring out how to game the system, hurting your marketing in the process.
  • Some faction in a company likely benefits from the status quo pricing scheme, and they'll resist changes to the pricing scheme. This makes it hard to roll out a new pricing scheme for the same product.
  • If a company wants to move to monetizing at the point of value, it is probably better to roll out a new product with the new pricing, and to search out a new market. Do not confuse your existing customers.
  • Look at Apple. They didn't stop charging upfront for new computers. Instead, they rolled out the iTunes Store. Then they rolled out the iPhone, with the hefty subscription fees paid back to them from AT&T and the high-margin revenue from the App Store. Different products, different market, differing pricing. Brilliant!
  • Volume is an important aspect of the commodities business.
  • If you're going to monetize at the point of value, you'd damn well better drive large volume, no matter what.