SANS 360 Lightning Talk

I've published a write-up on the Lightbox blog of my SANS 360 talk, "Factory Forensics." You could watch the video, of course, but then you'd have to suffer through my stumbling and general awkwardness. The turnout was great, and I was humbled by the quality of the speakers.


ExecuteClass pitfall

Of the ways that EnScript can interact with other programs, using ExecuteClass to launch programs from the command line is probably my favorite. It's simple to use, simple to debug, and doesn't leave you with an unholy mess of code, like COM does. However, I recently ran into a pitfall with ExecuteClass, and hopefully sharing it here will keep others from running into it or, at least, help them recognize it when they're at their wits' end.

The problem? ExecuteClass becomes angry when the target program writes a lot of data to stdout or stderr, and you wouldn't like ExecuteClass when it's angry.

ExecuteClass currently has the Output property, a String, for giving you back stdout output from the application. My somewhat shaky belief is that this is updated once the application has ended—I haven't tested this in a while, though. Additionally, notice that there's no property representing stderr.

If the target application you are running generates lots of output to stdout or to stderr, you will find that the application will hang after a while. Why? Well, in the words of the illustrious Beej, "On many systems, pipes will fill up after you write about 10K to them without reading anything out." Once the target application fills up whatever pipe EnCase has connected to stdout or stderr, it will block on the next write to the stream... and EnCase will never consume from the pipe, causing the target application to hang indefinitely.

One possible workaround is not to call your target application directly, but to call cmd.exe instead, passing the command you want to run as the argument and redirecting output to a file. I haven't yet tested this, but it should work. The other, of course, is to change the target application to write its output to a file itself. As for the underlying Win32 calls involved, I found an MSDN article with a sample program demonstrating how child processes are created and how to read data from their stdout and stderr.
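To make the failure concrete, here's a quick Python sketch (my illustration; nothing EnCase-specific about it) of the deadlock-prone pattern and the two safe ones described above:

```python
import subprocess
import sys
import tempfile

# A child that writes far more than a pipe buffer (typically 4-64 KB)
# can hold. If the parent attaches a pipe to stdout but never reads it,
# the child's write() blocks once the pipe fills and the child hangs
# forever -- the same failure mode as ExecuteClass.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

# Workaround 1: redirect the child's output to a file, just as you
# would by launching cmd.exe /c "target.exe > out.txt".
with tempfile.TemporaryFile() as out:
    subprocess.run(child, stdout=out, check=True)
    out.seek(0)
    data = out.read()
assert len(data) == 1000000

# Workaround 2: have the parent drain the pipe continuously while the
# child runs (capture_output does this for you behind the scenes).
result = subprocess.run(child, capture_output=True, check=True)
assert len(result.stdout) == 1000000
```

Either way, the key is that somebody has to consume the stream while the child is still writing; a buffer that's only read after exit is a hang waiting to happen.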


DFRWS 2011

I spent this last week in New Orleans at the Digital Forensic Research Workshop. DFRWS is primarily a research workshop for academics, to which they can submit formal papers for acceptance by peer review and eventual publication. Folks whose papers are accepted then give presentations about their work.


  • Vassil Roussev presented on the work he's done on fuzzy hashing, comparing his sdhash tool to Jesse Kornblum's venerable ssdeep in an information retrieval-type experiment. In general, sdhash does a significantly better job of identifying similar files, with the main trade-off being that hash values are not fixed-length (I haven't played with sdhash enough to know whether performance is a significant factor, either). Fuzzy hashing is a great example of applying computer science theory to forensics. More of this, please. (pdf)
  • Robert Beverly presented on work at the Naval Postgraduate School in finding artifacts of common structures used by programmers in networking code. It turns out that these ephemeral structures in RAM can often be found on disk (pagefile/hiberfil/unallocated), allowing the astute investigator to construct a partial remembrance of networks past. (pdf)
  • Michael Cohen of PyFlag fame presented on GRR. GRR stands for "GRR Rapid Response", but you can probably substitute "Google" for the G if you'd like to avoid the left-recursion. GRR is an open source tool for collecting and analyzing data from remote machines on an enterprise network, created for internal investigations at Google. It is the closest thing I've seen to my old friend ESAC, and built on a far richer software stack. It's early days, but I'll be following development of GRR very closely and I recommend that you do as well (it's much cheaper to buy Cory a beer than to buy your overly-friendly enterprise software salesperson a new Ferrari). The one aspect of GRR I'm leery of is the use of AFF's RDF-based data model, because I think RDF is a bit abstruse for most problems in forensics (or, at least, doesn't have the pay-off you'd like in return for learning RDF, a technology only semioticians could love). (pdf)
  • Clay Shields presented on some work he's done using, essentially, document indexing for supporting enterprise investigations. By creating bag-of-word style feature vectors for documents on hosts, he can perform offline queries to help discover machines involved in an investigation. Indexing and the bag-of-words do have a host of problems that require careful consideration, but this would no doubt be a useful technique for many investigations. I'd love to see such functionality put into GRR. (pdf)
  • Christopher King and Tim Vidas from Carnegie Mellon presented on some aspects of Solid State Disks. This is a paper everyone should read carefully once it's been posted. (pdf)
  • James Okolica and Gilbert Peterson figured out how to find and extract data from the Windows clipboard in RAM images. (pdf)
  • Judson Powers from ATC-NY gave lightning talks on both Mac OS X Lion's new full disk "encryption" (the quotes are needed, my fellow Mac weenie brethren) and Dropbox artifacts.
  • Last but certainly not least, Ralf Brown talked about his ZipRec tool, which is able to find and recover data from corrupt DEFLATE streams (i.e., from zip files and many other compressed formats). It'll be very interesting to see how this tool develops and what other topics in forensics Ralf turns to. (pdf)
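Clay Shields' bag-of-words approach above can be sketched in a few lines of Python (my rough illustration, not his implementation): each document becomes a term-frequency vector, and hosts can then be ranked against a query offline.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Tokenize crudely and count term frequencies."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * \
           sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical per-host document indexes, queried offline during an
# investigation to find machines that have touched relevant material.
hosts = {
    "ws-alice": bag_of_words("quarterly merger memo acquisition target"),
    "ws-bob":   bag_of_words("fantasy football lineup waiver wire"),
}
query = bag_of_words("merger acquisition memo")
ranked = sorted(hosts, key=lambda h: cosine(hosts[h], query), reverse=True)
# ws-alice should rank above ws-bob for this query
```

Real systems need stemming, stop words, TF-IDF weighting, and so on, but the core idea really is this small.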

Not being an academic, I find these sorts of conferences kinda' weird. Their primary purpose is to provide a forum for academic researchers to present their work. I don't have skin in that game, and there were, frankly, some pretty boring moments for me... and, in the case of the legal panel, some pretty boring moments for everyone (a panel of class action plaintiffs' attorneys, replete with hair product and suits, is an odd pairing for an audience of bits-and-bytes forensics researchers; the caliber of the audience demands the likes of Craig Ball, Ralph Losey, Jason Baron, or other folks capable of engaging with the best forensics researchers; throwing a prosecutor into the mix would be good, too).

While my ADD-addled brain would like a more compressed, faster-paced format (can someone organize a conference consisting of forensics lightning talks? that'd be great), DFRWS is the place to be to learn about the future of forensics. Most folks there are working on substantive problems, and I had some great conversations with various researchers throughout the conference. I definitely need to start planning now so I can submit a paper or two for next year and participate a bit more actively. DEFCON and Black Hat may get the press, and DFRWS probably missed out on a few practitioners in attendance because of the scheduling conflict, but I think DFRWS is the best mashup of computer science and digital forensics I've yet seen.

The Fail Whale

Finally, an expedition was organized to The Beach, a bar on Bourbon Street that has a mechanical killer whale (instead of a mechanical bull). Life's not all about hex, code, and catching bad guys; it's also about taming wild beasts.


EnCase 7: First Impressions

I was able to play with EnCase Version 7 for the first time this past week at CEIC (which was a great conference, btw). It will take me some time to explore v7 fully and draw definite conclusions about much of its functionality. Additionally, just as restaurant reviewers don't visit a new restaurant until it's had some time to find its rhythm, formally reviewing v7 seems pointless until 7.02 or 7.03. Still, I do have some immediate impressions.

What I Love

Tabs. EnCase now has just a single row of tabbed windows along the top. Bookmarks? That's a tab. Search results? That's a tab. The familiar tree-table view is then contained within the tab. I've heard some griping that the user interface has changed too much, but the tabs are a breakthrough, and I'll tell you why: your windows stay as you left them.

In previous versions of EnCase, which type of data you were looking at (entries, records, etc.) and the way you looked at data (table, report, gallery, timeline) were global settings. So if you wanted to go from the table view in Entries to the report view in Bookmarks, you had to switch views as well as the tab. Not only is it now far easier to navigate between different windows, but also the individual display settings and contexts for those windows are maintained. I took to this change like a fish to water, and I believe daily users will find it far more effective. Kudos to GSI for rethinking one of the fundamental aspects of the EnCase look-and-feel and pulling it off splendidly.

The EnScript Class Browser. If you write EnScripts, the upgrade is clear. If you depend on EnScripts in v6, you'll need to upgrade, too, because developers are not going to want to support v6 after upgrading. V7 includes a detachable version of the EnScript Types view, so you can edit your code and have the documentation in front of you at the same time. This really should have been possible long, long ago, but I'm glad it's here now. The fact that it generates a separate class browser for your own code is a nice touch.

Search Results. I haven't played around with the search results view enough to know whether it's fully baked, but it seemed darned close. The results of conditions/filters/searches/scripts now live in a separate tab, where they persist. This is similar to bookmarks, but Search Results can contain different types of data, and I believe retain a notion of what generated them (at least if generated through a search). Additionally, like everything else in v7, Search Results are persisted to disk, supposedly allowing for smoother operation with large numbers of items (as opposed to filling up RAM and dealing with the data greedily). The distinction between Bookmarks and Search Results then becomes that Bookmarks are for items you've selected for your report.

What I don't think I'll love

Reports. The new reporting functionality is certainly better than nothing, but I was disappointed nonetheless. First, looking it over, it's clear that the reporting is built similarly to ExportClass in EnScript, which offers very limited display functionality in exchange for portability between HTML and RTF. Second, as with everything else in EnCase, the metaphor is the tree. You must organize your report into chapters, sections, subsections, and so on. Third, there is actually a new report description language built into this, for you to use for customization. There are dozens of HTML templating languages and technologies... so why not another one?!?!

Here's my view: this should have had hooks into ASP.NET, and RTF should have been sacrificed on the altar of ancient technologies. I don't really like ASP.NET (or any other Microsoft web technology; give me Python and Django), but it's readily available on Windows and millions of people know it. EnCase would have been open to far more tinkering in one fell swoop. As for RTF, its primary purpose in EnCase is to let you write reports in Microsoft Word. That's useful, I know, really useful even, but not nearly as useful as having first-class HTML reporting support with the ability to save out as PDF. Plus, you can still copy and paste from a browser into Word, so that workflow remains available. Full-fledged HTML5 support would more than make up for the loss of RTF, an unloved also-ran in the document format wars.

What's more, reporting could have been rethought from the ground up. What about something like Evernote or Onenote or TiddlyWiki? The new reporting functionality will be useful to those who want better reporting in EnCase; the Love just isn't there, though. If I need to create great reports, I'll still export the data and use some other tool/technology.

Tags. Maybe I won't love tags. I thought it was really cool that you could tag a file just by clicking in the right place in the Tags column. I need to see whether this works in Records and the other views, and what the EnScript support is like. Tags are clearly a head-nod in the direction of review platform capabilities, but review needs specialized—and simplified—controls and functionality. Additionally, it's not clear to me how tags interact with Bookmarks and whether the two should be separate. Is a tag a type of bookmark? Can bookmarks be organized by tag? Is it easy to see overviews of tags? What about multi-case tag review? Will there be any kind of automatic classification done by tag? Tags raise a lot of questions.

The Death of forall(). In EnScript in v7, iterators are the new hotness and forall() is the old and busted. Or so we are told. The problem? forall() is the lone great innovation of EnScript. Recursive descent in every other programming language is a PITA; forall() makes it a joy. What's more, the iterator design in V7, while technically solid on its own, dovetails so nicely into forall() it's a shame the two aren't integrated. There's no reason that iterators could not be used with forall(), regardless of whether the data's in memory or on disk. forall() means "for all" and the extra bit of syntactic sugar guarantees that the iteration will occur successfully without worrying about termination.

In fact, more often than not, forall() maps onto Map. The body of forall() loops are often data-parallel, and with a little bit of work in the EnScript language, forall() could have become a concurrent map(). Trivial parallel processing.
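That claim is easy to illustrate outside EnScript. In Python (an analogy, not EnScript), recursive descent flattens into an iterator, and a data-parallel loop body becomes an argument to a map that can run sequentially or concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def iterate(node):
    """Recursive descent -- the traversal forall() hides behind one keyword."""
    yield node
    for child in node.children:
        yield from iterate(child)

def process(node):
    # A data-parallel loop body: it depends only on its own item,
    # so the iteration order doesn't matter.
    return node.name.upper()

root = Node("a", [Node("b", [Node("d")]), Node("c")])

# Sequential 'forall': map the body over every node in the tree.
seq = [process(n) for n in iterate(root)]

# The same body, run as a concurrent map -- trivial parallelism.
with ThreadPoolExecutor() as pool:
    par = list(pool.map(process, iterate(root)))

assert seq == par == ["A", "B", "D", "C"]
```

The only requirement is the one stated above: the loop body must not depend on its neighbors. When it does, you fall back to the sequential form, but nothing about the syntax needs to change.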

That said, if ItemIteratorClass supports virtual functions, such that we can write our own iterators, then this will be a worthy trade. With our own iterators, EnScripters can do lots of things, like define reverse iterators and create generic interfaces to common routines that use different data structures.

Wait and See

EnCase V7 is hands-down the most significant upgrade to EnCase I've witnessed. It's astonishing the amount of work that's gone into it, and I'm greatly looking forward to using it. The amount of work that went into addressing the everything-in-RAM design of prior versions of EnCase can only be described as Herculean; that GSI's developers went so far beyond such back-end changes and put large amounts of time into rethinking EnCase for the new decade is heartening.

There's a lot to learn, and it will take me some time to puzzle it all out. Conclusions are for another time. I will say that, for the first time in a long time, I'm looking forward to upgrading EnCase and exploring it. I think it will be fun.


Next Generation of Hadoop

The last couple of weeks have seen some new information trickle out from Yahoo! about their efforts to improve scalability in Hadoop. (For a sense of scale for the uninitiated, by "improve scalability" I mean "scale beyond clusters of 4,000 servers.") Yahoo! is calling this effort Next Generation Hadoop, or, Hadoop .Next.

To paraphrase V.S. Naipaul on the New Yorker and fiction, Yahoo! knows nothing about enterprise software marketing, nothing.

To those paying attention, it's clear there's been tension between Yahoo!, Hadoop's first and most important patron, and Cloudera, the usurper. It's not hard to play armchair psychologist and speculate about the forces behind the tension, but I certainly don't know enough to comment on it intelligently. What is undeniable, though, is that release momentum hiccuped at a critical period in Hadoop's adoption by the industry at large, and that momentum has only recently been restored. Yahoo!'s recent statement about abandoning their own Hadoop distro and working to improve Apache trunk was very good news.

Still, remember that Hadoop is not yet that magical 1.0. It's changed enormously over the past year or two, and for the better. It's silly it's not considered 1.0 already, and it's clear that a 1.0 designation is coming down the pike.

THEREFORE: It makes no sense whatsoever to talk about "Next Generation Hadoop." All the technical reasons are sound, but here's what this sounds like to me: "We haven't hit 1.0 yet, but we've already come down with a terminal case of Second System Effect." Moreover, this sense of foreboding is not at all helped by the fact that the presentations and documents about Hadoop .Next have not included any information about the most important facet of a Hadoop refactoring: the API. So, not only do I have to worry about deciding between implementing Mapper or inheriting from Mapper, I'm now worried that I'll have to abandon Mapper for ConfigurableGenericJobTask and ConfigurableGenericJobTaskFactoryInterface and rewrite all my code to suit. Viva la Revolution.
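For what it's worth, one way to insulate yourself from churn in the Java Mapper API today is Hadoop Streaming, where the map step is just a program that reads records on stdin and writes tab-separated key/value pairs on stdout. A minimal word-count mapper in Python (my sketch of the standard pattern):

```python
def map_words(lines):
    """Word-count map step in the Hadoop Streaming style: emit one
    tab-separated (word, 1) pair per word; the framework's
    shuffle/sort then groups the pairs by key for the reducer."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

# In a real job, Hadoop Streaming feeds the input split on stdin:
#   for pair in map_words(sys.stdin): print(pair)
# Demonstrated here on an in-memory split instead:
split = ["the quick brown fox", "the lazy dog"]
pairs = list(map_words(split))
assert pairs[0] == "the\t1" and pairs.count("the\t1") == 2
```

Whatever the native API becomes, this contract (lines in, key-tab-value out) is the part least likely to be broken by a refactoring.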

I'm not really that worried. Mostly, Hadoop has been evolving according to a Teilhardian roadmap. However, it'd be fantastic if the community released 1.0 in 2011 and phased in Hadoop .Next incrementally, without making it seem so disruptive.


Fast Unique Files filter for EnCase

Related to my Fast Hash Matching post from November and Lance's original post, here's the code for an EnCase filter for the Entries view that will show you only the first occurrence of each file by hash value:

include "GSI_Basic"

class MainClass {
  NameListClass   HashList;
  BinaryTreeClass Tree;

  MainClass() :
    HashList(),
    Tree(HashList, new NodeStringCompareClass())
  {
    if (SystemClass::CANCEL == SystemClass::Message(
        SystemClass::ICONINFORMATION | SystemClass::MBOKCANCEL,
        "Unique Files By Hash",
        "Note:\nFiles must be hashed prior to running this filter."))
    {
      SystemClass::Exit(); // bail out if the user cancels
    }
  }

  bool Main(EntryClass entry) {
    HashClass hash = entry.HashValue();
    if (Tree.FastFind(hash)) {
      return false; // we've seen this hash before; filter the entry out
    }
    else {
      Tree.FastInsert(new NameListClass(null, hash), hash);
      return true;  // first occurrence; show it
    }
  }
}

The code in the comments on Lance's blog is close, but not quite correct, maybe due to the comment form or some such. You need to hash all files before you run this. As discussed in my earlier post, this is by no means the fastest possible way to do this, but I recently had someone ping me about needing exactly this, and it makes sense to put it up for everyone.

This filter is utterly dependent on BinaryTreeClass in GSI_Basic.EnScript, a support file that comes with EnCase and can be found at Program Files\EnCase 6\EnScript\Include\GSI_Basic.EnScript.
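Stripped of the EnCase specifics, the filter boils down to a first-occurrence-by-hash test. Here's the same logic as a few lines of Python (my sketch, for readers who want the idea without the tooling):

```python
import hashlib

def unique_by_hash(entries):
    """Yield only the first occurrence of each content hash -- the
    same test the EnScript filter applies to each entry. A set plays
    the role of BinaryTreeClass's FastFind/FastInsert pair."""
    seen = set()
    for name, data in entries:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield name

files = [("a.txt", b"hello"), ("b.txt", b"world"), ("copy.txt", b"hello")]
assert list(unique_by_hash(files)) == ["a.txt", "b.txt"]
```

As in EnCase, the hashing has to happen before the filtering; here it's inline, whereas the filter assumes you've already run the hash analysis.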

I also have put an ini file of the filter that you can import directly into EnCase (right-click in the filter tree, choose Import...), available here on Google Docs.

Dynamic Features in EnScript

EnScript is first and foremost a static language. It's statically typed and statically compiled, which means that EnScript figures out everything in your script before it starts to execute it, and if it cannot figure out which code should be called or finds any other syntactic problems, it generates a compilation error. It also means that EnScript executes reasonably efficiently, since the compiler has figured out which functions to call in advance, and this is done in a relatively lightweight manner.

(It should be noted that EnScript does not do much in the way of code-optimization, however, so dynamic languages that make use of just-in-time compilation and other aggressive techniques may yet execute faster than EnScript.)

Despite being squarely in the static camp of languages, EnScript does have some features which can be considered "dynamic." By dynamic, I mean that you can write an EnScript that is able to interact with the EnScript engine and script code, at least in some ways.

The dynamic features in EnScript are:
  • Typecasting
  • Class reflection
  • Property accessors
  • Subordinate execution through ProgramClass
These dynamic features make EnScript far more powerful than it initially seems. I'll be writing about these features over the next few weeks. Since typecasting is short and sweet, I'll cover that now.


Let's say you're working with a NameListClass object. Since NameListClass inherits from NodeClass, you can always treat a NameListClass object as a NodeClass object without doing anything special:

NameListClass list();
list.Parse("one two three", " ");

NodeClass nodeRef = list; // up-cast

This is an example of upcasting, where we treat a derived object as its parent type by manipulating it through a parent-type reference. EnScript knows that NameListClass inherits from NodeClass, so there's no need for you to do anything special, and this cannot possibly fail. There is no ambiguity.


What if we want to do the reverse? Say we have a function that takes a NodeClass object as a parameter, and we'd like to treat it specially if it's a NameListClass object? Here's the answer:

void foo(NodeClass node) {
  NameListClass listRef = NameListClass::TypeCast(node);
  if (listRef) {
    // ...
  }
}

As it turns out, every class in EnScript has a static function named TypeCast(). TypeCast() takes an ObjectClass reference and returns a reference to an object for its particular type, e.g., NameListClass::TypeCast() returns a NameListClass object reference. Because every object inherits from ObjectClass implicitly (even NodeClass), you can pass just about anything into TypeCast(), and because NameListClass::TypeCast() returns a reference to a NameListClass object, EnScript's compiler doesn't complain about assigning the result to the listRef reference here. TypeCast() lets us bridge the gap, allowing us to treat our NodeClass as a NameListClass.

Note here that we then check listRef to see whether it's null. Why? Well, what if someone called foo() and passed in an EntryClass object? An EntryClass object is not a NameListClass object, so NameListClass::TypeCast() will return null. By checking, you can safely test your downcast. Of course, if you didn't do the check, the code would compile, but you'd end up with a null reference error whenever someone called foo() with anything other than a NameListClass object.

One more thing about TypeCast(): Don't pass it a null reference. TypeCast() itself will generate a null reference error if you pass it null. Therefore, to be truly safe, foo() should look like this:

void foo(NodeClass node) {
  if (node) {
    NameListClass listRef = NameListClass::TypeCast(node);
    if (listRef) {
      // ...
    }
  }
}
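For readers coming from dynamic languages, the same safe-downcast pattern in Python (an analogy, not EnScript) is isinstance():

```python
class Node:
    pass

class NameList(Node):
    def __init__(self, names):
        self.names = names

def foo(node):
    # The Python analogue of NameListClass::TypeCast(): check the
    # runtime type before treating the reference as the derived type.
    # Conveniently, isinstance(None, NameList) is simply False, so the
    # null case needs no separate guard.
    if isinstance(node, NameList):
        return len(node.names)
    return 0

assert foo(NameList(["one", "two", "three"])) == 3
assert foo(Node()) == 0
assert foo(None) == 0
```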

Single Object Inheritance is the Root of All Evil

If you make your own class hierarchies, you will find yourself having to make use of TypeCast() quite a bit. Let's say you have an AnimalClass as a base class and two derived classes, DogClass and CatClass. Further, we want to have a method to check whether two AnimalClass objects are equal (...uh, whatever that means...). So, we would have to create a virtual function in AnimalClass, isEqual(), that looks like this:

class AnimalClass {
  pure bool isEqual(AnimalClass other);
}

and then override the function in DogClass and CatClass. Clearly, a dog is not ever going to be equal to a cat. But not all cats are equal to each other, either. We will need to check their favorite brand of kitty food, kitty litter, markings, meowing habits, and whatnot to decide whether two cats are equal. So, even though isEqual() receives an AnimalClass object, we'll need to treat it as a CatClass to inspect all the member variables that are specific to cats and make this determination.

class CatClass: AnimalClass {
  String FaveWetFood, FaveDryFood, FaveLitter;

  virtual bool isEqual(AnimalClass other) {
    CatClass cat = CatClass::TypeCast(other);
    if (cat) {
      return FaveWetFood == cat.FaveWetFood
        && FaveDryFood == cat.FaveDryFood
        && FaveLitter == cat.FaveLitter;
    }
    return false; // must be a dog or a ferret
  }
}

The TypeCast() ends up being necessary because there's something common we want to do for all animals (e.g., make equality comparisons), so the functionality must go in the base class and the function signature must involve an AnimalClass object. In languages like Java, SmallTalk, and EnScript, programmers end up having to do this a lot.
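For comparison, the same downcast-and-check idiom in a dynamic language (Python; my analogy, not EnScript) survives intact, just wearing different syntax:

```python
class Animal:
    def is_equal(self, other):
        raise NotImplementedError  # the 'pure' method from the base class

class Cat(Animal):
    def __init__(self, wet, dry, litter):
        self.wet, self.dry, self.litter = wet, dry, litter

    def is_equal(self, other):
        # The TypeCast()-and-test idiom: a Cat can only equal another Cat.
        if isinstance(other, Cat):
            return (self.wet, self.dry, self.litter) == (
                other.wet, other.dry, other.litter)
        return False  # must be a dog or a ferret

class Dog(Animal):
    def __init__(self, breed):
        self.breed = breed

    def is_equal(self, other):
        return isinstance(other, Dog) and self.breed == other.breed

felix = Cat("tuna", "chicken", "clay")
assert felix.is_equal(Cat("tuna", "chicken", "clay"))
assert not felix.is_equal(Dog("beagle"))
```

The type check doesn't go away in a dynamic language; it just moves from the compiler's hands into yours.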

In C++, which has templates, you can often make use of the Curiously Recurring Template Pattern.


sed quis custodiet ipsos custodes?

Fifty years on, Eisenhower's farewell address seems quite relevant to the times, and the readers of this blog.

A quibble:
Yet, in holding scientific research and discovery in respect, as we should, we must also be alert to the equal and opposite danger that public policy could itself become the captive of a scientific-technological elite.

In 2011, when any village idiot/realtor can comment on which NSF grants do not have merit, it is safe to say that public policy is in no danger of becoming captive to the scientific-technological elite. (By the way, Jacques Barzun's The House of Intellect is a great exploration of anti-intellectualism in the U.S. [I hope I look that good when I'm 102, and I'm unsurprised the Presbyterians played host.])

However, a core tenet of the contemporary homeland defender's creed is "we don't set policy." The scientific-technological elite, to which you, dear reader, no doubt belong, strives to uphold the Unix ideal, providing mechanism, not policy... capability, not culpability. This is a cop-out. You can't sit Grandma down in front of a bash prompt and tell her she'll be just fine with man man.

Our clients, those whom we serve, likely do not understand what it is we do, what we know, what we can do, and, most importantly, what should be done. If we in the Cyber Industrial Complex* do not act with restraint, if we do not "conduct our struggle on the high plane of dignity and discipline," then we risk legitimizing the paranoid delusionals, and the lawless actions of their defenders. "Thy will be done" may be the easy interpretation our masters desire, but when our masters are not omniscient, or lack sufficient wisdom, then, ever mindful of our own weaknesses and ever penitent, we should strive to be of good counsel. Machine learning, SNA, disinformation, and old school psyops are good tools, but only if we have good targets.

* Military Cyber Complex is maybe more apt, but less poetic.

Update: Penitence requires more than an apology, but it's a good start.

Update 2: The paper of record 

Comment: What about severing ties to Hunton & Williams? Severing ties to a tiny firm like HBGary Federal seems like kicking a dead dog.


Empty hash values don't get sorted?

Weird thing in EnCase 6.18 I came across today: when sorting by the Hash Value field, entries without hash values (0-byte contents) seem immune to sorting, remaining in a stable order relative to each other. You can see this in this screenshot, where the evidence is in a logical evidence file with folders that don't have content. I don't know whether this is specific to logical evidence files, but it's somewhat annoying.