Zeekurity Zen – Part VI: Zeek File Analysis Framework


This is part of the Zeekurity Zen Zeries on building a Zeek (formerly Bro) network sensor.

Overview

In our Zeek journey thus far, we’ve:

Zeek’s incredible network traffic visibility goes beyond just protocol analysis.  Using the File Analysis Framework, we can perform automatic file hashing (e.g., MD5, SHA1, SHA256), identify malicious files, and extract suspicious files to disk for forensic analysis.  These capabilities are easily some of Zeek’s most impressive and useful features.

To do this, we’ll walk through these steps:

  1. Enable file hashing and Team Cymru’s Malware Hash Registry lookups.
  2. Enable SHA256 hashing for all files.
  3. Understand the contents of files.log.
  4. Enable automatic file extraction of commonly exploited file types.
  5. Discuss a real-world example.
  6. Troubleshoot common issues.

Enable file hashing and Team Cymru’s Malware Hash Registry lookups

  1. By default, automatic file hashing and Team Cymru’s Malware Hash Registry lookups are enabled.  To confirm this, open /opt/zeek/share/zeek/site/local.zeek and look for the following lines. Ensure the @load lines appear as below and are not commented out (i.e., they do not start with a # symbol). Update the file if needed.
    # Enable MD5 and SHA1 hashing for all files.
    @load frameworks/files/hash-all-files
    # Detect SHA1 sums in Team Cymru's Malware Hash Registry.
    @load frameworks/files/detect-MHR
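
If you would rather check this with a script than by eye, here is a minimal Python sketch. It assumes you pass it the text of local.zeek; the function name missing_loads and the sample text are ours for illustration and are not part of Zeek.

```python
import re

# The two scripts this step expects to be loaded.
REQUIRED_LOADS = [
    "frameworks/files/hash-all-files",
    "frameworks/files/detect-MHR",
]

def missing_loads(local_zeek_text, required=REQUIRED_LOADS):
    """Return the required scripts that have no active (uncommented) @load line."""
    active = set()
    for line in local_zeek_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented out, so Zeek ignores it
        match = re.match(r"@load\s+(\S+)", stripped)
        if match:
            active.add(match.group(1))
    return [script for script in required if script not in active]

# Hypothetical local.zeek excerpt with the detect-MHR line commented out.
sample = """\
# Enable MD5 and SHA1 hashing for all files.
@load frameworks/files/hash-all-files
# Detect SHA1 sums in Team Cymru's Malware Hash Registry.
#@load frameworks/files/detect-MHR
"""
print(missing_loads(sample))
```

On a real sensor you would read the file first, for example with open("/opt/zeek/share/zeek/site/local.zeek").read(), and expect an empty list back.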

Enable SHA256 hashing for all files

  1. SHA256 hashing is not enabled by default.  We will enable this by creating a simple Zeek script.  As the zeek user, create a new file /opt/zeek/share/zeek/site/hash_sha256.zeek, add the following lines, and then save the file.
    ##! Perform SHA256 hashing on all files.
    @load base/files/hash
    event file_new(f: fa_file)
        {
        Files::add_analyzer(f, Files::ANALYZER_SHA256);
        }
  2. As the zeek user, edit /opt/zeek/share/zeek/site/local.zeek, add the following lines, and then save the file.
    # Add SHA256 hash for files
    @load hash_sha256
  3. As the zeek user, stop Zeek.
    zeekctl stop
  4. As the zeek user, apply the new settings and start Zeek.
    zeekctl deploy
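
To build some intuition for what the hash analyzers produce, the Python sketch below feeds a byte stream to MD5, SHA1, and SHA256 one chunk at a time and confirms that the result matches hashing the whole payload at once. It is an illustration using Python's hashlib, not Zeek's implementation; the 1460-byte chunk size and the payload are arbitrary choices of ours.

```python
import hashlib

def hash_stream(chunks):
    """Feed byte chunks to MD5, SHA1, and SHA256 hashers as they arrive."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}

# Made-up payload, split into segment-sized pieces to mimic a file
# arriving over the network.
payload = b"example file contents " * 1000
chunks = [payload[i:i + 1460] for i in range(0, len(payload), 1460)]

digests = hash_stream(chunks)

# Chunked hashing gives the same digest as hashing the full payload.
assert digests["sha256"] == hashlib.sha256(payload).hexdigest()
print(digests)
```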

Understand files.log

  1. Take a look at your own files.log and note the types of files that are hashed.  Below is a sample files.log entry in JSON format.
    {
      "ts": 1597593633.224633,
      "fuid": "FB4Sx62yaleypxnhIb",
      "tx_hosts": [
        "23.246.2.148"
      ],
      "rx_hosts": [
        "10.2.2.23"
      ],
      "conn_uids": [
        "CUgYfkjoZLP4BR8Ol"
      ],
      "source": "HTTP",
      "depth": 0,
      "analyzers": [
        "JPEG",
        "SHA1",
        "MD5",
        "SHA256"
      ],
      "mime_type": "image/jpeg",
      "duration": 0.01756000518798828,
      "local_orig": false,
      "is_orig": false,
      "seen_bytes": 58175,
      "total_bytes": 58175,
      "missing_bytes": 0,
      "overflow_bytes": 0,
      "timedout": false,
      "md5": "0671e92b0fb8ffe5724579c229a43689",
      "sha1": "e855561e88f0bc57733eafa05a9d7681d276e55a",
      "sha256": "fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4"
    }
    
  2. Let’s examine some of the key fields to better understand how we can use them to analyze files on our own network.  For a full listing, check out the official Zeek documentation.
    • fuid (e.g., FB4Sx62yaleypxnhIb): The file’s unique ID.  Note that this is not the same as the uid commonly found in other Zeek logs.
    • tx_hosts (e.g., 23.246.2.148): The host or hosts that sent the file.
    • rx_hosts (e.g., 10.2.2.23): The host or hosts that received the file.
    • conn_uids (e.g., CUgYfkjoZLP4BR8Ol): The uid(s) of the connection(s) that carried the file.  These are the same unique IDs used to correlate activity across conn.log and other Zeek logs.
    • source (e.g., HTTP): This indicates which protocol the file was transferred over.
    • analyzers (e.g., JPEG, SHA1, MD5, SHA256): The file analyzers used to analyze this file.
    • mime_type (e.g., image/jpeg): What Zeek believes the MIME type of the file is.
    • seen_bytes (e.g., 58175): The number of bytes that Zeek observed.
    • total_bytes (e.g., 58175): The total number of bytes that the file should be.
    • missing_bytes (e.g., 0): The number of bytes that were missing in the analysis, likely due to dropped packets.
    • overflow_bytes (e.g., 0): The number of bytes that were not analyzed either due to overlapping bytes or reassembly errors.
    • md5 (e.g., 0671e92b0fb8ffe5724579c229a43689): The MD5 hash of the file.
    • sha1 (e.g., e855561e88f0bc57733eafa05a9d7681d276e55a): The SHA1 hash of the file.
    • sha256 (e.g., fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4): The SHA256 hash of the file.
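
To put these fields to work, here is a small Python sketch that turns files.log entries into one-line summaries. It assumes the log is written as one JSON object per line; the function names summarize and summarize_log are ours, and the sample record is a trimmed copy of the entry above.

```python
import json

def summarize(record):
    """Build a one-line summary from the key fields of a files.log record."""
    tx = ",".join(record.get("tx_hosts", [])) or "-"
    rx = ",".join(record.get("rx_hosts", [])) or "-"
    return "{} {} {} -> {} {} sha256={}".format(
        record.get("fuid", "-"),
        record.get("source", "-"),
        tx,
        rx,
        record.get("mime_type", "-"),
        record.get("sha256", "-"),
    )

def summarize_log(lines):
    """Summarize every non-empty JSON line from a files.log."""
    return [summarize(json.loads(line)) for line in lines if line.strip()]

# Trimmed copy of the sample entry, serialized as a single JSON line.
sample_line = json.dumps({
    "fuid": "FB4Sx62yaleypxnhIb",
    "tx_hosts": ["23.246.2.148"],
    "rx_hosts": ["10.2.2.23"],
    "source": "HTTP",
    "mime_type": "image/jpeg",
    "sha256": "fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4",
})

for summary in summarize_log([sample_line]):
    print(summary)
```

Fields such as mime_type or the hashes can be absent from a record, which is why the sketch falls back to "-" rather than indexing the record directly.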

Enable automatic file extraction

  1. As the zeek user, stop Zeek if it is currently running.
    zeekctl stop
  2. Use zkg to install the file extraction package.
    zkg install zeek/hosom/file-extraction
    The following packages will be INSTALLED:
      zeek/hosom/file-extraction (2.0.3)
    Proceed? [Y/n] y
    Installing "zeek/hosom/file-extraction".
    Installed "zeek/hosom/file-extraction" (2.0.3)
    Loaded "zeek/hosom/file-extraction"
  3. Configure file extraction options by editing /opt/zeek/share/zeek/site/file-extraction/config.zeek. Below is a sample config.zeek that will set the directory to store extracted files to /opt/zeek/extracted/ and set the files we want to automatically extract to commonly exploited file types (e.g., Java, PE, Microsoft Office, and PDF).
    # All configuration must occur within this file.
    # All other files may be overwritten during upgrade
    module FileExtraction;
    # Configure where extracted files will be stored
    redef path = "/opt/zeek/extracted/";
    # Configure 'plugins' that can be loaded
    # these are shortcut modules to specify common
    # file extraction policies. Example:
    # @load ./plugins/extract-pe.bro
    @load ./plugins/extract-common-exploit-types
  4. Create the directory to save all extracted files. It must match what we set in config.zeek.
    mkdir /opt/zeek/extracted
  5. If this is your first time installing a Zeek package, edit /opt/zeek/share/zeek/site/local.zeek and add the following lines to the bottom. This will load all packages you’ve installed. You will only need to do this once.
    # Load Zeek Packages
    @load packages
  6. As the zeek user, apply the new settings and start Zeek.
    zeekctl deploy
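
Once files start landing in the extraction directory, you may want to confirm that a file on disk matches the hash Zeek logged for it. The Python sketch below is one way to do that, assuming files.log records carry an "extracted" path and a "sha256" value as configured in this guide. The helper names and the temporary demo file are ours for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file on disk in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_extracted(record):
    """Return True or False for a hash match, or None if nothing can be compared."""
    extracted = record.get("extracted")
    expected = record.get("sha256")
    if not extracted or not expected or not Path(extracted).is_file():
        return None
    return sha256_of(extracted) == expected

# Demo with a throwaway file standing in for an extracted download.
content = b"MZ not a real executable"
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "HTTP-Fexample.exe"
    sample.write_bytes(content)
    good = {"extracted": str(sample), "sha256": hashlib.sha256(content).hexdigest()}
    bad = dict(good, sha256="0" * 64)
    results = (verify_extracted(good), verify_extracted(bad), verify_extracted({}))

print(results)
```

A False result on a real sensor would suggest the file on disk is not the complete file Zeek hashed, which is worth checking before handing it to a sandbox.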

Real World Example

So how could we use this in the real world? Imagine a user was sent a malicious link via email that claimed to be this quarter’s employee bonus payouts.  The user clicks the link and immediately downloads a file.  We want to know whether the file was malicious and, if so, what actions we can take to prevent other systems from downloading the same file.  Since we’ve configured our Zeek instance to automatically hash all files, extract Windows PE files, and perform Team Cymru Malware Hash Registry lookups, we’re confident that we can perform a thorough analysis of the event.

  1. We’re first alerted to suspicious activity through a notice raised in notice.log. The log entry below tells us the file’s MIME type is “application/x-dosexec”, that the notice type is “TeamCymruMalwareHashRegistry::Match”, and that the Malware Hash Registry reports a detection rate of 38%. Additionally, the notice provides a VirusTotal search link for the file’s SHA1 hash, which shows virtually every scanner detecting this file as malicious.  From the detection names, we see that this is related to the WannaCry ransomware. The notice also conveniently tells us which server the file came from (149.202.220.122) and which host downloaded it (10.2.2.23).
    {
      "ts": 1597850503.829048,
      "uid": "CO3tTx2lknzNvQe7P3",
      "id.orig_h": "10.2.2.23",
      "id.orig_p": 56197,
      "id.resp_h": "149.202.220.122",
      "id.resp_p": 80,
      "fuid": "F1sCdV2rXJ9afKdlP2",
      "file_mime_type": "application/x-dosexec",
      "file_desc": "http://s000.tinyupload.com/download.php?file_id=91645583928538055155&t=9164558392853805515507216",
      "proto": "tcp",
      "note": "TeamCymruMalwareHashRegistry::Match",
      "msg": "Malware Hash Registry Detection rate: 38%  Last seen: 2020-06-05 08:29:39",
      "sub": "https://www.virustotal.com/en/search/?query=5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
      "src": "10.2.2.23",
      "dst": "149.202.220.122",
      "p": 80,
      "peer_descr": "worker-1-2",
      "actions": [
        "Notice::ACTION_LOG"
      ],
      "suppress_for": 3600
    }
  2. Using the uid (CO3tTx2lknzNvQe7P3) from the notice, let’s search our logs for related activity and see what comes up.  You could search for this in Splunk or use grep to search through your raw logs.  Assuming we use grep, we find related activity in conn.log, http.log, and files.log as shown below.
    • conn.log
      First, we confirm the connection metadata detailed in notice.log and observe that the file was transferred via HTTP.

      {
        "ts": 1597850493.368458,
        "uid": "CO3tTx2lknzNvQe7P3",
        "id.orig_h": "10.2.2.23",
        "id.orig_p": 56197,
        "id.resp_h": "149.202.220.122",
        "id.resp_p": 80,
        "proto": "tcp",
        "service": "http",
        "duration": 113.54712104797363,
        "orig_bytes": 624,
        "resp_bytes": 3514699,
        "conn_state": "RSTR",
        "local_orig": true,
        "local_resp": false,
        "missed_bytes": 0,
        "history": "ShADadfr",
        "orig_pkts": 1398,
        "orig_ip_bytes": 73512,
        "resp_pkts": 2433,
        "resp_ip_bytes": 3641211
      }
    • http.log
      Next, we see that the user (10.2.2.23) made a GET request to s000.tinyupload.com to download a file.  Note the file information that Zeek includes in this log: the file’s unique ID (F1sCdV2rXJ9afKdlP2), the file’s name (bonus.exe), and the file’s MIME type (application/x-dosexec).

      {
        "ts": 1597850493.556732,
        "uid": "CO3tTx2lknzNvQe7P3",
        "id.orig_h": "10.2.2.23",
        "id.orig_p": 56197,
        "id.resp_h": "149.202.220.122",
        "id.resp_p": 80,
        "trans_depth": 1,
        "method": "GET",
        "host": "s000.tinyupload.com",
        "uri": "/download.php?file_id=91645583928538055155&t=9164558392853805515507216",
        "referrer": "http://s000.tinyupload.com/index.php?file_id=91645583928538055155",
        "version": "1.1",
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
        "request_body_len": 0,
        "response_body_len": 3514368,
        "status_code": 200,
        "status_msg": "OK",
        "tags": [],
        "resp_fuids": [
          "F1sCdV2rXJ9afKdlP2"
        ],
        "resp_filenames": [
          "bonus.exe"
        ],
        "resp_mime_types": [
          "application/x-dosexec"
        ]
      }
    • files.log
      Finally, we again see the same file information that the http.log provided: unique ID, name, and MIME type.  But now we also see the MD5, SHA1, and SHA256 hashes of the file.  Since we’ve also enabled automatic file extraction for commonly exploited file types, we see a new field named “extracted” that tells us where Zeek extracted a copy of the file to (/opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe).  Note that the filename is formatted SOURCE-fuid, followed by a file extension (here, .exe).  We confirm that “seen_bytes” matches “total_bytes” and that there are zero “missing_bytes”, which tells us that Zeek was able to analyze and extract the file in its entirety.

      {
        "ts": 1597850493.672357,
        "fuid": "F1sCdV2rXJ9afKdlP2",
        "tx_hosts": [
          "149.202.220.122"
        ],
        "rx_hosts": [
          "10.2.2.23"
        ],
        "conn_uids": [
          "CO3tTx2lknzNvQe7P3"
        ],
        "source": "HTTP",
        "depth": 0,
        "analyzers": [
          "SHA1",
          "EXTRACT",
          "PE",
          "MD5",
          "SHA256"
        ],
        "mime_type": "application/x-dosexec",
        "filename": "bonus.exe",
        "duration": 10.055749893188477,
        "local_orig": false,
        "is_orig": false,
        "seen_bytes": 3514368,
        "total_bytes": 3514368,
        "missing_bytes": 0,
        "overflow_bytes": 0,
        "timedout": false,
        "md5": "84c82835a5d21bbcf75a61706d8ab549",
        "sha1": "5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
        "sha256": "ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa",
        "extracted": "/opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe",
        "extracted_cutoff": false
      }
  3. From here, we can use our endpoint security systems to determine if the user executed the file or examine additional Zeek logs to identify subsequent suspicious behavior.  To prevent other systems from downloading this file, we can block the identified file hashes or IP/URL in our network and endpoint security platforms.  Additionally, since we have a copy of the raw file we can perform deeper analysis and generate additional IOCs and threat intelligence, further strengthening our defenses.  Pretty cool, huh?
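
The pivot in the steps above can also be scripted. The Python sketch below matches records on either the uid field or the conn_uids list, then collects hashes and hosts that could be fed into a block list. It assumes one JSON object per log line; the function names, the "Cother" record, and the trimmed sample records are ours for illustration.

```python
import json

def related_records(lines, uid):
    """Return records tied to a connection via uid or conn_uids."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("uid") == uid or uid in record.get("conn_uids", []):
            hits.append(record)
    return hits

def collect_indicators(records):
    """Gather file hashes, HTTP hosts, and sending servers from matched records."""
    found = {"hashes": set(), "hosts": set(), "servers": set()}
    for record in records:
        for field in ("md5", "sha1", "sha256"):
            if field in record:
                found["hashes"].add(record[field])
        if "host" in record:
            found["hosts"].add(record["host"])
        found["servers"].update(record.get("tx_hosts", []))
    return {name: sorted(values) for name, values in found.items()}

# Trimmed records from the example, plus one unrelated http.log entry.
logs = {
    "http.log": [
        json.dumps({"uid": "CO3tTx2lknzNvQe7P3", "host": "s000.tinyupload.com"}),
        json.dumps({"uid": "Cother", "host": "unrelated.example"}),
    ],
    "files.log": [
        json.dumps({
            "fuid": "F1sCdV2rXJ9afKdlP2",
            "conn_uids": ["CO3tTx2lknzNvQe7P3"],
            "tx_hosts": ["149.202.220.122"],
            "md5": "84c82835a5d21bbcf75a61706d8ab549",
            "sha1": "5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
            "sha256": "ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa",
        }),
    ],
}

uid = "CO3tTx2lknzNvQe7P3"
matches = [rec for lines in logs.values() for rec in related_records(lines, uid)]
indicators = collect_indicators(matches)
print(indicators)
```

Checking conn_uids as well as uid matters because files.log does not carry a top-level uid field, so a plain uid comparison would miss it.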

Troubleshooting

If you find that files aren’t properly captured in files.log or automatically extracted, there are two likely causes:

  1. You’re not actually performing full packet capture. In Part I of this series, we enabled network optimizations to ensure your sensor is performing full packet capture and not utilizing any “NIC offloading functions.”  Refer to the steps in the section titled “Enable network service and disable NIC offloading functions” and confirm they’re applied properly on your system.  Zeek will typically warn you in reporter.log if it believes that NIC offloading functions have not been disabled.
  2. You’re dropping packets. This could be due to an underpowered Zeek sensor or an overwhelmed network mirror/TAP.  Make sure your Zeek sensor uses appropriately sized hardware for the traffic it’s monitoring and that your network mirror/TAP is capable of handling your network’s traffic volume.
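
To get a quick sense of whether either issue is affecting you, the Python sketch below scans files.log records and flags files that Zeek did not see in full. It assumes one JSON object per line; the function name incomplete_files and the two sample records are ours for illustration.

```python
import json

def incomplete_files(lines):
    """Return (fuid, reasons) for files.log records that look incomplete."""
    flagged = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        reasons = []
        if record.get("missing_bytes", 0) > 0:
            reasons.append("missing_bytes")
        if record.get("overflow_bytes", 0) > 0:
            reasons.append("overflow_bytes")
        if record.get("timedout"):
            reasons.append("timedout")
        total = record.get("total_bytes")  # may be absent when the size is unknown
        if total is not None and record.get("seen_bytes", 0) < total:
            reasons.append("seen_bytes < total_bytes")
        if reasons:
            flagged.append((record.get("fuid"), reasons))
    return flagged

# One complete record and one made-up record with a gap in the stream.
sample_lines = [
    json.dumps({"fuid": "F1sCdV2rXJ9afKdlP2", "seen_bytes": 3514368,
                "total_bytes": 3514368, "missing_bytes": 0,
                "overflow_bytes": 0, "timedout": False}),
    json.dumps({"fuid": "Fhypothetical", "seen_bytes": 1000,
                "total_bytes": 5000, "missing_bytes": 4000,
                "overflow_bytes": 0, "timedout": False}),
]

for fuid, reasons in incomplete_files(sample_lines):
    print(fuid, ", ".join(reasons))
```

If a large share of your files are flagged, that points toward packet loss or offloading rather than a problem with any single transfer.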

Up Next

In Part VII of this series, we’ll look at how to analyze and gain visibility into encrypted traffic.
