This paper explores how a marketing tool can be used in forensic analysis. It offers an in-depth look at the internal workings of Google Analytics™ cookies, and compares and contrasts them with standard HTTP cookies. The focus then turns to the significant forensic implications of this often-overlooked artifact.
Over the years, cookies have been overlooked in forensic examinations. For the most part, cookies were used to show that a user account had accessed a Web site. Since no set structure for cookies existed, determining the content’s meaning was problematic. With the advent of Google Analytics (GA) cookies, that has changed. GA cookies use a set, documented structure that enables a forensic investigator to obtain useful information about a computer’s user.
This paper will cover GA cookies in detail by comparing and contrasting them with traditional cookies. The internal structure of GA cookies and the relevance to forensic examinations will be described.
Sessions, for the purposes of this paper, are active open Web pages. Cookies can be in plain text or encoded to obfuscate credentials. However, credential obfuscation is not always used. In 2009, Azim Poonawla, a security researcher, discovered that Facebook™ was sending authentication cookies in plain text. These cookies could be sniffed from a network and used to hijack a user’s authenticated session.1
The types of information tracked locally are a Web site’s top level URL, dates and times last visited, expiration date, and the number of times the site was visited. When a cookie expires, it is removed by the system and replaced by a new cookie on a subsequent visit to the site. It is important to note that the Web browser has to be running to remove expired cookies. If there is no expiration date, the cookie will be removed at the end of the session.
The site visits are based on when the cookie was first placed on the system and how many times the site was visited or reloaded. Indicators on how the user navigated to the site are not always available. Since there is no consistency to the format of the cookies' contents, they can even vary for the same Web site.
Common Web Browsers
|Browser|Cookie storage|
|---|---|
|Internet Explorer|Stores all cookies for a site in a text (.txt) file named for the site, and tracks them in an index.dat file.|
|Safari|Stores all cookies in an XML formatted file, Cookies.plist.|
|Firefox|Stores all cookies as records in a SQLite database, cookies.sqlite.|
|Chrome|Stores all cookies as records in a SQLite database, Cookies.|
A user can determine what types of cookies his Web browser will accept via the options for his browser. Typical options offer the ability to restrict first- and third-party cookies. First-party cookies are from the same URL that was visited by the user. Third-party cookies are set for a domain other than the site visited. These cookies are generally used by marketing firms to track a user’s browsing history and patterns to effectively advertise their products. These third-party cookies are sometimes referred to as malicious cookies and are often marked as such by anti-virus software.
What are GA Cookies2
Types of GA Cookies
The different types of GA cookies are briefly described in Table 1.
All expiration values can be changed by the site administrator.
The individual parts of the __utma, __utmb, __utmc, __utmz cookies need to be closely examined to get a better understanding of their contents. Each cookie has a value field that contains information of investigative importance.
The __utma value field utilizes the following structure:5
<domain hash>.<visitor ID>.<first visit>.<previous>.<last>.<# of sessions>
Here is an example:
The domain hash identifies the domain regardless of what path on the site is navigated to. For example, example.com and example.com/some/other/directory.html would have the same domain hash, but secure.example.com and example.com would not. This number is used to track the different cookies from the same domain.
The domain hash is also consistent across systems. The visitor ID is a unique ID for each new visitor to the site. For example, Fred logs into his computer account and visits secure.example.com and then lets Mary use his login session to browse the Web. If she also navigates to secure.example.com, the visitor ID would remain the same. If the cookie was deleted, a subsequent site visit would generate a new visitor ID.
The timestamps found in the __utma value field are stored as Unix timestamps based on the local system time. A Unix timestamp is expressed in the number of seconds from 00:00:00 January 1, 1970 GMT. They are easily converted by a number of tools and Web sites. The first visit time is when the cookie was first placed on this system, and is not indicative of when the user first went to the site.
As shown in Table 1, the cookie expires after two years, or a time period set by the site administrator. The value in this instance converts to Mon, 31 Jan 2011 13:55:49 GMT. The previous visit time is the instance before the current session when the site was visited. In this case the first and previous are the same value indicating that the previous time the user visited the site was also the first. The last visit time is that of the most recent visit.
The last value, the number of sessions, is the number of new sessions created for that Web site. This number is not incremented when the site is reloaded. This is an important distinction between GA cookies and HTTP cookies.
It is also important to note that this is a persistent cookie, in that the cookie will not expire for two years from the last time the cookie was updated or a time period set by the site administrator.
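The __utma structure described above can be split mechanically. The following is a minimal Python sketch, not a definitive parser: the function name `parse_utma`, the `Utma` type, and the sample value are hypothetical. Only the first-visit timestamp in the sample was chosen so that it converts to the date quoted in the text; the domain hash, visitor ID, last visit, and session count are made up.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Utma(NamedTuple):
    domain_hash: str
    visitor_id: str
    first_visit: datetime
    previous_visit: datetime
    last_visit: datetime
    sessions: int


def parse_utma(value: str) -> Utma:
    """Split a __utma value into its six dot-separated fields."""
    parts = value.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    domain_hash, visitor_id, first, previous, last, sessions = parts

    def to_utc(seconds: str) -> datetime:
        # Unix timestamp: seconds since 00:00:00 January 1, 1970 GMT
        return datetime.fromtimestamp(int(seconds), tz=timezone.utc)

    return Utma(domain_hash, visitor_id, to_utc(first),
                to_utc(previous), to_utc(last), int(sessions))


# Hypothetical sample value (not taken from a real cookie)
sample = parse_utma("123456789.987654321.1296482149.1296482149.1296568549.2")
print(sample.first_visit.strftime("%Y-%m-%d %H:%M:%S GMT"))
print(sample.first_visit == sample.previous_visit, sample.sessions)
```

In this sample the first and previous visit fields are equal, which mirrors the situation described above where the previous visit was also the first.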
The __utmb cookie is used in conjunction with the __utmc cookie for session tracking. The __utmb value field utilizes the following structure:5
<domain hash>.<pages viewed>.10.<last time>
Here is an example:
The domain hash is a similar value to that of __utma, and if the __utmb cookie is for the same domain, these values will match. The pages viewed value is the number of pages in this domain that were viewed. The last time value is the last time the page was viewed or reloaded.
It is unknown what “10” represents; this value does not appear to change.
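A __utmb value can be split in the same way. This is a sketch under the structure given above; `parse_utmb` and the sample value are hypothetical, and the third field is kept as an opaque string because its meaning is not known.

```python
from datetime import datetime, timezone


def parse_utmb(value: str) -> dict:
    """Split a __utmb value: <domain hash>.<pages viewed>.10.<last time>."""
    parts = value.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    domain_hash, pages_viewed, constant, last_time = parts
    return {
        "domain_hash": domain_hash,
        "pages_viewed": int(pages_viewed),
        "constant": constant,  # observed as "10"; purpose unknown
        "last_time": datetime.fromtimestamp(int(last_time), tz=timezone.utc),
    }


# Hypothetical sample value (not taken from a real cookie)
utmb = parse_utmb("123456789.3.10.1296482149")
print(utmb["pages_viewed"], utmb["last_time"].isoformat())
```

Comparing `domain_hash` here against the one parsed from __utma is a quick way to confirm that both cookies belong to the same domain.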
As stated, the __utmc cookie works in conjunction with __utmb to track user sessions. If the cookie is present, the session is active. This cookie is only stored in RAM and can be seen utilizing some Web browser extensions or in a dump of physical memory.
The __utmc cookie only stores the domain hash and is deleted when the session expires. Here is an example of a __utmb and __utmc cookie viewed with the Web Developer Toolbar extension for Firefox™:
Here is an example of a __utmb and __utmc record from physical memory:
The __utmz cookie contains a great deal of investigative data. It stores the domain hash, the date/time that the cookie was last updated, the number of visitor sessions, the number of sources used to access the site, and several variables. The variables are separated by the “|” character.
The __utmz value field utilizes the following convention:5
<domain hash>.<last time>.<sessions>.<sources>.<variables>
Here is an example:
The first three fields were explained previously in this paper.
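The convention above can be parsed as follows. This is a minimal sketch: `parse_utmz` and the sample value are hypothetical, and the sample's variable values are invented for illustration. The split is limited to four dots on the assumption that variable values (host names, for instance) may themselves contain dots.

```python
from datetime import datetime, timezone


def parse_utmz(value: str) -> dict:
    """Split a __utmz value into its fixed fields and its '|'-separated variables."""
    # maxsplit=4 keeps any dots inside the variables portion intact
    domain_hash, last_time, sessions, sources, variables = value.split(".", 4)
    pairs = {}
    for item in variables.split("|"):
        key, _, val = item.partition("=")
        pairs[key] = val
    return {
        "domain_hash": domain_hash,
        "last_time": datetime.fromtimestamp(int(last_time), tz=timezone.utc),
        "sessions": int(sessions),
        "sources": int(sources),
        "variables": pairs,
    }


# Hypothetical sample value (not taken from a real cookie)
utmz = parse_utmz(
    "123456789.1296482149.1.1."
    "utmcsr=mail.example.com|utmcmd=referral|utmcct=/inbox"
)
print(utmz["variables"]["utmcmd"], utmz["variables"]["utmcsr"])
```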
In GA there are several ways in which a site is accessed: organic, referral and direct.6 The method stored in the cookie is not definitive and should be treated with some skepticism, unless corroborated by other evidence.
- Organic: Visitors referred by an unpaid search engine listing, e.g. a Google.com search.
- Referral: Visitors referred by links on other Web sites. (Links that have been tagged with campaign variables won't show up as referral unless they happen to have been tagged with utm_medium=referral.)
- Direct: Visitors who visited the site by typing the URL directly into their browser. Direct can also refer to the visitors who clicked on the links from their bookmarks/favorites, untagged links within e-mails, or links from documents that don't include tracking variables (such as PDFs or Word documents).
The variables in a __utmz cookie are:
If present, utmctr can hold valuable data. It contains the keywords used to discover the target site. Here is an example for .code.google.com:
In this example, “%20” and “%22” are hex representations of ASCII values. Consulting an ASCII chart will show that 20 is a space and 22 is a double quote. Therefore the search string utilized on Google was export table “column header” sqlite manager.
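Rather than consulting an ASCII chart, the decoding can be done with a standard library call. In this short sketch the encoded string is a reconstruction from the decoded search string given above, not a copy of the original cookie value.

```python
from urllib.parse import unquote

# Reconstructed from the decoded search string in the text (assumption)
encoded = "export%20table%20%22column%20header%22%20sqlite%20manager"

decoded = unquote(encoded)
print(decoded)
```

The decoded string contains plain ASCII double quotes (0x22) around "column header", matching the search described in the text.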
The utmcct variable can also be of investigative relevance. In the __utmz example, it appears that the user was referred to a Web site from a CSC inbox and that the username was jnelsonxxxx. One might expect the utmcmd value to be direct, since the site was accessed via e-mail. However, the link was opened from the user’s Web mail, which is why the value is referral.
Parsing the GA Cookies
With a good understanding of the common data fields in GA cookies, the rest of the data regarding them can be examined. Utilizing a SQLite database manager the cookies in Google Chrome™ and Mozilla Firefox™ can be prodded for investigative relevance.
On a Windows 7™ system the Firefox™ cookies will be located in:
This file contains all the user’s cookies associated with the Firefox™ browser. To view only the GA cookies, the following SQL query should be used:
select * from moz_cookies where name like '%utm%';
This will display all of the information stored for the GA cookies. One of the pieces of information displayed is the lastAccessed column. This value is the timestamp of the last time the site was viewed or reloaded. It is expressed as the number of microseconds from 00:00:00 January 1, 1970 GMT, rather than the seconds used by Unix timestamps. This value can be easily converted with the following SQL query:
select datetime(lastAccessed/1000000, 'unixepoch'), host, value from moz_cookies where name like '%utm%' order by lastAccessed;
This query utilizes the SQL datetime() function to convert the timestamp and displays only the lastAccessed, host, and value fields and sorts the results by the lastAccessed value.
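The query can be tried without touching evidence by running it against a small in-memory database. The sketch below is illustrative only: the table has just the four columns the query needs (the real moz_cookies table has more), and both rows are invented.

```python
import sqlite3

# In-memory stand-in for cookies.sqlite with a reduced, hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_cookies (name TEXT, value TEXT, host TEXT, lastAccessed INTEGER)"
)
conn.executemany(
    "INSERT INTO moz_cookies VALUES (?, ?, ?, ?)",
    [
        ("__utma", "123456789.987654321.1296482149.1296482149.1296568549.2",
         ".example.com", 1296568549000000),
        ("sessionid", "abc", ".example.com", 1296568550000000),  # not a GA cookie
    ],
)

# Same query as in the text: microseconds are divided down to seconds first
query = (
    "select datetime(lastAccessed/1000000, 'unixepoch'), host, value "
    "from moz_cookies where name like '%utm%' order by lastAccessed"
)
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

To run the same query against a real profile, `sqlite3.connect()` would be pointed at a working copy of cookies.sqlite rather than the original file.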
The same procedure used in parsing Firefox™ cookies can be used with Chrome™, with a few modifications.
On a Windows 7 system the Chrome™ cookies will be located in the SQL database:
Some of the column headers in this database are named differently than those of cookies.sqlite, but they still contain the same data. The data in the last_access_utc column requires an additional step to convert the timestamp. In this case, the timestamp is based on the number of microseconds from 00:00:00 January 1, 1601. Therefore, after converting from microseconds to seconds, we have to adjust the time by subtracting 11644473600 (the number of seconds from 00:00:00 January 1, 1601 to 00:00:00 January 1, 1970). The modified SQL query follows:
select datetime(last_access_utc/1000000 - 11644473600, 'unixepoch'), host_key, value from cookies where name like '%utm%' order by last_access_utc;
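The offset used in this query can be checked, and the conversion reproduced outside of SQLite, with a few lines of Python. This is a sketch: the function name `chrome_time_to_utc` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
EPOCH_1970 = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds between the two epochs; this is the value subtracted in the query
EPOCH_DELTA_SECONDS = int((EPOCH_1970 - EPOCH_1601).total_seconds())


def chrome_time_to_utc(microseconds: int) -> datetime:
    """Convert microseconds since 00:00:00 January 1, 1601 to a UTC datetime."""
    return EPOCH_1601 + timedelta(microseconds=microseconds)


print(EPOCH_DELTA_SECONDS)

# Hypothetical last_access_utc value (not taken from a real database)
print(chrome_time_to_utc(12940955749000000).isoformat())
```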
On a Windows 7 system, the files containing the cookies and the index.dat file that tracks them can be found in:
The cookie files follow the naming convention:
<username>@<domain name – TLD>.txt
A cookie file for example.com and user Fred would look like:
All the cookies for this site, excluding subdomains or different paths, will be placed into this file separated by an “*” on a single line. Subdomains will have their own cookie files named with the above convention.
Cookies from different paths on the same site also will use the above convention with one caveat: a number in square brackets will be appended to the site name. This will create a new cookie file for each different path, on the same site, that has sent a cookie to the Web browser.
Here is an example of the content of a cookie (Student@csc.txt, numbers inserted for clarity):
The above information shows two GA cookies and one unrelated cookie starting at line 19. The name fields are on lines 1 and 10, the data on lines 2 and 11, and the host on lines 3 and 12. Any path would be appended to the host field. Lines 4 and 13 are flags and their purpose is not known. The last four lines make up the last accessed and expiration dates for the cookie. These are 64-bit Windows FILETIME values. These timestamps are Windows specific and will not be covered in this paper. The last accessed and expiration dates for each cookie can be accessed through other means previously mentioned.
The absence of a __utmb cookie indicates that a session with csc.com had not been active in the 30 minutes prior to the browser closing.
The index.dat file that tracks the cookie files contains similar information stored in the files themselves, and therefore will not be covered.
On a Mac OS X system, Safari™ cookies will be located in an XML formatted file:
This is an example of the contents of the file:
In viewing this cookie, it is readily apparent that it is a standard __utma cookie presented in a different format. The only part that needs addressing is the Created key pair.
This timestamp is a Mac Cocoa7 timestamp based on the number of seconds from 00:00:00 January 1, 2001. To convert this value to a Unix time stamp add 978307200 (the number of seconds from 00:00:00 January 1, 1970 to 00:00:00 January 1, 2001) to the timestamp. The fractional part of the number can be discarded after rounding up.
Converted from the resulting Unix timestamp, this value would be Fri, 13 May 2011 18:48:43 GMT. This timestamp should match the first visit value in the Value key, but might be off by a minute due to rounding errors.
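The conversion just described can be sketched in Python as follows. The function name `cocoa_to_utc` and the sample Created value are hypothetical; the sample was chosen so that it lands on the date quoted above. Rounding to the nearest second with `round()` is one reading of the rounding step and is an assumption.

```python
from datetime import datetime, timezone

# Seconds from 00:00:00 January 1, 1970 to 00:00:00 January 1, 2001
COCOA_TO_UNIX_OFFSET = 978307200


def cocoa_to_utc(cocoa_seconds: float) -> datetime:
    """Convert a Mac Cocoa timestamp (seconds since 2001) to a UTC datetime."""
    unix_seconds = round(cocoa_seconds) + COCOA_TO_UNIX_OFFSET
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# Hypothetical Created value (not taken from a real Cookies.plist)
created = 327005322.6
print(cocoa_to_utc(created).strftime("%Y-%m-%d %H:%M:%S GMT"))
```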
As this paper has shown, the forensic implications of GA cookies are tremendous. Unlike HTTP cookies, GA cookies provide the forensic examiner with an extensive amount of data on the user of a particular Web browser.
Instead of tracking every page reload, which can be misleading to an investigator, GA cookies track just the new sessions. Not only do GA cookies show the referring site, but if it was a search engine, the keywords are tracked as well. GA cookies will provide an indication of how many ways the user accessed the target site, and in some instances, how many pages were viewed.
Investigators can also gain insight on how the user may have originally found the site. Finally, the forensic examiner can determine whether the user had an active session with a target Web site before he closed his Web browser. Just as GA has been a windfall for marketers, it has proved invaluable in re-creating Web activity of a targeted user.
Jon Nelson is an employee of Computer Sciences Corporation (CSC), assigned to the Defense Cyber Investigations Training Academy (DCITA) in Linthicum, MD. He has over 13 years of experience in computer forensics and criminal investigations. Mr. Nelson is an Instructor and Curriculum Developer for DCITA's Network Investigations Track. He has a BS in Computer Science and an Undergraduate Certificate in Computer Security, and holds the CISSP, EnCE, ACE, and GSEC certifications. Mr. Nelson has also developed and implemented various projects in C++, Perl, Ruby, PHP, HTML, and Bash.