Azure AD Research with ROADtools#

2022-07-17

In this post, I show how to inspect Graph API metadata and describe how these relate to objects in Azure AD. I share how the invaluable project ROADtools uses this metadata to represent an Azure AD tenant in the form of a relational database. I outline several approaches to analyze this database using native SQL, the sqlalchemy ORM, and the pandas data analysis library. Lastly, I introduce a proof-of-concept ROADtools plugin named ROAD2timeline that makes timeline analysis of Azure AD objects quick and easy.


Azure AD Data Model#

The Graph APIs expose the data model for Azure AD and its directory objects. The legacy Azure AD Graph API and the modern Microsoft Graph API are both OData REST APIs. OData REST APIs provide descriptive XML metadata about their endpoints, entities, relationships, URL parameters, and response types to make it easier for clients to consume. OData metadata serves the same purpose as the more recent OpenAPI specification.

In this post, I focus on the service principal data model for demonstration purposes. This helps contrast how the same Azure AD object can be represented in different paradigms. I will cover how to work with the service principal data model as:

  • OData entities

  • Tables in a SQL database

  • Python classes

  • pandas DataFrames

Each of these approaches has trade-offs, but all afford the same benefit: convenient programmatic access to low-level information about the current state of an Azure AD tenant. This information is tremendously useful for offensive, defensive, and research purposes.

Download Graph Metadata#

Warning

If you are already familiar with the basic concepts of OData, you can skip ahead to the next section.

We can download the OData metadata for the Graph API(s) in XML format using curl or any other HTTP client.

az rest --url "https://graph.windows.net/\$metadata" --output-file legacy_graph_metadata.xml
az rest --url "https://graph.microsoft.com/beta/\$metadata" --output-file modern_graph_beta_metadata.xml

We can now explore the data structures using XPath queries. Let’s take a look at how service principal is defined in the legacy Azure AD Graph API OData metadata:

import xml.etree.cElementTree as ET
import xml.dom.minidom

# Read the OData XML metadata file for
# the legacy Graph API.

graph_metadata = ET.parse("legacy_graph_metadata.xml")

# Search for `EntityType` elements in any namespace 
# with the `Name` attribute of "ServicePrincipal".
# Note that entity names are PascalCase in the
# legacy Graph, but are camelCase in the modern
# Graph metadata.

service_principal_entity = graph_metadata.find(
    './/{*}EntityType[@Name="ServicePrincipal"]'
)

print(
    xml.dom.minidom.parseString(
        ET.tostring(service_principal_entity, encoding="unicode")
    ).toprettyxml()
)
<?xml version="1.0" ?>
<ns0:EntityType xmlns:ns0="http://schemas.microsoft.com/ado/2009/11/edm" Name="ServicePrincipal" BaseType="Microsoft.DirectoryServices.DirectoryObject" OpenType="true">
	<ns0:Property Name="accountEnabled" Type="Edm.Boolean"/>
	<ns0:Property Name="addIns" Type="Collection(Microsoft.DirectoryServices.AddIn)" Nullable="false"/>
	<ns0:Property Name="alternativeNames" Type="Collection(Edm.String)" Nullable="false"/>
	<ns0:Property Name="appDisplayName" Type="Edm.String"/>
	<ns0:Property Name="appId" Type="Edm.String"/>
	<ns0:Property Name="applicationTemplateId" Type="Edm.String"/>
	<ns0:Property Name="appOwnerTenantId" Type="Edm.Guid"/>
	<ns0:Property Name="appRoleAssignmentRequired" Type="Edm.Boolean" Nullable="false"/>
	<ns0:Property Name="appRoles" Type="Collection(Microsoft.DirectoryServices.AppRole)" Nullable="false"/>
	<ns0:Property Name="displayName" Type="Edm.String"/>
	<ns0:Property Name="errorUrl" Type="Edm.String"/>
	<ns0:Property Name="homepage" Type="Edm.String"/>
	<ns0:Property Name="informationalUrls" Type="Microsoft.DirectoryServices.InformationalUrl"/>
	<ns0:Property Name="keyCredentials" Type="Collection(Microsoft.DirectoryServices.KeyCredential)" Nullable="false"/>
	<ns0:Property Name="logoutUrl" Type="Edm.String"/>
	<ns0:Property Name="notificationEmailAddresses" Type="Collection(Edm.String)" Nullable="false"/>
	<ns0:Property Name="oauth2Permissions" Type="Collection(Microsoft.DirectoryServices.OAuth2Permission)" Nullable="false"/>
	<ns0:Property Name="passwordCredentials" Type="Collection(Microsoft.DirectoryServices.PasswordCredential)" Nullable="false"/>
	<ns0:Property Name="preferredSingleSignOnMode" Type="Edm.String"/>
	<ns0:Property Name="preferredTokenSigningKeyEndDateTime" Type="Edm.DateTime"/>
	<ns0:Property Name="preferredTokenSigningKeyThumbprint" Type="Edm.String"/>
	<ns0:Property Name="publisherName" Type="Edm.String"/>
	<ns0:Property Name="replyUrls" Type="Collection(Edm.String)" Nullable="false"/>
	<ns0:Property Name="samlMetadataUrl" Type="Edm.String"/>
	<ns0:Property Name="samlSingleSignOnSettings" Type="Microsoft.DirectoryServices.SamlSingleSignOnSettings"/>
	<ns0:Property Name="servicePrincipalNames" Type="Collection(Edm.String)" Nullable="false"/>
	<ns0:Property Name="servicePrincipalType" Type="Edm.String"/>
	<ns0:Property Name="signInAudience" Type="Edm.String"/>
	<ns0:Property Name="tags" Type="Collection(Edm.String)" Nullable="false"/>
	<ns0:Property Name="tokenEncryptionKeyId" Type="Edm.Guid"/>
	<ns0:NavigationProperty Name="appRoleAssignedTo" Relationship="Microsoft.DirectoryServices.Microsoft_DirectoryServices_ServicePrincipal_appRoleAssignedTo_Microsoft_DirectoryServices_AppRoleAssignment_appRoleAssignedToPartner" ToRole="appRoleAssignedTo" FromRole="appRoleAssignedToPartner" ContainsTarget="true"/>
	<ns0:NavigationProperty Name="appRoleAssignments" Relationship="Microsoft.DirectoryServices.Microsoft_DirectoryServices_ServicePrincipal_appRoleAssignments_Microsoft_DirectoryServices_AppRoleAssignment_appRoleAssignmentsPartner" ToRole="appRoleAssignments" FromRole="appRoleAssignmentsPartner" ContainsTarget="true"/>
	<ns0:NavigationProperty Name="oauth2PermissionGrants" Relationship="Microsoft.DirectoryServices.Microsoft_DirectoryServices_ServicePrincipal_oauth2PermissionGrants_Microsoft_DirectoryServices_OAuth2PermissionGrant_oauth2PermissionGrantsPartner" ToRole="oauth2PermissionGrants" FromRole="oauth2PermissionGrantsPartner" ContainsTarget="true"/>
	<ns0:NavigationProperty Name="policies" Relationship="Microsoft.DirectoryServices.Microsoft_DirectoryServices_ServicePrincipal_policies_Microsoft_DirectoryServices_DirectoryObject_policiesPartner" ToRole="policies" FromRole="policiesPartner"/>
	<ns0:NavigationProperty Name="serviceEndpoints" Relationship="Microsoft.DirectoryServices.Microsoft_DirectoryServices_ServicePrincipal_serviceEndpoints_Microsoft_DirectoryServices_ServiceEndpoint_serviceEndpointsPartner" ToRole="serviceEndpoints" FromRole="serviceEndpointsPartner" ContainsTarget="true"/>
</ns0:EntityType>

Anatomy of a Service Principal#

An EntityType element defines the service principal object in the Azure AD data model. Entities have a primary Key and one or more named Property elements, which represent information of a specific data type. These properties are the public attributes of Azure AD objects.

For investigators, each entity and property could be a piece of evidence. Therefore, it is helpful to understand these data structures and how threat activity appears in them.

The element <Property Name="appId" Type="Edm.String"/> indicates that service principals have an appId attribute, the value of which is stored as a string. Likewise, the element <Property Name="preferredTokenSigningKeyEndDateTime" Type="Edm.DateTime"/> indicates the preferredTokenSigningKeyEndDateTime attribute on a service principal is a timestamp. Properties may also be a Collection, which means the property contains an array of values.

Named Properties

<EntityType xmlns="http://schemas.microsoft.com/ado/2009/11/edm" Name="ServicePrincipal" BaseType="Microsoft.DirectoryServices.DirectoryObject" OpenType="true">
	<Property Name="accountEnabled" Type="Edm.Boolean"/>
	<Property Name="addIns" Type="Collection(Microsoft.DirectoryServices.AddIn)" Nullable="false"/>
	<Property Name="alternativeNames" Type="Collection(Edm.String)" Nullable="false"/>
	<Property Name="appDisplayName" Type="Edm.String"/>
	<Property Name="appId" Type="Edm.String"/>
    <!-- ...snip... -->
	<Property Name="passwordCredentials" Type="Collection(Microsoft.DirectoryServices.PasswordCredential)" Nullable="false"/>
	<Property Name="preferredSingleSignOnMode" Type="Edm.String"/>
	<Property Name="preferredTokenSigningKeyEndDateTime" Type="Edm.DateTime"/>
    <!-- ...snip... -->
</EntityType>

All OData entities contain properties. Properties have names and types.

Entities can inherit properties from other entities as their BaseType. For the legacy Graph API, service principals are derived from the base type Microsoft.DirectoryServices.DirectoryObject and implement all of its properties. Importantly, service principals inherit their primary key property - named objectId - from this base type.

Inheritance

<EntityType xmlns="http://schemas.microsoft.com/ado/2009/11/edm" Name="DirectoryObject" OpenType="true">
	<Key>
		<PropertyRef Name="objectId"/>
	</Key>
	<Property Name="objectType" Type="Edm.String"/>
	<Property Name="objectId" Type="Edm.String" Nullable="false"/>
	<Property Name="deletionTimestamp" Type="Edm.DateTime"/>
    <!-- ... snip ... -->
</EntityType>

Some ServicePrincipal properties are derived from the DirectoryObject base type. Notice the deletionTimestamp property, which may have forensic significance.

Furthermore, properties can be nested using ComplexType. The passwordCredentials property on a service principal is a collection of Microsoft.DirectoryServices.PasswordCredential objects, which are complex types. If we run the XPath query .//{*}ComplexType[@Name="PasswordCredential"], we can see that the complex type PasswordCredential contains multiple properties. And once again, note that some of these properties contain timestamps.

Nested Properties

<ComplexType xmlns="http://schemas.microsoft.com/ado/2009/11/edm" Name="PasswordCredential">
	<Property Name="customKeyIdentifier" Type="Edm.Binary"/>
	<Property Name="endDate" Type="Edm.DateTime"/>
	<Property Name="keyId" Type="Edm.Guid"/>
	<Property Name="startDate" Type="Edm.DateTime"/>
	<Property Name="value" Type="Edm.String"/>
</ComplexType>

If an attacker performs technique T1098.001 - Account Manipulation: Additional Cloud Credentials to add a password credential to a service principal, it will appear in the passwordCredentials property of a ServicePrincipal object.

Lastly, NavigationProperty elements represent a relationship between an entities. For example, the appRoleAssignments navigation property will be populated when an application permission is granted to a service principal. The primary key for the service principal appears as the foreign key principalId in an AppRoleAssignment entity:

Relational Fields

<!-- ...snip... -->
<NavigationProperty Name="appRoleAssignments" Relationship="Microsoft.DirectoryServices.Microsoft_DirectoryServices_ServicePrincipal_appRoleAssignments_Microsoft_DirectoryServices_AppRoleAssignment_appRoleAssignmentsPartner" ToRole="appRoleAssignments" FromRole="appRoleAssignmentsPartner" ContainsTarget="true"/>
<!-- ...snip... -->
<EntityType xmlns="http://schemas.microsoft.com/ado/2009/11/edm" Name="AppRoleAssignment" BaseType="Microsoft.DirectoryServices.DirectoryObject" OpenType="true">
	<Property Name="creationTimestamp" Type="Edm.DateTime"/>
	<Property Name="id" Type="Edm.Guid" Nullable="false"/>
	<Property Name="principalDisplayName" Type="Edm.String"/>
	<Property Name="principalId" Type="Edm.Guid"/> <!-- servicePrincipal.objectId -->
	<Property Name="principalType" Type="Edm.String"/>
	<Property Name="resourceDisplayName" Type="Edm.String"/>
	<Property Name="resourceId" Type="Edm.Guid"/>
</EntityType>

The primary key for a ServicePrincipal is its inheirited objectId property. This relates to the principalId property on an AppRoleAssignment entity, if it has consent for an application permission.

Compare with Graph API Response#

When we call a Graph API endpoint, we can see that the response conforms to the data structures described in the OData metadata. For example, let’s query for information about service principals in a given Azure AD tenant:

# If not already logged in
az login --use-device-code

# Legacy Graph query
az rest --url "https://graph.windows.net/bf016c83-cab4-4919-98b0-ab45e3acd4b5/servicePrincipals?api-version=1.6" --query "value[0]"

# Modern Graph equivalent
#!az rest --url "https://graph.microsoft.com/beta/bf016c83-cab4-4919-98b0-ab45e3acd4b5/servicePrincipals" --query "value[0]"
{
  "accountEnabled": true,
  "addIns": [],
  "alternativeNames": [],
  "appDisplayName": "O365Account",
  "appId": "1cda9b54-9852-4a5a-96d4-c2ab174f9edf",
  "appOwnerTenantId": "f8cdef31-a31e-4b4a-93e4-5f571e91255a",
  "appRoleAssignmentRequired": false,
  "appRoles": [],
  "applicationTemplateId": null,
  "deletionTimestamp": null,
  "displayName": "O365Account",
  "errorUrl": null,
  "homepage": null,
  "informationalUrls": {
    "marketing": null,
    "privacy": null,
    "support": null,
    "termsOfService": null
  },
  "keyCredentials": [],
  "logoutUrl": null,
  "notificationEmailAddresses": [],
  "oauth2Permissions": [],
  "objectId": "b2ee120f-7070-434d-821a-163b8ee63c57",
  "objectType": "ServicePrincipal",
  "odata.type": "Microsoft.DirectoryServices.ServicePrincipal",
  "passwordCredentials": [
    {
      "customKeyIdentifier": null,
      "endDate": "2022-10-21T20:40:45.6027957Z",
      "keyId": "d8824efe-0654-444b-a163-c16d3049380b",
      "startDate": "2021-10-21T20:40:45.6027957Z",
      "value": null
    },
    {
      "customKeyIdentifier": null,
      "endDate": "2022-10-20T14:40:06.1955755Z",
      "keyId": "9f5a00c7-664e-43f7-9f8b-bffed48d9690",
      "startDate": "2021-10-20T14:40:06.1955755Z",
      "value": null
    },
    {
      "customKeyIdentifier": null,
      "endDate": "2022-10-20T14:32:29.8531814Z",
      "keyId": "cdd04e28-7681-4022-8902-236243c0a085",
      "startDate": "2021-10-20T14:32:29.8531814Z",
      "value": null
    }
  ],
  "preferredSingleSignOnMode": null,
  "preferredTokenSigningKeyEndDateTime": null,
  "preferredTokenSigningKeyThumbprint": null,
  "publisherName": "Microsoft Services",
  "replyUrls": [],
  "samlMetadataUrl": null,
  "samlSingleSignOnSettings": null,
  "servicePrincipalNames": [
    "1cda9b54-9852-4a5a-96d4-c2ab174f9edf"
  ],
  "servicePrincipalType": "Application",
  "signInAudience": "AzureADMultipleOrgs",
  "tags": [],
  "tokenEncryptionKeyId": null
}

Notice how the keys in the JSON response align with the Property elements in the OData metadata.

This section focused on service principals, but again these concepts apply to all Azure AD objects. While the OData metadata for the Graph APIs provides a rich description of Azure AD objects and their attributes, investigators still need a way to collect the potential evidence.


ROADtools#

ROADtools is an open source Azure AD analysis framework.

ROADtools exports directory information from an Azure AD tenant using the legacy Azure AD Graph API and stores the results in a SQL database for offline analysis. ROADtools provides a web client to conveniently explore the exported Azure AD data, but you can also interact directly with its SQLite database using your preferred tools.

I have used ROADtools extensively for incident response and research purposes. It is an excellent way to take a “snapshot” of a potentially compromise Azure AD tenant for subsequent forensic analysis.

ROADtools uses the Azure AD Graph OData metadata discussed in the previous section to dynamically generate a SQL database schema. OData entities are mapped to SQL tables with their properties as columns and navigation properties define the relationships between tables. ROADtools queries the legacy Azure AD Graph and stores the results as rows in this database. By default, ROADtools will create a single SQLite database file named roadrecon.db.

Getting Started#

Let’s walk through how to use ROADtools to dump data from an Azure AD tenant. Firstly, we need to install ROADtools, if we have not already.

pip install roadtools sqlite-utils

Next, we need to authenticate to Azure AD so we can call the Graph API. Here we are using a device code flow, but ROADtools supports other forms of authentication.

roadrecon auth --device-code

Now we are ready to dump data from Azure AD. We use the roadrecon gather command to make concurrent requests to the legacy Azure AD Graph API. The responses populate the appropriate tables in the sqlite dataqbase.

roadrecon gather

Once it has finished gathering, we should have a roadrecon.db file in our current working directory.

ls -al | grep roadrecon.db
-rw-r--r--  1 vscode vscode 2797568 Jul  5 21:31 roadrecon.db

Explore roadrecon.db#

Now that ROADtools has exported data from our Azure AD tenant, we can analyze the roadrecon.db database using whatever tools make us the most comfortable. If this is your first time using ROADtools, I recommend browsing the web interface to get oriented.

I am more interested in automated ways of analyzing the Azure AD data in roadrecon.db, so here are a few programmatic approaches that ultimately provide similar outcomes:

Using Native SQL#

Since roadrecon.db is just a SQLite database file, we can use an off-the-shelf SQLite browser such as DB Browser for SQLite for interactive use. But I will show how to use the incredible sqlite-utils library to poke at the database instead.

Firstly, we can dump the schema from the ServicePrincipals table and compare it with the OData metadata we covered earlier. Notice how the columns in the table map to the properties in the OData entity, but the types are slightly different. It is common for SQL databases to represent nested data (such as an OData ComplexType) as a TEXT or JSON column. That is the case for the passwordCredentials column below.

sqlite-utils schema roadrecon.db ServicePrincipals
CREATE TABLE "ServicePrincipals" (
	"objectType" TEXT, 
	"objectId" TEXT NOT NULL, 
	"deletionTimestamp" DATETIME, 
	"accountEnabled" BOOLEAN, 
	"addIns" TEXT, 
	"alternativeNames" TEXT, 
	"appBranding" TEXT, 
	"appCategory" TEXT, 
	"appData" TEXT, 
	"appDisplayName" TEXT, 
	"appId" TEXT, 
	"applicationTemplateId" TEXT, 
	"appMetadata" TEXT, 
	"appOwnerTenantId" TEXT, 
	"appRoleAssignmentRequired" BOOLEAN, 
	"appRoles" TEXT, 
	"authenticationPolicy" TEXT, 
	"disabledByMicrosoftStatus" TEXT, 
	"displayName" TEXT, 
	"errorUrl" TEXT, 
	homepage TEXT, 
	"informationalUrls" TEXT, 
	"keyCredentials" TEXT, 
	"logoutUrl" TEXT, 
	"managedIdentityResourceId" TEXT, 
	"microsoftFirstParty" BOOLEAN, 
	"notificationEmailAddresses" TEXT, 
	"oauth2Permissions" TEXT, 
	"passwordCredentials" TEXT, 
	"preferredSingleSignOnMode" TEXT, 
	"preferredTokenSigningKeyEndDateTime" DATETIME, 
	"preferredTokenSigningKeyThumbprint" TEXT, 
	"publisherName" TEXT, 
	"replyUrls" TEXT, 
	"samlMetadataUrl" TEXT, 
	"samlSingleSignOnSettings" TEXT, 
	"servicePrincipalNames" TEXT, 
	tags TEXT, 
	"tokenEncryptionKeyId" TEXT, 
	"servicePrincipalType" TEXT, 
	"useCustomTokenSigningKey" BOOLEAN, 
	"verifiedPublisher" TEXT, 
	PRIMARY KEY ("objectId"), 
	CHECK ("accountEnabled" IN (0, 1)), 
	CHECK ("appRoleAssignmentRequired" IN (0, 1)), 
	CHECK ("microsoftFirstParty" IN (0, 1)), 
	CHECK ("useCustomTokenSigningKey" IN (0, 1))
)

Obviously, we can use SQL to query the database. Here we are just querying for a single row from service principal table.

sqlite-utils roadrecon.db "SELECT * FROM ServicePrincipals LIMIT 1;" | python3 -mjson.tool
[
    {
        "objectType": "ServicePrincipal",
        "objectId": "b2ee120f-7070-434d-821a-163b8ee63c57",
        "deletionTimestamp": null,
        "accountEnabled": 1,
        "addIns": "[]",
        "alternativeNames": "[]",
        "appBranding": null,
        "appCategory": null,
        "appData": null,
        "appDisplayName": "O365Account",
        "appId": "1cda9b54-9852-4a5a-96d4-c2ab174f9edf",
        "applicationTemplateId": null,
        "appMetadata": null,
        "appOwnerTenantId": "f8cdef31-a31e-4b4a-93e4-5f571e91255a",
        "appRoleAssignmentRequired": 0,
        "appRoles": "[]",
        "authenticationPolicy": null,
        "disabledByMicrosoftStatus": null,
        "displayName": "O365Account",
        "errorUrl": null,
        "homepage": null,
        "informationalUrls": "{\"termsOfService\": null, \"support\": null, \"privacy\": null, \"marketing\": null}",
        "keyCredentials": "[]",
        "logoutUrl": null,
        "managedIdentityResourceId": null,
        "microsoftFirstParty": 1,
        "notificationEmailAddresses": "[]",
        "oauth2Permissions": "[]",
        "passwordCredentials": "[{\"customKeyIdentifier\": null, \"endDate\": \"2022-10-21T20:40:45.6027957Z\", \"keyId\": \"d8824efe-0654-444b-a163-c16d3049380b\", \"startDate\": \"2021-10-21T20:40:45.6027957Z\", \"value\": null}, {\"customKeyIdentifier\": null, \"endDate\": \"2022-10-20T14:40:06.1955755Z\", \"keyId\": \"9f5a00c7-664e-43f7-9f8b-bffed48d9690\", \"startDate\": \"2021-10-20T14:40:06.1955755Z\", \"value\": null}, {\"customKeyIdentifier\": null, \"endDate\": \"2022-10-20T14:32:29.8531814Z\", \"keyId\": \"cdd04e28-7681-4022-8902-236243c0a085\", \"startDate\": \"2021-10-20T14:32:29.8531814Z\", \"value\": null}]",
        "preferredSingleSignOnMode": null,
        "preferredTokenSigningKeyEndDateTime": null,
        "preferredTokenSigningKeyThumbprint": null,
        "publisherName": "Microsoft Services",
        "replyUrls": "[]",
        "samlMetadataUrl": null,
        "samlSingleSignOnSettings": null,
        "servicePrincipalNames": "[\"1cda9b54-9852-4a5a-96d4-c2ab174f9edf\"]",
        "tags": "[]",
        "tokenEncryptionKeyId": null,
        "servicePrincipalType": "Application",
        "useCustomTokenSigningKey": null,
        "verifiedPublisher": "{\"displayName\": null, \"verifiedPublisherId\": null, \"addedDateTime\": null}"
    }
]

The native SQL approach to working with roadrecon.db really shines if you are very comfortable writing expressive SQL queries. SQLite also supports attaching multiple databases in a single session. This lets you easily “federate” a SQL query over multiple roadrecon.db databases, perhaps taken from different Azure AD tenants or taken from a different time period.

Using sqlalchemy ORM#

We can also query the SQLite database using the sqlalchemy Object Relational Mapping (ORM) that ROADtools created for us. ORMs include Python classes that correspond to tables in the database. This provides auto-completion support, better serialization/deserialization, and other quality-of-life improvements when interacting with the Azure AD data in an IDE. The backend for the ROADtools web interface primarily uses the ORM to interact with data in the roadrecon.db.

Just as we parsed the XML OData metadata and dumped the SQLite schema using sqlite-utils, we can do the same using the sqlalchemy ORM to list available tables and their respective columns.

import sqlalchemy
import roadtools.roadlib.metadef.database as database

# Defaults to `roadrecon.db`
engine = database.init()

# Use reflection to collect
# metadata related to SQL tables
# and their columns.
metadata = sqlalchemy.MetaData(bind=engine, schema="main")
metadata.reflect()

for table in metadata.sorted_tables:
    if table.name == "ServicePrincipals":
        print(f"\nTable: {table.name}")
        for column in table.columns:
            print(f'\t|_ {column.name} ({column.type})')
Table: ServicePrincipals
	|_ objectType (TEXT)
	|_ objectId (TEXT)
	|_ deletionTimestamp (DATETIME)
	|_ accountEnabled (BOOLEAN)
	|_ addIns (TEXT)
	|_ alternativeNames (TEXT)
	|_ appBranding (TEXT)
	|_ appCategory (TEXT)
	|_ appData (TEXT)
	|_ appDisplayName (TEXT)
	|_ appId (TEXT)
	|_ applicationTemplateId (TEXT)
	|_ appMetadata (TEXT)
	|_ appOwnerTenantId (TEXT)
	|_ appRoleAssignmentRequired (BOOLEAN)
	|_ appRoles (TEXT)
	|_ authenticationPolicy (TEXT)
	|_ disabledByMicrosoftStatus (TEXT)
	|_ displayName (TEXT)
	|_ errorUrl (TEXT)
	|_ homepage (TEXT)
	|_ informationalUrls (TEXT)
	|_ keyCredentials (TEXT)
	|_ logoutUrl (TEXT)
	|_ managedIdentityResourceId (TEXT)
	|_ microsoftFirstParty (BOOLEAN)
	|_ notificationEmailAddresses (TEXT)
	|_ oauth2Permissions (TEXT)
	|_ passwordCredentials (TEXT)
	|_ preferredSingleSignOnMode (TEXT)
	|_ preferredTokenSigningKeyEndDateTime (DATETIME)
	|_ preferredTokenSigningKeyThumbprint (TEXT)
	|_ publisherName (TEXT)
	|_ replyUrls (TEXT)
	|_ samlMetadataUrl (TEXT)
	|_ samlSingleSignOnSettings (TEXT)
	|_ servicePrincipalNames (TEXT)
	|_ tags (TEXT)
	|_ tokenEncryptionKeyId (TEXT)
	|_ servicePrincipalType (TEXT)
	|_ useCustomTokenSigningKey (BOOLEAN)
	|_ verifiedPublisher (TEXT)

Here is how to get the first row from the ServicePrincipals table using the ORM.

from roadtools.roadlib.metadef.database import ServicePrincipal

# Connect to the database
session = database.get_session(engine)

# Use ORM to query database
session.query(ServicePrincipal).first().as_dict()
{'objectType': 'ServicePrincipal',
 'objectId': 'b2ee120f-7070-434d-821a-163b8ee63c57',
 'deletionTimestamp': None,
 'accountEnabled': True,
 'addIns': [],
 'alternativeNames': [],
 'appBranding': None,
 'appCategory': None,
 'appData': None,
 'appDisplayName': 'O365Account',
 'appId': '1cda9b54-9852-4a5a-96d4-c2ab174f9edf',
 'applicationTemplateId': None,
 'appMetadata': None,
 'appOwnerTenantId': 'f8cdef31-a31e-4b4a-93e4-5f571e91255a',
 'appRoleAssignmentRequired': False,
 'appRoles': [],
 'authenticationPolicy': None,
 'disabledByMicrosoftStatus': None,
 'displayName': 'O365Account',
 'errorUrl': None,
 'homepage': None,
 'informationalUrls': {'termsOfService': None,
  'support': None,
  'privacy': None,
  'marketing': None},
 'keyCredentials': [],
 'logoutUrl': None,
 'managedIdentityResourceId': None,
 'microsoftFirstParty': True,
 'notificationEmailAddresses': [],
 'oauth2Permissions': [],
 'passwordCredentials': [{'customKeyIdentifier': None,
   'endDate': '2022-10-21T20:40:45.6027957Z',
   'keyId': 'd8824efe-0654-444b-a163-c16d3049380b',
   'startDate': '2021-10-21T20:40:45.6027957Z',
   'value': None},
  {'customKeyIdentifier': None,
   'endDate': '2022-10-20T14:40:06.1955755Z',
   'keyId': '9f5a00c7-664e-43f7-9f8b-bffed48d9690',
   'startDate': '2021-10-20T14:40:06.1955755Z',
   'value': None},
  {'customKeyIdentifier': None,
   'endDate': '2022-10-20T14:32:29.8531814Z',
   'keyId': 'cdd04e28-7681-4022-8902-236243c0a085',
   'startDate': '2021-10-20T14:32:29.8531814Z',
   'value': None}],
 'preferredSingleSignOnMode': None,
 'preferredTokenSigningKeyEndDateTime': None,
 'preferredTokenSigningKeyThumbprint': None,
 'publisherName': 'Microsoft Services',
 'replyUrls': [],
 'samlMetadataUrl': None,
 'samlSingleSignOnSettings': None,
 'servicePrincipalNames': ['1cda9b54-9852-4a5a-96d4-c2ab174f9edf'],
 'tags': [],
 'tokenEncryptionKeyId': None,
 'servicePrincipalType': 'Application',
 'useCustomTokenSigningKey': None,
 'verifiedPublisher': {'displayName': None,
  'verifiedPublisherId': None,
  'addedDateTime': None}}

I recommend the ORM approach if you aren’t comfortable writing SQL queries by hand. There are also a ton of resources out there about the sqlalchemy ORM, so it is well-documented and supported.

Using pandas DataFrames#

You can do a lot with native SQL queries and the sqlalchemy ORM, but I am most comfortable doing data analysis in Python with the pandas module. pandas DataFrames are tabular data structures containing Python objects. The pandas ecosystem is wildly popular and well-documented, so it provides additional functionality when munging and enriching the Azure AD data collected by ROADtools.

Here’s a useful snippet to dump all the tables as DataFrames:

import pandas as pd

def table_to_dataframe(
    session: sqlalchemy.orm.session.Session, 
    table: sqlalchemy.sql.schema.Table
) -> pd.DataFrame:
    """Takes a SQLAlchemy Table and returns all the rows
    in that table as a `pd.DataFrame`.
    """

    print(f"Converting table {table.name} to pd.DataFrame")

    rows = session.query(table).all()
    if rows:
        return pd.json_normalize([row._asdict() for row in rows])
    else:
        return pd.DataFrame()


tables = {
    table.name: table_to_dataframe(session, table) 
    for table in metadata.sorted_tables
}
Converting table AppRoleAssignments to pd.DataFrame
Converting table ApplicationRefs to pd.DataFrame
Converting table Applications to pd.DataFrame
Converting table AuthorizationPolicys to pd.DataFrame
Converting table Contacts to pd.DataFrame
Converting table Devices to pd.DataFrame
Converting table DirectoryRoles to pd.DataFrame
Converting table DirectorySettings to pd.DataFrame
Converting table ExtensionPropertys to pd.DataFrame
Converting table Groups to pd.DataFrame
Converting table OAuth2PermissionGrants to pd.DataFrame
Converting table Policys to pd.DataFrame
Converting table RoleAssignments to pd.DataFrame
Converting table RoleDefinitions to pd.DataFrame
Converting table ServicePrincipals to pd.DataFrame
Converting table TenantDetails to pd.DataFrame
Converting table Users to pd.DataFrame
Converting table lnk_application_owner_serviceprincipal to pd.DataFrame
Converting table lnk_application_owner_user to pd.DataFrame
Converting table lnk_device_owner to pd.DataFrame
Converting table lnk_group_member_contact to pd.DataFrame
Converting table lnk_group_member_device to pd.DataFrame
Converting table lnk_group_member_group to pd.DataFrame
Converting table lnk_group_member_serviceprincipal to pd.DataFrame
Converting table lnk_group_member_user to pd.DataFrame
Converting table lnk_role_member_group to pd.DataFrame
Converting table lnk_role_member_serviceprincipal to pd.DataFrame
Converting table lnk_role_member_user to pd.DataFrame
Converting table lnk_serviceprincipal_owner_serviceprincipal to pd.DataFrame
Converting table lnk_serviceprincipal_owner_user to pd.DataFrame

You can then access each DataFrame using the table name as the key:

service_principals_df = tables["ServicePrincipals"]

# The `.T` method flips the DataFrame axes.
# This is just for legibility.
service_principals_df.head(1).T
0
objectType ServicePrincipal
objectId b2ee120f-7070-434d-821a-163b8ee63c57
deletionTimestamp None
accountEnabled True
addIns []
alternativeNames []
appBranding None
appCategory None
appData None
appDisplayName O365Account
appId 1cda9b54-9852-4a5a-96d4-c2ab174f9edf
applicationTemplateId None
appMetadata None
appOwnerTenantId f8cdef31-a31e-4b4a-93e4-5f571e91255a
appRoleAssignmentRequired False
appRoles []
authenticationPolicy None
disabledByMicrosoftStatus None
displayName O365Account
errorUrl None
homepage None
informationalUrls {"termsOfService": null, "support": null, "pri...
keyCredentials []
logoutUrl None
managedIdentityResourceId None
microsoftFirstParty True
notificationEmailAddresses []
oauth2Permissions []
passwordCredentials [{"customKeyIdentifier": null, "endDate": "202...
preferredSingleSignOnMode None
preferredTokenSigningKeyEndDateTime None
preferredTokenSigningKeyThumbprint None
publisherName Microsoft Services
replyUrls []
samlMetadataUrl None
samlSingleSignOnSettings None
servicePrincipalNames ["1cda9b54-9852-4a5a-96d4-c2ab174f9edf"]
tags []
tokenEncryptionKeyId None
servicePrincipalType Application
useCustomTokenSigningKey None
verifiedPublisher {"displayName": null, "verifiedPublisherId": n...

I recommend the pandas approach if you are 1) more comfortable with the pandas API than SQL queries, 2) doing significant transformations on the original Azure AD object, or 3) integrating the Azure AD data with other datasets (especially for machine learning purposes).

In sum, pandas offers a deep “bag of tricks” to analyze the data collected by ROADtools. In the following section, I will show one of the neat ways we can use pandas and ROADtools to make a useful forensics tool.


Timeline Analysis with ROAD2timeline#

Tip

Check out my pull request for a new ROADtools plugin called ROAD2timeline, which generates a forensic timeline of events (inspired by tools like log2timeline/plaso) based on timestamps found in Azure AD objects.

You may have noticed that there are a lot of columns containing timestamps in roadrecon.db . We can identify these columns by reading the schema from the database:

for table in metadata.sorted_tables:
    timestamp_columns = []

    for column in table.columns:
        if type(column.type) == sqlalchemy.sql.sqltypes.DATETIME:
            timestamp_columns.append(column)

    if timestamp_columns:
        print(f"\n{table.name}")
        for column in timestamp_columns:
            print(f"|_ {column.name}")
AppRoleAssignments
|_ deletionTimestamp
|_ creationTimestamp

Applications
|_ deletionTimestamp

Contacts
|_ deletionTimestamp
|_ lastDirSyncTime

Devices
|_ deletionTimestamp
|_ approximateLastLogonTimestamp
|_ complianceExpiryTime
|_ lastDirSyncTime

DirectoryRoles
|_ deletionTimestamp

ExtensionPropertys
|_ deletionTimestamp

Groups
|_ deletionTimestamp
|_ createdDateTime
|_ expirationDateTime
|_ lastDirSyncTime
|_ renewedDateTime

OAuth2PermissionGrants
|_ expiryTime
|_ startTime

Policys
|_ deletionTimestamp

RoleDefinitions
|_ deletionTimestamp

ServicePrincipals
|_ deletionTimestamp
|_ preferredTokenSigningKeyEndDateTime

TenantDetails
|_ deletionTimestamp
|_ companyLastDirSyncTime
|_ createdDateTime

Users
|_ deletionTimestamp
|_ acceptedOn
|_ createdDateTime
|_ employeeHireDate
|_ invitedOn
|_ lastDirSyncTime
|_ lastPasswordChangeDateTime
|_ onPremisesPasswordChangeTimestamp
|_ refreshTokensValidFromDateTime
|_ userStateChangedOn

Wouldn’t it be nice if we had a way to generate a timeline from all these timestamps, similar to how log2timeline/plaso works for deadbox forensics?

I wrote a ROADtools plugin to generate these timelines. For each column containing a timestamp, the plugin makes a copy the relevant table as a pd.DataFrame using that timestamp column as a datetime index. Then the plugin concatenates all the DataFrames together and sorts by a unified datetime index. The resulting timeline contains all of the original columns from each table, plus a few additional columns prefixed with an underscore.

Users can provide message templates to populate friendly timeline entries that are easy to read. For example, if a service principal was granted consent for an application permission and this message template was provided:

AppRoleAssignments:
  creationTimestamp: App role for {resourceDisplayName} ({resourceId}) assigned to {principalType} named {principalDisplayName} ({principalId})

Then the message template would populate a creationTimestamp timeline entry that looks like this:

+----------------------------+--------------------+-------------------+---------------------------------------------------------------------------------+
| _timestamp                 | _table_name        | _timestamp_column | _message                                                                        |
+----------------------------+--------------------+-------------------+---------------------------------------------------------------------------------+
| 2021-10-21 20:35:26.819835 | AppRoleAssignments | creationTimestamp | App role for Microsoft Graph (a643a498-b308-48f5-b35f-2fbc51c2dabe) assigned to |
|                            |                    |                   | ServicePrincipal named RC-TDR - Azure AD Audit                                  |
|                            |                    |                   | (0ddc82d7-db6d-46ae-87a6-add8db5a5eba)                                          |
+----------------------------+--------------------+-------------------+---------------------------------------------------------------------------------+

ROAD2timeline comes with a default message template for most timestamps on Azure AD objects. The timeline will generate even if a message template is not available for a given Azure AD object and timestamp column. Please check out the PR for more information.

Generate Azure AD Timeline#

Warning

ROAD2timeline requires additional Python dependencies that are not included in a default installation of ROADtools. Ensure that pandas is installed in your Python environment prior to running ROAD2timeline. The pyarrow module is also required if you want to save the timeline as a parquet file.

You can use ROAD2timeline like any other ROADtools plugin. Check out the --help flag for command-line options.

roadrecon plugin road2timeline --help
usage: roadrecon plugin road2timeline [-h] [-d DATABASE] [-t TEMPLATE_FILE] [-f OUTPUT_FILE]

Timeline analysis of Azure AD objects

optional arguments:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Database file. Can be the local database name for SQLite, or an SQLAlchemy compatible URL such as postgresql+psycopg2://dirkjan@/roadtools
  -t TEMPLATE_FILE, --template-file TEMPLATE_FILE
                        File containing string templates to translate Azure AD objects into a timeline entry. Defaults to `road2timeline.yaml` in the current working directory.
  -f OUTPUT_FILE, --output-file OUTPUT_FILE
                        File to save timeline outputs. Output format determined by file extension. Supported extensions include: [csv, pickle, jsonl, parquet]

Here is how you can run the plugin to generate a timeline of Azure AD objects. This will generate a timeline in parquet format. Supported output formats include jsonl, csv, and pickle.

roadrecon plugin road2timeline -d roadrecon.db -f timeline.parquet
Timeline entry template file road2timeline.yaml not found, defaulting to built-in templates
Loading timeline entry templates from /workspaces/detect-dot-dev/temp/ROADtools/roadrecon/roadtools/roadrecon/plugins/road2timeline.yaml
Timeline saved to file /workspaces/detect-dot-dev/blog/blog/posts/timeline.parquet

Analyze Azure AD Timeline#

Here is how I would typically review the timeline generated by ROAD2timeline.

I like to use pd.Grouper on DataFrames that have a datetime index. Instead of returning one row for each timeline entry, pd.Grouper will aggregate multiple timeline entries together based on whatever frequency you pass as the freq argument, which takes a date offset string.

In the example below, freq="15T" means that the DataFrame will bucket together all timeline entries that happen within the same 15 minute period. This has the nice advantage of reducing the overall number of rows in the timeline and makes clusters of related activity stand out. Play around with different frequencies and combinations of columns to groupby.

timeline_df = pd.read_parquet("timeline.parquet")
timeline_df.groupby([
    pd.Grouper(freq="15T"), # Buckets in 15 minute intervals, optional
    "_table_name",
    "_timestamp_column",
    "_message",
])._object_id.nunique().to_frame().head(5) # Truncating for legibility
_object_id
_timestamp _table_name _timestamp_column _message
2021-10-12 16:45:00 TenantDetails createdDateTime No template found for TenantDetails.createdDateTime: Object ID - bf016c83-cab4-4919-98b0-ab45e3acd4b5 1
Users createdDateTime User willem@byrgenwerth.onmicrosoft.com was created in the directory (Cloud SID: S-1-12-1-3551930148-1137481302-3611740840-4497412993) 1
invitedOn User willem@byrgenwerth.onmicrosoft.com was invited to the directory (Cloud SID: S-1-12-1-3551930148-1137481302-3611740840-4497412993) 1
lastPasswordChangeDateTime Cloud password was changed for user willem@byrgenwerth.onmicrosoft.com (Cloud SID: S-1-12-1-3551930148-1137481302-3611740840-4497412993) 1
2021-10-12 18:15:00 Groups createdDateTime Group All Company was created by application 00000005-0000-0ff1-ce00-000000000000 ( Group Object ID: bbc58eb4-ab52-4dd4-a709-136a72d1f040, SID: None) 1

Future Efforts#

ROADtools is tremendously useful for offensive, defensive, and research purposes. The information security community should rally behind projects like ROADtools that make it easier to explore and understand the inner-workings of the Microsoft cloud.

The biggest shortcoming with ROADtools is its reliance on the legacy Azure AD Graph API, which Microsoft plans to depreciate. Furthermore, the modern Microsoft Graph API (especially the beta version) includes a treasure-trove of additional entities beyond Azure AD that are laden with security value. I believe the basic premise of ROADtools can be applied to the modern Microsoft Graph as well.

I have been actively researching several approaches that leverage the OpenAPI description of the Microsoft Graph, rather than the OData metadata. The evolution of ROADtools will likely involve projects like pydantic, datamodel-code-generator, and sqlmodel to generate Python and SQL bindings for the (comparatively) massive number of objects on the modern Graph. I look forward to sharing some of my research once I have found the most elegant solution.

Conclusion#

In this post, we covered:

  • How to inspect Graph API metadata and described how these relate to objects in Azure AD

  • How ROADtools uses the Graph metadata to represent an Azure AD tenant as the roadrecon.db SQLite database

  • Several approaches to analyzing the Azure AD data in roadrecon.db:

    • Using native SQL features and tools (such as sqlite-utils)

    • Using the sqlalchemy ORM that ROADtools provides

    • Using the pandas data analysis library

  • A proposed ROADtools plugin called ROAD2timeline, which uses pandas to read from roadrecon.db and generates a forensic timeline based on the timestamps found in Azure AD objects

If you made it this far, thank you so much for reading!