Remote entities in Drupal 7

Thursday, October 25, 2012 - 08:00

So, you want to integrate an external source of data into your Drupal site, and you want to do so using the Entity API? Well, we did, and here are the results of our investigation.

We were approached by the Danish company KTP Data regarding a project where they needed some help integrating an external data source into a Drupal site. In short, they wanted to load external content as Drupal remote entities, while storing local field data on these entities. The actual requirements were the following:

  • Entities are always read live from a web service; no entity data is stored in the Drupal database.
  • Entities are fieldable, meaning that the basic entity properties can be complemented with any type of fields available from the Field API, from simple text fields to image galleries and geographic coordinates.
  • Entities can be listed using views, and those listings should also load content live from the web service.

There are many examples of alternative storage for fields (like storing them in MongoDB), and there’s also a good amount of documentation on how to create custom entities that are stored in the database, but I couldn’t find any mention of someone using externally stored entities with locally stored fields. What I did recall, however, were a few discussions from the Fields in Core code sprint that took place in Boston the week before Christmas of 2008. There had been a good amount of discussion regarding the flexibility of the Field API, and it had been decided that fieldable entities (I’m not sure the term “entity” had been coined in this context yet) could be remote. So I knew at least that it could be done.

Warning

This article is not meant to be a comprehensive guide to integrating external entities in Drupal, but since I couldn’t find any documentation on the topic I wanted to at least share my findings. Please also note that such a direct integration is not always the best solution, and that importing external data into Drupal first rather than loading it on demand can often provide a lot of benefits.

What you need

  • A web service which lets you read individual records. Creating, updating and deleting records is also possible but depending on your case might not be available or relevant.
  • A web service which lets you list records. The possibility to specify individual fields or filter and sort the result set would of course influence what can be done.
  • Code to connect to and read from the service. This will be specific to the type of web service you use (REST, SOAP, XML-RPC, etc.) and the kind of data being sent. For the sake of simplicity, I’ll assume that you have the relevant code in an include file.
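As a rough idea of what such an include file might contain, here is a minimal sketch. It assumes a hypothetical REST service at https://example.com/api that returns JSON, with GET /records/{id} for a single record and GET /records for a list. remote_web_service_load() is the name used by the controller code in this article. remote_web_service_list(), remote_web_service_http_get() and the optional $fetch parameter are illustrative additions; $fetch mainly lets the code be exercised with a fake transport instead of a live service. A SOAP or XML-RPC service would need a different transport.

```php
<?php
// Hypothetical base URL of a REST service returning JSON; adjust to your service.
const REMOTE_WEB_SERVICE_URL = 'https://example.com/api';

/**
 * Default transport: performs a GET request and returns the body, or FALSE
 * on connection failures and HTTP error statuses.
 */
function remote_web_service_http_get($url) {
  // @ hides the PHP warning on failure; callers check for FALSE instead.
  return @file_get_contents($url);
}

/**
 * Loads a single record as an object, or returns FALSE on failure.
 */
function remote_web_service_load($id, $fetch = NULL) {
  $fetch = $fetch ?: 'remote_web_service_http_get';
  $body = $fetch(REMOTE_WEB_SERVICE_URL . '/records/' . rawurlencode((string) $id));
  if ($body === FALSE) {
    return FALSE;
  }
  $record = json_decode($body);
  return is_object($record) ? $record : FALSE;
}

/**
 * Lists records, optionally filtered or sorted through query parameters.
 * Returns an array of objects, which is empty on failure.
 */
function remote_web_service_list(array $params = array(), $fetch = NULL) {
  $fetch = $fetch ?: 'remote_web_service_http_get';
  $url = REMOTE_WEB_SERVICE_URL . '/records';
  if ($params) {
    $url .= '?' . http_build_query($params);
  }
  $body = $fetch($url);
  $records = ($body === FALSE) ? NULL : json_decode($body);
  return is_array($records) ? $records : array();
}
```

Which filtering and sorting parameters make sense depends entirely on what the remote service supports.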

Existing tools

Before we dive into custom code, it’s probably good to look first at what tools are available. Again, this list is not comprehensive.

The core entity API

Drupal core provides an API for defining custom entities. It also provides a default mechanism for loading entities, but no similar mechanism for creating, updating or deleting entities.

The contrib entity API

The entity module was created to address some of the things missing from the core entity API. It doesn’t define a separate API, but it extends the core API with additional properties, many helpful tools and an improved default controller. The entity module also includes an entity property API to complement the entity API, which makes it possible, for example, to have direct rules or token integration for any custom entity. Almost all contrib modules that define custom entities make use of the improved API from the entity module, and we will depend on it as well.

EntityFieldQuery and EFQ_Views

The EntityFieldQuery class is great for getting a list of entities matching certain criteria, whether they come from entity properties or fields. What’s great about EntityFieldQuery is that it works with different field storage engines. EFQ_Views is an alternative views backend using EntityFieldQuery instead of a normal SQL query.

Unfortunately for us though, EntityFieldQuery assumes that entity properties are stored in the database, so this is not a solution we can implement directly (this should be different in Drupal 8).

Entity Construction Kit

The Entity Construction Kit (ECK) provides a user interface for defining custom entities quickly and easily. However, ECK also assumes that these custom entities will have a base table in the database, so it does not help when creating remote entities.

Custom views query plugin

The views module provides an incredible amount of flexibility, including the possibility to define new data sources that can be queried using a custom views query plugin. There are many good examples, such as Search API, which makes it possible to create views from search indexes, or SPARQL Views, which makes it possible to create views by querying RDFa data. We can create such a custom views query plugin to load data from our web service.

Defining a remote entity

The starting point for defining a custom entity is to implement hook_entity_info() in your own module. This hook must return an array of entity definitions, keyed by the entity type name. The various properties in these entity definitions don’t all behave in the same way: some of them are used by the core entity API, while others are only used by the entity module’s Entity API.

<?php
/**
 * Implements hook_entity_info().
 */
function mymodule_entity_info() {
  $return = array(
    'my_remote_entity_type' => array(
      // First we define some basic information.
      'label' => t('My remote entity type'),
      'module' => 'mymodule',

      // Then we add some properties that influence how our entities are treated.
      'entity keys' => array(
        // These keys are the names of properties of entity objects.
        'id' => 'my_remote_entity_id',
        'label' => 'my_remote_entity_name',
      ),
      'fieldable' => TRUE, // We want to be able to attach fields to our entity.
      'admin ui' => array(
        'path' => 'admin/content/ktp',
        'controller class' => 'MyRemoteEntityUIController',
      ),
      'base table' => NULL, // We don't have a base table, since entities are remote.
      'bundles' => array(
        // For the sake of simplicity, we only define one bundle.
        'my_remote_entity_type' => array(
          'label' => t('My remote entity type'),
          'admin' => array(
            // Field configuration pages for our entity will live at this address.
            'path' => 'admin/config/mymodule',
          ),
        ),
      ),

      // Finally, we specify what part of our code will be acting on our
      // entities, overriding the defaults. This can be done by specifying
      // callbacks or methods on the entity controller class.
      // The controller class is located in a separate include file (listed in
      // the module's .info file so it can be autoloaded). We'll get into more
      // details later on.
      'controller class' => 'MyRemoteEntityController',
      'access callback' => 'mymodule_my_remote_entity_access',
      'uri callback' => 'mymodule_my_remote_entity_uri',
    ),
  );

  return $return;
}
?>

Note that the callbacks you indicate should be functions defined by your module, but they’re no different for remote entities than they are for any other entity so I won’t cover them here. Let’s look at our entity controller instead, since that’s where most of the important stuff happens.

As usual when defining a custom entity controller, we extend the standard one so that we only need to specify things that work differently. In our case, this concerns loading and saving entities.

<?php
class MyRemoteEntityController extends EntityAPIController {

  public function load($ids = array(), $conditions = array()) {
    $entities = array();

    // This method takes an array of IDs, but our web service only supports
    // loading one entity at a time.
    foreach ($ids as $id) {
      // This function should contain all the code to make a request to the
      // web service and handle any errors.
      if ($entity = remote_web_service_load($id)) {
        // Entities must be keyed by entity ID in order for field values to be
        // correctly attached.
        $entities[$entity->my_remote_entity_id] = $entity;
      }
    }

    // Since we override load() completely, we have to attach the locally
    // stored field values (and invoke the load hooks) ourselves.
    if ($entities) {
      $this->attachLoad($entities);
    }

    return $entities;
  }

  public function save($entity, DatabaseTransaction $transaction = NULL) {
    // There is nothing to save locally for the entity itself,
    // we just save the fields.
    field_attach_presave('my_remote_entity_type', $entity);
    field_attach_update('my_remote_entity_type', $entity);

    // If some entity properties can be modified, you would save them here.
    remote_web_service_save($entity);

    // We don't call parent::save(), because we don't have anything to save
    // locally.
  }
}
?>

A note about entity IDs

As the comment in load() hints, field values are attached to entities by entity ID, and these IDs need to be single numeric values if you want your entity to be fieldable. In some cases the remote system uses an alphanumeric hash or a compound key to identify entities. For example, many music-related services use MBIDs (MusicBrainz identifiers, which are UUIDs) to identify tracks and artists. Some other services use multiple IDs to identify a specific entity, like a combination of company ID and department ID. In such cases you need to define a map between these external IDs and an internal numeric ID.
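One way to build such a map is sketched below. MyRemoteEntityIdMap is a hypothetical class. It keeps the map in memory only so the example is self-contained. A real module would persist it, for example in its own database table, so that the same external ID always gets the same numeric ID, since locally stored field values are keyed by that number. The external ID is passed as an array of parts, so a single MBID and a compound company/department key are handled the same way.

```php
<?php
/**
 * Hypothetical map between external identifiers and the numeric entity IDs
 * that the Field API expects. Kept in memory here for illustration only; a
 * real module would persist it so IDs stay stable across requests.
 */
class MyRemoteEntityIdMap {
  /** @var array External key => internal numeric ID. */
  protected $toInternal = array();
  /** @var array Internal numeric ID => array of external ID parts. */
  protected $toExternal = array();
  /** @var int Last internal ID handed out. */
  protected $lastId = 0;

  /**
   * Returns the numeric ID for an external ID, allocating one if needed.
   *
   * @param array $parts
   *   The external ID parts, e.g. array($mbid) or
   *   array($company_id, $department_id).
   */
  public function toInternal(array $parts) {
    // JSON-encode the parts so that array('a:b') and array('a', 'b') cannot
    // collide; strval() makes 12 and '12' equivalent.
    $key = json_encode(array_map('strval', array_values($parts)));
    if (!isset($this->toInternal[$key])) {
      $this->lastId++;
      $this->toInternal[$key] = $this->lastId;
      $this->toExternal[$this->lastId] = $parts;
    }
    return $this->toInternal[$key];
  }

  /**
   * Returns the external ID parts for a numeric ID, or NULL if unknown.
   */
  public function toExternal($id) {
    return isset($this->toExternal[$id]) ? $this->toExternal[$id] : NULL;
  }
}
```

With such a map, the controller’s load() would translate the numeric IDs it receives back into external IDs before calling the web service. Each loaded entity would then be given its numeric ID from toInternal().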

Entity properties

We have already indicated some of the main entity properties in hook_entity_info(): id and label. We can have many more properties, which can simply be attributes of our entity objects. However, we get a lot of benefits from formally defining them using hook_entity_property_info().

<?php
/**
 * Implements hook_entity_property_info().
 */
function mymodule_entity_property_info() {
  $info['my_remote_entity_type']['properties'] = array(
    'property_name' => array(
      'label' => t('Property Label'),
      'type' => 'integer',
    ),
    'date_property' => array(
      'label' => t('A date in a different format'),
      'type' => 'date', // A UNIX timestamp is expected.
      'getter callback' => '_mymodule_convert_date',
    ),
  );

  return $info;
}
?>

This simple hook lets us know what properties are available for entities of our type, what data type they have, and optionally how to set/get them using setter/getter callbacks. The typical example of a module that makes full use of these property definitions is the rules module, but we will see later that they are also very important for us when it comes time to integrate our remote entities with views.

The setter and getter callbacks are particularly interesting when we’re dealing with remote data, as other systems often don’t follow the same conventions for storing data types. For example, whereas Drupal uses UNIX timestamps to store dates, many systems use ISO 8601 date strings instead. Setter and getter callbacks are then called automatically when needed.
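As an illustration, the _mymodule_convert_date getter referenced in the property info above could look like the sketch below. It assumes the remote service sends ISO 8601 strings such as 2012-10-25T08:00:00+00:00. The parameter lists follow those of the entity module’s own verbatim callbacks, entity_property_verbatim_get() and entity_property_verbatim_set(). _mymodule_set_date is a hypothetical name and would still need to be declared as the property’s 'setter callback'.

```php
<?php
/**
 * Getter callback: exposes an ISO 8601 date string stored on the remote
 * entity as the UNIX timestamp that the 'date' property type expects.
 */
function _mymodule_convert_date($entity, array $options, $name, $type, $info) {
  if (empty($entity->$name)) {
    return NULL;
  }
  $timestamp = strtotime($entity->$name);
  // strtotime() returns FALSE for strings it cannot parse.
  return $timestamp === FALSE ? NULL : $timestamp;
}

/**
 * Setter callback (hypothetical): converts a UNIX timestamp back into an
 * ISO 8601 string (in UTC) before storing it on the entity object.
 */
function _mymodule_set_date(&$entity, $name, $value, $langcode, $type, $info) {
  $entity->$name = gmdate('c', $value);
}
```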

Views integration

The views module provides an extensible system, where new custom query plugins can be defined. This system is well-documented and relatively straightforward, so I won’t go into much detail besides the fact that your custom query plugin needs to populate the $view->result array with objects that have their respective entity ID and all other necessary properties. Things become more complicated when we want to make sure that the data our custom query plugin retrieves from a web service is correctly identified with our custom entity definition. This step is essential, especially if we also want to be able to integrate locally stored fields with the remote data.
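Populating $view->result could be as simple as the sketch below. mymodule_build_view_rows() is a hypothetical helper. Inside a views query plugin it would typically be called from the plugin’s execute(&$view) method, for example as $view->result = mymodule_build_view_rows(remote_web_service_list(), 'my_remote_entity_id', array('my_remote_entity_name')). Here, remote_web_service_list() stands for a hypothetical function that returns the records from your service.

```php
<?php
/**
 * Converts records returned by the web service into views result rows.
 *
 * @param array $records
 *   Objects returned by the web service.
 * @param string $id_key
 *   Name of the entity ID property, e.g. 'my_remote_entity_id'.
 * @param array $properties
 *   Other properties the view needs to display.
 *
 * @return array
 *   A list of stdClass rows, each carrying the entity ID and the requested
 *   properties (NULL when the service did not return a value).
 */
function mymodule_build_view_rows(array $records, $id_key, array $properties) {
  $rows = array();
  foreach ($records as $record) {
    // Skip records without an ID: they cannot be matched to an entity.
    if (!isset($record->$id_key)) {
      continue;
    }
    $row = new stdClass();
    $row->$id_key = $record->$id_key;
    foreach ($properties as $property) {
      $row->$property = isset($record->$property) ? $record->$property : NULL;
    }
    $rows[] = $row;
  }
  return $rows;
}
```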

After trying various approaches, the best solution I found was hidden in the EFQ_views module. The common point between the typical use case for EFQ_views (entities in the Drupal database and fields in MongoDB) and our situation (remote entities and fields in the Drupal database) is that entity properties and Field API fields are stored in separate systems. This means that two separate steps are needed to gather all the data.

EFQ_views itself does jump through a lot of hoops to break the assumption that everything is stored in the database, but it does so in a clean way by relying on the definition of entities and entity metadata. This is where it all comes back together and we realize that formally defining all our entity properties was worth it.

The main point of interest here is in hook_views_data(). By copying EFQ_views’ implementation, all we had to do was the following:

  • Use our own prefix instead of “efq_”, since we don’t want to have a namespace conflict.
  • Rather than go through all entity types, we only define views data for our own entity types.
  • Set the “query class” attribute to use our own query handler instead of “efq_query”.
  • Fix a couple of places that still assume that entities are stored in the database. These resulted in a few warnings and notices, but the code was already functional.
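A stripped-down sketch of the result might look like the code below. EFQ_views builds its views data from the entity definitions and entity metadata. To stay short, this sketch hardcodes two fields instead. The table name mymodule_my_remote_entity_type, the query plugin name mymodule_query and its class mymodule_plugin_query are hypothetical. Labels are left untranslated only so the snippet runs outside Drupal; a real module would wrap them in t().

```php
<?php
/**
 * Implements hook_views_data().
 */
function mymodule_views_data() {
  $data = array();
  // Our own prefix instead of "efq_", and only our own entity type.
  $table = 'mymodule_my_remote_entity_type';
  $data[$table]['table']['group'] = 'My remote entity type';
  $data[$table]['table']['entity type'] = 'my_remote_entity_type';
  $data[$table]['table']['base'] = array(
    'field' => 'my_remote_entity_id',
    'title' => 'My remote entity type',
    'help' => 'Entities loaded live from the remote web service.',
    // Use our own query plugin instead of "efq_query" or the default SQL query.
    'query class' => 'mymodule_query',
  );
  $data[$table]['my_remote_entity_id'] = array(
    'title' => 'ID',
    'help' => 'The remote entity ID.',
    'field' => array('handler' => 'views_handler_field_numeric'),
  );
  $data[$table]['my_remote_entity_name'] = array(
    'title' => 'Name',
    'help' => 'The remote entity name.',
    'field' => array('handler' => 'views_handler_field'),
  );
  return $data;
}

/**
 * Implements hook_views_plugins().
 */
function mymodule_views_plugins() {
  return array(
    'query' => array(
      'mymodule_query' => array(
        'title' => 'Remote web service query',
        'help' => 'Loads results from the remote web service.',
        'handler' => 'mymodule_plugin_query',
      ),
    ),
  );
}
```

As with any views integration, the module also needs to implement hook_views_api().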

With this system in place, we were able to create views with lists of rendered entities as well as views mixing local and remote fields. The latter respected all field formatters and had display options for entity properties that matched their data type.

The only point where some additional “data massaging” was needed was for some more complex display plugins, such as the leaflet module, that expected raw views results in a structure specific to fields being joined to the entity base table rather than queried separately. Such cases can be dealt with on a case-by-case basis using hook_views_pre_render().

Conclusion

Under Drupal 7, the creation of such remote entities still requires a lot of knowledge of the Entity API and especially of its limitations. Unfortunately, there are still many parts of Drupal that are built on the assumption that basic entity metadata is stored in the database.

If you are interested in what future solutions to this problem will look like under Drupal 8, I would strongly advise keeping up with the progress reports on Karoly Negyesi’s Blog. The work being done on decoupling Drupal from a relational database has the very positive side effect of resulting in a much more consistent and stable system, which will also make a cleaner implementation of remote entities possible.

While not exactly painless, this project made it possible to push a little bit more at the boundaries of what can be done with entities. I would like to thank KTP Data for this interesting challenge.

Comments

Florian, great topic and post!

And saying "Hi" from Moscow, Russia; we really enjoyed your sessions this summer ;)

For a similar problem recently we decided to instead create a custom entity type with Entity Construction Kit and have a field for remote_id. There were also fields to match all of the properties of the external objects that we wanted to display within Drupal. Then we set up a cron job to regularly import all new records, and update changed records from the external service.

This method is only feasible if you are only working with a few tens of thousands of objects or fewer. Once you get above about 50,000 or so, the import/update process becomes too resource-intensive. But it does offer the added benefit of much less code, and additional resiliency: if your connection to the external service goes down, your site stays up.

Thanks for the great write-up, Florian :-)

You'll probably be interested in http://drupal.org/node/1497374 (Switch from Field-based storage to Entity-based storage).
In short, the outcome in D8 would be that field values are stored where the entities themselves are stored...
Comments #7 to #17 discuss the case of "remote entities".

Excellent write-up, Florian. Love to read such high quality posts.

The last time I tried this (before contrib Entity API module existed/matured), I sorta ended up on the idea of writing a "RESTful" DBTNG database driver, which would translate the required SELECT, INSERT, UPDATE, and DELETE operations into REST API requests and execute them.

The underlying idea of that was to avoid the storage workarounds, and instead, "fake" the web service into the system as an actual database. That's what a web service is, right? :)

That idea sounded exciting back then, but I don't think I ever explored an implementation (can't remember right now).

Facing D8's architecture today, I'd probably revise that idea and investigate an Entity\RestStorageController instead, which essentially is what you ended up with here. :)

Perhaps we should think about turning this into an actual, generic solution.

Speaking of, there's some larger shared interest in putting the Guzzle library into Drupal 8 core, which has built-in support for web service definitions, vastly simplifying how PHP scripts can interact with a web service:
http://drupal.org/node/1447736

The other problem that you already outlined in this post is that Field API assumes serial/integer IDs all over the place. I ran into this problem with regard to UUIDs already, and I'd love, love, LOVE if someone would jump on the gun to eliminate that limitation before December 1st:
http://drupal.org/node/1726734#comment-6349694

Thanks!
sun

We've been working to build a reusable library of modules to do just this. We have fields working (using a field storage controller) and in theory any web service type can be used (SOAP, REST, you name it). Some nice caching options and a decent debug mode are included as well.

Getting entity field queries to work as well as completely remote entities is still on the todo list but so far the module works well. The queries are tricky since the web service itself would need to support a query language like OData or similar. We've been contemplating some kind of basic local queries in PHP but we haven't yet found a solution we like.

http://drupal.org/sandbox/spotzero/1529942

Hi.

Your article is very interesting to me. I’m currently investigating how to deal with a similar problem.

I need to display information from an external database and to attach documents, comments, etc. inside Drupal. I have different web services (WS) to query the external database: the first to retrieve a list, and the second to retrieve object information based on IDs from the first WS result.

I think your work could help me a lot and would like to ask you to share your code, please.

You might find a sandbox module from fago a helpful example. It takes a slightly different approach to the Views integration (i.e. it implements it manually rather than using efq_views). http://drupal.org/sandbox/fago/1493180

I used this as an example, thanks a lot.
It would be great if this article were linked with a new one describing the views controllers, because I have entities but cannot see their fields in the Views UI and cannot use them.