Its an exciting time to be working on Drupal.org! Content strategy is being implemented, and thus we've been evaluating and implementing modules, including:

  • Organic Groups
  • Panels
  • Page Manager
  • Panelizer
  • Flag 3.x (already implemented for spam)

There are numerous reasons why we're looking at using these modules, but one of them is because we're able to see them in action on events.drupal.org. Panelizer was being called out as one of the least performant of these modules. This was due to a slow query that was observed at Drupalcon Barcelona:

SELECT * FROM {panelizer_entity} WHERE entity_type = :entity_type AND entity_id IN (:ids)

For each entity that gets loaded, panelizer needs to grab the associated panelizer_entity ID and load it up for preperation to view. While this isn't a problem for a few nodes, it doesn't scale when you're querying 100s or thousands of them:

2015.09.23.mysql-slow.gz:2015-09-23T09:41:14.147976+00:00 db5 mysql-slow: EXPLAIN SELECT * FROM panelizer_entity WHERE entity_type = 'node' AND entity_id IN ('171', '177', '179', '181', '183', '185', '189', '193', '199', '201', '209', '211', '213', '215', '217', '219', '221', '225', '227', '229', '233', '235', '239', '243', '245', '247', '249', '251', '253', '255', '277', '279', '281', '287', '289', '291', '293', '295', '297', '301', '303', '305', '307', '309', '311', '313', '317', '319', '321', '323', '325', '327', '335', '337', '343', '347', '349', '351', '353', '359', '363', '367', '381', '385', '389', '391', '397', '399', '403', '405', '407', '409', '411', '413', '415', '417', '419', '421', '423', '427', '429', '431', '433', '435', '437', '439', '441', '443', '449', '451', '453', '455', '459', '461', '463', '465', '467', '469', '471', '473', '475', '477', '479', '481', '483', '485', '489', '495', '497', '499', '501', '503', '505', '507', '509', '511', '513', '515', '517', '519', '521', '523', '525', '527', '529', '531', '533', '535', '537', '539', '541', '543', '545', '547', '549', '551', '553', '555', '557', '559', '561', '563', '565', '567', '569', '571', '573', '575', '577', '579', '583', '585', '589', '591', '593', '595', '597', '599', '601', '603', '605', '607', '609', '611', '613', '615', '617', '619', '623', '631', '633', '635', '637', '639', '641', '643', '645', '647', '649', '651', '653', '655', '657', '659', '661', '663', '665', '667', '669', '671', '673', '675', '677', '679', '681', '683', '685', '687', '689', '691', '693', '695', '697', '699', '701', '705', '707', '709', '711', '713', '715', '717', '719', '721', '723', '725', '727', '729', '731', '733', '735', '737', '739', '741', '743', '745', '747', '749', '751', '753', '755', '757', '759', '761', '763', '765', '767', '769', '771', '773', '775', '777', '779', '781', '783', '785', '787', '789', '791', '793', '795', '797', '799', '801', '803', '805', '807', '809', '811', '813', '815', '817', '819', '821', '823', '825', '827', '829', '831', '833', '835', '837', '839',

Additionally, there is this issue: https://www.drupal.org/node/2023703 -- this discusses the potential issues around loading the panelizer settings and some solutions, including adding a lazy loader to get the entity but not load up all the settings until say, view time.

It is apparent that we have an issue on our hand. But was it panelizer?

First -- I would suggest watching Ezra Gildesgame's talk on XHprof. It will give you some ideas about how to troubleshoot performance issues.

Secondly, I'd suggest getting a monitoring tool to average your performance over time. This is also possible to do with XHprof, but thats for another post.

At the Association, we use New Relic for server monitoring. While the full version can give you some really detailed results, our version gives us an overview. Here is a snapshot of what was happening when panelizer was being suspected as slow:

When we take a look at the modules that are being executed, it gives us a little more information about what modules are going slow:

 

If you don't have New Relic, you can still get this data by setting up XHprof and going to the affected page. It should give you similar results. One nice feature of New Relic is that it gives you XHprof (ish) results for past performance problems. Unless you're profiling every page (which is not recommended for production), you won't have a snapshot result of what happened when the site was going slow.

Looking at the New Relic logs and the query, we know three things:

  1. This issue is happening on node_add
  2. This issue is affecting nodes that are related to sessions only. (If you pop in those node ids in the query to events.drupal.org/node/XXX)
  3. Entityreference module load time correlates to the database slow query time.

So what things could cause multiple session nodes to occur?

  • View of many nodes, like a schedule
  • field or panel on a node/add page that is making a query of entities
  • Session counts (although that would be quite unperformant, loading a node just to get its count.. but I've seen worse things happen)
  • an entity reference field

By looking at the New Relic logs, there isn't any reference to the view module causing a problem. This is also during node_add, which doesn't call any views. While cod_sponsors or products might be an (separate) issue, they their peak loads are not occuring at the same time as the db query. That leaves us with entity reference. This hits as the top module at the same time as the panelizer query is being executed.

What node/add pages are using entity reference for sessions?

Only one. Session Evaluations.

Here we are using entityreference prepopulate, which shows the session thats being evaluated. But, there is a problem with this...

So there is the issue. Why didn't this occur in LA? because there were half as many sessions. Its also why we never saw it in any previous drupalcon, because the sites were singular so performance issues like this wouldn't ever have enough content to show its face.

Conclusion / Take aways

  • While there might be some architecture tweaks that could be done to Panelizer, it was not the key issue. 
  • The issue is the combination of modules, exasperated by the amount of content. Entityreference + Entityreference Prepopulate (Disabled select field) + Panelizer + Two cons worth of content = Problem.
  • Don't assume a module is slow or unperformant from a single query.
  • When an issue is found, don't guess, investigate.
  • XHProf and New Relic are essential for large drupal sites.

A simple tweak to the entity reference field has fixed the problem for now. Before the next con, we'll implement some better methods of showing the title so this doesn't occur again. In fact there is already an issue for it.

Of course, this is just one potential issue - but this method demonstrates that a monitoring tool like New Relic plus performance profiling tools like XHProf can allow you to find unexpected interactions between modules that may result in performance problems at scale. Luckily, our fixes translate into fixes for the whole community. This means not only a better drupal.org, but a better Drupal product.

Comments

ptsimard’s picture

Interesting! So what was the actual tweak to the field that fixed the issue?

japerry’s picture

We had to disable entity prepopulate from showing the (disabled) title. Instead we're going to use a form alter to show just the value of the prepopulated node. It might be a good feature to get into the entityreference prepopulate itself.

amitsedaiz’s picture

Wouldn't it be better if we use autocomplete widget instead of select list for entityreference field? This also works with prepopulate as would not load all the options like select list.

temkin’s picture

Great article, thanks for sharing you experience and findings!

We just went through the same process a couple of weeks ago with one of the sites and also discovered Entity Reference field to be the source of the performance issue. The underlying problem was in Entity module that is used by Entity Reference field to generate a list of options for a select list. It appeared that Entity module does node_load() for each item in a list. My colleague implemented a patch that decreased page load time from 30-45s to 3s. Here are more details - https://www.drupal.org/node/2636332.

Hope it will be helpful for others as well.