Wednesday 14 March 2012

Deploying uPortal 4.0.3

I won't go into details here about configuring, compiling and other basics. The main reason for this post is to share some experience that you could use when deploying your own portal. While working on improvements, we gave many of them back to the community as patches, and they will be included in upcoming releases of uPortal. Many thanks also to Eric Dalquist (lead developer of uPortal), who helped me a lot with performance (... and not only performance) issues. That allowed ORTUS (the portal of Riga Technical University) to be one of the first (if not the first) production deployments of uPortal 4.0.x.

Required DB Schema changes for MS SQL Server

Since we are using MS SQL Server, we had to disable or delete the single unique index on table UP_DLM_EVALUATOR. This index should ensure that if a fragment name is provided (is not null), it must be unique. Unfortunately, a plain unique index on MS SQL Server cannot do that, because NULLs are treated as equal to each other when the index keys are built, so at most one row with a NULL fragment name would be allowed. This is not the case for other DBMS vendors (e.g. PostgreSQL).

Another pitfall was that we had to remove all @Transactional(readOnly = true) annotations from the DAO class methods. The annotation is there because selecting from text fields on PostgreSQL requires a read-only transaction, but it causes a major slowdown on MS SQL Server.

Update (16.03.2012): If you're using SQL Server 2008 or later, you can (and should) create a new filtered index to replace the deleted index in order to maintain data integrity. Here's the required SQL statement:

CREATE UNIQUE NONCLUSTERED INDEX [UP_DLM_FRAGMENT_NAME_UIDX] ON [dbo].[UP_DLM_EVALUATOR] ([FRAGMENT_NAME] ASC)
WHERE ([fragment_name] IS NOT NULL)

Memory configuration

uPortal 4.x uses caching a lot. This requires a decent amount of memory for old generation objects. I would recommend dedicating around 4 GB of memory to the old generation. There are guidelines stating that you should dedicate around half of the heap to the young generation (which worked well for us with uPortal 3.1.x), but for 4.x this won't work. Your server should have at least 8 GB of RAM, partitioned as follows:

  • 6 GB for the JVM heap (4 GB old and 2 GB young generation)
  • 512 MB for the permanent generation - this should ensure that you won't run out of permanent generation space when hot-deploying portlets (note that you should also enable class unloading: -XX:+CMSClassUnloadingEnabled)
  • Leave the rest of the memory for the OS, other services (like nginx... which we're using in front of Tomcat) and OS caching.
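
To check that the sizing above actually took effect, the running JVM can be asked for its memory pools. The following is only a minimal sketch: the class name HeapLayout is made up for illustration, and the launch flags in the comment are my assumption of how the split could be expressed on a Java 6/7 HotSpot JVM (only -XX:+CMSClassUnloadingEnabled comes from the text above).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Assumed launch flags for the layout described above (Java 6/7 era HotSpot):
//   -Xms6g -Xmx6g -Xmn2g -XX:MaxPermSize=512m
//   -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
class HeapLayout {

    /** Returns one line per JVM memory pool: name, type and configured maximum. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            String size = max < 0 ? "undefined" : (max / (1024 * 1024)) + " MB";
            lines.add(pool.getName() + " [" + pool.getType() + "] max=" + size);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Pool names depend on the collector in use, so look for the old generation and permanent generation entries rather than relying on exact names.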

With the configuration above we get a pause of about 1 s every 8 h on our server, which is not too bad but leaves room for improvement. Full GCs can stop the JVM for quite a while (especially with a very big old generation), so I'm also thinking about switching to the G1 garbage collector, since it should lower pause times... but I haven't field-tested it yet.
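
Pause figures like the ones above can be roughly tracked from inside the JVM as well. This is a hedged sketch with a hypothetical class name (GcPauseReport); it reads the standard garbage collector MXBeans, whose accumulated collection time only approximates stop-the-world pause time, so GC logs remain the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class GcPauseReport {

    /** Average collection time in ms, or 0 when nothing has been collected yet. */
    static double averageMillis(long count, long totalMillis) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    /** One summary line per collector registered in this JVM. */
    static List<String> summarize() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long time = gc.getCollectionTime();   // accumulated ms, -1 if undefined
            lines.add(String.format(Locale.ROOT,
                    "%s: collections=%d, total=%d ms, avg=%.1f ms",
                    gc.getName(), count, time, averageMillis(count, time)));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize()) {
            System.out.println(line);
        }
    }
}
```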

Cache configuration (ehcache.xml)

Another thing you should be aware of is caching. The default caches are nice, but in a production environment you have to adjust them to your situation. What I've found is that:

  • org.jasig.portal.groups.IEntity - the default TTL is 6 h. This means that after logging in, your user won't get group updates for the next 6 hours. I found this troublesome, so I reduced it to one hour and also set the idle time to half an hour.
  • org.jasig.portal.rendering.STRUCTURE_TRANSFORM - this cache rarely gets any hits (the hit rate is around 0.1%) but consumes a lot of memory, so I reduced the max element count to 500 and set the idle time to 10 minutes (600 seconds). I also considered removing this cache altogether, but since it gets some hits, it does not hurt to leave it there.
  • org.jasig.portal.rendering.THEME_TRANSFORM - this cache is used a lot but also uses quite a bit of memory, so I've introduced a max idle time of 10 minutes. With this configuration the cache usually holds ~2k elements and the hit rate is ~30-40% (for ~500 concurrent users... for more intensive use ).
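
The difference between the TTL and the idle time in the list above is easy to mix up, so here is a small model of the two rules. It is a sketch of the general semantics as I understand them (an entry is dropped when either limit is hit, and 0 disables a limit), not Ehcache's actual implementation; the class name CacheExpiry is hypothetical. In ehcache.xml these limits correspond to the timeToLiveSeconds and timeToIdleSeconds attributes.

```java
class CacheExpiry {
    final long ttlSeconds; // max age since creation; 0 = no limit
    final long ttiSeconds; // max time since last access; 0 = no limit

    CacheExpiry(long ttlSeconds, long ttiSeconds) {
        this.ttlSeconds = ttlSeconds;
        this.ttiSeconds = ttiSeconds;
    }

    /** All arguments are timestamps in seconds on the same clock. */
    boolean isExpired(long createdAt, long lastAccessedAt, long now) {
        boolean tooOld = ttlSeconds > 0 && now - createdAt >= ttlSeconds;
        boolean tooIdle = ttiSeconds > 0 && now - lastAccessedAt >= ttiSeconds;
        return tooOld || tooIdle;
    }

    public static void main(String[] args) {
        // IEntity settings from the list above: TTL 1 h, idle time 30 min.
        CacheExpiry groups = new CacheExpiry(3600, 1800);
        // Created at t=0, last read at t=3500, checked at t=3700:
        // recently used, but older than the TTL, so it is reloaded.
        System.out.println(groups.isExpired(0, 3500, 3700));
    }
}
```

The practical consequence is that the TTL bounds how stale group data can get for an active user, while the idle time only frees memory held for users who have gone quiet.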

Note that JConsole is your friend here. Make sure that you can connect to your production machine with JConsole, since you'll then be able to adjust the configuration of each cache instance at runtime. This is very useful.
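
Whatever JConsole shows can also be read programmatically over JMX, which is handy for logging cache statistics over time. The sketch below uses only the standard javax.management API; the class name JmxAttributeReader is hypothetical, and the example pattern points at the JVM's own garbage collector MBeans because the exact names of the cache MBeans in your deployment are something to look up in JConsole's MBeans tab first.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class JmxAttributeReader {

    /** Reads one attribute from every MBean whose name matches the pattern. */
    static Map<String, Object> read(MBeanServerConnection conn,
                                    String pattern,
                                    String attribute) throws Exception {
        Map<String, Object> values = new TreeMap<String, Object>();
        Set<ObjectName> names = conn.queryNames(new ObjectName(pattern), null);
        for (ObjectName name : names) {
            values.put(name.getCanonicalName(), conn.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Local JVM only; a remote connection is obtained differently (see note).
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> counts =
                read(local, "java.lang:type=GarbageCollector,*", "CollectionCount");
        for (Map.Entry<String, Object> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

For a remote Tomcat, the same method should work with a connection obtained from JMXConnectorFactory.connect(new JMXServiceURL("service:jmx:rmi:///jndi/rmi://host:port/jmxrmi")).getMBeanServerConnection(), where host and port are placeholders for your own JMX settings.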

Indexes for performance (fighting for milliseconds)

Here's a list of the indexes that we're using. Many of them also include columns, so that after an index seek the DB does not have to look up the values in the table: the column values are already returned by the index. Since the portal mostly issues SELECT statements, we're using a lot of them.

CREATE INDEX UP_DLM_EVALUATOR_EVALUATOR_TYPE_FRAGMENT_NAME_IDX
  ON dbo.UP_DLM_EVALUATOR (EVALUATOR_TYPE, FRAGMENT_NAME)
CREATE INDEX UQ__UP_DLM_E__C0C7F43617C286CF
  ON dbo.UP_DLM_EVALUATOR (FRAGMENT_NAME)
CREATE INDEX UP_LOGIN_EVENT_AGGREGATE_INTERVAL_DATE_TIME_DIM_IDX
  ON dbo.UP_LOGIN_EVENT_AGGREGATE ([INTERVAL], DATE_DIMENSION_ID, TIME_DIMENSION_ID)
  INCLUDE (DURATION, LOGIN_COUNT, UNIQUE_LOGIN_COUNT, AGGREGATED_GROUP_ID)
CREATE INDEX UP_LOGIN_EVENT_AGGREGATE__UIDS_LOGIN_AGGR_ID_IDX
  ON dbo.UP_LOGIN_EVENT_AGGREGATE__UIDS (LOGIN_AGGR_ID)
  INCLUDE (uniqueUserNames)
CREATE INDEX UP_PRESON_ATTR_USER_DIR_ID_IDX
  ON dbo.UP_PERSON_ATTR (USER_DIR_ID)
  INCLUDE (id, ENTITY_VERSION, ATTR_NAME)
CREATE INDEX UP_PERSON_ATTR_VALUES_ATTR_ID_IDX
  ON dbo.UP_PERSON_ATTR_VALUES (ATTR_ID)
  INCLUDE (ATTR_VALUE, VALUE_ORDER)
CREATE UNIQUE INDEX UP_PERSON_DIR_USER_NAME_IDX
  ON dbo.UP_PERSON_DIR (USER_NAME)
  INCLUDE (USER_DIR_ID, ENTITY_VERSION, LST_PSWD_CGH_DT, ENCRPTD_PSWD)
CREATE INDEX UP_PORTAL_COOKIE_EXPIRES_IDX
  ON dbo.UP_PORTAL_COOKIES (EXPIRES)
CREATE UNIQUE INDEX UP_PORTALL_COOKIES_VALUE_IDX
  ON dbo.UP_PORTAL_COOKIES (COOKIE_VALUE)
  INCLUDE (PORTAL_COOKIE_ID, CREATED, ENTITY_VERSION, EXPIRES)
CREATE INDEX UP_PORTLET_COOKIES_PORTAL_COOKIE_ID_IDX
  ON dbo.UP_PORTLET_COOKIES (PORTAL_COOKIE_ID)
  INCLUDE (PORTLET_COOKIE_ID, COOKIE_COMMENT, COOKIE_DOMAIN, ENTITY_VERSION, EXPIRES, COOKIE_NAME, COOKIE_PATH, SECURE, COOKIE_VALUE, VERSION, portalCookie_PORTAL_COOKIE_ID)
CREATE INDEX UP_PORTLET_DEF_PARAM_PORTLET_DEF_ID_IDX
  ON dbo.UP_PORTLET_DEF_PARAM (PORLTET_DEF_ID)
CREATE INDEX UP_PORTLET_ENT_USER_ID_IDX
  ON dbo.UP_PORTLET_ENT (USER_ID)
CREATE INDEX UP_PORTLET_PREF_PORTLET_PREF_ID_IDX
  ON dbo.UP_PORTLET_PREF (PORTLET_PREF_ID)
CREATE INDEX UP_SS_USER_PREF_USER_ID_PROFILE_ID_IDX
  ON dbo.UP_SS_USER_PREF (PROFILE_ID, USER_ID)
  INCLUDE (SS_USER_PREF_ID, ENTITY_VERSION, UP_SS_DESCRIPTOR_ID)
CREATE INDEX UP_SS_USER_PREF_LAY_ATTR_SS_USER_PREF_ID_IDX
  ON dbo.UP_SS_USER_PREF_LAY_ATTR (SS_USER_PREF_ID)
  INCLUDE (UP_SS_USER_PREF_LAY_ATTR_ID, ENTITY_VERSION, NODE_ID)

Replace the unique index on table UP_SS_USER_PREF with this one in order to include columns in the index:

CREATE UNIQUE INDEX UP_SS_USER_PREF_PROFILE_USER_SSDESCRIPTOR_IDX
  ON dbo.UP_SS_USER_PREF (PROFILE_ID, USER_ID, UP_SS_DESCRIPTOR_ID)
  INCLUDE (SS_USER_PREF_ID, ENTITY_VERSION)

2 comments:

  1. I just upgraded my development server to 4.0.6 (pre-release) and thought I would share my experience. The procedure for removing @Transactional(readOnly=true) is now different and easier: just comment out the one line in BasePortalJpaDao and you're done. However, 4.0.6 also adds an event persistence layer that seems to cause MS SQL Server to use lots of CPU. You can disable persistence of events by commenting out the call to "storePortalEvents" around line 45 in PortalEventDaoQueuingEventHandler.

  2. Thanks for the info, Eric. We are going to upgrade to 4.0.6 soon. It's also a good opportunity to analyze database performance data and review the indexes. I will share my experience.
