The Joy of Oracle XML type
This blog post could be summed up in one sentence.
The Oracle XML type functions (e.g. updateXML, EXTRACT) can be applied to xml formatted data in any column type by using the XMLType() conversion function liberally.
If you are still reading this post then perhaps you’re just plain bored or a little curious why I am so excited about what seems to be intuitive and simplistic. This simple fact saved us a system outage and expensive down time. Recently, I encountered an issue that was caused by changes in the structure of our XML model that were not reflected in our XML data. Our XML data is stored in our database in CLOB fields so it was thought that we could not even identify which records were affected by this change, or implement the change, short of reloading all our production data from scratch, which is a tediously long process, or by writing a custom java application to process every record affected. By crafting (and testing) an XMLUpdate query on a CLOB field, the problem was corrected in minutes rather than days and the client was pleased.
I had read much of the hype surrounding the Oracle XMLType when it was originally released around the turn of the century, but did not consider it in depth because no database schema that I ever worked on actually included fields that were XMLType, and the principle of “out of sight, out of mind” ruled my technology studies. I would characterize most of the companies that I work for as “slow adopters” because they are afraid of the “bleeding edge” of technology.
In order to illustrate the power of performing XML updates on CLOB fields, I will provide a concrete example.
Here is a fairly simple data model for customers and orders where customer and order details are stored in CLOB fields as XML.
CUSTOMER |
CUSTOMER_ID (PK) NUMBER(38,0) |
CUSTOMER_NAME VARCHAR(255) |
CUSTOMER_XML CLOB |
LAST_UPDATE TIMESTAMP |
ORDER |
ORDER_ID (PK) NUMBER(38,0) |
CUSTOMER_ID (FK) NUMBER(38,0) |
ORDER_XML CLOB |
LAST_UPDATE TIMESTAMP |
UPDATED_BY VARCHAR(30) |
Now the order XML has the following structure:
<?xml version="1.0" encoding="UTF-8"?> <order orderDate="2013-10-20" shipDate="2013-12-07"> <items> <item upc="39382-00039" name="expando-widget" quantity="10"> <color>blue</color> <price>10.00</price> <currency>USD</currency> </item> ... </items> </order>
and the customer XML has the following structure:
<?xml version="1.0" encoding="UTF-8"?> <customer> <email>joecustomer@yahoo.com</email> <address> <street1>123 Main St.</street1> <street2>Apartment 2B</street2> <city>Downtown</city> <state>MN</state> <postalCd>55555</postalCd> <country>USA</country> </address> </customer>
Now the business problem arises. Due to a typhoon in the expando-widget’s country of origin, your supplier was unable to meet the need for blue expando-widgets, but there is a local supplier that can provide orange expando-widgets to cover the shortage of blue ones. However we will need to notify the affected customers of the change and then make the desired color change to the order table.
This is where the XML type queries come to the rescue! We can do this with Oracle even though the schema has no XML types specified by making liberal use of the XMLTYPE() conversion function. First we must determine which customers have pending orders containing blue expando-widgets.
Specifically we need to query for email addresses of the customers that at least one active order (current date is between the order and ship dates) that includes blue expando-widgets, so that we can send our bulk email apologizing in advance for the color change.
select CUSTOMER_NAME, EXTRACTVALUE(XMLTYPE(CUSTOMER_XML),'/customer/email') from CUSTOMER where CUSTOMER_ID in (select distinct CUSTOMER_ID from ORDER where TO_DATE(EXTRACTVALUE(XMLTYPE(ORDER_XML),'/order[@orderDate]'),'YYYY-MM-DD') < SYSDATE and TO_DATE(EXTRACTVALUE(XMLTYPE(ORDER_XML),'/order[@shipDate]'),'YYYY-MM-DD') > SYSDATE) AND EXTRACTVALUE(XMLTYPE(ORDER_XML),'/order/item[@upc='39382-00039']/color') = 'blue')
The XMLTYPE() function converts our clob columns so that we can use the EXTRACTVALUE function to probe more deeply into the xml using XPath.
Without going into a detailed explanation of XPath here the basic idea is that EXTRACTVALUE will return either an attribute value or text node within an XMLTYPE field having the specified path. The result must be single valued and cannot be an XML snippet. For instance we could not use EXTRACTVALUE (XMLTTYPE(CUSTOMER_XML), ‘/customer/address’ ) because address has child nodes. Oracle also has a companion method (EXTRACT) that returns XMLTYPE objects, if we should need those.
The TO_DATE functions are required to convert the string attribute values to Oracle dates so we can compare with the current date( SYSDATE). Those two clauses are essential so that we don’t send out emails to every customer that has ever ordered a blue expando-widget only the customers affected by the typhoon induced blue expando-widget shortage.
Once we have acquired the email list and sent out the emails, we can now update the color to orange using the UPDATEXML companion method to EXTRACTVALUE. However since we are actually using CLOB fields to store XML we need convert back to clobs after applying an XML update.
UPDATE ORDER set ORDER_XML = UPDATEXML(XMLTYPE(ORDER_XML), '/order/item[@upc='39382-00039' and color='blue']/color','orange').getclobval(), LAST_UPDATE = SYSDATE, UPDATED_BY = 'TYPHOON-ISSUE' where TO_DATE(EXTRACTVALUE(XMLTYPE(ORDER_XML),'/order[@orderDate]'),'YYYY-MM-DD') < SYSDATE and TO_DATE(EXTRACTVALUE(XMLTYPE(ORDER_XML),'/order[@shipDate]'),'YYYY-MM-DD') > SYSDATE) AND EXTRACTVALUE(XMLTYPE(ORDER_XML),'/order/item[@upc='39382-00039']/color') = 'blue'
This update query has a lot going on but each piece is necessary. I’ll break it down by answering the questions that occur to me.
1) What’s up with the getclobval() ? The UPDATEXML function only operates on XMLTYPE objects in Oracle after applying the xpath update we have transformed the XMLTYPE object so that it has orange expando-widgets and not blue ones, but the column is of type CLOB so we need to convert it back to a CLOB before setting the ORDER_XML field with the new value.
2) Why is the xpath to select the color element (item[@upc=’39382-00039′ and color=’blue’]) is different than the one used to select email addresses? It is possible that the effected orders have other expando-widget items that are not blue and not effected by the Typhoon Issue. If they already have orange expando-widgets ordered then we will have a problem because they will end up with two different line items for the same color expando-widgets. There was an implicit assumption that there was only one order item of any colored expando-widget in any given order and this would break that assumption.
3) Why set the UPDATED_BY field to “TYPHOON-ISSUE” as well as the LAST_UPDATE field to SYSDATE? This is a best practice that is also practical. It occurs to me that the business may decide at the last minute to give these customers a discount because they were inconvenienced by having the wrong colored expando-widgets, that don’t match the blue gizmo-holders they bought last year. By marking the order records updated here we will have an easy way to differentiate these orders from other orders for orange expando-widgets that should not be discounted. I will leave at as exercise for the interested reader to build the update query that deducts 10% from the price of each order that was affected by the typhoon.
This is a just a glimpse of many possible applications of the ORACLE XML methods, which are documented here fairly well. My main point of this post is that these wonderful and useful functions aren’t just for XMLTYPE fields anymore.