Editorials

XML – The End

XML – The End
Today I am going to wrap up my little excursion into the utilization of XML as a data type and data storage. Let me start out by saying that unstructured XML is a great tool for data mining. When it comes to data mining, the problem domain is often not clearly known, and yet the contents are quite useful if separated by tags that may be discovered when mining begins.

Another use potential of XMLwould be for applications such as personal information management systems. In these kinds of applications the users are defining the data to be tracked after the application is released. A structured database can be generated that supports used defined data using Entity Attribute Values pattern…but it is probably not any more efficient than XML, may not perform as well, and probably is harder to understand for developers.

In this kind of situation, real XML as a datatype in SQL Server is quite practical. The XML tags may be sharded so that indexes may be created allowing rapid search and retrieval of data contained in XML documents stored in your SQL Server engine. This would be one of the exceptions that make sense to me. Still, that is not a requirement for many systems we generate. A true SQL Server table schema with native objects will perform much more efficiently, and take up much less space than data stored in SQL Server using an XML data type that has been indexed.

As one of our readers reminds us, xpath may be used to query data contained within XML, structured or unstructured. If XML is structured (supported by an XSD) in SQL Server, it may be sharded and indexed readily resulting in surprising performance of xpath and sql queries.

Let’s see if I can summarize the thoughts expressed on XML the last few days.

  • XML can be abused just as any programing paradigm, pattern or practice
  • A single XML parameter passed to a stored procedure without any contract defining it results in software that is difficult to test and maintain
  • The key benefit of XML is the ability to pass data to many completely different programming environments and have them work with this shared data structure in a fashion that is native to each
  • If data structure is well known, and the data is being persisted into a relational database, it is most often a better choice to use native storage table structures before using an XML data type
  • XML is fantastic mechanism for storing data for which the structure is variable

Here are some final thoughts from our readers:

William:
I enjoy your articles, and read your XML Abused article this morning with interest.

I design and support large scale systems that mix XML and SQL. I think you’re not really being fair to XML, which can be used very successfully for:

– Inter-process Communication, with or without SOAP.

– Local storage of data. In many applications up to 20 Mb worth of structured data that likely doesn’t need a formal SQL database

– Application logic. Creating business objects that operate on XML in memory

Sure, XML can get used “without thought or reason”, but XML isn’t alone in that category, which includes SQL, SOAP and JSON. It’s hard to think about anything that can’t be used inappropriately.

XML does free developers from:
– Tedious data types, and data length operations. For many applications, it just doesn’t matter if, for example, a price value is stored as a currency, float, or decimal. Just store it as a string and convert as needed.

– Formal interface contracts. You can add an attribute or a new set of elements without breaking everything. For example, passing XML into stored procedures, and shredding it in the procedure.

Comparing XML, which has a robust query language (XPath), and logic engine (XSL) to a single “variant” data type is missing the bigger picture of the value that XML can bring to an architecture.

To say

“Rather than understanding the shape of the data we discover it by traversing through the tags inside the document. The contents is discovered rather than declared.”

Means that you don’t understand XPath.

You’re correct, XSD is not considered in many XML architectures, since it really inhibits extensibility, and is overly complicated. The only people that use XSD are DBAs and COBOL retirees. Ha ha.

Keep up the good work, I appreciate the topic you cover.

James:
I have seen many incarnations of services exposing a single method named “Execute” or something that take and return an undefined XML string. In place of exposed web methods or operation contracts, the operation is defined by some sort of “Action” node in the xml document (but of course, there is no XSD or enumerations to tell us what those are.)

Maybe someone confused the “black box” aspect of a service with a black hole. These big XML string services suck in everything, and there’s no telling what might come out, or where.

Rory:
I’m not enough of a database guru to be worth publishing, but one of my duties involves working with a vendor-created application in which basic field-type values are stored, not in fields of their own, but within huge XML globs. The application includes an administrative functionality that allows editing the XML document for any record you select by key field, but searching on other fields is not obviously feasible. Going out of the gate, I’m rather prejudiced in favor of your view of the matter. I’ll look forward to reading whatever commentary you net tomorrow

Thanks for your newsletter! A lot of it goes over my head, but it is one of the few that I make a point of reading.

Thanks everyone for writing in to btaylor@sswug.org. Tomorrow we’ll be taking a brief look at Data Transfer Objects, the Microsoft .net technique for serialization to and from Web Services.

Cheers,

Ben

$$SWYNK$$

Featured Article(s)
Data Administration versus Database Administration
Unfortunately, the roles of data administration and database administration are gradually coming closer together. Although this is the trend, I think it is an unfortunate trend because it is happening for all the wrong reasons. First let me state that I think the DA and DBA functions need to cross-train so that the DBA understands more about the business and the DA understands more about the technology. This enables both to do a better job because the DBA will know why he is creating and maintaining the database – and the DA will know what is possible given current technology and why some things may need to change in order to be physically implemented.

Featured White Paper(s)
Encryption & Key Management for Microsoft SQL Server 2008/2012
Simplify encryption and key management on your SQL Server. Data thieves are targeting SMB companies because of their inadequa… (read more)

Featured Script
dba3_Grouping_Data_Occurances_demo
Demo of incidence grouping counts of number occurances… (read more)