The importance of schemas, where’s that XSD file?

I’m coming across more and more projects that make heavy use of XML for data integration or data exchange, yet lack schema files.

First case:
My team was POC-ing an integration with a third-party web service. I won’t mention names, but this company is one of the main sources of collected data for rating. After purchasing a subscription for monthly text files with data dumps, we are also able to access their web service, which provides a restricted interface to the data available in the text files. The service call is an HTTP request with some parameters passed via POST (I would say it looks REST-ful). There is no published WSDL, and the service returns XML.
The main problem here is that the documentation for the XML is a PDF, yes, a PDF! It gives you examples of the XML obtained in the response: samples with real data, plus a data dictionary, for instance, the Product element means XYZ, the codeA element means DFG.
All that documentation looks fancy to the untrained eye, and most business analysts look puzzled when we say the documentation is not complete… Why? There is a 200-page PDF right there!!!
Yes, but is there a schema that will validate all possible combinations of XML responses? No.
Will the service consumer be prepared for all possible combinations of XML responses? With luck, and a lot of trial and error…

From W3Schools:

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

An XML Schema:

* defines elements that can appear in a document
* defines attributes that can appear in a document
* defines which elements are child elements
* defines the order of child elements
* defines the number of child elements
* defines whether an element is empty or can include text
* defines data types for elements and attributes
* defines default and fixed values for elements and attributes
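
To make that concrete, here is a minimal sketch of the kind of schema the vendor could have published. The Product and codeA element names come from the data dictionary example above; the rest of the structure is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical response structure, for illustration only -->
  <xs:element name="Response">
    <xs:complexType>
      <xs:sequence>
        <!-- Product is required and must appear exactly once -->
        <xs:element name="Product" type="xs:string"/>
        <!-- codeA is optional, but when present it must be a string -->
        <xs:element name="codeA" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With even this much in hand, a consumer can validate every response instead of guessing from 200 pages of samples.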

Second case:
The second time this month that I came across a similar XSD-less approach was in a data exchange project. The project consisted of extracting data from two databases, building a huge XML file, and applying transformations to that file to comply with CSIO standards. The transformed XML would be consumed by several external applications.
After seeing a few of the initial XML files with the data dumps, I asked: where’s the schema definition? The answer was: there is none… Apparently the development of the transformation engine had been contracted out to a third-party company that had advised on the tweaks to be made to the XML so it could be transformed into an AL3 file. All by trial and error…

Third case:
This case was actually some years ago. I recall suffering the consequences of an XSD-less approach when I was in charge of uploading data batches into a database to power an online quoting tool.
The XML file was provided periodically by a third-party company and recorded the sale prices of pre-owned models produced by that manufacturer. The data exchange wasn’t fancy: they would send me the 2 GB+ XML file by secure email.
I was in charge of parsing that DOM, transforming it to a schema the online tool could consume, extracting the differential data, and dumping it (via a SQL Server DTS package) into the database the online tool would be fed from.
As it was a .NET project, I used XPathNavigator heavily to avoid loading the whole DOM into memory with an XmlDocument; my machine at the time had 512 MB of RAM.
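
For context, this is roughly the shape of that pattern, with hypothetical file and element names. XPathDocument builds a read-only store that is considerably lighter than a full XmlDocument DOM, and XPathNavigator queries it:

```csharp
using System;
using System.Xml.XPath;

class PriceBatchReader
{
    static void Main()
    {
        // Read-only, memory-optimized store (lighter than an XmlDocument DOM).
        XPathDocument doc = new XPathDocument("prices.xml"); // hypothetical file
        XPathNavigator nav = doc.CreateNavigator();

        // Hypothetical XPath; the real element names came from the vendor's file.
        XPathNodeIterator models = nav.Select("/Manufacturer/Model[@preOwned='true']");
        while (models.MoveNext())
        {
            Console.WriteLine(models.Current.GetAttribute("salePrice", ""));
        }
    }
}
```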
The first two attempts worked fine, but the subsequent data batches were not so straightforward. My XPath expressions kept failing. Why?
The differences were minimal; anyone who hasn’t parsed or deserialized XML would consider them nuances:
One child node now appeared before another, while in the previous document they were in the reverse order; a piece of data that used to live in the text of an XML element was now an attribute of the parent element; some elements were sometimes null and sometimes omitted from the file entirely; and the list goes on.
I learned the lesson the hard way. We told the other end: no, we need to agree upon a schema, and if the file you send us cannot be validated against that schema, we won’t acknowledge receipt…
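
In .NET, enforcing that agreement takes only a few lines. A minimal sketch, with hypothetical file names:

```csharp
using System.Xml;
using System.Xml.Schema;

class BatchValidator
{
    static void Main()
    {
        // Reject a batch up front if it doesn't validate against the agreed schema.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, "batch.xsd"); // hypothetical schema file
        settings.ValidationEventHandler += (sender, e) =>
        {
            // Fail fast instead of discovering surprises halfway through an import.
            throw new XmlException("Batch rejected: " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create("batch.xml", settings))
        {
            while (reader.Read()) { } // stream through; the handler fires on any violation
        }
    }
}
```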
I still shudder when I see an XML exchange without an XSD… but it seems to be a recurring theme.

Re: Microsoft’s Entity Framework: persistence ignorance is a bliss I won’t give up

I came across this article:

Why use the Entity Framework? Yeah, why exactly?

which refutes a marketing-like article by one of the EF team members.

After using NHibernate for a while and looking into JPA and the old JDO in the Java world, I don’t think I’ll get my hands on EF any time soon, if I can avoid it.

Why? The main reasons are summarized on this site:

ADO .NET Entity Framework Vote of No Confidence

I will mention the ones I find most compelling:

  1. Main focus on the data aspect
  2. Lack of lazy loading, hydration, and dirty flagging
  3. Lack of persistence ignorance… The tight coupling of the persistence infrastructure to the entity classes largely eliminates the ability to efficiently use very tight feedback cycles on the business logic with automated testing.

As Peter Ritchie summarizes in his comment:
…if I’m a traditional TDD methodologist (for lack of a better term) and I’m building up my code base with test-first mentality then it’s all about the code. The automated tests are used for documentation of things like requirements, user stories, etc. Agile folk try to avoid documents like conceptual models, our conceptional model is the code, it’s our classes. I don’t need another conceptual modeler and I don’t need to have a modeler create new classes for me, I don’t need it to modify my classes, the classes I’ve defined for my application do exactly what they need to do.

As we build up our classes to reflect what the domain is, as we know it now, we eventually want to add the ability to persist those objects to a store of some sort. It’s at that point we begin to think of OR/M. But we want to keep that persistence separate from our abstractions, keeping true to the single responsibility principle and separation of concerns. All the trappings of persistence are abstracted somewhere else.
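
To make “persistence ignorance” concrete, here is a minimal sketch (the Policy and IPolicyRepository names are hypothetical): the domain class carries pure business logic with no persistence baggage, and storage hides behind an abstraction that NHibernate, or anything else, can implement elsewhere.

```csharp
// A persistence-ignorant domain class: plain C#, no mandatory base class,
// no mapping attributes, no reference to any data-access infrastructure.
public class Policy
{
    public string Number { get; set; }
    public decimal Premium { get; set; }

    // Pure business logic, trivially unit-testable with no database in sight.
    public void ApplySurcharge(decimal rate)
    {
        Premium += Premium * rate;
    }
}

// Persistence lives behind an abstraction the domain never depends on.
public interface IPolicyRepository
{
    Policy FindByNumber(string number);
    void Save(Policy policy);
}
```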

Just my opinionated opinion… I have a gut feeling that EF is sending ADO.NET back to the 1.1 days of Typed DataSets, which were mere replicas of the database schema…

Ignorance is bliss, and in this case persistence ignorance is a bliss I won’t give up.

RIP Geocities

http://geocities.yahoo.com/
http://www.cnn.com/2009/TECH/10/26/geocities.closing/index.html

I had my first free sites posted there back in the late 90s…

This is how the web used to look back then :-p

http://web.archive.org/web/19961022173245/http://www.geocities.com/

Debunking the duct tape programmer

The nasty truth about misapplying duct-tape solutions in serious software development is that the duct-tape solution ends up creating unnecessary additional complexity, because it doesn’t address the whole problem, just the symptoms. This isn’t unique to software development, but if duct-tape solutions are used to achieve short-term gains, then future solutions are built on a foundation of duct tape instead of on a sound organizational method.

For more read: Debunking the duct tape programmer discussion on CodeProject.com

It is the job of the architect and the team leads to ensure that solutions are not built with a duct-tape/patch programming approach, and that there is design and long-term planning associated with every project.

I’ve seen too many solutions that fail on deployment due to data center constraints the developer was not aware of. Where were the architects there?

UX advice I got from an expert…

1. Give users multiple ways to achieve their goals

2. Give users all the functions they need to achieve the task at hand.

3. Be consistent in navigation, screen layout, and interaction design.

4. Use ‘widgets’ correctly. Incorrect use of widgets confuses people.

5. Assume users have no memory. They shouldn’t need to remember things about function or navigation.

6. Users are like Bears. Never surprise a user.

7. Users should never need to figure out how something works – the function of an element should be obvious from its visual design.

8. Don’t crowd screens. Just because it fits doesn’t make it OK. White space = good.

9. Alignment and formatting matter. Make sure elements are presented neatly and professionally. Group similar things on the screen.

10. Iterate your designs. The more you test-analyze-modify, the better your software will be.

Thanks to Rob Lokinger from CAI Canada.

Composing web applications with the ASP.NET frameworks on the market… MVC 2 Areas vs. MVC with MEF

In my previous post I ranted about my team’s need for a plug-in implementation for a web portal. I’m sure this requirement exists on almost every team that develops a web portal.

We were paying close attention to the ASP.NET MVC framework in conjunction with MEF:

MEF and ASP.NET MVC sample

and we see great potential here. The only drawback is that plug-ins will run as part of the same application domain (AppDomain), AFAIK. This might not be a drawback if your plug-ins do not need to be hosted on a different service or domain.
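
To give a flavor of the approach, here is a minimal sketch of MEF-style plug-in discovery. The IModule contract and type names are hypothetical; the linked sample goes further and wires MEF into MVC’s controller factory.

```csharp
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;

// Hypothetical plug-in contract; each module wires up its own routes, views, etc.
public interface IModule
{
    void Initialize();
}

// A plug-in assembly exports its module...
[Export(typeof(IModule))]
public class ReportsModule : IModule
{
    public void Initialize() { /* register routes, menu entries, etc. */ }
}

public static class PluginLoader
{
    // ...and the host discovers whatever assemblies are dropped into a folder.
    public static void LoadAll(string pluginPath)
    {
        var catalog = new DirectoryCatalog(pluginPath);
        var container = new CompositionContainer(catalog);
        foreach (IModule module in container.GetExportedValues<IModule>())
        {
            module.Initialize();
        }
    }
}
```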

We took a close look at Areas in the recently released ASP.NET MVC 2 Preview 1, but the concept of Areas is mostly about organizing big web projects, not about developing modules as plug-ins.

Here is the relevant excerpt from this blog post by Haacked:

… right now, Areas isn’t intended to address the “pluginnable” architecture. We started going down that route but it raises a lot of challenging questions such as where does the database go and how do you handle communication via areas and how do you administrate plugins?

I’m not sure that the core framework at this time is the place to put these concerns. I see this as something that might be better left to a higher level framework on top of ours, much in the way that DotNetNuke or Drupal are hosts for plugins.

However, I’m not closing the door on this, but I think you’ll see we’ll take a very iterative incremental approach. Right now, Areas is focused on helping teams manage the complexity of an application. It’s a small step. We’ll be looking at application composition as we move forward.

@Peter, For the security boundary question, at the end, it’s still just one application. Everything is merged into the same app running in the same AppDomain. So they share the same security boundary in that regard. Of course, you can use AuthorizeAttribute and other means to create security boundaries around areas should you choose.

I’m not sure my team would like to go for DotNetNuke unless it is rewritten in MVC; this is only my personal opinion, and I would have to POC further before reaching a conclusion.

MEF so far looks like a good candidate…

Can cross-domain communication solve my need for a true composite web application, or is SSO the best way to go?

I must confess that the lack of posts on my blog is mainly due to a very silly reason: I started playing FarmVille on Facebook with my friends, and it reminded me of my favorite game, SimCity.

What I like most about the game is how it runs in an iframe, is able to send requests to all my FB friends, and keeps the FB session alive even though I haven’t clicked outside of the FarmVille iframe in a long time. It’s not as sophisticated as SimCity, but let’s wait till it evolves…

A few months ago I was evaluating web frameworks against several criteria. The main one was the ability to extend the web application without recompiling and redeploying the existing modules.

There are two main concepts for applications in general that save a great deal of time in testing and deployment (and in maintenance in general):

Composite: composition strategies determine how you manage component dependencies and the interactions between components.

Modularity: designing a system that is divided into a set of functional units (named modules) that can be composed into a larger application. A module represents a set of related concerns. It can include components, such as views or business logic, and pieces of infrastructure, such as services for logging or authenticating users. Modules are independent of one another but can communicate with each other in a loosely coupled fashion.

The following are specific guidelines for developing a modular system (a sketch follows the list):

  • Modules should be opaque to the rest of the system and initialized through a well-known interface.
  • Modules should not directly reference one another or the application that loaded them.
  • Modules should use services to communicate with the application or with other modules.
  • Modules should not be responsible for managing their dependencies. These dependencies should be provided externally, for example, through dependency injection.
  • Modules should not rely on static methods that can inhibit testability.
  • Modules should support being added and removed from the system in a pluggable fashion.
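
As a sketch of what those guidelines might look like in code (all contracts and names here are hypothetical):

```csharp
using System;

// Hypothetical contracts, for illustration only.
public interface IModule { void Initialize(); }
public interface ILogger { void Log(string message); }
public interface IEventBus { void Subscribe(string topic, Action<string> handler); }

// Initialized through a well-known interface, communicating only through
// services, with no direct references to other modules or to the host.
public class BillingModule : IModule
{
    private readonly ILogger logger;
    private readonly IEventBus eventBus;

    // Dependencies are provided externally (e.g. by a DI container);
    // the module never constructs or locates them itself.
    public BillingModule(ILogger logger, IEventBus eventBus)
    {
        this.logger = logger;
        this.eventBus = eventBus;
    }

    public void Initialize()
    {
        // Loosely coupled communication with the rest of the system.
        eventBus.Subscribe("InvoiceCreated", msg => logger.Log("Billing saw: " + msg));
    }
}
```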

In an old post I was complaining about how, in two of the ASP.NET frameworks I was evaluating, I couldn’t separate the modules that had a UI component and deploy them independently of the main application. I was mainly talking about WCSF and Spring.NET.

In WCSF, even though modules can be groups of MVP triads, the main web project still has to be modified each time a new module is added to the application, as it contains the shared UI elements (aspx pages or ascx controls). When I tried the WCSF Modularity QuickStart, I realized that all of the web pages (aspx) are kept in the main web project, and the rest of the projects are DLLs without any visual part, holding only the interfaces to the views stored in the main web project (IViews.cs). To me, that couples the main web project to each module… It is not true modularity. See the description of the QuickStart here.

Spring.NET, like any DI framework, allows you to replace your BLL or DAL objects via configuration files as long as they implement the same contract (the same interface), but there is no concept of a module and no way to “plug in” a module containing aspx pages.

I spent some time looking into the Facebook API; after all, they do have a production implementation that complies with all the bullet points under Modularity (see above). During this read-and-research I came across two very good articles on cross-domain communication:

Secure Cross-Domain Communication in the Browser by Danny Thorpe
and
Cross Domain Communication with IFrames on this blog.

Why jump into cross-domain communication? Well, if I want to compose an application out of different applications running in iframes, I’d better read up on cross-domain communication…

I decided to expose my ignorance and ask Danny Thorpe for advice; what the hell, he worked on the VCL, Windows Live, CoolIris <... and keep inserting cool projects here...>, so he must know what he’s doing.

I emailed him on FB:

I do like the idea of FB, where an application hosted on a different domain can interact with the Facebook user: Facebook sends the auth key, session key, and signature, and the application calls the Facebook servers to perform actions for that user, whether getting the list of friends, sending notifications, uploading pictures, etc. All these actions take place on the application’s servers, making requests to the FB servers and updating the application page in the iframe running on Facebook.

I’ve been trying to move away from the idea, mostly because an enterprise application would need to share context (session) between its different modules, and transactions spanning several modules might be too difficult.

I’m basing my devil’s-advocate argument on your article, trying to show that Facebook applications can indeed leave the user’s data on Facebook’s servers in an inconsistent state, due to the different sessions that are not shared.

and Danny Thorpe was kind enough to reply:


Facebook does offer server to server data sharing as you describe, but most Facebook applications actually use client-side cross domain communications using an independent implementation of the iframe cross-domain channel I described in the article. I haven’t studied the Facebook api closely, but I have been informed by folks who have that there is some client-side cross domain communications going on in there.

I agree that sharing user context between modules in different domains in an enterprise application is a scary prospect. I don’t think the issue is with maintaining consistent state as with simple security and privacy issues.

There are other ways to allow multiple applications/domains to access a shared user resource. Delegated authorization is one technique sprouting up from several vendors, including Microsoft. The idea is that app A can be granted access to the user data on server B by the end user without disclosing the user’s login credentials. Server B issues a unique opaque token to App A representing the permissions granted by the end user. The token is unique to the user and application but does not disclose the user’s identity. App A stores the token and can then make requests of server B with that token attached, and server B will look up the token to see what permissions it grants and to see if it has been revoked by the user. The user can revoke the token independently of App A at any time.

Delegated authorization gives the appearance of single-sign-on even across disparate services with incompatible authorization domains. The user has control of granting and revoking access to their information. Multiple applications can connect to the shared resource to monitor state changes, avoiding the issue of inconsistent state.
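
So I don’t forget the mechanics, here is a toy sketch of the flow Danny describes; every name in it is hypothetical. “Server B” issues an opaque token for the permissions the user granted to “App A”, checks it on each request, and lets the user revoke it at any time:

```csharp
using System;
using System.Collections.Generic;

public class TokenGrant
{
    public string Token;                 // opaque; discloses nothing about the user
    public HashSet<string> Permissions;  // what the end user granted
    public bool Revoked;                 // the user can flip this at any time
}

public class ResourceServer // plays the role of "server B"
{
    private readonly Dictionary<string, TokenGrant> grants =
        new Dictionary<string, TokenGrant>();

    // Called when the end user approves App A's access request.
    public string IssueToken(IEnumerable<string> permissions)
    {
        var grant = new TokenGrant
        {
            Token = Guid.NewGuid().ToString("N"),
            Permissions = new HashSet<string>(permissions)
        };
        grants[grant.Token] = grant;
        return grant.Token;
    }

    // Called on each request App A makes with its token attached.
    public bool Authorize(string token, string permission)
    {
        TokenGrant grant;
        return grants.TryGetValue(token, out grant)
            && !grant.Revoked
            && grant.Permissions.Contains(permission);
    }

    // The user can revoke access independently of App A.
    public void Revoke(string token)
    {
        TokenGrant grant;
        if (grants.TryGetValue(token, out grant))
            grant.Revoked = true;
    }
}
```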

In order to keep the look and feel consistent across the different applications that compose the portal, I would have to share a site map, a single user profile, and the CSS among them.

Now I’m facing a fork in the path:

  • iframes with cross-domain communication to compose a portal out of modules
  • SSO with a shared SiteMap and Profile API, along with consistent design and CSS, to glue the different modules together.

more ramblings to follow…

L.

The concept of test beds, or “velcros”, for software modules…

Last week I attended a talk by Mario Cardinal at MSN Canada. The talk was part of the Toronto Architecture User Group.

The presentation had quite a few valid points, along with the concept of a “velcro”, or test bed, for modules. Test beds are a very familiar concept in electronics: in order to test hard drives or DVD drives, manufacturers create test beds rather than testing the parts inside a computer. The same line of reasoning applies to software.
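
As I understood it, the idea looks roughly like this in code (all names hypothetical): the module under test snaps onto a stub harness instead of onto the full application.

```csharp
// The module under test depends only on a contract, not on the real system.
public interface IRateService
{
    decimal GetRate(string productCode);
}

public class QuoteModule
{
    private readonly IRateService rates;
    public QuoteModule(IRateService rates) { this.rates = rates; }

    public decimal Quote(string productCode, int units)
    {
        return rates.GetRate(productCode) * units;
    }
}

// The stub is the test bed's fixed harness, returning predictable values.
public class FixedRateStub : IRateService
{
    public decimal GetRate(string productCode) { return 10m; }
}

// In a unit test:
//   var module = new QuoteModule(new FixedRateStub());
//   Assert.AreEqual(30m, module.Quote("ABC", 3));
```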

Here’s a Tech Ed 2009 presentation by Mario Cardinal.
Download Video

I look forward to seeing the code from this presentation on CodePlex, and to what Mario Cardinal blogs after the Alt.NET conference in Vancouver: whether the same approach can be achieved with mocking frameworks.

Today (June 15th) I found the code for the Velcro project on codeplex.com. I hope it helps you evaluate the concept: http://velcro.codeplex.com/

Happy coding!

Being tech support for the fam…

I like helping my family whenever I can. Two weekends ago I set up the wireless network at my aunt’s, and last weekend my dad had a problem with a piece of software he installed.
He needed the Vademecum application to do some research on prescription drugs (Dad is an endocrinologist and draws comics in his free time).
Vademecum is a Java-based desktop application, and for some reason, after Dad installed it, it didn’t work. Dad’s desktop runs Windows Vista. We also had another problem: Dad is based in Spain while I’m in Toronto.

We went back and forth via email to figure out what was going on, but explaining over email can be exhausting…

Dad ended up sending me this :-p out of his frustration trying to find Windows Explorer.

The caption reads “This is what comes out when I press the flag key and the E key on my keyboard. Where is the explorer?”. Dad sent me a screenshot with Vista’s Windows Explorer and an explorer :)…

I finally came across the TeamViewer application. This app is free, unlike GoToMyPC, which costs about $29.99 a month, and it allows you to share the desktop as long as both parties have the TeamViewer client…

I hope it helps you too, in case you’re being tech support for the fam 🙂

Have a great weekend!!!