What is XML and OCR
March 31, 2007 4:27pm CST
Can someone explain to me what XML and OCR are? These terms are used in the electronic storage of scanned forms. How do they help with documentation and storage? Is OCR a reliable way of converting scanned forms into text?
31 Mar 07
XML stands for eXtensible Markup Language. It is a way of storing data in a specially structured text file. An XML file looks a lot like an HTML file, but instead of formatting tags such as bold and H1 that you find in HTML, it uses tags named after the pieces of data they hold. XML files can be validated by computer programs against a schema that describes what each tag may contain, to make sure the data in the file is what's supposed to be there; for example, that the content of a 'Date' tag is actually a date.

OCR stands for Optical Character Recognition. It is a software technique that processes an image of a document, such as a scanned form, and generates text from it. How reliable it is depends on the quality of the original document, the fonts used (for example, 'fancy' fonts and handwriting are often recognised poorly), the print size, and so on.

An XML file can be used to store the text extracted from a document by OCR along with other data about that text, such as the title and the date and time of scanning. Hope this helps.
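To illustrate storing OCR text in XML along with details such as the title and scan date, here is a minimal Python sketch. It assumes the OCR step has already produced a plain-text string; running an OCR engine is not shown. The element names (scannedForm, title, scanDate, scanTime, text) and the functions build_scan_record and check_scan_record are hypothetical, made up for this example. Real validation would normally use an XML Schema and a validating parser; Python's built-in ElementTree does not validate against schemas, so check_scan_record is only a hand-written stand-in that checks the required tags exist and that scanDate holds a real date.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime


def build_scan_record(title, scanned_at, ocr_text):
    """Wrap OCR output plus its metadata in a small XML document.

    The element names used here are hypothetical, chosen for illustration.
    """
    root = ET.Element("scannedForm")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "scanDate").text = scanned_at.date().isoformat()
    ET.SubElement(root, "scanTime").text = scanned_at.time().isoformat(timespec="seconds")
    ET.SubElement(root, "text").text = ocr_text  # text assumed to come from an OCR step
    return ET.tostring(root, encoding="unicode")


def check_scan_record(xml_string):
    """Hypothetical stand-in for schema validation (not a real XSD check).

    Returns True only if the required tags are present and scanDate
    contains a valid ISO date such as 2007-03-31.
    """
    root = ET.fromstring(xml_string)
    for tag in ("title", "scanDate", "text"):
        if root.find(tag) is None:
            return False
    try:
        date.fromisoformat(root.findtext("scanDate"))
    except (TypeError, ValueError):
        return False
    return True


if __name__ == "__main__":
    record = build_scan_record(
        "Application form", datetime(2007, 3, 31, 16, 27), "Name: J. Smith"
    )
    print(record)
    print(check_scan_record(record))
```

Because the record is plain structured text, the same file can be searched by its text and sorted or filtered by its metadata; a record whose scanDate were "2007-02-30" would fail the date check.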