Home » Server Options » Text & interMedia » extract information from an unstructured email into tables (RDBMS 11R2)
extract information from an unstructured email into tables [message #620976] Thu, 07 August 2014 10:28 Go to next message
ajnewbs
Messages: 8
Registered: June 2014
Location: UK
Junior Member
I am looking for help on something that may not even be possible but it is something I am sure someone must have tried before.

We want to save some manual data entry time on our support system, we want to be able to pick various items of information out of our customer emails and save the data in specific database fields which make up our support case - these emails are not structured in any way and therefore it seems an impossible task, but someone must be doing this somewhere and I don't know where to start looking - are there utilities already available or?
Re: extract information from an unstructured email into tables [message #620980 is a reply to message #620976] Thu, 07 August 2014 10:37 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
You posted two consecutive messages.

Anyway, with the information you posted, it is not clear about how do you handle the emails to read it's body content. Where are these emails residing? DB or email client inbox or web based email inbox? What process do you use for data entry? And finally, when you say "utilities", are you looking for a software for this?
Re: extract information from an unstructured email into tables [message #620983 is a reply to message #620980] Thu, 07 August 2014 10:43 Go to previous messageGo to next message
ajnewbs
Messages: 8
Registered: June 2014
Location: UK
Junior Member
Thank you for replying - I am just looking for general guidance on whether it is possible. Assume that we have already extracted the text from an email into a varchar2 field in the database - my use of email as an example was to illustrate the source of the text.

Basically we want to scan a load of text and identify the individual bits of information that we can load into our specific data fields, for example product , country, version, problem description, severity etc., etc....
Re: extract information from an unstructured email into tables [message #620984 is a reply to message #620976] Thu, 07 August 2014 10:43 Go to previous messageGo to next message
Michel Cadot
Messages: 68712
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator

To specify the question: "extract information from an unstructured email into tables", extract from what, extract what and into what tables (description)?

Re: extract information from an unstructured email into tables [message #620986 is a reply to message #620983] Thu, 07 August 2014 10:44 Go to previous messageGo to next message
Michel Cadot
Messages: 68712
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator
ajnewbs wrote on Thu, 07 August 2014 17:43
Thank you for replying - I am just looking for general guidance on whether it is possible. Assume that we have already extracted the text from an email into a varchar2 field in the database - my use of email as an example was to illustrate the source of the text.

Basically we want to scan a load of text and identify the individual bits of information that we can load into our specific data fields, for example product , country, version, problem description, severity etc., etc....


Good luck to write your mail analyzer but this has nothing to do with the database.


Re: extract information from an unstructured email into tables [message #620989 is a reply to message #620984] Thu, 07 August 2014 10:46 Go to previous messageGo to next message
ajnewbs
Messages: 8
Registered: June 2014
Location: UK
Junior Member
The question is a general one - to repeat what I posted above - I want to be able to identify the individual bits of information that we can load into our specific data fields, for example product , country, version, problem description, severity....and many more

The text supplied in the email could contain absolutely anything and what I need to do is determine whether there is anything useful contained in the text and extract the 'useful' parts into my database tables.
Re: extract information from an unstructured email into tables [message #620991 is a reply to message #620986] Thu, 07 August 2014 10:48 Go to previous messageGo to next message
ajnewbs
Messages: 8
Registered: June 2014
Location: UK
Junior Member
I was hoping there were some database utilities out there which could help - hence the SQL/PLSQL and RDBMS reference.
Re: extract information from an unstructured email into tables [message #621003 is a reply to message #620991] Thu, 07 August 2014 11:19 Go to previous messageGo to next message
Michel Cadot
Messages: 68712
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator

Your question are too broad for any tool.
You ask for a tool than can analyze any string for any field in any format than it can contain.
To have such tool, you must have at least some restrictions. For instance, a field can appear only once or twice... or only once per section (section to be defined)...
The tool needs to know what is a section, a field, a value...
If you can define that then you can write such tool but there is no built-in one.

Re: extract information from an unstructured email into tables [message #621005 is a reply to message #621003] Thu, 07 August 2014 11:30 Go to previous messageGo to next message
ajnewbs
Messages: 8
Registered: June 2014
Location: UK
Junior Member
ok, thank you.
Oracle are normally very good at providing utilities to help with challenges such as this. It looks like a Natural Language Toolkit / Gazetteer is the way to go.
Re: extract information from an unstructured email into tables [message #621006 is a reply to message #621005] Thu, 07 August 2014 11:35 Go to previous messageGo to next message
Michel Cadot
Messages: 68712
Registered: March 2007
Location: Saint-Maur, France, https...
Senior Member
Account Moderator

I will move this topic to "Oracle Text/ Context/ interMedia and Ultra Search questions.", I don't know these ones and maybe they have something close to what you want.

Re: extract information from an unstructured email into tables [message #621035 is a reply to message #621006] Thu, 07 August 2014 15:41 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9100
Registered: November 2002
Location: California, USA
Senior Member
You can do this with Oracle Text document classification:

http://docs.oracle.com/database/121/CCAPP/classify.htm#CCAPP0600

[Updated on: Thu, 07 August 2014 15:43]

Report message to a moderator

Re: extract information from an unstructured email into tables [message #621037 is a reply to message #621035] Thu, 07 August 2014 16:00 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9100
Registered: November 2002
Location: California, USA
Senior Member
On second thought, the ctx_entity package might be a better approach, as it can be used to actually extract the pieces, instead of just classifying them.

http://docs.oracle.com/database/121/CCREF/centit.htm#CCREF9753
Re: extract information from an unstructured email into tables [message #621038 is a reply to message #621037] Thu, 07 August 2014 17:00 Go to previous message
Barbara Boehmer
Messages: 9100
Registered: November 2002
Location: California, USA
Senior Member
Here is an extraction example to get you started.

SCOTT@orcl12c> -- table containing emails:
SCOTT@orcl12c> CREATE TABLE customer_emails
  2    (emailid     NUMBER,
  3  	email	    VARCHAR2(4000))
  4  /

Table created.

SCOTT@orcl12c> INSERT ALL
  2  INTO customer_emails VALUES
  3    (1, 'I have a minor problem with Oracle 12c in the USA.')
  4  INTO customer_emails VALUES
  5    (2, 'I am in the UK and my Microsoft smartscreen filter is causing a major problem.')
  6  SELECT * FROM DUAL
  7  /

2 rows created.

SCOTT@orcl12c> -- table of categories to load email data into:
SCOTT@orcl12c> CREATE TABLE email_categories
  2    (emailid     NUMBER,
  3  	product     VARCHAR2(15),
  4  	country     VARCHAR2(15),
  5  	version     VARCHAR2(15),
  6  	severity    VARCHAR2(15))
  7  /

Table created.

SCOTT@orcl12c> -- policy and rules you create to determine what data goes into what categories:
SCOTT@orcl12c> BEGIN
  2    CTX_ENTITY.CREATE_EXTRACT_POLICY ('email_pol');
  3    CTX_ENTITY.ADD_EXTRACT_RULE
  4  	 ('email_pol',
  5  	  1,
  6  	  '<rule>
  7  	     <expression>(Oracle|Microsoft)</expression>
  8  	     <type refid="1">product</type>
  9  	   </rule>');
 10    CTX_ENTITY.ADD_EXTRACT_RULE
 11  	 ('email_pol',
 12  	  2,
 13  	  '<rule>
 14  	     <expression>(UK|USA)</expression>
 15  	     <type refid="1">country</type>
 16  	   </rule>');
 17    CTX_ENTITY.ADD_EXTRACT_RULE
 18  	 ('email_pol',
 19  	  3,
 20  	  '<rule>
 21  	     <expression>(11g|12c)</expression>
 22  	     <type refid="1">xversion</type>
 23  	   </rule>');
 24    CTX_ENTITY.ADD_EXTRACT_RULE
 25  	 ('email_pol',
 26  	  4,
 27  	  '<rule>
 28  	     <expression>(major|minor)</expression>
 29  	     <type refid="1">xseverity</type>
 30  	   </rule>');
 31    CTX_ENTITY.COMPILE ('email_pol');
 32  END;
 33  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- parse email data and insert into categories:
SCOTT@orcl12c> DECLARE
  2    v_entities  CLOB;
  3  BEGIN
  4    FOR r IN (SELECT * FROM customer_emails) LOOP
  5  	 CTX_ENTITY.EXTRACT ('email_pol', r.email, NULL, v_entities);
  6  	 INSERT INTO email_categories (emailid, product, country, version, severity)
  7  	 SELECT r.emailid,
  8  		MAX (DECODE (foo.type, 'product',   foo.text)),
  9  		MAX (DECODE (foo.type, 'country',   foo.text)),
 10  		MAX (DECODE (foo.type, 'xversion',  foo.text)),
 11  		MAX (DECODE (foo.type, 'xseverity', foo.text))
 12  	 FROM	XMLTABLE
 13  		  ('/entities/entity'
 14  		   PASSING XMLTYPE (v_entities)
 15  		   COLUMNS
 16  		     offset  NUMBER	   PATH '@offset',
 17  		     lngth   NUMBER	   PATH '@length',
 18  		     text    VARCHAR2(50)  PATH 'text/text()',
 19  		     type    VARCHAR2(50)  PATH 'type/text()',
 20  		     source  VARCHAR2(50)  PATH '@source') AS foo
 21  	 GROUP	BY r.emailid;
 22    END LOOP;
 23  END;
 24  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- results:
SCOTT@orcl12c> SELECT * FROM email_categories
  2  /

   EMAILID PRODUCT         COUNTRY         VERSION         SEVERITY
---------- --------------- --------------- --------------- ---------------
         1 Oracle          USA             12c             minor
         2 Microsoft       UK                              major

2 rows selected.

Previous Topic: Find Words in a String
Next Topic: like performance than CATSEARCH Performance incase of wild character
Goto Forum:
  


Current Time: Tue Nov 12 19:12:36 CST 2024