python - How to remove one of the two duplicate blocks in a file?


I have a difficult problem. I know there are many 're' masters in Python out there, so please help me. I have a huge log file. The format is something like this:

    [text hello world yadda
               lines lines lines
               exceptions]
    [something i'm not interested in]
    [text hello world yadda
               lines lines lines
               exceptions]

and so on. Blocks 1 and 3 are the same, and there are multiple cases like this. My question is: how can I read this file and write only the unique blocks to an output file? If there is a duplicate, it should be written only once. Also, there can be multiple blocks in between two duplicate blocks. I'm doing pattern matching, and this is the code as of now. It matches the pattern but doesn't handle duplicates.

    import re
    import sys
    from itertools import islice

    try:
        if len(sys.argv) != 3:
            sys.exit("You should enter 3 parameters.")
        elif sys.argv[1] == sys.argv[2]:
            sys.exit("The two file names cannot be the same.")
        else:
            file = open(sys.argv[1], "r")
            file1 = open(sys.argv[2], "w")
            java_regex = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I)  # java
            at_regex = re.compile(r'at\s', re.I)  # at
            copy = False  # flag that controls whether to copy to the output
            for line in file:
                if re.search(java_regex, line) and not (
                        re.search(r'at\s', line, re.I)
                        or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadpooltaskexecutor|caused\sby', line, re.I)):
                    # start copying if "java" is in the input
                    copy = True
                else:
                    if copy and not re.search(at_regex, line):
                        # stop copying if "at" is not in the input
                        copy = False
                if copy:
                    file1.write(line)
            file.close()
            file1.close()
    except IOError:
        sys.exit("IO error or wrong file name.")
    except IndexError:
        sys.exit('\nYou must enter 3 parameters.')  # prevents fewer than the 3 mandatory inputs
    except SystemExit as e:  # handles the exception raised by sys.exit()
        sys.exit(e)

I don't care whether this (removing duplicates) has to be in the same code. It can be in a separate .py file as well; it doesn't matter. This is the original snippet of the log file:

    javax.xml.ws.soap.soapfaultexception: uncaught bpel fault http://schemas.xmlsoap.org/soap/envelope/:server
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.createsystemexception(methodmarshallerutils.java:1326) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.demarshalfaultresponse(methodmarshallerutils.java:1052) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.doclitbaremethodmarshaller.demarshalfaultresponse(doclitbaremethodmarshaller.java:415) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.getfaultresponse(jaxwsproxyhandler.java:597) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.createresponse(jaxwsproxyhandler.java:537) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.invokeseimethod(jaxwsproxyhandler.java:403) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.invoke(jaxwsproxyhandler.java:188) ~[org.apache.axis2.jar:na]
    com.hcentive.utils.exception.hcruntimeexception: unable to find user profile:null
        at com.hcentive.agent.service.agentserviceimpl.getagentbyuserprofile(agentserviceimpl.java:275) ~[agent-service-core-4.0.0.jar:na]
        at com.hcentive.agent.service.agentserviceimpl$$fastclassbycglib$$e3caddab.invoke(<generated>) ~[cglib-2.2.jar:na]
        at net.sf.cglib.proxy.methodproxy.invoke(methodproxy.java:191) ~[cglib-2.2.jar:na]
        at org.springframework.aop.framework.cglib2aopproxy$cglibmethodinvocation.invokejoinpoint(cglib2aopproxy.java:689) ~[spring-aop-3.1.2.release.jar:3.1.2.release]
        at org.springframework.aop.framework.reflectivemethodinvocation.proceed(reflectivemethodinvocation.java:150) ~[spring-aop-3.1.2.release.jar:3.1.2.release]
        at org.springframework.transaction.interceptor.transactioninterceptor.invoke(transactioninterceptor.java:110) ~[spring-tx-3.1.2.release.jar:3.1.2.release]
        at org.springframework.aop.framework.reflectivemethodinvocation.proceed(reflectivemethodinvocation.java:172) ~[spring-aop-3.1.2.release.jar:3.1.2.release]
        at org.springframework.security.access.intercept.aopalliance.methodsecurityinterceptor.invoke(methodsecurityinterceptor.java:64) ~[spring-security-core-3.1.2.release.jar:3.1.2.release]
    javax.xml.ws.soap.soapfaultexception: uncaught bpel fault http://schemas.xmlsoap.org/soap/envelope/:server
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.createsystemexception(methodmarshallerutils.java:1326) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.demarshalfaultresponse(methodmarshallerutils.java:1052) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.doclitbaremethodmarshaller.demarshalfaultresponse(doclitbaremethodmarshaller.java:415) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.getfaultresponse(jaxwsproxyhandler.java:597) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.createresponse(jaxwsproxyhandler.java:537) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.invokeseimethod(jaxwsproxyhandler.java:403) ~[org.apache.axis2.jar:na]

and so on and so on....

You can remove duplicate blocks like this:

    import re

    yourstr = r'''
    [text hello world yadda
           lines lines lines
           exceptions]
    [something i'm not interested in]
    [text hello world yadda
           lines lines lines
           exceptions]
    '''
    # Match a bracketed block only if the same block appears again later.
    pat = re.compile(r'\[([^]]+])(?=.*\[\1)', re.DOTALL)
    # Replace every such earlier copy with an empty string.
    result = pat.sub('', yourstr)
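To run the same substitution on a log file rather than a string literal, something like the following might work. It is only a minimal sketch: the helper name `dedupe_file` is hypothetical, and it assumes that blocks really are delimited by square brackets as in the example and that the whole file fits in memory.

```python
import re

# Same pattern as in the answer: a bracketed block that reappears later in the text.
PAT = re.compile(r'\[([^]]+])(?=.*\[\1)', re.DOTALL)


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping blocks that occur again later.

    Hypothetical helper, not part of the original answer. It reads the
    whole file at once, so it assumes the log fits in memory.
    """
    with open(src_path, "r") as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(PAT.sub('', text))
```

Because the lookahead rescans the remainder of the text for every block it tries, this may become slow on a very large log.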

Note that the last block is the one preserved. If you want to keep the first one instead, you must reverse the string and use this pattern:

 (][^[]+)\[(?=.*\1\[) 

and then reverse the string again.
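The reverse, substitute, reverse steps can be sketched as follows. This is an illustration only: the helper name `keep_first` and the sample string are assumptions, and the brackets in the pattern are escaped for readability, which should not change what it matches.

```python
import re

# Pattern for the reversed text: read backwards, a block looks like "]...[".
REV_PAT = re.compile(r'(\][^\[]+)\[(?=.*\1\[)', re.DOTALL)


def keep_first(text):
    """Remove repeated bracketed blocks, keeping the first occurrence.

    Hypothetical helper illustrating the reverse trick.
    """
    reversed_text = text[::-1]                # reverse the string
    cleaned = REV_PAT.sub('', reversed_text)  # drop copies that recur later in the reversed text
    return cleaned[::-1]                      # reverse back to the original order


sample = "[a\nb]\n[x]\n[a\nb]\n"
print(repr(keep_first(sample)))
```

In the reversed text the "later" copy is the one that came first in the original, which is why the substitution ends up keeping the first block.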

