python - How to remove one of the two duplicate blocks in a file?
I have a difficult problem. I know there are many 're' masters in Python out there, so please help me. I have a huge log file. The format is something like this:
    [text hello world yadda lines lines lines exceptions]
    [something i'm not interested in]
    [text hello world yadda lines lines lines exceptions]
and so on... Blocks 1 and 3 are the same, and there are multiple cases like this. My question is: how can I read this file and write only the unique blocks to an output file? If there's a duplicate, it should be written only once. Sometimes there are multiple blocks in between two duplicate blocks. I'm doing pattern matching, and this is my code as of now. It matches the pattern but doesn't handle duplicates.

    import re
    import sys
    from itertools import islice

    try:
        if len(sys.argv) != 3:
            sys.exit("You should enter 3 parameters.")
        elif sys.argv[1] == sys.argv[2]:
            sys.exit("The 2 file names cannot be the same.")
        else:
            file = open(sys.argv[1], "r")
            file1 = open(sys.argv[2], "w")

            java_regex = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I)  # java
            at_regex = re.compile(r'at\s', re.I)  # at

            copy = False  # flag to control whether the line is copied to the output

            for line in file:
                if re.search(java_regex, line) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadpooltaskexecutor|caused\sby', line, re.I)):
                    # start copying if "java" is in the input
                    copy = True
                else:
                    if copy and not re.search(at_regex, line):
                        # stop copying if "at" is not in the input
                        copy = False

                if copy:
                    file1.write(line)

            file.close()
            file1.close()

    except IOError:
        sys.exit("IO error or wrong file name.")
    except IndexError:
        sys.exit('\nYou must enter 3 parameters.')  # prevents fewer than 3 inputs, which are mandatory
    except SystemExit as e:  # handles the exception raised by sys.exit()
        sys.exit(e)

I don't care whether the duplicate removal has to be done in this code. It can be in a separate .py file as well; it doesn't matter. This is the original snippet of the log file:

    javax.xml.ws.soap.soapfaultexception: uncaught bpel fault http://schemas.xmlsoap.org/soap/envelope/:server
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.createsystemexception(methodmarshallerutils.java:1326) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.demarshalfaultresponse(methodmarshallerutils.java:1052) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.doclitbaremethodmarshaller.demarshalfaultresponse(doclitbaremethodmarshaller.java:415) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.getfaultresponse(jaxwsproxyhandler.java:597) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.createresponse(jaxwsproxyhandler.java:537) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.invokeseimethod(jaxwsproxyhandler.java:403) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.invoke(jaxwsproxyhandler.java:188) ~[org.apache.axis2.jar:na]
    com.hcentive.utils.exception.hcruntimeexception: unable find user profile:null
        at com.hcentive.agent.service.agentserviceimpl.getagentbyuserprofile(agentserviceimpl.java:275) ~[agent-service-core-4.0.0.jar:na]
        at com.hcentive.agent.service.agentserviceimpl$$fastclassbycglib$$e3caddab.invoke(<generated>) ~[cglib-2.2.jar:na]
        at net.sf.cglib.proxy.methodproxy.invoke(methodproxy.java:191) ~[cglib-2.2.jar:na]
        at org.springframework.aop.framework.cglib2aopproxy$cglibmethodinvocation.invokejoinpoint(cglib2aopproxy.java:689) ~[spring-aop-3.1.2.release.jar:3.1.2.release]
        at org.springframework.aop.framework.reflectivemethodinvocation.proceed(reflectivemethodinvocation.java:150) ~[spring-aop-3.1.2.release.jar:3.1.2.release]
        at org.springframework.transaction.interceptor.transactioninterceptor.invoke(transactioninterceptor.java:110) ~[spring-tx-3.1.2.release.jar:3.1.2.release]
        at org.springframework.aop.framework.reflectivemethodinvocation.proceed(reflectivemethodinvocation.java:172) ~[spring-aop-3.1.2.release.jar:3.1.2.release]
        at org.springframework.security.access.intercept.aopalliance.methodsecurityinterceptor.invoke(methodsecurityinterceptor.java:64) ~[spring-security-core-3.1.2.release.jar:3.1.2.release]
    javax.xml.ws.soap.soapfaultexception: uncaught bpel fault http://schemas.xmlsoap.org/soap/envelope/:server
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.createsystemexception(methodmarshallerutils.java:1326) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.methodmarshallerutils.demarshalfaultresponse(methodmarshallerutils.java:1052) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.marshaller.impl.alt.doclitbaremethodmarshaller.demarshalfaultresponse(doclitbaremethodmarshaller.java:415) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.getfaultresponse(jaxwsproxyhandler.java:597) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.createresponse(jaxwsproxyhandler.java:537) ~[org.apache.axis2.jar:na]
        at org.apache.axis2.jaxws.client.proxy.jaxwsproxyhandler.invokeseimethod(jaxwsproxyhandler.java:403) ~[org.apache.axis2.jar:na]

and so on and so on....
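
For a log shaped like the snippet above, one regex-light option is to group lines into traces (a header line followed by its "at" frames) and remember which traces were already written. This is only a sketch: it assumes the input contains nothing but such traces (for example, the output of the filtering script), and the names `split_traces`, `unique_traces` and the sample lines are made up for illustration.

```python
import re

# Assumption: a stack frame is a line starting with optional whitespace and "at ".
FRAME = re.compile(r'\s*at\s')


def split_traces(lines):
    """Group lines into traces: one header line plus the frame lines after it."""
    current = []
    for line in lines:
        # A non-frame line starts a new trace, so emit the one collected so far.
        if current and not FRAME.match(line):
            yield ''.join(current)
            current = []
        current.append(line)
    if current:
        yield ''.join(current)


def unique_traces(lines):
    """Yield each distinct trace once, in the order it first appears."""
    seen = set()
    for trace in split_traces(lines):
        if trace not in seen:
            seen.add(trace)
            yield trace


# Hypothetical, shortened sample lines (not taken from the real log).
sample = [
    "ExceptionA: first fault\n",
    "\tat pkg.Foo.run(Foo.java:1)\n",
    "ExceptionB: second fault\n",
    "\tat pkg.Bar.run(Bar.java:2)\n",
    "ExceptionA: first fault\n",
    "\tat pkg.Foo.run(Foo.java:1)\n",
]
print(''.join(unique_traces(sample)), end='')
```

Because both functions are generators, an open file object can be passed in place of `sample` and the result handed to `writelines` on the output file, so the log is read line by line rather than loaded whole. Only the set of distinct traces is kept in memory.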
You can remove duplicate blocks like this:

    import re

    yourstr = r'''
    [text hello world yadda lines lines lines exceptions]
    [something i'm not interested in]
    [text hello world yadda lines lines lines exceptions]
    '''

    # Match a [...] block only if the same block appears again later in the string.
    pat = re.compile(r'\[([^]]+])(?=.*\[\1)', re.DOTALL)
    result = pat.sub('', yourstr)

Note that the last copy of each block is the one preserved. If you want to keep the first one instead, you must reverse the string and use this pattern:
(][^[]+)\[(?=.*\1\[)
and then reverse the string again.
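
The reverse, substitute, reverse steps can be sketched as follows. The helper name `keep_first_blocks` and the sample text are illustrative assumptions; the pattern is the one given above, compiled with `re.DOTALL` so the lookahead can cross line breaks.

```python
import re

# Pattern from the answer; it is meant to run on the reversed text,
# where a block "[abc]" appears as "]cba[".
REVERSED_DUPLICATE = re.compile(r'(][^[]+)\[(?=.*\1\[)', re.DOTALL)


def keep_first_blocks(text):
    """Remove repeated [...] blocks, keeping the first occurrence of each."""
    reversed_text = text[::-1]                            # step 1: reverse
    cleaned = REVERSED_DUPLICATE.sub('', reversed_text)   # step 2: drop duplicates
    return cleaned[::-1]                                  # step 3: reverse back


sample = "[alpha]\n[beta]\n[alpha]\n"
print(repr(keep_first_blocks(sample)))  # '[alpha]\n[beta]\n\n'
```

The line break that followed the removed block stays behind, so a final pass to collapse empty lines may be wanted. Both reversals copy the whole string, which is worth keeping in mind for a very large log.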