
Author: 归海一刀
Published: 2014/3/20 5:15:04
Solving XML encoding problems in VB, PHP, and Java [XML Tutorial]

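I recently worked on a project that stores application filings as XML, and the encoding problems gave me a real headache for a while. Everything is now standardized on UTF-8. Below is how to handle it in each language.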

  I use DOM for XML parsing here because it is simple.


  1 The client first fills in a form in a VB application, which generates an apply.xml file.


  In VB I use MSXML 4.0. If no encoding is specified, the file is saved as UTF-8 by default.


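' CreateDOM is a helper that is not shown here; presumably it returns an MSXML 4.0 DOM document.
Set dom = CreateDOM
' The XML declaration carries no encoding attribute, so the file is saved as UTF-8.
Set node = dom.createProcessingInstruction("xml", "version='1.0'")
dom.appendChild node
Set node = Nothing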
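
For comparison, here is a minimal sketch of the same step in Java, the language of step 5. It is not part of the original project: the class name Utf8XmlWriter, the method buildApplyXml, and the element names apply and xm are assumptions made for illustration. It builds a small DOM document and serializes it to UTF-8 bytes with the standard JAXP Transformer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class Utf8XmlWriter {
    /** Builds a tiny document and serializes it to UTF-8 bytes. */
    static byte[] buildApplyXml(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("apply"); // hypothetical root element
            Element xm = doc.createElement("xm");      // hypothetical field element
            xm.setTextContent(name);
            root.appendChild(xm);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Request UTF-8 output explicitly so the encoding does not depend on defaults.
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build the XML document", e);
        }
    }

    public static void main(String[] args) {
        // \u5f20\u4e09 is a two-character Chinese sample name.
        byte[] bytes = buildApplyXml("\u5f20\u4e09");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Writing to a byte stream rather than to a String keeps the encoding decision in one place: the bytes that leave the serializer are the bytes that get uploaded.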


  2 Next, the client uploads this XML file to the server through the web.


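  In PHP, the XML DOM extension supports only UTF-8 as its default encoding. The generated XML file can therefore be parsed directly after upload to extract some information.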


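// Parse the uploaded XML from memory; stop with an error page if it is not well-formed.
if (!$dom = domxml_open_mem($content)) {
    $t->assign('msg', "Failed to parse the file!");
    $t->render('noavailable.html', PAGE_TITLE, 'wrap.html');
    exit;
}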


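  Next, the file has to be stored in the database. The database is MS SQL Server, which does not support UTF-8 data, so I store the whole file as binary. What cost me a lot of time was working out how to store binary data. With MySQL you can store it directly without any conversion, but that does not work with MSSQL. The reason is: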


This is because the MSSQL parser makes a clear distinction between binary and character constants. You can therefore not easily insert binary data with "column = 'data'" syntax like in MySQL and others.

The MSSQL documentation states that binary constants should be represented by their unquoted hexadecimal byte-string. That is, to set the binary column "col" to contain the bytes 0x12, 0x65 and 0x35 you should do "col = 0x126535" in your query.


  The procedure is as follows:


// Read the uploaded file
$original = $_FILES['content']['name'];
if (!empty($original)) {
    if ($_FILES['content']['type'] == "text/xml") {
        $filename = $_FILES['content']['tmp_name'];
        // Open in binary mode so the UTF-8 bytes are read unchanged
        $handle = fopen($filename, "rb");
        $originalcontent = fread($handle, filesize($filename));

        fclose($handle);
    }
} // end if (!empty($original))


// This step is the key: convert the raw bytes into a hexadecimal string
$originalcontent = unpack("H*hex", $originalcontent);


$db->query("insert into ".TBL_SB_ONLINE_USER." (sb_id, user_id, username, sbmc, content, created_date) values ("
    .$newid.", "
    .$u.", "
    .$db->quote(stripslashes($name)).", "
    .$db->quote(stripslashes($sbmc)).", 0x"
    .$originalcontent['hex'].", " // note the 0x prefix, and no quotes around the hex string
    ."'now')");
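
The hex conversion itself is language independent. As a rough sketch in Java (the language of step 5), the helper below produces the same kind of unquoted hexadecimal constant that unpack("H*") plus the 0x prefix yields in the PHP code. The class name HexLiteral and the method toHexLiteral are made up for this illustration and are not from the original project.

```java
import java.nio.charset.StandardCharsets;

final class HexLiteral {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Returns the bytes as an unquoted hexadecimal constant such as 0x126535. */
    static String toHexLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder(2 + data.length * 2).append("0x");
        for (byte b : data) {
            // High nibble first, then low nibble; the masks discard sign extension.
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three bytes from the quoted explanation above.
        System.out.println(toHexLiteral(new byte[] {0x12, 0x65, 0x35})); // prints 0x126535

        // A small UTF-8 encoded XML fragment, as it would be read from the upload.
        byte[] xml = "<a>\u4e2d</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHexLiteral(xml));
    }
}
```

In a Java application that talks to the database through JDBC, binding the bytes with PreparedStatement.setBytes would normally be preferable to building the literal by hand; the sketch only mirrors what the PHP code does.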


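  3 After the upload, the user can also edit the file online. For that, the file has to be read back from the database, restored as UTF-8, and parsed. Although we used unpack when writing, no reverse conversion is needed when reading: SQL Server turns the hexadecimal constant back into raw bytes on insert, so the column already holds the original UTF-8 data.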


$sb = $db->getRow('select sbmc, content from '.TBL_SB_ONLINE_USER." where sb_id = $sb_id");
$originalcontent = $sb['content'];


if (!$dom = domxml_open_mem($originalcontent)) {
    $t->assign('msg', "Failed to parse the file!");
    $t->render('noavailable.html', PAGE_TITLE, 'wrap.html', true);
    exit;
}

$context = xpath_new_context($dom);

$xpath = $context->xpath_eval("//material/xm");
// The DOM returns UTF-8; convert to GBK for the page
$t->assign('xm', iconv("UTF-8", "GBK", $xpath->nodeset[0]->get_content()));


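  A note on reading: only the Microsoft OLE DB Provider for SQL Server and the SQL Server ODBC driver automatically set @@TEXTSIZE to its maximum of 2 GB. For everything else the default is 4096 (4 KB), so when accessing the database from PHP, be sure to set the following in php.ini:
mssql.textlimit = 2147483647
mssql.textsize = 2147483647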


  4 The back end uses VB. To parse the file there, add the following code, which decodes a Byte() array of UTF-8 data into a VB string.


Public Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, _
    ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long


' Windows code page identifier for UTF-8
Public Const CP_UTF8 = 65001


Public Function UTF8_Decode(bUTF8() As Byte) As String
    Dim lRet As Long
    Dim lLen As Long
    Dim lBufferSize As Long
    Dim sBuffer As String
    Dim bBuffer() As Byte
    lLen = UBound(bUTF8) + 1
    If lLen = 0 Then Exit Function
    ' Generous buffer: the decoded string never has more characters than input bytes
    lBufferSize = lLen * 2
    sBuffer = String(lBufferSize, Chr(0))
    ' Returns the number of wide characters written, or 0 on failure
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bUTF8(0)), lLen, StrPtr(sBuffer), lBufferSize)
    If lRet <> 0 Then
        sBuffer = Left(sBuffer, lRet)
    End If
    UTF8_Decode = sBuffer
End Function


  The database is read as follows:


Dim varcontent() As Byte
varfilesize = mrc.Fields("content").ActualSize
varcontent = mrc.Fields("content").GetChunk(varfilesize)
content = UTF8_Decode(varcontent)


xmlDoc.async = False
xmlDoc.resolveExternals = False
xmlDoc.loadXML content
If (xmlDoc.parseError.errorCode <> 0) Then
    Dim myErr
    Set myErr = xmlDoc.parseError
    MsgBox "An error occurred: " & myErr.reason
Else
    xmlDoc.setProperty "SelectionLanguage", "XPath"
    ' ... continue with XPath queries on xmlDoc
End If


  5 On the back end, Java makes this even easier: read the data into a byte[] and convert it into a string using UTF-8.
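
A minimal sketch of that step, assuming the bytes have already been read from the binary column (for example with ResultSet.getBytes). The class name Utf8XmlReader, its methods, and the sample document are assumptions made for illustration; only the XPath expression //material/xm comes from the PHP code in step 3.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class Utf8XmlReader {
    /** Decodes the raw column bytes into a Java String, treating them as UTF-8. */
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Parses the raw bytes as XML and returns the string value of an XPath expression. */
    static String evaluate(byte[] raw, String expression) {
        try {
            // Parsing from the byte stream lets the parser apply the declared encoding,
            // which is UTF-8 when the declaration names none.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(raw));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("could not read the XML bytes", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the stored apply.xml content.
        byte[] raw = "<?xml version='1.0'?><apply><material><xm>\u5f20\u4e09</xm></material></apply>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(raw));
        System.out.println(evaluate(raw, "//material/xm"));
    }
}
```

Passing the charset explicitly matters here: new String(raw) without it falls back to the platform default, which on a Chinese Windows server of that era would typically be GBK rather than UTF-8.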


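  One last thing: PHP really is a very powerful scripting language. If you run into a problem during PHP development that is hard to solve and hard to find even with Google, go straight to the online documentation at php.net. Many helpful people post notes on their own experience there, and those notes are very useful.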






