ClayFF力场应用于SiO2材料的小脚本-Gromacs

最近要在Gromacs的模拟中用到ClayFF力场，主要是用在界面相互作用的固相材料上面，在模拟中这部分材料的原子大部分都是会被固定的，只有表面部分的羟基氢会被设定为放开，从而保证表面的相互作用，在实际应用的过程中，为了将参数文件合理的分配给每个原子，我没能找到现成的方案，无奈之下选择自己写个简单的脚本来实现这个功能。

Clayff.py脚本全内容

以下是我最终写完的脚本的整体内容，真写出来才发现这个也不是那么容易的，我这个脚本的局限性较大，目前来看只能应用与Si，O，H三种元素，而且成键信息仅包含ho-oh的，不过calyff本来也没有其它的成键信息了，所以这里还好，还有个问题就是有关O原子的分类，ClayFF包含了多种氧原子，我这里仅分辨了oh和ob两种，更多的我也不知道怎么识别，不过我看了一下，不同种类的氧原子也仅仅是电荷的差异，这个在我后面写的重新设定电荷的函数中被优化了。以下为代码的内容：

# -*- coding: utf-8 -*-
import re

def read_nonbonded_file(file_name):
  # 初始化一个空字典来存储参数
  clayff_nonbonded = {}

  with open(file_name, "r") as file:
    lines = file.readlines()

  # 正则表达式模式，用于匹配参数行
  pattern = re.compile(r'^\s*(\S+)\s+(\d+)\s+([\d.]+)\s+([\d.-]+)\s+(\S+)\s+([\d.E+-]+)\s+([\d.E+-]+)')

  for line in lines:
    match = pattern.match(line)
    if match:
      name = match.group(1)
      at_num = int(match.group(2))
      mass = float(match.group(3))
      charge = float(match.group(4))
      ptype = match.group(5)
      sigma = float(match.group(6))
      epsilon = float(match.group(7))    
      clayff_nonbonded[name] = [at_num, mass, charge, ptype, sigma, epsilon]

  return clayff_nonbonded

def read_pdb_file(file_name):
  atoms = []
  h_atoms= 0

  # 打开PDB文件并读取
  with open(file_name, "r") as file:
    for line in file:
      # 处理ATOM/HETATM行，用于存储原子信息
      if line.startswith(("ATOM", "HETATM")):
        index = int(line[6:11].strip())
        atom_name = line[12:16].strip()
        residue_name = line[17:20].strip()
        if not residue_name:
          residue_name = "MOL"
        chain_id = line[21].strip()
        residue_seq = int(line[22:26].strip())
        x = float(line[30:38].strip())
        y = float(line[38:46].strip())
        z = float(line[46:54].strip())

        if atom_name == "H":
          h_atoms+=1

        # 将原子信息存入列表，按index顺序
        atoms.append({
          "index": index,
          "name": atom_name,
          "residue": residue_name,
          "chain": chain_id,
          "residue_seq": residue_seq,
          "coords": (x, y, z)
        })

      # 处理CONECT行，用于存储连接信息
      elif line.startswith("CONECT"):
        indices = list(map(int, line.split()[1:]))
        atom_index = indices[0]
        bonded_atoms = indices[1:]

        atoms[atom_index - 1]["connect"] = bonded_atoms
  print("H原子的数量为：" + str(h_atoms))

  return atoms

def set_atom_type(atoms):

  h_indices = []
  o_indices = []

  for atom in atoms:
    if atom["name"] == "H":
      h_indices.append(atom["index"])
    elif atom["name"] == "O":
      o_indices.append(atom["index"])

  for atom in atoms:
    if atom["name"] == "SI" or atom["name"] == "Si":
      atom["type"] = "st"

    elif atom["name"] == "H":
      atom["type"] = "ho"

      if "connect" in atom:
        if len(atom["connect"]) > 1:

          print(str(atom) + "===WARNING=== H原子存在两个以上键")
        elif atoms[atom["connect"][0] - 1]["name"] != "O":
          print(str(atom) + "===WARNING=== H原子没有与O连接")

      else:
        print(str(atom) + "===WARNING=== H原子不存在任何键")

    elif atom["name"] == "O":

      if "connect" in atom:

        if len(atom["connect"]) > 2:
          print(str(atom) + "===WARNING=== O原子存在三个以上键")
        # 判定该原子是否与H原子连接，用判定两列表是否存在交集的方法
        intersection = set(atom["connect"]) & set(h_indices)
        if intersection:
          atom["type"] = "oh"
        else:
          # 还需要写判定其它氧原子类型的判定方法
          atom["type"] = "ob"
      else:
        atom["type"] = "ob"
        print(str(atom) + "===WARNING=== O原子不存在任何键")
        
  return atoms

def write_itp_file(file_name, atoms, bonds, moleculetype="Clay"):

  with open(file_name, "w") as file:
    file.write("[ moleculetype ]\n")
    file.write("; name          nrexcl\n")
    file.write(moleculetype+"       3\n\n")
    file.write("[ atoms ]\n")
    file.write(";  Index   type   residue  resname   atom        cgnr     charge       mass\n")

    # 定义电荷，求和
    total_charge = 0

    for atom in atoms:
      index_n = atom["index"]
      type_info = atom["type"]
      residue = "1"
      resname = atom["residue"]
      atom_name = atom["name"]
      cgnr = "1"

      charge = atom["charge"]

      total_charge += charge
      mass = atom["mass"]

      file.write(f"{index_n:<8} {type_info:<6} {residue:<8} {resname:<8} {atom_name:<10} {cgnr:<6} {charge:<10} {mass:<10}\n")

    file.write("; Total charge " + str(total_charge) + "\n")

    if bonds:
      file.write("[ bonds ]\n")
      file.write("; i  j  funct  length  force.c.\n")
      for bond in bonds:
        i_value = bond["i"]
        j_value = bond["j"]
        func = bond["func"]
        length = bond["length"]
        force = bond["force"]
        file.write(f"{i_value:<6} {j_value:<6} {func:<3} {length:<6} {force:<10}\n")

def adjust_charge(atoms, clayff_nonbonded):

  # 定义电荷，求除桥接氧之外的所有电荷
  total_charge_without_ob = 0

  # 定义桥接氧原子的数量，便于后续为所有桥接氧重新分配电荷
  ob_count = 0

  for atom in atoms:
    if "type" not in atom:
      print(atom)
    if atom["type"] != "ob":
      charge = clayff_nonbonded[atom["type"]][2]
      total_charge_without_ob += charge

    else:
      ob_count +=1

  if ob_count != 0:
    ob_adjust_charge =  (0 - total_charge_without_ob) / ob_count
    ob_adjust_charge = round(ob_adjust_charge, 8)
    print("将ob类型的氧原子电荷调整为：" + str(ob_adjust_charge))
  else:
    ob_adjust_charge = 0.00
    print("不存在桥接氧，没有调整电荷")

  # 为所有原子分配电荷与质量
  for atom in atoms:
    atom["mass"] = clayff_nonbonded[atom["type"]][1]

    if atom["type"] != "ob":
      atom["charge"] = clayff_nonbonded[atom["type"]][2]

    else:
      atom["charge"] = ob_adjust_charge
  return atoms

def set_bonds(atoms):
  bonds = []
  for atom in atoms:
    if atom["type"] == "ho":
      connect = atom["connect"]
      o_flag = 0
      for connect_index in connect:
        if atoms[connect_index - 1]["type"] == "oh":

          o_flag += 1
          if o_flag > 2:
            print(str(atom)+"该H原子连接了两个以上的氧原子！该原子第二个连接没有设置bonds信息")
            break
          bonds.append({
            "i":atom["index"],
            "j":connect_index,
            "func":1,
            "length":0.1,
            "force":463532.808,
            })
  print(str(len(bonds)) + "个-OH")
  return bonds

def main():
  clayff_nonbonded = read_nonbonded_file("ffnonbonded.itp")
  # 读取所有原子信息以及连接信息
  atoms = read_pdb_file("Layer.pdb")
  # 为所有原子分配原子类型
  atoms_type = set_atom_type(atoms)
  # 调整桥接氧原子电荷，让体系呈中性
  atoms_type_charge = adjust_charge(atoms_type, clayff_nonbonded)
  bonds = set_bonds(atoms_type_charge)

  write_itp_file("clay.itp", atoms_type_charge, bonds)


if __name__ == '__main__':
  main()

该代码的使用就很简单，主要修改三个地方，在main()函数中，修改clayff的ffnonbonded.itp名字，然后修改pdb文件为你自己的文件，然后是修改你想要输出的itp文件命名，分别在220，222，229行，其中要注意的是这个pdb文件尽量别选择用maerial studio(MS)生成的pdb文件，不知道为何。MS生成的pdb文件有些成键会错误，需要在MS建模完成后输出mol2文件，然后再由Gview转称pdb，这样就不会出现问题。

此外有关ffnonbonded.itp的文件我是源自Github上这个人的数据：

https://github.com/thomasunderwood/ClayFF/blob/master/ClayFF.ff/ffnonbonded.itp

他应该是参考的之前一篇文章报道，讲ClayFF的参数转化为了Gromacs可以应用的数据单位。

代码详解

为了便于理解以及便于读者在我这个脚本上面修改以用于自己的体系，这里对整体代码按照函数的顺序进行一个简单的讲解

read_nonbonded_file(file_name)

这个函数的意义就是读取ClayFF的ffnonbonded.itp文件，并将中的数据应用re这个库来按照原子类型存储其相关信息。这里返回的是一个字典clayff_nonbonded，字典的key就是原子类型，每个key对应的是一个列表，列表中有6个数据，分别为at.num，mass，charge，ptype，sigma，epsilon。这样后续可以用字典加索引的形式方便快捷的提取每个原子类型的各项数据。

read_pdb_file(file_name)

这个函数用于读取pdb文件，将其中的原子信息按顺序存入一个atoms列表中，这样存储的好处就是列表的索引号+1就是每个原子的索引号（因为python列表的索引号从0开始，而pdb原子的索引号从1开始）。然后原子信息则是字典的形式保存的，如下：

atoms.append({
  "index": index,
  "name": atom_name,
  "residue": residue_name,
  "chain": chain_id,
  "residue_seq": residue_seq,
  "coords": (x, y, z)
})

也就是说在atoms这个列表里面会存放原子数量个的字典，每个字典就保存有该原子索引，名称，残基名，坐标等各项信息。

此外这个函数还会继续读取bonds信息，这样写的原因是pdb文件讲成键信息放在了各项原子信息之后，存储的格式为{CONNECT，成键原子索引号，被成键原子索引号1，被成键原子索引号2…}，所以根据成键原子索引号去atoms列表中找到对应的原子，然后讲与它连接的被成键原子索引号以列表的形式存入该原子的字典中。

set_atom_type(atoms)

这个函数就是用来人为的设定原子类型了，这儿我觉得应该是要注意根据自己体系进行调整的，我这里仅对Si，H，O的原子进行了设定，其中Si和H都是直接按照原子类型来确定的，而对于O原子则多加了一个如果它跟H原子连接的话，则设定为oh，否则就设定为ob。

然后我还加了一些对成键情况的判定，即对一些异常的成键情况进行提醒，比如氢原子成键数量大于等于2了，或者氧原子成键数量为0之类的，这也是为啥我能发现MS的pdb文件格式不对。

adjust_charge(atoms, clayff_nonbonded)

该函数即用于调整电荷，这个是我需要自己写这个脚本的另外一个主要原因了，因为当你建模之后会发现，该体系的电荷如果严格根据ClayFF里面的电荷信息来的话是不为0的，所以为了实现一个中性的体系我决定对整个体系中所有的桥接氧(ob)的电荷进行调整，所以我写了这个函数，用于重新计算电荷。

该函数的原理就是先读取所有原子的type，然后根据clayff_nonbonded中各个原子类型的电荷求出所有非ob原子类型的总电荷，然后用该电荷除以ob原子的数量，再为所有ob原子重新设置电荷。

set_bonds(atoms)

这个函数是为了写入[ bonds ]后的数据所准备的，主要是为了根据ClayFF设置一些基本的参数，它需要读取之前的atoms里面的每个atom的"connect"的列表，然后如果是H原子的话就会将其数据转化成一个新的，保存有bond相关信息的字典，存入刀bonds这个列表中，最后返回的就是bonds这个列表。

write_itp_file(file_name, atoms, bonds, moleculetype=“Clay”)

这个函数就是为了输出最终的itp文件，需要将前面处理得到的保存有各项数据的atoms列表，bonds泪飙传入，然后按照itp文件的格式输出，注意，这里输出是以[ moleculetype ]开始的，前面的[ atomstype ]这里并没有包含，相关数据可以直接#include ffnonbonded.itp即可。

#Python #Gromacs #Modeling

ClayFF力场应用于SiO2材料的小脚本-Gromacs

http://phoenixjason.cn/2024/11/05/20241105ClayFF力场应用于SiO2材料的小脚本-Gromacs/

作者

Jason

发布于

2024年11月5日

许可协议

通过Material Studio创建矿物裂隙下一篇