Best Practices for Model-Building Tools and Refinement
Best practices for fixing any issue:
coot.set_go_to_atom_chain_residue_atom_name(chain, resno, "CA")
or coot.set_rotation_centre(x, y, z) for a map position.
The user is watching the screen. If you switch targets without recentring, they will be looking at
the wrong place while you work elsewhere. This is confusing and unacceptable.Use fit_chain_to_map_by_random_jiggle_and_blur() when a whole chain needs to be
placed or re-placed in the map — before any residue-level refinement.
Note, however, that the fit is only local. If the whole chain is (say) 10 or more Angstroms
away from the real position, then fit_chain_to_map_by_random_jiggle_and_blur() will not work.
The per-residue density correlation graph is the key diagnostic: if the mean correlation across
the whole chain is near zero with high variance (many residues below 0.4, many negative),
the chain is in the wrong place and jiggle fitting is the right first step.
Do not attempt residue-level fixes (rotamers, pepflips, refinement) until the chain is correctly placed. Those tools cannot recover a globally misplaced chain.
Before jiggling, always get a fresh per-residue correlation to use as a true baseline:
stats = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain_id, imol_map, 1, 0)
corrs = [entry[1][1] for entry in stats[0]]
mean_corr = sum(corrs) / len(corrs)
n_below = sum(1 for c in corrs if c < 0.4)
print(f"Mean correlation: {mean_corr:.3f}, residues below 0.4: {n_below}/{len(corrs)}")
Interpretation:
Why negative correlations occur in cryo-EM maps: cryo-EM maps are Fourier-based (no F000 term), so the mean density is zero. Negative density exists between atomic features as a mathematical consequence of the reconstruction — not a physical reality. A misplaced chain within the molecular envelope samples this real structured signal (both positive peaks and negative troughs) from the wrong part of the map, producing scattered correlations around zero. This is distinct from sitting in solvent, which gives low-magnitude, low-variance correlations.
Set the refinement map first, then use the blur variant for cryo-EM:
coot.set_imol_refinement_map(imol_map)
result = coot.fit_chain_to_map_by_random_jiggle_and_blur(
imol, # model molecule
chain_id, # e.g. "N"
n_trials, # 200-500; more trials = better chance of finding global optimum
jiggle_scale, # 1.0-2.0; scale of random perturbations
blur_factor # 100-200; Fourier filtering makes the map smoother, helping
# the fit escape local minima and find the correct placement
)
print(f"Jiggle fit score: {result:.3f}")
Return values: the function returns a score (higher = better fit) or a large negative number (e.g. −100, −999) when it cannot improve on the current placement. These large negative returns are not errors — they mean the current position is already a local optimum, or the function had difficulty scoring. Do not treat them as failures without checking the actual correlation afterwards.
Blur factor guidance: the blur smooths the map by a Fourier filter, effectively low-pass filtering it. A higher blur factor produces a smoother map with broader features, which helps the chain find the correct region before fine-grained fitting. For cryo-EM maps, blur factors of 100–200 are typical starting points.
Always re-run the per-residue correlation after jiggling and compare to the baseline:
stats_after = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain_id, imol_map, 1, 0)
corrs_after = [entry[1][1] for entry in stats_after[0]]
mean_after = sum(corrs_after) / len(corrs_after)
n_below_after = sum(1 for c in corrs_after if c < 0.4)
print(f"AFTER — mean: {mean_after:.3f}, below 0.4: {n_below_after}")
A successful jiggle fit for a correctly placed chain should bring the mean correlation above 0.6, with most residues above 0.7. Remaining poor-fitting residues in loop regions are expected and addressed by subsequent residue-level refinement.
Use the coot-inline-graphs skill to render a before/after bar chart with secondary
structure overlays. Fetch secondary structure from PDBe
(/api/pdb/entry/secondary_structure/{pdb_id}) rather than relying solely on Coot's
header secondary structure detection, which may miss strands — particularly single-
residue assignments and short strands in nanobody/VHH-type folds.
# 1. Get fresh baseline correlation (after any manual moves, before jiggle)
stats_before = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain_id, imol_map, 1, 0)
corrs_before = [e[1][1] for e in stats_before[0]]
print(f"BEFORE — mean: {sum(corrs_before)/len(corrs_before):.3f}")
# 2. Set refinement map and jiggle fit
coot.set_imol_refinement_map(imol_map)
result = coot.fit_chain_to_map_by_random_jiggle_and_blur(imol, chain_id, 500, 1.0, 200.0)
print(f"Jiggle score: {result:.3f}") # large negative = ignore, check correlation anyway
# 3. Get after correlation and compare
stats_after = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain_id, imol_map, 1, 0)
corrs_after = [e[1][1] for e in stats_after[0]]
print(f"AFTER — mean: {sum(corrs_after)/len(corrs_after):.3f}")
# 4. Only proceed to residue-level work once mean > 0.6
CRITICAL: Always create a checkpoint before making significant model changes.
Use make_backup_checkpoint() before any operation that might need to be reverted:
# Create a named checkpoint before risky operation
checkpoint_idx = coot.make_backup_checkpoint(0, "before adding OXT")
# Try the operation
coot.add_OXT_to_residue(0, "A", 93, "")
result = coot.refine_residues_py(0, [["A", 93, ""]])
# Check if it worked - if not, restore
if result_is_bad:
coot.restore_to_backup_checkpoint(0, checkpoint_idx)
Why checkpoints are better than undo:
apply_undo() only steps back one operation at a time# Compare two different approaches
checkpoint_before = coot.make_backup_checkpoint(0, "original state")
# Try approach 1
coot.auto_fit_best_rotamer(0, "A", 42, "", "", 1, 1, 0.01)
results = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
score_approach1 = check_correlation(0, "A", 42)
checkpoint_approach1 = coot.make_backup_checkpoint(0, "after approach 1")
# Restore and try approach 2
coot.restore_to_backup_checkpoint(0, checkpoint_before)
coot.pepflip(0, "A", 42, "", "")
results = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
score_approach2 = check_correlation(0, "A", 42)
# Keep the better result
if score_approach1 > score_approach2:
coot.restore_to_backup_checkpoint(0, checkpoint_approach1)
CRITICAL: You MUST check ALL validation metrics before AND after EVERY fix.
Fixing only one problem (e.g., Ramachandran) while leaving others (rotamer, density fit) is a FAILED fix. A residue is only "fixed" when ALL metrics are acceptable.
coot.set_go_to_atom_chain_residue_atom_name(chain, resno, "CA")all_molecule_ramachandran_score_py)rotamer_graphs_py)map_to_model_correlation_stats_per_residue_range_py)molecule_atom_overlaps_py)auto_fit_best_rotamer(), and try experiment with following that up
with refine_residues_py() for that residue and its upstream and downstream
neighbours (if any).residues_near_residue()# 1. ALWAYS center on problem residue first
coot.set_go_to_atom_chain_residue_atom_name("A", 41, "CA")
# 2. Get ALL metrics BEFORE
rama_data = [r for r in coot.all_molecule_ramachandran_score_py(0)[5:]
if r[1] == ['A', 41, '']][0]
rama_prob_before = rama_data[2]
rotamer_data = [r for r in coot.rotamer_graphs_py(0)
if r[0] == 'A' and r[1] == 41][0]
rotamer_score_before = rotamer_data[3]
corr_data = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0]
if s[0][1] == 41][0]
correlation_before = corr_data[1][1]
overlaps_before = [o for o in coot.molecule_atom_overlaps_py(0, 30)
if (o['atom-1-spec'][1:3] == ['A', 41] or
o['atom-2-spec'][1:3] == ['A', 41])]
print(f"BEFORE: Rama={rama_prob_before:.4f}, Rotamer={rotamer_score_before:.2f}%, Corr={correlation_before:.3f}, Clashes={len(overlaps_before)}")
# 3. Apply first fix (e.g., pepflip for backbone)
coot.pepflip(0, "A", 41, "", "")
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
# 4. Check ALL metrics AFTER first fix
rama_prob_after = [r for r in coot.all_molecule_ramachandran_score_py(0)[5:]
if r[1] == ['A', 41, '']][0][2]
rotamer_score_after = [r for r in coot.rotamer_graphs_py(0)
if r[0] == 'A' and r[1] == 41][0][3]
correlation_after = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0]
if s[0][1] == 41][0][1][1]
print(f"AFTER: Rama={rama_prob_after:.4f}, Rotamer={rotamer_score_after:.2f}%, Corr={correlation_after:.3f}")
# 5. If rotamer or correlation still bad, DON'T STOP - fix them!
if rotamer_score_after < 1.0:
print("Rotamer still bad - trying auto_fit_best_rotamer")
coot.auto_fit_best_rotamer(0, "A", 41, "", "", 1, 1, 0.01)
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]])
# 6. ALWAYS re-check after additional fixes
rotamer_score_final = [r for r in coot.rotamer_graphs_py(0)
if r[0] == 'A' and r[1] == 41][0][3]
correlation_final = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0]
if s[0][1] == 41][0][1][1]
print(f"FINAL: Rotamer={rotamer_score_final:.2f}%, Corr={correlation_final:.3f}")
# 7. Only NOW can you move to the next residue
# ❌ WRONG: Checking only Ramachandran
coot.pepflip(0, "A", 41, "", "")
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]])
rama_after = coot.all_molecule_ramachandran_score_py(0)[5][39][2]
print(f"Ramachandran improved to {rama_after}")
# MOVES ON without checking rotamer or density fit - WRONG!
# ❌ WRONG: Not centering on residue
# Goes straight to fix without set_go_to_atom_chain_residue_atom_name()
# ❌ WRONG: Not checking metrics before the fix
# How do you know if it improved if you don't know what it was before?
# ❌ WRONG: Declaring success with bad rotamer
rama = 0.30 # Good!
rotamer = 0.0001 # TERRIBLE!
correlation = 0.59 # POOR!
print("Residue fixed!") # NO IT ISN'T!
A residue with:
is NOT fixed. The side chain is clearly wrong. The backbone geometry might be OK, but the model is still incorrect.
ALL metrics must be acceptable before moving on.
Don't refine problem residues in isolation - include neighboring residues for context.
refine_residues_py(0, [["A", 41, ""]]) - Often fails to correct the modelrefine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])Recommended approach:
Example from session:
Neighboring Residues:
Critical insight: Residues that are close in 3D space affect each other during refinement, even if they're far apart in sequence.
Coot's refinement includes spatially neighbouring atoms in the non-bonded contact interactions, but only the selected residues can move during minimization. If a nearby (but unselected) residue is in the wrong position, it will "push" your selected residues away via non-bonded contact penalties - potentially pushing them out of correct density to avoid the clash with the incorrectly-placed neighbour.
Diagnostic workflow:
overlaps = coot.molecule_atom_overlaps_py(0, 50)
for o in overlaps:
spec1, spec2 = o['atom-1-spec'], o['atom-2-spec']
vol = o['overlap-volume']
if vol > 0.5: # Significant clash
print(f"{spec1[1]}/{spec1[2]} {spec1[4]} - {spec2[1]}/{spec2[2]} {spec2[4]}: {vol:.2f}")
# Example: A/2 has poor correlation (0.13) and clashes with A/89
# First fix A/89:
coot.auto_fit_best_rotamer(0, "A", 89, "", "", 1, 1, 0.01)
results = coot.refine_residues_py(0, [["A", 88, ""], ["A", 89, ""], ["A", 90, ""]])
# Then re-refine A/2 INCLUDING A/89 as a spatial neighbour:
results = coot.refine_residues_py(0, [["A", 1, ""], ["A", 2, ""], ["A", 3, ""], ["A", 89, ""]])
# A/2 correlation improved: 0.13 → 0.81
Why this matters:
Real example:
Before: A/2 correlation = 0.131, A/89 correlation = 0.050
A/2 CA ↔ A/89 CZ clash: 1.06 Ų
After fixing A/89 first, then refining together:
A/2 correlation = 0.805, A/89 correlation = 0.928
No clash
Sometimes multiple rounds of refinement with different selections help:
Example workflow:
# Create checkpoint first!
checkpoint = coot.make_backup_checkpoint(0, "before iterative refinement")
# First: larger region
results_1 = coot.refine_residues_py(0, [["A", i, ""] for i in range(40, 44)])
check_validation() # Did it help?
# Second: targeted refinement
results_2 = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
check_validation() # Better or worse?
# If worse:
coot.restore_to_backup_checkpoint(0, checkpoint)
Always validate changes objectively using:
def check_residue_validation(imol, chain_id, resno):
"""Check both Ramachandran and density correlation"""
# Get Ramachandran
rama_data = coot.all_molecule_ramachandran_score_py(imol)
residue_data = rama_data[5]
rama_score = None
for r in residue_data:
if r[1][0] == chain_id and r[1][1] == resno:
rama_score = r[2]
break
# Get density correlation
corr_data = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain_id, 1, 1, 1
)
all_atom_corr = None
sidechain_corr = None
for r in corr_data[0]:
if r[0][1] == resno:
all_atom_corr = r[1][1]
break
for r in corr_data[1]:
if r[0][1] == resno:
sidechain_corr = r[1][1]
break
return {
'residue': resno,
'rama_prob': rama_score,
'all_atom_corr': all_atom_corr,
'sidechain_corr': sidechain_corr
}
# Usage with checkpoint
checkpoint = coot.make_backup_checkpoint(0, "before refinement test")
before = check_residue_validation(0, "A", 41)
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
coot.accept_moving_atoms_py()
after = check_residue_validation(0, "A", 41)
# Compare and decide
if after['all_atom_corr'] > before['all_atom_corr']:
# Keep it!
pass
else:
# Revert to checkpoint
coot.restore_to_backup_checkpoint(0, checkpoint)
Don't be afraid to revert changes:
make_backup_checkpoint() / restore_to_backup_checkpoint() - for jumping back to a specific stateapply_undo() - for stepping back one operation at a timeUse checkpoints when:
Use undo when:
For poor side-chain density correlation, try auto_fit_best_rotamer() first:
# Create checkpoint first
checkpoint = coot.make_backup_checkpoint(0, "before rotamer fitting")
# Check if it's a side-chain problem
validation = check_residue_validation(0, "A", 89)
if validation['sidechain_corr'] < 0.5:
# Try auto-fit rotamer
score = coot.auto_fit_best_rotamer(0, "A", 89, "", "", 1, 1, 0.01)
if score > 0: # Positive score is good
# Check improvement
after = check_residue_validation(0, "A", 89)
if after['sidechain_corr'] > validation['sidechain_corr']:
# Success! (e.g., 0.034 → 0.900)
pass
else:
coot.restore_to_backup_checkpoint(0, checkpoint)
else:
# Negative score means failure
coot.restore_to_backup_checkpoint(0, checkpoint)
Always call this at the start to make refinement complete immediately:
coot.set_refinement_immediate_replacement(1)
Without this, refinement may be asynchronous and difficult to control programmatically.
Bring residue to screen center so you can watch the refinement:
coot.set_go_to_atom_molecule(0)
coot.set_go_to_atom_chain_residue_atom_name("A", 41, "CA")
This helps with:
If the Ramachandran Plot is poor, try using coot.pepflip(imol, chain_id, res_no, ins_code, alt_conf) followed by a refinement of the residues in the extended region.
If the Rotamer score is poor, try using coot.do_180_degree_side_chain_flip() to improve the Rotamer score. It is occasionally useful.
CRITICAL: After adding any new residues, always check for and delete water molecules that are now too close to the new protein atoms.
Waters are placed by earlier refinement cycles that had no knowledge of atoms you are about to build. Once new residues are placed, any water within ~3.5 Å is almost certainly spurious — it was filling density that now belongs to the protein model.
Always do this immediately after building new residues, before refinement.
# After building new residues (e.g. A21-A25), check each for clashing waters
newly_built = [21, 22, 23, 24, 25]
waters_to_delete = set()
for resno in newly_built:
neighbours = coot.residues_near_residue_py(0, ["A", resno, ""], 3.5)
for n in neighbours:
chain, nresno, inscode = n[0], n[1], n[2]
name = coot.residue_name_py(0, chain, nresno, inscode)
if name == "HOH":
waters_to_delete.add((chain, nresno, inscode))
print(f"Water {chain}{nresno} within 3.5A of new A{resno} - will delete")
for chain, resno, inscode in waters_to_delete:
coot.delete_residue(0, chain, resno, inscode)
print(f"Deleted HOH {chain}{resno}")
Why 3.5 Å? This is a generous hydrogen-bond distance. Any water oxygen closer than this to a protein heavy atom is physically incompatible with the new model.
Don't just check for hard clashes — even waters that don't show up in molecule_atom_overlaps_py() should be deleted if they are within 3.5 Å, because the overlap check uses van der Waals radii and may miss close-but-not-overlapping waters that are nonetheless chemically unreasonable.
When building backwards (toward the N-terminus) into a known secondary structure element, use add_terminal_residue_using_phi_psi() with the target φ/ψ angles and enforce the conformation further by refining with set_secondary_structure_restraints_type().
α-helix: φ = −57°, ψ = −47° β-strand: φ = −120°, ψ = +120°
Complete workflow for N-terminal helix extension:
# Step 1: Checkpoint
cp = coot.make_backup_checkpoint(0, "before N-terminal extension")
# Step 2: Build residues backwards one at a time with helix phi/psi
phi, psi = -57.0, -47.0
current_nterm = 26 # existing N-terminal residue of fragment
for i in range(4): # build 4 residues: 25, 24, 23, 22
r = coot.add_terminal_residue_using_phi_psi(0, "A", current_nterm, "auto", phi, psi)
current_nterm -= 1
# Step 3: IMMEDIATELY check for and delete displaced waters
newly_built = list(range(current_nterm + 1, 26))
waters_to_delete = set()
for resno in newly_built:
for n in coot.residues_near_residue_py(0, ["A", resno, ""], 3.5):
if coot.residue_name_py(0, n[0], n[1], n[2]) == "HOH":
waters_to_delete.add((n[0], n[1], n[2]))
for chain, resno, inscode in waters_to_delete:
coot.delete_residue(0, chain, resno, inscode)
print(f"Deleted displaced water {chain}{resno}")
# Step 4: Mutate from ALA placeholders to correct sequence
coot.mutate_residue_range(0, "A", 22, 25, "NLLND") # sequence from UniProt etc.
# Step 5: Autofit rotamers for mutated residues
for resno in newly_built:
coot.auto_fit_best_rotamer(0, "A", resno, "", "", 1, 1, 0.01)
# Step 6: Refine with helix restraints - always turn off afterwards
coot.set_secondary_structure_restraints_type(1)
coot.refine_residues_py(0, [["A", r, ""] for r in range(22, 28)])
coot.accept_moving_atoms_py()
coot.set_secondary_structure_restraints_type(0) # ALWAYS turn off when done
Key rules for secondary structure restraints:
set_secondary_structure_restraints_type(0) after refinement to clear themcoot.set_secondary_structure_restraints_type(1) # alpha helix
coot.set_secondary_structure_restraints_type(2) # beta strand
coot.set_secondary_structure_restraints_type(0) # off - ALWAYS restore to this
CRITICAL RULE: When the user asks you to fix a region in one chain using a corrected region
in another chain, ALWAYS use copy_residue_range_from_ncs_master_to_others() first.
Never attempt to rebuild the target chain manually.
Any of these mean: use the NCS copy tool immediately:
# 1. Check NCS is detected — do this first
ncs_chains = coot.ncs_chain_ids_py(imol)
print(ncs_chains) # e.g. [['A', 'B']] — master chain listed first
# 2. Set the fixed chain as NCS master
coot.ncs_control_change_ncs_master_to_chain_id(imol, "B") # B is the fixed chain
# 3. Copy the fixed region to all NCS-related chains using the NCS operator
coot.copy_residue_range_from_ncs_master_to_others(imol, "B", start_resno, end_resno)
# 4. Refine the target chain in its own density to let it settle
specs = [["A", r, ""] for r in range(start_resno - 1, end_resno + 2)
if coot.residue_name_py(imol, "A", r, "") not in [None, "False", ""]]
coot.refine_residues_py(imol, specs)
Why this is always better than manual rebuilding:
Only fall back to manual rebuilding if:
ncs_chain_ids_py() returns False (no NCS detected)Use density_at_point() per backbone atom to diagnose backbone problems before
attempting any fix. This is more informative than per-residue correlation alone.
def probe_backbone(imol, imol_map, imol_diff, chain, resnos):
sigma = coot.map_sigma_py(imol_map)
sigma_diff = coot.map_sigma_py(imol_diff)
backbone = [" N ", " CA ", " C ", " O "]
for resno in resnos:
atoms = coot.residue_info_py(imol, chain, resno, "")
if not isinstance(atoms, list): continue
name = coot.residue_name_py(imol, chain, resno, "")
coords = {a[0][0]: a[2] for a in atoms}
for atname in backbone:
xyz = coords.get(atname)
if xyz is None: continue
x, y, z = xyz
sig_main = coot.density_at_point(imol_map, x, y, z) / sigma
sig_diff = coot.density_at_point(imol_diff, x, y, z) / sigma_diff
flag = " << LOW" if sig_main < 0.8 else ""
if sig_main < 0.0: flag = " << NEGATIVE"
print(f" {chain}/{resno} {name} {atname.strip():4s} "
f"{sig_main:.2f}σ diff={sig_diff:+.2f}σ{flag}")
| Pattern | Diagnosis | Fix |
|---|---|---|
| O near 0σ, N/CA/C good | Peptide bond flipped — carbonyl O in empty space | pepflip() at this residue |
| CA near 0σ or negative | Atom genuinely in wrong position | Register shift or delete/rebuild |
| Large negative diff at CA | Atom displaced — strong negative difference density | Delete and rebuild or NCS copy |
| Large positive diff nearby | Missing atoms or unmodelled density | Build missing residues |
| All atoms < 1σ | Entire residue misplaced | Delete and rebuild |
The pepflip signature: near-zero or negative carbonyl O with good N, CA, C. This is the most common backbone error and the fastest to fix.
# If O < 0.5σ but CA > 2σ — it's a pepflip
coot.pepflip(imol, chain_id, resno, "", "")
specs = [[chain_id, r, ""] for r in range(resno-2, resno+3)
if coot.residue_name_py(imol, chain_id, r, "") not in [None, "False", ""]]
coot.refine_residues_py(imol, specs)
# Re-probe to confirm O has moved into density
Always check mainchain and sidechain correlation separately before deciding on a fix.
# atom_mask_mode: 0=all, 1=mainchain only, 2=sidechain only
residue_specs = [[chain_id, resno, ""]]
mc = coot.map_to_model_correlation_py(imol, residue_specs, [], 1, imol_map)
sc = coot.map_to_model_correlation_py(imol, residue_specs, [], 2, imol_map)
aa = coot.map_to_model_correlation_py(imol, residue_specs, [], 0, imol_map)
print(f"All-atom: {aa:.3f} Mainchain: {mc:.3f} Sidechain: {sc:.3f}")
| Result | Conclusion | Fix |
|---|---|---|
| MC poor, SC good | Backbone error (pepflip, register) | Per-atom density probe, then pepflip |
| SC poor, MC good | Sidechain rotamer wrong | auto_fit_best_rotamer() |
| Both poor | Whole residue misplaced | Delete/rebuild or NCS copy |
When a stretch of residues has consistently poor correlations flanked by good residues, always test for a register shift before attempting pepflips.
A sequence of consecutive Ramachandran outliers with poor density for the sidechains, but acceptable or even good coorelations for the mainchain atoms, is the hallmark of a register error, not individual residue problems.
# Test if B[n] matches A[n-1] (chain B one residue ahead of A)
for n in range(start, end):
nb = coot.residue_name_py(imol, "B", n, "")
na = coot.residue_name_py(imol, "A", n-1, "")
match = "==" if nb == na else "!="
print(f"B/{n} {nb} vs A/{n-1} {na} {match}")
# Also compare phi/psi angles — a run of 3+ matching phi/psi confirms the shift
rama = coot.all_molecule_ramachandran_score_py(imol)
per_res = [r for r in rama[5] if r != -1]
rama_b = {r[1][1]: (r[0][0], r[0][1]) for r in per_res if r[1][0] == "B"}
rama_a = {r[1][1]: (r[0][0], r[0][1]) for r in per_res if r[1][0] == "A"}
for n in range(start, end):
phi_b, psi_b = rama_b.get(n, (0, 0))
phi_a, psi_a = rama_a.get(n-1, (0, 0))
dphi = min(abs(phi_b-phi_a), 360-abs(phi_b-phi_a))
dpsi = min(abs(psi_b-psi_a), 360-abs(psi_b-psi_a))
match = "GOOD" if dphi < 30 and dpsi < 30 else "bad"
print(f"B/{n} vs A/{n-1}: dphi={dphi:.0f} dpsi={dpsi:.0f} {match}")
import math
def dist(a, b): return math.sqrt(sum((a[i]-b[i])**2 for i in range(3)))
gap = dist(c_prev, n_next)
# ~1.33 A = direct peptide bond (no gap)
# ~3.8 A = one residue missing
# ~9.6 A = two residues missing
A residue with near-zero or negative correlation, missing sidechain atoms, and a zero Ramachandran score, and highly distorted bonds and angles is almost certainly a phantom insertion placed to compensate for a genuine deletion elsewhere in the loop. This is called an "out of register" error - at some point in the model there will be a residue squeezed in to where it should not be and elsewhere there will be a residue that is stretched across where 2 residues should be.
Signs of a spurious residue:
The correct fix is delete + renumber, NOT pepflip + refine:
coot.make_backup_checkpoint(imol, "before delete spurious residue")
coot.delete_residue(imol, chain_id, resno, "")
coot.renumber_residue_range(imol, chain_id, resno+1, last_resno, -1)
# Fix residue types if sequence labels are now wrong
for n, target_restype in corrections.items():
coot.mutate(imol, chain_id, n, "", target_restype)
for n in affected_resnos:
coot.auto_fit_best_rotamer(imol, chain_id, n, "", "", imol_map, 1, 0.01)
specs = [[chain_id, r, ""] for r in range(resno-2, last_affected+2)
if coot.residue_name_py(imol, chain_id, r, "") not in [None, "False", ""]]
coot.refine_residues_py(imol, specs)
After renumbering, always check:
B-factor fields can be lists (anisotropic) not floats (isotropic). Always use this helper — failing to do so causes a TypeError:
def get_b(atom):
b = atom[1][1]
return b[0] if isinstance(b, list) else b
import math
def dist(a, b): return math.sqrt(sum((a[i]-b[i])**2 for i in range(3)))
c_prev = get_coords(imol, chain, resno_before)['C']
n_next = get_coords(imol, chain, resno_after)['N']
cn_dist = dist(c_prev, n_next)
print(f"C->N: {cn_dist:.2f} A")
# ~1.33 A: direct bond — no residue missing, do not attempt to build
# ~3.8 A: one residue missing
# ~9.6 A: two residues missing
If C→N ≈ 1.33 Å, there is no room — the deletion is genuine, the model is correct.
When pepflips and rotamer fitting cycle without convergence after 2-3 attempts, stop. Use the per-atom density probe to diagnose the true problem.
Oscillation (O good → CA bad → O bad → CA good) means the residue is genuinely misplaced and automated tools cannot find the correct basin. Options:
protein_db_loops_py() for proper fragment-database loop fittingDo not use multi_residue_torsion_fit_py as a substitute for proper loop fitting.
It is useful for initial placement of manually built placeholder atoms only.
Use only when NCS copy is not available and the loop is genuinely missing (not a register error).
# Anchors must exist; missing residues need placeholder atoms first
specs = [
[chain_id, anchor_start, ""], # last good residue before gap
[chain_id, anchor_start+1, ""], # last good residue before gap
[chain_id, missing_1, ""], # built with placeholder atoms
[chain_id, missing_2, ""], # built with placeholder atoms
[chain_id, anchor_end, ""], # first good residue after gap
]
result = coot.protein_db_loops_py(
imol, specs, imol_map,
20, # nfrags — 20 is thorough
True # preserve_residue_names
)
Check ALL metrics before AND after EVERY fix.
density_at_point()checkpoint_idx = coot.make_backup_checkpoint(imol, "before pepflip A/262")
# ... make changes ...
# If worse: coot.restore_to_backup_checkpoint(imol, checkpoint_idx)
# If checkpoint fails: coot.set_undo_molecule(imol); coot.apply_undo()
specs = [[chain_id, r, ""] for r in range(resno-2, resno+3)
if coot.residue_name_py(imol, chain_id, r, "") not in [None, "False", ""]]
coot.refine_residues_py(imol, specs)
central = [chain_id, resno, ""]
neighbours = coot.residues_near_residue_py(imol, central, 4.5)
coot.refine_residues_py(imol, neighbours + [central])
coot.set_refinement_immediate_replacement(1) # once at session start
coot.set_imol_refinement_map(imol_map)
for resno in newly_built_resnos:
for n in coot.residues_near_residue_py(imol, [chain, resno, ""], 3.5):
if coot.residue_name_py(imol, n[0], n[1], n[2]) == "HOH":
coot.delete_residue(imol, n[0], n[1], n[2])
coot.set_secondary_structure_restraints_type(1) # helix (phi=-57, psi=-47)
coot.set_secondary_structure_restraints_type(2) # strand (phi=-120, psi=+120)
coot.refine_residues_py(imol, specs)
coot.set_secondary_structure_restraints_type(0) # ALWAYS turn off afterwards
Use this tool to identify distorted backbone geometry. It is fast, independent of the refinement environment, and often reveals register errors and misplaced residues that correlation and Ramachandran scores alone miss.
The omega torsion angle (the C-CA-C-N dihedral across the peptide bond, ideal = 180° for trans) is the most sensitive indicator. A wildly wrong omega (e.g. 9°, 60°, 123°) is definitive proof that the backbone is misplaced — it cannot be a real conformation.
import math
def dist(a, b):
return math.sqrt(sum((a[i]-b[i])**2 for i in range(3)))
def angle_deg(a, b, c):
ba = tuple(a[i]-b[i] for i in range(3))
bc = tuple(c[i]-b[i] for i in range(3))
cos_a = sum(ba[i]*bc[i] for i in range(3)) / (
math.sqrt(sum(x*x for x in ba)) * math.sqrt(sum(x*x for x in bc)))
return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
def omega_deg(ca1, c1, n2, ca2):
def cross(a, b):
return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0])
def sub(a, b): return tuple(a[i]-b[i] for i in range(3))
def norm(v):
l = math.sqrt(sum(x*x for x in v))
return tuple(x/l for x in v)
def dot(a, b): return sum(a[i]*b[i] for i in range(3))
b1 = sub(c1, ca1); b2 = sub(n2, c1); b3 = sub(ca2, n2)
n1 = norm(cross(b1, b2)); n2v = norm(cross(b2, b3))
m1 = cross(n1, norm(b2))
return math.degrees(math.atan2(dot(m1, n2v), dot(n1, n2v)))
# Engh & Huber ideal values (sigma = expected standard deviation)
IDEAL = {
'N-CA': (1.458, 0.019),
'CA-C': (1.525, 0.021),
'C-N': (1.329, 0.014), # peptide bond
'C-O': (1.231, 0.020),
'N-CA-C': (111.2, 2.8),
'CA-C-N': (116.2, 2.0),
'C-N-CA': (121.7, 1.8),
'CA-C-O': (120.8, 1.7),
}
def get_coords(imol, chain, resno):
atoms = coot.residue_info_py(imol, chain, resno, "")
if not isinstance(atoms, list) or atoms is False:
return {}
return {a[0][0].strip(): a[2] for a in atoms}
def check_mainchain_geometry(imol, chain, res_start, res_end, z_threshold=3.0):
"""
Check mainchain bond lengths, angles and omega torsions against Engh & Huber ideals.
Returns list of (residue_tag, parameter, ideal, actual, z_score, kind) tuples,
sorted by |Z| descending.
"""
distortions = []
prev = None
for resno in range(res_start, res_end + 1):
name = coot.residue_name_py(imol, chain, resno, "")
if not name or name in ["False", "", False]:
prev = None
continue
c = get_coords(imol, chain, resno)
if not c or "CA" not in c:
prev = None
continue
tag = f"{chain}/{resno} {name}"
# Intra-residue bonds
for bond, (a1, a2) in [("N-CA",("N","CA")), ("CA-C",("CA","C")), ("C-O",("C","O"))]:
if a1 in c and a2 in c:
d = dist(c[a1], c[a2])
ideal, sigma = IDEAL[bond]
z = (d - ideal) / sigma
if abs(z) >= z_threshold:
distortions.append((tag, bond, ideal, d, z, "bond"))
# Intra-residue angles
for ang_name, (a1,a2,a3) in [("N-CA-C",("N","CA","C")), ("CA-C-O",("CA","C","O"))]:
if all(a in c for a in (a1,a2,a3)):
ang = angle_deg(c[a1], c[a2], c[a3])
ideal, sigma = IDEAL[ang_name]
z = (ang - ideal) / sigma
if abs(z) >= z_threshold:
distortions.append((tag, ang_name, ideal, ang, z, "angle"))
# Inter-residue geometry (peptide bond to previous residue)
if prev:
prev_name = coot.residue_name_py(imol, chain, resno-1, "")
ptag = f"{chain}/{resno-1} {prev_name}→{tag}"
if "C" in prev and "N" in c:
# Peptide C-N bond length
d = dist(prev["C"], c["N"])
ideal, sigma = IDEAL["C-N"]
z = (d - ideal) / sigma
if abs(z) >= z_threshold:
distortions.append((ptag, "C-N(pept)", ideal, d, z, "bond"))
# Angles across the peptide bond
if "CA" in prev:
ang = angle_deg(prev["CA"], prev["C"], c["N"])
ideal, sigma = IDEAL["CA-C-N"]
z = (ang - ideal) / sigma
if abs(z) >= z_threshold:
distortions.append((ptag, "CA-C-N", ideal, ang, z, "angle"))
if "CA" in c:
ang = angle_deg(prev["C"], c["N"], c["CA"])
ideal, sigma = IDEAL["C-N-CA"]
z = (ang - ideal) / sigma
if abs(z) >= z_threshold:
distortions.append((ptag, "C-N-CA", ideal, ang, z, "angle"))
# Omega torsion — flag deviation from 180° > 15°
if "CA" in prev:
om = omega_deg(prev["CA"], prev["C"], c["N"], c["CA"])
dev = abs(abs(om) - 180.0)
if dev > 15.0:
# Express as Z using sigma~5° for trans peptides
distortions.append((ptag, "omega", 180.0, om, dev/5.0, "torsion"))
prev = c
distortions.sort(key=lambda x: -abs(x[4]))
return distortions
# Check a region — results sorted worst first
dist_b = check_mainchain_geometry(imol, "B", 256, 268, z_threshold=3.0)
for res, param, ideal, actual, z, kind in dist_b:
unit = "Å" if kind == "bond" else "°"
flag = " ***" if abs(z) > 5 else ""
print(f" {res:35s} {param:12s} ideal={ideal:.3f}{unit} actual={actual:.3f}{unit} z={z:+.2f}{flag}")
| Distortion | Likely cause | Fix |
|---|---|---|
| omega far from 180° (> 20°) | Misplaced backbone / register error | Delete + renumber or NCS copy |
| omega near 0° (cis peptide) | May be genuine (rare) — check density | Visual inspection; pepflip if in empty density |
| N-CA-C angle severely wrong | Atom in completely wrong position | Per-atom density probe, then delete/rebuild |
| N-CA or CA-C bond short | Compressed geometry from register error | Delete + renumber |
| CA-C-N or C-N-CA angle wrong | Distorted peptide junction | Pepflip or rebuild |
Omega outliers are the fastest route to finding register errors. A run of 3+ consecutive omega outliers in the same region is essentially diagnostic of a misplaced loop.
After running check_mainchain_geometry(), render the results using visualize:show_widget
so each distorted residue is a clickable block that navigates to it or triggers a fix.
Use c-red for |Z| > 5 or omega deviation > 20°, c-amber for |Z| 3–5. See the
coot-essential-api skill for the widget rendering guidelines.
For NCS structures: always check for a related chain that is correctly built before attempting manual rebuilding. One NCS copy call beats dozens of pepflips.
Use per-atom density probing before pepflipping. The pattern of which backbone atoms are in low density tells you exactly what is wrong and what fix to apply.
Distinguish mainchain from sidechain problems using atom_mask_mode=1 and 2
before deciding on a fix strategy.
Measure C→N and CA→CA distances before building. If C→N ≈ 1.33 Å there is no gap and no room to build — the deletion is genuine.
Context matters in refinement. Include neighbouring residues, both sequence and spatial neighbours, for best results.
Know when to stop. If automated methods cycle without convergence after 2-3 attempts, use the density probe to decide whether to delete, rebuild, or accept.
Context matters in refinement. Including neighboring residues provides the geometric and density context needed for refinement algorithms to find better solutions, especially for severe outliers.
Always checkpoint before changes. Use make_backup_checkpoint() before any significant model modification so you can easily revert if needed.
Delete displaced waters immediately after building new residues — don't wait until refinement reveals clashes.
Always turn off secondary structure restraints (coot.set_secondary_structure_restraints_type(0)) after the refinement that uses them, so they don't inadvertently affect subsequent work.